planning - 2026-03-11

Probing Physics Beyond the Standard Model through Combined Analyses of Next-Generation Type Ia Supernova, CMB, and BAO Surveys

Authors:Srinivasan Raghunathan, Ayan Mitra, Nikolina Šarčević, Fei Ge, Corentin Ravoux, Christos Georgiou, Renée Hložek, Richard Kessler, Gautham Narayan, Paul Rogozenski, Paul Shah, Georgios Valogiannis, Joaquin Vieira, the LSST Dark Energy Science Collaboration
Date:2026-03-10 17:59:20

Observations of Type Ia supernovae (SNIa), baryon acoustic oscillations (BAO), and the cosmic microwave background (CMB), which probe the late-, intermediate-, and early-universe epochs, respectively, provide complementary constraints on the expansion history of the Universe. In this work, we forecast constraints on dark energy and other extensions to the standard cosmological model by combining the SNIa sample expected from the Vera C. Rubin Observatory's Legacy Survey of Space and Time (LSST), data from current and forthcoming CMB surveys, and BAO measurements from the Dark Energy Spectroscopic Instrument (DESI). For the CMB, we use temperature, polarization, and lensing power spectra ($TT/EE/TE/φφ$) from the South Pole Telescope, the planned Advanced Simons Observatory, and a CMB-S4-like experiment. We derive constraints on $Λ{\rm CDM}$ and its extensions involving the dark energy equation of state parameters $(w_{0}, w_{a})$ and the sum of neutrino masses $\sum m_ν$, using a Markov Chain Monte Carlo (MCMC) sampling framework. We find that the LSST Year-3 SNIa sample can improve upon the DES Year-5 dark energy constraints by a factor of $\times2-\times2.5$, with the gains driven primarily by the significantly higher SNIa density in the LSST sample. Similarly, DESI-DR3 shows up to a $\times1.8$ improvement on dark energy parameters over DR2, driven largely by the substantial increase in the low-redshift sample. Combining CMB with LSST-Y3-SNIa and DESI-DR3-BAO yields $σ(w_{0}) = 0.028$ and $σ(w_{a}) = 0.11$ for the $w_{0} w_{a} {\rm CDM}$ cosmology, with the results being largely independent of the CMB dataset. The constraints weaken by 10%-30% when freeing $\sum m_ν$ and spatial curvature. Moreover, the joint analysis of the three datasets can enable a $2-3σ$ detection of $\sum m_ν$.
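
To illustrate what the $(w_{0}, w_{a})$ parameters describe, here is a minimal sketch that assumes the widely used CPL form $w(a) = w_{0} + w_{a}(1 - a)$. The abstract does not state which parametrization the authors use, so the form itself is an assumption, and the function name is hypothetical.

```python
def cpl_equation_of_state(z, w0=-1.0, wa=0.0):
    """Dark energy equation of state w(z) under the ASSUMED CPL form.

    w(a) = w0 + wa * (1 - a), with scale factor a = 1 / (1 + z).
    The defaults (w0 = -1, wa = 0) correspond to a cosmological constant.
    """
    if z <= -1.0:
        raise ValueError("redshift must be greater than -1")
    a = 1.0 / (1.0 + z)
    return w0 + wa * (1.0 - a)
```

With the defaults the function returns -1 at every redshift; a nonzero `wa` makes the equation of state drift away from `w0` as redshift increases.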

TiPToP: A Modular Open-Vocabulary Planning System for Robotic Manipulation

Authors:William Shen, Nishanth Kumar, Sahit Chintalapudi, Jie Wang, Christopher Watson, Edward Hu, Jing Cao, Dinesh Jayaraman, Leslie Pack Kaelbling, Tomás Lozano-Pérez
Date:2026-03-10 17:59:00

We present TiPToP, an extensible modular system that combines pretrained vision foundation models with an existing Task and Motion Planner (TAMP) to solve multi-step manipulation tasks directly from input RGB images and natural-language instructions. Our system aims to be simple and easy-to-use: it can be installed and run on a standard DROID setup in under one hour and adapted to new embodiments with minimal effort. We evaluate TiPToP -- which requires zero robot data -- over 28 tabletop manipulation tasks in simulation and the real world and find it matches or outperforms $π_{0.5}\text{-DROID}$, a vision-language-action (VLA) model fine-tuned on 350 hours of embodiment-specific demonstrations. TiPToP's modular architecture enables us to analyze the system's failure modes at the component level. We analyze results from an evaluation of 173 trials and identify directions for improvement. We release TiPToP open-source to further research on modular manipulation systems and tighter integration between learning and planning. Project website and code: https://tiptop-robot.github.io

AI-Enabled Data-driven Intelligence for Spectrum Demand Estimation

Authors:Colin Brown, Mohamad Alkadamani, Halim Yanikomeroglu
Date:2026-03-10 17:11:36

Accurately forecasting spectrum demand is a key component for efficient spectrum resource allocation and management. With the rapid growth in demand for wireless services, mobile network operators and regulators face increasing challenges in ensuring adequate spectrum availability. This paper presents a data-driven approach leveraging artificial intelligence (AI) and machine learning (ML) to estimate and manage spectrum demand. The approach uses multiple proxies of spectrum demand, drawing from site license data and derived from crowdsourced data. These proxies are validated against real-world mobile network traffic data to ensure reliability, achieving an R$^2$ value of 0.89 for an enhanced proxy. The proposed ML models are tested and validated across five major Canadian cities, demonstrating their generalizability and robustness. These contributions assist spectrum regulators in dynamic spectrum planning, enabling better resource allocation and policy adjustments to meet future network demands.
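
The proxy validation step reports an R$^2$ value. As a rough sketch, assuming R$^2$ here means the usual coefficient of determination between observed traffic and the proxy's prediction (the abstract does not spell out the definition), it could be computed as follows. The function and argument names are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left after using the proxy.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot
```

A perfect proxy gives 1.0, and a proxy no better than the mean of the observed traffic gives 0.0.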

Going Wide and Deep with Roman: The z~6-9 UV luminosity function in a Roman Deep Field

Authors:Micaela B. Bagley, Steven L. Finkelstein, James Rhoads, Sangeeta Malhotra, L. Y. Aaron Yung, Rachel S. Somerville, Casey Papovich
Date:2026-03-10 15:49:20

We present a trade study of possible ultra-deep surveys with the Nancy Grace Roman Space Telescope, optimizing the depth-area-filter parameter space for high-redshift galaxy science. Using a mock galaxy catalog derived from a 2 sq. degree lightcone created using the Santa Cruz semi-analytic model and populated with over 7.6 million galaxies at 09 and is critical for stellar contamination removal at all redshifts. Based on these results, we recommend that a Roman ultra-deep survey cover at least two Roman pointings (0.56 sq. degrees) with all six filters (R062, Z087, Y106, J129, H158, F184), reducing uncertainties on the rest-UV luminosity density by factors of 2-4 relative to the deepest existing JWST programs. Building off of the Deep Tier of the High Latitude Time Domain Survey to add depth and filter coverage to existing (or planned) observations is an excellent option.

A Hybrid Model-Assisted Approach for Path Loss Prediction in Suburban Scenarios

Authors:Chenlong Wang, Bo Ai, Ruiming Chen, Ruisi He, Mi Yang, Yuxin Zhang, Weirong Liu, Liu Liu
Date:2026-03-10 15:37:20

Accurate path loss prediction is crucial for wireless network planning and optimization in suburban environments with complex terrain variation and diverse land cover. This paper proposes a model-assisted hybrid path loss prediction method that introduces an environment-adaptive compensation on top of the classic close-in free-space reference distance (CI) path loss model. By jointly predicting the path loss exponent and a compensation term, the proposed approach dynamically adjusts the empirical trend. To improve the effectiveness of environmental representation, three environmental image organization schemes are constructed and evaluated. Experiments on measurement data collected on Pingtan Island show that the proposed method outperforms the CI model and a conventional model-assisted baseline, achieving a test root mean square error of 4.04 dB.
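
For readers unfamiliar with the CI model, the sketch below evaluates it with a 1 m reference distance and adds a compensation term. The additive combination is an assumption, since the abstract does not say how the predicted exponent and compensation are merged, and all names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def ci_path_loss_db(distance_m, freq_hz, exponent, compensation_db=0.0):
    """CI path loss in dB with a 1 m reference distance.

    PL = FSPL(f, 1 m) + 10 * n * log10(d / 1 m) + compensation
    The additive compensation term is an ASSUMED form for illustration.
    """
    if distance_m <= 0 or freq_hz <= 0:
        raise ValueError("distance and frequency must be positive")
    # Free-space path loss at the 1 m reference distance.
    fspl_1m = 20.0 * math.log10(4.0 * math.pi * freq_hz / SPEED_OF_LIGHT)
    return fspl_1m + 10.0 * exponent * math.log10(distance_m) + compensation_db
```

In a hybrid scheme of this kind, a learned model would supply `exponent` and `compensation_db` per link, while the closed-form expression keeps the overall distance trend.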

LAP: A Language-Aware Planning Model For Procedure Planning In Instructional Videos

Authors:Lei Shi, Victor Aregbede, Andreas Persson, Martin Längkvist, Amy Loutfi, Stephanie Lowry
Date:2026-03-10 14:48:24

Procedure planning requires a model to predict a sequence of actions that transform a start visual observation into a goal in instructional videos. While most existing methods rely primarily on visual observations as input, they often struggle with the inherent ambiguity where different actions can appear visually similar. In this work, we argue that language descriptions offer a more distinctive representation in the latent space for procedure planning. We introduce Language-Aware Planning (LAP), a novel method that leverages the expressiveness of language to bridge visual observation and planning. LAP uses a finetuned Vision Language Model (VLM) to translate visual observations into text descriptions and to predict actions and extract text embeddings. These text embeddings are more distinctive than visual embeddings and are used in a diffusion model for planning action sequences. We evaluate LAP on three procedure planning benchmarks: CrossTask, Coin, and NIV. LAP achieves new state-of-the-art performance across multiple metrics and time horizons by a large margin, demonstrating the significant advantage of language-aware planning.

Context Engineering: From Prompts to Corporate Multi-Agent Architecture

Authors:Vera V. Vishnyakova
Date:2026-03-10 12:58:31

As artificial intelligence (AI) systems evolve from stateless chatbots to autonomous multi-step agents, prompt engineering (PE), the discipline of crafting individual queries, proves necessary but insufficient. This paper introduces context engineering (CE) as a standalone discipline concerned with designing, structuring, and managing the entire informational environment in which an AI agent makes decisions. Drawing on vendor architectures (Google ADK, Anthropic, LangChain), current academic work (ACE framework, Google DeepMind's intelligent delegation), enterprise research (Deloitte, 2026; KPMG, 2026), and the author's experience building a multi-agent system, the paper proposes five context quality criteria: relevance, sufficiency, isolation, economy, and provenance, and frames context as the agent's operating system. Two higher-order disciplines follow. Intent engineering (IE) encodes organizational goals, values, and trade-off hierarchies into agent infrastructure. Specification engineering (SE) creates a machine-readable corpus of corporate policies and standards enabling autonomous operation of multi-agent systems at scale. Together these four disciplines form a cumulative pyramid maturity model of agent engineering, in which each level subsumes the previous one as a necessary foundation. Enterprise data reveals a gap: while 75% of enterprises plan agentic AI deployment within two years (Deloitte, 2026), deployment has surged and retreated as organizations confront scaling complexity (KPMG, 2026). The Klarna case illustrates a dual deficit, contextual and intentional. Whoever controls the agent's context controls its behavior; whoever controls its intent controls its strategy; whoever controls its specifications controls its scale.

Low-Rank Cyclostationarity Predictive Routing Is Almost as Good as Real-Time Data-based Routing

Authors:Oriel Singer, Ilai Bistritz, Giseung Park, Woohyeon Byeon, Youngchul Sung, Amir Leshem
Date:2026-03-10 12:23:53

Dynamic shortest-path routing, using real-time traffic data, enables path selection responsive to evolving conditions. Nevertheless, transportation planning tasks such as adaptive congestion pricing, fleet routing, and long-term operational decisions rely on offline traffic estimators. To address this problem, we develop a spatiotemporal predictor based on a low-rank decomposition of the traffic matrix and the temporal subspace coefficients. Using a recent large-scale measurement campaign over the Seoul road network, we show that our proposed predictor incurs an average excess travel time of less than 1.5 minutes. Moreover, our predictor's tail of the excess travel time distribution matches that of a near-real-time predictor. Results based on one year of traffic data are also demonstrated in simulations.

RESBev: Making BEV Perception More Robust

Authors:Lifeng Zhuo, Kefan Jin, Zhe Liu, Hesheng Wang
Date:2026-03-10 11:36:52

Bird's-eye-view (BEV) perception has emerged as a cornerstone of autonomous driving systems, providing a structured, ego-centric representation critical for downstream planning and control. However, real-world deployment faces challenges from sensor degradation and adversarial attacks, which can cause severe perceptual anomalies and ultimately compromise the safety of autonomous driving systems. To address this, we propose a resilient and plug-and-play BEV perception method, RESBev, which can be easily applied to existing BEV perception methods to enhance their robustness to diverse disturbances. Specifically, we reframe perception robustness as a latent semantic prediction problem. A latent world model is constructed to extract spatiotemporal correlations across sequential BEV observations, thereby learning the underlying BEV state transitions to predict clean BEV features for reconstructing corrupted observations. The proposed framework operates at the semantic feature level of the Lift-Splat-Shoot pipeline, enabling recovery that generalizes across both natural disturbances and adversarial attacks without modifying the underlying backbone. Extensive experiments on the nuScenes dataset demonstrate that, with few-shot fine-tuning, RESBev significantly improves the robustness of existing BEV perception models against various external disturbances and adversarial attacks.

Towards Viewpoint-centric Artifact-based Regulatory Requirements Engineering for Compliance by Design

Authors:Oleksandr Kosenkov
Date:2026-03-10 10:51:40

Processing regulations and resulting requirements to achieve regulatory compliance in software engineering (SE) is a developing challenge due to the continuously growing amount, complexity, and expanding scope of regulations. Despite the growing number of regulatory requirements engineering (RE) approaches newly suggested by the research community, industry remains under pressure to integrate them into its RE and overall software development life cycle (SDLC) practices to facilitate a seamless and legally valid compliance by design. As of today, we still have limited empirical understanding of how this can be achieved. Such integration should avoid additional burdens and address the demands of legal knowledge intensity, cross-functional communication, and consistency between the different viewpoints involved. Intermediate results of this doctoral study showed that regulatory RE has peculiarities distinguishing it from the engineering of other requirements. Oftentimes, organizations establish standalone regulatory RE processes on the organizational level. However, software development teams usually approach compliance by design in an ad-hoc manner rather than in a systematic way, among other reasons because of the complexity of the coordination between the involved viewpoints. The goal of this paper is to report and get feedback about the synthesis and future evaluation of our Artefact Model for Regulatory Requirements Engineering (AM4RRE) for an integrated compliance by design. We hope this paper will spark discussions about regulatory RE and help us refine plans for the final stage of the doctoral study.

EvoDriveVLA: Evolving Autonomous Driving Vision-Language-Action Model via Collaborative Perception-Planning Distillation

Authors:Jiajun Cao, Xiaoan Zhang, Xiaobao Wei, Liyuqiu Huang, Wang Zijian, Hanzhen Zhang, Zhengyu Jia, Wei Mao, Hao Wang, Xianming Liu, Shuchang Zhou Liu, Yang Wang, Shanghang Zhang
Date:2026-03-10 10:19:07

Vision-Language-Action models have shown great promise for autonomous driving, yet they suffer from degraded perception after unfreezing the visual encoder and struggle with accumulated instability in long-term planning. To address these challenges, we propose EvoDriveVLA, a novel collaborative perception-planning distillation framework that integrates self-anchored perceptual constraints and oracle-guided trajectory optimization. Specifically, self-anchored visual distillation leverages a self-anchor teacher to deliver visual anchoring constraints, regularizing student representations via trajectory-guided key-region awareness. In parallel, oracle-guided trajectory distillation employs a future-aware oracle teacher with coarse-to-fine trajectory refinement and Monte Carlo dropout sampling to produce high-quality trajectory candidates, thereby selecting the optimal trajectory to guide the student's prediction. EvoDriveVLA achieves SOTA performance in open-loop evaluation and significantly enhances performance in closed-loop evaluation. Our code is available at: https://github.com/hey-cjj/EvoDriveVLA.

Fairness in Robust Unit Commitment Problem Considering Suppression of Renewable Energy

Authors:Ichiro Toyoshima, Pierre-Louis Poirion, Tomohide Yamazaki, Kota Yaguchi, Masayuki Kubota, Ryota Mizutani, Akiko Takeda
Date:2026-03-10 10:18:57

Power company operators make power generation plans one day in advance, in what is known as the Unit Commitment (UC) problem. UC is exposed to uncertainties, such as unknown electricity load and disturbances caused by renewable energy sources, especially PVs. In previous research, we proposed the Renewable Energy Robust Optimization Problem (RE-RP), which addresses these uncertainties by considering suppression. In this paper, we propose a new model called RE-RP with fairness (RE-RPfair), which aims to achieve fair allocation among PVs. This model is an extension of the original RE-RP, and we demonstrate its effectiveness through simulation. To measure the degree of fairness, we use the Gini Index, which is well-known in social science.
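
As a concrete reading of the fairness measure, the sketch below computes the Gini Index from the mean absolute difference between all pairs of values. Which quantity the paper feeds into the index (for example, suppressed energy per PV) is not stated in the abstract, so the input here is a hypothetical list of non-negative allocations.

```python
def gini_index(values):
    """Gini Index of non-negative values: 0 is perfectly equal.

    G = sum_i sum_j |x_i - x_j| / (2 * n * sum_i x_i)
    """
    n = len(values)
    if n == 0:
        raise ValueError("values must be non-empty")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    total = sum(values)
    if total == 0:
        return 0.0  # nothing allocated, so nothing is unequal
    abs_diff = sum(abs(a - b) for a in values for b in values)
    return abs_diff / (2.0 * n * total)
```

An even split such as `[1, 1, 1, 1]` scores 0.0, while concentrating everything on one of four units, `[0, 0, 0, 4]`, scores 0.75.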

Declarative Scenario-based Testing with RoadLogic

Authors:Ezio Bartocci, Alessio Gambi, Felix Gigler, Cristinel Mateis, Dejan Ničković
Date:2026-03-10 10:11:09

Scenario-based testing is a key method for cost-effective and safe validation of autonomous vehicles (AVs). Existing approaches rely on imperative scenario definitions, requiring developers to manually enumerate numerous variants to achieve coverage. Declarative languages, such as OpenSCENARIO DSL (OS2), raise the abstraction level but lack systematic methods for instantiating concrete, specification-compliant scenarios as simulations. To our knowledge, no open-source solution currently provides this capability. We present RoadLogic, which bridges declarative OS2 specifications and executable simulations. It uses Answer Set Programming to generate abstract plans satisfying scenario constraints, motion planning to refine the plans into feasible trajectories, and specification-based monitoring to verify correctness. We evaluate RoadLogic on instantiating representative OS2 scenarios as simulations in the CommonRoad framework. Results show that RoadLogic consistently produces realistic, specification-satisfying simulations within minutes and captures diverse behavioral variants through parameter sampling, thus opening the door to systematic scenario-based testing for autonomous driving systems.

A Guideline-Aware AI Agent for Zero-Shot Target Volume Auto-Delineation

Authors:Yoon Jo Kim, Wonyoung Cho, Jongmin Lee, Han Joo Chae, Hyunki Park, Sang Hoon Seo, Noh Jae Myung, Kyungmi Yang, Dongryul Oh, Jin Sung Kim
Date:2026-03-10 10:00:01

Delineating the clinical target volume (CTV) in radiotherapy involves complex margins constrained by tumor location and anatomical barriers. While deep learning models automate this process, their rigid reliance on expert-annotated data requires costly retraining whenever clinical guidelines update. To overcome this limitation, we introduce OncoAgent, a novel guideline-aware AI agent framework that seamlessly converts textual clinical guidelines into three-dimensional target contours in a training-free manner. Evaluated on esophageal cancer cases, the agent achieves a zero-shot Dice similarity coefficient of 0.842 for the CTV and 0.880 for the planning target volume, demonstrating performance highly comparable to a fully supervised nnU-Net baseline. Notably, in a blinded clinical evaluation, physicians strongly preferred OncoAgent over the supervised baseline, rating it higher in guideline compliance, modification effort, and clinical acceptability. Furthermore, the framework generalizes zero-shot to alternative esophageal guidelines and other anatomical sites (e.g., prostate) without any retraining. Beyond mere volumetric overlap, our agent-based paradigm offers near-instantaneous adaptability to alternative guidelines, providing a scalable and transparent pathway toward interpretability in radiotherapy treatment planning.
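
The Dice similarity coefficient quoted above measures volumetric overlap between two contours. A minimal sketch, assuming each contour is represented as a collection of voxel index tuples (a representation chosen here for illustration, not taken from the paper):

```python
def dice_coefficient(predicted_voxels, reference_voxels):
    """Dice similarity coefficient: 2 * |A & B| / (|A| + |B|)."""
    predicted = set(predicted_voxels)
    reference = set(reference_voxels)
    if not predicted and not reference:
        return 1.0  # two empty volumes are treated as identical
    overlap = len(predicted & reference)
    return 2.0 * overlap / (len(predicted) + len(reference))
```

Identical volumes give 1.0 and disjoint volumes give 0.0, so a value such as 0.842 indicates substantial but imperfect agreement.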

Open-World Motion Forecasting

Authors:Nicolas Schischka, Nikhil Gosala, B Ravi Kiran, Senthil Yogamani, Abhinav Valada
Date:2026-03-10 09:35:08

Motion forecasting aims to predict the future trajectories of dynamic agents in the scene, enabling autonomous vehicles to effectively reason about scene evolution. Existing approaches operate under the closed-world regime and assume a fixed object taxonomy as well as access to high-quality perception. Therefore, they struggle in real-world settings where perception is imperfect and object taxonomy evolves over time. In this work, we bridge this fundamental gap by introducing open-world motion forecasting, a novel setting in which new object classes are sequentially introduced over time and future object trajectories are estimated directly from camera images. We tackle this setting by proposing the first end-to-end class-incremental motion forecasting framework to mitigate catastrophic forgetting while simultaneously learning to forecast newly introduced classes. When a new class is introduced, our framework employs a pseudo-labeling strategy to first generate motion forecasting pseudo-labels for all known classes, which are then processed by a vision-language model to filter inconsistent and over-confident predictions. In parallel, our approach further mitigates catastrophic forgetting by using a novel replay sampling strategy that leverages query feature variance to sample previous sequences with informative motion patterns. Extensive evaluation on the nuScenes and Argoverse 2 datasets demonstrates that our approach successfully resists catastrophic forgetting and maintains performance on previously learned classes while improving adaptation to novel ones. Further, we demonstrate that our approach supports zero-shot transfer to real-world driving and naturally extends to end-to-end class-incremental planning, enabling continual adaptation of the full autonomous driving system. We provide the code at https://omen.cs.uni-freiburg.de .

From Flow to One Step: Real-Time Multi-Modal Trajectory Policies via Implicit Maximum Likelihood Estimation-based Distribution Distillation

Authors:Ju Dong, Liding Zhang, Lei Zhang, Yu Fu, Kaixin Bai, Zoltan-Csaba Marton, Zhenshan Bing, Zhaopeng Chen, Alois Christian Knoll, Jianwei Zhang
Date:2026-03-10 09:30:05

Generative policies based on diffusion and flow matching achieve strong performance in robotic manipulation by modeling multi-modal human demonstrations. However, their reliance on iterative Ordinary Differential Equation (ODE) integration introduces substantial latency, limiting high-frequency closed-loop control. Recent single-step acceleration methods alleviate this overhead but often exhibit distributional collapse, producing averaged trajectories that fail to execute coherent manipulation strategies. We propose a framework that distills a Conditional Flow Matching (CFM) expert into a fast single-step student via Implicit Maximum Likelihood Estimation (IMLE). A bi-directional Chamfer distance provides a set-level objective that promotes both mode coverage and fidelity, enabling preservation of the teacher multi-modal action distribution in a single forward pass. A unified perception encoder further integrates multi-view RGB, depth, point clouds, and proprioception into a geometry-aware representation. The resulting high-frequency control supports real-time receding-horizon re-planning and improved robustness under dynamic disturbances.
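
To show why a bi-directional Chamfer distance can reward both mode coverage and fidelity, here is a minimal sketch over two sets of points. It uses unsquared Euclidean distance and per-set averaging, which is one common convention and not necessarily the one used in the paper. The function name is hypothetical.

```python
import math


def chamfer_distance(set_a, set_b):
    """Bi-directional Chamfer distance between two non-empty point sets."""
    if not set_a or not set_b:
        raise ValueError("both sets must be non-empty")

    def one_way(source, target):
        # Average distance from each source point to its nearest target.
        return sum(min(math.dist(p, q) for q in target) for p in source) / len(source)

    # A -> B penalises uncovered modes; B -> A penalises off-mode samples.
    return one_way(set_a, set_b) + one_way(set_b, set_a)
```

For a teacher with two modes, `[(0, 0), (10, 0)]`, a collapsed student that outputs only the average `[(5, 0)]` scores 10.0, whereas a student that reproduces both modes scores 0.0.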

On the topological complexity of non-simply connected spaces

Authors:Yuki Minowa
Date:2026-03-10 09:23:08

Topological complexity is a numerical homotopy invariant that measures the instability of motion planning in a space. To study the topological complexity of non-simply connected spaces, Costa and Farber introduced a cohomology class whose nilpotency gives a lower bound of topological complexity. Farber and Mescher constructed a spectral sequence that evaluates this nilpotency without direct computation. We extend these results with respect to a group homomorphism. As an application, we determine the topological complexity of some 3-manifolds with nonabelian fundamental group.

Interactive 3D visualization of surface roughness predictions in additive manufacturing: A data-driven framework

Authors:Engin Deniz Erkan, Elif Surer, Ulas Yaman
Date:2026-03-10 08:34:09

Surface roughness in Material Extrusion Additive Manufacturing varies across a part and is difficult to anticipate during process planning because it depends on both printing parameters and local surface inclination, which governs the staircase effect. A data-driven framework is presented to predict the arithmetic mean roughness (Ra) prior to fabrication using process parameters and surface angle. A structured experimental dataset was created using a three-level Box-Behnken design: 87 specimens were printed, each with multiple planar faces spanning different inclination angles, yielding 1566 Ra measurements acquired with a contact profilometer. A multilayer perceptron regressor was trained to capture nonlinear relationships between manufacturing conditions, inclination, and Ra. To mitigate limited experimental data, a conditional generative adversarial network was used to generate additional condition-specific tabular samples, thereby improving predictive performance. Model performance was assessed on a hold-out test set. A web-based decision-support interface was also developed to enable interactive process planning by loading a 3D model, specifying printing parameters, and adjusting the part's orientation. The system computes face-wise inclination from the model geometry and visualizes predicted Ra as an interactive colormap over the surface, enabling rapid identification of regions prone to high roughness and immediate comparison of parameter and orientation choices.
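
The face-wise inclination step can be sketched as the angle between a triangular face's normal and the build direction. This definition of inclination, the vertical build direction, and the function name are assumptions made for illustration; the abstract does not give the exact convention.

```python
import math


def face_inclination_deg(v0, v1, v2, build_dir=(0.0, 0.0, 1.0)):
    """Angle in degrees between a triangle's normal and the build direction."""
    e1 = [b - a for a, b in zip(v0, v1)]
    e2 = [b - a for a, b in zip(v0, v2)]
    # The cross product of two edges gives the (unnormalised) face normal.
    normal = (
        e1[1] * e2[2] - e1[2] * e2[1],
        e1[2] * e2[0] - e1[0] * e2[2],
        e1[0] * e2[1] - e1[1] * e2[0],
    )
    norm_n = math.sqrt(sum(c * c for c in normal))
    norm_b = math.sqrt(sum(c * c for c in build_dir))
    if norm_n == 0 or norm_b == 0:
        raise ValueError("degenerate triangle or zero build direction")
    cos_angle = sum(n * b for n, b in zip(normal, build_dir)) / (norm_n * norm_b)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))
```

Under this convention a horizontal top face returns 0 degrees and a vertical wall returns 90 degrees; a roughness predictor would take this angle together with the printing parameters as input.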

Second order asymptotics for the number of times an estimator is more than epsilon from its target value

Authors:Nils Lid Hjort, Grete Fenstad
Date:2026-03-10 07:46:57

Suppose $\{\widehatθ_n\colon n\ge1\}$ is a strongly consistent sequence of estimators for a parameter $θ$, where $\widehatθ_n$ is based on the first $n$ observations. Consider $Q_\varepsilon$, the number of times $|\widehatθ_n-θ|\ge\varepsilon$. In another paper (Hjort and Fenstad, 1992) we have shown that $\varepsilon^2 Q_\varepsilon$ has a limit distribution as $\varepsilon\rightarrow0$, depending only on $σ$, the standard deviation of the limit distribution for $\sqrt{n}(\widehatθ_n-θ)$, under natural regularity conditions. The present paper investigates some second order asymptotics for differences between $Q_\varepsilon$ variables. The limit of ${\rm E}(Q_{1,\varepsilon}-Q_{2,\varepsilon})$ is calculated in cases where ${\rm E} Q_{1,\varepsilon}/{\rm E} Q_{2,\varepsilon}$ goes to 1, leading to a notion of `asymptotic relative deficiency' in cases where the asymptotic relative efficiency is 1. This is used to distinguish between competing estimators with identical limit distributions. Thus using denominator $n-{1\over3}$ in the familiar formula for estimating a normal variance is better than both $n$ and $n-1$ and indeed all other choices, for example, in the sense of leading to the smallest possible expected number of $\varepsilon$ errors. Results of this type are found in a selection of familiar estimation problems, using limit results for expected differences, and are compared to corresponding asymptotic relative deficiency analysis in the sense of Hodges and Lehmann. Some second order distributional results are reached as well. It is shown how $\varepsilon$ times a $Q_\varepsilon$-difference tends to a variable which is related to some exponential distributions associated with Brownian motion, and that have recently been investigated by Hjort and Khasminskii (1993).
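
The denominator comparison above can be made concrete with a small sketch. Only the denominators $n$, $n-1$, and $n-{1\over3}$ come from the abstract; the function name and the `offset` parameter are hypothetical.

```python
def variance_estimate(sample, offset=1.0 / 3.0):
    """Sum of squared deviations divided by (n - offset).

    offset = 0 gives denominator n, offset = 1 gives n - 1, and the
    default offset = 1/3 gives the n - 1/3 denominator discussed above.
    """
    n = len(sample)
    if n - offset <= 0:
        raise ValueError("sample too small for this denominator")
    mean = sum(sample) / n
    sum_sq = sum((x - mean) ** 2 for x in sample)
    return sum_sq / (n - offset)
```

The three choices differ only by a factor that tends to 1 as $n$ grows, which is why a second-order analysis is needed to separate them.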

See, Plan, Rewind: Progress-Aware Vision-Language-Action Models for Robust Robotic Manipulation

Authors:Tingjun Dai, Mingfei Han, Tingwen Du, Zhiheng Liu, Zhihui Li, Salman Khan, Jun Yu, Xiaojun Chang
Date:2026-03-10 07:22:51

Measurement of task progress through explicit, actionable milestones is critical for robust robotic manipulation. This progress awareness enables a model to ground its current task status, anticipate verifiable intermediate states, and detect and recover from failures when progress stalls. To embody this capability, we introduce See, Plan, Rewind (SPR), a progress-aware vision-language-action framework that dynamically grounds language instructions into a sequence of spatial subgoals. SPR operates through a continuous core cycle: Seeing the current state and upcoming milestone, Planning a trajectory towards the next 2D waypoint, and Rewinding to a recoverable state upon failure by monitoring progress against the expected sequence. This closed-loop approach enables robust error correction without requiring additional training data or auxiliary models. Extensive experiments demonstrate the framework's effectiveness, generalization and robustness: SPR outperforms the MolmoAct baseline by 5% on the LIBERO benchmark. On the challenging LIBERO-Plus benchmark with unseen instructions and initial states, SPR achieves state-of-the-art robustness with the smallest performance drop, surpassing OpenVLA-OFT and UniVLA, demonstrating superior out-of-distribution robustness.

Mock Catalogs of Strongly Lensed Gravitational Waves via A Halo Model Approach with Ground-based Detectors

Authors:Youkai Li, Kai Liao, Mingqi Sun, Lilan Yang, Xuheng Ding, Marek Biesiada, Tonghua Liu
Date:2026-03-10 07:18:19

As plans for the construction of third-generation gravitational wave (GW) detectors advance, research into strongly lensed GWs has become increasingly critical. It is anticipated that hundreds of multi-image lensed GWs will be detected annually. We present a comprehensive suite of lensed GW mock catalogs derived from a composite lens mass model incorporating dark matter halos, galaxies, and subhalos. We analyze three source populations with four detector network configurations, accounting for the Earth's rotation. Our simulations encompass not only conventional doublets and quadruplets but also subhalo-lensed events, highly magnified systems, and complete three- or five-image systems with a detectable central image, a feature distinct from optical lensing. For the joint ET+CE network, we forecast an annual detection rate of approximately 400 doublets and 36 quadruplets. Notably, this population includes roughly 107 events lensed by subhalos and 20 complete systems with detectable central images. Furthermore, we analyze high-magnification events ($μ> 3$), predicting approximately 360 such cases. Under a more relaxed selection criterion that requires only at least one lensed signal to exceed the detection threshold, we estimate a total of approximately 617 lensed events. We also investigate the impact of variations in lens mass models and stellar evolution models on event rates, as well as the distributions of SNR pairs and time delays. These results establish a more physically grounded statistical prior for the future identification and authentication of lensed GW signals. The Gravitational Waves-Lensing Mock Catalog (GW-LMC) has been made publicly available.

Multi-model approach for autonomous driving: A comprehensive study on traffic sign-, vehicle- and lane detection and behavioral cloning

Authors:Kanishkha Jaisankar, Pranav M. Pawar, Diana Susane Joseph, Raja Muthalagu, Mithun Mukherjee
Date:2026-03-10 06:40:52

Deep learning and computer vision techniques have become increasingly important in the development of self-driving cars. These techniques play a crucial role in enabling self-driving cars to perceive and understand their surroundings, allowing them to safely navigate and make decisions in real-time. Using neural networks, self-driving cars can accurately identify and classify objects such as pedestrians, other vehicles, and traffic signals. Using deep learning and analyzing data from sensors such as cameras and radar, self-driving cars can predict the likely movement of other objects and plan their own actions accordingly. In this study, a novel approach is provided to enhance the performance of self-driving cars by using pre-trained and custom-made neural networks for key tasks, including traffic sign classification, vehicle detection, lane detection, and behavioral cloning. The methodology integrates several innovative techniques, such as geometric and color transformations for data augmentation, image normalization, and transfer learning for feature extraction. These techniques are applied to diverse datasets, including the German Traffic Sign Recognition Benchmark (GTSRB), road and lane segmentation datasets, vehicle detection datasets, and data collected using the Udacity self-driving car simulator to evaluate the model efficacy. The primary objective of the work is to review the state-of-the-art in deep learning and computer vision for self-driving cars. The findings of the work are effective in solving various challenges related to self-driving cars like traffic sign classification, lane prediction, vehicle detection, and behavioral cloning, and provide valuable insights into improving the robustness and reliability of autonomous systems, paving the way for future research and deployment of safer and more efficient self-driving technologies.

RAE-NWM: Navigation World Model in Dense Visual Representation Space

Authors:Mingkun Zhang, Wangtian Shen, Fan Zhang, Haijian Qin, Zihao Pei, Ziyang Meng
Date:2026-03-10 06:16:23

Visual navigation requires agents to reach goals in complex environments through perception and planning. World models address this task by simulating action-conditioned state transitions to predict future observations. Current navigation world models typically learn state evolution under actions within the compressed latent space of a Variational Autoencoder, where spatial compression often discards fine-grained structural information and hinders precise control. To better understand the propagation characteristics of different representations, we conduct a linear dynamics probe and observe that dense DINOv2 features exhibit stronger linear predictability for action-conditioned transitions. Motivated by this observation, we propose the Representation Autoencoder-based Navigation World Model (RAE-NWM), which models navigation dynamics in a dense visual representation space. We employ a Conditional Diffusion Transformer with Decoupled Diffusion Transformer head (CDiT-DH) to model continuous transitions, and introduce a separate time-driven gating module for dynamics conditioning to regulate action injection strength during generation. Extensive evaluations show that modeling sequential rollouts in this space improves structural stability and action accuracy, benefiting downstream planning and navigation.
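
The abstract does not describe how the linear dynamics probe is set up. One common way to build such a probe, shown here purely as a sketch under assumptions, is to fit a least-squares map from the current feature and action to the next feature and report the coefficient of determination. The function name, the synthetic data, and the in-sample R^2 metric are all illustrative; a real probe would use features from the encoders being compared and held-out transitions.

```python
import numpy as np

def linear_probe_r2(z_t, a_t, z_next):
    """Fit z_next ~ [z_t, a_t, 1] @ W by least squares; return in-sample R^2."""
    n = z_t.shape[0]
    X = np.hstack([z_t, a_t, np.ones((n, 1))])
    W, *_ = np.linalg.lstsq(X, z_next, rcond=None)
    residual = z_next - X @ W
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((z_next - z_next.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-ins for features (dimension d) and actions (dimension k).
rng = np.random.default_rng(0)
n, d, k = 512, 16, 3
z_t = rng.normal(size=(n, d))
a_t = rng.normal(size=(n, k))
A = rng.normal(size=(d, d)) * 0.3
B = rng.normal(size=(k, d))

# One representation whose transitions are nearly linear, one that is not.
z_linear = z_t @ A + a_t @ B + 0.01 * rng.normal(size=(n, d))
z_nonlinear = np.tanh(3.0 * (z_t @ A)) * np.sin(a_t @ B)

r2_linear = linear_probe_r2(z_t, a_t, z_linear)
r2_nonlinear = linear_probe_r2(z_t, a_t, z_nonlinear)
```

A higher score for one representation than another would indicate that its action-conditioned transitions are closer to linear, which is the kind of comparison the abstract reports for dense DINOv2 features.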

WESPR: Wind-adaptive Energy-Efficient Safe Perception & Planning for Robust Flight with Quadrotors

Authors:Khuzema Habib, Pranav Deshakulkarni Manjunath, Kasra Torshizi, Troi Williams, Pratap Tokekar
Date:2026-03-10 05:09:43

Local wind conditions strongly influence drone performance: headwinds increase flight time, crosswinds and wind shear hinder agility in cluttered spaces, while tailwinds reduce travel time. Although adaptive controllers can mitigate turbulence, they remain unaware of the surrounding geometry that generates it, preventing proactive avoidance. Existing methods that model how wind interacts with the environment typically rely on computationally expensive fluid dynamics simulations, limiting real-time adaptation to new environments and conditions. To bridge this gap, we present WESPR, a fast framework that predicts how environmental geometry affects local wind conditions, enabling proactive path planning and control adaptation. Our lightweight pipeline integrates geometric perception and local weather data to estimate wind fields, compute cost-efficient paths, and adjust control strategies, all within 10 seconds. We validate WESPR on a Crazyflie drone navigating turbulent obstacle courses. Our results show a 12.5-58.7% reduction in maximum trajectory deviation and a 24.6% improvement in stability compared to a wind-agnostic adaptive controller.

Robust Spatiotemporal Motion Planning for Multi-Agent Autonomous Racing via Topological Gap Identification and Accelerated MPC

Authors:Mingyi Zhang, Cheng Hu, Yiqin Wang, Haotong Qin, Hongye Su, Lei Xie
Date:2026-03-10 04:55:30

High-speed multi-agent autonomous racing demands robust spatiotemporal planning and precise control under strict computational limits. Current methods often oversimplify interactions or abandon strict kinematic constraints. We resolve this by proposing a Topological Gap Identification and Accelerated MPC framework. By predicting opponent behaviors via SGPs, our method constructs dynamic occupancy corridors to robustly select optimal overtaking gaps. We ensure strict kinematic feasibility using a Linear Time-Varying MPC powered by a customized Pseudo-Transient Continuation (PTC) solver for high-frequency execution. Experimental results on the F1TENTH platform show that our method significantly outperforms state-of-the-art baselines: it reduces total maneuver time by 51.6% in sequential scenarios, consistently maintains an overtaking success rate exceeding 81% in dense bottlenecks, and lowers average computational latency by 20.3%, pushing the boundaries of safe and high-speed autonomous racing.

Latent-DARM: Bridging Discrete Diffusion And Autoregressive Models For Reasoning

Authors:Lina Berrayana, Ahmed Heakl, Abdullah Sohail, Thomas Hofmann, Salman Khan, Wei Chen
Date:2026-03-10 04:45:26

Most multi-agent systems rely exclusively on autoregressive language models (ARMs) that are based on sequential generation. Although effective for fluent text, ARMs limit global reasoning and plan revision. On the other hand, Discrete Diffusion Language Models (DDLMs) enable non-sequential, globally revisable generation and have shown strong planning capabilities, but their limited text fluency hinders direct collaboration with ARMs. We introduce Latent-DARM, a latent-space communication framework bridging DDLMs (planners) and ARMs (executors), maximizing collaborative benefits. Across mathematical, scientific, and commonsense reasoning benchmarks, Latent-DARM outperforms text-based interfaces on average, improving accuracy from 27.0% to 36.0% on DART-5 and from 0.0% to 14.0% on AIME2024. Latent-DARM approaches the results of state-of-the-art reasoning models while using less than 2.2% of their token budget. This work advances multi-agent collaboration among agents with heterogeneous models.

SPAN-Nav: Generalized Spatial Awareness for Versatile Vision-Language Navigation

Authors:Jiahang Liu, Tianyu Xu, Jiawei Chen, Lu Yue, Jiazhao Zhang, Zhiyong Wang, Minghan Li, Qisheng Zhao, Anqi Li, Qi Su, Zhizheng Zhang, He Wang
Date:2026-03-10 03:59:34

Recent embodied navigation approaches leveraging Vision-Language Models (VLMs) demonstrate strong generalization in versatile Vision-Language Navigation (VLN). However, reliable path planning in complex environments remains challenging due to insufficient spatial awareness. In this work, we introduce SPAN-Nav, an end-to-end foundation model designed to infuse embodied navigation with universal 3D spatial awareness using RGB video streams. SPAN-Nav extracts spatial priors across diverse scenes through an occupancy prediction task on extensive indoor and outdoor environments. To mitigate the computational burden, we introduce a compact representation for spatial priors, finding that a single token is sufficient to encapsulate the coarse-grained cues essential for navigation tasks. Furthermore, inspired by the Chain-of-Thought (CoT) mechanism, SPAN-Nav utilizes this single spatial token to explicitly inject spatial cues into action reasoning through an end-to-end framework. Leveraging multi-task co-training, SPAN-Nav captures task-adaptive cues from generalized spatial priors, enabling robust spatial awareness to generalize even to tasks lacking explicit spatial supervision. To support comprehensive spatial learning, we present a massive dataset of 4.2 million occupancy annotations that covers both indoor and outdoor scenes across multi-type navigation tasks. SPAN-Nav achieves state-of-the-art performance across three benchmarks spanning diverse scenarios and varied navigation tasks. Finally, real-world experiments validate the robust generalization and practical reliability of our approach across complex physical scenarios.

Deep Tabular Research via Continual Experience-Driven Execution

Authors:Junnan Dong, Chuang Zhou, Zheng Yuan, Yifei Yu, Siyu An, Di Yin, Xing Sun, Feiyue Huang
Date:2026-03-10 03:42:54

Large language models often struggle with complex long-horizon analytical tasks over unstructured tables, which typically feature hierarchical and bidirectional headers and non-canonical layouts. We formalize this challenge as Deep Tabular Research (DTR), requiring multi-step reasoning over interdependent table regions. To address DTR, we propose a novel agentic framework that treats tabular reasoning as a closed-loop decision-making process. We carefully design a coupled query and table comprehension for path decision making and operational execution. Specifically, (i) DTR first constructs a hierarchical meta graph to capture bidirectional semantics, mapping natural language queries into an operation-level search space; (ii) To navigate this space, we introduce an expectation-aware selection policy that prioritizes high-utility execution paths; (iii) Crucially, historical execution outcomes are synthesized into a siamese structured memory, i.e., parameterized updates and abstracted texts, enabling continual refinement. Extensive experiments on challenging unstructured tabular benchmarks verify the effectiveness and highlight the necessity of separating strategic planning from low-level execution for long-horizon tabular reasoning.

Agentic AI as a Network Control-Plane Intelligence Layer for Federated Learning over 6G

Authors:Loc X. Nguyen, Ji Su Yoon, Huy Q. Le, Yu Qiao, Avi Deb Raha, Eui-Nam Huh, Nguyen H. Tran, Choong Seon Hong
Date:2026-03-10 03:27:33

The shift toward user-customized on-device learning places new demands on wireless systems: models must be trained on diverse, distributed data while meeting strict latency, bandwidth, and reliability constraints. To address this, we propose an Agentic AI as the control layer for managing federated learning (FL) over 6G networks, which translates high-level task goals into actions that are aware of network conditions. Rather than simply viewing FL as a learning challenge, our system sees it as a combined task of learning and network management. A set of specialized agents focused on retrieval, planning, coding, and evaluation utilizes monitoring tools and optimization methods to handle client selection, incentive structuring, scheduling, resource allocation, adaptive local training, and code generation. The use of closed-loop evaluation and memory allows the system to consistently refine its decisions, taking into account varying signal-to-noise ratios, bandwidth conditions, and device capabilities. Finally, our case study has demonstrated the effectiveness of the Agentic AI system's use of tools for achieving high performance.

PM-Nav: Priori-Map Guided Embodied Navigation in Functional Buildings

Authors:Jiang Gao, Xiangyu Dong, Haozhou Li, Haoran Zhao, Yaoming Zhou, Xiaoguang Ma
Date:2026-03-10 02:46:47

Existing language-driven embodied navigation paradigms face challenges in functional buildings (FBs) with highly similar features, as they lack the ability to effectively utilize priori spatial knowledge. To tackle this issue, we propose a Priori-Map Guided Embodied Navigation (PM-Nav), wherein environmental maps are transformed into navigation-friendly semantic priori-maps, a hierarchical chain-of-thought prompt template with an annotation priori-map is designed to enable precise path planning, and a multi-model collaborative action output mechanism is built to accomplish positioning decisions and execution control for navigation planning. Comprehensive tests using a home-made FB dataset show that the PM-Nav obtains average improvements of 511% and 1175%, and 650% and 400% over the SG-Nav and the InstructNav in simulation and real-world, respectively. These tremendous boosts elucidate the great potential of using the PM-Nav as a backbone navigation framework for FBs.