Modern Vision-Language Models (VLMs) remain poorly characterized in multi-step visual interactions, particularly in how they integrate perception, memory, and action over long horizons. We introduce VisGym, a gymnasium of 17 environments for evaluating and training VLMs. The suite spans symbolic puzzles, real-image understanding, navigation, and manipulation, and provides flexible controls over difficulty, input representation, planning horizon, and feedback. We also provide multi-step solvers that generate structured demonstrations, enabling supervised finetuning. Our evaluations show that all frontier models struggle in interactive settings, achieving low success rates in both the easy (46.6%) and hard (26.0%) configurations. Our experiments reveal notable limitations: models struggle to effectively leverage long context, performing worse with an unbounded history than with truncated windows. Furthermore, we find that several text-based symbolic tasks become substantially harder once rendered visually. However, explicit goal observations, textual feedback, and supervised finetuning on exploratory demonstrations in partially observable or unknown-dynamics settings yield consistent gains, highlighting concrete failure modes and pathways for improving multi-step visual decision-making. Code, data, and models can be found at: https://visgym.github.io/.
For a finite metric graph $X=(V,E,\ell)$, where $V$ is endowed with the shortest path metric, we consider the transportation cost problem associated with the distance $d$ on $V$. Namely, for $f$ a function with total sum 0 on $V$, write $f=\sum_{a,b\in V}P(a,b)(\delta_a-\delta_b)$ where the transportation plan $P$ satisfies $P(a,b)\geq 0$ for $(a,b)\in V\times V$. The cost of $P$ is $W(P):=\sum_{a,b\in V}P(a,b)d(a,b)$ and the transportation norm of $f$ is $\|f\|_{TC}=\min_P W(P)$, where $P$ runs over all transportation plans for $f$. In this semi-survey paper, we give short proofs for the following statements: 1) There always exists an optimal transportation plan supported in $V_+\times V_-$, where $V_+=\{x\in V: f(x)>0\}$ and $V_-=\{x\in V: f(x)<0\}$. If $X$ is a metric tree, we may moreover assume that this plan involves at most $|\operatorname{Supp}(f)|-1$ transports. 2) There always exists an optimal transportation plan supported in the set of edges of $X$. 3) Better, there always exists an optimal transportation plan supported in some spanning tree of $X$. We use this to reprove known formulae for the transportation norm when $X$ is either a tree or a cycle.
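To make the transportation norm concrete, here is a minimal sketch that computes $\|f\|_{TC}$ on a toy graph by linear programming; the graph, the function $f$, and the use of networkx/SciPy are our own illustration, not code from the paper.

```python
# Sketch: transportation norm ||f||_TC on a finite metric graph via an LP.
import networkx as nx
import numpy as np
from scipy.optimize import linprog

# A small weighted graph X = (V, E, ell); d = shortest-path metric on V.
G = nx.Graph()
G.add_weighted_edges_from([("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 4.0)])
V = list(G.nodes)
d = dict(nx.all_pairs_dijkstra_path_length(G))

# f: V -> R with total sum 0.
f = {"a": 2.0, "b": -1.0, "c": -1.0}

# Variables P(a,b) >= 0 over ordered pairs; minimize sum P(a,b) d(a,b)
# subject to sum_b (P(x,b) - P(b,x)) = f(x) for every node x.
pairs = [(a, b) for a in V for b in V if a != b]
cost = [d[a][b] for (a, b) in pairs]
A_eq = np.zeros((len(V), len(pairs)))
for j, (a, b) in enumerate(pairs):
    A_eq[V.index(a), j] += 1.0   # mass leaving a
    A_eq[V.index(b), j] -= 1.0   # mass arriving at b
b_eq = [f[x] for x in V]

res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
print("||f||_TC =", res.fun)  # an optimal plan supported on edges exists (statement 2)
```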
Large language models can perform well on many isolated tasks, yet they continue to struggle on multi-turn, long-horizon agentic problems that require skills such as planning, state tracking, and long-context processing. In this work, we aim to better understand the relative importance of advancing these underlying capabilities for success on such tasks. We develop an oracle counterfactual framework for multi-turn problems that asks: how would an agent perform if it could leverage an oracle to perfectly perform a specific task? The change in the agent's performance due to this oracle assistance lets us measure how critical that skill is to the future advancement of AI agents. We introduce a suite of procedurally generated, game-like tasks with tunable complexity. These controlled environments allow us to apply precise oracle interventions, such as perfect planning or flawless state tracking, and make it possible to isolate the contribution of each oracle without the confounding effects present in real-world benchmarks. Our results show that while some interventions (e.g., planning) consistently improve performance across settings, the usefulness of other skills depends on the properties of the environment and the language model. Our work sheds light on the challenges of multi-turn agentic environments and can guide future efforts in the development of AI agents and language models.
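Written out, the criticality measure described above is simply a performance difference (notation ours, not necessarily the paper's):

```latex
% Oracle-counterfactual criticality of a skill:
\mathrm{Criticality}(\mathrm{skill})
  \;=\; \mathrm{Perf}\!\left(\text{agent with oracle}_{\mathrm{skill}}\right)
  \;-\; \mathrm{Perf}\!\left(\text{agent alone}\right)
```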
We present a reproducible benchmark for evaluating sim-to-real transfer of Multi-Agent Reinforcement Learning (MARL) policies for Connected and Automated Vehicles (CAVs). The platform, based on the Cyber-Physical Mobility Lab (CPM Lab) [1], integrates simulation, a high-fidelity digital twin, and a physical testbed, enabling structured zero-shot evaluation of MARL motion-planning policies. We demonstrate its use by deploying a SigmaRL-trained policy [2] across all three domains, revealing two complementary sources of performance degradation: architectural differences between simulation and hardware control stacks, and the sim-to-real gap induced by increasing environmental realism. The open-source setup enables systematic analysis of sim-to-real challenges in MARL under realistic, reproducible conditions.
Designing experiments that systematically gather data from complex physical systems is central to accelerating scientific discovery. While Bayesian experimental design (BED) provides a principled, information-based framework that integrates experimental planning with probabilistic inference, the selection of utility functions in BED is a long-standing and active topic, where different criteria emphasize different notions of information. Although the Kullback--Leibler (KL) divergence has been one of the most common choices, recent studies have proposed the Wasserstein distance as an alternative. In this work, we first employ a toy example to illustrate an issue with the Wasserstein distance: the Wasserstein distance of a fixed-shape posterior depends on the relative position of its main mass within the support and can exhibit false rewards unrelated to information gain, especially under a non-informative prior (e.g., a uniform distribution). We then provide a systematic comparison between these two criteria through a classical source inversion problem from the BED literature, revealing that the KL divergence tends to lead to faster convergence in the absence of model discrepancy, while Wasserstein metrics provide more robust sequential BED results if model discrepancy is non-negligible. These findings clarify the trade-offs between the KL divergence and Wasserstein metrics for the utility function and provide guidelines for selecting suitable criteria in practical BED applications.
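A minimal numerical sketch of the stated issue, under an assumed setup (uniform prior on $[0,1]$, narrow Gaussian posterior; not necessarily the paper's exact toy example): shifting a fixed-shape posterior leaves the KL divergence essentially unchanged but moves the 1-Wasserstein distance to the prior.

```python
import numpy as np
from scipy.stats import norm, wasserstein_distance

x = np.linspace(1e-4, 1 - 1e-4, 4000)   # support of the uniform prior
dx = x[1] - x[0]
u = np.ones_like(x)                      # uniform prior density on [0, 1]

for mu in (0.1, 0.5, 0.9):               # same posterior shape, shifted
    p = norm(mu, 0.03).pdf(x)
    p /= p.sum() * dx                     # renormalize on [0, 1]
    mask = p > 0                          # avoid log(0) in deep tails
    kl = np.sum(p[mask] * np.log(p[mask] / u[mask])) * dx
    w1 = wasserstein_distance(x, x, u_weights=p, v_weights=u)
    print(f"mu={mu}: KL={kl:.3f}  W1={w1:.3f}")
# KL is essentially constant in mu; W1 changes with the posterior's location,
# i.e., it can "reward" mere relocation unrelated to information gain.
```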
We present RENEW, a global path planner for Autonomous Surface Vehicles (ASVs) in dynamic environments with external disturbances (e.g., water currents). RENEW introduces a unified risk- and energy-aware strategy that ensures safety by dynamically identifying non-navigable regions and enforcing adaptive safety constraints. Inspired by maritime contingency planning, it employs a best-effort strategy to maintain control under adverse conditions. The hierarchical architecture combines high-level constrained triangulation for topological diversity with low-level trajectory optimization within safe corridors. Validated with real-world ocean data, RENEW is the first framework to jointly address adaptive non-navigability and topological path diversity for robust maritime navigation.
Coverage Path Planning (CPP) is a fundamental capability for agricultural robots; however, existing solutions often overlook energy constraints, resulting in incomplete operations in large-scale or resource-limited environments. This paper proposes an energy-aware CPP framework grounded in Soft Actor-Critic (SAC) reinforcement learning, designed for grid-based environments with obstacles and charging stations. To enable robust and adaptive decision-making under energy limitations, the framework integrates Convolutional Neural Networks (CNNs) for spatial feature extraction and Long Short-Term Memory (LSTM) networks for temporal dynamics. A dedicated reward function is designed to jointly optimize coverage efficiency, energy consumption, and return-to-base constraints. Experimental results demonstrate that the proposed approach consistently achieves over 90% coverage while ensuring energy safety, outperforming traditional heuristic baselines such as Rapidly-exploring Random Tree (RRT), Particle Swarm Optimization (PSO), and Ant Colony Optimization (ACO) by 13.4-19.5% in coverage and reducing constraint violations by 59.9-88.3%. These findings validate the proposed SAC-based framework as an effective and scalable solution for energy-constrained CPP in agricultural robotics.
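A hedged sketch of a reward of the kind the abstract describes, jointly scoring coverage, energy, and return-to-base; all coefficients and signal names are hypothetical, not the paper's exact design.

```python
def reward(newly_covered, energy_used, collided, battery,
           energy_to_nearest_charger, fully_covered):
    # Hypothetical shaping consistent with the stated objectives.
    r = 1.0 if newly_covered else 0.0       # coverage progress
    r -= 0.01 * energy_used                 # energy consumption penalty
    if collided:
        r -= 10.0                           # obstacle collision
    if battery < energy_to_nearest_charger:
        r -= 5.0                            # return-to-base constraint violated
    if fully_covered:
        r += 50.0                           # terminal coverage bonus
    return r
```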
Weather conditions can drastically alter the state of crops and rangelands, and in turn, impact the incomes and food security of individuals worldwide. Satellite-based remote sensing offers an effective way to monitor vegetation and climate variables on regional and global scales. The annual peak Normalized Difference Vegetation Index (NDVI), derived from satellite observations, is closely associated with crop development, rangeland biomass, and vegetation growth. Although various machine learning methods have been developed to forecast NDVI over short time ranges, such as one-month-ahead predictions, long-term forecasting approaches, such as one-year-ahead predictions of vegetation conditions, are not yet available. To fill this gap, we develop a two-phase machine learning model to forecast the one-year-ahead peak NDVI over high-resolution grids, using the Four Corners region of the Southwestern United States as a testbed. In phase one, we identify informative climate attributes, including precipitation and maximum vapor pressure deficit, and develop the generalized parallel Gaussian process that captures the relationship between climate attributes and NDVI. In phase two, we forecast these climate attributes using historical data at least one year before the NDVI prediction month, which then serve as inputs to forecast the peak NDVI at each spatial grid. We develop open-source tools that outperform alternative methods for both gross NDVI and grid-based NDVI one-year forecasts, providing information that can help farmers and ranchers make actionable plans a year in advance.
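For intuition, phase one amounts to regressing peak NDVI on climate attributes with a Gaussian process; the sketch below uses generic scikit-learn GP regression on synthetic data and is not the paper's "generalized parallel Gaussian process".

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))   # columns: [precipitation, max VPD], scaled
y = 0.6 * X[:, 0] - 0.3 * X[:, 1] + 0.05 * rng.standard_normal(200)  # toy NDVI

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(X, y)
mean, std = gp.predict(X[:5], return_std=True)  # peak-NDVI mean with uncertainty
```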
Data science agents promise to accelerate discovery and insight generation by turning data into executable analyses and findings. Yet existing data science benchmarks fall short due to fragmented evaluation interfaces that make cross-benchmark comparison difficult, narrow task coverage, and a lack of rigorous data grounding. In particular, we show that a substantial portion of tasks in current benchmarks can be solved without using the actual data. To address these limitations, we introduce DSGym, a standardized framework for evaluating and training data science agents in self-contained execution environments. Unlike static benchmarks, DSGym provides a modular architecture that makes it easy to add tasks, agent scaffolds, and tools, positioning it as a live, extensible testbed. We curate DSGym-Tasks, a holistic task suite that standardizes and refines existing benchmarks via quality and shortcut-solvability filtering. We further expand coverage with (1) DSBio: expert-derived bioinformatics tasks grounded in literature and (2) DSPredict: challenging prediction tasks spanning domains such as computer vision, molecular prediction, and single-cell perturbation. Beyond evaluation, DSGym enables agent training via an execution-verified data synthesis pipeline. As a case study, we build a 2,000-example training set and train a 4B model in DSGym that outperforms GPT-4o on standardized analysis benchmarks. Overall, DSGym enables rigorous end-to-end measurement of whether agents can plan, implement, and validate data analyses in realistic scientific contexts.
proto-Lightspeed is a new instrument that has been commissioned on the Nasmyth East port of the Magellan Clay Telescope at Las Campanas Observatory to deliver high-speed optical imaging with deep sub-electron read noise. Making use of commercial re-imaging lenses and the ORCA-Quest 2 camera from Hamamatsu, proto-Lightspeed images a field $1'$ in diameter at up to $200$ Hz or windowed fields at higher rates, up to $6600$ Hz for a $1.6''\times 1'$ field of view. proto-Lightspeed delivers seeing-limited image quality in the $g'$, $r'$, and $i'$ bands and adjustable magnification, with pixel scales between $0.017''$ and $0.050''$. proto-Lightspeed is well suited to studying compact binary systems, exoplanet transits, rapid flaring associated with accretion, periodic optical emission from pulsars, occultations of background stars by small trans-Neptunian Objects, and any other rapidly variable source. proto-Lightspeed will be a P.I. instrument beginning in 2026B, available for use by members of the Magellan Consortium. In this paper, we discuss the design and performance of the instrument, results from its two commissioning runs, and plans for a facility instrument, Lightspeed, to support simultaneous multicolor imaging across a $7'\times4'$ field.
Recent video generation models demonstrate remarkable ability to capture complex physical interactions and scene evolution over time. To leverage their spatiotemporal priors, robotics works have adapted video models for policy learning but introduce complexity by requiring multiple stages of post-training and new architectural components for action generation. In this work, we introduce Cosmos Policy, a simple approach for adapting a large pretrained video model (Cosmos-Predict2) into an effective robot policy through a single stage of post-training on robot demonstration data collected on the target platform, with no architectural modifications. Cosmos Policy learns to directly generate robot actions encoded as latent frames within the video model's latent diffusion process, harnessing the model's pretrained priors and core learning algorithm to capture complex action distributions. Additionally, Cosmos Policy generates future state images and values (expected cumulative rewards), which are similarly encoded as latent frames, enabling test-time planning of action trajectories with higher likelihood of success. In our evaluations, Cosmos Policy achieves state-of-the-art performance on the LIBERO and RoboCasa simulation benchmarks (98.5% and 67.1% average success rates, respectively) and the highest average score in challenging real-world bimanual manipulation tasks, outperforming strong diffusion policies trained from scratch, video model-based policies, and state-of-the-art vision-language-action models fine-tuned on the same robot demonstrations. Furthermore, given policy rollout data, Cosmos Policy can learn from experience to refine its world model and value function and leverage model-based planning to achieve even higher success rates in challenging tasks. We release code, models, and training data at https://research.nvidia.com/labs/dir/cosmos-policy/
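A schematic sketch of the test-time planning step described above: sample several candidate latent rollouts, decode actions and values, and execute the highest-value candidate. All method names are hypothetical placeholders, not the released API.

```python
def plan_with_values(policy, obs, num_candidates=8):
    # Sample candidate rollouts from the diffusion process and keep the one
    # whose decoded value (expected cumulative reward) is highest.
    best_actions, best_value = None, float("-inf")
    for _ in range(num_candidates):
        latents = policy.sample_latents(obs)     # denoised latent frames
        value = policy.decode_value(latents)     # latent "value frame"
        if value > best_value:
            best_actions = policy.decode_actions(latents)
            best_value = value
    return best_actions
```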
We introduce a bivariate version of topological complexity, $\mathrm{TC}(f,g)$, associated with two continuous maps $f\colon X\to Z$ and $g\colon Y\to Z$. This invariant measures the minimal number of continuous motion planning rules required to coordinate trajectories in $X$ and $Y$ through a shared target space $Z$. It recovers Farber's classical topological complexity when $f=g=\mathrm{id}_X$ and Pavešić's map-based invariant when one of the maps is the identity. We develop a structural theory for $\mathrm{TC}(f,g)$, including symmetry, product inequalities, stability properties, and a collaboration principle showing that, when one of the maps is a fibration, the complexity of synchronization is controlled by the other. We also introduce a homotopy-invariant bivariate complexity $\mathrm{TC}_H(f,g)$ of Scott type, defined via homotopic distance, and study its relationship with the strict invariant. Concrete examples reveal rigidity phenomena with no analogue in the classical case, including strict gaps between $\mathrm{TC}(f,g)$ and $\mathrm{TC}_H(f,g)$ and situations where synchronization becomes impossible. Cohomological estimates provide computable obstructions in both the strict and homotopy-invariant settings.
Accurate prediction of crop above-ground biomass (AGB) under water stress is critical for monitoring crop productivity, guiding irrigation, and supporting climate-resilient agriculture. Data-driven models scale well but often lack interpretability and degrade under distribution shift, whereas process-based crop models (e.g., DSSAT, APSIM, LINTUL5) require extensive calibration and are difficult to deploy over large spatial domains. To address these limitations, we propose AgriPINN, a process-informed neural network that integrates a biophysical crop-growth differential equation as a differentiable constraint within a deep learning backbone. This design encourages physiologically consistent biomass dynamics under water-stress conditions while preserving model scalability for spatially distributed AGB prediction. AgriPINN recovers latent physiological variables, including leaf area index (LAI), absorbed photosynthetically active radiation (PAR), radiation use efficiency (RUE), and water-stress factors, without requiring direct supervision. We pretrain AgriPINN on 60 years of historical data across 397 regions in Germany and fine-tune it on three years of field experiments under controlled water treatments. Results show that AgriPINN consistently outperforms state-of-the-art deep-learning baselines (ConvLSTM-ViT, SLTF, CNN-Transformer) and the process-based LINTUL5 model in terms of accuracy (RMSE reductions up to $43\%$) and computational efficiency. By combining the scalability of deep learning with the biophysical rigor of process-based modeling, AgriPINN provides a robust and interpretable framework for spatio-temporal AGB prediction, offering practical value for irrigation-infrastructure planning, yield forecasting, and climate adaptation.
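One way to read the "differentiable constraint" is as a soft ODE residual added to the data loss; the sketch below assumes a LINTUL-style growth law $dW/dt = \mathrm{RUE}\cdot f_{\mathrm{APAR}}\cdot \mathrm{PAR}\cdot f_{\mathrm{water}}$ and hypothetical model outputs, not AgriPINN's exact formulation.

```python
import torch

def pinn_loss(model, t_colloc, t_data, W_data, par_forcing, lam=1.0):
    # model(t) -> columns [W, LAI, fAPAR, RUE, f_water]; par_forcing(t) is
    # the incoming PAR driver. Names and structure are assumptions.
    t = t_colloc.clone().requires_grad_(True)
    out = model(t)
    W, fapar, rue, fw = out[:, 0], out[:, 2], out[:, 3], out[:, 4]
    dW_dt = torch.autograd.grad(W.sum(), t, create_graph=True)[0].squeeze()
    residual = dW_dt - rue * fapar * par_forcing(t).squeeze() * fw  # ODE misfit
    data_term = torch.mean((model(t_data)[:, 0] - W_data) ** 2)     # biomass fit
    return data_term + lam * torch.mean(residual ** 2)
```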
Time-optimal control of a triple integrator under full box constraints is a fundamental problem in optimal control and is widely applied in industry. However, scenarios involving asymmetric constraints, non-stationary boundary conditions, and active position constraints pose significant challenges. This paper provides a complete characterization of the time-optimal switching surfaces for this problem, leading to novel insights into the geometric structure of the optimal control. The activation condition for position constraints is derived, which is absent from the literature. An efficient algorithm is proposed, capable of planning time-optimal trajectories under asymmetric full constraints and arbitrary boundary states, with a 100% success rate. Computation time per trajectory is approximately 10 $\mu$s, a five-order-of-magnitude reduction compared to optimization-based baselines.
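For reference, the underlying problem can be stated as follows (generic symbolic bounds; the paper's asymmetric bounds and non-stationary boundary conditions instantiate these):

```latex
% Time-optimal triple integrator with full (possibly asymmetric) box constraints
\begin{aligned}
\min_{u(\cdot)}\ & T \\
\text{s.t. } & \dot p = v,\quad \dot v = a,\quad \dot a = u, \\
& \underline{u} \le u(t) \le \overline{u},\quad
  \underline{a} \le a(t) \le \overline{a},\quad
  \underline{v} \le v(t) \le \overline{v},\quad
  \underline{p} \le p(t) \le \overline{p}, \\
& (p,v,a)(0) = x_0,\qquad (p,v,a)(T) = x_T .
\end{aligned}
```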
The advancement of mobile computing devices and positioning technologies has led to explosive growth of spatio-temporal data managed in databases. Representative queries over such data include range queries, nearest neighbor queries, and join queries. However, formulating these queries usually requires domain-specific expertise and familiarity with executable query languages, which is challenging for non-expert users. This creates a strong demand for well-supported natural language queries (NLQs) over spatio-temporal databases. To bridge the gap between non-experts and query plans in databases, we present NL4ST, an interactive tool that allows users to query spatio-temporal databases in natural language. NL4ST features a three-layer architecture: (i) a knowledge base and corpus for knowledge preparation, (ii) natural language understanding for entity linking, and (iii) physical plan generation. Our demonstration showcases how NL4ST produces effective spatio-temporal physical plans, verified on four real and synthetic datasets. NL4ST is available online, and a demo video is provided at https://youtu.be/-J1R7R5WoqQ.
We propose Materealize, a multi-agent system for end-to-end inorganic materials design and synthesis that orchestrates core domain tools spanning structure generation, property prediction, synthesizability prediction, and synthesis planning within a single unified framework. Through a natural-language interface, Materealize enables non-experts to access computational materials workflows and obtain experimentally actionable outputs for material realization. Materealize provides two complementary modes. In instant mode, the system rapidly composes connected tools to solve diverse inorganic tasks, including property-conditioned design of synthesizable candidates with synthesis recipes, diagnosis and redesign of unsynthesizable structures, and synthesizable data augmentation, within a few minutes. In thinking mode, Materealize applies multi-agent debate to deliver more refined and information-rich synthesis recommendations, including reasoning- and model-driven synthesis routes and mechanistic hypotheses. The mechanistic hypotheses are validated by direct comparison with the literature for known mechanisms and further supported by physics-grounded simulations for novel synthesis pathways. By combining tool-level accuracy with reasoning-level integration, Materealize can bridge the gap between computational discovery and practical experimental realization.
Diffusion models have emerged as a powerful approach for multimodal motion planning in autonomous driving. However, their practical deployment is typically hindered by the inherent difficulty in enforcing vehicle dynamics and a critical reliance on accurate predictions of other agents, making them prone to safety issues under uncertain interactions. To address these limitations, we introduce DualShield, a planning and control framework that leverages Hamilton-Jacobi (HJ) reachability value functions in a dual capacity. First, the value functions act as proactive guidance, steering the diffusion denoising process towards safe and dynamically feasible regions. Second, they form a reactive safety shield using control barrier-value functions (CBVFs) to modify the executed actions and ensure safety. This dual mechanism preserves the rich exploration capabilities of diffusion models while providing principled safety assurance under uncertain and even adversarial interactions. Simulations in challenging unprotected U-turn scenarios demonstrate that DualShield significantly improves both safety and task efficiency compared to leading methods from different planning paradigms under uncertainty.
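A minimal sketch of the reactive shield, assuming a CBVF condition of the form $\nabla V \cdot f(x,u) + \gamma V(x) \ge 0$ and generic dynamics; this is our illustration of the mechanism, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize

def shield(u_nom, x, V, gradV, f, gamma=1.0):
    # Minimally modify the nominal (diffusion-planned) action so the
    # CBVF decrease condition holds; V, gradV, f are placeholder callables.
    cons = {"type": "ineq",
            "fun": lambda u: gradV(x) @ f(x, u) + gamma * V(x)}
    res = minimize(lambda u: np.sum((u - u_nom) ** 2), u_nom,
                   constraints=[cons])
    return res.x  # equals u_nom whenever the nominal action is already safe
```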
Although artificial intelligence (AI) has become deeply integrated into various stages of the research workflow and achieved remarkable advancements, academic rebuttal remains a significant and underexplored challenge. This is because rebuttal is a complex process of strategic communication under severe information asymmetry rather than a simple technical debate. Consequently, current approaches struggle as they largely imitate surface-level linguistics, missing the essential element of perspective-taking required for effective persuasion. In this paper, we introduce RebuttalAgent, the first framework to ground academic rebuttal in Theory of Mind (ToM), operationalized through a ToM-Strategy-Response (TSR) pipeline that models reviewer mental state, formulates persuasion strategy, and generates strategy-grounded response. To train our agent, we construct RebuttalBench, a large-scale dataset synthesized via a novel critique-and-refine approach. Our training process consists of two stages, beginning with a supervised fine-tuning phase to equip the agent with ToM-based analysis and strategic planning capabilities, followed by a reinforcement learning phase leveraging the self-reward mechanism for scalable self-improvement. For reliable and efficient automated evaluation, we further develop Rebuttal-RM, a specialized evaluator trained on over 100K samples of multi-source rebuttal data, which achieves scoring consistency with human preferences that surpasses the strong judge model GPT-4.1. Extensive experiments show RebuttalAgent significantly outperforms the base model by an average of 18.3% on automated metrics, while also outperforming advanced proprietary models across both automated and human evaluations. Disclaimer: the generated rebuttal content is for reference only to inspire authors and assist in drafting. It is not intended to replace the author's own critical analysis and response.
Retrieval-augmented generation (RAG) systems integrate document retrieval with large language models and have been widely adopted. However, in privacy-related scenarios, RAG introduces a new privacy risk: adversaries can issue carefully crafted queries to exfiltrate sensitive content from the underlying corpus gradually. Although recent studies have demonstrated multi-turn extraction attacks, they rely on heuristics and fail to perform long-term extraction planning. To address these limitations, we formulate the RAG extraction attack as an adaptive stochastic coverage problem (ASCP). In ASCP, each query is treated as a probabilistic action that aims to maximize conditional marginal gain (CMG), enabling principled long-term planning under uncertainty. However, integrating ASCP with practical RAG attacks faces three key challenges: unobservable CMG, intractability of the action space, and feasibility constraints. To overcome these challenges, we maintain a global attacker-side state to guide the attack. Building on this idea, we introduce RAGCRAWLER, which builds a knowledge graph to represent revealed information, uses this global state to estimate CMG, and plans queries in semantic space that target unretrieved regions. In comprehensive experiments across diverse RAG architectures and datasets, our proposed method, RAGCRAWLER, consistently outperforms all baselines. It achieves up to 84.4% corpus coverage within a fixed query budget and delivers an average improvement of 20.7% over the top-performing baseline. It also maintains high semantic fidelity and strong content reconstruction accuracy with low attack cost. Crucially, RAGCRAWLER proves its robustness by maintaining effectiveness against advanced RAG systems employing query rewriting and multi-query retrieval strategies. Our work reveals significant security gaps and highlights the pressing need for stronger safeguards for RAG.
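Schematically, the ASCP loop reduces to greedy selection by estimated conditional marginal gain; the candidate generator and CMG estimator below are hypothetical stand-ins for the paper's components.

```python
def extraction_loop(rag, state, gen_candidates, budget):
    # state: attacker-side knowledge graph of revealed information.
    for _ in range(budget):
        q = max(gen_candidates(state), key=state.estimated_cmg)  # argmax CMG
        docs = rag.query(q)           # probabilistic action: retrieve content
        state.update(q, docs)         # record newly revealed regions
    return state.reconstructed_corpus()
```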
A world model is an AI system that simulates how an environment evolves under actions, enabling planning through imagined futures rather than reactive perception. Current world models, however, suffer from visual conflation: the mistaken assumption that high-fidelity video generation implies an understanding of physical and causal dynamics. We show that while modern models excel at predicting pixels, they frequently violate invariant constraints, fail under intervention, and break down in safety-critical decision-making. This survey argues that visual realism is an unreliable proxy for world understanding. Instead, effective world models must encode causal structure, respect domain-specific constraints, and remain stable over long horizons. We propose a reframing of world models as actionable simulators rather than visual engines, emphasizing structured 4D interfaces, constraint-aware dynamics, and closed-loop evaluation. Using medical decision-making as an epistemic stress test, where trial-and-error is impossible and errors are irreversible, we demonstrate that a world model's value is determined not by how realistic its rollouts appear, but by its ability to support counterfactual reasoning, intervention planning, and robust long-horizon foresight.
This whitepaper presents NWQWorkflow, an end-to-end workflow for quantum application development, compilation, error correction, benchmarking, numerical simulation, control, and execution on a prototype superconducting testbed. NWQWorkflow integrates NWQStudio (programming GUI environment), NWQASM (intermediate representation), QASMTrans (compiler), NWQEC (quantum error correction), QASMBench (benchmarking and characterization), NWQSim (HPC simulation), NWQLib (algorithm library), NWQData (data sets), NWQControl (quantum control), and NWQSC (superconducting testbed). The system enables closed-loop software-hardware co-design and reflects the past eight years of quantum computing research the author has led at PNNL (2018-2026). By releasing most software components as open source or planning their open-source availability, we aim to cultivate a collaborative quantum information science (QIS) ecosystem and support the transition toward a scalable quantum supercomputing era.
In this work we analyze the problem of, given the probability distribution of a population, questioning an unknown individual who is representative of that distribution so that our uncertainty about certain characteristics is significantly reduced, but the uncertainty about others, deemed private or sensitive, is not. Thus, the goal is to extract information relevant to a legitimate purpose while preserving the privacy of individuals, which is crucial to enable non-intrusive selection processes in several areas. For instance, it is essential in the design of non-discriminatory personnel selection, promotion, and layoff processes in companies and institutions; in the retrieval of customer information relevant to the service provided by a company (and no more); in certifications that do not reveal sensitive industrial information irrelevant to the certification itself; etc. Interactive questioning processes are constructed for this purpose, which requires generalizing the notion of decision trees to account for the amount of desired and undesired information retrieved along each branch of the plan. Our findings are both theoretical and practical: on the one hand, we prove NP-completeness by a reduction from the Set Cover problem; on the other hand, given this intractability, we provide heuristics that find reasonable solutions in affordable time. In particular, a greedy algorithm and two genetic algorithms are presented. Our experiments indicate that the best results are obtained using a genetic algorithm reinforced with a greedy strategy.
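A minimal sketch of the greedy heuristic under an assumed scoring (desired information gain minus sensitive-information leakage); the paper's exact objective may differ.

```python
def greedy_plan(questions, info_gain, leak, max_len):
    # info_gain(q): expected uncertainty reduction on desired characteristics;
    # leak(q): expected information revealed about sensitive ones.
    # Both are hypothetical callables standing in for the paper's measures.
    plan, remaining = [], set(questions)
    for _ in range(max_len):
        best = max(remaining, key=lambda q: info_gain(q) - leak(q), default=None)
        if best is None or info_gain(best) - leak(best) <= 0:
            break                      # no question helps more than it leaks
        plan.append(best)
        remaining.discard(best)
    return plan
```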
Learning long-horizon embodied behaviors from synthetic data remains challenging because generated scenes are often physically implausible, language-driven programs frequently "succeed" without satisfying task semantics, and high-level instructions require grounding into executable action sequences. To address these limitations, we introduce V-CAGE, a closed-loop framework for generating robust, semantically aligned manipulation datasets at scale. First, we propose a context-aware instantiation mechanism that enforces geometric consistency during scene synthesis. By dynamically maintaining a map of prohibited spatial areas as objects are placed, our system prevents interpenetration and ensures reachable, conflict-free configurations in cluttered environments. Second, to bridge the gap between abstract intent and low-level control, we employ a hierarchical instruction decomposition module. This decomposes high-level goals (e.g., "get ready for work") into compositional action primitives, facilitating coherent long-horizon planning. Crucially, we enforce semantic correctness through a VLM-based verification loop. Acting as a visual critic, the VLM performs rigorous rejection sampling after each subtask, filtering out "silent failures" where code executes but fails to achieve the visual goal. Experiments demonstrate that V-CAGE yields datasets with superior physical and semantic fidelity, significantly boosting the success rate and generalization of downstream policies compared to non-verified baselines.
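The context-aware instantiation mechanism can be illustrated with axis-aligned 2D boxes (a simplification of the 3D setting): keep a running list of prohibited regions and rejection-sample poses until a conflict-free one is found. All names are illustrative.

```python
import random

def overlaps(r1, r2):
    ax1, ay1, ax2, ay2 = r1
    bx1, by1, bx2, by2 = r2
    return not (ax2 <= bx1 or bx2 <= ax1 or ay2 <= by1 or by2 <= ay1)

def instantiate(objects, workspace, tries=1000):
    # objects: {name: (width, height)}; workspace: (x1, y1, x2, y2).
    wx1, wy1, wx2, wy2 = workspace
    prohibited, placed = [], {}
    for name, (w, h) in objects.items():
        for _ in range(tries):
            x, y = random.uniform(wx1, wx2 - w), random.uniform(wy1, wy2 - h)
            box = (x, y, x + w, y + h)
            if not any(overlaps(box, p) for p in prohibited):
                prohibited.append(box)   # future placements must avoid it
                placed[name] = box
                break
        else:
            raise RuntimeError(f"no conflict-free pose found for {name}")
    return placed
```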
This paper proposes a two-stage stochastic optimization formulation to determine optimal operation and procurement plans for achieving 24/7 carbon-free energy (CFE) compliance at minimal cost. The system under consideration reflects primary energy technologies in Thailand, including solar power, battery storage, and a diverse portfolio of renewable and carbon-based energy procurement sources. Unlike existing literature focused on long-term planning, this study addresses near real-time operations using a 15-minute resolution. A novel feature of the formulation is the explicit treatment of CFE compliance as a model parameter, enabling flexible targets such as a minimum percentage of hourly matching or a required number of carbon-free days within a multi-day horizon. The mixed-integer linear programming formulation accounts for uncertainties in load and solar generation by integrating deep learning-based forecasting within a receding horizon framework. By optimizing battery profiles and multi-source procurement simultaneously, the proposed system provides a feasible pathway for transitioning to carbon-free operations in emerging energy markets.
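An illustrative form of the hourly CFE-matching constraint with the compliance target as a parameter (generic notation, not the paper's exact formulation):

```latex
% Carbon-free procurement g_{s,h} plus net battery discharge must cover at
% least a fraction alpha (the CFE compliance parameter) of the load L_h.
\sum_{s \in \mathcal{S}_{\mathrm{CFE}}} g_{s,h}
  + \left(d_h^{\mathrm{dis}} - c_h^{\mathrm{ch}}\right)
  \;\ge\; \alpha\, L_h
  \qquad \forall h \in \mathcal{H}
```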
In this paper, we study the vehicle routing problem with a finite time horizon, in which the objective is to maximize the number of customer requests served within the horizon. We present a novel routing network embedding module that creates local node embedding vectors and a context-aware global graph representation. The proposed Markov decision process for the vehicle routing problem incorporates the node features, the network adjacency matrix, and the edge features as components of the state space. We incorporate the remaining time horizon into the network embedding module to provide proper routing context. We integrate our embedding module with a policy gradient-based deep Reinforcement Learning framework to solve the vehicle routing problem with a finite time horizon. We train and validate our proposed routing method on real-world routing networks as well as synthetically generated Euclidean networks. Our experimental results show that our method achieves a higher customer service rate than existing routing methods. Additionally, the solution time of our method is significantly lower than that of existing methods.
Carbon capture and storage or utilization (CCUS) will play an important role in achieving climate neutrality in many economies. Pipelines are widely regarded as the most efficient means of CO2 transport; however, such networks do not yet exist. Policy-makers and companies need to develop large-scale infrastructure under substantial uncertainty, and methods and analyses are needed to support pipeline planning and strategy development. This paper presents an integrated method for designing CO2 pipeline networks by combining energy system scenarios with physical network simulation. Using Germany as a case study, we derive spatially highly resolved CO2 balances to develop a dense-phase CO2 pipeline topology that follows existing gas pipeline corridors. The analyzed system includes existing sites for cement and lime production, waste incineration, carbon users, four coastal CO2 hubs, and border crossing points. We then apply the multiphysical network simulator MYNTS to assess the technical feasibility of this network. We determine pipeline diameters, pump locations, and operating conditions that ensure stable dense-phase transport. The method explicitly accounts for elevation and possible impurities. The results indicate that a system of about 7000 km total pipeline length, with nominal diameters of DN700 on main corridors and DN500/DN400 on branches, presents a feasible solution for connecting most sites. Investment costs for the optimized pipeline system are calculated to be about 17 billion Euros. The method provides a reproducible framework and is transferable to other countries and to the European scale.
While deep learning has significantly advanced robotic object recognition, purely data-driven approaches often lack semantic consistency and fail to leverage valuable, pre-existing knowledge about the environment. This report presents the ExPrIS project, which addresses this challenge by investigating how knowledge-level expectations can serve to improve object interpretation from sensor data. Our approach is based on the incremental construction of a 3D Semantic Scene Graph (3DSSG). We integrate expectations from two sources: contextual priors from past observations and semantic knowledge from external knowledge graphs such as ConceptNet. These are embedded into a heterogeneous Graph Neural Network (GNN) to create an expectation-biased inference process. This method moves beyond static, frame-by-frame analysis to enhance the robustness and consistency of scene understanding over time. The report details this architecture, its evaluation, and outlines its planned integration on a mobile robotic platform.
Safety is a central requirement for automated vehicles. As such, the assessment of risk in automated driving is key in supporting both motion planning technologies and safety evaluation. In automated driving, risk is characterized by two aspects. The first aspect is the uncertainty on the state estimates of other road participants by an automated vehicle. The second aspect is the severity of a collision event with said traffic participants. Here, the uncertainty aspect typically causes the risk to be non-zero for near-collision events. This makes risk particularly useful for automated vehicle motion planning. Namely, constraining or minimizing risk naturally navigates the automated vehicle around traffic participants while keeping a safety distance based on the level of uncertainty and the potential severity of the impending collision. Existing approaches to calculate the risk either resort to empirical modeling or severe approximations, and, hence, lack generalizability and accuracy. In this paper, we combine recent advances in collision probability estimation with the concept of collision severity to develop a general method for accurate risk estimation. The proposed method allows us to assign individual severity functions for different collision constellations, such as frontal or side collisions. Furthermore, we show that the proposed approach is computationally efficient, which is beneficial, e.g., in real-time motion planning applications. The programming code for an exemplary implementation of Gaussian uncertainties is also provided.
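The combination of collision probability and severity admits a simple Monte Carlo reading: risk is the expected severity over the uncertain state. The sketch below assumes Gaussian uncertainty and placeholder geometry/severity callables; it illustrates the construction, not the paper's estimator.

```python
import numpy as np

def estimate_risk(mean, cov, in_collision, severity, n=10_000, seed=0):
    # mean/cov: Gaussian uncertainty on the relative state of the other
    # traffic participant; in_collision and severity are placeholders
    # (e.g., a frontal-impact severity differs from a side-impact one).
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(mean, cov, size=n)
    sev = np.array([severity(s) if in_collision(s) else 0.0 for s in samples])
    return sev.mean()   # = P(collision) x E[severity | collision]
```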
E-waste is growing rapidly while recycling rates remain low. We propose an electronic-device Graph-based Adaptive Planning (eGRAP) framework that integrates vision, dynamic planning, and dual-arm execution for autonomous disassembly. A camera-equipped arm identifies parts and estimates their poses, and a directed graph encodes which parts must be removed first. A scheduler uses topological ordering of this graph to select valid next steps and assign them to two robot arms, allowing independent tasks to run in parallel. One arm carries a screwdriver (with an eye-in-hand depth camera) and the other holds or handles components. We demonstrate eGRAP on 3.5-inch hard disk drives (HDDs): as parts are unscrewed and removed, the system updates its graph and plan online. Experiments show consistent full disassembly of each HDD, with high success rates and efficient cycle times, illustrating the method's ability to adaptively coordinate dual-arm tasks in real time.
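The scheduler's use of topological ordering can be sketched as follows, assuming a networkx DiGraph whose edge $u \to v$ means part $u$ must be removed before part $v$; arm names are illustrative, not the paper's code.

```python
import networkx as nx

def schedule_step(G: nx.DiGraph, removed: set, arm_busy: dict):
    # Parts whose predecessors are all removed are "ready"; assign ready
    # tasks to idle arms so independent removals run in parallel.
    ready = [n for n in G.nodes
             if n not in removed
             and all(p in removed for p in G.predecessors(n))]
    assignments = {}
    for arm in ("screwdriver_arm", "handling_arm"):
        if not arm_busy[arm] and ready:
            assignments[arm] = ready.pop(0)
    return assignments   # re-planned online as the graph is updated
```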
Reliable human--robot collaboration in emergency scenarios requires autonomous systems that can detect humans, infer navigation goals, and operate safely in dynamic environments. This paper presents HumanDiffusion, a lightweight image-conditioned diffusion planner that generates human-aware navigation trajectories directly from RGB imagery. The system combines YOLO-11-based human detection with diffusion-driven trajectory generation, enabling a quadrotor to approach a target person and deliver medical assistance without relying on prior maps or computationally intensive planning pipelines. Trajectories are predicted in pixel space, ensuring smooth motion and a consistent safety margin around humans. We evaluate HumanDiffusion in simulation and real-world indoor mock-disaster scenarios. On a 300-sample test set, the model achieves a mean squared error of 0.02 in pixel-space trajectory reconstruction. Real-world experiments demonstrate an overall mission success rate of 80% across accident-response and search-and-locate tasks with partial occlusions. These results indicate that human-conditioned diffusion planning offers a practical and robust solution for human-aware UAV navigation in time-critical assistance settings.