planning - 2026-01-12

Resilient UAV Data Mule via Adaptive Sensor Association under Timing Constraints

Authors:Md Sharif Hossen, Anil Gurses, Ozgur Ozdemir, Mihail Sichitiu, Ismail Guvenc
Date:2026-01-09 18:33:26

Unmanned aerial vehicles (UAVs) can be critical for time-sensitive data collection missions, yet existing research often relies on simulations that fail to capture real-world complexities. Many studies assume ideal wireless conditions or focus only on path planning, neglecting the challenge of making real-time decisions in dynamic environments. To bridge this gap, we address the problem of adaptive sensor selection for a data-gathering UAV, considering both the buffered data at each sensor and realistic propagation conditions. We introduce the Hover-based Greedy Adaptive Download (HGAD) strategy, designed to maximize data transfer by intelligently hovering over sensors during periods of peak signal quality. We validate HGAD using both a digital twin (DT) and a real-world (RW) testbed at the NSF-funded AERPAW platform. Our experiments show that HGAD significantly improves download stability and successfully meets per-sensor data targets. Compared with the traditional Greedy approach, which simply follows the strongest signal, HGAD achieves a higher cumulative data download. This work demonstrates the importance of integrating signal-to-noise ratio (SNR)-aware and buffer-aware scheduling with DT and RW signal traces to design resilient UAV data-mule strategies for realistic deployments.
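
The contrast between following the strongest signal and also accounting for buffered data can be illustrated with a small sketch. This is not the HGAD algorithm, whose details the abstract does not give. The Shannon-capacity rate model, the 1 MHz bandwidth, the one-second slot, and all function names are assumptions made for illustration only.

```python
import math
from typing import Dict, Optional


def shannon_rate_mbps(snr_db: float, bandwidth_mhz: float = 1.0) -> float:
    """Idealized Shannon capacity in Mbit/s (assumed rate model, not from the paper)."""
    snr_linear = 10.0 ** (snr_db / 10.0)
    return bandwidth_mhz * math.log2(1.0 + snr_linear)


def pick_strongest(snr_db: Dict[str, float]) -> str:
    """Baseline: associate with the sensor that currently has the highest SNR."""
    return max(snr_db, key=snr_db.get)


def pick_buffer_aware(
    snr_db: Dict[str, float],
    remaining_mbit: Dict[str, float],
    slot_s: float = 1.0,
) -> Optional[str]:
    """Pick the sensor expected to deliver the most data in the next slot.

    The deliverable amount is capped by what is still buffered at the sensor,
    so a strong link to a nearly empty sensor is not preferred.
    """
    best, best_gain = None, 0.0
    for sensor, snr in snr_db.items():
        gain = min(remaining_mbit[sensor], shannon_rate_mbps(snr) * slot_s)
        if gain > best_gain:
            best, best_gain = sensor, gain
    return best


# Hypothetical snapshot: sensor A has the better link but almost no data left.
snr = {"A": 20.0, "B": 10.0}
remaining = {"A": 0.5, "B": 50.0}
print(pick_strongest(snr))                # A
print(pick_buffer_aware(snr, remaining))  # B
```

In this snapshot the SNR-only rule stays with sensor A, while the buffer-aware rule switches to sensor B, which has far more data left to download.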

Non Destructive Testing

Authors:Gonzalo Arnau Izquierdo
Date:2026-01-09 15:55:43

This contribution introduces the principles, methods, and applications of non-destructive testing (NDT) in the context of particle accelerators and related technologies. Both surface inspection methods (visual testing, penetrant testing, magnetic particle testing, eddy current testing) and volumetric methods (radiographic testing, ultrasonic testing) are presented, with examples drawn from CERN projects. Emphasis is placed on the capabilities, limitations, and practical considerations of each technique, highlighting their role in ensuring quality and safety during fabrication, procurement, and in-service inspections. The contribution also addresses standards, codes, and personnel qualification schemes, underlining their regulatory impact in fields such as pressure equipment and industrial piping. Finally, the importance of early integration of NDT into project planning is stressed, not only for compliance but also as a proactive quality and efficiency measure.

Goal Force: Teaching Video Models To Accomplish Physics-Conditioned Goals

Authors:Nate Gillman, Yinghua Zhou, Zitian Tang, Evan Luo, Arjan Chakravarthy, Daksh Aggarwal, Michael Freeman, Charles Herrmann, Chen Sun
Date:2026-01-09 15:23:36

Recent advancements in video generation have enabled the development of "world models" capable of simulating potential futures for robotics and planning. However, specifying precise goals for these models remains a challenge; text instructions are often too abstract to capture physical nuances, while target images are frequently infeasible to specify for dynamic tasks. To address this, we introduce Goal Force, a novel framework that allows users to define goals via explicit force vectors and intermediate dynamics, mirroring how humans conceptualize physical tasks. We train a video generation model on a curated dataset of synthetic causal primitives, such as elastic collisions and falling dominos, teaching it to propagate forces through time and space. Despite being trained on simple physics data, our model exhibits remarkable zero-shot generalization to complex, real-world scenarios, including tool manipulation and multi-object causal chains. Our results suggest that by grounding video generation in fundamental physical interactions, models can emerge as implicit neural physics simulators, enabling precise, physics-aware planning without reliance on external engines. We release all datasets, code, model weights, and interactive video demos at our project page.

Intelligent Singularity Avoidance in UR10 Robotic Arm Path Planning Using Hybrid Fuzzy Logic and Reinforcement Learning

Authors:Sheng-Kai Chen, Jyh-Horng Wu
Date:2026-01-09 15:10:23

This paper presents a comprehensive approach to singularity detection and avoidance in UR10 robotic arm path planning through the integration of fuzzy logic safety systems and reinforcement learning algorithms. The proposed system addresses critical challenges in robotic manipulation where singularities can cause loss of control and potential equipment damage. Our hybrid approach combines real-time singularity detection using manipulability measures, condition number analysis, and fuzzy logic decision-making with a stable reinforcement learning framework for adaptive path planning. Experimental results demonstrate a 90% success rate in reaching target positions while maintaining safe distances from singular configurations. The system integrates PyBullet simulation for training data collection and URSim connectivity for real-world deployment.
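
The two singularity indicators named above, the manipulability measure and the Jacobian condition number, can be sketched as follows. For brevity the example uses a planar two-link arm rather than the UR10, and the link lengths and function names are assumptions for illustration; the paper's actual implementation is not described in the abstract.

```python
import numpy as np


def planar_2link_jacobian(q1: float, q2: float, l1: float = 1.0, l2: float = 1.0) -> np.ndarray:
    """Position Jacobian of a planar two-link arm (stand-in for a 6-DOF arm)."""
    s1, c1 = np.sin(q1), np.cos(q1)
    s12, c12 = np.sin(q1 + q2), np.cos(q1 + q2)
    return np.array([
        [-l1 * s1 - l2 * s12, -l2 * s12],
        [l1 * c1 + l2 * c12, l2 * c12],
    ])


def manipulability(jacobian: np.ndarray) -> float:
    """Yoshikawa manipulability w = sqrt(det(J J^T)); it tends to 0 at a singularity."""
    det = np.linalg.det(jacobian @ jacobian.T)
    return float(np.sqrt(max(det, 0.0)))  # clamp tiny negative round-off


def condition_number(jacobian: np.ndarray) -> float:
    """Ratio of largest to smallest singular value; it grows without bound at a singularity."""
    sigma = np.linalg.svd(jacobian, compute_uv=False)
    if sigma[-1] < 1e-12 * sigma[0]:
        return float("inf")
    return float(sigma[0] / sigma[-1])


# Elbow at 90 degrees (well conditioned) versus a fully stretched arm (singular).
for q2 in (np.pi / 2, 0.0):
    J = planar_2link_jacobian(0.3, q2)
    print(q2, manipulability(J), condition_number(J))
```

For this arm the manipulability equals l1 * l2 * |sin(q2)|, so it is 1 with the elbow at 90 degrees and vanishes when the arm is fully stretched. A safety layer could threshold either quantity to decide when to steer away from a configuration.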

From Off-Policy to On-Policy: Enhancing GUI Agents via Bi-level Expert-to-Policy Assimilation

Authors:Zezhou Wang, Ziyun Zhang, Xiaoyi Zhang, Zhuzhong Qian, Yan Lu
Date:2026-01-09 13:26:38

Vision-language models are increasingly deployed as computer-use agents (CUAs) that operate desktops and browsers. Top-performing CUAs are framework-based systems that decompose planning and execution, while end-to-end screenshot-to-action policies are easier to deploy but lag behind on benchmarks such as OSWorld-Verified. GUI datasets like OSWorld pose two bottlenecks: they expose only a few hundred interactive, verifiable tasks and environments, and expert trajectories must be gathered by interacting with these environments, making such data hard to scale. We therefore ask how reinforcement learning from verifiable rewards (RLVR) can best exploit a small pool of existing expert trajectories to train end-to-end policies. Naively mixing these off-policy traces into on-policy RLVR is brittle: even after format conversion, expert trajectories exhibit structural mismatch and distribution shift from the learner. We propose BEPA (Bi-Level Expert-to-Policy Assimilation), which turns static expert traces into policy-aligned guidance via self-rolled reachable trajectories under the base policy (LEVEL-1) and a per-task, dynamically updated cache used in RLVR (LEVEL-2). On OSWorld-Verified, BEPA improves UITARS1.5-7B success from 22.87% to 32.13% and raises a held-out split from 5.74% to 10.30%, with consistent gains on MMBench-GUI and Online-Mind2Web. Our code and data are available at: https://github.com/LEON-gittech/Verl_GUI.git

Explicit Reward Mechanisms for Local Flexibility in Renewable Energy Communities

Authors:Thomas Stegen, Julien Allard, Noé Diffels, François Vallée, Mevludin Glavic, Zacharie De Grève, Bertrand Cornélusse
Date:2026-01-09 12:20:05

Incentivizing flexible consumption of end-users is key to maximizing the value of local exchanges within Renewable Energy Communities. While centralized coordination for flexible resources planning raises concerns regarding data privacy and fair benefits distribution, state-of-the-art approaches (e.g., bi-level, ADMM) often face computational complexity and convexity challenges, limiting the precision of embedded flexible models. This work proposes an iterative resolution procedure to solve the decentralized flexibility planning with a central operator as a coordinator within a community. The community operator asks for upward or downward flexibility depending on the global needs, while members can individually react with an offer for flexible capacity. This approach ensures individual optimality while converging towards a global optimum, as validated on a 20-member domestic case study for which the gap in terms of collective bill is not more than 3.5% between the decentralized and centralized coordination schemes.

Motion Compensation for Real Time Ultrasound Scanning in Robotically Assisted Prostate Biopsy Procedures

Authors:Matija Markulin, Luka Matijević, Luka Siktar, Janko Jurdana, Branimir Caran, Marko Švaco, Filip Šuligoj, Bojan Šekoranja
Date:2026-01-09 09:30:55

Prostate cancer is one of the most common types of cancer in men. Its diagnosis by biopsy requires a high level of expertise and precision from the surgeon, so the results are highly operator-dependent. The aim of this work is to develop a robotic system for assisted ultrasound (US) examination of the prostate, a prebiopsy step that could reduce the dexterity requirements and enable faster, more accurate and more available prostate biopsy. We developed and validated a laboratory setup with a collaborative robotic arm that can autonomously scan a prostate phantom and attached the phantom to a medical robotic arm that mimics the patient's movements. The scanning robot keeps the relative position of the US probe and the prostate constant, ensuring a consistent and robust approach to reconstructing the prostate. To reconstruct the prostate, each slice is segmented to generate a series of prostate contours converted into a 3D point cloud used for biopsy planning. The average scan time of the prostate was 30 s, and the average 3D reconstruction of the prostate took 3 s. We performed four motion scenarios: the phantom was scanned in a stationary state (S), with horizontal motion (H), with vertical motion (V), and with a combination of the two (C). System validation is performed by registering the prostate point cloud reconstructions acquired during different motions (H, V, C) with those obtained in the stationary state. ICP registration with a threshold of 0.8 mm yields mean 83.2% fitness and 0.35 mm RMSE for S-H registration, 84.1% fitness and 0.37 mm RMSE for S-V registration and 79.4% fitness and 0.37 mm RMSE for S-C registration. Due to the elastic and soft material properties of the prostate phantom, the maximum robot tracking error was 3 mm, which can be sufficient for prostate biopsy according to medical literature. The maximum delay in motion compensation was 0.5 s.
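
The fitness and RMSE figures quoted above can be read with the help of a short sketch. It assumes the convention used by common point-cloud libraries such as Open3D, where fitness is the fraction of source points with a target neighbour within the distance threshold and RMSE is computed over those inlier pairs only. The abstract does not state which definition the authors used, and the function name and toy point clouds are hypothetical.

```python
import numpy as np


def registration_metrics(source: np.ndarray, target: np.ndarray, threshold: float):
    """Return (fitness, inlier RMSE) for two already aligned point clouds.

    source, target: arrays of shape (N, 3) and (M, 3), in the same units as threshold.
    Uses brute-force nearest neighbours, which is fine for small clouds.
    """
    # Pairwise distances, shape (N, M), then nearest target point per source point.
    dists = np.linalg.norm(source[:, None, :] - target[None, :, :], axis=2)
    nearest = dists.min(axis=1)
    inliers = nearest <= threshold
    fitness = float(inliers.mean())
    rmse = float(np.sqrt(np.mean(nearest[inliers] ** 2))) if inliers.any() else 0.0
    return fitness, rmse


# Toy clouds in millimetres: three points are 0.3 mm off, one is 5 mm off.
target = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]])
source = target + np.array([[0.3, 0.0, 0.0], [0.3, 0.0, 0.0], [0.3, 0.0, 0.0], [0.0, 0.0, 5.0]])
print(registration_metrics(source, target, threshold=0.8))  # fitness 0.75, RMSE about 0.3
```

Under this definition, a fitness of about 80% with a 0.8 mm threshold means roughly four in five reconstructed points have a counterpart within 0.8 mm, and the RMSE describes only those matched points.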

SGDrive: Scene-to-Goal Hierarchical World Cognition for Autonomous Driving

Authors:Jingyu Li, Junjie Wu, Dongnan Hu, Xiangkai Huang, Bin Sun, Zhihui Hao, Xianpeng Lang, Xiatian Zhu, Li Zhang
Date:2026-01-09 08:55:42

Recent end-to-end autonomous driving approaches have leveraged Vision-Language Models (VLMs) to enhance planning capabilities in complex driving scenarios. However, VLMs are inherently trained as generalist models, lacking specialized understanding of driving-specific reasoning in 3D space and time. When applied to autonomous driving, these models struggle to establish structured spatial-temporal representations that capture geometric relationships, scene context, and motion patterns critical for safe trajectory planning. To address these limitations, we propose SGDrive, a novel framework that explicitly structures the VLM's representation learning around driving-specific knowledge hierarchies. Built upon a pre-trained VLM backbone, SGDrive decomposes driving understanding into a scene-agent-goal hierarchy that mirrors human driving cognition: drivers first perceive the overall environment (scene context), then attend to safety-critical agents and their behaviors, and finally formulate short-term goals before executing actions. This hierarchical decomposition provides the structured spatial-temporal representation that generalist VLMs lack, integrating multi-level information into a compact yet comprehensive format for trajectory planning. Extensive experiments on the NAVSIM benchmark demonstrate that SGDrive achieves state-of-the-art performance among camera-only methods on both PDMS and EPDMS, validating the effectiveness of hierarchical knowledge structuring for adapting generalist VLMs to autonomous driving.

PRISMA: Reinforcement Learning Guided Two-Stage Policy Optimization in Multi-Agent Architecture for Open-Domain Multi-Hop Question Answering

Authors:Yu Liu, Wenxiao Zhang, Cong Cao, Wenxuan Lu, Fangfang Yuan, Diandian Guo, Kun Peng, Qiang Sun, Kaiyan Zhang, Yanbing Liu, Jin B. Hong, Bowen Zhou, Zhiyuan Ma
Date:2026-01-09 01:38:38

Answering real-world open-domain multi-hop questions over massive corpora is a critical challenge in Retrieval-Augmented Generation (RAG) systems. Recent research employs reinforcement learning (RL) to end-to-end optimize the retrieval-augmented reasoning process, directly enhancing its capacity to resolve complex queries. However, reliable deployment is hindered by two obstacles. 1) Retrieval Collapse: iterative retrieval over large corpora fails to locate intermediate evidence containing bridge answers without reasoning-guided planning, causing downstream reasoning to collapse. 2) Learning Instability: end-to-end trajectory training suffers from weak credit assignment across reasoning chains and poor error localization across modules, causing overfitting to benchmark-specific heuristics that limit transferability and stability. To address these problems, we propose PRISMA, a decoupled RL-guided framework featuring a Plan-Retrieve-Inspect-Solve-Memoize architecture. PRISMA's strength lies in reasoning-guided collaboration: the Inspector provides reasoning-based feedback to refine the Planner's decomposition and fine-grained retrieval, while enforcing evidence-grounded reasoning in the Solver. We optimize individual agent capabilities via Two-Stage Group Relative Policy Optimization (GRPO). Stage I calibrates the Planner and Solver as specialized experts in planning and reasoning, while Stage II utilizes Observation-Aware Residual Policy Optimization (OARPO) to enhance the Inspector's ability to verify context and trigger targeted recovery. Experiments show that PRISMA achieves state-of-the-art performance on ten benchmarks and can be deployed efficiently in real-world scenarios.

PRISM: Protocol Refinement through Intelligent Simulation Modeling

Authors:Brian Hsu, Priyanka V Setty, Rory M Butler, Ryan Lewis, Casey Stone, Rebecca Weinberg, Thomas Brettin, Rick Stevens, Ian Foster, Arvind Ramanathan
Date:2026-01-08 20:15:28

Automating experimental protocol design and execution remains a fundamental bottleneck in realizing self-driving laboratories. We introduce PRISM (Protocol Refinement through Intelligent Simulation Modeling), a framework that automates the design, validation, and execution of experimental protocols on a laboratory platform composed of off-the-shelf robotic instruments. PRISM uses a set of language-model-based agents that work together to generate and refine experimental steps. The process begins with automatically gathering relevant procedures from web-based sources describing experimental workflows. These are converted into structured experimental steps (e.g., liquid handling steps, deck layout and other related operations) through a planning, critique, and validation loop. The finalized steps are translated into the Argonne MADSci protocol format, which provides a unified interface for coordinating multiple robotic instruments (Opentrons OT-2 liquid handler, PF400 arm, Azenta plate sealer and peeler) without requiring human intervention between steps. To evaluate protocol-generation performance, we benchmarked both single reasoning models and multi-agent workflows across constrained and open-ended prompting paradigms. The resulting protocols were validated in a digital-twin environment built in NVIDIA Omniverse to detect physical or sequencing errors before execution. Using Luna qPCR amplification and Cell Painting as case studies, we demonstrate PRISM as a practical end-to-end workflow that bridges language-based protocol generation, simulation-based validation, and automated robotic execution.

Feasibility of a General-Purpose Deep Learning Dose Engine: A Multi-Site Validation Study

Authors:Yao Zhao, Ka Ho Tam, Raphael Douglas, Kyuhak Oh, Xin Wang, Ergys Subashi, Jinzhong Yang, Laurence Court, Dong Joo Rhee
Date:2026-01-08 19:55:33

Conventional radiotherapy dose calculation algorithms are often computationally slow and non-differentiable, creating bottlenecks for online adaptive radiotherapy (ART) and limiting end-to-end automatic planning. Deep learning provides consistent inference performance and a differentiable framework essential for rapid optimization. In this study, we developed a generalized, site-independent deep learning dose engine using a beamlet-based input strategy. This establishes a computationally consistent and differentiable module that enables end-to-end training for autoplanning while maintaining accuracy across diverse geometries. A dataset of 3,600 plans from 120 patients across six anatomical sites was used to train two 3D convolutional neural networks, a standard U-Net and a Cascade U-Net, to predict 3D dose distributions from CT images and divergent MLC/jaw projections. Performance was validated via 3D gamma analysis on an independent cohort of 60 VMAT plans. The optimal model (U-Net with MAE loss) achieved a mean gamma passing rate of $98.9 \pm 1.6\%$ (3%/2mm, 10% threshold). Performance remained robust across all sites (passing rates $>98\%$), demonstrating that the beamlet-based strategy generalizes effectively to complex geometries without site-specific training. These results indicate that a single, site-independent model can calculate radiotherapy dose distributions with clinical accuracy. This differentiable engine is highly suitable for integration into end-to-end automatic planning, online ART, and secondary dose verification workflows.
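
The gamma passing rate reported above combines a dose-difference criterion and a distance-to-agreement criterion. The sketch below shows the idea on a 1D dose profile with global normalization and a discrete grid search without interpolation. The study itself uses 3D gamma analysis, so this is a simplified illustration; the function name, the grid, and the Gaussian test profile are assumptions.

```python
import numpy as np


def gamma_pass_rate_1d(x_mm, dose_ref, dose_eval,
                       dose_tol=0.03, dta_mm=2.0, low_dose_cut=0.10) -> float:
    """Fraction of reference points with gamma <= 1 (global 3%/2 mm by default).

    Points whose reference dose is below low_dose_cut * max(dose_ref) are ignored.
    """
    x_mm = np.asarray(x_mm, dtype=float)
    dose_ref = np.asarray(dose_ref, dtype=float)
    dose_eval = np.asarray(dose_eval, dtype=float)

    d_max = dose_ref.max()
    dose_criterion = dose_tol * d_max            # global dose normalization
    evaluated = dose_ref >= low_dose_cut * d_max

    # Rows index reference points, columns index evaluated-distribution points.
    dist_term = (x_mm[:, None] - x_mm[None, :]) / dta_mm
    dose_term = (dose_eval[None, :] - dose_ref[:, None]) / dose_criterion
    gamma = np.sqrt(dist_term ** 2 + dose_term ** 2).min(axis=1)

    return float(np.mean(gamma[evaluated] <= 1.0))


# Hypothetical Gaussian profile on a 1 mm grid, with a 2% and a 50% scaling error.
x = np.arange(0.0, 50.0, 1.0)
reference = 2.0 * np.exp(-((x - 25.0) / 8.0) ** 2)
print(gamma_pass_rate_1d(x, reference, 1.02 * reference))  # 1.0
print(gamma_pass_rate_1d(x, reference, 1.50 * reference))  # below 1.0
```

A point passes when some nearby evaluated point is close enough in both position and dose, which is why a uniform 2% error passes everywhere under a 3% criterion while a 50% error fails around the peak.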

Intent at a Glance: Gaze-Guided Robotic Manipulation via Foundation Models

Authors:Tracey Yee Hsin Tay, Xu Yan, Jonathan Ouyang, Daniel Wu, William Jiang, Jonathan Kao, Yuchen Cui
Date:2026-01-08 19:33:03

Designing intuitive interfaces for robotic control remains a central challenge in enabling effective human-robot interaction, particularly in assistive care settings. Eye gaze offers a fast, non-intrusive, and intent-rich input modality, making it an attractive channel for conveying user goals. In this work, we present GAMMA (Gaze Assisted Manipulation for Modular Autonomy), a system that leverages ego-centric gaze tracking and a vision-language model to infer user intent and autonomously execute robotic manipulation tasks. By contextualizing gaze fixations within the scene, the system maps visual attention to high-level semantic understanding, enabling skill selection and parameterization without task-specific training. We evaluate GAMMA on a range of table-top manipulation tasks and compare it against baseline gaze-based control without reasoning. Results demonstrate that GAMMA provides robust, intuitive, and generalizable control, highlighting the potential of combining foundation models and gaze for natural and scalable robot autonomy. Project website: https://gamma0.vercel.app/

Directed Nano-antennas for Laser Fusion

Authors:FUSENOW, NAPLIFE Collaborations, :, Zsuzsanna Márton, Imene Benabdelghani, Márk Aladi, Judit Budai, Aldo Bonasera, Attila Bonyár, Mária Csete, Martin Greve, Jan-Petter Hansen, Gergely Hegedűs, Ádám Inger, Miklos Kedves, István Papp, Péter Rácz, András Szenes, Ágnes Szokol, Dávid Vass, Miklós Veres, Konstantin Zsukovszki, Tamás S. Bíró, Norbert Kroó, Laszlo P. Csernai
Date:2026-01-08 19:20:01

Why do we use nano-antennas for fusion? In three sentences: The present laser-induced fusion plans use extreme mechanical shock compression to get one hotspot and then ignition. Still, fusion burning spreads slower than expansion, and mechanical instabilities may also develop. With nano-antennas in radiation-dominated systems, simultaneous ignition can be achieved in the whole target volume and there is no time left for mechanical instabilities. Ignition is achieved with protons accelerated in the direction of the nano-antennas that are orthogonal to the direction of laser irradiation. Present laser fusion methods are based on extreme and slow mechanical compression with an ablator surface on the fuel target pellet to increase compression and eliminate penetration of laser electromagnetic energy into the target. This arises from a mistaken assumption [1] that the detonation normal 4-vector should have a vanishing time-like component, and this assumption eliminates the possibility of rapid or even simultaneous radiation-dominated detonations (which are well known in the burning (or hadronization) of Quark Gluon Plasma).

Learning Latent Action World Models In The Wild

Authors:Quentin Garrido, Tushar Nagarajan, Basile Terver, Nicolas Ballas, Yann LeCun, Michael Rabbat
Date:2026-01-08 18:55:39

Agents capable of reasoning and planning in the real world require the ability to predict the consequences of their actions. While world models possess this capability, they most often require action labels, which can be complex to obtain at scale. This motivates the learning of latent action models, which can learn an action space from videos alone. Our work addresses the problem of learning latent action world models on in-the-wild videos, expanding the scope of existing works that focus on simple robotics simulations, video games, or manipulation data. While this allows us to capture richer actions, it also introduces challenges stemming from the video diversity, such as environmental noise, or the lack of a common embodiment across videos. To address some of the challenges, we discuss properties that actions should follow as well as relevant architectural choices and evaluations. We find that continuous, but constrained, latent actions are able to capture the complexity of actions from in-the-wild videos, something that the common vector quantization does not. For example, we find that changes in the environment coming from agents, such as humans entering the room, can be transferred across videos. This highlights the capability of learning actions that are specific to in-the-wild videos. In the absence of a common embodiment across videos, we are mainly able to learn latent actions that become localized in space, relative to the camera. Nonetheless, we are able to train a controller that maps known actions to latent ones, allowing us to use latent actions as a universal interface and solve planning tasks with our world model with similar performance as action-conditioned baselines. Our analyses and experiments provide a step towards scaling latent action models to the real world.

Sparsity and uniform regularity for regularised optimal transport

Authors:Rishabh S. Gvalani, Lukas Koch
Date:2026-01-08 17:20:00

We consider regularised quadratic optimal transport with subquadratic polynomial or entropic regularisation. In both cases, we prove interior Lipschitz-estimates on a transport-like map and interior gradient Lipschitz-estimates on the potentials, under the assumption that the transport map solving the unregularised problem is bi-$C^{1,\alpha}$-regular. For strictly subquadratic and entropic regularisation, the estimates improve to interior $C^1$ and $C^2$ estimates for the transport-like map and the potentials, respectively. Our estimates are uniform in the regularisation parameter. As a consequence of this, we obtain convergence of the transport-like map (resp. the potentials) to the unregularised transport map (resp. Kantorovich potentials) in $C^{0,1-}_{\mathrm{loc}}$ (resp. $C^{1,1-}_{\mathrm{loc}}$). Central to our approach are sharp local bounds on the size of the support for regularised optimal transport which we derive for a general convex, superlinear regularisation term. These bounds are of independent interest and imply global bias bounds for the regularised transport plans. Our global bounds, while not necessarily sharp, improve on the best known results in the literature for quadratic regularisation.

Dosimetric Impact of Hidden Input Parameters in Inverse Optimization Algorithms for GYN HDR Brachytherapy

Authors:YeongHyeon Park, Shiqin Su, Sarath Vijayan, Zhiqian Henry Yu, Mandy Cunningham, Yusung Kim
Date:2026-01-08 15:51:24

Inverse optimization (IO) algorithms are used in GYN HDR brachytherapy planning, with user parameter settings embedded in commercial TPS. This study examines the dosimetric influence of hidden input parameters in three IO algorithms (IPSA, HIPO, and MCO) for GYN HDR brachytherapy across two applicator types. In-house implementations of IPSA, HIPO, and MCO were developed and evaluated against retrospectively generated commercial TPS plans (Oncentra Brachy) using identical clinical input parameters across 24 cervical cancer cases (18 T&O; 6 T&O+Needles (T&O+N)). Each IO algorithm was assessed using 1,000 combinations of hidden parameters (e.g., dwell-time modulation constraints, convergence thresholds). Cumulative DVH curves and dosimetric indices (HR-CTV D98/D90, OAR D2cc) were compared with commercial plans. Standard deviations (SD) of DVH differences were used to characterize sensitivity to hidden parameters. For HR-CTV, SD values in T&O+N cases reached 23.0 Gy and 7.1 Gy for MCO and HIPO, respectively, with corresponding average values of 55.8 Gy and 19.7 Gy. In T&O cases, HR-CTV SD values reached 4.9 Gy and 3.3 Gy for HIPO and IPSA, respectively, with average values of 20.1 Gy and 8.6 Gy. MCO exhibited the highest sensitivity, followed by HIPO and IPSA. T&O+N cases showed greater sensitivity than T&O cases. Absolute differences in HR-CTV D90 (D98) relative to commercial algorithms reached up to 33.3 Gy (28.4) for T&O+N cases and 10.8 Gy (8.5) for T&O cases. For OARs, absolute D2cc differences in T&O+N (T&O) cases reached up to 8.6 Gy (2.3) for rectum, 17 Gy (10.2) for bladder, 14.8 Gy (3.9) for sigmoid, and 7.0 Gy (8.1) for bowel. Hidden input parameter settings significantly impact GYN HDR plans, with target coverage differing by up to 28.4 Gy across IO algorithms for both T&O and T&O+N cases. The findings of this study show the potential to improve plans through hidden input parameter optimization.

From Stories to Cities to Games: A Qualitative Evaluation of Behaviour Planning

Authors:Mustafa F. Abdelwahed, Joan Espasa, Alice Toniolo, Ian P. Gent
Date:2026-01-08 13:09:43

The primary objective of a diverse planning approach is to generate a set of plans that are distinct from one another. Such an approach is applied in a variety of real-world domains, including risk management, automated stream data analysis, and malware detection. More recently, a novel diverse planning paradigm, referred to as behaviour planning, has been proposed. This approach extends earlier methods by explicitly incorporating a diversity model into the planning process and supporting multiple planning categories. In this paper, we demonstrate the usefulness of behaviour planning in real-world settings by presenting three case studies. The first case study focuses on storytelling, the second addresses urban planning, and the third examines game evaluation.

Precomputing Multi-Agent Path Replanning using Temporal Flexibility: A Case Study on the Dutch Railway Network

Authors:Issa Hanou, Eric Kemmeren, Devin Wild Thomas, Mathijs de Weerdt
Date:2026-01-08 12:30:36

Executing a multi-agent plan can be challenging when an agent is delayed, because this typically creates conflicts with other agents. So, we need to quickly find a new safe plan. Replanning only the delayed agent often does not result in an efficient plan, and sometimes cannot even yield a feasible plan. On the other hand, replanning other agents may lead to a cascade of changes and delays. We show how to efficiently replan by tracking and using the temporal flexibility of other agents while avoiding cascading delays. This flexibility is the maximum delay an agent can take without changing the order of or further delaying more agents. Our algorithm, FlexSIPP, precomputes all possible plans for the delayed agent, also returning the changes for the other agents, for any single-agent delay within the given scenario. We demonstrate our method in a real-world case study of replanning trains in the densely-used Dutch railway network. Our experiments show that FlexSIPP provides effective solutions, relevant to real-world adjustments, and within a reasonable timeframe.

Bi-level Multi-criteria Optimization for Risk-informed Radiotherapy

Authors:Mara Schubert, Katrin Teichert, Zhongxing Liao, Thomas Bortfeld, Ali Ajdari
Date:2026-01-08 10:57:53

In radiation therapy (RT) treatment planning, multi-criteria optimization (MCO) supports efficient plan selection but is usually solved for population-based dosimetric criteria and ignores patient-specific biological risk, potentially compromising outcomes in high-risk patients. We propose risk-guided MCO, a one-shot method that embeds a clinical risk model into conventional MCO, enabling interactive navigation between dosimetric and biological endpoints. The proposed algorithm uses a special order relation to fuse the classical MCO sandwiching algorithm with bi-level optimization, restricting the Pareto set to plans that achieve improvement in the secondary risk objective for user-defined, acceptable loss in primary clinical objectives. Thus, risk-guided MCO generates risk-optimized counterparts of clinical plans in a single run rather than by sequential or lexicographic planning. To assess the performance, we retrospectively analyzed 19 lung cancer patients treated with RT. The endpoint was the risk of grade 2+ radiation pneumonitis (RP), modeled using bootstrapped stepwise logistic regression with interaction terms, including baseline lung function, smoking history, and dosimetric factors. The risk-guided plans yielded a mean reduction of 8.0% in total lung V20 and 9.5% in right lung V5, translating into an average RP risk reduction of 7.7% (range=0.3%-20.1%), with small changes in target coverage (mean -1.2 D98[%] for CTV) and modest increase in heart dose (mean +1.74 Gy). This study presents the first proof-of-concept for integrating biological risk models directly within multi-criteria RT planning, enabling an interactive balance between established population-wide dose protocols and individualized outcome prediction. Our results demonstrate that the risk-informed MCO can reduce the risk of RP while maintaining target coverage.

SeqWalker: Sequential-Horizon Vision-and-Language Navigation with Hierarchical Planning

Authors:Zebin Han, Xudong Wang, Baichen Liu, Qi Lyu, Zhenduo Shang, Jiahua Dong, Lianqing Liu, Zhi Han
Date:2026-01-08 08:09:24

Sequential-Horizon Vision-and-Language Navigation (SH-VLN) presents a challenging scenario where agents should sequentially execute multi-task navigation guided by complex, long-horizon language instructions. Current vision-and-language navigation models exhibit significant performance degradation with such multi-task instructions, as information overload impairs the agent's ability to attend to observationally relevant details. To address this problem, we propose SeqWalker, a navigation model built on a hierarchical planning framework. Our SeqWalker features: i) A High-Level Planner that dynamically selects contextually relevant sub-instructions from the global instructions based on the agent's current visual observations, thus reducing cognitive load; ii) A Low-Level Planner incorporating an Exploration-Verification strategy that leverages the inherent logical structure of instructions for trajectory error correction. To evaluate SH-VLN performance, we also extend the IVLN dataset and establish a new benchmark. Extensive experiments are performed to demonstrate the superiority of the proposed SeqWalker.

TourPlanner: A Competitive Consensus Framework with Constraint-Gated Reinforcement Learning for Travel Planning

Authors:Yinuo Wang, Mining Tan, Wenxiang Jiao, Xiaoxi Li, Hao Wang, Xuanyu Zhang, Yuan Lu, Weiming Dong
Date:2026-01-08 08:08:35

Travel planning is a sophisticated decision-making process that requires synthesizing multifaceted information to construct itineraries. However, existing travel planning approaches face several challenges: (1) Pruning candidate points of interest (POIs) while maintaining a high recall rate; (2) A single reasoning path restricts the exploration capability within the feasible solution space for travel planning; (3) Simultaneously optimizing hard constraints and soft constraints remains a significant difficulty. To address these challenges, we propose TourPlanner, a comprehensive framework featuring multi-path reasoning and constraint-gated reinforcement learning. Specifically, we first introduce a Personalized Recall and Spatial Optimization (PReSO) workflow to construct a spatially-aware set of candidate POIs. Subsequently, we propose Competitive consensus Chain-of-Thought (CCoT), a multi-path reasoning paradigm that improves the ability to explore the feasible solution space. To further refine the plan, we integrate a sigmoid-based gating mechanism into the reinforcement learning stage, which dynamically prioritizes soft-constraint satisfaction only after hard constraints are met. Experimental results on travel planning benchmarks demonstrate that TourPlanner achieves state-of-the-art performance, significantly surpassing existing methods in both feasibility and user-preference alignment.
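
One way a sigmoid gate can defer soft-constraint reward until hard constraints are met is sketched below. The abstract does not give TourPlanner's actual reward formula, so the additive form, the threshold of 0.9, the sharpness of 50, and the function name are all assumptions chosen for illustration.

```python
import math


def gated_reward(hard_score: float, soft_score: float,
                 threshold: float = 0.9, sharpness: float = 50.0) -> float:
    """Combine hard- and soft-constraint scores, both assumed to lie in [0, 1].

    The gate is close to 0 while hard_score is below the threshold, so the
    soft score contributes almost nothing until the plan is nearly feasible.
    """
    gate = 1.0 / (1.0 + math.exp(-sharpness * (hard_score - threshold)))
    return hard_score + gate * soft_score


# Same soft score, different levels of hard-constraint satisfaction.
print(gated_reward(0.5, 1.0))  # about 0.5: soft reward is gated out
print(gated_reward(1.0, 1.0))  # about 1.99: soft reward now counts
```

A smooth gate of this kind keeps the reward continuous, unlike a hard if/else switch, while still removing the incentive to trade feasibility for user-preference gains.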

Tape: A Cellular Automata Benchmark for Evaluating Rule-Shift Generalization in Reinforcement Learning

Authors:Enze Pan
Date:2026-01-08 08:05:42

We present Tape, a controlled reinforcement-learning benchmark designed to isolate out-of-distribution (OOD) failure under latent rule shifts. Tape is derived from one-dimensional cellular automata, enabling precise train/test splits where observation and action spaces are held fixed while transition rules change. Using a reproducible evaluation pipeline, we compare model-free baselines, model-based planning with learned world models, and task-inference (meta-RL) methods. A consistent pattern emerges: methods that are strong in-distribution (ID) can collapse under held-out-rule OOD, and high-variance OOD evaluation can make rankings unstable unless experiments are sufficiently replicated. We provide (i) standardized OOD protocols, (ii) statistical reporting requirements (seeds, confidence intervals, and hypothesis tests), and (iii) information-theoretic identities connecting entropy reduction to conditional mutual information and expected posterior KL divergence, clarifying what "uncertainty reduction" objectives can and cannot guarantee under rule shifts.
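The kind of identity referred to in (iii) can be written out in its standard textbook form. The notation below is an assumption and may differ from the paper's: \theta denotes the latent rule, h the interaction history so far, a the next action (assumed to be chosen without knowledge of \theta beyond h), and o the resulting observation.

```latex
\begin{align}
\underbrace{H(\theta \mid h) - \mathbb{E}_{o \sim p(o \mid h, a)}\big[ H(\theta \mid h, a, o) \big]}_{\text{expected entropy reduction}}
  &= I(\theta ; o \mid h, a) \\
  &= \mathbb{E}_{o \sim p(o \mid h, a)}\Big[ \mathrm{KL}\big( p(\theta \mid h, a, o) \,\big\|\, p(\theta \mid h) \big) \Big].
\end{align}
```

All three quantities are computed under the agent's own model p. If the test-time rule lies outside that model's support, the identity still holds but says nothing about how close the posterior is to the true rule, which is consistent with the abstract's caution about what such objectives can guarantee.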

Nightmare Dreamer: Dreaming About Unsafe States And Planning Ahead

Authors:Oluwatosin Oseni, Shengjie Wang, Jun Zhu, Micah Corah
Date:2026-01-08 07:55:07

Reinforcement Learning (RL) has shown remarkable success in real-world applications, particularly in robotics control. However, RL adoption remains limited due to insufficient safety guarantees. We introduce Nightmare Dreamer, a model-based Safe RL algorithm that addresses safety concerns by leveraging a learned world model to predict potential safety violations and plan actions accordingly. Nightmare Dreamer achieves nearly zero safety violations while maximizing rewards. Nightmare Dreamer outperforms model-free baselines on Safety Gymnasium tasks using only image observations, achieving nearly a 20x improvement in efficiency.

Optimizing Path Planning using Deep Reinforcement Learning for UGVs in Precision Agriculture

Authors:Laukik Patade, Rohan Rane, Sandeep Pillai
Date:2026-01-08 07:28:11

This study focuses on optimizing path planning for unmanned ground vehicles (UGVs) in precision agriculture using deep reinforcement learning (DRL) techniques in continuous action spaces. The research begins with a review of traditional grid-based methods, such as A* and Dijkstra's algorithms, and discusses their limitations in dynamic agricultural environments, highlighting the need for adaptive learning strategies. The study then explores DRL approaches, including Deep Q-Networks (DQN), which demonstrate improved adaptability and performance in two-dimensional simulations. Enhancements such as Double Q-Networks and Dueling Networks are evaluated to further improve decision-making. Building on these results, the focus shifts to continuous action space models, specifically Deep Deterministic Policy Gradient (DDPG) and Twin Delayed Deep Deterministic Policy Gradient (TD3), which are tested in increasingly complex environments. Experiments conducted in a three-dimensional environment using ROS and Gazebo demonstrate the effectiveness of continuous DRL algorithms in navigating dynamic agricultural scenarios. Notably, the pretrained TD3 agent achieves a 95 percent success rate in dynamic environments, demonstrating the robustness of the proposed approach in handling moving obstacles while ensuring safety for both crops and the robot.

Cluster-Based Bayesian SIRD Modeling of Chickenpox Epidemiology in India

Authors:Nayana Mukherjee, Chitradipa Chakraborty
Date:2026-01-08 06:41:45

This study presents a cluster-based Bayesian SIRD model to analyze the epidemiology of chickenpox (varicella) in India, utilizing data from 1990 to 2021. We employed an age-structured approach, dividing the population into juvenile, adult, and elderly groups, to capture the disease's transmission dynamics across diverse demographic groups. The model incorporates a Holling-type incidence function, which accounts for the saturation effect of transmission at high prevalence levels, and applies Bayesian inference to estimate key epidemiological parameters, including transmission rates, recovery rates, and mortality rates. The study further explores cluster analysis to identify regional clusters within India based on the similarities in chickenpox transmission dynamics, using criteria like incidence, prevalence, and mortality rates. We perform K-means clustering to uncover three distinct epidemiological regimes, which vary in terms of outbreak potential and age-specific dynamics. The findings highlight juveniles as the primary drivers of transmission, while the elderly face a disproportionately high mortality burden. Our results underscore the importance of age-targeted interventions and suggest that regional heterogeneity should be considered in public health strategies for disease control. The model offers a transparent, reproducible framework for understanding long-term transmission dynamics and supports evidence-based planning for chickenpox control in India. The practical utility of the model is further validated through a simulation study.
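To make the model structure concrete, the following is a minimal single-group SIRD sketch with a saturating incidence term. It is not the paper's model: the Holling type II form beta*S*I/(1 + alpha*I), the forward-Euler integration, the omission of age structure and Bayesian inference, and all parameter values are assumptions chosen for illustration.

```python
def simulate_sird(beta, alpha, gamma, mu, s0, i0, days, dt=0.1):
    """Integrate a single-group SIRD model with forward Euler.

    Compartments are population fractions (S + I + R + D = 1).
    Incidence uses an assumed Holling type II form:
        new infections = beta * S * I / (1 + alpha * I)
    so transmission saturates as prevalence I grows (alpha = 0 recovers
    the usual mass-action term beta * S * I).
    gamma is the recovery rate and mu the disease-induced death rate.
    Returns a list of (S, I, R, D) tuples, one per time step.
    """
    s, i, r, d = s0, i0, 0.0, 0.0
    trajectory = [(s, i, r, d)]
    for _ in range(int(round(days / dt))):
        new_infections = beta * s * i / (1.0 + alpha * i)
        recoveries = gamma * i
        deaths = mu * i
        s += dt * (-new_infections)
        i += dt * (new_infections - recoveries - deaths)
        r += dt * recoveries
        d += dt * deaths
        trajectory.append((s, i, r, d))
    return trajectory


if __name__ == "__main__":
    # Illustrative parameters only; not estimates from the study.
    params = dict(beta=0.5, gamma=0.1, mu=0.01, s0=0.99, i0=0.01, days=100)
    mass_action = simulate_sird(alpha=0.0, **params)
    saturating = simulate_sird(alpha=50.0, **params)
    print("final susceptible fraction, alpha=0: ", mass_action[-1][0])
    print("final susceptible fraction, alpha=50:", saturating[-1][0])
```

Comparing the two runs shows the effect of the saturation parameter: with alpha > 0, the effective transmission rate falls as prevalence rises, so more of the population remains susceptible over the same period.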

Adaptive Retrieval for Reasoning-Intensive Retrieval

Authors:Jongho Kim, Jaeyoung Kim, Seung-won Hwang, Jihyuk Kim, Yu Jin Kim, Moontae Lee
Date:2026-01-08 05:46:50

We study leveraging adaptive retrieval to ensure sufficient "bridge" documents are retrieved for reasoning-intensive retrieval. Bridge documents are those that contribute to the reasoning process yet are not directly relevant to the initial query. While existing reasoning-based reranker pipelines attempt to surface these documents in ranking, they suffer from bounded recall. Naively integrating adaptive retrieval into these pipelines often leads to planning error propagation. To address this, we propose REPAIR, a framework that bridges this gap by repurposing reasoning plans as dense feedback signals for adaptive retrieval. Our key distinction is enabling mid-course correction during reranking through selective adaptive retrieval, retrieving documents that support the pivotal plan. Experimental results on reasoning-intensive retrieval and complex QA tasks demonstrate that our method outperforms existing baselines by 5.6 percentage points.

Autonomous Agents on Blockchains: Standards, Execution Models, and Trust Boundaries

Authors:Saad Alqithami
Date:2026-01-08 04:29:26

Advances in large language models have enabled agentic AI systems that can reason, plan, and interact with external tools to execute multi-step workflows, while public blockchains have evolved into a programmable substrate for value transfer, access control, and verifiable state transitions. Their convergence introduces a high-stakes systems challenge: designing standard, interoperable, and secure interfaces that allow agents to observe on-chain state, formulate transaction intents, and authorize execution without exposing users, protocols, or organizations to unacceptable security, governance, or economic risks. This survey systematizes the emerging landscape of agent-blockchain interoperability through a systematic literature review, identifying 317 relevant works from an initial pool of over 3000 records. We contribute a five-part taxonomy of integration patterns spanning read-only analytics, simulation and intent generation, delegated execution, autonomous signing, and multi-agent workflows; a threat model tailored to agent-driven transaction pipelines that captures risks ranging from prompt injection and policy misuse to key compromise, adversarial execution dynamics, and multi-agent collusion; and a comparative capability matrix analyzing more than 20 representative systems across 13 dimensions, including custody models, permissioning, policy enforcement, observability, and recovery. Building on the gaps revealed by this analysis, we outline a research roadmap centered on two interface abstractions: a Transaction Intent Schema for portable and unambiguous goal specification, and a Policy Decision Record for auditable, verifiable policy enforcement across execution environments. We conclude by proposing a reproducible evaluation suite and benchmarks for assessing the safety, reliability, and economic robustness of agent-mediated on-chain execution.

Data-Driven Terramechanics Approach Towards a Realistic Real-Time Simulator for Lunar Rovers

Authors:Jakob M. Kern, James M. Hurrell, Shreya Santra, Keisuke Takehana, Kentaro Uno, Kazuya Yoshida
Date:2026-01-08 03:23:31

High-fidelity simulators for the lunar surface provide a digital environment for extensive testing of rover operations and mission planning. However, current simulators focus on either visual realism or physical accuracy, which limits their capability to replicate lunar conditions comprehensively. This work addresses that gap by combining high visual fidelity with realistic terrain interaction for a realistic representation of rovers on the lunar surface. Because direct simulation of wheel-soil interactions is computationally expensive, a data-driven approach was adopted, using regression models for slip and sinkage from data collected in both full-rover and single-wheel experiments and simulations. The resulting regression-based terramechanics model accurately reproduced steady-state and dynamic slip, as well as sinkage behavior, on flat terrain and slopes up to 20 degrees, with validation against field test results. Additionally, improvements were made to enhance the realism of terrain deformation and wheel trace visualization. This method supports real-time applications that require physically plausible terrain response alongside high visual fidelity.

A Survey of Agentic AI and Cybersecurity: Challenges, Opportunities and Use-case Prototypes

Authors:Sahaya Jestus Lazer, Kshitiz Aryal, Maanak Gupta, Elisa Bertino
Date:2026-01-08 02:46:06

Agentic AI marks an important transition from single-step generative models to systems capable of reasoning, planning, acting, and adapting over long-lasting tasks. By integrating memory, tool use, and iterative decision cycles, these systems enable continuous, autonomous workflows in real-world environments. This survey examines the implications of agentic AI for cybersecurity. On the defensive side, agentic capabilities enable continuous monitoring, autonomous incident response, adaptive threat hunting, and fraud detection at scale. Conversely, the same properties amplify adversarial power by accelerating reconnaissance, exploitation, coordination, and social-engineering attacks. These dual-use dynamics expose fundamental gaps in existing governance, assurance, and accountability mechanisms, which were largely designed for non-autonomous and short-lived AI systems. To address these challenges, we survey emerging threat models, security frameworks, and evaluation pipelines tailored to agentic systems, and analyze systemic risks including agent collusion, cascading failures, oversight evasion, and memory poisoning. Finally, we present three representative use-case implementations that illustrate how agentic AI behaves in practical cybersecurity workflows, and how design choices shape reliability, safety, and operational effectiveness.

GUITester: Enabling GUI Agents for Exploratory Defect Discovery

Authors:Yifei Gao, Jiang Wu, Xiaoyi Chen, Yifan Yang, Zhe Cui, Tianyi Ma, Jiaming Zhang, Jitao Sang
Date:2026-01-08 02:07:53

Exploratory GUI testing is essential for software quality but suffers from high manual costs. While Multi-modal Large Language Model (MLLM) agents excel in navigation, they fail to autonomously discover defects due to two core challenges: Goal-Oriented Masking, where agents prioritize task completion over reporting anomalies, and Execution-Bias Attribution, where system defects are misidentified as agent errors. To address these, we first introduce GUITestBench, the first interactive benchmark for this task, featuring 143 tasks across 26 defects. We then propose GUITester, a multi-agent framework that decouples navigation from verification via two modules: (i) a Planning-Execution Module (PEM) that proactively probes for defects via embedded testing intents, and (ii) a Hierarchical Reflection Module (HRM) that resolves attribution ambiguity through interaction history analysis. GUITester achieves an F1-score of 48.90% (Pass@3) on GUITestBench, outperforming state-of-the-art baselines (33.35%). Our work demonstrates the feasibility of autonomous exploratory testing and provides a robust foundation for future GUI quality assurance. Our code is available at https://github.com/ADaM-BJTU/GUITestBench.