planning - 2026-03-09

Boosting deep Reinforcement Learning using pretraining with Logical Options

Authors:Zihan Ye, Phil Chau, Raban Emunds, Jannis Blüml, Cedric Derstroff, Quentin Delfosse, Oleg Arenz, Kristian Kersting

Date:2026-03-06 18:55:15

Deep reinforcement learning agents are often misaligned, as they over-exploit early reward signals. Recently, several symbolic approaches have addressed these challenges by encoding sparse objectives along with aligned plans. However, purely symbolic architectures are complex to scale and difficult to apply to continuous settings. Hence, we propose a hybrid approach, inspired by humans' ability to acquire new skills. We use a two-stage framework that injects symbolic structure into neural-based reinforcement learning agents without sacrificing the expressivity of deep policies. Our method, called Hybrid Hierarchical RL (H^2RL), introduces a logical option-based pretraining strategy to steer the learning policy away from short-term reward loops and toward goal-directed behavior while allowing the final policy to be refined via standard environment interaction. Empirically, we show that this approach consistently improves long-horizon decision-making and yields agents that outperform strong neural, symbolic, and neuro-symbolic baselines.

Hierarchical Industrial Demand Forecasting with Temporal and Uncertainty Explanations

Authors:Harshavardhan Kamarthi, Shangqing Xu, Xinjie Tong, Xingyu Zhou, James Peters, Joseph Czyzyk, B. Aditya Prakash

Date:2026-03-06 18:44:37

Hierarchical time-series forecasting is essential for demand prediction across various industries. While machine learning models have obtained significant accuracy and scalability on such forecasting tasks, the interpretability of their predictions, informed by application, is still largely unexplored. To bridge this gap, we introduce a novel interpretability method for large hierarchical probabilistic time-series forecasting, adapting generic interpretability techniques while addressing challenges associated with hierarchical structures and uncertainty. Our approach offers valuable interpretative insights in response to real-world industrial supply chain scenarios, including 1) the significance of various time-series within the hierarchy and external variables at specific time points, 2) the impact of different variables on forecast uncertainty, and 3) explanations for forecast changes in response to modifications in the training dataset. To evaluate the explainability method, we generate semi-synthetic datasets based on real-world scenarios of explaining hierarchical demands for over ten thousand products at a large chemical company. The experiments showed that our explainability method successfully explained state-of-the-art industrial forecasting methods with significantly higher explainability accuracy. Furthermore, we provide multiple real-world case studies that show the efficacy of our approach in identifying important patterns and explanations that help stakeholders better understand the forecasts. Additionally, our method facilitates the identification of key drivers behind forecasted demand, enabling more informed decision-making and strategic planning. Our approach helps build trust and confidence among users, ultimately leading to better adoption and utilization of hierarchical forecasting models in practice.

Unified Learning of Temporal Task Structure and Action Timing for Bimanual Robot Manipulation

Authors:Christian Dreher, Patrick Dormanns, Andre Meixner, Tamim Asfour

Date:2026-03-06 18:25:42

Temporal task structure is fundamental for bimanual manipulation: a robot must not only know that one action precedes or overlaps another, but also when each action should occur and how long it should take. While symbolic temporal relations enable high-level reasoning about task structure and alternative execution sequences, concrete timing parameters are equally essential for coordinating two hands at the execution level. Existing approaches address these two levels in isolation, leaving a gap between high-level task planning and low-level movement synchronization. This work presents an approach for learning both symbolic and subsymbolic temporal task constraints from human demonstrations and deriving executable, temporally parametrized plans for bimanual manipulation. Our contributions are (i) a 3-dimensional representation of timings between two actions with methods based on multivariate Gaussian Mixture Models to represent temporal relationships between actions on a subsymbolic level, (ii) a method based on the Davis-Putnam-Logemann-Loveland (DPLL) algorithm that finds and ranks all contradiction-free assignments of Allen relations to action pairs, representing different modes of a task, and (iii) an optimization-based planning system that combines the identified symbolic and subsymbolic temporal task constraints to derive temporally parametrized plans for robot execution. We evaluate our approach on several datasets, demonstrating that our method generates temporally parametrized plans closer to human demonstrations than the most characteristic demonstration baseline.
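
The Allen relations named in contribution (ii) are the thirteen qualitative relations between two time intervals. As a minimal sketch, assuming each action is given as a hypothetical (start, end) pair with start < end, the relation between two actions could be classified as below. This is only the classification step, not the authors' DPLL-based search over assignments.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a with respect to interval b.

    Both intervals are (start, end) tuples with start < end (assumed).
    """
    (a0, a1), (b0, b1) = a, b
    if a1 < b0:
        return "before"
    if b1 < a0:
        return "after"
    if a1 == b0:
        return "meets"
    if b1 == a0:
        return "met-by"
    if a0 == b0 and a1 == b1:
        return "equals"
    if a0 == b0:
        return "starts" if a1 < b1 else "started-by"
    if a1 == b1:
        return "finishes" if a0 > b0 else "finished-by"
    if b0 < a0 and a1 < b1:
        return "during"
    if a0 < b0 and b1 < a1:
        return "contains"
    # Only proper partial overlap remains at this point.
    return "overlaps" if a0 < b0 else "overlapped-by"

# Hypothetical timings (seconds) for a left-hand and a right-hand action
relation = allen_relation((0.0, 2.0), (1.0, 3.0))
```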

SG-DOR: Learning Scene Graphs with Direction-Conditioned Occlusion Reasoning for Pepper Plants

Authors:Rohit Menon, Niklas Mueller-Goldingen, Sicong Pan, Gokul Krishna Chenchani, Maren Bennewitz

Date:2026-03-06 17:52:51

Robotic harvesting in dense crop canopies requires effective interventions that depend not only on geometry, but also on explicit, direction-conditioned relations identifying which organs obstruct a target fruit. We present SG-DOR (Scene Graphs with Direction-Conditioned Occlusion Reasoning), a relational framework that, given instance-segmented organ point clouds, infers a scene graph encoding physical attachments and direction-conditioned occlusion. We introduce an occlusion ranking task for retrieving and ranking candidate leaves for a target fruit and approach direction, and propose a direction-aware graph neural architecture with per-fruit leaf-set attention and union-level aggregation. Experiments on a multi-plant synthetic pepper dataset show improved occlusion prediction (F1=0.73, NDCG@3=0.85) and attachment inference (edge F1=0.83) over strong ablations, yielding a structured relational signal for downstream intervention planning.
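
NDCG@3, reported above, is a ranking metric. As a minimal sketch, assuming the common linear-gain definition (the abstract does not say which variant is used) and hypothetical relevance labels, it could be computed as follows:

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG of the predicted order divided by the DCG of the ideal order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical labels: 1 if the leaf at that rank truly occludes the fruit
ranked_relevance = [1, 0, 1, 0]
score = ndcg_at_k(ranked_relevance, 3)
```

A perfect ranking yields 1.0; the hypothetical ranking above scores lower because a relevant leaf sits at rank 3 instead of rank 2.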

Construction and Science of SURF

Authors:Jaret Heise

Date:2026-03-06 17:37:00

The Sanford Underground Research Facility (SURF) began operation in 2007 as a facility dedicated to advancing compelling multidisciplinary scientific research. SURF is one of the deepest laboratory sites and offers the largest footprint in the world for scientific pursuits, including physics campuses on the 4850-foot level where the LUX-ZEPLIN, MAJORANA DEMONSTRATOR, and CASPAR experiments are located. SURF is also home to the Long-Baseline Neutrino Facility (LBNF) that will host the international Deep Underground Neutrino Experiment (DUNE). SURF provides ultra-low background environments and low-background assay capabilities, and it produces electroformed copper on site. In this review, we discuss the history, features and status of the facility, as well as the current scientific program and future evolution and plans.

Control Barrier Corridors: From Safety Functions to Safe Sets

Authors:Ömür Arslan, Nikolay Atanasov

Date:2026-03-06 17:23:56

Safe autonomy is a critical requirement and a key enabler for robots to operate safely in unstructured complex environments. Control barrier functions and safe motion corridors are two widely used but technically distinct safety methods, functional and geometric, respectively, for safe motion planning and control. Control barrier functions are applied to the safety filtering of control inputs to limit the decay rate of system safety, whereas safe motion corridors are geometrically constructed to define a local safe zone around the system state for use in motion optimization and reference-governor design. This paper introduces a new notion of control barrier corridors, which unifies these two approaches by converting control barrier functions into local safe goal regions for reference goal selection in feedback control systems. We show, with examples on fully actuated systems, kinematic unicycles, and linear output regulation systems, that individual state safety can be extended locally over control barrier corridors for convex barrier functions, provided the control convergence rate matches the barrier decay rate, highlighting a trade-off between safety and reactiveness. Such safe control barrier corridors enable safely reachable persistent goal selection over continuously changing barrier corridors during system motion, which we demonstrate for verifiably safe and persistent path following in autonomous exploration of unknown environments.
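
The safety filtering mentioned above can be illustrated with the standard control barrier function condition, which bounds how fast the barrier value h may decay. The sketch below is not the paper's corridor construction; it assumes a fully actuated system x' = u, a hypothetical barrier h(x) = ||x - center||^2 - radius^2 (safe outside a disc), and a hypothetical decay rate alpha, and minimally corrects a nominal input so that grad h(x) . u >= -alpha * h(x).

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def cbf_filter(x, u_nom, center, radius, alpha):
    """Project u_nom onto the half-space {u : grad_h . u >= -alpha * h(x)}."""
    dx = [xi - ci for xi, ci in zip(x, center)]
    h = dot(dx, dx) - radius ** 2          # barrier value, h >= 0 is safe
    grad = [2.0 * d for d in dx]           # gradient of h at x
    lhs, rhs = dot(grad, u_nom), -alpha * h
    gg = dot(grad, grad)
    if lhs >= rhs or gg == 0.0:
        return list(u_nom)                 # nominal input already admissible
    scale = (rhs - lhs) / gg               # smallest correction along grad
    return [u + scale * g for u, g in zip(u_nom, grad)]

# Hypothetical state heading straight at the obstacle
u_safe = cbf_filter(x=(2.0, 0.0), u_nom=(-10.0, 0.0),
                    center=(0.0, 0.0), radius=1.0, alpha=1.0)
```

A larger alpha permits faster approach to the boundary, which reflects the trade-off between safety and reactiveness noted in the abstract.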

How students use generative AI for computational modeling in physics

Authors:Karl Henrik Fredly, Tor Ole Odden, Benjamin M. Zwickl

Date:2026-03-06 14:48:04

Generative artificial intelligence (genAI) is becoming increasingly prevalent and capable in physics, particularly for programming-related tasks. How, then, does genAI affect students' computational modeling? We interviewed 19 students who had recently completed an open-ended computational assignment that encouraged the use of genAI, asking them how they used it. We then conducted a thematic analysis of these interviews using a framework for computational modeling in physics. We found that genAI significantly impacts several aspects of students' computational modeling, such as the planning, implementing, and debugging of computational models. GenAI can also help students find resources and introduce them to new computational tools. Productive use of genAI was associated with students limiting its use to small steps in the modeling process and consistently double-checking the formulas, explanations, and code it provided. We also identified challenges students faced due to an over-reliance on genAI, such as working from false model assumptions and not learning the fundamentals of computational modeling, especially debugging. Finally, we discuss implications for teaching, such as the need to teach students how to use genAI productively and to urge them to plan before they code. We also highlight the continued value of low-stakes assessment and teaching assistants for teaching computational modeling, as the task remains difficult even with the introduction of genAI.

AI End-to-End Radiation Treatment Planning Under One Second

Authors:Simon Arberet, Riqiang Gao, Martin Kraus, Florin C. Ghesu, Wilko Verbakel, Mamadou Diallo, Anthony Magliari, Venkatesan Karuppusamy, Sushil Beriwal, REQUITE Consortium, Ali Kamen, Dorin Comaniciu

Date:2026-03-06 14:45:44

Artificial intelligence-based radiation therapy (RT) planning has the potential to reduce planning time and inter-planner variability, improving efficiency and consistency in clinical workflows. Most existing automated approaches rely on multiple dose evaluations and corrections, resulting in plan generation times of several minutes. We introduce AIRT (Artificial Intelligence-based Radiotherapy), an end-to-end deep-learning framework that directly infers deliverable treatment plans from CT images and structure contours. AIRT generates single-arc VMAT prostate plans, from imaging and anatomical inputs to leaf sequencing, in under one second on a single Nvidia A100 GPU. The framework includes a differentiable dose feedback, an adversarial fluence map shaping, and a plan generation augmentation to improve plan quality and robustness. The model was trained on more than 10,000 intact prostate cases. Non-inferiority to RapidPlan Eclipse was demonstrated across target coverage and OAR sparing metrics. Target homogeneity (HI = 0.10 $\pm$ 0.01) and OAR sparing were similar to reference plans when evaluated using AcurosXB. These results represent a significant step toward ultra-fast standardized RT planning and a streamlined clinical workflow.

Sparse probabilistic evaluation for treatment planning: a feasibility study in IMPT head & neck patients

Authors:Jenneke I. de Jong, Steven J. M. Habraken, Albin Fredriksson, Johan Sundström, Erik Engwall, Sebastiaan Breedveld, Mischa S. Hoogeman

Date:2026-03-06 14:16:38

Probabilistic evaluation improves the trade-off between target coverage and OAR sparing in IMPT but remains computationally demanding. This study proposes sparse probabilistic evaluation (SPE), a computationally efficient approach integrated into a clinical TPS. Clinical plans of 20 IMPT HNC patients treated in 2024 were included. SPE used a predefined setup and range error grid with Monte Carlo computed dose distributions. Two grid settings were evaluated: the maximum error Emax (3$σ$ or 4$σ$) and the number of setup error points nsetup (7, 33, 123). Accuracy and duration of SPE with each grid were evaluated in the calibration group (5 patients). A total of 1000 treatments with normally distributed random ($σ$ = 1 mm) and systematic ($σ$ = 0.92 mm) setup and range ($σ$ = 1.5%) errors were simulated. The dose distribution of the nearest error point in the grid was assigned to each fraction. Probability distributions derived from SPE were compared with those from a reference based on 35,000 Monte Carlo calculations. The optimal grid identified (Emax = 3$σ$, nsetup = 33) was applied to the validation group (15 patients). Accuracy of SPE in the calibration group increased significantly as the number of error points increased from 7 (tavg = 2 minutes) to 33 (tavg = 9 minutes), with no further improvement between 33 and 123 (tavg = 27 minutes) error points. Increasing Emax only improved accuracy for values above the 98th percentile. Applying SPE to the validation group resulted in median errors of 0.02 Gy RBE (range: -0.11 to 0.07) for the 10th percentile of the D99.8%,CTV distribution and 0.0 Gy RBE (range: -0.14 to 0.23) for the 95th percentile of the D0.03cc,SpinalCord Core distribution. Sparse probabilistic evaluation achieves sufficient accuracy while requiring clinically acceptable computation times, paving the way for probabilistic evaluation in clinical practice.
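
The nearest-error-point assignment described above can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: the 7-point grid layout, the Euclidean distance, and the fraction count are hypothetical, and range errors are omitted for brevity. Only the two setup sigmas come from the abstract.

```python
import random

def nearest_grid_index(error, grid):
    """Index of the grid point closest to `error` (squared Euclidean distance)."""
    return min(
        range(len(grid)),
        key=lambda i: sum((e - g) ** 2 for e, g in zip(error, grid[i])),
    )

def simulate_treatment(grid, n_fractions, sigma_sys, sigma_rand, rng):
    """Return, per fraction, the index of the grid point whose dose is reused."""
    # One systematic setup error per treatment, shared by all fractions.
    systematic = [rng.gauss(0.0, sigma_sys) for _ in range(3)]
    indices = []
    for _ in range(n_fractions):
        # A fresh random setup error is added in every fraction.
        total = [s + rng.gauss(0.0, sigma_rand) for s in systematic]
        indices.append(nearest_grid_index(total, grid))
    return indices

# Hypothetical 7-point setup grid (mm): origin plus +/-3 mm along each axis
grid = [(0, 0, 0), (3, 0, 0), (-3, 0, 0), (0, 3, 0), (0, -3, 0), (0, 0, 3), (0, 0, -3)]
rng = random.Random(0)
fractions = simulate_treatment(grid, n_fractions=35, sigma_sys=0.92,
                               sigma_rand=1.0, rng=rng)
```

Because each fraction reuses a precomputed dose distribution, the cost is dominated by the number of grid points and not by the number of simulated treatments.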

Latin American HECAP Physics Briefing Book 2025

Authors:Mario A. Acero, Alexis A. Aguilar-Arevalo, Belén Andrada, Andrés Baquero Larriva, Mauro Cambiaso, Edgar Carrera, Melissa Cruz, Lucía Duarte, Juan Estrada, Alberto Gago, Esteban Jimenez, Diana López Nacir, José A. López, Marta Losada, Fernando Monticelli, Deywis Moreno, Martjin Mulders, Luis A. Núñez, Arturo S. Pineda, Juan Ponciano, Farinaldo Queiroz, Rogerio Rosenfeld, Sandro F. de Souza, Martin Alfonso Subieta Vasquez, Maria Elena Tejeda-Yeomans, Luis Ureña, Alfonso Zerwekh

Date:2026-03-06 13:49:36

The first process for the Latin American Strategy Forum for Research Infrastructure for High Energy, Cosmology and Astroparticle Physics (LASF4RI-HECAP) came to a conclusion in October 2020, with a Physics Briefing Book (PBB) presented in (2104.06852). Here we present an updated PBB, the result of the first update of LASF4RI-HECAP. The update process began with a call for White Papers from the HECAP community. The submitted contributions were presented at the III LASF4RI for HECAP Symposium: Update of the Strategic Plan, held at ICTP-SAIFR in São Paulo on August 26-29, 2024, with the participation of the Preparatory Group, High Level Strategy Group, Funding Agencies and representatives of similar efforts from around the globe. This updated PBB was written by the Preparatory Group based mainly on 46 White Papers submitted by the community and is organized around seven working groups: Astronomy, Astrophysics and Astroparticle Physics; Cosmology; Dark Matter; Neutrinos; Electroweak and Strong Interactions, Higgs Physics, CP and Flavour Physics and BSM; Instrumentation and Computing; Advanced Training and Capacity Building. It is intended to provide the essential input for the creation of a long-term HECAP strategy in the region.

Optimizing Complex Health Intervention Packages through the Learn-As-you-GO (LAGO) Design

Authors:Donna Spiegelman, Dong Roman Xu, Ante Bing, Guangyu Tong, Mona Abdo, Jingyu Cui, Charles Goss, John Baptist Kiggundu, Chris T. Longenecker, LaRon Nelson, Drew Cameron, Fred Semitala, Xin Zhou, Judith J. Lok

Date:2026-03-06 13:45:18

In the face of vast numbers of preventable deaths worldwide and gaping disparities in their distribution, we cannot afford to conduct null and inconclusive effectiveness and implementation trials of evidence-based interventions. The gold standard in biomedical research, the individually randomized clinical trial, is ill-suited as the primary tool for knowledge generation for contextually relevant, scalable, complex public health interventions of multi-component strategies. In this paper, we discuss the new Learn-As-you-GO (LAGO) design. In LAGO trials, the components of a complex intervention package are repeatedly optimized in pre-planned stages, until the package achieves its outcome and power goals with minimized cost and/or other optimization criteria, such as maximizing patient satisfaction. In this paper, the inputs to, and outputs of, LAGO are described, along with its general methodology. The methods are illustrated in the BetterBirth study, a large trial that aimed to reduce maternal and neonatal mortality in Uttar Pradesh, India, using the WHO essential birth practices checklist. Despite its scale, the BetterBirth study failed to demonstrate a significant effect of the intervention package on the primary health endpoint that included maternal mortality. We show how this unfortunate outcome could have been remedied had LAGO been used. LAGO is further illustrated through the discussion of several ongoing LAGO-informed implementation trials of HIV and non-communicable diseases in the United States and Sub-Saharan Africa. The Learn-As-you-GO (LAGO) design optimizes a complex, multi-level intervention for minimum cost, pre-specified power, and a pre-specified effectiveness goal, by adapting the intervention as the study is conducted, reducing risk of trial failure.

Artificial Intelligence for Climate Adaptation: Reinforcement Learning for Climate Change-Resilient Transport

Authors:Miguel Costa, Arthur Vandervoort, Carolin Schmidt, João Miranda, Morten W. Petersen, Martin Drews, Karyn Morrisey, Francisco C. Pereira

Date:2026-03-06 13:38:06

Climate change is expected to intensify rainfall and, consequently, pluvial flooding, leading to increased disruptions in urban transportation systems over the coming decades. Designing effective adaptation strategies is challenging due to the long-term, sequential nature of infrastructure investments, deep climate uncertainty, and the complex interactions between flooding, infrastructure, and mobility impacts. In this work, we propose a novel decision-support framework using reinforcement learning (RL) for long-term flood adaptation planning. Formulated as an integrated assessment model (IAM), the framework combines rainfall projection and flood modeling, transport simulation, and quantification of direct and indirect impacts on infrastructure and mobility. Our RL-based approach learns adaptive strategies that balance investment and maintenance costs against avoided impacts. We evaluate the framework through a case study of Copenhagen's inner city over the 2024-2100 period, testing multiple adaptation options, and different belief and realized climate scenarios. Results show that the framework outperforms traditional optimization approaches by discovering coordinated spatial and temporal adaptation pathways and learning trade-offs between impact reduction and adaptation investment, yielding more resilient strategies. Overall, our results showcase the potential of reinforcement learning as a flexible decision-support tool for adaptive infrastructure planning under climate uncertainty.

HiPP-Prune: Hierarchical Preference-Conditioned Structured Pruning for Vision-Language Models

Authors:Lincen Bai, Hedi Tabia, Raul Santos-Rodriguez

Date:2026-03-06 13:31:54

Pruning vision-language models (VLMs) for efficient deployment is challenging because compression can affect not only task utility but also visual grounding, often amplifying object hallucinations even at the same sparsity level. We present HiPP-Prune, a hierarchical preference-conditioned structured pruning framework that treats pruning as conditional resource allocation under multiple objectives. HiPP-Prune makes plan-level decisions: a single policy invocation outputs a global pruning blueprint by factorizing decisions into an overall sparsity budget and a layer-wise allocation, enabling queryable trade-offs via a user-specified preference vector. To account for VLM-specific failure modes, our policy state integrates a visual sensitivity signal derived from attention flow between vision tokens and language hidden states, discouraging over-pruning of vision-critical layers that facilitate cross-modal fusion. We optimize pruning plans with plan-level Group Relative Policy Optimization (GRPO) under a multi-objective return that combines task utility, hallucination robustness (POPE), compression, and a synaptic-flow-inspired stability proxy to reduce unproductive exploration in high-sparsity regimes. Experiments on LLaVA with POPE and ScienceQA demonstrate that HiPP-Prune discovers diverse non-dominated pruning plans and provides controllable robustness-utility trade-offs under matched sparsity budgets.
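
The core of GRPO is that each sampled plan is scored relative to the other plans in its group. A minimal sketch follows, assuming a weighted-sum scalarization of the objectives by the preference vector and population standard deviation for normalization; both are assumptions, as the abstract does not specify either, and all numbers are hypothetical.

```python
import statistics

def scalarize(objectives, preference):
    """Combine per-objective scores into one return (weighted sum, assumed)."""
    return sum(w * o for w, o in zip(preference, objectives))

def group_relative_advantages(returns, eps=1e-8):
    """Normalize each return by the mean and spread of its own group."""
    mean = statistics.fmean(returns)
    std = statistics.pstdev(returns)
    return [(r - mean) / (std + eps) for r in returns]

# Hypothetical scores per plan: (utility, robustness, compression, stability)
plans = [(0.70, 0.80, 0.30, 0.90), (0.65, 0.85, 0.50, 0.80), (0.50, 0.60, 0.70, 0.60)]
preference = (0.4, 0.3, 0.2, 0.1)  # hypothetical user preference vector
returns = [scalarize(p, preference) for p in plans]
advantages = group_relative_advantages(returns)
```

Plans with above-average return receive positive advantages and are reinforced; changing the preference vector changes which plans those are.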

TaPD: Temporal-adaptive Progressive Distillation for Observation-Adaptive Trajectory Forecasting in Autonomous Driving

Authors:Mingyu Fan, Yi Liu, Hao Zhou, Deheng Qian, Mohammad Haziq Khan, Matthias Raetsch

Date:2026-03-06 12:51:32

Trajectory prediction is essential for autonomous driving, enabling vehicles to anticipate the motion of surrounding agents to support safe planning. However, most existing predictors assume fixed-length histories and suffer substantial performance degradation when observations are variable or extremely short in real-world settings (e.g., due to occlusion or a limited sensing range). We propose TaPD (Temporal-adaptive Progressive Distillation), a unified plug-and-play framework for observation-adaptive trajectory forecasting under variable history lengths. TaPD comprises two cooperative modules: an Observation-Adaptive Forecaster (OAF) for future prediction and a Temporal Backfilling Module (TBM) for explicit reconstruction of the past. OAF is built on progressive knowledge distillation (PKD), which transfers motion pattern knowledge from long-horizon "teachers" to short-horizon "students" via hierarchical feature regression, enabling short observations to recover richer motion context. We further introduce a cosine-annealed distillation weighting scheme to balance forecasting supervision and feature alignment, improving optimization stability and cross-length consistency. For extremely short histories where implicit alignment is insufficient, TBM backfills missing historical segments conditioned on scene evolution, producing context-rich trajectories that strengthen PKD and thereby improve OAF. We employ a decoupled pretrain-reconstruct-finetune protocol to preserve real-motion priors while adapting to backfilled inputs. Extensive experiments on Argoverse 1 and Argoverse 2 show that TaPD consistently outperforms strong baselines across all observation lengths, delivers especially large gains under very short inputs, and improves other predictors (e.g., HiVT) in a plug-and-play manner. Code will be available at https://github.com/zhouhao94/TaPD.
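
The cosine-annealed distillation weighting can be sketched as below. The functional form, the decreasing direction, the bounds, and the way the two loss terms are summed are all assumptions for illustration; the abstract names the scheme without giving its formula.

```python
import math

def cosine_annealed_weight(step, total_steps, w_max=1.0, w_min=0.0):
    """Weight that decays from w_max to w_min along a half cosine."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return w_min + 0.5 * (w_max - w_min) * (1.0 + math.cos(math.pi * progress))

def total_loss(forecast_loss, distill_loss, step, total_steps):
    """Forecasting supervision plus annealed feature-alignment term."""
    return forecast_loss + cosine_annealed_weight(step, total_steps) * distill_loss

# Hypothetical schedule over 100 training steps
weights = [cosine_annealed_weight(s, 100) for s in (0, 50, 100)]
```

Under this assumed schedule, feature alignment dominates early in training and forecasting supervision dominates toward the end.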

Topological descriptors of foot clearance gait dynamics improve differential diagnosis of Parkinsonism

Authors:Jhonathan Barrios, Wolfram Erlhagen, Miguel F. Gago, Estela Bicho, Flora Ferreira

Date:2026-03-06 12:29:05

Differential diagnosis among parkinsonian syndromes remains a clinical challenge due to overlapping motor symptoms and subtle gait abnormalities. Accurate differentiation is crucial for treatment planning and prognosis. While gait analysis is a well-established approach for assessing motor impairments, conventional methods often overlook hidden nonlinear and structural features embedded in foot clearance patterns. We evaluated Topological Data Analysis (TDA) as a complementary tool for Parkinsonism classification using foot clearance time series. Persistent homology produced Betti curves, persistence landscapes, and silhouettes, which were used as features for a Random Forest classifier. The dataset comprised 15 controls (CO), 15 idiopathic Parkinson's disease (IPD), and 14 vascular Parkinsonism (VaP). Models were assessed with leave-one-out cross-validation (LOOCV). Betti-curve descriptors consistently yielded the strongest results. For IPD vs VaP, foot clearance variables minimum toe clearance, maximum toe late swing, and maximum heel clearance achieved 83% accuracy and AUC=0.89 under LOOCV in the medicated (On) state. Performance improved in the On state and further when both Off and On states were considered, indicating sensitivity of the topological features to levodopa-related gait changes. These findings support integrating TDA with machine learning to improve clinical gait analysis and aid differential diagnosis across parkinsonian disorders.
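
A Betti curve counts how many topological features are alive at each filtration value. The sketch below assumes the persistence intervals are already available (computing them from a time series needs a TDA library, not shown) and uses the half-open convention birth <= t < death; the intervals and thresholds are hypothetical.

```python
def betti_curve(intervals, thresholds):
    """Number of (birth, death) intervals alive at each threshold t."""
    return [
        sum(1 for birth, death in intervals if birth <= t < death)
        for t in thresholds
    ]

# Hypothetical persistence intervals from one foot clearance series
intervals = [(0.0, 1.0), (0.2, 0.5), (0.4, 0.9)]
thresholds = [0.0, 0.3, 0.45, 0.6, 1.0]
curve = betti_curve(intervals, thresholds)  # [1, 2, 3, 2, 0]
```

Sampling the curve on a fixed threshold grid gives a fixed-length vector, which is what makes it usable as classifier input.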

DreamToNav: Generalizable Navigation for Robots via Generative Video Planning

Authors:Valerii Serpiva, Jeffrin Sam, Chidera Simon, Hajira Amjad, Iana Zhura, Artem Lykov, Dzmitry Tsetserukou

Date:2026-03-06 11:57:10

We present DreamToNav, a novel autonomous robot framework that uses generative video models to enable intuitive, human-in-the-loop control. Instead of relying on rigid waypoint navigation, users provide natural language prompts (e.g. "Follow the person carefully"), which the system translates into executable motion. Our pipeline first employs Qwen 2.5-VL-7B-Instruct to refine vague user instructions into precise visual descriptions. These descriptions condition NVIDIA Cosmos 2.5, a state-of-the-art video foundation model, to synthesize a physically consistent video sequence of the robot performing the task. From this synthetic video, we extract a valid kinematic path using visual pose estimation, robot detection and trajectory recovery. By treating video generation as a planning engine, DreamToNav allows robots to visually "dream" complex behaviors before executing them, providing a unified framework for obstacle avoidance and goal-directed navigation without task-specific engineering. We evaluate the approach on both a wheeled mobile robot and a quadruped robot in indoor navigation tasks. DreamToNav achieves a success rate of 76.7%, with final goal errors typically within 0.05-0.10 m and trajectory tracking errors below 0.15 m. These results demonstrate that trajectories extracted from generative video predictions can be reliably executed on physical robots across different locomotion platforms.

Multimodal Behavior Tree Generation: A Small Vision-Language Model for Robot Task Planning

Authors:Cristiano Battistini, Riccardo Andrea Izzo, Gianluca Bardaro, Matteo Matteucci

Date:2026-03-06 09:36:29

Large and small language models have been widely used for robotic task planning. At the same time, vision-language models (VLMs) have successfully tackled problems such as image captioning, scene understanding, and visual question answering. In this work, we combine these two approaches by deploying a compact, open-source multimodal model to generate behavior trees for robotic task planning. The main obstacle to achieving this goal is the lack of an existing dataset that links visual observations and instructions to executable behavior trees. We propose a method to construct such a dataset starting from existing robotic episodes (i.e., Open X-Embodiment), in which a large model serves as a teacher in a multi-stage generation pipeline. We use this dataset to fine-tune VLMs ranging from 500M to 4B parameters via parameter-efficient fine-tuning (PEFT). The generated behavior trees, compatible with the BehaviorTree.CPP library, are evaluated both offline, using structural and lexical metrics, and online through the execution of household tasks in a state-of-the-art embodied simulator. Our results demonstrate that our fine-tuned 4B-parameter VLM approaches the performance of state-of-the-art closed-source models, achieving an 87% success rate while requiring only a fraction of the computational resources.

TADPO: Reinforcement Learning Goes Off-road

Authors:Zhouchonghao Wu, Raymond Song, Vedant Mundheda, Luis E. Navarro-Serment, Christof Schoenborn, Jeff Schneider

Date:2026-03-06 07:55:01

Off-road autonomous driving poses significant challenges such as navigating unmapped, variable terrain with uncertain and diverse dynamics. Addressing these challenges requires effective long-horizon planning and adaptable control. Reinforcement Learning (RL) offers a promising solution by learning control policies directly from interaction. However, because off-road driving is a long-horizon task with low-signal rewards, standard RL methods are challenging to apply in this setting. We introduce TADPO, a novel policy gradient formulation that extends Proximal Policy Optimization (PPO), leveraging off-policy trajectories for teacher guidance and on-policy trajectories for student exploration. Building on this, we develop a vision-based, end-to-end RL system for high-speed off-road driving, capable of navigating extreme slopes and obstacle-rich terrain. We demonstrate our performance in simulation and, importantly, zero-shot sim-to-real transfer on a full-scale off-road vehicle. To our knowledge, this work represents the first deployment of RL-based policies on a full-scale off-road platform.

Moving Through Clutter: Scaling Data Collection and Benchmarking for 3D Scene-Aware Humanoid Locomotion via Virtual Reality

Authors:Beichen Wang, Yuanjie Lu, Linji Wang, Liuchuan Yu, Xuesu Xiao

Date:2026-03-06 07:54:01

Recent advances in humanoid locomotion have enabled dynamic behaviors such as dancing, martial arts, and parkour, yet these capabilities are predominantly demonstrated in open, flat, and obstacle-free settings. In contrast, real-world environments, such as homes, offices, and public spaces, are densely cluttered, three-dimensional, and geometrically constrained, requiring scene-aware whole-body coordination, precise balance control, and reasoning over spatial constraints imposed by furniture and household objects. However, humanoid locomotion in cluttered 3D environments remains underexplored, and no public dataset systematically couples full-body human locomotion with the scene geometry that shapes it. To address this gap, we present Moving Through Clutter (MTC), an open-source Virtual Reality (VR) based data collection and evaluation framework for scene-aware humanoid locomotion in cluttered environments. Our system procedurally generates scenes with controllable clutter levels and captures embodiment-consistent, whole-body human motion through immersive VR navigation, which is then automatically retargeted to a humanoid robot model. We further introduce benchmarks that quantify environment clutter level and locomotion performance, including stability and collision safety. Using this framework, we compile a dataset of 348 trajectories across 145 diverse 3D cluttered scenes. The dataset provides a foundation for studying geometry-induced adaptation in humanoid locomotion and developing scene-aware planning and control methods.

OD-RASE: Ontology-Driven Risk Assessment and Safety Enhancement for Autonomous Driving

Authors:Kota Shimomura, Masaki Nambata, Atsuya Ishikawa, Ryota Mimura, Takayuki Kawabuchi, Takayoshi Yamashita, Koki Inoue

Date:2026-03-06 06:08:40

Although autonomous driving systems demonstrate high perception performance, they still face limitations when handling rare situations or complex road structures. Such road infrastructures are designed for human drivers, and safety improvements are typically introduced only after accidents occur. This reactive approach poses a significant challenge for autonomous systems, which require proactive risk mitigation. To address this issue, we propose OD-RASE, a framework for enhancing the safety of autonomous driving systems by detecting road structures that cause traffic accidents and connecting these findings to infrastructure development. First, we formalize an ontology based on specialized domain knowledge of road traffic systems. In parallel, we generate infrastructure improvement proposals using a large-scale visual language model (LVLM) and use ontology-driven data filtering to enhance their reliability. This process automatically annotates improvement proposals on pre-accident road images, leading to the construction of a new dataset. Furthermore, we introduce a baseline approach (the OD-RASE model), which leverages an LVLM and a diffusion model to produce both infrastructure improvement proposals and generated images of the improved road environment. Our experiments demonstrate that ontology-driven data filtering enables highly accurate prediction of accident-causing road structures and the corresponding improvement plans. We believe that this work contributes to the overall safety of traffic environments and marks an important step toward the broader adoption of autonomous driving systems.

Iterative Convex Optimization with Control Barrier Functions for Obstacle Avoidance among Polytopes

Authors:Shuo Liu, Zhe Huang, Calin A. Belta

Date:2026-03-06 05:10:44

Obstacle avoidance of polytopic obstacles by polytopic robots is a challenging problem in optimization-based control and trajectory planning. Many existing methods rely on smooth geometric approximations, such as hyperspheres or ellipsoids, which allow differentiable distance expressions but distort the true geometry and restrict the feasible set. Other approaches integrate exact polytope distances into nonlinear model predictive control (MPC), resulting in nonconvex programs that limit real-time performance. In this paper, we construct linear discrete-time control barrier function (DCBF) constraints by deriving supporting hyperplanes from exact closest-point computations between convex polytopes. We then propose a novel iterative convex MPC-DCBF framework, where local linearization of system dynamics and robot geometry ensures convexity of the finite-horizon optimization at each iteration. The resulting formulation reduces computational complexity and enables fast online implementation for safety-critical control and trajectory planning of general nonlinear dynamics. The framework extends to multi-robot and three-dimensional environments. Numerical experiments demonstrate collision-free navigation in cluttered maze scenarios with millisecond-level solve times.
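
As a rough illustration of how a supporting hyperplane can yield a linear DCBF constraint, the block below uses the commonly cited discrete-time CBF condition. The notation ($p_{R,k}$, $p_{O,k}$, $n_k$, $v_i$, $\gamma$) is assumed here for illustration and is not taken from the paper, whose exact formulation may differ.

```latex
% Assumed notation: p_{R,k}, p_{O,k} are the closest points on the robot and
% the obstacle at step k; v_i(x) is the i-th robot vertex at state x.
\begin{align}
  n_k &= \frac{p_{R,k} - p_{O,k}}{\lVert p_{R,k} - p_{O,k} \rVert}
      && \text{(unit normal of the supporting hyperplane)} \\
  h_i(x) &= n_k^{\top}\bigl(v_i(x) - p_{O,k}\bigr)
      && \text{(signed distance of vertex } i \text{ to the hyperplane)} \\
  h_i(x_{k+1}) &\ge (1 - \gamma)\, h_i(x_k), \quad 0 < \gamma \le 1
      && \text{(DCBF condition, one per robot vertex)}
\end{align}
```

With $n_k$ and $p_{O,k}$ held fixed within an iteration and $v_i(x)$ linearized in $x$, each $h_i$ is affine in $x_{k+1}$, so every constraint is linear. Because a convex obstacle lies entirely on the non-positive side of this hyperplane, keeping all robot vertices on the non-negative side separates the two polytopes.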

TumorChain: Interleaved Multimodal Chain-of-Thought Reasoning for Traceable Clinical Tumor Analysis

Authors:Sijing Li, Zhongwei Qiu, Jiang Liu, Wenqiao Zhang, Tianwei Lin, Yihan Xie, Jianxiang An, Boxiang Yun, Chenglin Yang, Jun Xiao, Guangyu Guo, Jiawen Yao, Wei Liu, Yuan Gao, Ke Yan, Weiwei Cao, Zhilin Zheng, Tony C. W. Mok, Kai Cao, Yu Shi, Jiuyu Zhang, Jian Zhou, Beng Chin Ooi, Yingda Xia, Ling Zhang

Date:2026-03-06 03:42:22

Accurate tumor analysis is central to clinical radiology and precision oncology, where early detection, reliable lesion characterization, and pathology-level risk assessment guide diagnosis and treatment planning. Chain-of-Thought (CoT) reasoning is particularly important in this setting because it enables step-by-step interpretation from imaging findings to clinical impressions and pathology conclusions, improving traceability and reducing diagnostic errors. Here, we target the clinical tumor analysis task and build a large-scale benchmark that operationalizes a multimodal reasoning pipeline, spanning findings, impressions, and pathology predictions. We curate TumorCoT, a large-scale dataset of 1.5M CoT-labeled VQA instructions paired with 3D CT scans, with step-aligned rationales and cross-modal alignments along the trajectory from findings to impression to pathology, enabling evaluation of both answer accuracy and reasoning consistency. We further propose TumorChain, a multimodal interleaved reasoning framework that tightly couples 3D imaging encoders, clinical text understanding, and organ-level vision-language alignment. Through cross-modal alignment and iterative interleaved causal reasoning, TumorChain grounds visual evidence, aggregates conclusions, and issues pathology predictions after multiple rounds of self-refinement, improving traceability and reducing hallucination risk. Experiments show consistent improvements over strong baselines in lesion detection, impression generation, and pathology classification, and demonstrate strong generalization on the DeepTumorVQA benchmark. These results highlight the potential of multimodal reasoning for reliable and interpretable tumor analysis in clinical practice. Detailed information about our project can be found on our project homepage at https://github.com/ZJU4HealthCare/TumorChain.

CBCT-Based Synthetic CT Generation Using Conditional Flow Matching Model

Authors:Junbo Peng, Huiqiao Xie, Tonghe Wang, Xiangyang Tang, Xiaofeng Yang

Date:2026-03-06 01:04:47

Daily or weekly cone-beam computed tomography (CBCT) is employed in image-guided radiotherapy (IGRT) for precise patient alignment. However, its clinical utility in quantitative tasks is hindered by severe artifacts and inaccurate Hounsfield unit (HU) values. It is essential to enhance CBCT image quality to a level comparable with that of conventional CT scans. This study proposed a conditional flow matching model that gradually transforms a sample from a normal distribution to the corresponding CT sample conditioned on the input CBCT image. The proposed model was trained using CBCT and deformed planning CT (dpCT) image pairs in a supervised learning scheme. The feasibility of the conditional flow matching model was verified using studies of brain, head-and-neck (HN), and lung patients. The quantitative performance was evaluated using three metrics: mean absolute error (MAE), peak signal-to-noise ratio (PSNR), and normalized cross-correlation (NCC). The proposed flow matching model was also compared to other flow matching and diffusion-based generative models for synthetic CT (sCT) generation. The proposed flow matching model effectively reduced multiple types of artifacts on CBCT images in all the studies. In the brain patient study, the MAE, PSNR, and NCC of the sCT were improved to 26.02 HU, 32.35 dB, and 0.99, respectively, from 40.63 HU, 27.87 dB, and 0.98 on the CBCT images. In the HN patient study, the metrics were improved to 33.17 HU, 28.68 dB, and 0.98 from 38.99 HU, 27.00 dB, and 0.98. In the lung patient study, the metrics were 25.09 HU, 32.81 dB, and 0.99 for sCT and 32.90 HU, 30.48 dB, and 0.98 for CBCT. The proposed conditional flow matching model effectively synthesizes high-quality CT-like images from CBCT, achieving accurate HU representation and artifact reduction. This enables more reliable organ segmentation and dose calculation in CBCT-guided online ART workflows.
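
For readers unfamiliar with the three metrics, the sketch below computes them with their textbook definitions on flattened voxel lists. The function names and the `data_range` parameter are assumptions made for illustration; the abstract does not state the dynamic range used for PSNR or any masking applied before evaluation.

```python
import math

def mae(pred, ref):
    """Mean absolute error between two equal-length voxel lists (in HU)."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)

def psnr(pred, ref, data_range):
    """Peak signal-to-noise ratio in dB; data_range is an assumed peak value."""
    mse = sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)

def ncc(pred, ref):
    """Normalized cross-correlation: covariance over product of std devs."""
    n = len(ref)
    mean_p = sum(pred) / n
    mean_r = sum(ref) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    norm_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    norm_r = math.sqrt(sum((r - mean_r) ** 2 for r in ref))
    return cov / (norm_p * norm_r)

# Toy example with made-up values (not data from the study).
reference = [0.0, 100.0, 200.0, 300.0]
synthetic = [10.0, 90.0, 210.0, 290.0]
print(mae(synthetic, reference))
print(psnr(synthetic, reference, data_range=1000.0))
print(ncc(synthetic, reference))
```

Lower MAE and higher PSNR and NCC indicate closer agreement with the reference CT, which matches the direction of improvement reported above.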

Multi-Robot Trajectory Planning via Constrained Bayesian Optimization and Local Cost Map Learning with STL-Based Conflict Resolution

Authors:Sourav Raxit, Abdullah Al Redwan Newaz, Jose Fuentes, Paulo Padrao, Ana Cavalcanti, Leonardo Bobadilla

Date:2026-03-06 00:03:18

We address multi-robot motion planning under Signal Temporal Logic (STL) specifications with kinodynamic constraints. Exact approaches face scalability bottlenecks and limited adaptability, while conventional sampling-based methods require excessive samples to construct optimal trajectories. We propose a two-stage framework integrating sampling-based online learning with formal STL reasoning. At the single-robot level, our constrained Bayesian Optimization-based Tree search (cBOT) planner uses a Gaussian process as a surrogate model to learn local cost maps and feasibility constraints, generating shorter collision-free trajectories with fewer samples. At the multi-robot level, our STL-enhanced Kinodynamic Conflict-Based Search (STL-KCBS) algorithm incorporates STL monitoring into conflict detection and resolution, ensuring specification satisfaction while maintaining scalability and probabilistic completeness. Benchmarking demonstrates improved trajectory efficiency and safety over existing methods. Real-world experiments with autonomous surface vehicles validate robustness and practical applicability in uncertain environments. The STLcBOT Planner will be released as an open-source package, and videos of real-world and simulated experiments are available at https://stlbot.github.io/.

Environment-Aware Path Generation for Robotic Additive Manufacturing of Structures

Authors:Mahsa Rabiei, Reza Moini

Date:2026-03-05 23:20:15

Robotic Additive Manufacturing (AM) has emerged as a scalable and customizable construction method in the last decade. However, current AM design methods rely on a preconceived (a priori) toolpath of the structure, often developed via offline slicing software. Moreover, given dynamic construction environments involving obstacles in terrestrial and extraterrestrial settings, there is a need for online path generation methods. Here, an environment-aware path generation framework (PGF) is proposed for the first time, in which structures are designed in an online fashion by utilizing four path planning (PP) algorithms (two search-based and two sampling-based). To evaluate the performance of the proposed PGF in different obstacle arrangements (periodic, random) for two types of structures (closed and open), structural (path roughness, turns, offset, Root Mean Square Error (RMSE), deviation) and computational (run time) performance metrics are developed. The most challenging environments (i.e., dense, with a high number of obstacles) are considered to saturate the feasibility limits of the PP algorithms. The capability of each of the four path planners used in the PGF to find a feasible path is assessed. Finally, the effectiveness of the proposed structural performance metrics is evaluated individually and comparatively, and the most essential metrics for evaluating the toolpath of the resulting structures are prescribed. Consequently, the most promising path planners in challenging environments are identified for robotic additive manufacturing applications.

Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model

Authors:Dongwon Kim, Gawon Seo, Jinsung Lee, Minsu Cho, Suha Kwak

Date:2026-03-05 18:00:02

World models provide a powerful framework for simulating environment dynamics conditioned on actions or instructions, enabling downstream tasks such as action planning or policy learning. Recent approaches leverage world models as learned simulators, but their application to decision-time planning remains computationally prohibitive for real-time control. A key bottleneck lies in latent representations: conventional tokenizers encode each observation into hundreds of tokens, making planning both slow and resource-intensive. To address this, we propose CompACT, a discrete tokenizer that compresses each observation into as few as 8 tokens, drastically reducing computational cost while preserving essential information for planning. An action-conditioned world model that uses the CompACT tokenizer achieves competitive planning performance with orders-of-magnitude faster planning, offering a practical step toward real-world deployment of world models.

A Novel Hybrid Heuristic-Reinforcement Learning Optimization Approach for a Class of Railcar Shunting Problems

Authors:Ruonan Zhao, Joseph Geunes

Date:2026-03-05 17:49:17

Railcar shunting is a core planning task in freight railyards, where yard planners need to disassemble and reassemble groups of railcars to form outbound trains. Classification tracks with access from one side only can be considered as stack structures, where railcars are added and removed from only one end, leading to a last-in-first-out (LIFO) retrieval order. In contrast, two-sided tracks function like queue structures, allowing railcars to be added from one end and removed from the opposite end, following a first-in-first-out (FIFO) order. We consider a problem requiring assembly of multiple outbound trains using two locomotives in a railyard with two-sided classification track access. To address this combinatorially challenging problem class, we decompose the problem into two subproblems, each with one-sided classification track access and a locomotive on each side. We present a novel Hybrid Heuristic-Reinforcement Learning (HHRL) framework that integrates railway-specific heuristic solution approaches with a reinforcement learning method, specifically Q-learning. The proposed framework leverages methods to decrease the state-action space and guide exploration during reinforcement learning. The results of a series of numerical experiments demonstrate the efficiency and quality of the HHRL algorithm in both one-sided access, single-locomotive problems and two-sided access, two-locomotive problems.
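
The stack and queue analogy for classification tracks can be made concrete with a small sketch. The class and function names below are hypothetical and chosen for illustration only; they are not part of the HHRL framework described in the abstract.

```python
from collections import deque

class OneSidedTrack:
    """Track accessible from one end only: behaves like a stack (LIFO)."""
    def __init__(self):
        self._cars = []

    def push(self, car):
        self._cars.append(car)      # car enters at the open end

    def pull(self):
        return self._cars.pop()     # last car in is the first car out

class TwoSidedTrack:
    """Track accessible from both ends: behaves like a queue (FIFO)."""
    def __init__(self):
        self._cars = deque()

    def push(self, car):
        self._cars.append(car)      # car enters at one end

    def pull(self):
        return self._cars.popleft() # first car in leaves at the opposite end

def retrieval_order(track, cars):
    """Push all cars onto the track, then pull them all off again."""
    for car in cars:
        track.push(car)
    return [track.pull() for _ in cars]

print(retrieval_order(OneSidedTrack(), ["A", "B", "C"]))  # ['C', 'B', 'A']
print(retrieval_order(TwoSidedTrack(), ["A", "B", "C"]))  # ['A', 'B', 'C']
```

The reversed order on the one-sided track is what makes sorting railcars there combinatorially harder: reaching a car buried behind others requires moving every car in front of it first.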

The Spatial and Temporal Resolution of Motor Intention in Multi-Target Prediction

Authors:Marie Dominique Schmidt, Ioannis Iossifidis

Date:2026-03-05 17:40:30

Reaching for, grasping, and manipulating objects are essential motor functions in everyday life. Decoding human motor intentions is a central challenge for rehabilitation and assistive technologies. This study focuses on predicting intentions by inferring movement direction and target location from multichannel electromyography (EMG) signals, and on investigating how accurately, in space and time, such information can be detected relative to movement onset. We present a computational pipeline that combines data-driven temporal segmentation with classical and deep learning classifiers in order to analyse EMG data recorded during the planning, early execution, and target contact phases of a delayed reaching task. Early intention prediction enables devices to anticipate user actions, improving responsiveness and supporting active motor recovery in adaptive rehabilitation systems. Random Forest achieves $80\%$ accuracy and Convolutional Neural Network $75\%$ accuracy across $25$ spatial targets, each separated by $14^\circ$ azimuth/altitude. Furthermore, a systematic evaluation of EMG channels, feature sets, and temporal windows demonstrates that motor intention can be efficiently decoded even with drastically reduced data. This work sheds light on the temporal and spatial evolution of motor intention, paving the way for anticipatory control in adaptive rehabilitation systems and driving advancements in computational approaches to motor neuroscience.

Building AI Coding Agents for the Terminal: Scaffolding, Harness, Context Engineering, and Lessons Learned

Authors:Nghi D. Q. Bui

Date:2026-03-05 16:21:08

The landscape of AI coding assistance is undergoing a fundamental shift from complex IDE plugins to versatile, terminal-native agents. Operating directly where developers manage source control, execute builds, and deploy environments, CLI-based agents offer unprecedented autonomy for long-horizon development tasks. In this paper, we present OPENDEV, an open-source, command-line coding agent engineered specifically for this new paradigm. Effective autonomous assistance requires strict safety controls and highly efficient context management to prevent context bloat and reasoning degradation. OPENDEV overcomes these challenges through a compound AI system architecture with workload-specialized model routing, a dual-agent architecture separating planning from execution, lazy tool discovery, and adaptive context compaction that progressively reduces older observations. Furthermore, it employs an automated memory system to accumulate project-specific knowledge across sessions and counteracts instruction fade-out through event-driven system reminders. By enforcing explicit reasoning phases and prioritizing context efficiency, OPENDEV provides a secure, extensible foundation for terminal-first AI assistance, offering a blueprint for robust autonomous software engineering.

CT-Enabled Patient-Specific Simulation and Contact-Aware Robotic Planning for Cochlear Implantation

Authors:Lingxiao Xun, Gang Zheng, Alexandre Kruszewski, Renato Torres

Date:2026-03-05 16:13:59

Robotic cochlear-implant (CI) insertion requires precise prediction and regulation of contact forces to minimize intracochlear trauma and prevent failure modes such as locking and buckling. Aligned with the integration of advanced medical imaging and robotics for autonomous, precision interventions, this paper presents a unified CT-to-simulation pipeline for contact-aware insertion planning and validation. We develop a low-dimensional, differentiable Cosserat-rod model of the electrode array coupled with frictional contact and pseudo-dynamics regularization to ensure continuous stick-slip transitions. Patient-specific cochlear anatomy is reconstructed from CT imaging and encoded via an analytic parametrization of the scala-tympani lumen, enabling efficient and differentiable contact queries through closest-point projection. Based on a differentiated equilibrium-constraint formulation, we derive an online direction-update law under an RCM-like constraint that suppresses lateral insertion forces while maintaining axial advancement. Simulations and benchtop experiments validate deformation and force trends, demonstrating reduced locking/buckling risk and improved insertion depth. The study highlights how CT-based imaging enhances modeling, planning, and safety capabilities in robot-assisted inner-ear procedures.