planning - 2026-04-15

Toward Autonomous Long-Horizon Engineering for ML Research

Authors:Guoxin Chen, Jie Chen, Lei Chen, Jiale Zhao, Fanzhe Meng, Wayne Xin Zhao, Ruihua Song, Cheng Chen, Ji-Rong Wen, Kai Jia
Date:2026-04-14 17:55:16

Autonomous AI research has advanced rapidly, but long-horizon ML research engineering remains difficult: agents must sustain coherent progress across task comprehension, environment setup, implementation, experimentation, and debugging over hours or days. We introduce AiScientist, a system for autonomous long-horizon engineering for ML research built on a simple principle: strong long-horizon performance requires both structured orchestration and durable state continuity. To this end, AiScientist combines hierarchical orchestration with a permission-scoped File-as-Bus workspace: a top-level Orchestrator maintains stage-level control through concise summaries and a workspace map, while specialized agents repeatedly re-ground on durable artifacts such as analyses, plans, code, and experimental evidence rather than relying primarily on conversational handoffs, yielding thin control over thick state. Across two complementary benchmarks, AiScientist improves PaperBench score by 10.54 points on average over the best matched baseline and achieves 81.82 Any Medal% on MLE-Bench Lite. Ablation studies further show that the File-as-Bus protocol is a key driver of performance: removing it reduces PaperBench by 6.41 points and MLE-Bench Lite by 31.82 points. These results suggest that long-horizon ML research engineering is a systems problem of coordinating specialized work over durable project state, rather than a purely local reasoning problem.

Closed-Form Characterization of Constrained Double-Integrator Optimal Control

Authors:Filippos N. Tzortzoglou, Logan E. Beaver, Andreas A. Malikopoulos
Date:2026-04-14 17:40:02

We present a framework for predicting human driving behavior in mixed traffic where connected and automated vehicles (CAVs) coexist with human-driven vehicles (HDVs), and validate it using an open-source virtual reality (VR) testbed. We estimate the time-shift parameter of Newell's car-following model for individual drivers using Bayesian linear regression and derive analytical expressions for the mean and variance of predicted trajectories. These predictions are integrated into an optimal control framework for CAV trajectory planning. To address the scarcity of mixed-traffic data, we develop a VR platform supporting realistic, multi-user driving scenarios and provide a reproducible experimental framework with a dedicated tutorial website requiring only MATLAB and Unreal Engine. Results show our approach enables efficient HDV predictions, while the VR platform offers an accessible environment for studying human behavior in mixed traffic.

Distributional Convergence of Empirical Entropic Optimal Transport and Statistical Applications

Authors:Santiago Arenas-Velilla, Axel Munk, Luis-Alberto Rodríguez
Date:2026-04-14 16:34:33

Recently, the statistical properties of empirical Entropic Optimal Transport (EOT) have attracted great interest, as this quantity has been shown to be useful for complex data analysis, among other reasons due to its computational efficiency. In several applications, it has been observed that the EOT plan provides valuable information beyond just the optimal value. For example, in cell biology, colocalization analysis based on the EOT plan has been introduced as a measure for quantification of spatial proximity of different protein assemblies. Despite recent progress in the analysis of its risk properties, a precise understanding of its statistical fluctuations, which is needed to make it accessible for inference, remains largely elusive. In this paper, we derive an asymptotic weak convergence result for a large class of functionals of the EOT plan, in which the colocalization process is included. The proof is based on Hadamard differentiability and the extended delta method. As an application, we obtain uniform confidence bands for colocalization curves and bootstrap consistency. Our theory is supported by simulation studies and is illustrated by real-world data analysis from mitochondrial protein colocalization.
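
For readers unfamiliar with the EOT plan mentioned above, the following is a minimal sketch of how an empirical plan between two samples can be computed with Sinkhorn iterations. The squared-Euclidean cost, the regularization value `eps`, the uniform sample weights, and the helper name `sinkhorn_plan` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def sinkhorn_plan(x, y, eps=0.5, n_iter=500):
    """Entropic OT plan between uniform empirical measures on x and y.

    Hypothetical helper: cost, eps, and iteration count are assumptions.
    """
    n, m = len(x), len(y)
    a = np.full(n, 1.0 / n)  # uniform weights on the first sample
    b = np.full(m, 1.0 / m)  # uniform weights on the second sample
    # Pairwise squared Euclidean cost matrix of shape (n, m).
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-cost / eps)  # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the marginals.
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
x = rng.uniform(size=(5, 2))
y = rng.uniform(size=(6, 2))
plan = sinkhorn_plan(x, y)
```

The returned matrix is a coupling whose row and column sums approximate the two sets of sample weights; functionals of this matrix (such as the colocalization curves mentioned in the abstract) are what the distributional limits describe.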

QuarkMedSearch: A Long-Horizon Deep Search Agent for Exploring Medical Intelligence

Authors:Zhichao Lin, Zhichao Liang, Gaoqiang Liu, Meng Xu, Baoyu Xiang, Jian Xu, Guanjun Jiang
Date:2026-04-14 15:17:21

As agentic foundation models continue to evolve, how to further improve their performance in vertical domains has become an important challenge. To this end, building upon Tongyi DeepResearch, a powerful agentic foundation model, we focus on the Chinese medical deep search scenario and propose QuarkMedSearch, systematically exploring a full-pipeline approach spanning medical multi-hop data construction, training strategies, and evaluation benchmarks to further push and assess its performance upper bound in vertical domains. Specifically, for data synthesis, to address the scarcity of deep search training data in the medical domain, we combine a large-scale medical knowledge graph with real-time online exploration to construct long-horizon medical deep search training data; for post-training, we adopt a two-stage SFT and RL training strategy that progressively enhances the model's planning, tool invocation, and reflection capabilities required for deep search, while maintaining search efficiency; for evaluation, we collaborate with medical experts to construct the QuarkMedSearch Benchmark through rigorous manual verification. Experimental results demonstrate that QuarkMedSearch achieves state-of-the-art performance among open-source models of comparable scale on the QuarkMedSearch Benchmark, while also maintaining strong competitiveness on general benchmarks.

VULCAN: Vision-Language-Model Enhanced Multi-Agent Cooperative Navigation for Indoor Fire-Disaster Response

Authors:Shengding Liu, Qiben Yan
Date:2026-04-14 14:50:43

Indoor fire disasters pose severe challenges to autonomous search and rescue due to dense smoke, high temperatures, and dynamically evolving indoor environments. In such time-critical scenarios, multi-agent cooperative navigation is particularly useful, as it enables faster and broader exploration than single-agent approaches. However, existing multi-agent navigation systems are primarily vision-based and designed for benign indoor settings, leading to significant performance degradation under fire-driven dynamic conditions. In this paper, we present VULCAN, a multi-agent cooperative navigation framework based on multi-modal perception and vision-language models (VLMs), tailored for indoor fire disaster response. We extend the Habitat-Matterport3D benchmark by simulating physically realistic fire scenarios, including smoke diffusion, thermal hazards, and sensor degradation. We evaluate representative multi-agent cooperative navigation baselines under both normal and fire-driven environments. Our results reveal critical failure modes of existing methods in fire scenarios and underscore the necessity of robust perception and hazard-aware planning for reliable multi-agent search and rescue.

A hierarchical spatial-aware algorithm with efficient reinforcement learning for human-robot task planning and allocation in production

Authors:Jintao Xue, Xiao Li, Nianmin Zhang
Date:2026-04-14 12:40:05

In advanced manufacturing systems, humans and robots collaborate to conduct the production process. Effective task planning and allocation (TPA) is crucial for achieving high production efficiency, yet it remains challenging in complex and dynamic manufacturing environments. The dynamic nature of humans and robots, particularly the need to consider spatial information (e.g., humans' real-time position and the distance they need to move to complete a task), substantially complicates TPA. To address the above challenges, we decompose production tasks into manageable subtasks. We then implement a real-time hierarchical human-robot TPA algorithm, including a high-level agent for task planning and a low-level agent for task allocation. For the high-level agent, we propose an efficient buffer-based deep Q-learning method (EBQ), which reduces training time and enhances performance in production problems with long-term and sparse reward challenges. For the low-level agent, a path planning-based spatially aware method (SAP) is designed to allocate tasks to the appropriate human-robot resources, thereby achieving the corresponding sequential subtasks. We conducted experiments on a complex real-time production process in a 3D simulator. The results demonstrate that our proposed EBQ&SAP method effectively addresses human-robot TPA problems in complex and dynamic production processes.

Safe reinforcement learning with online filtering for fatigue-predictive human-robot task planning and allocation in production

Authors:Jintao Xue, Xiao Li, Nianmin Zhang
Date:2026-04-14 12:38:21

Human-robot collaborative manufacturing, a core aspect of Industry 5.0, emphasizes ergonomics to enhance worker well-being. This paper addresses the dynamic human-robot task planning and allocation (HRTPA) problem, which involves determining when to perform tasks and who should execute them to maximize efficiency while ensuring workers' physical fatigue remains within safe limits. The inclusion of fatigue constraints, combined with production dynamics, significantly increases the complexity of the HRTPA problem. Traditional fatigue-recovery models in HRTPA often rely on static, predefined hyperparameters. However, in practice, human fatigue sensitivity varies daily due to factors such as changed work conditions and insufficient sleep. To better capture this uncertainty, we treat fatigue-related parameters as inaccurate and estimate them online based on observed fatigue progression during production. To address these challenges, we propose PF-CD3Q, a safe reinforcement learning (safe RL) approach that integrates the particle filter with constrained dueling double deep Q-learning for real-time fatigue-predictive HRTPA. Specifically, we first develop PF-based estimators to track human fatigue and update fatigue model parameters in real-time. These estimators are then integrated into CD3Q by making task-level fatigue predictions during decision-making and excluding tasks that exceed fatigue limits, thereby constraining the action space and formulating the problem as a constrained Markov decision process (CMDP).

FeaXDrive: Feasibility-aware Trajectory-Centric Diffusion Planning for End-to-End Autonomous Driving

Authors:Baoyun Wang, Zhuoren Li, Ming Liu, Xinrui Zhang, Bo Leng, Lu Xiong
Date:2026-04-14 12:24:54

End-to-end diffusion planning has shown strong potential for autonomous driving, but the physical feasibility of generated trajectories remains insufficiently addressed. In particular, generated trajectories may exhibit local geometric irregularities, violate trajectory-level kinematic constraints, or deviate from the drivable area, indicating that the commonly used noise-centric formulation in diffusion planning is not yet well aligned with the trajectory space where feasibility is more naturally characterized. To address this issue, we propose FeaXDrive, a feasibility-aware trajectory-centric diffusion planning method for end-to-end autonomous driving. The core idea is to treat the clean trajectory as the unified object for feasibility-aware modeling throughout the diffusion process. Built on this trajectory-centric formulation, FeaXDrive integrates adaptive curvature-constrained training to improve intrinsic geometric and kinematic feasibility, drivable-area guidance within reverse diffusion sampling to enhance consistency with the drivable area, and feasibility-aware GRPO post-training to further improve planning performance while balancing trajectory-space feasibility. Experiments on the NAVSIM benchmark show that FeaXDrive achieves strong closed-loop planning performance while substantially improving trajectory-space feasibility. These findings highlight the importance of explicitly modeling trajectory-space feasibility in end-to-end diffusion planning and provide a step toward more reliable and physically grounded autonomous driving planners.

Pricing-Driven Resource Allocation in the Computing Continuum

Authors:Alejandro García-Fernández, Boris Sedlak, José Antonio Parejo, Pantelis Frangoudis, Antonio Ruiz-Cortés, Schahram Dustdar
Date:2026-04-14 12:12:14

Deploying applications across the computing continuum requires selecting infrastructure nodes from geographically distributed and heterogeneous environments while satisfying constraints (e.g., performance, location). This decision problem is an important facet of resource allocation. As infrastructures grow in scale and heterogeneity, the resulting decision space becomes inherently combinatorial. Existing approaches typically formulate this problem as a constrained optimization task using ad-hoc representations of infrastructure topologies and demand, which hinders generalization across solutions. In contrast, Software as a Service ecosystems address a structurally similar configuration problem through pricings: structures whose plans and add-ons implicitly define the configuration space of possible subscriptions. Building on this observation, this work explores the potential of pricings as general-purpose representations of configuration spaces, positioning them as a promising alternative for addressing configuration problems, such as resource allocation, across the computing continuum. To this end, the paper presents the following contributions: i) a pricing-based formulation of the resource allocation problem in the computing continuum, enabling infrastructure configuration spaces to be represented using pricings; ii) a workflow that leverages PRIME, a pricing analysis engine, to explore these spaces and compute cost-optimal deployments satisfying functional and non-functional constraints; iii) generation processes for synthetic infrastructure topologies and workload demands; and iv) a dataset comprising 9,600 precomputed resource allocation scenarios to support benchmarking.

A Comparison of Reinforcement Learning and Optimal Control Methods for Path Planning

Authors:Qiang Le, Yaguang Yang, Isaac E. Weintraub
Date:2026-04-14 11:55:15

Path planning for autonomous vehicles in threat-laden environments is a fundamental challenge. While traditional optimal control methods can find ideal paths, their computation is often too slow for real-time decision-making. To address this challenge, we propose a method based on Deep Deterministic Policy Gradient (DDPG) and model the threat as a simple, circular "no-go" zone. A mission failure is declared if the vehicle enters this "no-go" zone at any time or does not reach a neighborhood of the destination. The DDPG agent is trained to learn a direct mapping from its current state (position and velocity) to a series of feasible actions that guide the agent to safely reach its goal. A reward function and two neural networks, critic and actor, are used to describe the environment and guide the control efforts. The DDPG trains the agent to find the largest possible set of starting points ("feasible set") wherein a safe path to the goal is guaranteed. This provides critical information for mission planning, showing beforehand whether a task is achievable from a given starting point, assisting pre-mission planning activities. The approach is validated in simulation. A comparison between the DDPG method and a traditional optimal control (pseudo-spectral) method is carried out. The results show that the learning-based agent may produce effective paths while being significantly faster, making it a better fit for real-time applications. However, there are areas ("infeasible set") where the DDPG agent cannot find paths to the destination, and the paths in the feasible set may not be optimal. These preliminary results guide our future research: (1) improve the reward function to enlarge the DDPG feasible set, (2) examine the feasible set obtained by the pseudo-spectral method, and (3) investigate the arc-search IPM method for the path planning problem.
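
The mission-failure criterion described above (entering the circular no-go zone, or never reaching a neighborhood of the destination) can be sketched as a simple check on a sampled path. The function name `mission_failed`, the parameter names, and the choice to test only the sampled waypoints (not the segments between them) are assumptions made for illustration; the paper's exact formulation may differ.

```python
import numpy as np

def mission_failed(path, threat_center, threat_radius, goal, goal_tol):
    """Return True if a sampled 2D path fails the mission (hypothetical helper).

    Failure means: some waypoint lies strictly inside the circular no-go zone,
    or no waypoint comes within goal_tol of the destination.
    Only waypoints are checked, so a coarse path could cut through the zone
    between samples without being detected.
    """
    path = np.asarray(path, dtype=float)
    dist_to_threat = np.linalg.norm(path - np.asarray(threat_center, dtype=float), axis=1)
    dist_to_goal = np.linalg.norm(path - np.asarray(goal, dtype=float), axis=1)
    entered_zone = bool(np.any(dist_to_threat < threat_radius))
    reached_goal = bool(np.any(dist_to_goal <= goal_tol))
    return entered_zone or not reached_goal

threat, radius, goal, tol = (0.0, 0.0), 1.0, (2.0, 0.0), 0.1
detour = [(-2, 0), (-2, 2), (2, 2), (2, 0)]   # goes around the zone
straight = [(-2, 0), (0, 0), (2, 0)]          # passes through the zone center
stalled = [(-2, 0), (-2, 2)]                  # never reaches the goal

detour_failed = mission_failed(detour, threat, radius, goal, tol)
straight_failed = mission_failed(straight, threat, radius, goal, tol)
stalled_failed = mission_failed(stalled, threat, radius, goal, tol)
```

A check of this kind could serve as the terminal condition in a reward function or as the membership test when mapping out which starting points belong to the feasible set.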

Scalable Trajectory Generation for Whole-Body Mobile Manipulation

Authors:Yida Niu, Xinhai Chang, Xin Liu, Ziyuan Jiao, Yixin Zhu
Date:2026-04-14 10:47:06

Robots deployed in unstructured environments must coordinate whole-body motion -- simultaneously moving a mobile base and arm -- to interact with the physical world. This coupled mobility and dexterity yields a state space that grows combinatorially with scene and object diversity, demanding datasets far larger than those sufficient for fixed-base manipulation. Yet existing acquisition methods, including teleoperation and planning, are either labor-intensive or computationally prohibitive at scale. The core bottleneck is the lack of a scalable pipeline for generating large-scale, physically valid, coordinated trajectory data across diverse embodiments and environments. Here we introduce AutoMoMa, a GPU-accelerated framework that unifies AKR modeling, which consolidates base, arm, and object kinematics into a single chain, with parallelized trajectory optimization. AutoMoMa achieves 5,000 episodes per GPU-hour (over $80\times$ faster than CPU-based baselines), producing a dataset of over 500k physically valid trajectories spanning 330 scenes, diverse articulated objects, and multiple robot embodiments. Prior datasets were forced to compromise on scale, diversity, or kinematic fidelity; AutoMoMa addresses all three simultaneously. Training downstream IL policies further reveals that even a single articulated-object task requires tens of thousands of demonstrations for SOTA methods to reach $\approx 80\%$ success, confirming that data scarcity -- not algorithmic limitations -- has been the binding constraint. AutoMoMa thus bridges high-performance planning and reliable IL-based control, providing the infrastructure previously missing for coordinated mobile manipulation research. By making large-scale, kinematically valid training data practical, AutoMoMa showcases generalizable whole-body robot policies capable of operating in the diverse, unstructured settings of the real world.

Two Sequence-Form Interior-Point Differentiable Path-Following Method to Compute Nash Equilibria

Authors:Yuqing Hou
Date:2026-04-14 10:41:32

Nash equilibrium is a fundamental solution concept in extensive-form games, while its efficient computation is still far from straightforward. This paper considers finite $n$-player extensive-form games with perfect recall under the sequence-form representation. Unlike existing approaches, which mainly treat the sequence form as a compact computational reformulation, we develop a direct sequence-form definition of Nash equilibrium. Building on this, we rigorously establish the associated sequence-form Nash equilibrium system through an equivalence proof with mixed-strategy Nash equilibrium. On this basis, we propose a single-stage interior-point differentiable path-following method for equilibrium computation. The method uses logarithmic-barrier regularization to generate a differentiable equilibrium path in the interior of the realization-plan space, leading to favorable numerical stability and convergence properties. Numerical results show that the proposed method is effective and computationally efficient.

A Heterogeneous Dual-Network Framework for Emergency Delivery UAVs: Communication Assurance and Path Planning Coordination

Authors:Ping Huang, Bin Duo, Ziedor Godfred, Liuwei Huo, Jin Ning, Xiaojun Yuan, Jun Li
Date:2026-04-14 09:26:53

Natural disasters often damage ground infrastructure, making unmanned aerial vehicles (UAVs) essential for emergency supply delivery. Yet safe operation in complex post-disaster environments requires reliable command-and-control (C2) links; link instability can cause loss of control, delay rescue, and trigger severe secondary harm. To provide continuous three-dimensional (3D) C2 coverage during dynamic missions, we propose a Heterogeneous Dual-Network Framework (HDNF) for safe and reliable emergency delivery. HDNF tightly couples an Emergency Communication Support Network (ECSN), formed by hovering UAV base stations, with a Delivery Path Network (DPN), formed by fast-moving delivery UAVs. The ECSN dynamically safeguards mission-critical flight corridors, while the DPN aligns trajectories with reliable coverage regions. We formulate a joint optimization problem over task assignment, 3D UAV-BS deployment, and DPN path planning to maximize end-to-end C2 reliability while minimizing UAV flight energy consumption and base-station deployment cost. To solve this computationally intractable NP-hard problem, we develop a layered strategy with three components: (i) a multi-layer C2 service model that overcomes 2D-metric limitations and aligns UAV-BS deployment with mission-critical 3D phases; (ii) a 3D coverage-aware multi-agent reinforcement learning algorithm that addresses the high-dimensional search space and improves both training efficiency and topology resilience; and (iii) a 3D communication-aware A* planner that jointly optimizes C2 quality and flight energy, mitigating trajectory--coverage mismatch and improving routing safety. Extensive simulations show that HDNF markedly improves C2 reliability, eliminates outages in critical phases, and sustains high task success rates while reducing hardware deployment cost.

DeCoNav: Dialog enhanced Long-Horizon Collaborative Vision-Language Navigation

Authors:Sunyao Zhou, Yunzi Wu, Tianhang Wang, Xinhai Li, Guang Chen, Lizheng Liu, Chenjia Bai, Xuelong Li
Date:2026-04-14 09:11:55

Long-horizon collaborative vision-language navigation (VLN) is critical for multi-robot systems to accomplish complex tasks beyond the capability of a single agent. CoNavBench takes a first step by introducing the first collaborative long-horizon VLN benchmark with relay-style multi-robot tasks, a collaboration taxonomy, along with graph-grounded generation and evaluation to model handoffs and rendezvous in shared environments. However, existing benchmarks and evaluations often do not enforce strictly synchronized dual-robot rollout on a shared world timeline, and they typically rely on static coordination policies that cannot adapt when new cross-agent evidence emerges. We present Dialog enhanced Long-Horizon Collaborative Vision-Language Navigation (DeCoNav), a decentralized framework that couples event-triggered dialogue with dynamic task allocation and replanning for real-time, adaptive coordination. In DeCoNav, robots exchange compact semantic states via dialogue without a central controller. When informative events such as new evidence, uncertainty, or conflicts arise, dialogue is triggered to dynamically reassign subgoals and replan under synchronized execution. Implemented in DeCoNavBench with 1,213 tasks across 176 HM3D scenes, DeCoNav improves the both-success rate (BSR) by 69.2%, demonstrating the effectiveness of dialogue-driven, dynamically reallocated planning for multi-robot collaboration.

From Kinematics to Dynamics: Learning to Refine Hybrid Plans for Physically Feasible Execution

Authors:Lidor Erez, Shahaf S. Shperberg, Ayal Taitler
Date:2026-04-14 09:00:08

In many robotic tasks, agents must traverse a sequence of spatial regions to complete a mission. Such problems are inherently mixed discrete-continuous: they require both a high-level action sequence and a physically feasible continuous trajectory. The resulting trajectory and action sequence must also satisfy problem constraints such as deadlines, time windows, and velocity or acceleration limits. While hybrid temporal planners attempt to address this challenge, they typically model motion using linear (first-order) dynamics, which cannot guarantee that the resulting plan respects the robot's true physical constraints. Consequently, even when the high-level action sequence is fixed, producing a dynamically feasible trajectory becomes a bi-level optimization problem. We address this problem via reinforcement learning in continuous space. We define a Markov Decision Process that explicitly incorporates analytical second-order constraints and use it to refine first-order plans generated by a hybrid planner. Our results show that this approach can reliably recover physical feasibility and effectively bridge the gap between a planner's initial first-order trajectory and the dynamics required for real execution.
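
To illustrate why a first-order plan can be physically infeasible, the sketch below checks a uniformly sampled 1D trajectory against velocity and acceleration limits using finite differences. The helper name `limit_violations`, the sampling scheme, and the numeric limits are assumptions for illustration only; they are not the constraints or the method used in the paper.

```python
import numpy as np

def limit_violations(positions, dt, v_max, a_max):
    """Check sampled positions against velocity and acceleration bounds.

    Hypothetical helper: velocities and accelerations are approximated by
    first and second finite differences of uniformly sampled positions
    (at least three samples are required).
    Returns (velocity_violated, acceleration_violated).
    """
    p = np.asarray(positions, dtype=float)
    v = np.diff(p, axis=0) / dt   # first-order (velocity) estimate
    a = np.diff(v, axis=0) / dt   # second-order (acceleration) estimate
    return bool(np.abs(v).max() > v_max), bool(np.abs(a).max() > a_max)

# A first-order plan: move at constant speed, then stop instantly.
plan_positions = [0.0, 1.0, 2.0, 2.0, 2.0]
vel_bad, acc_bad = limit_violations(plan_positions, dt=1.0, v_max=1.5, a_max=0.5)
```

In this example the speed stays within its bound, but the instantaneous stop implies an acceleration of magnitude 1.0, which exceeds the assumed limit of 0.5. This is the kind of gap between a first-order plan and second-order dynamics that a refinement step would need to close.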

A Hybrid Architecture for Benign-Malignant Classification of Mammography ROIs

Authors:Mohammed Asad, Mohit Bajpai, Sudhir Singh, Rahul Katarya
Date:2026-04-14 08:28:21

Accurate characterization of suspicious breast lesions in mammography is important for early diagnosis and treatment planning. While Convolutional Neural Networks (CNNs) are effective at extracting local visual patterns, they are less suited to modeling long-range dependencies. Vision Transformers (ViTs) address this limitation through self-attention, but their quadratic computational cost can be prohibitive. This paper presents a hybrid architecture that combines EfficientNetV2-M for local feature extraction with Vision Mamba, a State Space Model (SSM), for efficient global context modeling. The proposed model performs binary classification of abnormality-centered mammography regions of interest (ROIs) from the CBIS-DDSM dataset into benign and malignant classes. By combining a strong CNN backbone with a linear-complexity sequence model, the approach achieves strong lesion-level classification performance in an ROI-based setting.

Learning step-level dynamic soaring in shear flow

Authors:Lunbing Chen, Jixin Lu, Yufei Yin, Jinpeng Huang, Yang Xiang, Hong Liu
Date:2026-04-14 07:58:07

Dynamic soaring enables sustained flight by extracting energy from wind shear, yet it is commonly understood as a cycle-level maneuver that assumes stable flow conditions. In realistic unsteady environments, however, such assumptions are often violated, raising the question of whether explicit cycle-level planning is necessary. Here, we show that dynamic soaring can emerge from step-level, state-feedback control using only local sensing, without explicit trajectory planning. Using deep reinforcement learning as a tool, we obtain policies that achieve robust omnidirectional navigation across diverse shear-flow conditions. The learned behavior organizes into a structured control law that coordinates turning and vertical motion, giving rise to a two-phase strategy governed by a trade-off between energy extraction and directional progress. The resulting policy generalizes across varying conditions and reproduces key features observed in biological flight and optimal-control solutions. These findings identify a feedback-based control structure underlying dynamic soaring, demonstrating that efficient energy-harvesting flight can emerge from local interactions with the flow without explicit planning, and providing insights for biological flight and autonomous systems in complex, flow-coupled environments.

Unveiling the Surprising Efficacy of Navigation Understanding in End-to-End Autonomous Driving

Authors:Zhihua Hua, Junli Wang, Pengfei LI, Qihao Jin, Bo Zhang, Kehua Sheng, Yilun Chen, Zhongxue Gan, Wenchao Ding
Date:2026-04-14 02:34:44

Global navigation information and local scene understanding are two crucial components of autonomous driving systems. However, our experimental results indicate that many end-to-end autonomous driving systems tend to over-rely on local scene understanding while failing to utilize global navigation information. These systems exhibit weak correlation between their planning capabilities and navigation input, and struggle to perform navigation-following in complex scenarios. To overcome this limitation, we propose the Sequential Navigation Guidance (SNG) framework, an efficient representation of global navigation information based on real-world navigation patterns. The SNG encompasses both navigation paths for constraining long-term trajectories and turn-by-turn (TBT) information for real-time decision-making logic. We constructed the SNG-QA dataset, a visual question answering (VQA) dataset based on SNG that aligns global and local planning. Additionally, we introduce an efficient model SNG-VLA that fuses local planning with global planning. The SNG-VLA achieves state-of-the-art performance through precise navigation information modeling without requiring auxiliary loss functions from perception tasks. Project page: SNG-VLA

How to Use Prices for Efficient Online Matching

Authors:Terence Highsmith
Date:2026-04-14 01:23:45

Many matching markets feature unknown, dynamic arrivals of agents who must be matched immediately. A caseworker must match an abused child to a foster home, a hospital must assign a patient in critical condition to a room, or a city must place a homeless individual into a shelter. We design an online matching algorithm -- the Sequential Equilibrium Mechanism (SEM) -- that approximates large market equilibria to match arriving agents to objects. SEM is asymptotically efficient, fair, and strategy-proof with probability one. As an application, we plan to deploy a lab-in-the-field experiment where real caseworkers match vulnerable children to host homes, and we provide simulation evidence that SEM can substantially improve welfare.

Robotic Nanoparticle Synthesis via Solution-based Processes

Authors:Dasharadhan Mahalingam, Michael Gallagher, Nilanjan Chakraborty, Stanislaus S. Wong
Date:2026-04-14 00:54:42

We present a screw geometry-based manipulation planning framework for the robotic automation of solution-based synthesis, exemplified through the preparation of gold and magnetite nanoparticles. The synthesis protocols are inherently long-horizon, multi-step tasks, requiring skills such as pick-and-place, pouring, turning a knob, and periodic visual inspection to detect reaction completion. A central challenge is that some skills, notably pouring, transferring containers with solutions, and turning a knob, impose geometric and kinematic constraints on the end-effector motion. To address this, we use a programming by demonstration paradigm where the constraints can be extracted from a single demonstration. This combination of screw-based motion representation and demonstration-driven specification enables domain experts, such as chemists, to readily adapt and reprogram the system for new experimental protocols and laboratory setups without requiring expertise in robotics or motion planning. We extract sequences of constant screws from demonstrations, which compactly encode the motion constraints while remaining coordinate-invariant. This representation enables robust generalization across variations in grasp placement and allows parameterized reuse of a skill learned from a single example. By composing these screw-parameterized primitives according to the synthesis protocol, the robot autonomously generates motion plans that execute the complete experiment over repeated runs. Our results highlight that screw-theoretic planning, combined with programming by demonstration, provides a rigorous and generalizable foundation for long-horizon laboratory automation, thereby enabling fundamental kinematics to have a translational impact on the use of robots in developing scalable solution-based synthesis protocols.

VERITAS: Verifiable Epistemic Reasoning for Image-Derived Hypothesis Testing via Agentic Systems

Authors:Lucas Stoffl, Benedikt Wiestler, Johannes C. Paetzold
Date:2026-04-13 23:48:35

Drawing meaningful conclusions from inherently multimodal clinical data (including medical imaging) requires coordinating expertise across the clinical specialty, radiology, programming, and biostatistics. This fragmented process bottlenecks discovery. We present VERITAS (Verifiable Epistemic Reasoning for Image-Derived Hypothesis Testing via Agentic Systems), a multi-agent system that autonomously tests natural-language hypotheses on multimodal clinical datasets while producing a fully auditable evidence trail: every statistical conclusion traces through inspectable, executable outputs from analysis plan to segmentation masks to statistical code to final verdict. VERITAS decomposes the workflow into four phases handled by role-specialized agents, and introduces an epistemic evidence label framework that mechanically classifies outcomes as Supported, Refuted, Underpowered, or Invalid by jointly evaluating significance, effect direction, and study power. This distinction is critical in medical imaging, where non-significant results often reflect insufficient sample size rather than absent effects. To evaluate the system, we construct a tiered benchmark of 64 hypotheses spanning six complexity levels across cardiac (ACDC, 150 subjects) and brain glioma (UCSF-PDGM, 501 subjects) MRI. VERITAS reaches 81.4% verdict accuracy with frontier models and 71.2% with locally-hosted open-weight models (8-30B), outperforming all five single-model baselines in both classes. It also produces the highest rate of independently verifiable statistical outputs (86.6%), so even its failures remain diagnosable through artifact inspection. Structured multi-agent decomposition thus substitutes for model scale while preserving the verifiability clinical research demands.
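
The idea of mechanically assigning an evidence label from significance, effect direction, and study power can be sketched as a small decision function. The abstract names the four labels but not the decision rules, so everything below is an assumption: the function name `evidence_label`, the thresholds `alpha=0.05` and `min_power=0.8`, the `analysis_valid` flag, and the specific branching.

```python
def evidence_label(p_value, effect, expected_sign, power,
                   alpha=0.05, min_power=0.8, analysis_valid=True):
    """Assign one of four evidence labels (hypothetical rules, not VERITAS's own).

    p_value        : p-value of the test, or None if it could not be computed
    effect         : estimated effect (sign carries the direction)
    expected_sign  : +1 or -1, the direction the hypothesis predicts
    power          : estimated statistical power of the study, in [0, 1]
    """
    if not analysis_valid or p_value is None:
        return "Invalid"
    if p_value < alpha:
        # Significant result: the label depends on the effect direction.
        return "Supported" if effect * expected_sign > 0 else "Refuted"
    # Non-significant result: only treat it as evidence against the
    # hypothesis when the study had enough power to detect an effect.
    return "Refuted" if power >= min_power else "Underpowered"

labels = [
    evidence_label(0.01, 0.4, +1, 0.90),
    evidence_label(0.01, -0.4, +1, 0.90),
    evidence_label(0.30, 0.1, +1, 0.35),
    evidence_label(0.30, 0.1, +1, 0.90),
    evidence_label(None, 0.0, +1, 0.0),
]
```

Separating "Underpowered" from "Refuted" reflects the point made in the abstract that a non-significant result may stem from insufficient sample size rather than an absent effect.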

Ternary Logic Encodings of Temporal Behavior Trees with Application to Control Synthesis

Authors:Ryan Matheu, John S. Baras, Calin Belta
Date:2026-04-13 22:07:05

Behavior Trees (BTs) provide designers with an intuitive graphical interface to construct long-horizon plans for autonomous systems. To ensure their correctness and safety, rigorous formal models and verification techniques are essential. Temporal BTs (TBTs) offer a promising approach by leveraging existing temporal logic formalisms to specify and verify the executions of BTs. However, this analysis is currently limited to offline post hoc analysis and trace repair. In this paper, we reformulate TBTs using a ternary-valued Signal Temporal Logic (STL) amenable to control synthesis. Ternary logic introduces a third truth value, Unknown, formally capturing cases where a trajectory has neither fully satisfied nor violated a specification. We propose mixed-integer linear encodings for partial-trajectory STL and TBTs over ternary logic, allowing for correct-by-construction control strategies for linear dynamical systems via mixed-integer optimization. We demonstrate the utility of our framework by solving optimal control problems.
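
To make the third truth value concrete, the sketch below evaluates bounded "always" and "eventually" operators on a trajectory prefix using Kleene's strong three-valued logic. This is an assumption-laden illustration, not the paper's semantics or its mixed-integer encoding: the integer encoding of truth values and the function names are chosen here for convenience.

```python
# Truth values ordered so that AND is min and OR is max (Kleene logic).
TRUE, UNKNOWN, FALSE = 1, 0, -1


def t_not(a):
    return -a


def t_and(*values):
    return min(values)


def t_or(*values):
    return max(values)


def always(prefix, start, end, pred):
    """Ternary value of 'pred holds at every step in [start, end]'
    given only the observed prefix of the trajectory."""
    observed = prefix[start:end + 1]
    if any(not pred(x) for x in observed):
        return FALSE            # a violation has already been seen
    if len(prefix) > end:
        return TRUE             # the whole window was observed and holds
    return UNKNOWN              # window not yet fully observed


def eventually(prefix, start, end, pred):
    """Ternary value of 'pred holds at some step in [start, end]'."""
    observed = prefix[start:end + 1]
    if any(pred(x) for x in observed):
        return TRUE             # a witness has already been seen
    if len(prefix) > end:
        return FALSE            # window fully observed, no witness
    return UNKNOWN
```

For example, `always([1, 2, 3], 0, 4, lambda x: x > 0)` is `UNKNOWN` because steps 3 and 4 have not been observed yet, while the same formula on `[1, -1]` is already `FALSE`.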

Systematic Design of Local Rules for Directing Emergent Structure in Bottom-Up Systems

Authors:Andrew Slezak, Varda F. Hagh
Date:2026-04-13 20:56:17

Many biological systems collectively construct complex, adaptive, and functional architectures, where function emerges from bottom-up building processes rather than top-down planning or centralized control. However, general strategies for programming and controlling such emergent function in engineered systems remain largely unexplored. In this work, we present a systematic framework for designing local behavioral rule sets for simple builders such that, when adhered to, structures with targeted global properties emerge. Using a minimal model inspired by tent caterpillars, we study how simple agents equipped with limited sensing and no memory or global knowledge construct networked structures through local deposition of line segments. We base our framework on tuning local degrees of freedom in a complex system to alter global behavior. By identifying the degrees of freedom that influence a given property and specifying how they are tuned through local rules, we demonstrate that the corresponding global properties can be directed. We explore this through three geometric properties of the agents' resulting networks, in particular area coverage, average line density, and front curvature. We show that agents can reliably achieve targeted values for these properties while maintaining low variability in the presence of stochasticity. These results establish a generalizable approach for programming emergence in decentralized systems and suggest new pathways for designing adaptive materials and autonomous construction strategies in complex, uncertain environments.

Scalable Optimization for Mobility-Aware Coordinated Electric Vehicle Charging in Distribution Power Networks

Authors:Yi Ju, Lunlong Li, Jingchun Wang, Scott Moura
Date:2026-04-13 19:40:26

Rapid growth in electric-vehicle (EV) charging demand is placing increasing stress on distribution power networks (DPNs), whose hosting capacity is often limited and spatially uneven. Beyond demonstrating that coordination can help, this paper answers an open question that is central for planners: what is the maximal achievable benefit of EV demand flexibility in reducing overload-driven distribution upgrades at a regional scale? Establishing such an upper bound is computationally challenging, as it entails solving and certifying near-optimal solutions to population-scale optimization problems with millions of variables and both spatial and temporal coupling. We introduce MAC (Mobility-Aware Coordinated EV charging), a framework that quantifies the maximum potential of leveraging EV demand flexibility to mitigate DPN overloading risk without interrupting drivers' travel needs. (i) MAC expands feasible scheduling by coupling charging decisions over a full mobility horizon: instead of enforcing per-session energy recovery, it only requires the EV state-of-charge (SOC) to remain sufficient for upcoming trips. (ii) MAC is computationally scalable via an ADMM-based decomposition with custom subproblem solvers, and admits a decentralized interpretation in which dual variables act as locational-temporal price signals that implement the social optimum as a competitive equilibrium. Using high-resolution mobility trajectories and feeder hosting-capacity data in a future-oriented 30% EV adoption scenario for the San Francisco Bay Area, we show that MAC can dramatically reduce overload-driven upgrade requirements relative to unmanaged charging. This paper illustrates how trajectory-coupled flexibility and scalable, certifiable optimization can provide actionable best-case benchmarks for DPN planning and operations.
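
The trajectory-coupled feasibility condition in (i) can be illustrated with a toy check. This is a simplified sketch under stated assumptions, not the MAC formulation: the event encoding, the function name `soc_feasible`, and the `reserve` parameter are hypothetical, and grid constraints and the ADMM decomposition are not modeled.

```python
def soc_feasible(initial_soc, events, capacity, reserve=0.0):
    """Check that state-of-charge never drops below `reserve` during any trip.

    `events` is a chronological list of ("charge", kWh) or ("trip", kWh)
    tuples. No charging session is required to recover the energy of the
    preceding trip; only the running SOC matters.
    """
    soc = initial_soc
    for kind, energy in events:
        if kind == "charge":
            soc = min(capacity, soc + energy)   # battery cannot overfill
        elif kind == "trip":
            soc -= energy
            if soc < reserve:
                return False                    # a trip cannot be completed
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return True
```

A schedule such as `[("trip", 10), ("charge", 0), ("trip", 10), ("charge", 20)]` starting from 30 kWh is feasible under this rule even though the first session delivers no energy, which a per-session recovery requirement would forbid.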

Complementarity by Construction: A Lie-Group Approach to Solving Quadratic Programs with Linear Complementarity Constraints

Authors:Arun L. Bishop, Micah I. Reich, Zachary Manchester
Date:2026-04-13 19:31:41

Many problems in robotics require reasoning over a mix of continuous dynamics and discrete events, such as making and breaking contact in manipulation and locomotion. These problems are locally well modeled by linear complementarity quadratic programs (LCQPs), an extension of QPs that introduces complementarity constraints. While very expressive, LCQPs are non-convex, and few solvers exist for computing good local solutions for use in planning pipelines. In this work, we observe that complementarity constraints form a Lie group under infinitesimal relaxation, and leverage this structure to perform on-manifold optimization. We introduce a retraction map that is numerically well behaved, and use it to parameterize the constraints so that they are satisfied by construction. The resulting solver avoids many of the classical issues with complementarity constraints. We provide an open-source solver, Marble, that is implemented in C++ with Julia and Python bindings. We demonstrate that Marble is competitive on a suite of benchmark problems, and solves a number of robotics problems where existing approaches fail to converge.

How Transformers Learn to Plan via Multi-Token Prediction

Authors:Jianhao Huang, Zhanpeng Zhou, Renqiu Xia, Baharan Mirzasoleiman, Weijie Su, Wei Huang
Date:2026-04-13 18:04:09

While next-token prediction (NTP) has been the standard objective for training language models, it often struggles to capture global structure in reasoning tasks. Multi-token prediction (MTP) has recently emerged as a promising alternative, yet its underlying mechanisms remain poorly understood. In this paper, we study how MTP facilitates reasoning, with a focus on planning. Empirically, we show that MTP consistently outperforms NTP on both synthetic graph path-finding tasks and more realistic reasoning benchmarks, such as Countdown and boolean satisfiability problems. Theoretically, we analyze a simplified two-layer Transformer on a star graph task. We prove that MTP induces a two-stage reverse reasoning process: the model first attends to the end node and then reconstructs the path by tracing intermediate nodes backward. This behavior arises from a gradient decoupling property of MTP, which provides a cleaner training signal compared to NTP. Ultimately, our results highlight how multi-token objectives inherently bias optimization toward robust and interpretable reasoning circuits.
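
The difference between the two objectives shows up in how training targets are built. The helper below is a generic sketch of that construction, not the authors' training code; the name `mtp_targets` and the choice to truncate targets near the end of the sequence are assumptions.

```python
def mtp_targets(tokens, k):
    """For each position t, return the next k tokens as prediction targets.

    k = 1 recovers ordinary next-token prediction. Targets near the end of
    the sequence are truncated rather than padded.
    """
    return [tokens[t + 1:t + 1 + k] for t in range(len(tokens) - 1)]
```

With `tokens = [1, 2, 3, 4]`, `mtp_targets(tokens, 1)` gives `[[2], [3], [4]]`, while `mtp_targets(tokens, 2)` gives `[[2, 3], [3, 4], [4]]`, so each position also receives a training signal from tokens further along the path.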

Consistency of the dark matter halo perturbation parameter from morphological and kinematic lopsidedness of galaxies

Authors:Prerana Biswas, Narendra Nath Patra, Veselina Kalinova
Date:2026-04-13 18:00:03

Lopsidedness is commonly observed in galaxies: previous studies report that nearly 30% of galaxies show it. In this work, we study morphological lopsidedness in both stellar and gas disks in the inner and outer regions using Fourier analysis techniques and compare the results for a sample of nearby galaxies with different morphologies and environments. Although lopsidedness can result from diverse factors like tidal interactions, gas accretion, and internal instability, recent studies suggest it is a common feature that is not solely reliant on rare events, and moderate lopsidedness most likely results from the disk's response to a lopsided dark matter halo potential. Assuming lopsidedness originates from a lopsided halo, we find the morphological and kinematic halo perturbation parameters in the same radial range. Unlike previous studies, we use 3D kinematically modelled rotation curves to find the kinematic lopsidedness and, hence, the kinematic halo perturbation parameter. Although the detected linear correlation between them is not statistically significant for our small sample of eleven galaxies, this approach provides a more uniform and physically consistent framework to test the theoretically expected similarity between morphological and kinematic halo perturbation parameters. Further, within this framework, the discrepancy between them does not appear to depend on the nature of the rotation-curve asymmetry of the two sides of the galaxy, in contrast to trends seen in earlier studies. In future work, we plan to extend this analysis to a substantially larger sample in order to robustly assess these findings.
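
A common way to quantify morphological lopsidedness with Fourier analysis is the normalized amplitude of the m = 1 azimuthal mode in a ring. The sketch below shows that computation on equally spaced azimuthal samples. It is an illustration under assumptions: the normalization (chosen so that an intensity profile I0 (1 + eps cos(phi)) yields eps) and the function names are not taken from the paper.

```python
import cmath
import math


def fourier_coefficient(intensity, m):
    """Discrete azimuthal Fourier coefficient of mode m for samples taken
    at equally spaced azimuths around a ring."""
    n = len(intensity)
    return sum(
        value * cmath.exp(-1j * m * 2.0 * math.pi * k / n)
        for k, value in enumerate(intensity)
    ) / n


def lopsidedness(intensity):
    """Normalized m = 1 amplitude: 2 |a_1| / |a_0|."""
    a0 = abs(fourier_coefficient(intensity, 0))
    a1 = abs(fourier_coefficient(intensity, 1))
    return 2.0 * a1 / a0
```

Applied to a ring sampled from `1 + 0.2 * cos(phi)`, this returns 0.2 up to floating-point error, and it returns approximately 0 for a uniform ring.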

Budget-Aware Uncertainty for Radiotherapy Segmentation QA Using nnU-Net

Authors:Ricardo Coimbra Brioso, Lorenzo Mondo, Damiano Dei, Nicola Lambri, Pietro Mancosu, Marta Scorsetti, Daniele Loiacono
Date:2026-04-13 17:58:15

Accurate delineation of the Clinical Target Volume (CTV) is essential for radiotherapy planning, yet remains time-consuming and difficult to assess, especially for complex treatments such as Total Marrow and Lymph Node Irradiation (TMLI). While deep learning-based auto-segmentation can reduce workload, safe clinical deployment requires reliable cues indicating where models may be wrong. In this work, we propose a budget-aware uncertainty-driven quality assurance (QA) framework built on nnU-Net, combining uncertainty quantification and post-hoc calibration to produce voxel-wise uncertainty maps (based on predictive entropy) that can guide targeted manual review. We compare temperature scaling (TS), deep ensembles (DE), checkpoint ensembles (CE), and test-time augmentation (TTA), evaluated both individually and in combination on TMLI as a representative use case. Reliability is assessed through ROI-masked calibration metrics and uncertainty-error alignment under realistic revision constraints, summarized as AUC over the top 0-5% most uncertain voxels. Across configurations, segmentation accuracy remains stable, whereas TS substantially improves calibration. Uncertainty-error alignment improves most with calibrated checkpoint-based inference, leading to uncertainty maps that more consistently highlight regions requiring manual edits. Overall, integrating calibration with efficient ensembling appears to be a promising strategy for implementing a budget-aware QA workflow for radiotherapy segmentation.
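
The core quantities here, temperature-scaled probabilities, predictive entropy, and a review budget over the most uncertain voxels, can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' nnU-Net pipeline: the function names are hypothetical, the temperature is passed in rather than fitted on validation data, and ensembling or test-time augmentation (which would average probabilities before the entropy step) is omitted.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; temperature > 1 softens the distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def predictive_entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def voxels_to_review(voxel_logits, budget=0.05, temperature=1.0):
    """Indices of the most uncertain voxels, limited to a fraction `budget`."""
    entropies = [
        predictive_entropy(softmax(logits, temperature))
        for logits in voxel_logits
    ]
    count = max(1, math.ceil(budget * len(entropies)))
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    return ranked[:count]
```

Ranking by entropy and cutting at a fixed fraction mirrors the "top 0-5% most uncertain voxels" evaluation regime mentioned above: a reviewer with a limited budget inspects only the flagged indices.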

Grounded World Model for Semantically Generalizable Planning

Authors:Quanyi Li, Lan Feng, Haonan Zhang, Wuyang Li, Letian Wang, Alexandre Alahi, Harold Soh
Date:2026-04-13 17:25:41

In Model Predictive Control (MPC), world models predict the future outcomes of various action proposals, which are then scored to guide the selection of the optimal action. For visuomotor MPC, the score function is a distance metric between a predicted image and a goal image, measured in the latent space of a pretrained vision encoder such as DINO or JEPA. However, it is challenging to obtain the goal image before task execution, particularly in new environments. Additionally, conveying the goal through an image offers limited interactivity compared with natural language. In this work, we propose to learn a Grounded World Model (GWM) in a vision-language-aligned latent space. As a result, each proposed action is scored based on how close its future outcome is to the task instruction, reflected by the similarity of embeddings. This approach transforms the visuomotor MPC into a VLA that surpasses VLM-based VLAs in semantic generalization. On the proposed WISER benchmark, GWM-MPC achieves an 87% success rate on the test set comprising 288 tasks that feature unseen visual signals and referring expressions, yet remain solvable with motions demonstrated during training. In contrast, traditional VLAs achieve an average success rate of 22%, even though they overfit the training set with a 90% success rate.
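
The scoring step described above, ranking action proposals by how similar their predicted outcome is to the instruction in a shared embedding space, can be sketched as follows. This is a toy illustration, not the GWM implementation: `predict_outcome` stands in for a learned world model, the embeddings are plain lists of floats, and the use of cosine similarity as the score is an assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_action(proposals, predict_outcome, instruction_embedding):
    """Return the proposal whose predicted outcome embedding is closest
    to the instruction embedding.

    `predict_outcome` is a hypothetical callable mapping an action to the
    embedding of its predicted future observation.
    """
    return max(
        proposals,
        key=lambda action: cosine(predict_outcome(action), instruction_embedding),
    )
```

In a real planner, the proposals would typically be sampled action sequences and the selection repeated at every control step; only the scoring rule is shown here.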

Multi-ORFT: Stable Online Reinforcement Fine-Tuning for Multi-Agent Diffusion Planning in Cooperative Driving

Authors:Haojie Bai, Aimin Li, Ruoyu Yao, Xiongwei Zhao, Tingting Zhang, Xing Zhang, Lin Gao, Jun Ma
Date:2026-04-13 17:13:46

Closed-loop cooperative driving requires planners that generate realistic multimodal multi-agent trajectories while improving safety and traffic efficiency. Existing diffusion planners can model multimodal behaviors from demonstrations, but they often exhibit weak scene consistency and remain poorly aligned with closed-loop objectives; meanwhile, stable online post-training in reactive multi-agent environments remains difficult. We present Multi-ORFT, which couples scene-conditioned diffusion pre-training with stable online reinforcement post-training. In pre-training, the planner uses inter-agent self-attention, cross-attention, and AdaLN-Zero-based scene conditioning to improve scene consistency and road adherence of joint trajectories. In post-training, we formulate a two-level MDP that exposes step-wise reverse-kernel likelihoods for online optimization, and combine dense trajectory-level rewards with variance-gated group-relative policy optimization (VG-GRPO) to stabilize training. On the WOMD closed-loop benchmark, Multi-ORFT reduces collision rate from 2.04% to 1.89% and off-road rate from 1.68% to 1.36%, while increasing average speed from 8.36 to 8.61 m/s relative to the pre-trained planner, and it outperforms strong open-source baselines including SMART-large, SMART-tiny-CLSFT, and VBD on the primary safety and efficiency metrics. These results show that coupling scene-consistent denoising with stable online diffusion-policy optimization improves the reliability of closed-loop cooperative driving.