planning - 2025-10-12

Scalable Offline Metrics for Autonomous Driving

Authors:Animikh Aich, Adwait Kulkarni, Eshed Ohn-Bar
Date:2025-10-09 17:59:57

Real-World evaluation of perception-based planning models for robotic systems, such as autonomous vehicles, can be safely and inexpensively conducted offline, i.e., by computing model prediction error over a pre-collected validation dataset with ground-truth annotations. However, extrapolating from offline model performance to online settings remains a challenge. In these settings, seemingly minor errors can compound and result in test-time infractions or collisions. This relationship is understudied, particularly across diverse closed-loop metrics and complex urban maneuvers. In this work, we revisit this undervalued question in policy evaluation through an extensive set of experiments across diverse conditions and metrics. Based on analysis in simulation, we find an even worse correlation between offline and online settings than reported by prior studies, casting doubts on the validity of current evaluation practices and metrics for driving policies. Next, we bridge the gap between offline and online evaluation. We investigate an offline metric based on epistemic uncertainty, which aims to capture events that are likely to cause errors in closed-loop settings. The resulting metric achieves over 13% improvement in correlation compared to previous offline metrics. We further validate the generalization of our findings beyond the simulation environment in real-world settings, where even greater gains are observed.

NovaFlow: Zero-Shot Manipulation via Actionable Flow from Generated Videos

Authors:Hongyu Li, Lingfeng Sun, Yafei Hu, Duy Ta, Jennifer Barry, George Konidaris, Jiahui Fu
Date:2025-10-09 17:59:55

Enabling robots to execute novel manipulation tasks zero-shot is a central goal in robotics. Most existing methods assume in-distribution tasks or rely on fine-tuning with embodiment-matched data, limiting transfer across platforms. We present NovaFlow, an autonomous manipulation framework that converts a task description into an actionable plan for a target robot without any demonstrations. Given a task description, NovaFlow synthesizes a video using a video generation model and distills it into 3D actionable object flow using off-the-shelf perception modules. From the object flow, it computes relative poses for rigid objects and realizes them as robot actions via grasp proposals and trajectory optimization. For deformable objects, this flow serves as a tracking objective for model-based planning with a particle-based dynamics model. By decoupling task understanding from low-level control, NovaFlow naturally transfers across embodiments. We validate on rigid, articulated, and deformable object manipulation tasks using a table-top Franka arm and a Spot quadrupedal mobile robot, and achieve effective zero-shot execution without demonstrations or embodiment-specific training. Project website: https://novaflow.lhy.xyz/.

FlowSearch: Advancing deep research with dynamic structured knowledge flow

Authors:Yusong Hu, Runmin Ma, Yue Fan, Jinxin Shi, Zongsheng Cao, Yuhao Zhou, Jiakang Yuan, Xiangchao Yan, Wenlong Zhang, Lei Bai, Bo Zhang
Date:2025-10-09 17:48:12

Deep research is an inherently challenging task that demands both breadth and depth of thinking. It involves navigating diverse knowledge spaces and reasoning over complex, multi-step dependencies, which presents substantial challenges for agentic systems. To address this, we propose FlowSearch, a multi-agent framework that actively constructs and evolves a dynamic structured knowledge flow to drive subtask execution and reasoning. FlowSearch is capable of strategically planning and expanding the knowledge flow to enable parallel exploration and hierarchical task decomposition, while also adjusting the knowledge flow in real time based on feedback from intermediate reasoning outcomes and insights. FlowSearch achieves state-of-the-art performance on both general and scientific benchmarks, including GAIA, HLE, GPQA and TRQA, demonstrating its effectiveness in multi-disciplinary research scenarios and its potential to advance scientific discovery. The code is available at https://github.com/Alpha-Innovator/InternAgent.

Doubly Robust Estimation with Stabilized Weights for Binary Proximal Outcomes in Micro-Randomized Trials

Authors:Jinho Cha, Eunchan Cha
Date:2025-10-09 15:44:16

Micro-randomized trials (MRTs) are increasingly used to evaluate mobile health interventions with binary proximal outcomes. Standard inverse probability weighting (IPW) estimators are unbiased but unstable in small samples or under extreme randomization. Estimated mean excursion effect (EMEE) improves efficiency but lacks double robustness. We propose a doubly robust EMEE (DR-EMEE) with stabilized and truncated weights, combining per-decision IPW and outcome regression. We prove double robustness, asymptotic efficiency, and provide finite-sample variance corrections, with extensions to machine learning nuisance estimators. In simulations, DR-EMEE reduces root mean squared error, improves coverage, and achieves up to twofold efficiency gains over IPW and five to ten percent over EMEE. Applications to HeartSteps, PAMAP2, and mHealth datasets confirm stable and efficient inference across both randomized and observational settings.
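
The abstract above mentions stabilized and truncated weights but does not give their form. As a rough illustration only, the following sketch uses one common convention: a constant numerator probability divided by the per-decision randomization probability, capped at a fixed value. The function name, the `cap` parameter, and the default numerator are assumptions made here, not the authors' definitions.

```python
def stabilized_truncated_weights(actions, rand_probs, numerator_prob=None, cap=10.0):
    """Per-decision stabilized IPW weights with truncation (illustrative only).

    actions:        list of 0/1 treatment indicators, one per decision point
    rand_probs:     P(A_t = 1 | history) used by the trial at each decision point
    numerator_prob: constant stabilizing probability (assumed; defaults to the
                    observed treatment fraction)
    cap:            hypothetical truncation threshold
    """
    if len(actions) != len(rand_probs):
        raise ValueError("actions and rand_probs must have the same length")
    if any(not 0.0 < p < 1.0 for p in rand_probs):
        raise ValueError("randomization probabilities must lie strictly in (0, 1)")
    if numerator_prob is None:
        numerator_prob = sum(actions) / len(actions)

    weights = []
    for a, p in zip(actions, rand_probs):
        numerator = numerator_prob if a == 1 else 1.0 - numerator_prob
        denominator = p if a == 1 else 1.0 - p
        # Truncation bounds the influence of extreme randomization probabilities.
        weights.append(min(numerator / denominator, cap))
    return weights
```

For example, with a numerator of 0.5, a treated decision point randomized with probability 0.01 would receive a raw weight of 50, which the cap reduces to 10.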

Reinforcement Learning from Probabilistic Forecasts for Safe Decision-Making via Conditional Value-at-Risk Planning

Authors:Michal Koren, Or Peretz, Tai Dinh, Philip S. Yu
Date:2025-10-09 13:46:32

Sequential decisions in volatile, high-stakes settings require more than maximizing expected return; they require principled uncertainty management. This paper presents the Uncertainty-Aware Markov Decision Process (UAMDP), a unified framework that couples Bayesian forecasting, posterior-sampling reinforcement learning, and planning under a conditional value-at-risk (CVaR) constraint. In a closed loop, the agent updates its beliefs over latent dynamics, samples plausible futures via Thompson sampling, and optimizes policies subject to preset risk tolerances. We establish regret bounds that converge to the Bayes-optimal benchmark under standard regularity conditions. We evaluate UAMDP in two domains, high-frequency equity trading and retail inventory control, both marked by structural uncertainty and economic volatility. Relative to strong deep learning baselines, UAMDP improves long-horizon forecasting accuracy (RMSE decreases by up to 25% and sMAPE by 32%), and these gains translate into economic performance: the trading Sharpe ratio rises from 1.54 to 1.74 while maximum drawdown is roughly halved. These results show that integrating calibrated probabilistic modeling, exploration aligned with posterior uncertainty, and risk-aware control yields a robust, generalizable approach to safer and more profitable sequential decision-making.
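
To make the CVaR constraint mentioned above concrete, here is a minimal sketch of an empirical CVaR over sampled outcomes and a selection rule that keeps only actions whose tail risk stays within a limit. It is not the UAMDP algorithm: the function names, the `tail_fraction` and `risk_limit` parameters, and the sampled returns are all hypothetical.

```python
import math


def cvar(losses, tail_fraction=0.2):
    """Empirical CVaR: the mean of the worst `tail_fraction` share of losses."""
    ordered = sorted(losses, reverse=True)  # largest losses first
    # Small tolerance guards against floating-point noise in the product.
    k = max(1, math.ceil(tail_fraction * len(ordered) - 1e-9))
    tail = ordered[:k]
    return sum(tail) / len(tail)


def choose_action(sampled_returns, risk_limit, tail_fraction=0.2):
    """Pick the highest-mean action among those meeting the CVaR limit.

    sampled_returns maps an action name to returns sampled from a
    (hypothetical) posterior over futures. Returns None if nothing is feasible.
    """
    feasible = {}
    for action, returns in sampled_returns.items():
        losses = [-r for r in returns]
        if cvar(losses, tail_fraction) <= risk_limit:
            feasible[action] = sum(returns) / len(returns)
    if not feasible:
        return None
    return max(feasible, key=feasible.get)
```

With a tight risk limit, an action with a higher mean but a heavy loss tail is rejected in favor of a steadier one; loosening the limit reverses the choice.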

Satellite Navigation and Control using Physics-Informed Artificial Potential Field and Sliding Mode Controller

Authors:Rakesh Kumar Sahoo, Paridhi Choudhary, Manoranjan Sinha
Date:2025-10-09 13:09:18

An increase in the number of space exploration missions has led to the accumulation of space debris, posing a risk of collision with operational satellites. Addressing this challenge is crucial for the sustainability of space operations. To plan a safe trajectory in the presence of moving space debris, an integrated approach of artificial potential field and sliding mode controller is proposed and implemented in this paper. The relative 6-DOF kinematics and dynamics of the spacecraft are modelled in the framework of geometric mechanics, with the relative configuration expressed through exponential coordinates. Various collision avoidance guidance algorithms have been proposed in the literature, but the Artificial Potential Field (APF) guidance algorithm is computationally efficient and enables real-time path adjustments to avoid collision with obstacles. However, it is prone to issues such as local minima. In the literature, the local minima issue is typically avoided either by redefining the potential function, such as by adding vorticity, or by employing search techniques, which are computationally expensive. To address these challenges, a physics-informed APF is proposed in this paper where Hamiltonian mechanics is used instead of the traditional Newtonian mechanics-based approach. In this approach, instead of relying on attractive and repulsive forces for path planning, the Hamiltonian approach uses the potential field to define a path of minimum potential. Additionally, to track the desired trajectory planned by the guidance algorithm within a fixed-time frame, a non-singular fixed-time sliding mode controller (FTSMC) is used. The proposed fixed-time sliding surface not only ensures fixed-time convergence of system states but also guarantees the global stability of the closed-loop system without singularity. The simulation results presented support the claims made.

Random Window Augmentations for Deep Learning Robustness in CT and Liver Tumor Segmentation

Authors:Eirik A. Østmo, Kristoffer K. Wickstrøm, Keyur Radiya, Michael C. Kampffmeyer, Karl Øyvind Mikalsen, Robert Jenssen
Date:2025-10-09 11:57:04

Contrast-enhanced Computed Tomography (CT) is important for diagnosis and treatment planning for various medical conditions. Deep learning (DL) based segmentation models may enable automated medical image analysis for detecting and delineating tumors in CT images, thereby reducing clinicians' workload. Achieving generalization capabilities in limited data domains, such as radiology, requires modern DL models to be trained with image augmentation. However, naively applying augmentation methods developed for natural images to CT scans often disregards the nature of the CT modality, where the intensities measure Hounsfield Units (HU) and have important physical meaning. This paper challenges the use of such intensity augmentations for CT imaging and shows that they may lead to artifacts and poor generalization. To mitigate this, we propose a CT-specific augmentation technique, called Random windowing, that exploits the available HU distribution of intensities in CT images. Random windowing encourages robustness to contrast-enhancement and significantly increases model performance on challenging images with poor contrast or timing. We perform ablations and analysis of our method on multiple datasets, and compare to, and outperform, state-of-the-art alternatives, while focusing on the challenge of liver tumor segmentation.
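
As a rough illustration of the idea behind a window-based CT augmentation, the sketch below clips Hounsfield Unit values to a window defined by a center and a width, rescales them to [0, 1], and samples that window at random. The sampling ranges and function names are placeholders chosen here; the abstract does not state how the authors sample windows.

```python
import random


def apply_window(hu_values, center, width):
    """Clip HU values to [center - width/2, center + width/2] and rescale to [0, 1]."""
    if width <= 0:
        raise ValueError("width must be positive")
    low = center - width / 2.0
    high = center + width / 2.0
    return [(min(max(v, low), high) - low) / (high - low) for v in hu_values]


def random_window(hu_values, center_range=(40.0, 160.0),
                  width_range=(150.0, 400.0), rng=random):
    """Apply a window whose center and width are drawn uniformly at random.

    The ranges are illustrative assumptions, not values from the paper.
    """
    center = rng.uniform(*center_range)
    width = rng.uniform(*width_range)
    return apply_window(hu_values, center, width), (center, width)
```

Because the window is defined in HU, the transform keeps the physical meaning of the intensities, unlike a generic brightness or contrast jitter applied to already-normalized pixels.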

Reverse Supply Chain Network Design of a Polyurethane Waste Upcycling System

Authors:Dalga Merve Özkan, Sergio Lucia, Sebastian Engell
Date:2025-10-09 11:35:54

This paper presents a general mathematical programming framework for the design and optimization of supply chain infrastructures for the upcycling of plastic waste. For this purpose, a multi-product, multi-echelon, multi-period mixed-integer linear programming (MILP) model has been formulated. The objective is to minimize the cost of the entire circular supply chain starting from the collection of post-consumer plastic waste to the production of virgin-equivalent high value polymers, satisfying a large number of constraints from collection quota to the quality of the feedstock. The framework aims to support the strategic planning of future circular supply chains by determining the optimal number, locations and sizes of various types of facilities as well as the amounts of materials to be transported between the nodes of the supply chain network over a specified period. The functionality of the framework has been tested with a case study for the upcycling of rigid polyurethane foam waste coming from construction sites in Germany. The economic potential and infrastructure requirements are evaluated, and it has been found that from a solely economic perspective, the current status of the value chain is not competitive with fossil-based feedstock or incineration. However, with the right economic incentives, there is a considerable potential to establish such value chains, once the upcycling technology is ready and the economic framework conditions have stabilized.

ReInAgent: A Context-Aware GUI Agent Enabling Human-in-the-Loop Mobile Task Navigation

Authors:Haitao Jia, Ming He, Zimo Yin, Likang Wu, Jianping Fan, Jitao Sang
Date:2025-10-09 09:22:05

Mobile GUI agents exhibit substantial potential to facilitate and automate the execution of user tasks on mobile phones. However, existing mobile GUI agents predominantly privilege autonomous operation and neglect the necessity of active user engagement during task execution. This omission undermines their adaptability to information dilemmas including ambiguous, dynamically evolving, and conflicting task scenarios, leading to execution outcomes that deviate from genuine user requirements and preferences. To address these shortcomings, we propose ReInAgent, a context-aware multi-agent framework that leverages dynamic information management to enable human-in-the-loop mobile task navigation. ReInAgent integrates three specialized agents around a shared memory module: an information-managing agent for slot-based information management and proactive interaction with the user, a decision-making agent for conflict-aware planning, and a reflecting agent for task reflection and information consistency validation. Through continuous contextual information analysis and sustained user-agent collaboration, ReInAgent overcomes the limitation of existing approaches that rely on clear and static task assumptions. Consequently, it enables more adaptive and reliable mobile task navigation in complex, real-world scenarios. Experimental results demonstrate that ReInAgent effectively resolves information dilemmas and produces outcomes that are more closely aligned with genuine user preferences. Notably, on complex tasks involving information dilemmas, ReInAgent achieves a 25% higher success rate than Mobile-Agent-v2.

Executable Analytic Concepts as the Missing Link Between VLM Insight and Precise Manipulation

Authors:Mingyang Sun, Jiude Wei, Qichen He, Donglin Wang, Cewu Lu, Jianhua Sun
Date:2025-10-09 09:08:33

Enabling robots to perform precise and generalized manipulation in unstructured environments remains a fundamental challenge in embodied AI. While Vision-Language Models (VLMs) have demonstrated remarkable capabilities in semantic reasoning and task planning, a significant gap persists between their high-level understanding and the precise physical execution required for real-world manipulation. To bridge this "semantic-to-physical" gap, we introduce GRACE, a novel framework that grounds VLM-based reasoning through executable analytic concepts (EAC), mathematically defined blueprints that encode object affordances, geometric constraints, and semantics of manipulation. Our approach integrates a structured policy scaffolding pipeline that turns natural language instructions and visual information into an instantiated EAC, from which we derive grasp poses and force directions and plan a physically feasible motion trajectory for robot execution. GRACE thus provides a unified and interpretable interface between high-level instruction understanding and low-level robot control, effectively enabling precise and generalizable manipulation through semantic-physical grounding. Extensive experiments demonstrate that GRACE achieves strong zero-shot generalization across a variety of articulated objects in both simulated and real-world environments, without requiring task-specific training.

Towards Proprioception-Aware Embodied Planning for Dual-Arm Humanoid Robots

Authors:Boyu Li, Siyuan He, Hang Xu, Haoqi Yuan, Yu Zang, Liwei Hu, Junpeng Yue, Zhenxiong Jiang, Pengbo Hu, Börje F. Karlsson, Yehui Tang, Zongqing Lu
Date:2025-10-09 07:35:12

In recent years, Multimodal Large Language Models (MLLMs) have demonstrated the ability to serve as high-level planners, enabling robots to follow complex human instructions. However, their effectiveness, especially in long-horizon tasks involving dual-arm humanoid robots, remains limited. This limitation arises from two main challenges: (i) the absence of simulation platforms that systematically support task evaluation and data collection for humanoid robots, and (ii) the insufficient embodiment awareness of current MLLMs, which hinders reasoning about dual-arm selection logic and body positions during planning. To address these issues, we present DualTHOR, a new dual-arm humanoid simulator, with continuous transition and a contingency mechanism. Building on this platform, we propose Proprio-MLLM, a model that enhances embodiment awareness by incorporating proprioceptive information with motion-based position embedding and a cross-spatial encoder. Experiments show that, while existing MLLMs struggle in this environment, Proprio-MLLM achieves an average improvement of 19.75% in planning performance. Our work provides both an essential simulation platform and an effective model to advance embodied intelligence in humanoid robotics. The code is available at https://anonymous.4open.science/r/DualTHOR-5F3B.

Effective and Stealthy One-Shot Jailbreaks on Deployed Mobile Vision-Language Agents

Authors:Renhua Ding, Xiao Yang, Zhengwei Fang, Jun Luo, Kun He, Jun Zhu
Date:2025-10-09 05:34:57

Large vision-language models (LVLMs) enable autonomous mobile agents to operate smartphone user interfaces, yet vulnerabilities to UI-level attacks remain critically understudied. Existing research often depends on conspicuous UI overlays, elevated permissions, or impractical threat models, limiting stealth and real-world applicability. In this paper, we present a practical and stealthy one-shot jailbreak attack that leverages in-app prompt injections: malicious applications embed short prompts in UI text that remain inert during human interaction but are revealed when an agent drives the UI via ADB (Android Debug Bridge). Our framework comprises three crucial components: (1) low-privilege perception-chain targeting, which injects payloads into malicious apps as the agent's visual inputs; (2) stealthy user-invisible activation, a touch-based trigger that discriminates agent from human touches using physical touch attributes and exposes the payload only during agent operation; and (3) one-shot prompt efficacy, a heuristic-guided, character-level iterative-deepening search algorithm (HG-IDA*) that performs one-shot, keyword-level detoxification to evade on-device safety filters. We evaluate across multiple LVLM backends, including closed-source services and representative open-source models within three Android applications, and we observe high planning and execution hijack rates in single-shot scenarios (e.g., GPT-4o: 82.5% planning / 75.0% execution). These findings expose a fundamental security vulnerability in current mobile agents with immediate implications for autonomous smartphone operation.

DEAS: DEtached value learning with Action Sequence for Scalable Offline RL

Authors:Changyeon Kim, Haeone Lee, Younggyo Seo, Kimin Lee, Yuke Zhu
Date:2025-10-09 03:11:09

Offline reinforcement learning (RL) presents an attractive paradigm for training intelligent agents without expensive online interactions. However, current approaches still struggle with complex, long-horizon sequential decision making. In this work, we introduce DEtached value learning with Action Sequence (DEAS), a simple yet effective offline RL framework that leverages action sequences for value learning. These temporally extended actions provide richer information than single-step actions and can be interpreted through the options framework via semi-Markov decision process Q-learning, enabling reduction of the effective planning horizon by considering longer sequences at once. However, directly adopting such sequences in actor-critic algorithms introduces excessive value overestimation, which we address through detached value learning that steers value estimates toward in-distribution actions that achieve high return in the offline dataset. We demonstrate that DEAS consistently outperforms baselines on complex, long-horizon tasks from OGBench and can be applied to enhance the performance of large-scale Vision-Language-Action models that predict action sequences, significantly boosting performance in both RoboCasa Kitchen simulation tasks and real-world manipulation tasks.
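
The semi-Markov view of an action sequence can be illustrated with the standard multi-step bootstrapped target: discounted rewards collected while the sequence executes, plus a discounted value estimate at the state where it ends. The sketch below shows only that generic target; it does not implement the detached value learning described above, and the function and parameter names are assumptions.

```python
def action_sequence_target(rewards, bootstrap_value, gamma=0.99):
    """Value target for one action sequence treated as a single extended action.

    rewards:         rewards observed at each step while the sequence executes
    bootstrap_value: value estimate at the state reached after the sequence
    gamma:           per-step discount factor
    """
    target = 0.0
    for step, reward in enumerate(rewards):
        target += (gamma ** step) * reward
    # One bootstrap per sequence: a length-n sequence discounts the next
    # value by gamma**n, which shortens the effective planning horizon.
    return target + (gamma ** len(rewards)) * bootstrap_value
```

For a sequence of length n over a task of H steps, the value function is bootstrapped roughly H / n times rather than H times.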

Probabilistically-Safe Bipedal Navigation over Uncertain Terrain via Conformal Prediction and Contraction Analysis

Authors:Kasidit Muenprasitivej, Ye Zhao, Glen Chou
Date:2025-10-09 03:03:09

We address the challenge of enabling bipedal robots to traverse rough terrain by developing probabilistically safe planning and control strategies that ensure dynamic feasibility and centroidal robustness under terrain uncertainty. Specifically, we propose a high-level Model Predictive Control (MPC) navigation framework for a bipedal robot with a specified confidence level of safety that (i) enables safe traversal toward a desired goal location across a terrain map with uncertain elevations, and (ii) formally incorporates uncertainty bounds into the centroidal dynamics of locomotion control. To model the rough terrain, we employ Gaussian Process (GP) regression to estimate elevation maps and leverage Conformal Prediction (CP) to construct calibrated confidence intervals that capture the true terrain elevation. Building on this, we formulate contraction-based reachable tubes that explicitly account for terrain uncertainty, ensuring state convergence and tube invariance. In addition, we introduce a contraction-based flywheel torque control law for the reduced-order Linear Inverted Pendulum Model (LIPM), which stabilizes the angular momentum about the center-of-mass (CoM). This formulation provides both probabilistic safety and goal reachability guarantees. For a given confidence level, we establish the forward invariance of the proposed torque control law by demonstrating exponential stabilization of the actual CoM phase-space trajectory and the desired trajectory prescribed by the high-level planner. Finally, we evaluate the effectiveness of our planning framework through physics-based simulations of the Digit bipedal robot in MuJoCo.
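
The calibrated intervals mentioned above can be illustrated with split conformal prediction in its simplest form: absolute residuals on a held-out calibration set give a radius that is added around any new prediction. This is a generic sketch rather than the paper's construction. The predictor is left abstract (the abstract uses GP regression), and the names below are assumptions.

```python
import math


def conformal_radius(cal_predictions, cal_truth, alpha=0.1):
    """Split-conformal radius from absolute calibration residuals.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest residual, or infinity
    when the calibration set is too small for the requested coverage.
    """
    scores = sorted(abs(y - p) for p, y in zip(cal_predictions, cal_truth))
    n = len(scores)
    # Small tolerance guards against floating-point noise in the product.
    rank = math.ceil((n + 1) * (1 - alpha) - 1e-9)
    if rank > n:
        return math.inf
    return scores[rank - 1]


def elevation_interval(prediction, radius):
    """Symmetric interval around a predicted terrain elevation."""
    return prediction - radius, prediction + radius
```

Under exchangeability of calibration and test points, an interval built this way covers the true value with probability at least 1 - alpha, regardless of how accurate the underlying predictor is.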

Multimodal Safety Evaluation in Generative Agent Social Simulations

Authors:Alhim Vera, Karen Sanchez, Carlos Hinojosa, Haidar Bin Hamid, Donghoon Kim, Bernard Ghanem
Date:2025-10-09 02:42:57

Can generative agents be trusted in multimodal environments? Despite advances in large language and vision-language models that enable agents to act autonomously and pursue goals in rich settings, their ability to reason about safety, coherence, and trust across modalities remains limited. We introduce a reproducible simulation framework for evaluating agents along three dimensions: (1) safety improvement over time, including iterative plan revisions in text-visual scenarios; (2) detection of unsafe activities across multiple categories of social situations; and (3) social dynamics, measured as interaction counts and acceptance ratios of social exchanges. Agents are equipped with layered memory, dynamic planning, and multimodal perception, and are instrumented with SocialMetrics, a suite of behavioral and structural metrics that quantifies plan revisions, unsafe-to-safe conversions, and information diffusion across networks. Experiments show that while agents can detect direct multimodal contradictions, they often fail to align local revisions with global safety, reaching only a 55 percent success rate in correcting unsafe plans. Across eight simulation runs with three models (Claude, GPT-4o mini, and Qwen-VL), five agents achieved average unsafe-to-safe conversion rates of 75, 55, and 58 percent, respectively. Overall performance ranged from 20 percent in multi-risk scenarios with GPT-4o mini to 98 percent in localized contexts such as fire/heat with Claude. Notably, 45 percent of unsafe actions were accepted when paired with misleading visuals, showing a strong tendency to overtrust images. These findings expose critical limitations in current architectures and provide a reproducible platform for studying multimodal safety, coherence, and social dynamics.

Space Logistics Analysis and Incentive Design for Commercialization of Orbital Debris Remediation

Authors:Asaad Abdul-Hamid, Brycen D. Pearl, Hang Woon Lee, Hao Chen
Date:2025-10-09 02:41:46

As orbital debris continues to become a higher priority for the space industry, there is a need to explore how partnerships between the public and private space sector may aid in addressing this issue. This research develops a space logistics framework for planning orbital debris remediation missions, providing a quantitative basis for partnerships that are mutually beneficial between space operators and debris remediators. By integrating network-based space logistics and game theory, we illuminate the high-level costs of remediating orbital debris, and the surplus that stands to be shared as a result. These findings indicate significant progress toward the continued development of a safe, sustainable, and profitable space economy.

Inspection Planning Primitives with Implicit Models

Authors:Jingyang You, Hanna Kurniawati, Lashika Medagoda
Date:2025-10-08 23:16:36

The aging and increasing complexity of infrastructures make efficient inspection planning more critical in ensuring safety. Thanks to sampling-based motion planning, many inspection planners are fast. However, they often require huge memory. This is particularly true when the structure under inspection is large and complex, consisting of many struts and pillars of various geometries and sizes. Such structures can be represented efficiently using implicit models, such as neural Signed Distance Functions (SDFs). However, most primitive computations used in sampling-based inspection planners have been designed to work efficiently with explicit environment models, which in turn requires the planner either to use explicit environment models or to perform frequent transformations between implicit and explicit environment models during planning. This paper proposes a set of primitive computations, called Inspection Planning Primitives with Implicit Models (IPIM), that enable sampling-based inspection planners to use neural SDF representations exclusively during planning. Evaluation on three scenarios, including inspection of a complex real-world structure with over 92M triangular mesh faces, indicates that even a rudimentary sampling-based planner with IPIM can generate inspection trajectories of similar quality to those generated by the state-of-the-art planner, while using up to 70x less memory than the state-of-the-art inspection planner.

AVO: Amortized Value Optimization for Contact Mode Switching in Multi-Finger Manipulation

Authors:Adam Hung, Fan Yang, Abhinav Kumar, Sergio Aguilera Marinovic, Soshi Iba, Rana Soltani Zarrin, Dmitry Berenson
Date:2025-10-08 21:03:14

Dexterous manipulation tasks often require switching between different contact modes, such as rolling, sliding, sticking, or non-contact contact modes. When formulating dexterous manipulation tasks as a trajectory optimization problem, a common approach is to decompose these tasks into sub-tasks for each contact mode, which are each solved independently. Optimizing each sub-task independently can limit performance, as optimizing contact points, contact forces, or other variables without information about future sub-tasks can place the system in a state from which it is challenging to make progress on subsequent sub-tasks. Further, optimizing these sub-tasks is very computationally expensive. To address these challenges, we propose Amortized Value Optimization (AVO), which introduces a learned value function that predicts the total future task performance. By incorporating this value function into the cost of the trajectory optimization at each planning step, the value function gradients guide the optimizer toward states that minimize the cost in future sub-tasks. This effectively bridges separately optimized sub-tasks and accelerates the optimization by reducing the amount of online computation needed. We validate AVO on a screwdriver grasping and turning task in both simulation and real-world experiments, and show improved performance even with 50% less computational budget compared to trajectory optimization without the value function.

ExpertAgent: Enhancing Personalized Education through Dynamic Planning and Retrieval-Augmented Long-Chain Reasoning

Authors:Binrong Zhu, Guiran Liu, Nina Jiang
Date:2025-10-08 19:03:34

The application of advanced generative artificial intelligence in education is often constrained by the lack of real-time adaptability, personalization, and reliability of the content. To address these challenges, we propose ExpertAgent, an intelligent agent framework designed for personalized education that provides reliable knowledge and enables highly adaptive, proactive learning experiences. ExpertAgent dynamically plans the learning content and strategy based on a continuously updated student model, overcoming the limitations of traditional static learning content to provide optimized teaching strategies and learning experiences in real time. All instructional content is grounded in a validated curriculum repository, effectively reducing hallucination risks in large language models and improving reliability and trustworthiness.

Quantum Grid Path Planning Using Parallel QAOA Circuits Based on Minimum Energy Principle

Authors:Jun Liu
Date:2025-10-08 18:09:52

To overcome the bottleneck of classical path planning schemes in solving NP problems and address the predicament faced by current mainstream quantum path planning frameworks in the Noisy Intermediate-Scale Quantum (NISQ) era, this study attempts to construct a quantum path planning solution based on parallel Quantum Approximate Optimization Algorithm (QAOA) architecture. Specifically, the grid path planning problem is mapped to the problem of finding the minimum quantum energy state. Two parallel QAOA circuits are built to simultaneously execute two solution processes, namely connectivity energy calculation and path energy calculation. A classical algorithm is employed to filter out unreasonable solutions of connectivity energy, and finally, the approximate optimal solution to the path planning problem is obtained by merging the calculation results of the two parallel circuits. The research findings indicate that by setting appropriate filter parameters, quantum states corresponding to position points with extremely low occurrence probabilities can be effectively filtered out, thereby increasing the probability of obtaining the target quantum state. Even when the circuit layer number p is only 1, the theoretical solution of the optimal path coding combination can still be found by leveraging the critical role of the filter. Compared with serial circuits, parallel circuits exhibit a significant advantage, as they can find the optimal feasible path coding combination with the highest probability.

Multi-Objective Multi-Agent Path Finding with Lexicographic Cost Preferences

Authors:Pulkit Rustagi, Kyle Hollins Wray, Sandhya Saisubramanian
Date:2025-10-08 17:40:41

Many real-world scenarios require multiple agents to coordinate in shared environments, while balancing trade-offs between multiple, potentially competing objectives. Current multi-objective multi-agent path finding (MO-MAPF) algorithms typically produce conflict-free plans by computing Pareto frontiers. They do not explicitly optimize for user-defined preferences, even when the preferences are available, and scale poorly with the number of objectives. We propose a lexicographic framework for modeling MO-MAPF, along with an algorithm, Lexicographic Conflict-Based Search (LCBS), that directly computes a single solution aligned with a lexicographic preference over objectives. LCBS integrates a priority-aware low-level A* search with conflict-based search, avoiding Pareto frontier construction and enabling efficient planning guided by preference over objectives. We provide insights into optimality and scalability, and empirically demonstrate that LCBS computes optimal solutions while scaling to instances with up to ten objectives, far beyond the limits of existing MO-MAPF methods. Evaluations on standard and randomized MAPF benchmarks show consistently higher success rates against state-of-the-art baselines, especially with increasing number of objectives.
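
A lexicographic preference orders cost vectors by the most important objective first and uses later objectives only to break ties. The short sketch below shows that ordering alone, using Python's built-in tuple comparison; it is not LCBS, and the plan names and objectives are made up for illustration.

```python
def lexicographic_best(candidates, preference):
    """Return the candidate whose costs are smallest in lexicographic order.

    candidates: list of (name, costs) pairs, where costs maps objective -> value
    preference: objective names, most important first
    """
    def ordered_costs(item):
        _, costs = item
        # Tuples compare element by element, which is exactly a lexicographic order.
        return tuple(costs[objective] for objective in preference)

    return min(candidates, key=ordered_costs)


# Hypothetical candidate plans with three objectives each.
plans = [
    ("p1", {"time": 10, "energy": 5, "risk": 2}),
    ("p2", {"time": 10, "energy": 4, "risk": 9}),
    ("p3", {"time": 11, "energy": 1, "risk": 0}),
]
```

Ranking by time, then energy, then risk selects `p2`; putting risk first selects `p3`. A single ordered comparison replaces the enumeration of all non-dominated cost vectors that a Pareto-frontier method would need.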

Test-Time Graph Search for Goal-Conditioned Reinforcement Learning

Authors:Evgenii Opryshko, Junwei Quan, Claas Voelcker, Yilun Du, Igor Gilitschenski
Date:2025-10-08 17:20:53

Offline goal-conditioned reinforcement learning (GCRL) trains policies that reach user-specified goals at test time, providing a simple, unsupervised, domain-agnostic way to extract diverse behaviors from unlabeled, reward-free datasets. Nonetheless, long-horizon decision making remains difficult for GCRL agents due to temporal credit assignment and error accumulation, and the offline setting amplifies these effects. To alleviate this issue, we introduce Test-Time Graph Search (TTGS), a lightweight planning approach to solve the GCRL task. TTGS accepts any state-space distance or cost signal, builds a weighted graph over dataset states, and performs fast search to assemble a sequence of subgoals that a frozen policy executes. When the base learner is value-based, the distance is derived directly from the learned goal-conditioned value function, so no handcrafted metric is needed. TTGS requires no changes to training, no additional supervision, no online interaction, and no privileged information, and it runs entirely at inference. On the OGBench benchmark, TTGS improves success rates of multiple base learners on challenging locomotion tasks, demonstrating the benefit of simple metric-guided test-time planning for offline GCRL.
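
The graph-search step described above can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function name `plan_subgoals`, the `max_edge` pruning threshold, and the use of Dijkstra's algorithm over a dense graph are assumptions. The `distance` argument stands in for any state-space distance or cost signal, such as one derived from a learned goal-conditioned value function.

```python
import heapq
import math

def plan_subgoals(states, distance, start, goal, max_edge):
    """Shortest path from start to goal through dataset states.

    Edges longer than max_edge are dropped, on the assumption that the
    distance signal is only trusted for nearby states. Returns the list of
    waypoints (start, subgoals..., goal), or None if goal is unreachable.
    """
    nodes = [start] + list(states) + [goal]
    n = len(nodes)
    dist = [math.inf] * n
    prev = [None] * n
    dist[0] = 0.0
    heap = [(0.0, 0)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue  # stale queue entry
        if i == n - 1:
            break  # goal settled
        for j in range(n):
            if j == i:
                continue
            w = distance(nodes[i], nodes[j])
            if w > max_edge:
                continue  # prune long, unreliable edges
            if d + w < dist[j]:
                dist[j] = d + w
                prev[j] = i
                heapq.heappush(heap, (d + w, j))
    if math.isinf(dist[n - 1]):
        return None
    path, i = [], n - 1
    while i is not None:
        path.append(nodes[i])
        i = prev[i]
    return path[::-1]

# Toy 1-D example with Euclidean distance standing in for a learned one.
dataset = [(1.0,), (2.0,), (10.0,)]
waypoints = plan_subgoals(dataset, math.dist, (0.0,), (3.0,), max_edge=1.5)
```

In a full pipeline, each intermediate waypoint would be handed in turn to the frozen goal-conditioned policy as its current goal.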

HyPlan: Hybrid Learning-Assisted Planning Under Uncertainty for Safe Autonomous Driving

Authors:Donald Pfaffmann, Matthias Klusch, Marcel Steinmetz
Date:2025-10-08 16:44:54

We present a novel hybrid learning-assisted planning method, named HyPlan, for solving the collision-free navigation problem for self-driving cars in partially observable traffic environments. HyPlan combines methods for multi-agent behavior prediction, deep reinforcement learning with proximal policy optimization, and approximated online POMDP planning with heuristic confidence-based vertical pruning to reduce its execution time without compromising driving safety. Our experimental performance analysis on the CARLA-CTS2 benchmark of critical traffic scenarios with pedestrians revealed that HyPlan may navigate more safely than selected relevant baselines and run significantly faster than the considered alternative online POMDP planners.

Estimating Real Demand Using a Flipped Queueing Model: A Case of Shared Micro-Mobility Services

Authors:Binyu Yang, Jinxiao Du, Junlin He, Shi An, Wei Ma
Date:2025-10-08 16:26:06

The spatial-temporal imbalance between supply and demand in shared micro-mobility services often leads to observed demand being censored, resulting in incomplete records of the underlying real demand. This phenomenon undermines the reliability of the collected demand data and hampers downstream applications such as demand forecasting, fleet management, and micro-mobility planning. How to accurately estimate the real demand is challenging and has not been well explored in existing studies. In view of this, we contribute to real demand estimation for shared micro-mobility services by proposing an analytical method that rigorously derives the real demand under appropriate assumptions. Rather than directly modeling the intractable relationship between observed demand and real demand, we propose a novel random variable, Generalized Vehicle Survival Time (GVST), which is observable from trip records. The relationship between GVST and real demand is characterized by introducing a flipped queueing model (FQM) that captures the operational dynamics of shared micro-mobility services. Specifically, the distribution of GVST is derived within the FQM, which allows the real demand estimation problem to be transformed into an inverse queueing problem. We analytically derive the real demand in closed form using a one-sided estimation method, and solve the problem by a system of equations in a two-sided estimation method. We validate the proposed methods using synthetic data and conduct empirical analyses using real-world datasets from bike-sharing and shared e-scooter systems. The experimental results show that both the two-sided and one-sided methods outperform benchmark models. In particular, the one-sided approach provides a closed-form solution that delivers acceptable accuracy, constituting a practical rule of thumb for demand-related analytics and decision-making processes.

Diffusing Trajectory Optimization Problems for Recovery During Multi-Finger Manipulation

Authors:Abhinav Kumar, Fan Yang, Sergio Aguilera Marinovic, Soshi Iba, Rana Soltani Zarrin, Dmitry Berenson
Date:2025-10-08 13:58:31

Multi-fingered hands are emerging as powerful platforms for performing fine manipulation tasks, including tool use. However, environmental perturbations or execution errors can impede task performance, motivating the use of recovery behaviors that enable normal task execution to resume. In this work, we take advantage of recent advances in diffusion models to construct a framework that autonomously identifies when recovery is necessary and optimizes contact-rich trajectories to recover. We use a diffusion model trained on the task to estimate when states are not conducive to task execution, framed as an out-of-distribution detection problem. We then use diffusion sampling to project these states in-distribution and use trajectory optimization to plan contact-rich recovery trajectories. We also propose a novel diffusion-based approach that distills this process to efficiently diffuse the full parameterization, including constraints, goal state, and initialization, of the recovery trajectory optimization problem, saving time during online execution. We compare our method to a reinforcement learning baseline and other methods that do not explicitly plan contact interactions, including on a hardware screwdriver-turning task where we show that recovering using our method improves task performance by 96% and that ours is the only method evaluated that can attempt recovery without causing catastrophic task failure. Videos can be found at https://dtourrecovery.github.io/.

Temporal-Prior-Guided View Planning for Periodic 3D Plant Reconstruction

Authors:Sicong Pan, Xuying Huang, Maren Bennewitz
Date:2025-10-08 13:57:29

Periodic 3D reconstruction is essential for crop monitoring, but costly when each cycle restarts from scratch, wasting resources and ignoring information from previous captures. We propose temporal-prior-guided view planning for periodic plant reconstruction, in which a previously reconstructed model of the same plant is non-rigidly aligned to a new partial observation to form an approximation of the current geometry. To accommodate plant growth, we inflate this approximation and solve a set covering optimization problem to compute a minimal set of views. We integrated this method into a complete pipeline that acquires one additional next-best view before registration for robustness and then plans a globally shortest path to connect the planned set of views and outputs the best view sequence. Experiments on maize and tomato under hemisphere and sphere view spaces show that our system maintains or improves surface coverage while requiring fewer views and comparable movement cost compared to state-of-the-art baselines.
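
The set covering step can be illustrated with a small sketch. The abstract does not say which solver is used, so the greedy heuristic below is a stand-in: it gives an approximate cover, not a guaranteed minimal one. The function name `greedy_view_cover` and the view and surface-patch identifiers are hypothetical.

```python
def greedy_view_cover(view_coverage, targets):
    """Greedy approximation to set cover over candidate views.

    view_coverage maps a view id to the set of surface elements it sees.
    Returns (chosen_views, uncovered), where uncovered holds any targets
    that no candidate view can see.
    """
    uncovered = set(targets)
    chosen = []
    while uncovered:
        # Pick the view that sees the most still-uncovered elements.
        best = max(view_coverage, key=lambda v: len(view_coverage[v] & uncovered))
        gain = view_coverage[best] & uncovered
        if not gain:
            break  # remaining targets are not visible from any view
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered

# Hypothetical candidate views and the surface patches each one observes.
views = {
    "v1": {1, 2, 3},
    "v2": {3, 4},
    "v3": {4, 5, 6},
    "v4": {1, 6},
}
chosen, missed = greedy_view_cover(views, range(1, 7))
```

An exact minimum would require an integer programming formulation; the greedy rule is shown only because it is short and makes the covering objective concrete.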

Smart Contract Adoption in Derivative Markets under Bounded Risk: An Optimization Approach

Authors:Jinho Cha, Long Pham, Thi Le Hoa Vo, Jaeyoung Cho, Jaejin Lee
Date:2025-10-08 13:28:55

This study develops and analyzes an optimization model of smart contract adoption under bounded risk, linking structural theory with simulation and real-world validation. We examine how adoption intensity alpha is structurally pinned at a boundary solution, invariant to variance and heterogeneity, while profitability and service outcomes are variance-fragile, eroding under volatility and heavy-tailed demand. A sharp threshold in the fixed cost parameter A3 triggers discontinuous adoption collapse (H1), variance shocks reduce profits monotonically but not adoption (H2), and additional results on readiness heterogeneity (H3), profit-service co-benefits (H4), and distributional robustness (H5) confirm the duality between stable adoption and fragile payoffs. External validity checks further establish convergence of sample average approximation at the canonical O(1/sqrt(N)) rate (H6). Empirical validation using S&P 500 returns and the MovieLens100K dataset corroborates the theoretical structure: bounded and heavy-tailed distributions fit better than Gaussian models, and profits diverge across volatility regimes even as adoption remains stable. Taken together, the results demonstrate that adoption choices are robust to uncertainty, but their financial consequences are highly fragile. For operations and finance, this duality underscores the need for risk-adjusted performance evaluation, option-theoretic modeling, and distributional stress testing in strategic investment and supply chain design.

Falsification-Driven Reinforcement Learning for Maritime Motion Planning

Authors:Marlon Müller, Florian Finkeldei, Hanna Krasowski, Murat Arcak, Matthias Althoff
Date:2025-10-08 12:56:31

Compliance with maritime traffic rules is essential for the safe operation of autonomous vessels, yet training reinforcement learning (RL) agents to adhere to them is challenging. The behavior of RL agents is shaped by the training scenarios they encounter, but creating scenarios that capture the complexity of maritime navigation is non-trivial, and real-world data alone is insufficient. To address this, we propose a falsification-driven RL approach that generates adversarial training scenarios in which the vessel under test violates maritime traffic rules, which are expressed as signal temporal logic specifications. Our experiments on open-sea navigation with two vessels demonstrate that the proposed approach provides more relevant training scenarios and achieves more consistent rule compliance.

DecompGAIL: Learning Realistic Traffic Behaviors with Decomposed Multi-Agent Generative Adversarial Imitation Learning

Authors:Ke Guo, Haochen Liu, Xiaojun Wu, Chen Lv
Date:2025-10-08 11:46:39

Realistic traffic simulation is critical for the development of autonomous driving systems and urban mobility planning, yet existing imitation learning approaches often fail to model realistic traffic behaviors. Behavior cloning suffers from covariate shift, while Generative Adversarial Imitation Learning (GAIL) is notoriously unstable in multi-agent settings. We identify a key source of this instability: irrelevant interaction misguidance, where a discriminator penalizes an ego vehicle's realistic behavior due to unrealistic interactions among its neighbors. To address this, we propose Decomposed Multi-agent GAIL (DecompGAIL), which explicitly decomposes realism into ego-map and ego-neighbor components, filtering out misleading neighbor-neighbor and neighbor-map interactions. We further introduce a social PPO objective that augments ego rewards with distance-weighted neighborhood rewards, encouraging overall realism across agents. Integrated into a lightweight SMART-based backbone, DecompGAIL achieves state-of-the-art performance on the WOMD Sim Agents 2025 benchmark.

Adaptive Semantic Communication for UAV/UGV Cooperative Path Planning

Authors:Fangzhou Zhao, Yao Sun, Jianglin Lan, Lan Zhang, Xuesong Liu, Muhammad Ali Imran
Date:2025-10-08 11:30:57

Effective path planning is fundamental to the coordination of unmanned aerial vehicle (UAV) and unmanned ground vehicle (UGV) systems, particularly in applications such as surveillance, navigation, and emergency response. Combining UAVs' broad field of view with UGVs' ground-level operational capability greatly improves the likelihood of successfully achieving task objectives such as locating victims, monitoring target areas, or navigating hazardous terrain. In complex environments, UAVs need to provide precise environmental perception information for UGVs to optimize their routing policy. However, due to severe interference and non-line-of-sight conditions, wireless communication is often unstable in such complex environments, making it difficult to support timely and accurate path planning for UAV-UGV coordination. To this end, this paper proposes a semantic communication (SemCom) framework to enhance UAV/UGV cooperative path planning under unreliable wireless conditions. Unlike traditional methods that transmit raw data, SemCom transmits only the key information for path planning, reducing transmission volume without sacrificing accuracy. The proposed framework is developed by defining key semantics for path planning and designing a transceiver that meets the requirements of UAV-UGV cooperative path planning. Simulation results show that, compared to conventional SemCom transceivers, the proposed transceiver significantly reduces data transmission volume while maintaining path planning accuracy, thereby enhancing system collaboration efficiency.