planning - 2025-04-24

Latent Diffusion Planning for Imitation Learning

Authors:Amber Xie, Oleh Rybkin, Dorsa Sadigh, Chelsea Finn

Date:2025-04-23 17:53:34

Recent progress in imitation learning has been enabled by policy architectures that scale to complex visuomotor tasks, multimodal distributions, and large datasets. However, these methods often rely on learning from large amounts of expert demonstrations. To address these shortcomings, we propose Latent Diffusion Planning (LDP), a modular approach consisting of a planner which can leverage action-free demonstrations, and an inverse dynamics model which can leverage suboptimal data, that both operate over a learned latent space. First, we learn a compact latent space through a variational autoencoder, enabling effective forecasting of future states in image-based domains. Then, we train a planner and an inverse dynamics model with diffusion objectives. By separating planning from action prediction, LDP can benefit from the denser supervision signals of suboptimal and action-free data. On simulated visual robotic manipulation tasks, LDP outperforms state-of-the-art imitation learning approaches, as they cannot leverage such additional data.

Zero-shot Sim-to-Real Transfer for Reinforcement Learning-based Visual Servoing of Soft Continuum Arms

Authors:Hsin-Jung Yang, Mahsa Khosravi, Benjamin Walt, Girish Krishnan, Soumik Sarkar

Date:2025-04-23 17:41:55

The soft and deformable nature of soft continuum arms (SCAs) presents challenges in modeling and control due to their infinite degrees of freedom and non-linear behavior. This work introduces a reinforcement learning (RL)-based framework for visual servoing tasks on SCAs with zero-shot sim-to-real transfer capabilities, demonstrated on a single-section pneumatic manipulator capable of bending and twisting. The framework decouples kinematics from mechanical properties using an RL kinematic controller for motion planning and a local controller for actuation refinement, leveraging minimal sensing with visual feedback. Trained entirely in simulation, the RL controller achieved a 99.8% success rate. When deployed on hardware, it achieved a 67% success rate in zero-shot sim-to-real transfer, demonstrating robustness and adaptability. This approach offers a scalable solution for SCAs in 3D visual servoing, with potential for further refinement and expanded applications.

Computing Optimal Transport Plans via Min-Max Gradient Flows

Authors:Lauren Conger, Franca Hoffmann, Ricardo Baptista, Eric Mazumdar

Date:2025-04-23 17:11:34

We pose the Kantorovich optimal transport problem as a min-max problem with a Nash equilibrium that can be obtained dynamically via a two-player game, providing a framework for approximating optimal couplings. We prove convergence of the timescale-separated gradient descent dynamics to the optimal transport plan, and implement the gradient descent algorithm with a particle method, where the marginal constraints are enforced weakly using the Kullback-Leibler (KL) divergence, automatically selecting a dynamical adaptation of the regularizer. The numerical results highlight the different advantages of using the standard KL divergence versus the reverse KL divergence with this approach, opening the door for new methodologies.
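
For orientation, the Kantorovich problem and a standard Lagrangian saddle-point form can be written as below. This is the textbook formulation, not necessarily the objective used in the paper, which enforces the marginal constraints weakly through a KL term whose exact form the abstract does not give. The symbols $c$ (cost), $\mu, \nu$ (marginals), $\pi$ (coupling), and $\varphi, \psi$ (multipliers) are notation assumed here.

```latex
% Kantorovich problem: minimize transport cost over couplings of \mu and \nu
\min_{\pi \in \Pi(\mu,\nu)} \int c(x,y)\, d\pi(x,y),
\qquad
\Pi(\mu,\nu) = \{\pi \ge 0 : \pi \text{ has marginals } \mu \text{ and } \nu\}.

% Saddle-point form: multipliers \varphi, \psi enforce the two marginal constraints
\min_{\pi \ge 0}\; \max_{\varphi,\psi}\;
\int c(x,y)\, d\pi(x,y)
+ \int \varphi(x)\, d\mu(x) - \int \varphi(x)\, d\pi(x,y)
+ \int \psi(y)\, d\nu(y) - \int \psi(y)\, d\pi(x,y).
```

If either marginal of $\pi$ differs from its target, the inner maximum is unbounded, so the outer minimization is restricted to valid couplings.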

Physically Consistent Humanoid Loco-Manipulation using Latent Diffusion Models

Authors:Ilyass Taouil, Haizhou Zhao, Angela Dai, Majid Khadiv

Date:2025-04-23 16:07:02

This paper uses the capabilities of latent diffusion models (LDMs) to generate realistic RGB human-object interaction scenes to guide humanoid loco-manipulation planning. To do so, we extract from the generated images both the contact locations and robot configurations that are then used inside a whole-body trajectory optimization (TO) formulation to generate physically consistent trajectories for humanoids. We validate our full pipeline in simulation for different long-horizon loco-manipulation scenarios and perform an extensive analysis of the proposed contact and robot configuration extraction pipeline. Our results show that using the information extracted from LDMs, we can generate physically consistent trajectories that require long-horizon reasoning.

Evaluating the Impact of CT-to-RED Calibration Curves on Dosimetric Accuracy in Brain Radiotherapy Dose Distribution

Authors:Islam G. Ali, Wael M. Daabis, Hossam Donya

Date:2025-04-23 15:24:57

Accurate dose calculation is crucial in radiotherapy, as tissue relative electron densities (RED) derived from CT scans play a vital role. This study investigated the impact of different CT-to-RED calibration curves on brain cancer treatment plans. Three calibration curves were compared: CIRS phantom-derived, Catphan phantom-derived, and the default curve in the Monaco Treatment Planning System. Ten volumetric modulated arc therapy (VMAT) plans were generated and recalculated using each curve. Dosimetric parameters for Planning Target Volume (PTV) and Organs at Risk (OARs) were analyzed. Results showed significant differences in PTV dose distribution between the CIRS-derived and default curves, while no significant differences were found between Catphan-derived and default curves. The CIRS-derived curve demonstrated superior performance in representing brain tissue electron densities. These findings emphasize the importance of using site-specific CT-to-RED calibration curves for accurate dose calculations in brain radiotherapy, potentially improving treatment safety and efficacy.

Credible plan-driven RAG method for Multi-hop Question Answering

Authors:Ningning Zhang, Chi Zhang, Zhizhong Tan, Xingxing Yang, Weiping Deng, Wenyong Wang

Date:2025-04-23 15:03:17

Multi-hop question answering (QA) presents a considerable challenge for Retrieval-Augmented Generation (RAG), requiring the structured decomposition of complex queries into logical reasoning paths and the generation of dependable intermediate results. However, deviations in reasoning paths or errors in intermediate results, which are common in current RAG methods, may propagate and accumulate throughout the reasoning process, diminishing the accuracy of the answer to complex queries. To address this challenge, we propose the Plan-then-Act-and-Review (PAR RAG) framework, which is organized into three key stages: planning, act, and review, and aims to offer an interpretable and incremental reasoning paradigm for accurate and reliable multi-hop question answering by mitigating error propagation. PAR RAG initially applies a top-down problem decomposition strategy, formulating a comprehensive plan that integrates multiple executable steps from a holistic viewpoint. This approach avoids the pitfalls of local optima common in traditional RAG methods, ensuring the accuracy of the entire reasoning path. Subsequently, PAR RAG incorporates a plan execution mechanism based on multi-granularity verification. By utilizing both coarse-grained similarity information and fine-grained relevant data, the framework thoroughly checks and adjusts intermediate results, ensuring process accuracy while effectively managing error propagation and amplification. Experimental results on multi-hop QA datasets demonstrate that the PAR RAG framework substantially outperforms existing state-of-the-art methods in key metrics, including EM and F1 scores.
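
The EM and F1 metrics named above are commonly computed per answer in the SQuAD style, as sketched below. This is a minimal illustration under that assumption; the abstract does not say which normalization the authors used, and the function names are hypothetical.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

A dataset-level score is then typically the mean of these per-question values, taking the maximum over references when a question has several gold answers.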

MOSAIC: A Skill-Centric Algorithmic Framework for Long-Horizon Manipulation Planning

Authors:Itamar Mishani, Yorai Shaoul, Maxim Likhachev

Date:2025-04-23 14:09:42

Planning long-horizon motions using a set of predefined skills is a key challenge in robotics and AI. Addressing this challenge requires methods that systematically explore skill combinations to uncover task-solving sequences, harness generic, easy-to-learn skills (e.g., pushing, grasping) to generalize across unseen tasks, and bypass reliance on symbolic world representations that demand extensive domain and task-specific knowledge. Despite significant progress, these elements remain largely disjoint in existing approaches, leaving a critical gap in achieving robust, scalable solutions for complex, long-horizon problems. In this work, we present MOSAIC, a skill-centric framework that unifies these elements by using the skills themselves to guide the planning process. MOSAIC uses two families of skills: Generators compute executable trajectories and world configurations, and Connectors link these independently generated skill trajectories by solving boundary value problems, enabling progress toward completing the overall task. By breaking away from the conventional paradigm of incrementally discovering skills from predefined start or goal states--a limitation that significantly restricts exploration--MOSAIC focuses planning efforts on regions where skills are inherently effective. We demonstrate the efficacy of MOSAIC in both simulated and real-world robotic manipulation tasks, showcasing its ability to solve complex long-horizon planning problems using a diverse set of skills incorporating generative diffusion models, motion planning algorithms, and manipulation-specific models. Visit https://skill-mosaic.github.io for demonstrations and examples.

DYNUS: Uncertainty-aware Trajectory Planner in Dynamic Unknown Environments

Authors:Kota Kondo, Mason Peterson, Nicholas Rober, Juan Rached Viso, Lucas Jia, Jialin Chen, Harvey Merton, Jonathan P. How

Date:2025-04-23 14:05:04

This paper introduces DYNUS, an uncertainty-aware trajectory planner designed for dynamic unknown environments. Operating in such settings presents many challenges -- most notably, because the agent cannot predict the ground-truth future paths of obstacles, a previously planned trajectory can become unsafe at any moment, requiring rapid replanning to avoid collisions. Recently developed planners have used soft-constraint approaches to achieve the necessary fast computation times; however, these methods do not guarantee collision-free paths even with static obstacles. In contrast, hard-constraint methods ensure collision-free safety, but typically have longer computation times. To address these issues, we propose three key contributions. First, the DYNUS Global Planner (DGP) and Temporal Safe Corridor Generation operate in spatio-temporal space and handle both static and dynamic obstacles in the 3D environment. Second, the Safe Planning Framework leverages a combination of exploratory, safe, and contingency trajectories to flexibly re-route when potential future collisions with dynamic obstacles are detected. Finally, the Fast Hard-Constraint Local Trajectory Formulation uses a variable elimination approach to reduce the problem size and enable faster computation by pre-computing dependencies between free and dependent variables while still ensuring collision-free trajectories. We evaluated DYNUS in a variety of simulations, including dense forests, confined office spaces, cave systems, and dynamic environments. Our experiments show that DYNUS achieves a success rate of 100% and travel times that are approximately 25.0% faster than state-of-the-art methods. We also evaluated DYNUS on multiple platforms -- a quadrotor, a wheeled robot, and a quadruped -- in both simulation and hardware experiments.

PP-Tac: Paper Picking Using Tactile Feedback in Dexterous Robotic Hands

Authors:Pei Lin, Yuzhe Huang, Wanlin Li, Jianpeng Ma, Chenxi Xiao, Ziyuan Jiao

Date:2025-04-23 12:10:11

Robots are increasingly envisioned as human companions, assisting with everyday tasks that often involve manipulating deformable objects. Although recent advances in robotic hardware and embodied AI have expanded their capabilities, current systems still struggle with handling thin, flat, and deformable objects such as paper and fabric. This limitation arises from the lack of suitable perception techniques for robust state estimation under diverse object appearances, as well as the absence of planning techniques for generating appropriate grasp motions. To bridge these gaps, this paper introduces PP-Tac, a robotic system for picking up paper-like objects. PP-Tac features a multi-fingered robotic hand with high-resolution omnidirectional tactile sensors. This hardware configuration enables real-time slip detection and online frictional force control that mitigates such slips. Furthermore, grasp motion generation is achieved through a trajectory synthesis pipeline, which first constructs a dataset of finger pinching motions. Based on this dataset, a diffusion-based policy is trained to control the hand-arm robotic system. Experiments demonstrate that PP-Tac can effectively grasp paper-like objects of varying material, thickness, and stiffness, achieving an overall success rate of 87.5%. To our knowledge, this work is the first attempt to grasp paper-like deformable objects using a tactile dexterous hand. Our project webpage can be found at: https://peilin-666.github.io/projects/PP-Tac/

Path Matters: Industrial Data Meet Quantum Optimization

Authors:Lukas Schmidbauer, Carlos A. Riofrío, Florian Heinrich, Vanessa Junk, Ulrich Schwenk, Thomas Husslein, Wolfgang Mauerer

Date:2025-04-23 10:45:38

Real-world optimization problems must undergo a series of transformations before becoming solvable on current quantum hardware. Even for a fixed problem, the number of possible transformation paths -- from industry-relevant formulations through binary constrained linear programs (BILPs), to quadratic unconstrained binary optimization (QUBO), and finally to a hardware-executable representation -- is remarkably large. Each step introduces free parameters, such as Lagrange multipliers, encoding strategies, slack variables, rounding schemes or algorithmic choices -- making brute-force exploration of all paths intractable. In this work, we benchmark a representative subset of these transformation paths using a real-world industrial production planning problem with industry data: the optimization of work allocation in a press shop producing vehicle parts. We focus on QUBO reformulations and algorithmic parameters for both quantum annealing (QA) and the Linear Ramp Quantum Approximate Optimization Algorithm (LR-QAOA). Our goal is to identify a reduced set of effective configurations applicable to similar industrial settings. Our results show that QA on D-Wave hardware consistently produces near-optimal solutions, whereas LR-QAOA on IBM quantum devices struggles to reach comparable performance. Hence, the choice of hardware and solver strategy significantly impacts performance. The problem formulation and especially the penalization strategy determine the solution quality. Most importantly, mathematically defined penalization strategies are as successful as hand-picked penalty factors, paving the way for automated QUBO formulation. Moreover, we observe a strong correlation between simulated and quantum annealing performance metrics, offering a scalable proxy for predicting QA behavior on larger problem instances.
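
To make the penalization step concrete, the sketch below turns a toy constrained binary problem (pick exactly k items at minimum cost) into a QUBO by adding a quadratic penalty, then checks it by brute force. The toy problem, the function names, and the hand-picked penalty value are all assumptions for illustration; they are not the press-shop formulation or the penalization strategies benchmarked in the paper.

```python
from itertools import product


def build_qubo(costs, k, penalty):
    """Upper-triangular QUBO for: minimize sum(c_i x_i) subject to sum(x_i) == k.

    Uses penalty * (sum(x) - k)^2 and the identity x_i^2 == x_i for binaries:
    (sum(x) - k)^2 = sum_i (1 - 2k) x_i + 2 sum_{i<j} x_i x_j + k^2.
    """
    n = len(costs)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Q[i][i] = costs[i] + penalty * (1 - 2 * k)
        for j in range(i + 1, n):
            Q[i][j] = 2 * penalty
    offset = penalty * k * k  # constant term, irrelevant to the argmin
    return Q, offset


def qubo_energy(Q, offset, x):
    """Evaluate offset + sum_{i<=j} Q[i][j] x_i x_j."""
    n = len(x)
    return offset + sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))


def brute_force(Q, offset):
    """Enumerate all assignments; only viable for tiny instances."""
    n = len(Q)
    return min(product((0, 1), repeat=n), key=lambda x: qubo_energy(Q, offset, x))


costs = [3.0, 1.0, 4.0, 2.0]
Q, offset = build_qubo(costs, k=2, penalty=10.0)  # penalty chosen by hand
best = brute_force(Q, offset)
```

If the penalty is too small, an infeasible assignment can have lower energy than every feasible one, which is why the choice of penalty factor is one of the free parameters the abstract refers to.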

Partitioning of multiple brain metastases improves dose gradients in single-isocenter radiosurgery

Authors:Johan Sundström, Anton Finnson, Elin Hynning, Geert De Kerf, Albin Fredriksson

Date:2025-04-23 09:02:57

Background: A growing number of cancer patients with brain metastases can benefit from stereotactic radiosurgery (SRS) thanks to recent advances in systemic therapies. With an increasing patient load, single-isocenter treatments on widely available C-arm linear accelerators are an attractive option. However, the planning of such treatments is challenging for multi-target cases due to the island blocking problem, which occurs when the multi-leaf collimator cannot conform to all targets simultaneously. Purpose: We propose a multi-target partitioning algorithm that mitigates excessive exposure of normal tissue caused by the island blocking problem. Methods: The algorithm divides (partitions) the set of targets into subsets to treat with separate arc passes, optimizing both subsets and collimator angles to minimize island blocking. The algorithm was incorporated into a fully automated treatment planning script and evaluated on 20 simulated patient cases, each with 10 brain metastases and 21 Gy prescriptions. It was also retrospectively evaluated on six clinical cases. Results: Partitioning significantly improved the gradient index, global efficiency index, and brain V12Gy compared to simultaneous treatment of all metastases. For example, the average gradient index improved from 5.9 to 3.3, global efficiency index from 0.32 to 0.46, and normal brain V12Gy from 49 cm³ to 26 cm³ between 3 and 9 arcs. The proposed algorithm outperformed baselines in utilizing a limited number of arcs. All target partitioning strategies increased the total number of monitor units (MUs). Conclusions: The dose gradient in single-isocenter VMAT plans can be substantially improved by treating a smaller subset of metastases at a time. This requires more MUs and arcs, implying a trade-off between delivery time and plan quality which can be explored using the algorithm proposed in this paper.

TraveLLaMA: Facilitating Multi-modal Large Language Models to Understand Urban Scenes and Provide Travel Assistance

Authors:Meng Chu, Yukang Chen, Haokun Gui, Shaozuo Yu, Yi Wang, Jiaya Jia

Date:2025-04-23 08:32:25

Tourism and travel planning increasingly rely on digital assistance, yet existing multimodal AI systems often lack specialized knowledge and contextual understanding of urban environments. We present TraveLLaMA, a specialized multimodal language model designed for urban scene understanding and travel assistance. Our work addresses the fundamental challenge of developing practical AI travel assistants through a novel large-scale dataset of 220k question-answer pairs. This comprehensive dataset uniquely combines 130k text QA pairs meticulously curated from authentic travel forums with GPT-enhanced responses, alongside 90k vision-language QA pairs specifically focused on map understanding and scene comprehension. Through extensive fine-tuning experiments on state-of-the-art vision-language models (LLaVA, Qwen-VL, Shikra), we demonstrate significant performance improvements ranging from 6.5% to 9.4% in both pure text travel understanding and visual question answering tasks. Our model exhibits exceptional capabilities in providing contextual travel recommendations, interpreting map locations, and understanding place-specific imagery while offering practical information such as operating hours and visitor reviews. Comparative evaluations show TraveLLaMA significantly outperforms general-purpose models in travel-specific tasks, establishing a new benchmark for multi-modal travel assistance systems.

Circinus: Efficient Query Planner for Compound ML Serving

Authors:Banruo Liu, Wei-Yu Lin, Minghao Fang, Yihan Jiang, Fan Lai

Date:2025-04-23 03:57:24

The rise of compound AI serving -- integrating multiple operators in a pipeline that may span edge and cloud tiers -- enables end-user applications such as autonomous driving, generative AI-powered meeting companions, and immersive gaming. Achieving high service goodput -- i.e., meeting service level objectives (SLOs) for pipeline latency, accuracy, and costs -- requires effective planning of operator placement, configuration, and resource allocation across infrastructure tiers. However, the diverse SLO requirements, varying edge capabilities, and high query volumes create an enormous planning search space, rendering current solutions fundamentally limited for real-time serving and cost-efficient deployments. This paper presents Circinus, an SLO-aware query planner for large-scale compound AI workloads. Circinus novelly decomposes multi-query planning and multi-dimensional SLO objectives while preserving global decision quality. By exploiting plan similarities within and across queries, it significantly reduces search steps. It further improves per-step efficiency with a precision-aware plan profiler that incrementally profiles and strategically applies early stopping based on imprecise estimates of plan performance. At scale, Circinus selects query-plan combinations to maximize global SLO goodput. Evaluations in real-world settings show that Circinus improves service goodput by 3.2-5.0×, accelerates query planning by 4.2-5.8×, achieving query response in seconds, while reducing deployment costs by 3.2-4.0× over the state of the art even in their intended single-tier deployments.

Distributed Space Resource Logistics Architecture Optimization under Economies of Scale

Authors:Evangelia Gkaravela, Hang Woon Lee, Hao Chen

Date:2025-04-23 03:26:43

This paper proposes an optimization framework for distributed resource logistics system design to support future multimission space exploration. The performance and impact of distributed In-Situ Resource Utilization (ISRU) systems in facilitating space transportation are analyzed. The proposed framework considers technology trade studies, deployment strategy, facility location evaluation, and resource logistics after production for distributed ISRU systems. We develop piecewise linear sizing and cost estimation models based on economies of scale that can be easily integrated into network-based mission planning formulations. A case study on a multi-mission cislunar logistics campaign is conducted to demonstrate the value of the proposed method and evaluate key tradeoffs to compare the performance of distributed ISRU systems with traditional concentrated ISRU. Finally, a comprehensive sensitivity analysis is performed to assess the proposed system under varying conditions, comparing concentrated and distributed ISRU systems.
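
As a rough illustration of a piecewise linear cost model under economies of scale, the sketch below interpolates a concave power-law cost curve between a few breakpoints. The power-law form, its coefficients, and the breakpoints are hypothetical placeholders chosen for the example; the abstract does not state the paper's actual sizing or cost models.

```python
from bisect import bisect_right


def power_law_cost(size, a=100.0, b=0.6):
    """Hypothetical concave cost curve: unit cost falls as size grows (b < 1)."""
    return a * size ** b


def make_piecewise_linear(breakpoints, func):
    """Return a function that linearly interpolates func between sorted breakpoints."""
    values = [func(s) for s in breakpoints]

    def approx(size):
        if not breakpoints[0] <= size <= breakpoints[-1]:
            raise ValueError("size outside breakpoint range")
        # Index of the right end of the segment containing size.
        i = min(bisect_right(breakpoints, size), len(breakpoints) - 1)
        s0, s1 = breakpoints[i - 1], breakpoints[i]
        t = (size - s0) / (s1 - s0)
        return values[i - 1] + t * (values[i] - values[i - 1])

    return approx


pwl_cost = make_piecewise_linear([1.0, 2.0, 5.0, 10.0], power_law_cost)
```

Because the underlying curve is concave, each linear segment lies on or below it, so the approximation matches the curve at the breakpoints and underestimates cost in between; adding breakpoints tightens the fit.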

SILM: A Subjective Intent Based Low-Latency Framework for Multiple Traffic Participants Joint Trajectory Prediction

Authors:Qu Weiming, Wang Jia, Du Jiawei, Zhu Yuanhao, Yu Jianfeng, Xia Rui, Cao Song, Wu Xihong, Luo Dingsheng

Date:2025-04-23 02:56:34

Trajectory prediction is a fundamental technology for advanced autonomous driving systems and represents one of the most challenging problems in the field of cognitive intelligence. Accurately predicting the future trajectories of each traffic participant is a prerequisite for building high-safety, high-reliability decision-making, planning, and control capabilities in autonomous driving. However, existing methods often focus solely on the motion of other traffic participants without considering the underlying intent behind that motion, which increases the uncertainty in trajectory prediction. Autonomous vehicles operate in real-time environments, meaning that trajectory prediction algorithms must be able to process data and generate predictions in real time. While many existing methods achieve high accuracy, they often struggle to effectively handle heterogeneous traffic scenarios. In this paper, we propose a Subjective Intent-based Low-latency framework for Multiple traffic participants joint trajectory prediction. Our method explicitly incorporates the subjective intent of traffic participants based on their key points, and predicts the future trajectories jointly without a map, which ensures promising performance while significantly reducing the prediction latency. Additionally, we introduce a novel dataset designed specifically for trajectory prediction. Related code and dataset will be available soon.

GENCNIPPET: Automated Generation of Code Snippets for Supporting Programming Questions

Authors:Saikat Mondal, Chanchal K. Roy

Date:2025-04-22 22:07:40

Context: Software developers often ask questions on Technical Q&A forums like Stack Overflow (SO) to seek solutions to their programming-related problems (e.g., errors and unexpected behavior of code). Problem: Many questions miss required code snippets due to the lack of readily available code, time constraints, employer restrictions, confidentiality concerns, or uncertainty about what code to share. Unfortunately, missing but required code snippets prevent questions from getting prompt and appropriate solutions. Objective: We plan to introduce GENCNIPPET, a tool designed to integrate with SO's question submission system. GENCNIPPET will generate relevant code examples (when required) to support questions for their timely solutions. Methodology: We first downloaded the SO April 2024 data dump, which contains 1.94 million questions related to Python that have code snippets and 1.43 million questions related to Java. Then, we filter these questions to identify those that genuinely require code snippets using a state-of-the-art machine learning model. Next, we select questions with positive scores to ensure high-quality data. Our plan is to fine-tune Llama-3 models (e.g., Llama-3-8B), using 80% of the selected questions for training and 10% for validation. The primary reasons for choosing Llama models are their open-source accessibility and robust fine-tuning capabilities, which are essential for deploying a freely accessible tool. GENCNIPPET will be integrated with the SO question submission system as a browser plugin. It will communicate with the fine-tuned model to generate code snippets tailored to the target questions. The effectiveness of the generated code examples will be assessed using automatic evaluation against ground truth, user perspectives, and live (wild) testing in real-world scenarios.

HTN Plan Repair Algorithms Compared: Strengths and Weaknesses of Different Methods

Authors:Paul Zaidins, Robert P. Goldman, Ugur Kuter, Dana Nau, Mark Roberts

Date:2025-04-22 18:55:26

This paper provides theoretical and empirical comparisons of three recent hierarchical plan repair algorithms: SHOPFixer, IPyHOPPER, and Rewrite. Our theoretical results show that the three algorithms correspond to three different definitions of the plan repair problem, leading to differences in the algorithms' search spaces, the repair problems they can solve, and the kinds of repairs they can make. Understanding these distinctions is important when choosing a repair method for any given application. Building on the theoretical results, we evaluate the algorithms empirically in a series of benchmark planning problems. Our empirical results provide more detailed insight into the runtime repair performance of these systems and the coverage of the repair problems solved, based on algorithmic properties such as replanning, chronological backtracking, and backjumping over plan trees.

VR-based Intervention for Perspective Change: A Case to Investigate Virtual Materiality

Authors:Ali Arya, Anthony Scavarelli, Dan Hawes, Luciara Nardon

Date:2025-04-22 16:53:21

This paper addresses the concept of materiality in virtual environments, which we define as being composed of objects that can influence user experience actively. Such virtual materiality is closely related to its physical counterpart, which is discussed in theoretical frameworks such as sociomateriality and actor-network theory. They define phenomena in terms of the entanglement of human and non-human elements. We report on an early investigation of virtual materiality within the context of reflection and perspective change in nature-based virtual environments. We considered the case of university students reflecting on the planning and management of their theses and major projects. Inspired by nature's known positive cognitive and affective effects and repeated questioning processes, we established a virtual reflection intervention to demonstrate the environmental mechanisms and material characteristics relevant to virtual materiality. Our work is a preliminary step toward understanding virtual materiality and its implications for research and the design of virtual environments.

Monocular inspection of spacecraft under illumination constraints and avoidance regions

Authors:Tochukwu Elijah Ogri, Muzaffar Qureshi, Zachary I. Bell, Matthew Longmire, Rushikesh Kamalapurkar

Date:2025-04-22 14:50:09

This paper presents an adaptive control approach to information-based guidance and control of a spacecraft carrying out on-orbit inspection by actively computing optimal policies for the spacecraft to achieve the best possible representation of objects within its orbital environment. Due to the complexity of navigating the space environment, it may be impossible to carry out on-orbit servicing to maintain space systems like satellites using a spacecraft equipped with controllers that cannot adapt to changing conditions. In particular, the presence of constraints such as illumination, field-of-view (FOV), minimal fuel, the use of visual-inertial navigation for improved localization, and the need for real-time computation of control policies render the spacecraft motion planning problem challenging. The control framework developed in this paper addresses these challenges by formulating the inspection task as a constrained optimization problem where the goal is to maximize information gained from the cameras, while navigating to the next best view, subject to illumination and FOV constraints. The developed architecture is analyzed using a Lyapunov-based stability analysis and the effectiveness of the planning algorithm is verified in simulation.

Bayesian sample size calculations for external validation studies of risk prediction models

Authors:Mohsen Sadatsafavi, Paul Gustafson, Solmaz Setayeshgar, Laure Wynants, Richard Riley

Date:2025-04-22 14:07:05

Summary: Contemporary sample size calculations for external validation of risk prediction models require users to specify fixed values of assumed model performance metrics alongside target precision levels (e.g., 95% CI widths). However, due to the finite samples of previous studies, our knowledge of true model performance in the target population is uncertain, and so choosing fixed values represents an incomplete picture. As well, for net benefit (NB) as a measure of clinical utility, the relevance of conventional precision-based inference is doubtful. In this work, we propose a general Bayesian algorithm for constructing the joint distribution of predicted risks and response values based on summary statistics of model performance in previous studies. For statistical metrics of performance, we propose sample size determination rules that either target desired expected precision, or a desired assurance probability that the precision criteria will be satisfied. For NB, we propose rules based on optimality assurance (the probability that the planned study correctly identifies the most beneficial strategy) and the Expected Value of Sample Information (EVSI), the expected gain in NB from the planned validation study. We showcase these developments in a case study on the validation of a risk prediction model for deterioration of hospitalized COVID-19 patients. Compared to the conventional sample size calculation methods, a Bayesian approach requires explicit quantification of uncertainty around model performance, but thereby enables various sample size rules based on expected precision, assurance probabilities, and value of information.

Bidirectional Task-Motion Planning Based on Hierarchical Reinforcement Learning for Strategic Confrontation

Authors:Qizhen Wu, Lei Chen, Kexin Liu, Jinhu Lü

Date:2025-04-22 13:22:58

In swarm robotics, confrontation scenarios, including strategic confrontations, require efficient decision-making that integrates discrete commands and continuous actions. Traditional task and motion planning methods separate decision-making into two layers, but their unidirectional structure fails to capture the interdependence between these layers, limiting adaptability in dynamic environments. Here, we propose a novel bidirectional approach based on hierarchical reinforcement learning, enabling dynamic interaction between the layers. This method effectively maps commands to task allocation and actions to path planning, while leveraging cross-training techniques to enhance learning across the hierarchical framework. Furthermore, we introduce a trajectory prediction model that bridges abstract task representations with actionable planning goals. In our experiments, it achieves over 80% in confrontation win rate and under 0.01 seconds in decision time, outperforming existing approaches. Demonstrations through large-scale tests and real-world robot experiments further emphasize the generalization capabilities and practical applicability of our method.

An Extended Horizon Tactical Decision-Making for Automated Driving Based on Monte Carlo Tree Search

Authors:Karim Essalmi, Fernando Garrido, Fawzi Nashashibi

Date:2025-04-22 13:11:16

This paper introduces COR-MCTS (Conservation of Resources - Monte Carlo Tree Search), a novel tactical decision-making approach for automated driving focusing on maneuver planning over extended horizons. Traditional decision-making algorithms are often constrained by fixed planning horizons, typically up to 6 seconds for classical approaches and 3 seconds for learning-based methods, which limits their adaptability in particular dynamic driving scenarios. However, planning must be done well in advance in environments such as highways, roundabouts, and exits to ensure safe and efficient maneuvers. To address this challenge, we propose a hybrid method integrating Monte Carlo Tree Search (MCTS) with our prior utility-based framework, COR-MP (Conservation of Resources Model for Maneuver Planning). This combination enables long-term, real-time decision-making, significantly enhancing the ability to plan a sequence of maneuvers over extended horizons. Through simulations across diverse driving scenarios, we demonstrate that COR-MCTS effectively improves planning robustness and decision efficiency over extended horizons.

The 2nd MERCADO Workshop at IEEE VIS 2025: Multimodal Experiences for Remote Communication Around Data Online

Authors:Wolfgang Büschel, Gabriela Molina León, Arnaud Prouzeau, Mahmood Jasim, Christophe Hurter, Maxime Cordeil, Matthew Brehmer

Date:2025-04-22 12:54:54

We propose a half-day workshop at IEEE VIS 2025 on addressing the emerging challenges in data-rich multimodal remote collaboration. We focus on synchronous, remote, and hybrid settings where people take part in tasks such as data analysis, decision-making, and presentation. With this workshop, we continue successful prior work from the first MERCADO workshop at VIS 2023 and a 2024 Shonan Seminar that followed. Based on the findings of the earlier events, we invite research and ideas related to four themes of challenges: Tools & Technologies, Individual Differences & Interpersonal Dynamics, AI-assisted Collaboration, and Evaluation. With this workshop, we aim to broaden the community, foster new collaborations, and develop a research agenda to address these challenges in future research. Our planned workshop format comprises a keynote, short presentations, a breakout group session, and discussions organized around the identified challenges.

Dynamic Intent Queries for Motion Transformer-based Trajectory Prediction

Authors:Tobias Demmler, Lennart Hartung, Andreas Tamke, Thao Dang, Alexander Hegai, Karsten Haug, Lars Mikelsons

Date:2025-04-22 10:20:35

In autonomous driving, accurately predicting the movements of other traffic participants is crucial, as it significantly influences a vehicle's planning processes. Modern trajectory prediction models strive to interpret complex patterns and dependencies from agent and map data. The Motion Transformer (MTR) architecture and subsequent work define the most accurate methods in common benchmarks such as the Waymo Open Motion Benchmark. The MTR model employs pre-generated static intention points as initial goal points for trajectory prediction. However, the static nature of these points frequently leads to misalignment with map data in specific traffic scenarios, resulting in unfeasible or unrealistic goal points. Our research addresses this limitation by integrating scene-specific dynamic intention points into the MTR model. This adaptation of the MTR model was trained and evaluated on the Waymo Open Motion Dataset. Our findings demonstrate that incorporating dynamic intention points has a significant positive impact on trajectory prediction accuracy, especially for predictions over long time horizons. Furthermore, we analyze the impact on ground truth trajectories which are not compliant with the map data or are illegal maneuvers.

Comparative Analysis of Evolutionary Algorithms for Energy-Aware Production Scheduling

Authors:Sascha C Burmeister, Till N Rogalski, Guido Schryen

Date:2025-04-22 07:54:05

The energy transition is driving rapid growth in renewable energy generation, creating the need to balance energy supply and demand with energy price awareness. One such approach for manufacturers to balance their energy demand with available energy is energy-aware production planning. Through energy-aware production planning, manufacturers can align their energy demand with dynamic grid conditions, supporting renewable energy integration while benefiting from lower prices and reduced emissions. Energy-aware production planning can be modeled as a multi-criteria scheduling problem, where the objectives extend beyond traditional metrics like makespan or required workers to also include minimizing energy costs and emissions. Due to market dynamics and the NP-hard multi-objective nature of the problem, evolutionary algorithms are widely used for energy-aware scheduling. However, existing research focuses on the design and analysis of single algorithms, with limited comparisons between different approaches. In this study, we adapt NSGA-III, HypE, and $\theta$-DEA as memetic metaheuristics for energy-aware scheduling to minimize makespan, energy costs, emissions, and the number of workers, within a real-time energy market context. These adapted metaheuristics present different approaches for environmental selection. In a comparative analysis, we explore differences in solution efficiency and quality across various scenarios which are based on benchmark instances from the literature and real-world energy market data. Additionally, we estimate upper bounds on the distance between objective values obtained with our memetic metaheuristics and reference sets obtained via an exact solver.

Exploring Inevitable Waypoints for Unsolvability Explanation in Hybrid Planning Problems

Authors:Mir Md Sajid Sarwar, Rajarshi Ray

Date:2025-04-22 07:45:30

Explaining unsolvability of planning problems is of significant research interest in Explainable AI Planning. AI planning literature has reported several research efforts on generating explanations of solutions to planning problems. However, explaining the unsolvability of planning problems remains a largely open and understudied problem. A widely practiced approach to plan generation and automated problem solving, in general, is to decompose tasks into sub-problems that help progressively converge towards the goal. In this paper, we propose to adopt the same philosophy of sub-problem identification as a mechanism for analyzing and explaining unsolvability of planning problems in hybrid systems. In particular, for a given unsolvable planning problem, we propose to identify common waypoints, which are universal obstacles to plan existence; in other words, they appear on every plan from the source to the planning goal. This work envisions such waypoints as sub-problems of the planning problem and the unreachability of any of these waypoints as an explanation for the unsolvability of the original planning problem. We propose a novel method of waypoint identification by casting the problem as an instance of the longest common subsequence problem, a widely popular problem in computer science, typically considered as an illustrative example for the dynamic programming paradigm. Once the waypoints are identified, we perform symbolic reachability analysis on them to identify the earliest unreachable waypoint and report it as the explanation of unsolvability. We present experimental results on unsolvable planning problems in hybrid domains.
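
The waypoint idea can be sketched with the textbook dynamic-programming solution to the longest common subsequence problem. This is a minimal illustration, not the authors' implementation. It assumes plan traces are given as lists of location labels, folds a pairwise LCS over the traces (a heuristic, not an exact multi-sequence LCS), and replaces symbolic reachability analysis with a hypothetical `is_reachable` callback. The trace contents are invented for the example.

```python
from functools import reduce


def lcs(a, b):
    """Longest common subsequence of two sequences via dynamic programming."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    # Walk back through the table to recover one longest subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


def common_waypoints(traces):
    """Fold pairwise LCS over all traces to get candidate shared waypoints."""
    return reduce(lcs, traces)


def first_unreachable(waypoints, is_reachable):
    """Return the earliest waypoint that the reachability check rejects."""
    for waypoint in waypoints:
        if not is_reachable(waypoint):
            return waypoint
    return None


# Hypothetical traces through a small map.
traces = [
    ["start", "hall", "door", "yard", "bridge", "goal"],
    ["start", "lab", "door", "bridge", "dock", "goal"],
    ["start", "door", "roof", "bridge", "goal"],
]
waypoints = common_waypoints(traces)
# Stand-in for symbolic reachability: only these labels count as reachable.
blocked = first_unreachable(waypoints, lambda w: w in {"start", "door"})
print(waypoints)
print(blocked)
```

Here the shared waypoints are the labels that appear, in order, on every trace, and the first one that fails the reachability check is the one that would be reported as the explanation.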

An ACO-MPC Framework for Energy-Efficient and Collision-Free Path Planning in Autonomous Maritime Navigation

Authors:Yaoze Liu, Zhen Tian, Qifan Zhou, Zixuan Huang, Hongyu Sun

Date:2025-04-22 06:09:54

Automated driving on ramps presents significant challenges due to the need to balance both safety and efficiency during lane changes. This paper proposes an integrated planner for automated vehicles (AVs) on ramps, utilizing an unsatisfactory level metric for efficiency and arrow-cluster-based sampling for safety. The planner identifies optimal times for the AV to change lanes, taking into account the vehicle's velocity as a key factor in efficiency. Additionally, the integrated planner employs arrow-cluster-based sampling to evaluate collision risks and select an optimal lane-changing curve. Extensive simulations were conducted in a ramp scenario to verify the planner's efficient and safe performance. The results demonstrate that the proposed planner can effectively select an appropriate lane-changing time point and a safe lane-changing curve for AVs, without incurring any collisions during the maneuver.

Antenna Technology Readiness for the Black Hole Explorer (BHEX) Mission

Authors:T. K. Sridharan, R. Lehmensiek, S. Schwarz, D. P. Marrone

Date:2025-04-22 03:30:32

The Black Hole Explorer (BHEX) will be the first sub-mm wavelength Space Very-Long-Baseline Interferometry (VLBI) mission. It targets astronomical imaging with the highest ever spatial resolution to enable detection of the photon ring of a supermassive black hole. BHEX is being proposed for launch in 2031 as a NASA Small Explorers mission. BHEX science goals and mission opportunity require a high precision lightweight spaceborne antenna. A survey of the technology landscape for realizing such an antenna is presented. The technology readiness level (TRL) of the antenna is discussed and assessed to be TRL 5. An update on our technology maturation efforts is provided. Design studies leading to the conceptual design of a metallized carbon fiber reinforced plastic (CFRP) technology based antenna with a mass of only $\sim 50$ kg, incorporating a 3.4 m primary reflector with a surface precision of < 40 $\mu$m to allow efficient operation up to 320 GHz, are outlined. Current plans anticipate attaining TRL 6 in 2026 for the BHEX antenna. Completed design studies point to a large margin in surface precision which opens up opportunities for applications beyond BHEX, at significantly higher THz frequencies.
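
As a rough consistency check that is not taken from the abstract, the standard Ruze approximation relates surface efficiency to RMS surface error. Assuming the quoted surface precision can be read as an RMS error of $\varepsilon = 40$ $\mu$m, at $\nu = 320$ GHz:

$$\lambda = \frac{c}{\nu} \approx \frac{2.998 \times 10^{8}\ \mathrm{m/s}}{3.2 \times 10^{11}\ \mathrm{Hz}} \approx 937\ \mu\mathrm{m}, \qquad \eta_{\mathrm{surf}} = \exp\!\left[-\left(\frac{4\pi\varepsilon}{\lambda}\right)^{2}\right] \approx \exp(-0.288) \approx 0.75$$

Under that assumption, the surface alone would retain about 75% efficiency at 320 GHz when the error sits at the 40 $\mu$m bound, and more when the error is smaller.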

MRTA-Sim: A Modular Simulator for Multi-Robot Allocation, Planning, and Control in Open-World Environments

Authors:Victoria Marie Tuck, Hardik Parwana, Pei-Wei Chen, Georgios Fainekos, Bardh Hoxha, Hideki Okamoto, S. Shankar Sastry, Sanjit A. Seshia

Date:2025-04-21 20:03:07

This paper introduces MRTA-Sim, a Python/ROS2/Gazebo simulator for testing approaches to Multi-Robot Task Allocation (MRTA) problems on simulated robots in complex, indoor environments. Grid-based approaches to MRTA problems can be too restrictive for use in complex, dynamic environments such as warehouses, department stores, hospitals, etc. However, approaches that operate in free-space often operate at a layer of abstraction above the control and planning layers of a robot and make an assumption on approximate travel time between points of interest in the system. These abstractions can neglect the impact of the tight space and multi-agent interactions on the quality of the solution. Therefore, MRTA solutions should be tested with the navigation stacks of the robots in mind, taking into account robot planning, conflict avoidance between robots, and human interaction and avoidance. This tool connects the allocation output of MRTA solvers to individual robot planning using the NAV2 stack and local, centralized multi-robot deconfliction using Control Barrier Function-Quadratic Programs (CBF-QPs), creating a platform closer to real-world operation for more comprehensive testing of these approaches. The simulation architecture is modular so that users can swap out methods at different levels of the stack. We show the use of our system with a Satisfiability Modulo Theories (SMT)-based approach to dynamic MRTA on a fleet of indoor delivery robots.

Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs

Authors:Chun-Hsiao Yeh, Chenyu Wang, Shengbang Tong, Ta-Ying Cheng, Rouyu Wang, Tianzhe Chu, Yuexiang Zhai, Yubei Chen, Shenghua Gao, Yi Ma

Date:2025-04-21 17:59:53

Multi-view understanding, the ability to reconcile visual information across diverse viewpoints for effective navigation, manipulation, and 3D scene comprehension, is a fundamental challenge in Multi-Modal Large Language Models (MLLMs) to be used as embodied agents. While recent MLLMs have shown impressive advances in high-level reasoning and planning, they frequently fall short when confronted with multi-view geometric consistency and cross-view correspondence. To comprehensively evaluate the challenges of MLLMs in multi-view scene reasoning, we propose All-Angles Bench, a benchmark of over 2,100 carefully human-annotated multi-view question-answer pairs across 90 diverse real-world scenes. Our six tasks (counting, attribute identification, relative distance, relative direction, object manipulation, and camera pose estimation) specifically test a model's geometric correspondence and its capacity to align information consistently across views. Our extensive experiments, which benchmark 27 representative MLLMs including Gemini-2.0-Flash, Claude-3.7-Sonnet, and GPT-4o against human evaluators, reveal a substantial performance gap, indicating that current MLLMs remain far from human-level proficiency. Through in-depth analysis, we show that MLLMs particularly underperform in two aspects: (1) cross-view correspondence for partially occluded views and (2) establishing coarse camera poses. These findings highlight the necessity of domain-specific refinements or modules that embed stronger multi-view awareness. We believe that our All-Angles Bench offers valuable insights and contributes to bridging the gap between MLLMs and human-level multi-view understanding. The project and benchmark are publicly available at https://danielchyeh.github.io/All-Angles-Bench/.