planning - 2025-05-13

Improving Trajectory Stitching with Flow Models

Authors:Reece O'Mahoney, Wanming Yu, Ioannis Havoutis
Date:2025-05-12 17:50:10

Generative models have shown great promise as trajectory planners, given their ability to model complex distributions and their guidable inference process. Previous works have successfully applied these models in the context of robotic manipulation but perform poorly when the required solution does not exist as a complete trajectory within the training set. We identify that this is a result of being unable to plan via stitching, and subsequently address the architectural and dataset choices needed to remedy this. On top of this, we propose a novel addition to the training and inference procedures to both stabilize and enhance these capabilities. We demonstrate the efficacy of our approach by generating plans with out-of-distribution boundary conditions and performing obstacle avoidance on the Franka Panda in simulation and on real hardware. In both of these tasks our method performs significantly better than the baselines and is able to avoid obstacles up to four times as large.

Multi-Agent Path Finding via Finite-Horizon Hierarchical Factorization

Authors:Jiarui Li, Alessandro Zanardi, Gioele Zardini
Date:2025-05-12 17:31:51

We present a novel algorithm for large-scale Multi-Agent Path Finding (MAPF) that enables fast, scalable planning in dynamic environments such as automated warehouses. Our approach introduces finite-horizon hierarchical factorization, a framework that plans one step at a time in a receding-horizon fashion. Robots first compute individual plans in parallel and are then dynamically grouped based on spatio-temporal conflicts and reachability. The framework supports conflict resolution as well as immediate execution and concurrent planning, significantly reducing response time compared to offline algorithms. Experimental results on benchmark maps demonstrate that our method achieves up to 60% reduction in time-to-first-action while consistently delivering high-quality solutions, outperforming state-of-the-art offline baselines across a range of problem sizes and planning horizons.
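
The dynamic grouping step lends itself to a small illustration. Below is a hypothetical sketch (not the paper's implementation) of merging independently planned robots into one group whenever their timed paths occupy the same cell at the same timestep, using a union-find structure; the grid paths are invented for the example.

```python
# Hypothetical sketch of conflict-based grouping: robots plan
# independently, then are merged whenever their timed paths occupy the
# same cell at the same timestep. Union-find keeps merging incremental.

def conflict_groups(paths):
    """paths[r][t] = grid cell of robot r at time t; returns robot groups."""
    parent = list(range(len(paths)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    occupied = {}                          # (cell, t) -> first robot seen
    for r, path in enumerate(paths):
        for t, cell in enumerate(path):
            key = (cell, t)
            if key in occupied:
                parent[find(r)] = find(occupied[key])  # merge groups
            else:
                occupied[key] = r

    groups = {}
    for r in range(len(paths)):
        groups.setdefault(find(r), []).append(r)
    return sorted(groups.values())

paths = [
    [(0, 0), (0, 1), (0, 2)],  # robot 0
    [(1, 1), (0, 1), (1, 2)],  # robot 1: meets robot 0 at (0, 1), t = 1
    [(5, 5), (5, 6), (5, 7)],  # robot 2: conflict-free
]
print(conflict_groups(paths))  # [[0, 1], [2]]
```

Robots 0 and 1 collide at cell (0, 1) at t = 1 and end up in one group, while robot 2 can keep its individual plan.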

Image Restoration via Integration of Optimal Control Techniques and the Hamilton-Jacobi-Bellman Equation

Authors:Dragos-Patru Covei
Date:2025-05-12 16:07:57

In this paper, we propose a novel image restoration framework that integrates optimal control techniques with the Hamilton-Jacobi-Bellman (HJB) equation. Motivated by models from production planning, our method restores degraded images by balancing an intervention cost against a state-dependent penalty that quantifies the loss of critical image information. Under the assumption of radial symmetry, the HJB equation is reduced to an ordinary differential equation and solved via a shooting method, from which the optimal feedback control is derived. Numerical experiments, supported by extensive parameter tuning and quality metrics such as PSNR and SSIM, demonstrate that the proposed framework achieves significant improvement in image quality. The results not only validate the theoretical model but also suggest promising directions for future research in adaptive and hybrid image restoration techniques.

Intuitive Human-Robot Interfaces Leveraging on Autonomy Features for the Control of Highly-redundant Robots

Authors:Davide Torielli
Date:2025-05-12 15:33:43

[...] With the TelePhysicalOperation interface, the user can teleoperate the different capabilities of a robot (e.g., single/double arm manipulation, wheel/leg locomotion) by applying virtual forces on selected robot body parts. This approach emulates the intuitiveness of physical human-robot interaction while permitting teleoperation from a safe distance, in a way that resembles a "Marionette" interface. The system is further enhanced with wearable haptic feedback functions to align better with the "Marionette" metaphor, and a user study has been conducted to validate its efficacy with and without the haptic channel enabled. Considering the importance of robot independence, the TelePhysicalOperation interface incorporates autonomy modules to handle, for example, the teleoperation of dual-arm mobile base robots for bimanual object grasping and transportation tasks. With the laser-guided interface, the user can indicate points of interest to the robot using a simple but effective laser emitter device. With a neural network-based vision system, the robot tracks the laser projection in real time, allowing the user to indicate not only fixed goals, like objects, but also paths to follow. With the implemented autonomous behavior, a mobile manipulator employs its locomanipulation abilities to follow the indicated goals. The behavior is modeled using Behavior Trees, exploiting their reactivity to promptly respond to changes in goal positions, and their modularity to adapt the motion planning to the task needs. The proposed laser interface has also been employed in an assistive scenario. In this case, users with upper-limb impairments can control an assistive manipulator by directing a head-worn laser emitter at points of interest, to collaboratively address activities of daily living. [...]

Integrated Localization and Path Planning for an Ocean Exploring Team of Autonomous Underwater Vehicles with Consensus Graph Model Predictive Control

Authors:Mohsen Eskandari, Andrey V. Savkin, Mohammad Deghat
Date:2025-05-12 12:14:50

Navigation of a team of autonomous underwater vehicles (AUVs) coordinated by an unmanned surface vehicle (USV) is efficient and reliable for deep ocean exploration. AUVs depart from and return to the USV after collaborative navigation, data collection, and ocean exploration missions. Efficient path planning and accurate localization are essential, the latter of which is critical due to the lack of global localization signals and poor radio frequency (RF) communication in deep waters. Inertial navigation and acoustic communication are common solutions for localization. However, the former is subject to odometry drifts, and the latter is limited to short distances. This paper proposes a systematic approach for localization-aware, energy-efficient, collision-free path planning for a USV-AUVs team. Path planning is formulated as a finite receding-horizon model predictive control (MPC) optimization. A dynamics-aware linear kinodynamic motion equation is developed. The MPC optimization is formulated such that localization is integrated as a consensus graph optimization among the AUV nodes. Edges in the optimized AUV-to-USV (A2U) and AUV-to-AUV (A2A) graphs are constrained to the sonar range of acoustic modems. The time complexity of the consensus MPC optimization problem is analyzed, revealing a nonconvex NP-hard problem, which is solved using sequential convex programming. Numerical simulation results are provided to evaluate the proposed method.
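
The receding-horizon pattern underlying the MPC formulation can be illustrated with a toy 1D example. This is a generic sketch of finite-horizon replanning, not the paper's consensus MPC: the dynamics, horizon, and greedy per-step planner are all illustrative stand-ins.

```python
# Generic receding-horizon loop for a 1D kinematic point: plan over a
# finite horizon, execute only the first action, then replan. All the
# modeling choices here are illustrative, not from the paper.

def plan_horizon(x, goal, horizon, u_max=1.0):
    """Finite-horizon plan: saturated steps toward the goal."""
    plan, state = [], x
    for _ in range(horizon):
        u = max(-u_max, min(u_max, goal - state))
        plan.append(u)
        state += u
    return plan

def receding_horizon(x, goal, horizon=3, steps=10):
    traj = [x]
    for _ in range(steps):
        u = plan_horizon(x, goal, horizon)[0]  # execute first action only
        x += u
        traj.append(x)
    return traj

traj = receding_horizon(0.0, 5.0)
# the point advances one unit per step and then holds at the goal
```

Only the first action of each plan is executed, so new information (here, the updated state; in the paper, updated localization estimates) enters at every step.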

Solar Orbiter's 2024 Major Flare Campaigns: An Overview

Authors:Daniel F. Ryan, Laura A. Hayes, Hannah Collier, Graham S. Kerr, Andrew R. Inglis, David Williams, Andrew P. Walsh, Miho Janvier, Daniel Müller, David Berghmans, Cis Verbeeck, Emil Kraaikamp, Peter R. Young, Therese A. Kucera, Säm Krucker, Muriel Z. Stiefel, Daniele Calchetti, Katharine K. Reeves, Sabrina Savage, Vanessa Polito
Date:2025-05-12 12:05:45

Solar Orbiter conducted a series of flare-optimised observing campaigns in 2024 utilising the Major Flare Solar Orbiter Observing Plan (SOOP). Dedicated observations were performed during two distinct perihelia intervals in March/April and October, during which over 22 flares were observed, ranging from B- to M-class. These campaigns leveraged high-resolution and high-cadence observations from the mission's remote-sensing suite, including the High-Resolution EUV Imager (EUI/HRI_EUV), the Spectrometer/Telescope for Imaging X-rays (STIX), the Spectral Imaging of the Coronal Environment (SPICE) spectrometer, and the High Resolution Telescope of the Polarimetric and Helioseismic Imager (PHI/HRT), as well as coordinated ground-based and Earth-orbiting observations. EUI/HRI_EUV, operating in short-exposure modes, provided two-second-cadence, non-saturated EUV images, revealing structures and dynamics on scales not previously observed. Simultaneously, STIX captured hard X-ray imaging and spectroscopy of accelerated electrons, while SPICE acquired EUV slit spectroscopy to probe chromospheric and coronal responses. Together, these observations offer an unprecedented view of magnetic reconnection, energy release, particle acceleration, and plasma heating across a broad range of temperatures and spatial scales. These campaigns have generated a rich dataset that will be the subject of numerous future studies addressing Solar Orbiter's top-level science goal: "How do solar eruptions produce energetic particle radiation that fills the heliosphere?". This paper presents the scientific motivations, operational planning, and observational strategies behind the 2024 flare campaigns, along with initial insights into the observed flares. We also discuss lessons learned for optimizing future Solar Orbiter Major Flare campaigns and provide a resource for researchers aiming to utilize these unique observations.

LA-IMR: Latency-Aware, Predictive In-Memory Routing and Proactive Autoscaling for Tail-Latency-Sensitive Cloud Robotics

Authors:Eunil Seo, Chanh Nguyen, Erik Elmroth
Date:2025-05-12 10:12:24

Hybrid cloud-edge infrastructures now support latency-critical workloads ranging from autonomous vehicles and surgical robotics to immersive AR/VR. However, they continue to experience crippling long-tail latency spikes whenever bursty request streams exceed the capacity of heterogeneous edge and cloud tiers. To address these long-tail latency issues, we present Latency-Aware, Predictive In-Memory Routing and Proactive Autoscaling (LA-IMR). This control layer integrates a closed-form, utilization-driven latency model with event-driven scheduling, replica autoscaling, and edge-to-cloud offloading to mitigate 99th-percentile (P99) delays. Our analytic model decomposes end-to-end latency into processing, network, and queuing components, expressing inference latency as an affine power-law function of instance utilization. Once calibrated, it produces two complementary functions that drive: (i) millisecond-scale routing decisions for traffic offloading, and (ii) capacity planning that jointly determines replica pool sizes. LA-IMR enacts these decisions through a quality-differentiated, multi-queue scheduler and a custom-metric Kubernetes autoscaler that scales replicas proactively -- before queues build up -- rather than reactively based on lagging CPU metrics. Across representative vision workloads (YOLOv5m and EfficientDet) and bursty arrival traces, LA-IMR reduces P99 latency by up to 20.7 percent compared to traditional latency-only autoscaling, laying a principled foundation for next-generation, tail-tolerant cloud-edge inference services.
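
The affine power-law latency model driving the routing decision can be sketched as follows. The coefficients, the network penalty, and the threshold logic are invented for illustration; they are not the calibrated values or the exact decision rule from the paper.

```python
# Sketch of a utilization-driven latency model in the spirit of LA-IMR:
# inference latency as an affine power-law of instance utilization, used
# for an edge-vs-cloud routing decision. All numbers are illustrative
# assumptions, not calibrated values from the paper.

def inference_latency(utilization, a=40.0, b=4.0, c=12.0):
    """Predicted latency in ms: affine power-law a * u**b + c."""
    return a * utilization ** b + c

def route(edge_util, cloud_util, network_penalty_ms=8.0):
    """Offload only when the predicted edge latency exceeds the
    predicted cloud latency plus the extra network cost."""
    edge = inference_latency(edge_util)
    cloud = inference_latency(cloud_util) + network_penalty_ms
    return "cloud" if edge > cloud else "edge"

print(route(0.95, 0.30))  # saturated edge -> offload: "cloud"
print(route(0.40, 0.40))  # equal load -> the network hop is not worth it
```

Because the model is a closed-form function of utilization, the same calibration can serve both the millisecond-scale routing decision above and offline capacity planning (sizing replica pools so predicted P99 latency stays under a target).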

BETTY Dataset: A Multi-modal Dataset for Full-Stack Autonomy

Authors:Micah Nye, Ayoub Raji, Andrew Saba, Eidan Erlich, Robert Exley, Aragya Goyal, Alexander Matros, Ritesh Misra, Matthew Sivaprakasam, Marko Bertogna, Deva Ramanan, Sebastian Scherer
Date:2025-05-12 06:35:22

We present the BETTY dataset, a large-scale, multi-modal dataset collected on several autonomous racing vehicles, targeting supervised and self-supervised state estimation, dynamics modeling, motion forecasting, perception, and more. Existing large-scale datasets, especially autonomous vehicle datasets, focus primarily on supervised perception, planning, and motion forecasting tasks. Our work enables multi-modal, data-driven methods by including all sensor inputs and the outputs from the software stack, along with semantic metadata and ground truth information. The dataset encompasses 4 years of data, currently comprising over 13 hours and 32TB, collected on autonomous racing vehicle platforms. This data spans 6 diverse racing environments, including high-speed oval courses, for single and multi-agent algorithm evaluation in feature-sparse scenarios, as well as high-speed road courses with high longitudinal and lateral accelerations and tight, GPS-denied environments. It captures highly dynamic states, such as 63 m/s crashes, loss of tire traction, and operation at the limit of stability. By offering a large breadth of cross-modal and dynamic data, the BETTY dataset enables the training and testing of full autonomy stack pipelines, pushing the performance of all algorithms to the limits. The current dataset is available at https://pitt-mit-iac.github.io/betty-dataset/.

CHD: Coupled Hierarchical Diffusion for Long-Horizon Tasks

Authors:Ce Hao, Anxing Xiao, Zhiwei Xue, Harold Soh
Date:2025-05-12 06:21:48

Diffusion-based planners have shown strong performance in short-horizon tasks but often fail in complex, long-horizon settings. We trace the failure to loose coupling between high-level (HL) sub-goal selection and low-level (LL) trajectory generation, which leads to incoherent plans and degraded performance. We propose Coupled Hierarchical Diffusion (CHD), a framework that models HL sub-goals and LL trajectories jointly within a unified diffusion process. A shared classifier passes LL feedback upstream so that sub-goals self-correct while sampling proceeds. This tight HL-LL coupling improves trajectory coherence and enables scalable long-horizon diffusion planning. Experiments across maze navigation, tabletop manipulation, and household environments show that CHD consistently outperforms both flat and hierarchical diffusion baselines.

A Framework for Joint Grasp and Motion Planning in Confined Spaces

Authors:Martin Rudorfer, Jiří Hartvich, Vojtěch Vonásek
Date:2025-05-12 06:21:42

Robotic grasping is a fundamental skill across all domains of robot applications. There is a large body of research for grasping objects in table-top scenarios, where finding suitable grasps is the main challenge. In this work, we are interested in scenarios where the objects are in confined spaces and hence particularly difficult to reach. Planning how the robot approaches the object becomes a major part of the challenge, giving rise to methods for joint grasp and motion planning. The framework proposed in this paper provides 20 benchmark scenarios with systematically increasing difficulty, realistic objects with precomputed grasp annotations, and tools to create and share more scenarios. We further provide two baseline planners and evaluate them on the scenarios, demonstrating that the proposed difficulty levels indeed offer a meaningful progression. We invite the research community to build upon this framework by making all components publicly available as open source.

Towards user-centered interactive medical image segmentation in VR with an assistive AI agent

Authors:Pascal Spiegler, Arash Harirpoush, Yiming Xiao
Date:2025-05-12 03:47:05

Crucial in disease analysis and surgical planning, manual segmentation of volumetric medical scans (e.g. MRI, CT) is laborious, error-prone, and challenging to master, while fully automatic algorithms can benefit from user feedback. Therefore, with the complementary power of the latest radiological AI foundation models and virtual reality (VR)'s intuitive data interaction, we propose SAMIRA, a novel conversational AI agent that assists users with localizing, segmenting, and visualizing 3D medical concepts in VR. Through speech-based interaction, the agent helps users understand radiological features, locate clinical targets, and generate segmentation masks that can be refined with just a few point prompts. The system also supports true-to-scale 3D visualization of segmented pathology to enhance patient-specific anatomical understanding. Furthermore, to determine the optimal interaction paradigm under near-far attention-switching for refining segmentation masks in an immersive, human-in-the-loop workflow, we compare VR controller pointing, head pointing, and eye tracking as input modes. A user study demonstrated a high usability score (SUS=90.0 $\pm$ 9.0), low overall task load, and strong support for the proposed VR system's guidance, training potential, and integration of AI in radiological segmentation tasks.

Terrain-aware Low Altitude Path Planning

Authors:Yixuan Jia, Andrea Tagliabue, Navid Dadkhah Tehrani, Jonathan P. How
Date:2025-05-11 22:53:45

In this paper, we study the problem of generating low altitude path plans for nap-of-the-earth (NOE) flight in real time with only RGB images from onboard cameras and the vehicle pose. We propose a novel training method that combines behavior cloning and self-supervised learning, enabling the learned policy to outperform a policy trained with the standard behavior cloning approach on this task. Simulation studies are performed on a custom canyon terrain.

YOPOv2-Tracker: An End-to-End Agile Tracking and Navigation Framework from Perception to Action

Authors:Junjie Lu, Yulin Hui, Xuewei Zhang, Wencan Feng, Hongming Shen, Zhiyu Li, Bailing Tian
Date:2025-05-11 09:53:34

Traditional target tracking pipelines including detection, mapping, navigation, and control are comprehensive but introduce high latency, limiting the agility of quadrotors. In contrast, we follow the design principle of "less is more", striving to simplify the process while maintaining effectiveness. In this work, we propose an end-to-end agile tracking and navigation framework for quadrotors that directly maps sensory observations to control commands. Importantly, leveraging the multimodal nature of the navigation and detection tasks, our network maintains interpretability by explicitly integrating the independent modules of the traditional pipeline, rather than performing crude action regression. In detail, we adopt a set of motion primitives as anchors to cover the search space over the feasible region and potential target. We then reformulate trajectory optimization as regression of primitive offsets and associated costs, considering safety, smoothness, and other metrics. For the tracking task, the trajectories are expected to approach the target, and additional objectness scores are predicted. Subsequently, the predictions, after compensation for the estimated lumped disturbance, are transformed into thrust and attitude as control commands for swift response. During training, we seamlessly integrate traditional motion planning with deep learning by directly back-propagating the gradients of trajectory costs to the network, eliminating the need for expert demonstrations in imitation learning and providing more direct guidance than reinforcement learning. Finally, we deploy the algorithm on a compact quadrotor and conduct real-world validations in both forest and building environments to demonstrate the efficiency of the proposed method.

RedTeamLLM: an Agentic AI framework for offensive security

Authors:Brian Challita, Pierre Parrend
Date:2025-05-11 09:19:10

From automated intrusion testing to the discovery of zero-day attacks before software launch, agentic AI holds great promise for security engineering. This strong capability comes with a comparable threat: the security and research community must build up its models before the approach is leveraged by malicious actors for cybercrime. We therefore propose and evaluate RedTeamLLM, an integrated architecture with a comprehensive security model for the automation of pentest tasks. RedTeamLLM follows three key steps: summarizing, reasoning, and acting, which embed its operational capacity. This novel framework addresses four open challenges: plan correction, memory management, context window constraints, and generality vs. specialization. Evaluation is performed through the automated resolution of a range of entry-level, but not trivial, CTF challenges. The contribution of the reasoning capability of our agentic AI framework is specifically evaluated.

Incremental Analysis of Legacy Applications Using Knowledge Graphs for Application Modernization

Authors:Saravanan Krishnan, Amith Singhee, Keerthi Narayan Raghunath, Alex Mathai, Atul Kumar, David Wenk
Date:2025-05-11 07:33:31

Industries such as banking, telecom, and airlines - often have large software systems that are several decades old. Many of these systems are written in old programming languages such as COBOL, PL/1, Assembler, etc. In many cases, the documentation is not updated, and those who developed/designed these systems are no longer around. Understanding these systems for either modernization or even regular maintenance has been a challenge. An extensive application may have natural boundaries based on its code dependencies and architecture. There are also other logical boundaries in an enterprise setting driven by business functions, data domains, etc. Due to these complications, the system architects generally plan their modernization across these logical boundaries in parts, thereby adopting an incremental approach for the modernization journey of the entire system. In this work, we present a software system analysis tool that allows a subject matter expert (SME) or system architect to analyze a large software system incrementally. We analyze the source code and other artifacts (such as data schema) to create a knowledge graph using a customizable ontology/schema. Entities and relations in our ontology can be defined for any combination of programming languages and platforms. Using this knowledge graph, the analyst can then define logical boundaries around dependent Entities (e.g. Programs, Transactions, Database Tables etc.). Our tool then presents different views showcasing the dependencies from the newly defined boundary to/from the other logical groups of the system. This exercise is repeated interactively to 1) Identify the Entities and groupings of interest for a modernization task and 2) Understand how a change in one part of the system may affect the other parts. To validate the efficacy of our tool, we provide an initial study of our system on two client applications.

Efficient Robotic Policy Learning via Latent Space Backward Planning

Authors:Dongxiu Liu, Haoyi Niu, Zhihao Wang, Jinliang Zheng, Yinan Zheng, Zhonghong Ou, Jianming Hu, Jianxiong Li, Xianyuan Zhan
Date:2025-05-11 06:13:51

Current robotic planning methods often rely on predicting multi-frame images with full pixel details. While this fine-grained approach can serve as a generic world model, it introduces two significant challenges for downstream policy learning: substantial computational costs that hinder real-time deployment, and accumulated inaccuracies that can mislead action extraction. Planning with coarse-grained subgoals partially alleviates efficiency issues. However, their forward planning schemes can still result in off-task predictions due to accumulation errors, leading to misalignment with long-term goals. This raises a critical question: Can robotic planning be both efficient and accurate enough for real-time control in long-horizon, multi-stage tasks? To address this, we propose a Latent Space Backward Planning scheme (LBP), which begins by grounding the task into final latent goals, followed by recursively predicting intermediate subgoals closer to the current state. The grounded final goal enables backward subgoal planning to always remain aware of task completion, facilitating on-task prediction along the entire planning horizon. The subgoal-conditioned policy incorporates a learnable token to summarize the subgoal sequences and determines how each subgoal guides action extraction. Through extensive simulation and real-robot long-horizon experiments, we show that LBP outperforms existing fine-grained and forward planning methods, achieving SOTA performance. Project Page: https://lbp-authors.github.io
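
The backward recursion at the heart of this scheme can be illustrated with a toy sketch: ground the final goal first, then recursively predict subgoals ever closer to the current state. Here latent states are plain vectors and a midpoint interpolation stands in for the learned subgoal predictor, so this is an invented illustration of the planning pattern, not LBP itself.

```python
# Toy illustration of backward subgoal planning: start from the grounded
# final goal and recursively predict subgoals closer to the current
# state. Midpoint interpolation is an illustrative stand-in for the
# learned subgoal predictor in the paper.

def backward_subgoals(current, goal, depth):
    """Return subgoals ordered from nearest-to-current up to the goal."""
    if depth == 0:
        return [goal]
    # predict a subgoal between the current state and the goal, then
    # recurse toward the current state so earlier subgoals are nearer
    mid = [(c + g) / 2 for c, g in zip(current, goal)]
    return backward_subgoals(current, mid, depth - 1) + [goal]

plan = backward_subgoals(current=[0.0, 0.0], goal=[8.0, 4.0], depth=2)
# -> [[2.0, 1.0], [4.0, 2.0], [8.0, 4.0]]
```

Because every recursive call is conditioned on the grounded final goal, each intermediate subgoal stays consistent with task completion, which is the on-task property the abstract contrasts with forward planning.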

cpRRTC: GPU-Parallel RRT-Connect for Constrained Motion Planning

Authors:Jiaming Hu, Jiawei Wang, Henrik Christensen
Date:2025-05-11 00:14:39

Motion planning is a fundamental problem in robotics that involves generating feasible trajectories for a robot to follow. Recent advances in parallel computing, particularly through CPU and GPU architectures, have significantly reduced planning times to the order of milliseconds. However, constrained motion planning, especially with sampling-based methods on GPUs, remains underexplored. Prior work such as pRRTC leverages a tracking compiler with a CUDA backend to accelerate forward kinematics and collision checking. While effective in simple settings, that approach struggles with increased complexity in robot models or environments. In this paper, we propose a novel GPU-based framework utilizing NVRTC for runtime compilation, enabling efficient handling of high-complexity scenarios and supporting constrained motion planning. Experimental results demonstrate that our method achieves superior performance compared to existing approaches.

Value Iteration with Guessing for Markov Chains and Markov Decision Processes

Authors:Krishnendu Chatterjee, Mahdi JafariRaviz, Raimundo Saona, Jakub Svoboda
Date:2025-05-10 22:24:49

Two standard models for probabilistic systems are Markov chains (MCs) and Markov decision processes (MDPs). Classic objectives for such probabilistic models in control and planning problems are reachability and stochastic shortest path. The widely studied algorithmic approach for these problems is the Value Iteration (VI) algorithm, which iteratively applies local updates called Bellman updates. There are many practical approaches to VI in the literature, but they all require exponentially many Bellman updates for MCs in the worst case. Here, a preprocessing step is a discrete, graph-theoretic algorithm that requires linear space. An important open question is whether, after a polynomial-time preprocessing, VI can be achieved with sub-exponentially many Bellman updates. In this work, we present a new approach for VI based on guessing values. Our theoretical contributions are twofold. First, for MCs, we present an almost-linear-time preprocessing algorithm after which, along with guessing values, VI requires only sub-exponentially many Bellman updates. Second, we present an improved analysis of the speed of convergence of VI for MDPs. Finally, we present a practical algorithm for MDPs based on our new approach. Experimental results show that our approach provides a considerable improvement over existing VI-based approaches on several benchmark examples from the literature.
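
For context, the Bellman updates being counted are those of plain value iteration; a minimal version for reachability on a small Markov chain is sketched below. The guessing-based preprocessing itself is not reproduced here, and the three-state chain is invented for the example.

```python
# Plain value iteration for reachability probabilities in a Markov
# chain, illustrating the Bellman updates whose count the paper seeks
# to reduce. The chain below is an invented example.

def reachability_vi(P, target, iters=1000):
    """P[s] = list of (successor, probability); returns Pr[reach target]."""
    v = [1.0 if s in target else 0.0 for s in P]
    for _ in range(iters):
        v = [
            1.0 if s in target else sum(p * v[t] for t, p in P[s])
            for s in P
        ]
    return v

# state 0: start, state 1: absorbing target, state 2: absorbing failure
P = {
    0: [(1, 0.4), (2, 0.3), (0, 0.3)],
    1: [(1, 1.0)],
    2: [(2, 1.0)],
}
v = reachability_vi(P, target={1})
# v[0] converges to 0.4 / (1 - 0.3) = 4/7
```

The self-loop at state 0 makes convergence geometric with rate 0.3 here; the worst-case chains studied in the paper are precisely those where such rates approach 1 and exponentially many updates are needed without preprocessing.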

M3CAD: Towards Generic Cooperative Autonomous Driving Benchmark

Authors:Morui Zhu, Yongqi Zhu, Yihao Zhu, Qi Chen, Deyuan Qu, Song Fu, Qing Yang
Date:2025-05-10 19:47:44

We introduce M$^3$CAD, a novel benchmark designed to advance research in generic cooperative autonomous driving. M$^3$CAD comprises 204 sequences with 30k frames, spanning a diverse range of cooperative driving scenarios. Each sequence includes multiple vehicles and sensing modalities, e.g., LiDAR point clouds, RGB images, and GPS/IMU, supporting a variety of autonomous driving tasks, including object detection and tracking, mapping, motion forecasting, occupancy prediction, and path planning. This rich multimodal setup enables M$^3$CAD to support both single-vehicle and multi-vehicle autonomous driving research, significantly broadening the scope of research in the field. To our knowledge, M$^3$CAD is the most comprehensive benchmark specifically tailored for cooperative multi-task autonomous driving research. We evaluate the state-of-the-art end-to-end solution on M$^3$CAD to establish baseline performance. To foster cooperative autonomous driving research, we also propose E2EC, a simple yet effective framework for cooperative driving that leverages inter-vehicle shared information for improved path planning. We release M$^3$CAD, along with our baseline models and evaluation results, to support the development of robust cooperative autonomous driving systems. All resources will be made publicly available on https://github.com/zhumorui/M3CAD

STRIVE: Structured Representation Integrating VLM Reasoning for Efficient Object Navigation

Authors:Haokun Zhu, Zongtai Li, Zhixuan Liu, Wenshan Wang, Ji Zhang, Jonathan Francis, Jean Oh
Date:2025-05-10 18:38:41

Vision-Language Models (VLMs) have been increasingly integrated into object navigation tasks for their rich prior knowledge and strong reasoning abilities. However, applying VLMs to navigation poses two key challenges: effectively representing complex environment information and determining \textit{when and how} to query VLMs. Insufficient environment understanding and over-reliance on VLMs (e.g. querying at every step) can lead to unnecessary backtracking and reduced navigation efficiency, especially in continuous environments. To address these challenges, we propose a novel framework that constructs a multi-layer representation of the environment during navigation. This representation consists of viewpoint, object nodes, and room nodes. Viewpoints and object nodes facilitate intra-room exploration and accurate target localization, while room nodes support efficient inter-room planning. Building on this representation, we propose a novel two-stage navigation policy, integrating high-level planning guided by VLM reasoning with low-level VLM-assisted exploration to efficiently locate a goal object. We evaluated our approach on three simulated benchmarks (HM3D, RoboTHOR, and MP3D), and achieved state-of-the-art performance on both the success rate ($\mathord{\uparrow}\, 7.1\%$) and navigation efficiency ($\mathord{\uparrow}\, 12.5\%$). We further validate our method on a real robot platform, demonstrating strong robustness across 15 object navigation tasks in 10 different indoor environments. Project page is available at https://zwandering.github.io/STRIVE.github.io/ .

Motion Planning for Autonomous Vehicles: When Model Predictive Control Meets Ensemble Kalman Smoothing

Authors:Iman Askari, Yebin Wang, Vedeng M. Deshpande, Huazhen Fang
Date:2025-05-10 14:53:27

Safe and efficient motion planning is of fundamental importance for autonomous vehicles. This paper investigates motion planning based on nonlinear model predictive control (NMPC) over a neural network vehicle model. We aim to overcome the high computational costs that arise in NMPC of the neural network model due to the highly nonlinear and nonconvex optimization. In a departure from numerical optimization solutions, we reformulate the problem of NMPC-based motion planning as a Bayesian estimation problem, which seeks to infer optimal planning decisions from planning objectives. Then, we use a sequential ensemble Kalman smoother to accomplish the estimation task, exploiting its high computational efficiency for complex nonlinear systems. The simulation results show an improvement in computational speed by orders of magnitude, indicating the potential of the proposed approach for practical motion planning.

A Point-Based Algorithm for Distributional Reinforcement Learning in Partially Observable Domains

Authors:Larry Preuett III
Date:2025-05-10 05:19:32

In many real-world planning tasks, agents must tackle uncertainty about the environment's state and variability in the outcomes of any chosen policy. We address both forms of uncertainty as a first step toward safer algorithms in partially observable settings. Specifically, we extend Distributional Reinforcement Learning (DistRL)-which models the entire return distribution for fully observable domains-to Partially Observable Markov Decision Processes (POMDPs), allowing an agent to learn the distribution of returns for each conditional plan. Concretely, we introduce new distributional Bellman operators for partial observability and prove their convergence under the supremum p-Wasserstein metric. We also propose a finite representation of these return distributions via psi-vectors, generalizing the classical alpha-vectors in POMDP solvers. Building on this, we develop Distributional Point-Based Value Iteration (DPBVI), which integrates psi-vectors into a standard point-based backup procedure-bridging DistRL and POMDP planning. By tracking return distributions, DPBVI naturally enables risk-sensitive control in domains where rare, high-impact events must be carefully managed. We provide source code to foster further research in robust decision-making under partial observability.
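
The fully observable building block that DPBVI generalizes is the distributional Bellman backup. Below is a minimal categorical (shift-and-project) version on a fixed atom support; the support, reward, and discount are illustrative, and the psi-vector machinery for partial observability is not shown.

```python
# Categorical distributional Bellman backup (shift-and-project) on a
# toy support: apply Z <- r + gamma * Z, then project back onto the
# fixed atom grid. Illustrative values; not the paper's psi-vector
# operator for POMDPs.

def distributional_backup(dist, reward, gamma, atoms):
    """dist[i] = probability mass on atoms[i]; returns backed-up masses."""
    step = atoms[1] - atoms[0]          # uniform support assumed
    new = [0.0] * len(atoms)
    for p, z in zip(dist, atoms):
        tz = min(max(reward + gamma * z, atoms[0]), atoms[-1])
        b = (tz - atoms[0]) / step      # fractional index on the grid
        lo = int(b)
        hi = min(lo + 1, len(atoms) - 1)
        if hi == lo:                    # tz sits exactly on the top atom
            new[lo] += p
        else:                           # split mass between neighbors
            new[lo] += p * (hi - b)
            new[hi] += p * (b - lo)
    return new

atoms = [0.0, 1.0, 2.0, 3.0]
dist = [0.5, 0.5, 0.0, 0.0]             # half the mass at 0, half at 1
out = distributional_backup(dist, reward=1.0, gamma=1.0, atoms=atoms)
# reward 1 with gamma 1 shifts all mass one atom up: [0.0, 0.5, 0.5, 0.0]
```

Tracking the full return distribution rather than its mean is what makes risk-sensitive control possible: a tail quantile or CVaR of the backed-up distribution can be optimized instead of the expectation.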

Optimizing Railcar Movements to Create Outbound Trains in a Freight Railyard

Authors:Ruonan Zhao, Joseph Geunes, Xiaofeng Nie
Date:2025-05-10 04:49:31

A typical freight railyard at a manufacturing facility contains multiple tracks used for storage, classification, and outbound train assembly. Individual railcar storage locations on classification tracks are often determined before their destinations are known, giving rise to railcar shunting or switching problems, which require retrieving subsets of cars distributed throughout the yard to assemble outbound trains. To address this combinatorially challenging problem class, we propose a large-scale mixed-integer programming model that tracks railcar movements and corresponding costs over a finite planning horizon. The model permits simultaneous movement of multiple car groups via a locomotive and seeks to minimize repositioning costs. We also provide a dynamic programming formulation of the problem, demonstrate the NP-hardness of the corresponding optimization problem, and present an adaptive railcar grouping dynamic programming (ARG-DP) heuristic, which groups railcars with common destinations for efficient moves. Average results from a series of numerical experiments demonstrate the efficiency and quality of the ARG-DP algorithm in both simulated yards and a real yard. On average, across 60 test cases of simulated yards, the ARG-DP algorithm obtains solutions 355 times faster than solving the mixed-integer programming model using a commercial solver, while finding an optimal solution in 60% of the instances and maintaining an average optimality gap of 6.65%. In 10 cases based on the Gaia railyard in Portugal, the ARG-DP algorithm achieves solutions 229 times faster on average, finding an optimal solution in 50% of the instances with an average optimality gap of 6.90%.
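The grouping idea behind the ARG-DP heuristic can be sketched in a few lines: railcars scattered across classification tracks are grouped by destination so one locomotive move can retrieve several cars at once. The track layout, tuple format, and the per-(destination, track) move count below are illustrative assumptions, not the paper's algorithm.

```python
from collections import defaultdict

def group_by_destination(cars):
    """cars: list of (car_id, track, destination) tuples."""
    groups = defaultdict(list)
    for car_id, track, dest in cars:
        groups[dest].append((track, car_id))
    return dict(groups)

def count_moves(groups):
    """One retrieval move per (destination, track) block: cars with a
    common destination on the same track can be pulled together."""
    return sum(len({track for track, _ in cars}) for cars in groups.values())

# Five cars on three classification tracks, bound for two destinations.
cars = [("c1", "T1", "Porto"), ("c2", "T1", "Porto"),
        ("c3", "T2", "Porto"), ("c4", "T2", "Lisbon"),
        ("c5", "T3", "Lisbon")]
groups = group_by_destination(cars)
moves = count_moves(groups)   # 4 moves instead of 5 single-car retrievals
```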

Handling Pedestrian Uncertainty in Coordinating Autonomous Vehicles at Signal-Free Intersections

Authors:Filippos N. Tzortzoglou, Andreas A. Malikopoulos
Date:2025-05-09 23:20:45

In this paper, we provide a theoretical framework for the coordination of connected and automated vehicles (CAVs) at signal-free intersections, accounting for the unexpected presence of pedestrians. First, we introduce a general vehicle-to-infrastructure communication protocol and a low-level controller that determines the optimal unconstrained trajectories for CAVs, in terms of fuel efficiency and travel time, to cross the intersection without considering pedestrians. If such an unconstrained trajectory is unattainable, we introduce sufficient certificates for each CAV to cross the intersection while respecting the associated constraints. Next, we consider the case where an unexpected pedestrian enters the road. When the CAV's sensors detect a pedestrian, an emergency mode is activated, which imposes certificates related to an unsafe set in the pedestrian's proximity area. Simultaneously, a re-planning mechanism is implemented for all CAVs to accommodate the trajectories of vehicles operating in emergency mode. Finally, we validate the efficacy of our approach through simulations conducted in MATLAB and RoadRunner, which facilitate the integration of sensor tools and real-time implementation.

The CHIMERAS Project: Design Framework for the Collisionless HIgh-beta Magnetized Experiment Researching Astrophysical Systems

Authors:S. Dorfman, S. Bose, E. Lichko, M. Abler, J. Juno, J. M. TenBarge, Y. Zhang, S. Chakraborty Thakur, C. A. Cartagena-Sanchez, P. Tatum, E. Scime, G. Joshi, S. Greess, C. Kuchta
Date:2025-05-09 20:48:44

From the near-Earth solar wind to the intracluster medium of galaxy clusters, collisionless, high-beta, magnetized plasmas pervade our universe. Energy and momentum transport from large-scale fields and flows to small scale motions of plasma particles is ubiquitous in these systems, but a full picture of the underlying physical mechanisms remains elusive. The transfer is often mediated by a turbulent cascade of Alfvénic fluctuations as well as a variety of kinetic instabilities; these processes tend to be multi-scale and/or multi-dimensional, which makes them difficult to study using spacecraft missions and numerical simulations alone (Dorfman et al. 2023; Lichko et al. 2020, 2023). Meanwhile, existing laboratory devices struggle to produce the collisionless, high ion beta ($\beta_i \gtrsim 1$), magnetized plasmas across the range of scales necessary to address these problems. As envisioned in recent community planning documents (Carter et al. 2020; Milchberg and Scime 2020; Baalrud et al. 2020; Dorfman et al. 2023; National Academies of Sciences, Engineering, and Medicine 2024), it is therefore important to build a next generation laboratory facility to create a $\beta_i \gtrsim 1$, collisionless, magnetized plasma in the laboratory for the first time. A Working Group has been formed and is actively defining the necessary technical requirements to move the facility towards a construction-ready state. Recent progress includes the development of target parameters and diagnostic requirements as well as the identification of the need for a source-target device geometry. As the working group is already leading to new synergies across the community, we anticipate a broad community of users funded by a variety of federal agencies (including NASA, DOE, and NSF) to make copious use of the future facility.

Learning Sequential Kinematic Models from Demonstrations for Multi-Jointed Articulated Objects

Authors:Anmol Gupta, Weiwei Gu, Omkar Patil, Jun Ki Lee, Nakul Gopalan
Date:2025-05-09 18:09:06

As robots become more generalized and deployed in diverse environments, they must interact with complex objects, many with multiple independent joints or degrees of freedom (DoF) requiring precise control. A common strategy is object modeling, where compact state-space models are learned from real-world observations and paired with classical planning. However, existing methods often rely on prior knowledge or focus on single-DoF objects, limiting their applicability. They also fail to handle occluded joints and ignore the manipulation sequences needed to access them. We address this by learning object models from human demonstrations. We introduce Object Kinematic Sequence Machines (OKSMs), a novel representation capturing both kinematic constraints and manipulation order for multi-DoF objects. To estimate these models from point cloud data, we present Pokenet, a deep neural network trained on human demonstrations. We validate our approach on 8,000 simulated and 1,600 real-world annotated samples. Pokenet improves joint axis and state estimation by over 20 percent on real-world data compared to prior methods. Finally, we demonstrate OKSMs on a Sawyer robot using inverse kinematics-based planning to manipulate multi-DoF objects.

A Practical Guide to Hosting a Virtual Conference

Authors:Cameron Hummels, Benjamin Oppenheimer, G. Mark Voit, Jessica Werk
Date:2025-05-09 18:00:00

Virtual meetings have long been the outcast of scientific interaction. For many of us, the COVID-19 pandemic has only strengthened that sentiment as countless Zoom meetings have left us bored and exhausted. But remote conferences do not have to be negative experiences. If well designed, they have some distinct advantages over conventional in-person meetings, including universal access, longevity of content, as well as minimal costs and carbon footprint. This article details our experiences as organizers of a successful fully virtual scientific conference, the KITP program "Fundamentals of Gaseous Halos" hosted over 8 weeks in winter 2021. Herein, we provide detailed recommendations on planning and optimization of remote meetings, with application to traditional in-person events as well. We hope these suggestions will assist organizers of future virtual conferences and workshops.

A Proton Treatment Planning Method for Combining FLASH and Spatially Fractionated Radiation Therapy to Enhance Normal Tissue Protection

Authors:Weijie Zhang, Xue Hong, Ya-Nan Zhu, Yuting Lin, Gregory Gan, Ronald C Chen, Hao Gao
Date:2025-05-09 17:57:30

Background: FLASH radiation therapy (FLASH-RT) uses ultra-high dose rates to induce the FLASH effect, enhancing normal tissue sparing. In proton Bragg peak FLASH-RT, this effect is confined to high-dose regions near the target at deep tissue levels. In contrast, Spatially Fractionated Radiation Therapy (SFRT) creates alternating high- and low-dose regions with high peak-to-valley dose ratios (PVDR), sparing tissues at shallow-to-intermediate depths. Purpose: This study investigates a novel proton modality (SFRT-FLASH) that synergizes FLASH-RT and SFRT to enhance normal tissue protection across all depths. Methods: Two SFRT techniques are integrated with FLASH-RT: proton GRID therapy (pGRID) with conventional beam sizes and proton minibeam radiation therapy (pMBRT) with submillimeter beams. These are implemented as pGRID-FLASH (SB-FLASH) and minibeam-FLASH (MB-FLASH), respectively. The pGRID technique uses a scissor-beam (SB) method to achieve uniform target coverage. To meet FLASH dose (5 Gy) and dose-rate (40 Gy/s) thresholds, a single-field uniform-dose-per-fraction strategy is used. Dose and dose-rate constraints are jointly optimized, including a CTV1cm structure (a 1 cm ring around the CTV) for each field. Results: Across four clinical cases, MB-FLASH and SB-FLASH plans were benchmarked against conventional (CONV), FLASH-RT (FLASH), pMBRT (MB), and pGRID (SB) plans. SFRT-FLASH achieved high FLASH effect coverage (~60-80% in CTV1cm) while preserving PVDR (~2.5-7) at shallow-to-intermediate depths. Conclusions: We present a proton treatment planning approach that combines the FLASH effect at depth with high PVDR near the surface, enhancing normal tissue protection and advancing proton therapy.
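The peak-to-valley dose ratio (PVDR) central to the abstract above can be computed on a synthetic 1-D lateral dose profile as a small illustration. The profile values and the local-maximum/local-minimum definition of peaks and valleys are assumptions for this sketch, not the paper's dosimetry.

```python
import numpy as np

def pvdr(profile):
    """Mean local-maximum dose divided by mean local-minimum dose."""
    d = np.asarray(profile, dtype=float)
    peaks = [d[i] for i in range(1, len(d) - 1)
             if d[i] > d[i - 1] and d[i] > d[i + 1]]
    valleys = [d[i] for i in range(1, len(d) - 1)
               if d[i] < d[i - 1] and d[i] < d[i + 1]]
    return np.mean(peaks) / np.mean(valleys)

# Alternating high/low dose regions, e.g. a minibeam-like pattern (doses in Gy).
profile = [1.0, 5.0, 1.2, 5.2, 0.8, 4.8, 1.0]
ratio = pvdr(profile)
```

A high PVDR means the valley tissue between beams receives only a small fraction of the peak dose, which is the spatial-fractionation sparing mechanism the SFRT-FLASH combination tries to preserve at shallow depths.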

KRRF: Kinodynamic Rapidly-exploring Random Forest algorithm for multi-goal motion planning

Authors:Petr Ježek, Michal Minařík, Vojtěch Vonásek, Robert Pěnička
Date:2025-05-09 15:29:11

The problem of kinodynamic multi-goal motion planning is to find a trajectory over multiple target locations with an a priori unknown sequence of visits. The objective is to minimize the cost of the trajectory planned in a cluttered environment for a robot with a kinodynamic motion model. This problem has yet to be efficiently solved as it combines two NP-hard problems, the Traveling Salesman Problem (TSP) and the kinodynamic motion planning problem. We propose a novel approximate method called Kinodynamic Rapidly-exploring Random Forest (KRRF) to find a collision-free multi-goal trajectory that satisfies the motion constraints of the robot. KRRF simultaneously grows kinodynamic trees from all targets towards all other targets while using the other trees as a heuristic to boost the growth. Once the target-to-target trajectories are planned, their cost is used to solve the TSP to find the sequence of targets. The final multi-goal trajectory satisfying kinodynamic constraints is planned by guiding the RRT-based planner along the target-to-target trajectories in the TSP sequence. Compared with existing approaches, KRRF provides shorter target-to-target trajectories and final multi-goal trajectories with $1.1-2$ times lower costs while being computationally faster in most test cases. The method will be published as an open-source library.
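The sequencing stage described above can be sketched in isolation: once target-to-target trajectory costs are known, the visit order comes from solving the (asymmetric) TSP over those costs. The nearest-neighbor heuristic and the toy cost matrix below are illustrative assumptions; KRRF obtains the actual costs from its kinodynamic trees and the abstract does not specify its TSP solver.

```python
import numpy as np

def nearest_neighbor_tour(cost, start=0):
    """Greedy open tour over targets from `start` using pairwise costs."""
    n = cost.shape[0]
    unvisited = set(range(n)) - {start}
    tour, cur = [start], start
    while unvisited:
        nxt = min(unvisited, key=lambda j: cost[cur, j])  # cheapest next target
        tour.append(nxt)
        unvisited.remove(nxt)
        cur = nxt
    return tour

def tour_cost(cost, tour):
    return sum(cost[a, b] for a, b in zip(tour, tour[1:]))

# Toy asymmetric cost matrix for 4 targets (e.g., trajectory durations);
# asymmetry reflects that kinodynamic trajectories are not reversible.
C = np.array([[0.0,  2.0,  9.0, 10.0],
              [1.0,  0.0,  6.0,  4.0],
              [15.0, 7.0,  0.0,  8.0],
              [6.0,  3.0, 12.0,  0.0]])

tour = nearest_neighbor_tour(C)
```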

The Application of Deep Learning for Lymph Node Segmentation: A Systematic Review

Authors:Jingguo Qu, Xinyang Han, Man-Lik Chui, Yao Pu, Simon Takadiyi Gunda, Ziman Chen, Jing Qin, Ann Dorothy King, Winnie Chiu-Wing Chu, Jing Cai, Michael Tin-Cheung Ying
Date:2025-05-09 15:17:00

Automatic lymph node segmentation is the cornerstone for advances in computer vision tasks for early detection and staging of cancer. Traditional segmentation methods are constrained by manual delineation and variability in operator proficiency, limiting their ability to achieve high accuracy. The introduction of deep learning technologies offers new possibilities for improving the accuracy of lymph node image analysis. This study evaluates the application of deep learning in lymph node segmentation and discusses the methodologies of various deep learning architectures such as convolutional neural networks, encoder-decoder networks, and transformers in analyzing medical imaging data across different modalities. Despite these advancements, the field still confronts challenges such as the shape diversity of lymph nodes, the scarcity of accurately labeled datasets, and the inadequate development of methods that are robust and generalizable across different imaging modalities. To the best of our knowledge, this is the first study that provides a comprehensive overview of the application of deep learning techniques in the lymph node segmentation task. Furthermore, this study also explores potential future research directions, including multimodal fusion techniques, transfer learning, and the use of large-scale pre-trained models to overcome current limitations while enhancing cancer diagnosis and treatment planning strategies.