planning - 2025-04-14

The Early History of the Quark-Gluon Plasma

Authors:W. Busza, W. A. Zajc
Date:2025-04-11 17:34:13

We present the historical antecedents to the field of relativistic heavy ion physics, beginning with early attempts to model the strong interaction and ending with the endorsement of a relativistic heavy ion collider in the 1983 U.S. Long-Range Plan for Nuclear Science. Particular attention is paid to two major themes: 1) A program to study high density states of nuclear matter emerging from the 1974 Bear Mountain conference and 2) Efforts to understand the predictions of QCD for matter at high densities and/or temperatures.

Safe Flow Matching: Robot Motion Planning with Control Barrier Functions

Authors:Xiaobing Dai, Dian Yu, Shanshan Zhang, Zewen Yang
Date:2025-04-11 16:10:58

Recent advances in generative modeling have led to promising results in robot motion planning, particularly through diffusion and flow-based models that capture complex, multimodal trajectory distributions. However, these methods are typically trained offline and remain limited when faced with unseen environments or dynamic constraints, often lacking explicit mechanisms to ensure safety during deployment. In this work, we propose Safe Flow Matching (SafeFM), a motion planning approach for trajectory generation that integrates flow matching with safety guarantees. By incorporating the proposed flow matching barrier functions, SafeFM ensures that generated trajectories remain within safe regions throughout the planning horizon, even in the presence of previously unseen obstacles or state-action constraints. Unlike diffusion-based approaches, our method allows for direct, efficient sampling of constraint-satisfying trajectories, making it well-suited for real-time motion planning. We evaluate SafeFM on a diverse set of tasks, including planar robot navigation and 7-DoF manipulation, demonstrating superior safety, generalization, and planning performance compared to state-of-the-art generative planners. Comprehensive resources are available on the project website: https://safeflowmatching.github.io/SafeFM/

Reinforcement Learning-Driven Plant-Wide Refinery Planning Using Model Decomposition

Authors:Zhouchang Li, Runze Lin, Hongye Su, Lei Xie
Date:2025-04-11 15:42:49

In the era of smart manufacturing and Industry 4.0, the refining industry is evolving towards large-scale integration and flexible production systems. In response to these new demands, this paper presents a novel optimization framework for plant-wide refinery planning, integrating model decomposition with deep reinforcement learning. The approach decomposes the complex, large-scale refinery optimization problem into manageable submodels, improving computational efficiency while preserving accuracy. A reinforcement learning-based pricing mechanism is introduced to generate pricing strategies for intermediate products, facilitating better coordination between submodels and enabling rapid responses to market changes. Three industrial case studies, covering both single-period and multi-period planning, demonstrate significant improvements in computational efficiency while ensuring refinery profitability.

Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization

Authors:Jialu Li, Shoubin Yu, Han Lin, Jaemin Cho, Jaehong Yoon, Mohit Bansal
Date:2025-04-11 15:41:43

Recent advancements in text-to-video (T2V) diffusion models have significantly enhanced the visual quality of the generated videos. However, even recent T2V models find it challenging to follow text descriptions accurately, especially when the prompt requires accurate control of spatial layouts or object trajectories. A recent line of research uses layout guidance for T2V models that require fine-tuning or iterative manipulation of the attention map during inference time. This significantly increases the memory requirement, making it difficult to adopt a large T2V model as a backbone. To address this, we introduce Video-MSG, a training-free Guidance method for T2V generation based on Multimodal planning and Structured noise initialization. Video-MSG consists of three steps, where in the first two steps, Video-MSG creates Video Sketch, a fine-grained spatio-temporal plan for the final video, specifying background, foreground, and object trajectories, in the form of draft video frames. In the last step, Video-MSG guides a downstream T2V diffusion model with Video Sketch through noise inversion and denoising. Notably, Video-MSG does not need fine-tuning or attention manipulation with additional memory during inference time, making it easier to adopt large T2V models. Video-MSG demonstrates its effectiveness in enhancing text alignment with multiple T2V backbones (VideoCrafter2 and CogVideoX-5B) on popular T2V generation benchmarks (T2VCompBench and VBench). We provide comprehensive ablation studies about noise inversion ratio, different background generators, background object detection, and foreground object segmentation.

Tactile sensing enables vertical obstacle negotiation for elongate many-legged robots

Authors:Juntao He, Baxi Chong, Vincent R Nienhusser, Massimiliano Iaschi, Sehoon Ha, Daniel I. Goldman
Date:2025-04-11 15:20:31

Many-legged elongated robots show promise for reliable mobility on rugged landscapes. However, most studies on these systems focus on motion planning in the 2D horizontal plane (e.g., translation and rotation) without addressing rapid vertical motion. Despite their success on mildly rugged terrains, recent field tests reveal a critical need for 3D behaviors (e.g., climbing or traversing tall obstacles) in real-world applications. The challenges of 3D motion planning partially lie in designing sensing and control for a complex high-degree-of-freedom system, typically with over 25 degrees of freedom. To address the first challenge, we propose a tactile antenna system that enables the robot to probe obstacles and gather information about the structure of the environment. Building on this sensory input, we develop a control framework that integrates data from the antenna and foot contact sensors to dynamically adjust the robot's vertical body undulation for effective climbing. With the addition of simple, low-bandwidth tactile sensors, a robot with high static stability and redundancy exhibits predictable climbing performance in complex environments using a simple feedback controller. Laboratory and outdoor experiments demonstrate the robot's ability to climb obstacles up to five times its height. Moreover, the robot exhibits robust climbing capabilities on obstacles covered with flowable, robot-sized random items and those characterized by rapidly changing curvatures. These findings demonstrate an alternative way for legged robots to perceive the environment and respond to it effectively, paving the way towards future highly capable, low-profile many-legged robots.

FindAnything: Open-Vocabulary and Object-Centric Mapping for Robot Exploration in Any Environment

Authors:Sebastián Barbas Laina, Simon Boche, Sotiris Papatheodorou, Simon Schaefer, Jaehyung Jung, Stefan Leutenegger
Date:2025-04-11 15:12:05

Geometrically accurate and semantically expressive map representations have proven invaluable for robust and safe mobile robot navigation and task planning. Nevertheless, real-time, open-vocabulary semantic understanding of large-scale unknown environments is still an open problem. In this paper we present FindAnything, an open-world mapping and exploration framework that incorporates vision-language information into dense volumetric submaps. Thanks to the use of vision-language features, FindAnything bridges the gap between pure geometric and open-vocabulary semantic information for a higher level of understanding, while allowing exploration of any environment without the help of any external source of ground-truth pose information. We represent the environment as a series of volumetric occupancy submaps, resulting in a robust and accurate map representation that deforms upon pose updates when the underlying SLAM system corrects its drift, allowing for a locally consistent representation between submaps. Pixel-wise vision-language features are aggregated from efficient SAM (eSAM)-generated segments, which are in turn integrated into object-centric volumetric submaps, providing a mapping from open-vocabulary queries to 3D geometry that is also scalable in terms of memory usage. The open-vocabulary map representation of FindAnything achieves state-of-the-art semantic accuracy in closed-set evaluations on the Replica dataset. This level of scene understanding allows a robot to explore environments based on objects or areas of interest selected via natural language queries. Our system is the first of its kind to be deployed on resource-constrained devices, such as MAVs, leveraging vision-language information for real-world robotic tasks.

Enabling Safety for Aerial Robots: Planning and Control Architectures

Authors:Kaleb Ben Naveed, Devansh R. Agrawal, Daniel M. Cherenson, Haejoon Lee, Alia Gilbert, Hardik Parwana, Vishnu S. Chipade, William Bentz, Dimitra Panagou
Date:2025-04-11 15:05:31

Ensuring safe autonomy is crucial for deploying aerial robots in real-world applications. However, safety is a multifaceted challenge that must be addressed from multiple perspectives, including navigation in dynamic environments, operation under resource constraints, and robustness against adversarial attacks and uncertainties. In this paper, we present the authors' recent work that tackles some of these challenges and highlights key aspects that must be considered to enhance the safety and performance of autonomous aerial systems. All presented approaches are validated through hardware experiments.

Sectoral and spatial decomposition methods for multi-sector capacity expansion models

Authors:Federico Parolin, Yu Weng, Paolo Colbertaldo, Ruaridh Macdonald
Date:2025-04-11 13:09:48

Multi-sector capacity expansion models play a crucial role in energy planning by providing decision support for policymaking in technology development. To ensure reliable support, these models require high technological, spatial, and temporal resolution, leading to large-scale linear programming problems that are often computationally intractable. To address this challenge, conventional approaches rely on simplifying abstractions that trade accuracy for computational efficiency. Benders decomposition has been widely explored to improve computational efficiency in electricity capacity expansion models. Specifically, state-of-the-art methods have primarily focused on improving performance through temporal decomposition. However, multi-sector models introduce additional complexity, requiring new decomposition strategies. In this work, we propose a budget-based formulation to extend decomposition to the sectoral and spatial domains. We test the developed sectoral and spatial Benders decomposition algorithms on case studies of the continental United States, considering different configurations in terms of spatial and temporal resolution. Results show that our algorithms achieve substantial performance improvement compared to existing decomposition algorithms, with runtime reductions of 15%-70%. The proposed methods leverage the generic structure of multi-sector capacity expansion models, and can thus be applied to most existing energy planning models, ensuring computational tractability without sacrificing resolution.

Diffusion Models for Robotic Manipulation: A Survey

Authors:Rosa Wolf, Yitian Shi, Sheng Liu, Rania Rayyes
Date:2025-04-11 11:01:11

Diffusion generative models have demonstrated remarkable success in visual domains such as image and video generation. They have also recently emerged as a promising approach in robotics, especially in robot manipulation. Diffusion models leverage a probabilistic framework, and they stand out with their ability to model multi-modal distributions and their robustness to high-dimensional input and output spaces. This survey provides a comprehensive review of state-of-the-art diffusion models in robotic manipulation, including grasp learning, trajectory planning, and data augmentation. Diffusion models for scene and image augmentation lie at the intersection of robotics and computer vision for vision-based tasks, where they enhance generalizability and mitigate data scarcity. This paper also presents the two main frameworks of diffusion models and their integration with imitation learning and reinforcement learning. In addition, it discusses the common architectures and benchmarks and points out the challenges and advantages of current state-of-the-art diffusion-based methods.

GeoTexBuild: 3D Building Model Generation from Map Footprints

Authors:Ruizhe Wang, Junyan Yang, Qiao Wang
Date:2025-04-11 10:23:55

We introduce GeoTexBuild, a modular generative framework for creating 3D building models from map footprints. The proposed framework employs a three-stage process comprising height map generation, geometry reconstruction, and appearance stylization, culminating in building models with intricate geometry and appearance attributes. By integrating customized ControlNet and Text2Mesh models, we explore effective methods for controlling both geometric and visual attributes during the generation process. In this way, we eliminate the problem of structural variations behind a single facade photo that affects existing 3D generation techniques. Experimental results at each stage validate the capability of GeoTexBuild to generate detailed and accurate building models from footprints derived from site planning or map designs. Our framework significantly reduces manual labor in modeling buildings and can offer inspiration for designers.

Approximation Algorithms for the UAV Path Planning with Object Coverage Constraints

Authors:Jiawei Wang, Vincent Chau, Weiwei Wu
Date:2025-04-11 10:11:36

We study the Unmanned Aerial Vehicle (UAV) path planning problem in which a specific set of objects needs to be observed while ensuring a quality of observation. Our goal is to determine the shortest path for the UAV. This paper proposes an offline algorithm with an approximation ratio of $(2+2n)(1+\epsilon)$, where $\epsilon >0$ is a small constant and $n$ is the number of objects. We then propose several online algorithms in which objects are discovered during the process. To evaluate the performance of these algorithms, we conduct experimental comparisons. Our results show that the online algorithms perform similarly to the offline algorithm, but with significantly faster execution times ranging from 0.01 seconds to 200 seconds. We also show that our methods yield solutions with costs comparable to those obtained by the Gurobi optimizer, which requires 30000 seconds of runtime.
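
As a reading aid for the approximation guarantee quoted above, the following minimal Python sketch evaluates the factor $(2+2n)(1+\epsilon)$ and the resulting worst-case path length. The function names and the `optimal_length` argument are hypothetical and only illustrate what the stated ratio means; they are not the authors' algorithm.

```python
def approximation_factor(n: int, eps: float) -> float:
    """Evaluate the factor (2 + 2n)(1 + eps) stated in the abstract.

    n   : number of objects to observe (non-negative)
    eps : small positive constant
    """
    if n < 0 or eps <= 0:
        raise ValueError("require n >= 0 and eps > 0")
    return (2 + 2 * n) * (1 + eps)


def worst_case_length(optimal_length: float, n: int, eps: float) -> float:
    """Upper bound on the returned path length implied by the ratio.

    optimal_length is the (usually unknown) length of a shortest path;
    it is an illustrative input, not something the algorithm is given.
    """
    return approximation_factor(n, eps) * optimal_length


if __name__ == "__main__":
    # With 3 objects and eps = 0.5 the factor is (2 + 6) * 1.5.
    print(approximation_factor(3, 0.5))
```

Because the factor grows linearly in $n$, the guarantee loosens as more objects must be covered, which is consistent with the abstract also reporting an empirical comparison.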

RINGO: Real-time Navigation with a Guiding Trajectory for Aerial Manipulators in Unknown Environments

Authors:Zhang Zhaopeng, Wu Shizhen, Guo Chenfeng, Fang Yongchun, Han Jianda, Liang Xiao
Date:2025-04-11 08:07:58

Motion planning for aerial manipulators in constrained environments has typically been limited to known environments or simplified to that of multi-rotors, which leads to poor adaptability and overly conservative trajectories. This paper presents RINGO: Real-time Navigation with a Guiding Trajectory, a novel planning framework that enables aerial manipulators to navigate unknown environments in real time. The proposed method simultaneously considers the positions of both the multi-rotor and the end-effector. A pre-obtained multi-rotor trajectory serves as a guiding reference, allowing the end-effector to generate a smooth, collision-free, and workspace-compatible trajectory. Leveraging the convex hull property of B-spline curves, we theoretically guarantee that the trajectory remains within the reachable workspace. To the best of our knowledge, this is the first work that enables real-time navigation of aerial manipulators in unknown environments. The simulation and experimental results show the effectiveness of the proposed method. The proposed method generates less conservative trajectories than approaches that consider only the multi-rotor.

Trabant: A Serverless Architecture for Multi-Tenant Orbital Edge Computing

Authors:Tobias Pfandzelter, Nikita Bauer, Alexander Leis, Corentin Perdrizet, Felix Trautwein, Trever Schirmer, Osama Abboud, David Bermbach
Date:2025-04-11 08:06:59

Orbital edge computing reduces the data transmission needs of Earth observation satellites by processing sensor data on-board, allowing near-real-time insights while minimizing downlink costs. However, current orbital edge computing architectures are inflexible, requiring custom mission planning and high upfront development costs. In this paper, we propose a novel approach: shared Earth observation satellites that are operated by a central provider but used by multiple tenants. Each tenant can execute their own logic on-board the satellite to filter, prioritize, and analyze sensor data. We introduce Trabant, a serverless architecture for shared satellite platforms, leveraging the Function-as-a-Service (FaaS) paradigm and time-shifted computing. This architecture abstracts operational complexities, enabling dynamic scheduling under satellite resource constraints, reducing deployment overhead, and aligning event-driven satellite observations with intermittent computation. We present the design of Trabant, demonstrate its capabilities with a proof-of-concept prototype, and evaluate it using real satellite computing telemetry data. Our findings suggest that Trabant can significantly reduce mission planning overheads, offering a scalable and efficient platform for diverse Earth observation missions.

Production and manipulation of stable frozen hydrogen filaments

Authors:Jost Froning, Christian Mannweiler, Eva-Maria Hausch, Anna-Luna Hannen, Alfons Khoukaz
Date:2025-04-11 07:22:47

Frozen hydrogen filaments or droplets/pellets in vacuum are of great interest as targets for many experiments at hadron and lepton accelerators as well as at high-power laser systems. Especially in the case of large distances between the position of target beam generation and the interaction point with an accelerator or laser beam, a high target beam stability in space and time is crucial. Here we present recent results on the long-term stability of frozen hydrogen filaments in vacuum, which have been obtained using a new cryogenic target generator. We show that the trajectory of the frozen hydrogen target beam with a diameter of $10\,\mathrm{\mu m}$ remains stable for over 60 hours, with angular fluctuations below $0.08\,^\circ$. Furthermore, we present a novel strategy for the manipulation of the target beam, named cryobending. Here, the produced hydrogen filament is deflected by helium gas emerging from correction nozzles. We demonstrate the deflection of the hydrogen beam with one and two nozzles, achieving deflection angles up to $(15.7\pm 0.4)\,^\circ$. The presented results open the door for further developments of this beam position system, which enables target beam adjustment without any mechanical movement of the target components themselves. Potential applications of these stable hydrogen target filaments are, e.g., the upcoming MAGIX (MAinz Gas Injection target eXperiment) or the planned $\mathrm{\bar{P}ANDA}$ (anti-Proton ANnihilation at DArmstadt) experiment.

Interior Point Differential Dynamic Programming, Redux

Authors:Ming Xu, Stephen Gould, Iman Shames
Date:2025-04-11 06:18:46

We present IPDDP2, a structure-exploiting algorithm for solving discrete-time, finite-horizon optimal control problems with nonlinear constraints. Inequality constraints are handled using a primal-dual interior point formulation, and step acceptance for equality constraints follows a line-search filter approach. The iterates of the algorithm are derived under the Differential Dynamic Programming (DDP) framework. Our numerical experiments evaluate IPDDP2 on four robotic motion planning problems. IPDDP2 reliably converges to low optimality error and exhibits local quadratic and global convergence from remote starting points. Notably, we showcase the robustness of IPDDP2 by using it to solve a contact-implicit, joint-limited acrobot swing-up problem involving complementarity constraints from a range of initial conditions. We provide a full implementation of IPDDP2 in the Julia programming language.

CICV5G: A 5G Communication Delay Dataset for PnC in Cloud-based Intelligent Connected Vehicles

Authors:Xinrui Zhang, Peizhi Zhang, Junpeng Huang, Haojie Feng, Yining Ma, Feng Shen, Lu Xiong
Date:2025-04-11 04:37:52

Cloud-based intelligent connected vehicles (CICVs) leverage cloud computing and vehicle-to-everything (V2X) to enable efficient information exchange and cooperative control. However, communication delay is a critical factor in vehicle-cloud interactions, potentially deteriorating the planning and control (PnC) performance of CICVs. To explore whether the new generation of communication technology, 5G, can support the PnC of CICVs, we present CICV5G, a publicly available 5G communication delay dataset for the PnC of CICVs. This dataset offers real-time delay variations across diverse traffic environments, velocities, data transmission frequencies, and network conditions. It contains over 300,000 records, with each record consisting of network performance indicators (e.g., cell ID, reference signal received power, and signal-to-noise ratio) and PnC-related data (e.g., position). Based on CICV5G, we compare the performance of CICVs with that of autonomous vehicles and examine how delay impacts the PnC of CICVs. The objective of this dataset is to support research in developing more accurate communication models and to provide a valuable reference for scheme development and network deployment for CICVs. To ensure that the research community can benefit from this work, our dataset and accompanying code are made publicly available.

Stereophotoclinometry Revisited

Authors:Travis Driver, Andrew Vaughan, Yang Cheng, Adnan Ansar, John Christian, Panagiotis Tsiotras
Date:2025-04-11 04:33:56

Image-based surface reconstruction and characterization is crucial for missions to small celestial bodies, as it informs mission planning, navigation, and scientific analysis. However, current state-of-the-practice methods, such as stereophotoclinometry (SPC), rely heavily on human-in-the-loop verification and high-fidelity a priori information. This paper proposes Photoclinometry-from-Motion (PhoMo), a novel framework that incorporates photoclinometry techniques into a keypoint-based structure-from-motion (SfM) system to estimate the surface normal and albedo at detected landmarks to improve autonomous surface and shape characterization of small celestial bodies from in-situ imagery. In contrast to SPC, we forego the expensive maplet estimation step and instead use dense keypoint measurements and correspondences from an autonomous keypoint detection and matching method based on deep learning. Moreover, we develop a factor graph-based approach allowing for simultaneous optimization of the spacecraft's pose, landmark positions, Sun-relative direction, and surface normals and albedos via fusion of Sun vector measurements and image keypoint measurements. The proposed framework is validated on real imagery taken by the Dawn mission to the asteroid 4 Vesta and the minor planet 1 Ceres and compared against an SPC reconstruction, where we demonstrate superior rendering performance compared to an SPC solution and precise alignment to a stereophotogrammetry (SPG) solution without relying on any a priori camera pose and topography information or humans-in-the-loop.

Neural Network-assisted Interval Reachability for Systems with Control Barrier Function-Based Safe Controllers

Authors:Damola Ajeyemi, Saber Jafarpour, Emiliano Dall'Anese
Date:2025-04-11 04:14:55

Control Barrier Functions (CBFs) have been widely utilized in the design of optimization-based controllers and filters for dynamical systems to ensure forward invariance of a given set of safe states. While CBF-based controllers offer safety guarantees, they can compromise the performance of the system, leading to undesirable behaviors such as unbounded trajectories and emergence of locally stable spurious equilibria. Computing reachable sets for systems with CBF-based controllers is an effective approach for runtime performance and stability verification, and can potentially serve as a tool for trajectory re-planning. In this paper, we propose a computationally efficient interval reachability method for performance verification of systems with optimization-based controllers by: (i) approximating the optimization-based controller by a pre-trained neural network to avoid solving optimization problems repeatedly, and (ii) using mixed monotone theory to construct an embedding system that leverages state-of-the-art neural network verification algorithms for bounding the output of the neural network. Results in terms of closeness of solutions of trajectories of the system with the optimization-based controller and the neural network are derived. Using a single trajectory of the embedding system along with our closeness of solutions result, we obtain an over-approximation of the reachable set of the system with optimization-based controllers. Numerical results are presented to corroborate the technical findings.
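
To make the notion of a CBF-based, optimization-based safety filter concrete, here is a minimal Python sketch under strong assumptions that are not taken from the paper: a single-integrator system x_dot = u, one circular unsafe region with barrier h(x) = ||x - c||^2 - r^2, and a linear class-K function alpha * h. With a single affine constraint, the usual minimum-deviation quadratic program has a closed-form projection, so no solver is needed. The name `cbf_filter` and all parameters are hypothetical.

```python
from typing import List, Sequence


def cbf_filter(
    x: Sequence[float],
    u_nom: Sequence[float],
    center: Sequence[float],
    radius: float,
    alpha: float = 1.0,
) -> List[float]:
    """Closest control to u_nom satisfying grad_h(x) . u >= -alpha * h(x).

    Assumptions (illustrative only): dynamics x_dot = u and
    h(x) = ||x - center||^2 - radius^2, so h >= 0 outside the obstacle.
    """
    diff = [xi - ci for xi, ci in zip(x, center)]
    h = sum(d * d for d in diff) - radius ** 2
    a = [2.0 * d for d in diff]          # gradient of h at x
    b = -alpha * h                       # constraint reads a . u >= b
    a_dot_u = sum(ai * ui for ai, ui in zip(a, u_nom))
    a_norm_sq = sum(ai * ai for ai in a)
    if a_dot_u >= b or a_norm_sq == 0.0:
        # Nominal input already satisfies the constraint, or the gradient
        # vanishes (x at the obstacle center) and no correction is defined.
        return list(u_nom)
    # Project u_nom onto the half-space {u : a . u >= b}.
    scale = (b - a_dot_u) / a_norm_sq
    return [ui + scale * ai for ui, ai in zip(u_nom, a)]


if __name__ == "__main__":
    # Nominal input drives straight at the obstacle; the filter slows it.
    print(cbf_filter([2.0, 0.0], [-10.0, 0.0], [0.0, 0.0], 1.0))
```

The filtered input lies on the constraint boundary whenever the nominal input violates it, which is the minimal modification that keeps the safe set forward invariant for this toy system. Such filtering is also the kind of intervention that can alter closed-loop behavior, which motivates the verification question the abstract raises.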

CATCH-FORM-ACTer: Compliance-Aware Tactile Control and Hybrid Deformation Regulation-Based Action Transformer for Viscoelastic Object Manipulation

Authors:Hongjun Ma, Weichang Li, Jingwei Zhang, Shenlai He, Xiaoyan Deng
Date:2025-04-11 03:40:22

Automating contact-rich manipulation of viscoelastic objects with rigid robots faces challenges including dynamic parameter mismatches, unstable contact oscillations, and spatiotemporal force-deformation coupling. In our prior work, a Compliance-Aware Tactile Control and Hybrid Deformation Regulation (CATCH-FORM-3D) strategy achieved robust and effective manipulation of 3D viscoelastic objects by combining a contact force-driven admittance outer loop with a PDE-stabilized inner loop, reaching sub-millimeter surface deformation accuracy. However, this strategy requires fine-tuning of object-specific parameters and task-specific calibrations. To bridge this gap, we propose CATCH-FORM-ACTer, which enhances CATCH-FORM-3D with an Action Chunking with Transformer (ACT) framework. An intuitive teleoperation system performs Learning from Demonstration (LfD) to build up long-horizon sensing, decision-making, and execution sequences. Unlike conventional ACT methods focused solely on trajectory planning, our approach dynamically adjusts stiffness, damping, and diffusion parameters in real time during multi-phase manipulations, effectively imitating human-like force-deformation modulation. Experiments on single-arm and bimanual robots in three tasks show better force field patterns and thus 10%-20% higher success rates versus conventional methods, enabling precise, safe interactions for industrial, medical, or household scenarios.

Graph Based Deep Reinforcement Learning Aided by Transformers for Multi-Agent Cooperation

Authors:Michael Elrod, Niloufar Mehrabi, Rahul Amin, Manveen Kaur, Long Cheng, Jim Martin, Abolfazl Razi
Date:2025-04-11 01:46:18

Mission planning for a fleet of cooperative autonomous drones in applications that involve serving distributed target points, such as disaster response, environmental monitoring, and surveillance, is challenging, especially under partial observability, limited communication range, and uncertain environments. Traditional path-planning algorithms struggle in these scenarios, particularly when prior information is not available. To address these challenges, we propose a novel framework that integrates Graph Neural Networks (GNNs), Deep Reinforcement Learning (DRL), and transformer-based mechanisms for enhanced multi-agent coordination and collective task execution. Our approach leverages GNNs to model agent-agent and agent-goal interactions through adaptive graph construction, enabling efficient information aggregation and decision-making under constrained communication. A transformer-based message-passing mechanism, augmented with edge-feature-enhanced attention, captures complex interaction patterns, while a Double Deep Q-Network (Double DQN) with prioritized experience replay optimizes agent policies in partially observable environments. This integration is carefully designed to address specific requirements of multi-agent navigation, such as scalability, adaptability, and efficient task execution. Experimental results demonstrate superior performance, with 90% service provisioning and 100% grid coverage (node discovery), while reducing the average steps per episode to 200, compared to 600 for benchmark methods such as particle swarm optimization (PSO), greedy algorithms and DQN.
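
The Double DQN component mentioned above differs from plain DQN in how the bootstrap target is formed: the online network selects the next action and the target network evaluates it. The following minimal Python sketch shows only that target computation for a single transition. The function name and the list-based Q-value inputs are hypothetical, and the paper's GNN, transformer message passing, and prioritized replay are not modeled here.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    done: bool,
    gamma: float,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
) -> float:
    """Double DQN target for one transition.

    q_online_next : online-network Q-values at the next state (selects action)
    q_target_next : target-network Q-values at the next state (evaluates it)
    """
    if done:
        # Terminal transitions do not bootstrap.
        return reward
    # Action selection uses the online network...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ...while the value estimate comes from the target network.
    return reward + gamma * q_target_next[best_action]


if __name__ == "__main__":
    print(double_dqn_target(1.0, False, 0.5, [0.2, 0.9, 0.1], [4.0, 2.0, 8.0]))
```

Decoupling selection from evaluation is what reduces the overestimation that arises when a single network both picks and scores the maximizing action.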

Enhanced Cooperative Perception Through Asynchronous Vehicle to Infrastructure Framework with Delay Mitigation for Connected and Automated Vehicles

Authors:Nithish Kumar Saravanan, Varun Jammula, Yezhou Yang, Jeffrey Wishart, Junfeng Zhao
Date:2025-04-10 23:48:22

Perception is a key component of automated vehicles (AVs). However, sensors mounted on AVs often encounter blind spots due to obstructions from other vehicles, infrastructure, or objects in the surrounding area. While recent advancements in planning and control algorithms help AVs react to sudden object appearances from blind spots at low speeds and in less complex scenarios, challenges remain at high speeds and complex intersections. Vehicle to Infrastructure (V2I) technology promises to enhance scene representation for AVs in complex intersections, providing sufficient time and distance to react to adversary vehicles violating traffic rules. Most existing methods for infrastructure-based vehicle detection and tracking rely on LIDAR, RADAR, or sensor fusion methods, such as LIDAR-Camera and RADAR-Camera. Although LIDAR and RADAR provide accurate spatial information, the sparsity of point cloud data limits their ability to capture detailed contours of distant objects, resulting in inaccurate 3D object detection results. Furthermore, the absence of LIDAR or RADAR at every intersection increases the cost of implementing V2I technology. To address these challenges, this paper proposes a V2I framework that utilizes monocular traffic cameras at road intersections to detect 3D objects. The results from the roadside unit (RSU) are then combined with the on-board system using an asynchronous late fusion method to enhance scene representation. Additionally, the proposed framework provides a time delay compensation module to compensate for the processing and transmission delay from the RSU. Lastly, the V2I framework is tested by simulating and validating a scenario similar to the one described in an industry report by Waymo. The results show that the proposed method improves the scene representation and the AV's perception range, giving enough time and space to react to adversary vehicles.
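
The abstract does not say how the time delay compensation module works. As one plausible illustration only, the Python sketch below extrapolates a delayed roadside detection to the vehicle's current time with a constant-velocity model. The `Detection` fields, the function name, and the constant-velocity assumption are all hypothetical and should not be read as the authors' method.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """A 2D object state reported by a roadside unit (hypothetical schema)."""
    x: float          # position, meters
    y: float
    vx: float         # velocity, meters per second
    vy: float
    timestamp: float  # time the measurement refers to, seconds


def compensate_delay(det: Detection, now: float) -> Detection:
    """Propagate a delayed detection to time `now` at constant velocity.

    Negative delays (clock skew) are clamped to zero, so the state is
    returned unchanged rather than extrapolated backwards.
    """
    dt = max(0.0, now - det.timestamp)
    return Detection(
        x=det.x + det.vx * dt,
        y=det.y + det.vy * dt,
        vx=det.vx,
        vy=det.vy,
        timestamp=det.timestamp + dt,
    )


if __name__ == "__main__":
    delayed = Detection(x=10.0, y=5.0, vx=2.0, vy=-1.0, timestamp=100.0)
    print(compensate_delay(delayed, now=100.5))
```

After this step the roadside and on-board detections refer to the same instant, which is the precondition for fusing them at the object level.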

Learning Object Focused Attention

Authors:Vivek Trivedy, Amani Almalki, Longin Jan Latecki
Date:2025-04-10 23:23:26

We propose an adaptation to the training of Vision Transformers (ViTs) that allows for an explicit modeling of objects during the attention computation. This is achieved by adding a new branch to selected attention layers that computes an auxiliary loss which we call the object-focused attention (OFA) loss. We restrict the attention to image patches that belong to the same object class, which allows ViTs to gain a better understanding of configural (or holistic) object shapes by focusing on intra-object patches instead of other patches such as those in the background. Our proposed inductive bias fits easily into the attention framework of transformers since it only adds an auxiliary loss over selected attention layers. Furthermore, our approach has no additional overhead during inference. We also experiment with multiscale masking to further improve the performance of our OFA model and give a path forward for self-supervised learning with our method. Our experimental results demonstrate that ViTs with OFA achieve better classification results than their base models, exhibit a stronger generalization ability to out-of-distribution (OOD) and adversarially corrupted images, and learn representations based on object shapes rather than spurious correlations via general textures. For our OOD setting, we generate a novel dataset using the COCO dataset and Stable Diffusion inpainting which we plan to share with the community.

Modeling Robust Energy Systems Considering Weather Uncertainty and Nuclear Power Failures: A Case Study in Northern Europe

Authors:Kamran Forghani, Xiaoming Kan, Lina Reichenberg, Fredrik Hedenus
Date:2025-04-10 22:14:21

Capacity expansion models used for policy support have increasingly represented both the variability and uncertainty of weather-dependent generation (wind and solar). However, uncertainty arising from nuclear power outages has been largely neglected in the literature, even though nuclear availability is also uncertain, as demonstrated by the performance of the French nuclear power fleet in 2022. This paper presents the first capacity expansion model that considers uncertainty in nuclear power availability caused by unplanned outages. We propose a mathematical model that combines a scenario-based stochastic optimization approach (to deal with weather-related uncertainties) with a data-driven adjustable robust optimization approach (to deal with nuclear failure-related uncertainties). The robust model represents the bulky behavior of nuclear power plants, with large (1 GW) units that are either on or off, while at the same time letting the model decide on the optimal amount of nuclear capacity. We tested the model in a case study for Northern Europe (seven nodes) with a time resolution of 1250 time steps. Our findings show that nuclear power outages do, in fact, impose a vulnerability on the energy system if not considered in the planning phase. Our proposed model performs well and finds solutions that prevent Loss-of-Load (at a price of robustness of 0.6%), even in more extreme weather conditions. Robust solutions are characterized by a higher capacity of gas plants, but, perhaps surprisingly, nuclear power capacity is barely affected.

Geometric and Dosimetric Validation of Deformable Image Registration for Prostate MR-guided Adaptive Radiotherapy

Authors:Victor N. Malkov, Iymad R. Mansour, Vickie Kong, Winnie Li, Jennifer Dang, Parisa Sadeghi, Inmaculada Navarro, Jerusha Padayachee, Peter Chung, Jeff D. Winter, Tony Tadic
Date:2025-04-10 17:47:47

Objective: Quantify geometric and dosimetric accuracy of a novel prostate MR-to-MR deformable image registration (DIR) approach to support MR-guided adaptive radiation therapy dose accumulation. Approach: We evaluated DIR accuracy in 25 patients treated with 30 Gy in 5 fractions on a 1.5 T MR-linac using an adaptive workflow. A reference MR was used for planning, with three images collected at each fraction: adapt MR for adaptive planning, verify MR for pretreatment position verification and beam-on for capturing anatomy during radiation delivery. We assessed three DIR approaches: intensity-based, intensity-based with controlling structures (CS), and a novel intensity-based approach with controlling structures and points of interest (CS+P). DIRs were performed between the reference and fraction images and within fractions. We propagated CTV, bladder, and rectum contours using the DIRs and compared them to manual contours using Dice similarity coefficient, mean distance to agreement (DTAmean), and dose-volume metrics. Results: CS and CS+P improved geometric agreement between contours over intensity-only DIR. DTAmean for reference-to-beam-on intensity-only DIR was 0.131+/-0.009cm (CTV), 0.46+/-0.08cm (bladder), and 0.154+/-0.013cm (rectum). For CS, these values were 0.018+/-0.002cm, 0.388+/-0.14cm, and 0.036+/-0.013cm. For CS+P, these values were 0.015+/-0.001cm, 0.025+/-0.004cm, and 0.021+/-0.002cm. Dosimetrically, comparing CS and CS+P for reference-to-beam-on DIRs resulted in a change of CTV D98% from [-29cGy, 19cGy] to [-18cGy, 26cGy], rectum D1cc from [-106cGy, 72cGy] to [-52cGy, 74cGy], and bladder D5cc from [-51cGy, 544cGy] to [-79cGy, 36cGy]. Significance: CS improved geometric and dosimetric accuracy over intensity-only DIR, with CS+P providing the most consistent performance. However, session image segmentation remains a challenge, which may be addressed with automated contouring.
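
For readers unfamiliar with the Dice similarity coefficient used above to compare propagated and manual contours, it is defined for two regions A and B as 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical). The sketch below is a minimal illustration that represents each contour as a set of voxel indices; the function name and the convention for two empty contours are assumptions, not details from the paper.

```python
def dice_coefficient(a: set, b: set) -> float:
    """Dice similarity coefficient between two sets of voxel indices.

    Returns 1.0 when both sets are empty (an assumed convention).
    """
    if not a and not b:
        return 1.0
    overlap = len(a & b)  # voxels present in both contours
    return 2.0 * overlap / (len(a) + len(b))


# Toy example: two 3-voxel contours sharing 2 voxels.
propagated = {(0, 0, 1), (0, 0, 2), (0, 0, 3)}
manual = {(0, 0, 2), (0, 0, 3), (0, 0, 4)}
score = dice_coefficient(propagated, manual)  # 2 * 2 / (3 + 3)
```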

Optimal Control For Anti-Abeta Treatment in Alzheimer's Disease using a Reaction-Diffusion Model

Authors:Wenrui Hao, Chiu-Yen Kao, Sun Lee, Zhiyuan Li
Date:2025-04-10 17:22:09

Alzheimer's disease is a progressive neurodegenerative disorder that significantly impairs patient survival and quality of life. While current pharmacological treatments aim to slow disease progression, they remain insufficient in halting cognitive decline. Mathematical modeling has emerged as a powerful tool for understanding the dynamics of AD and optimizing treatment strategies. However, most existing models focus on temporal dynamics using ordinary differential equation-based approaches, often neglecting the critical role of spatial heterogeneity in disease progression. In this study, we employ a spatially explicit reaction-diffusion model to describe amyloid-beta (A beta) dynamics in the brain, incorporating treatment optimization while accounting for potential side effects. Our objective is to minimize amyloid-beta plaque concentration while balancing therapeutic efficacy against adverse effects, such as amyloid-related imaging abnormalities (ARIA). Under specific assumptions, we establish the well-posedness and uniqueness of the optimal solution. We employ numerical methods based on the Finite Element Method to compute personalized treatment strategies, leveraging real patient amyloid-beta positron emission tomography (PET) scan data. Our results demonstrate that optimal treatment strategies outperform constant dosing regimens, achieving significant reductions in amyloid burden while minimizing side effects. By integrating spatial dynamics and personalized treatment planning, our framework offers a novel approach to refining therapeutic interventions for Alzheimer's disease.

Fast Adaptation with Behavioral Foundation Models

Authors:Harshit Sikchi, Andrea Tirinzoni, Ahmed Touati, Yingchen Xu, Anssi Kanervisto, Scott Niekum, Amy Zhang, Alessandro Lazaric, Matteo Pirotta
Date:2025-04-10 16:14:17

Unsupervised zero-shot reinforcement learning (RL) has emerged as a powerful paradigm for pretraining behavioral foundation models (BFMs), enabling agents to solve a wide range of downstream tasks specified via reward functions in a zero-shot fashion, i.e., without additional test-time learning or planning. This is achieved by learning self-supervised task embeddings alongside corresponding near-optimal behaviors and incorporating an inference procedure to directly retrieve the latent task embedding and associated policy for any given reward function. Despite promising results, zero-shot policies are often suboptimal due to errors induced by the unsupervised training process, the embedding, and the inference procedure. In this paper, we focus on devising fast adaptation strategies to improve the zero-shot performance of BFMs in a few steps of online interaction with the environment while avoiding any performance drop during the adaptation process. Notably, we demonstrate that existing BFMs learn a set of skills containing more performant policies than those identified by their inference procedure, making them well-suited for fast adaptation. Motivated by this observation, we propose both actor-critic and actor-only fast adaptation strategies that search in the low-dimensional task-embedding space of the pre-trained BFM to rapidly improve the performance of its zero-shot policies on any downstream task. Notably, our approach mitigates the initial "unlearning" phase commonly observed when fine-tuning pre-trained RL models. We evaluate our fast adaptation strategies on top of four state-of-the-art zero-shot RL methods in multiple navigation and locomotion domains. Our results show that they achieve 10-40% improvement over their zero-shot performance in a few tens of episodes, outperforming existing baselines.

A quantum computing approach to beam angle optimization

Authors:Nimita Shinde, Ya-Nan Zhu, Haozheng Shen, Hao Gao
Date:2025-04-10 15:24:37

Background: Beam angle optimization (BAO) is a critical component of radiation therapy (RT) treatment planning, where small changes in beam configuration can significantly impact treatment quality, especially for proton RT. Mathematically, BAO is a mixed integer programming (MIP) problem, which is NP-hard due to its exponentially growing search space. Traditional optimization techniques often struggle with computational efficiency, necessitating the development of novel approaches. Purpose: This study introduces QC-BAO, a hybrid quantum-classical approach that leverages quantum computing to solve the MIP formulation of BAO. Methods: The proposed approach, QC-BAO, models BAO as an MIP problem, incorporating binary variables for beam angle selection and continuous variables for optimizing spot intensities for proton therapy. The proposed approach employs a hybrid quantum-classical framework, utilizing quantum computing to solve the binary decision component while integrating classical optimization techniques, including iterative convex relaxation and the alternating direction method of multipliers. Results: Computational experiments were conducted on clinical test cases to evaluate QC-BAO's performance against clinically verified angles and a heuristic approach, GS-BAO. QC-BAO demonstrated improved treatment plan quality over both the clinical angles and GS-BAO. The method consistently increased the conformity index (CI) for target coverage while reducing mean and maximum doses to organs-at-risk (OAR). Additionally, QC-BAO produced the lowest objective function value, confirming its superior optimization capability. Conclusions: The findings highlight the potential of quantum computing to enhance the solution of the BAO problem, as demonstrated by the improvement in plan quality obtained with the proposed method, QC-BAO. This study paves the way for future clinical implementation of quantum-accelerated optimization in RT.

Anytime Single-Step MAPF Planning with Anytime PIBT

Authors:Nayesha Gandotra, Rishi Veerapaneni, Muhammad Suhail Saleem, Daniel Harabor, Jiaoyang Li, Maxim Likhachev
Date:2025-04-10 15:21:23

PIBT is a popular Multi-Agent Path Finding (MAPF) method at the core of many state-of-the-art MAPF methods including LaCAM, CS-PIBT, and WPPL. The main utility of PIBT is that it is a very fast and effective single-step MAPF solver and can return a collision-free single-step solution for hundreds of agents in less than a millisecond. However, the main drawback of PIBT is that it is extremely greedy with respect to its priorities and thus leads to poor solution quality. Additionally, PIBT cannot use all the planning time that might be available to it and returns the first solution it finds. We thus develop Anytime PIBT, which quickly finds a one-step solution identically to PIBT but then continuously improves the solution in an anytime manner. We prove that Anytime PIBT converges to the optimal solution given sufficient time. We experimentally validate that Anytime PIBT can rapidly improve single-step solution quality within milliseconds and even find the optimal single-step action. Interestingly, however, we find that improving the single-step solution quality does not have a significant effect on full-horizon solution costs.

HarmonySeg: Tubular Structure Segmentation with Deep-Shallow Feature Fusion and Growth-Suppression Balanced Loss

Authors:Yi Huang, Ke Zhang, Wei Liu, Yuanyuan Wang, Vishal M. Patel, Le Lu, Xu Han, Dakai Jin, Ke Yan
Date:2025-04-10 15:04:42

Accurate segmentation of tubular structures in medical images, such as vessels and airway trees, is crucial for computer-aided diagnosis, radiotherapy, and surgical planning. However, significant challenges exist in algorithm design when faced with diverse sizes, complex topologies, and (often) incomplete data annotation of these structures. We address these difficulties by proposing a new tubular structure segmentation framework named HarmonySeg. First, we design a deep-to-shallow decoder network featuring flexible convolution blocks with varying receptive fields, which enables the model to effectively adapt to tubular structures of different scales. Second, to highlight potential anatomical regions and improve the recall of small tubular structures, we incorporate vesselness maps as auxiliary information. These maps are aligned with image features through a shallow-and-deep fusion module, which simultaneously eliminates unreasonable candidates to maintain high precision. Finally, we introduce a topology-preserving loss function that leverages contextual and shape priors to balance the growth and suppression of tubular structures, which also allows the model to handle low-quality and incomplete annotations. Extensive quantitative experiments are conducted on four public datasets. The results show that our model can accurately segment 2D and 3D tubular structures and outperforms existing state-of-the-art methods. External validation on a private dataset also demonstrates good generalizability.

Siren Federate: Bridging document, relational, and graph models for exploratory graph analysis

Authors:Georgeta Bordea, Stephane Campinas, Matteo Catena, Renaud Delbru
Date:2025-04-10 14:52:03

Investigative workflows require interactive exploratory analysis on large heterogeneous knowledge graphs. Current databases show limitations in enabling such tasks. This paper discusses the architecture of Siren Federate, a system that efficiently supports exploratory graph analysis by bridging document-oriented, relational, and graph models. Technical contributions include distributed join algorithms, adaptive query planning, query plan folding, semantic caching, and semi-join decomposition for path queries. Semi-join decomposition addresses the exponential growth of intermediate results in path-based queries. Experiments show that Siren Federate exhibits low latency and scales well with the amount of data, the number of users, and the number of computing nodes.
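
To give a sense of why semi-joins help with path queries, the sketch below applies the textbook semi-join reduction to a two-hop path: each edge list is first filtered to the rows that can still match the other side, and only then are full paths materialized. This is a generic illustration under assumed names (`semi_join`, `two_hop_paths`, edges as Python tuples), not Siren Federate's actual algorithm or API.

```python
def semi_join(rows, keys, key_of):
    """Keep only the rows whose join key appears in `keys`."""
    return [row for row in rows if key_of(row) in keys]


def two_hop_paths(edges_ab, edges_bc):
    """Enumerate paths a -> b -> c after semi-join reduction of both sides."""
    # Drop a -> b edges whose target never starts a b -> c edge.
    sources_bc = {b for b, _ in edges_bc}
    reduced_ab = semi_join(edges_ab, sources_bc, key_of=lambda e: e[1])

    # Drop b -> c edges whose source is not reached by a surviving a -> b edge.
    targets_ab = {b for _, b in reduced_ab}
    reduced_bc = semi_join(edges_bc, targets_ab, key_of=lambda e: e[0])

    # Materialize full paths only from the reduced inputs.
    by_source = {}
    for b, c in reduced_bc:
        by_source.setdefault(b, []).append(c)
    return [(a, b, c) for a, b in reduced_ab for c in by_source.get(b, [])]


# Toy data: only "d1" links the two hops.
edges_ab = [("u1", "d1"), ("u2", "d2"), ("u3", "d9")]
edges_bc = [("d1", "e1"), ("d1", "e2"), ("d7", "e3")]
paths = two_hop_paths(edges_ab, edges_bc)
```

Because the filters only pass join keys around, dangling edges are discarded before any path tuples are built, which is what keeps intermediate results small as paths get longer.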