planning - 2025-05-12

A Proton Treatment Planning Method for Combining FLASH and Spatially Fractionated Radiation Therapy to Enhance Normal Tissue Protection

Authors:Weijie Zhang, Xue Hong, Ya-Nan Zhu, Yuting Lin, Gregory Gan, Ronald C Chen, Hao Gao
Date:2025-05-09 17:57:30

Background: FLASH radiation therapy (FLASH-RT) uses ultra-high dose rates to induce the FLASH effect, enhancing normal tissue sparing. In proton Bragg peak FLASH-RT, this effect is confined to high-dose regions near the target at deep tissue levels. In contrast, Spatially Fractionated Radiation Therapy (SFRT) creates alternating high- and low-dose regions with high peak-to-valley dose ratios (PVDR), sparing tissues at shallow-to-intermediate depths. Purpose: This study investigates a novel proton modality (SFRT-FLASH) that synergizes FLASH-RT and SFRT to enhance normal tissue protection across all depths. Methods: Two SFRT techniques are integrated with FLASH-RT: proton GRID therapy (pGRID) with conventional beam sizes and proton minibeam radiation therapy (pMBRT) with submillimeter beams. These are implemented as pGRID-FLASH (SB-FLASH) and minibeam-FLASH (MB-FLASH), respectively. The pGRID technique uses a scissor-beam (SB) method to achieve uniform target coverage. To meet FLASH dose (5 Gy) and dose-rate (40 Gy/s) thresholds, a single-field uniform-dose-per-fraction strategy is used. Dose and dose-rate constraints are jointly optimized, including a CTV1cm structure (a 1 cm ring around the CTV) for each field. Results: Across four clinical cases, MB-FLASH and SB-FLASH plans were benchmarked against conventional (CONV), FLASH-RT (FLASH), pMBRT (MB), and pGRID (SB) plans. SFRT-FLASH achieved high FLASH effect coverage (~60-80% in CTV1cm) while preserving PVDR (~2.5-7) at shallow-to-intermediate depths. Conclusions: We present a proton treatment planning approach that combines the FLASH effect at depth with high PVDR near the surface, enhancing normal tissue protection and advancing proton therapy.

KRRF: Kinodynamic Rapidly-exploring Random Forest algorithm for multi-goal motion planning

Authors:Petr Ježek, Michal Minařík, Vojtěch Vonásek, Robert Pěnička
Date:2025-05-09 15:29:11

The problem of kinodynamic multi-goal motion planning is to find a trajectory over multiple target locations with an apriori unknown sequence of visits. The objective is to minimize the cost of the trajectory planned in a cluttered environment for a robot with a kinodynamic motion model. This problem has yet to be efficiently solved as it combines two NP-hard problems, the Traveling Salesman Problem~(TSP) and the kinodynamic motion planning problem. We propose a novel approximate method called Kinodynamic Rapidly-exploring Random Forest~(KRRF) to find a collision-free multi-goal trajectory that satisfies the motion constraints of the robot. KRRF simultaneously grows kinodynamic trees from all targets towards all other targets while using the other trees as a heuristic to boost the growth. Once the target-to-target trajectories are planned, their cost is used to solve the TSP to find the sequence of targets. The final multi-goal trajectory satisfying kinodynamic constraints is planned by guiding the RRT-based planner along the target-to-target trajectories in the TSP sequence. Compared with existing approaches, KRRF provides shorter target-to-target trajectories and final multi-goal trajectories with $1.1-2$ times lower costs while being computationally faster in most test cases. The method will be published as an open-source library.

The Application of Deep Learning for Lymph Node Segmentation: A Systematic Review

Authors:Jingguo Qu, Xinyang Han, Man-Lik Chui, Yao Pu, Simon Takadiyi Gunda, Ziman Chen, Jing Qin, Ann Dorothy King, Winnie Chiu-Wing Chu, Jing Cai, Michael Tin-Cheung Ying
Date:2025-05-09 15:17:00

Automatic lymph node segmentation is the cornerstone for advances in computer vision tasks for early detection and staging of cancer. Traditional segmentation methods are constrained by manual delineation and variability in operator proficiency, limiting their ability to achieve high accuracy. The introduction of deep learning technologies offers new possibilities for improving the accuracy of lymph node image analysis. This study evaluates the application of deep learning in lymph node segmentation and discusses the methodologies of various deep learning architectures such as convolutional neural networks, encoder-decoder networks, and transformers in analyzing medical imaging data across different modalities. Despite the advancements, it still confronts challenges like the shape diversity of lymph nodes, the scarcity of accurately labeled datasets, and the inadequate development of methods that are robust and generalizable across different imaging modalities. To the best of our knowledge, this is the first study that provides a comprehensive overview of the application of deep learning techniques in lymph node segmentation task. Furthermore, this study also explores potential future research directions, including multimodal fusion techniques, transfer learning, and the use of large-scale pre-trained models to overcome current limitations while enhancing cancer diagnosis and treatment planning strategies.

Towards Better Cephalometric Landmark Detection with Diffusion Data Generation

Authors:Dongqian Guo, Wencheng Han, Pang Lyu, Yuxi Zhou, Jianbing Shen
Date:2025-05-09 13:50:27

Cephalometric landmark detection is essential for orthodontic diagnostics and treatment planning. Nevertheless, the scarcity of samples in data collection and the extensive effort required for manual annotation have significantly impeded the availability of diverse datasets. This limitation has restricted the effectiveness of deep learning-based detection methods, particularly those based on large-scale vision models. To address these challenges, we have developed an innovative data generation method capable of producing diverse cephalometric X-ray images along with corresponding annotations without human intervention. To achieve this, our approach initiates by constructing new cephalometric landmark annotations using anatomical priors. Then, we employ a diffusion-based generator to create realistic X-ray images that correspond closely with these annotations. To achieve precise control in producing samples with different attributes, we introduce a novel prompt cephalometric X-ray image dataset. This dataset includes real cephalometric X-ray images and detailed medical text prompts describing the images. By leveraging these detailed prompts, our method improves the generation process to control different styles and attributes. Facilitated by the large, diverse generated data, we introduce large-scale vision detection models into the cephalometric landmark detection task to improve accuracy. Experimental results demonstrate that training with the generated data substantially enhances the performance. Compared to methods without using the generated data, our approach improves the Success Detection Rate (SDR) by 6.5%, attaining a notable 82.2%. All code and data are available at: https://um-lab.github.io/cepha-generation

Designing RoutScape: Geospatial Prototyping with XR for Flood Evacuation Planning

Authors:Johndayll Lewis Arizala, Joshua Permito, Steven Errol Escopete, John Kovie Niño, Jordan Aiko Deja
Date:2025-05-09 13:41:11

Flood response planning in local communities is often hindered by fragmented communication across Disaster Risk Reduction and Management (DRRM) councils. In this work, we explore how extended reality (XR) can support more effective planning through narrative-driven design. We present Routscape, an XR prototype for visualizing flood scenarios and evacuation routes, developed through iterative prototyping and user-centered design with DRRM officers. By grounding the system in real-world experiences and localized narratives, we highlight how XR can aid in fostering shared understanding and spatial sensemaking in disaster preparedness efforts.

Towards Quantum Resilience: Data-Driven Migration Strategy Design

Authors:Nahid Aliyev, Ozan Cetin, Emil Huseynov
Date:2025-05-09 11:12:09

The advancements in quantum computing are a threat to classical cryptographic systems. The traditional cryptographic methods that utilize factorization-based or discrete-logarithm-based algorithms, such as RSA and ECC, are some of these. This paper thoroughly investigates the vulnerabilities of traditional cryptographic methods against quantum attacks and provides a decision-support framework to help organizations in recommending mitigation plans and determining appropriate transition strategies to post-quantum cryptography. A semi-synthetic dataset, consisting of key features such as key size, network complexity, and sensitivity levels, is crafted, with each configuration labeled according to its recommended mitigation plan. Using decision tree and random forest models, a classifier is trained to recommend appropriate mitigation/transition plans such as continuous monitoring, scheduled transitions, and immediate hybrid implementation. The proposed approach introduces a data-driven and dynamic solution for organizations to assess the scale of the migration, specifying a structured roadmap toward quantum resilience. The results highlight important features that influence strategy decisions and support actionable recommendations for cryptographic modernization based on system context.

Insertion Language Models: Sequence Generation with Arbitrary-Position Insertions

Authors:Dhruvesh Patel, Aishwarya Sahoo, Avinash Amballa, Tahira Naseem, Tim G. J. Rudner, Andrew McCallum
Date:2025-05-09 03:29:15

Autoregressive models (ARMs), which predict subsequent tokens one-by-one ``from left to right,'' have achieved significant success across a wide range of sequence generation tasks. However, they struggle to accurately represent sequences that require satisfying sophisticated constraints or whose sequential dependencies are better addressed by out-of-order generation. Masked Diffusion Models (MDMs) address some of these limitations, but the process of unmasking multiple tokens simultaneously in MDMs can introduce incoherences, and MDMs cannot handle arbitrary infilling constraints when the number of tokens to be filled in is not known in advance. In this work, we introduce Insertion Language Models (ILMs), which learn to insert tokens at arbitrary positions in a sequence -- that is, they select jointly both the position and the vocabulary element to be inserted. By inserting tokens one at a time, ILMs can represent strong dependencies between tokens, and their ability to generate sequences in arbitrary order allows them to accurately model sequences where token dependencies do not follow a left-to-right sequential structure. To train ILMs, we propose a tailored network parameterization and use a simple denoising objective. Our empirical evaluation demonstrates that ILMs outperform both ARMs and MDMs on common planning tasks. Furthermore, we show that ILMs outperform MDMs and perform on par with ARMs in an unconditional text generation task while offering greater flexibility than MDMs in arbitrary-length text infilling.

Physics-informed Temporal Difference Metric Learning for Robot Motion Planning

Authors:Ruiqi Ni, Zherong Pan, Ahmed H Qureshi
Date:2025-05-09 00:02:22

The motion planning problem involves finding a collision-free path from a robot's starting to its target configuration. Recently, self-supervised learning methods have emerged to tackle motion planning problems without requiring expensive expert demonstrations. They solve the Eikonal equation for training neural networks and lead to efficient solutions. However, these methods struggle in complex environments because they fail to maintain key properties of the Eikonal equation, such as optimal value functions and geodesic distances. To overcome these limitations, we propose a novel self-supervised temporal difference metric learning approach that solves the Eikonal equation more accurately and enhances performance in solving complex and unseen planning tasks. Our method enforces Bellman's principle of optimality over finite regions, using temporal difference learning to avoid spurious local minima while incorporating metric learning to preserve the Eikonal equation's essential geodesic properties. We demonstrate that our approach significantly outperforms existing self-supervised learning methods in handling complex environments and generalizing to unseen environments, with robot configurations ranging from 2 to 12 degrees of freedom (DOF).

Prospects for Probing Sub-GeV Leptophilic Dark Matter with the Future VLAST

Authors:Tian-Peng Tang, Meiwen Yang, Kai-Kai Duan, Yue-Lin Sming Tsai, Yi-Zhong Fan
Date:2025-05-08 15:53:12

The proposed Very Large Area Space Telescope (VLAST), with its expected unprecedented sensitivity in the MeV-GeV range, can also address the longstanding "MeV Gap" in gamma-ray observations. We explore the capability of VLAST to detect sub-GeV leptophilic dark matter (DM) annihilation, focusing on scalar and vector mediators and emphasizing the resonance region where the mediator mass is approximately twice the DM mass. While $s$-wave annihilation is tightly constrained by relic density and cosmic microwave background observations, $p$-wave and mixed $(s+p)$-wave scenarios remain viable, particularly near resonance. Additionally, direct detection experiments, especially those probing DM-electron scattering, significantly constrain nonresonance parameter space but are less effective in the resonance regime. VLAST can uniquely probe this surviving region, outperforming existing and planned instruments, and establishing itself as a crucial tool for indirect detection of thermal relic DM.

Quantum-Aware Network Planning and Integration

Authors:Cédric Ware, Mounia Lourdiane
Date:2025-05-08 15:41:34

In order to broaden the adoption of highly-demanded quantum functionalities such as QKD, there is a need for having quantum signals coexist with classical traffic over the same physical medium, typically optical fibers in already-deployed networks. Beyond the experimental point-to-point demonstrations of the past few years, efforts are now underway to integrate QKD at the network level: developing interfaces with the software-defined-network ecosystem; but also network planning tools that satisfy physical-layer contraints jointly on the classical and quantum signals. We have found that in certain situations, na\"ive network planning prioritizing quantum traffic drastically degrades classical capacity, whereas a quantum-aware wavelength assignment heuristic allows coexistence with minimal impact on both capacities. More such techniques will be required to enable widespread deployment of QKD and other future quantum functionalities.

HQC-NBV: A Hybrid Quantum-Classical View Planning Approach

Authors:Xiaotong Yu, Chang Wen Chen
Date:2025-05-08 13:05:07

Efficient view planning is a fundamental challenge in computer vision and robotic perception, critical for tasks ranging from search and rescue operations to autonomous navigation. While classical approaches, including sampling-based and deterministic methods, have shown promise in planning camera viewpoints for scene exploration, they often struggle with computational scalability and solution optimality in complex settings. This study introduces HQC-NBV, a hybrid quantum-classical framework for view planning that leverages quantum properties to efficiently explore the parameter space while maintaining robustness and scalability. We propose a specific Hamiltonian formulation with multi-component cost terms and a parameter-centric variational ansatz with bidirectional alternating entanglement patterns that capture the hierarchical dependencies between viewpoint parameters. Comprehensive experiments demonstrate that quantum-specific components provide measurable performance advantages. Compared to the classical methods, our approach achieves up to 49.2% higher exploration efficiency across diverse environments. Our analysis of entanglement architecture and coherence-preserving terms provides insights into the mechanisms of quantum advantage in robotic exploration tasks. This work represents a significant advancement in integrating quantum computing into robotic perception systems, offering a paradigm-shifting solution for various robot vision tasks.

Online Velocity Profile Generation and Tracking for Sampling-Based Local Planning Algorithms in Autonomous Racing Environments

Authors:Alexander Langmann, Levent Ögretmen, Frederik Werner, Johannes Betz
Date:2025-05-08 11:53:56

This work presents an online velocity planner for autonomous racing that adapts to changing dynamic constraints, such as grip variations from tire temperature changes and rubber accumulation. The method combines a forward-backward solver for online velocity optimization with a novel spatial sampling strategy for local trajectory planning, utilizing a three-dimensional track representation. The computed velocity profile serves as a reference for the local planner, ensuring adaptability to environmental and vehicle dynamics. We demonstrate the approach's robust performance and computational efficiency in racing scenarios and discuss its limitations, including sensitivity to deviations from the predefined racing line and high jerk characteristics of the velocity profile.

Spatially Disaggregated Energy Consumption and Emissions in End-use Sectors for Germany and Spain

Authors:Shruthi Patil, Noah Pflugradt, Jann M. Weinand, Jürgen Kropp, Detlef Stolten
Date:2025-05-08 11:21:44

High-resolution energy consumption and emissions datasets are essential for localized policy-making, resource optimization, and climate action planning. They enable municipalities to monitor mitigation strategies and foster engagement among governments, businesses, and communities. However, smaller municipalities often face data limitations that hinder tailored climate strategies. This study generates detailed final energy consumption and emissions data at the local administrative level for Germany and Spain. Using national datasets, we apply spatial disaggregation techniques with open data sources. A key innovation is the application of XGBoost for imputing missing data, combined with a stepwise spatial disaggregation process incorporating district- and province-level statistics. Prioritizing reproducibility, our open-data approach provides a scalable framework for municipalities to develop actionable climate plans. To ensure transparency, we assess the reliability of imputed values and assign confidence ratings to the disaggregated data.

Predictive Control of EV Overnight Charging with Multi-Session Flexibility

Authors:Felix Wieberneit, Emanuele Crisostomi, Anthony Quinn, Robert Shorten
Date:2025-05-08 09:35:27

The majority of electric vehicles (EVs) are charged domestically overnight, where the precise timing of power allocation is not important to the user, thus representing a source of flexibility that can be leveraged by charging control algorithms. In this paper, we relax the common assumption, that EVs require full charge every morning, enabling additional flexibility to defer charging of surplus energy to subsequent nights, which can enhance the performance of controlled charging. In particular, we consider a simple domestic smart plug, scheduling power delivery with the objective to minimize CO$_2$ emissions over prediction horizons of multiple sessions -- up to seven days ahead -- utilising model predictive control (MPC). Based on carbon intensity data from the UK National Grid, we demonstrate significant potential for emission reductions with multi-session planning of 40 to 46\% compared to uncontrolled charging and 19 to 26\% compared to single-session planning. Furthermore, we assess, how the driving and charging behaviour of EV users affects the available flexibility and consequentially the potential for emission reductions. Finally, using grid carbon intensity data from 14 different UK regions, we report significant variations in absolute emission reductions based on the local energy mix.

Current status of the CMS Inner Tracker upgrade for HL-LHC

Authors:Lorenzo Damenti
Date:2025-05-08 08:01:05

During the High Luminosity programme of the LHC collider (called HL-LHC), planned to start in 2030, the instantaneous luminosity will be increased from \num{\sim 2e34}~\si{cm^{-2}s^{-1}} to an unprecedented figure of about \num{\sim 7.5e34}~\si{cm^{-2}s^{-1}}. This will allow the Compact Muon Solenoid (CMS) to collect up to \num{\sim 4000}~\si{fb^{-1}} of integrated luminosity over a decade. In order to cope with the much higher pp-collisions rate, CMS will undergo an extensive improvement known as \pII upgrade: in particular, the silicon tracker system will be entirely replaced to comply with the extremely challenging experimental conditions. This contribution will review the main upgrades of the CMS Inner Tracker and will present the most relevant design and technological choices. Moreover, the ongoing validation of prototypes and the preparation for the large-scale production will be discussed.

CPP-DIP: Multi-objective Coverage Path Planning for MAVs in Dispersed and Irregular Plantations

Authors:Weijie Kuang, Hann Woei Ho, Ye Zhou
Date:2025-05-08 06:52:22

Coverage Path Planning (CPP) is vital in precision agriculture to improve efficiency and resource utilization. In irregular and dispersed plantations, traditional grid-based CPP often causes redundant coverage over non-vegetated areas, leading to waste and pollution. To overcome these limitations, we propose CPP-DIP, a multi-objective CPP framework designed for Micro Air Vehicles (MAVs). The framework transforms the CPP task into a Traveling Salesman Problem (TSP) and optimizes flight paths by minimizing travel distance, turning angles, and intersection counts. Unlike conventional approaches, our method does not rely on GPS-based environmental modeling. Instead, it uses aerial imagery and a Histogram of Oriented Gradients (HOG)-based approach to detect trees and extract image coordinates. A density-aware waypoint strategy is applied: Kernel Density Estimation (KDE) is used to reduce redundant waypoints in dense regions, while a greedy algorithm ensures complete coverage in sparse areas. To verify the generality of the framework, we solve the resulting TSP using three different methods: Greedy Heuristic Insertion (GHI), Ant Colony Optimization (ACO), and Monte Carlo Reinforcement Learning (MCRL). Then an object-based optimization is applied to further refine the resulting path. Additionally, CPP-DIP integrates ForaNav, our insect-inspired navigation method, for accurate tree localization and tracking. The experimental results show that MCRL offers a balanced solution, reducing the travel distance by 16.9 % compared to ACO while maintaining a similar performance to GHI. It also improves path smoothness by reducing turning angles by 28.3 % and 59.9 % relative to ACO and GHI, respectively, and effectively eliminates intersections. These results confirm the robustness and effectiveness of CPP-DIP in different TSP solvers.

A Vehicle System for Navigating Among Vulnerable Road Users Including Remote Operation

Authors:Oscar de Groot, Alberto Bertipaglia, Hidde Boekema, Vishrut Jain, Marcell Kegl, Varun Kotian, Ted Lentsch, Yancong Lin, Chrysovalanto Messiou, Emma Schippers, Farzam Tajdari, Shiming Wang, Zimin Xia, Mubariz Zaffar, Ronald Ensing, Mario Garzon, Javier Alonso-Mora, Holger Caesar, Laura Ferranti, Riender Happee, Julian F. P. Kooij, Georgios Papaioannou, Barys Shyrokau, Dariu M. Gavrila
Date:2025-05-08 06:39:47

We present a vehicle system capable of navigating safely and efficiently around Vulnerable Road Users (VRUs), such as pedestrians and cyclists. The system comprises key modules for environment perception, localization and mapping, motion planning, and control, integrated into a prototype vehicle. A key innovation is a motion planner based on Topology-driven Model Predictive Control (T-MPC). The guidance layer generates multiple trajectories in parallel, each representing a distinct strategy for obstacle avoidance or non-passing. The underlying trajectory optimization constrains the joint probability of collision with VRUs under generic uncertainties. To address extraordinary situations ("edge cases") that go beyond the autonomous capabilities - such as construction zones or encounters with emergency responders - the system includes an option for remote human operation, supported by visual and haptic guidance. In simulation, our motion planner outperforms three baseline approaches in terms of safety and efficiency. We also demonstrate the full system in prototype vehicle tests on a closed track, both in autonomous and remotely operated modes.

LVLM-MPC Collaboration for Autonomous Driving: A Safety-Aware and Task-Scalable Control Architecture

Authors:Kazuki Atsuta, Kohei Honda, Hiroyuki Okuda, Tatsuya Suzuki
Date:2025-05-08 06:35:30

This paper proposes a novel Large Vision-Language Model (LVLM) and Model Predictive Control (MPC) integration framework that delivers both task scalability and safety for Autonomous Driving (AD). LVLMs excel at high-level task planning across diverse driving scenarios. However, since these foundation models are not specifically designed for driving and their reasoning is not consistent with the feasibility of low-level motion planning, concerns remain regarding safety and smooth task switching. This paper integrates LVLMs with MPC Builder, which automatically generates MPCs on demand, based on symbolic task commands generated by the LVLM, while ensuring optimality and safety. The generated MPCs can strongly assist the execution or rejection of LVLM-driven task switching by providing feedback on the feasibility of the given tasks and generating task-switching-aware MPCs. Our approach provides a safe, flexible, and adaptable control framework, bridging the gap between cutting-edge foundation models and reliable vehicle operation. We demonstrate the effectiveness of our approach through a simulation experiment, showing that our system can safely and effectively handle highway driving while maintaining the flexibility and adaptability of LVLMs.

Robust Model-Based In-Hand Manipulation with Integrated Real-Time Motion-Contact Planning and Tracking

Authors:Yongpeng Jiang, Mingrui Yu, Xinghao Zhu, Masayoshi Tomizuka, Xiang Li
Date:2025-05-08 06:31:19

Robotic dexterous in-hand manipulation, where multiple fingers dynamically make and break contact, represents a step toward human-like dexterity in real-world robotic applications. Unlike learning-based approaches that rely on large-scale training or extensive data collection for each specific task, model-based methods offer an efficient alternative. Their online computing nature allows for ready application to new tasks without extensive retraining. However, due to the complexity of physical contacts, existing model-based methods encounter challenges in efficient online planning and handling modeling errors, which limit their practical applications. To advance the effectiveness and robustness of model-based contact-rich in-hand manipulation, this paper proposes a novel integrated framework that mitigates these limitations. The integration involves two key aspects: 1) integrated real-time planning and tracking achieved by a hierarchical structure; and 2) joint optimization of motions and contacts achieved by integrated motion-contact modeling. Specifically, at the high level, finger motion and contact force references are jointly generated using contact-implicit model predictive control. The high-level module facilitates real-time planning and disturbance recovery. At the low level, these integrated references are concurrently tracked using a hand force-motion model and actual tactile feedback. The low-level module compensates for modeling errors and enhances the robustness of manipulation. Extensive experiments demonstrate that our approach outperforms existing model-based methods in terms of accuracy, robustness, and real-time performance. Our method successfully completes five challenging tasks in real-world environments, even under appreciable external disturbances.

AI and Vision based Autonomous Navigation of Nano-Drones in Partially-Known Environments

Authors:Mattia Sartori, Chetna Singhal, Neelabhro Roy, Davide Brunelli, James Gross
Date:2025-05-08 06:16:36

The miniaturisation of sensors and processors, the advancements in connected edge intelligence, and the exponential interest in Artificial Intelligence are boosting the affirmation of autonomous nano-size drones in the Internet of Robotic Things ecosystem. However, achieving safe autonomous navigation and high-level tasks such as exploration and surveillance with these tiny platforms is extremely challenging due to their limited resources. This work focuses on enabling the safe and autonomous flight of a pocket-size, 30-gram platform called Crazyflie 2.1 in a partially known environment. We propose a novel AI-aided, vision-based reactive planning method for obstacle avoidance under the ambit of Integrated Sensing, Computing and Communication paradigm. We deal with the constraints of the nano-drone by splitting the navigation task into two parts: a deep learning-based object detector runs on the edge (external hardware) while the planning algorithm is executed onboard. The results show the ability to command the drone at $\sim8$ frames-per-second and a model performance reaching a COCO mean-average-precision of $60.8$. Field experiments demonstrate the feasibility of the solution with the drone flying at a top speed of $1$ m/s while steering away from an obstacle placed in an unknown position and reaching the target destination. The outcome highlights the compatibility of the communication delay and the model performance with the requirements of the real-time navigation task. We provide a feasible alternative to a fully onboard implementation that can be extended to autonomous exploration with nano-drones.

CAG-VLM: Fine-Tuning of a Large-Scale Model to Recognize Angiographic Images for Next-Generation Diagnostic Systems

Authors:Yuto Nakamura, Satoshi Kodera, Haruki Settai, Hiroki Shinohara, Masatsugu Tamura, Tomohiro Noguchi, Tatsuki Furusawa, Ryo Takizawa, Tempei Kabayama, Norihiko Takeda
Date:2025-05-08 05:44:52

Coronary angiography (CAG) is the gold-standard imaging modality for evaluating coronary artery disease, but its interpretation and subsequent treatment planning rely heavily on expert cardiologists. To enable AI-based decision support, we introduce a two-stage, physician-curated pipeline and a bilingual (Japanese/English) CAG image-report dataset. First, we sample 14,686 frames from 539 exams and annotate them for key-frame detection and left/right laterality; a ConvNeXt-Base CNN trained on this data achieves 0.96 F1 on laterality classification, even on low-contrast frames. Second, we apply the CNN to 243 independent exams, extract 1,114 key frames, and pair each with its pre-procedure report and expert-validated diagnostic and treatment summary, yielding a parallel corpus. We then fine-tune three open-source VLMs (PaliGemma2, Gemma3, and ConceptCLIP-enhanced Gemma3) via LoRA and evaluate them using VLScore and cardiologist review. Although PaliGemma2 w/LoRA attains the highest VLScore, Gemma3 w/LoRA achieves the top clinician rating (mean 7.20/10); we designate this best-performing model as CAG-VLM. These results demonstrate that specialized, fine-tuned VLMs can effectively assist cardiologists in generating clinical reports and treatment recommendations from CAG images.

Randomized Routing to Remote Queues

Authors:Shuangchi He, Yunfang Yang, Yao Yu
Date:2025-05-08 04:37:53

We study load balancing for a queueing system where parallel stations are distant from customers. In the presence of traveling delays, the join-the-shortest-queue (JSQ) policy induces queue length oscillations and prolongs the mean waiting time. A variant of the JSQ policy, dubbed the randomized join-the-shortest-queue (RJSQ) policy, is devised to mitigate the oscillation phenomenon. By the RJSQ policy, customers are sent to each station with a probability approximately proportional to its service capacity; only a small fraction of customers are purposely routed to the shortest queue. The additional probability of routing a customer to the shortest queue, referred to as the balancing fraction, dictates the policy's performance. When the balancing fraction is within a certain range, load imbalance between the stations is negligible in heavy traffic, so that complete resource pooling is achieved. We specify the optimal order of magnitude for the balancing fraction, by which heuristic formulas are proposed to fine-tune the RJSQ policy. A joint problem of capacity planning and load balancing is considered for geographically separated stations. With well planned service capacities, the RJSQ policy sends all but a small fraction of customers to the nearest stations, rendering the system asymptotically equivalent to an aggregated single-server system with all customers having minimum traveling delays. If each customer's service requirement does not depend on the station, the RJSQ policy is asymptotically optimal for reducing workload.

Building-Guided Pseudo-Label Learning for Cross-Modal Building Damage Mapping

Authors:Jiepan Li, He Huang, Yu Sheng, Yujun Guo, Wei He
Date:2025-05-08 04:37:12

Accurate building damage assessment using bi-temporal multi-modal remote sensing images is essential for effective disaster response and recovery planning. This study proposes a novel Building-Guided Pseudo-Label Learning Framework to address the challenges of mapping building damage from pre-disaster optical and post-disaster SAR images. First, we train a series of building extraction models using pre-disaster optical images and building labels. To enhance building segmentation, we employ multi-model fusion and test-time augmentation strategies to generate pseudo-probabilities, followed by a low-uncertainty pseudo-label training method for further refinement. Next, a change detection model is trained on bi-temporal cross-modal images and damaged building labels. To improve damage classification accuracy, we introduce a building-guided low-uncertainty pseudo-label refinement strategy, which leverages building priors from the previous step to guide pseudo-label generation for damaged buildings, reducing uncertainty and enhancing reliability. Experimental results on the 2025 IEEE GRSS Data Fusion Contest dataset demonstrate the effectiveness of our approach, which achieved the highest mIoU score (54.28%) and secured first place in the competition.

Real-Time Model Predictive Control of Vehicles with Convex-Polygon-Aware Collision Avoidance in Tight Spaces

Authors:Haruki Kojima, Kohei Honda, Hiroyuki Okuda, Tatsuya Suzuki
Date:2025-05-08 04:26:07

This paper proposes vehicle motion planning methods with obstacle avoidance in tight spaces by incorporating polygonal approximations of both the vehicle and obstacles into a model predictive control (MPC) framework. Representing these shapes is crucial for navigation in tight spaces to ensure accurate collision detection. However, incorporating polygonal approximations leads to disjunctive OR constraints in the MPC formulation, which require a mixed integer programming and cause significant computational cost. To overcome this, we propose two different collision-avoidance constraints that reformulate the disjunctive OR constraints as tractable conjunctive AND constraints: (1) a Support Vector Machine (SVM)-based formulation that recasts collision avoidance as a SVM optimization problem, and (2) a Minimum Signed Distance to Edges (MSDE) formulation that leverages minimum signed-distance metrics. We validate both methods through extensive simulations, including tight-space parking scenarios and varied-shape obstacle courses, as well as hardware experiments on an RC-car platform. Our results demonstrate that the SVM-based approach achieves superior navigation accuracy in constrained environments; the MSDE approach, by contrast, runs in real time with only a modest reduction in collision-avoidance performance.

SatAOI: Delimitating Area of Interest for Swing-Arm Troweling Robot for Construction

Authors:Jia-Rui Lin, Shaojie Zhou, Peng Pan, Ruijia Cai, Gang Chen
Date:2025-05-08 00:55:16

In concrete troweling for building construction, robots can significantly reduce workload and improve automation level. However, as a primary task of coverage path planning (CPP) for troweling, delimitating area of interest (AOI) in complex scenes is still challenging, especially for swing-arm robots with more complex working modes. Thus, this research proposes an algorithm to delimitate AOI for swing-arm troweling robot (SatAOI algorithm). By analyzing characteristics of the robot and obstacle maps, mathematical models and collision principles are established. On this basis, SatAOI algorithm achieves AOI delimitation by global search and collision detection. Experiments on different obstacle maps indicate that AOI can be effectively delimitated in scenes under different complexity, and the algorithm can fully consider the connectivity of obstacle maps. This research serves as a foundation for CPP algorithm and full process simulation of swing-arm troweling robots.

Vision-Language-Action Models: Concepts, Progress, Applications and Challenges

Authors:Ranjan Sapkota, Yang Cao, Konstantinos I. Roumeliotis, Manoj Karkee
Date:2025-05-07 19:46:43

Vision-Language-Action (VLA) models mark a transformative advancement in artificial intelligence, aiming to unify perception, natural language understanding, and embodied action within a single computational framework. This foundational review presents a comprehensive synthesis of recent advancements in Vision-Language-Action models, systematically organized across five thematic pillars that structure the landscape of this rapidly evolving field. We begin by establishing the conceptual foundations of VLA systems, tracing their evolution from cross-modal learning architectures to generalist agents that tightly integrate vision-language models (VLMs), action planners, and hierarchical controllers. Our methodology adopts a rigorous literature review framework, covering over 80 VLA models published in the past three years. Key progress areas include architectural innovations, parameter-efficient training strategies, and real-time inference accelerations. We explore diverse application domains such as humanoid robotics, autonomous vehicles, medical and industrial robotics, precision agriculture, and augmented reality navigation. The review further addresses major challenges across real-time control, multimodal action representation, system scalability, generalization to unseen tasks, and ethical deployment risks. Drawing from the state-of-the-art, we propose targeted solutions including agentic AI adaptation, cross-embodiment generalization, and unified neuro-symbolic planning. In our forward-looking discussion, we outline a future roadmap where VLA models, VLMs, and agentic AI converge to power socially aligned, adaptive, and general-purpose embodied agents. This work serves as a foundational reference for advancing intelligent, real-world robotics and artificial general intelligence. >Vision-language-action, Agentic AI, AI Agents, Vision-language Models

Charged Lepton Flavor Violating Experiments with Muons

Authors:Dylan Palo
Date:2025-05-07 19:43:07

We report on the status of charged lepton flavor violating (CLFV) experiments with muons. We focus on the three "golden channels": $\mu^{+} \rightarrow e^{+} \gamma$, $\mu^{+} \rightarrow e^{+} e^{-} e^{+}$ and $\mu^{-} N \rightarrow e^{-} N$. The collection of upcoming experiments aim for sensitivity improvements up to $10^{4}$ with respect to previous searches. The MEG II experiment, searching for $\mu^{+} \rightarrow e^{+} \gamma$, is currently in its 4th year of physics data-taking with a published result from its first year of data. The Mu3e experiment is an upcoming experiment searching for $\mu^{+} \rightarrow e^{+} e^{-} e^{+}$ with plans of physics data-taking as soon as 2025. The Mu2e and COMET experiments are upcoming searches for $\mu^{-} N \rightarrow e^{-} N$ with the goal of physics data-taking starting in 2027 and 2026 respectively. This proceeding summarizes the signal signature, expected background, resolutions, and timelines for the mentioned searches.

Stow: Robotic Packing of Items into Fabric Pods

Authors:Nicolas Hudson, Josh Hooks, Rahul Warrier, Curt Salisbury, Ross Hartley, Kislay Kumar, Bhavana Chandrashekhar, Paul Birkmeyer, Bosch Tang, Matt Frost, Shantanu Thakar, Tony Piaskowy, Petter Nilsson, Josh Petersen, Neel Doshi, Alan Slatter, Ankit Bhatia, Cassie Meeker, Yuechuan Xue, Dylan Cox, Alex Kyriazis, Bai Lou, Nadeem Hasan, Asif Rana, Nikhil Chacko, Ruinian Xu, Siamak Faal, Esi Seraj, Mudit Agrawal, Kevin Jamieson, Alessio Bisagni, Valerie Samzun, Christine Fuller, Alex Keklak, Alex Frenkel, Lillian Ratliff, Aaron Parness
Date:2025-05-07 17:07:09

This paper presents a compliant manipulation system capable of placing items onto densely packed shelves. The wide diversity of items and strict business requirements for high producing rates and low defect generation have prohibited warehouse robotics from performing this task. Our innovations in hardware, perception, decision-making, motion planning, and control have enabled this system to perform over 500,000 stows in a large e-commerce fulfillment center. The system achieves human levels of packing density and speed while prioritizing work on overhead shelves to enhance the safety of humans working alongside the robots.

Model-Based AI planning and Execution Systems for Robotics

Authors:Or Wertheim, Ronen I. Brafman
Date:2025-05-07 15:17:38

Model-based planning and execution systems offer a principled approach to building flexible autonomous robots that can perform diverse tasks by automatically combining a host of basic skills. This idea is almost as old as modern robotics. Yet, while diverse general-purpose reasoning architectures have been proposed since, general-purpose systems that are integrated with modern robotic platforms have emerged only recently, starting with the influential ROSPlan system. Since then, a growing number of model-based systems for robot task-level control have emerged. In this paper, we consider the diverse design choices and issues existing systems attempt to address, the different solutions proposed so far, and suggest avenues for future development.

Supporting renewable energy planning and operation with data-driven high-resolution ensemble weather forecast

Authors:Jingnan Wang, Jie Chao, Shangshang Yang, Congyi Nai, Kaijun Ren, Kefeng Deng, Xi Chen, Yaxin Liu, Hanqiuzi Wen, Ziniu Xiao, Lifeng Zhang, Xiaodong Wang, Jiping Guan, Baoxiang Pan
Date:2025-05-07 13:20:36

The planning and operation of renewable energy, especially wind power, depend crucially on accurate, timely, and high-resolution weather information. Coarse-grid global numerical weather forecasts are typically downscaled to meet these requirements, introducing challenges of scale inconsistency, process representation error, computation cost, and entanglement of distinct uncertainty sources from chaoticity, model bias, and large-scale forcing. We address these challenges by learning the climatological distribution of a target wind farm using its high-resolution numerical weather simulations. An optimal combination of this learned high-resolution climatological prior with coarse-grid large scale forecasts yields highly accurate, fine-grained, full-variable, large ensemble of weather pattern forecasts. Using observed meteorological records and wind turbine power outputs as references, the proposed methodology verifies advantageously compared to existing numerical/statistical forecasting-downscaling pipelines, regarding either deterministic/probabilistic skills or economic gains. Moreover, a 100-member, 10-day forecast with spatial resolution of 1 km and output frequency of 15 min takes < 1 hour on a moderate-end GPU, as contrast to $\mathcal{O}(10^3)$ CPU hours for conventional numerical simulation. By drastically reducing computational costs while maintaining accuracy, our method paves the way for more efficient and reliable renewable energy planning and operation.