Autonomous vehicles need a complete map of their surroundings to plan and act. This has sparked research into the tasks of 3D occupancy prediction, 3D scene completion, and 3D panoptic scene completion, which predict a dense map of the ego vehicle's surroundings as a voxel grid. Scene completion extends occupancy prediction by predicting occluded regions of the voxel grid, and panoptic scene completion further extends this task by also distinguishing object instances within the same class; both aspects are crucial for path planning and decision-making. However, 3D panoptic scene completion is currently underexplored. This work introduces a novel framework for 3D panoptic scene completion that extends existing 3D semantic scene completion models. We propose an Object Module and a Panoptic Module that can easily be integrated with 3D occupancy and scene completion methods presented in the literature. Our approach leverages the annotations already available in occupancy benchmarks, allowing individual object shapes to be learned as a differentiable problem. The code is available at https://github.com/nicolamarinello/OffsetOcc.
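The abstract does not describe the Object Module's internals, so the following is only a generic sketch of offset-based instance grouping, a technique common in panoptic segmentation (and hinted at by the repository name): each occupied voxel of a "thing" class predicts an offset to its object center, and voxels voting for nearby centers are merged into instances. All names and thresholds are hypothetical.

```python
import numpy as np

def group_instances(coords, offsets, merge_radius=0.5):
    """Cluster occupied voxels into instances by shifting each voxel
    toward its predicted object center and merging nearby center votes.

    coords:  (N, 3) voxel centers of one semantic class (e.g. 'car')
    offsets: (N, 3) predicted offsets from voxel to instance center
    """
    centers = coords + offsets              # each voxel votes for a center
    instance_ids = -np.ones(len(coords), dtype=int)
    prototypes = []                         # running list of instance centers
    for i, c in enumerate(centers):
        for k, p in enumerate(prototypes):
            if np.linalg.norm(c - p) < merge_radius:
                instance_ids[i] = k
                break
        else:                               # no nearby center: new instance
            prototypes.append(c)
            instance_ids[i] = len(prototypes) - 1
    return instance_ids

# toy example: two groups of voxels voting for two distinct centers
coords = np.array([[0.0, 0, 0], [0.2, 0, 0], [5.0, 0, 0], [5.2, 0, 0]])
offsets = np.array([[0.1, 0, 0], [-0.1, 0, 0], [0.1, 0, 0], [-0.1, 0, 0]])
print(group_instances(coords, offsets))     # -> [0 0 1 1]
```

Because the offsets are predicted by a network and can be supervised against ground-truth instance centers, instance grouping of this kind becomes a differentiable learning problem, which is the property the abstract emphasizes.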
The surface of Europa experiences a competition between thermally induced crystallization and radiation-induced amorphization, leading to changes in its crystalline structure. Non-linear crystallization and temperature-dependent amorphization rates, incorporating ion, electron, and UV doses, are integrated into our multiphysics surface model (MSM) LunaIcy, enabling simulations of these coupled processes on icy moons. Thirty simulations spanning 100 000 years, covering the full ranges of albedo and latitude values on Europa, explore the competition between crystallization and irradiation. This is the first modeling of depth-dependent crystallinity profiles on icy moons. Our simulation results are consistent with existing spectroscopic studies of Europa, with both methods showing a primarily amorphous phase at the surface, followed by a crystalline phase below the first millimeter of depth. Our method provides quantitative insights into how various parameters found on Europa influence subsurface crystallinity profiles. Interpolating across our simulations, we have generated a crystallinity map of Europa showing, within the top millimeter, highly crystalline ice near the equator, amorphous ice at the poles, and a mix of the two at mid-latitudes. Regions and depths with balanced competition between crystallization and amorphization rates are of high interest due to their periodic fluctuations in crystalline fraction. Our interpolated map reveals periodic variations, with seasonal amplitudes reaching up to 35% in crystalline fraction. These variations could be detected through spectroscopy, and we propose a plan to observe them in forthcoming missions.
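For intuition, here is a minimal sketch of the competition the simulations resolve, assuming an Arrhenius-like thermally activated crystallization rate and a dose-driven amorphization rate that decays with depth; all rate constants below are illustrative placeholders, not LunaIcy's calibrated values.

```python
import numpy as np

def crystalline_fraction(depths_m, T_K, years=1e5, dt_yr=10.0):
    """Evolve the crystalline fraction theta(z) under competing
    crystallization and irradiation-driven amorphization."""
    k_cryst = np.exp(-500.0 / T_K)                # 1/yr, Arrhenius-like
    # charged-particle dose rate drops off within ~1 mm of the surface
    k_amorph = 1e-2 * np.exp(-depths_m / 1e-3)    # 1/yr
    theta = np.zeros_like(depths_m)               # start fully amorphous
    for _ in range(int(years / dt_yr)):
        dtheta = k_cryst * (1 - theta) - k_amorph * theta
        theta = np.clip(theta + dtheta * dt_yr, 0.0, 1.0)
    return theta

depths = np.array([1e-4, 1e-3, 1e-2])             # 0.1 mm, 1 mm, 1 cm
print(crystalline_fraction(depths, T_K=100.0))
# amorphous-leaning near the surface, almost fully crystalline at 1 cm
```

The equilibrium fraction at each depth is k_cryst / (k_cryst + k_amorph), which is why the balance region, where both rates are comparable, is the one most sensitive to seasonal temperature swings.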
This paper presents aUToPath, a unified online framework for global path planning and control that addresses the challenge of autonomous navigation in cluttered urban environments. A key component of our framework is a novel hybrid planner that combines pre-computed lattice maps with dynamic free-space sampling to efficiently generate optimal drivable corridors in cluttered scenarios. Our system also features sequential convex programming (SCP)-based model predictive control (MPC) to refine the corridors into smooth, dynamically consistent trajectories. A single optimization problem is used to generate both a trajectory and its corresponding control commands, which addresses the limitations of decoupled approaches by guaranteeing a safe and feasible path. Simulation results of the novel planner on randomly generated obstacle-rich scenarios demonstrate a success rate matching that of a free-space Adaptively Informed Trees* (AIT*)-based planner, and runtimes comparable to those of a lattice-based planner. Real-world experiments of the full system on a Chevrolet Bolt EUV further validate performance in dense obstacle fields, demonstrating no violations of traffic, kinematic, or vehicle constraints and a 100% success rate across eight trials.
Context: The configuration management of development and production environments is an important aspect of IT operations. However, managing the configuration differences between these two environments can be challenging, leading to inconsistent behavior, unexpected errors, and increased downtime. Objective: In this study, we investigated the strategies software companies employ to mitigate the configuration differences between the development and production environments. Our goal is to provide a comprehensive understanding of these strategies and thereby contribute to reducing the risk of configuration-related issues. Method: To achieve this goal, we interviewed 17 participants and used Thematic Analysis to analyze the interview data. The participants shed light on the current practices, processes, and challenges they have encountered. Results: Based on the interviews, we systematically formulated and structured a catalog of eight strategies that explain how software-producing companies mitigate these configuration differences. These strategies range from creating detailed configuration management plans and using automation tools to developing processes for testing and validating changes through containers and virtualization technologies. Conclusion: By implementing these strategies, companies can improve their ability to respond quickly and effectively to changes in the production environment. In addition, they can ensure compliance with industry standards and regulations.
Voice timbre refers to the unique quality or character of a person's voice, as perceived by human hearing, that distinguishes it from others. The Voice Timbre Attribute Detection (VtaD) 2025 challenge focuses on explaining voice timbre attributes in a comparative manner. In this challenge, the human impression of voice timbre is verbalized with a set of sensory descriptors, such as bright, coarse, soft, and magnetic. Timbre is explained by comparing the intensity of two voices along a specific descriptor dimension. The VtaD 2025 challenge starts in May and culminates in a special session proposal at the NCMMSC2025 conference in October 2025 in Zhenjiang, China.
Background: There are many challenges and opportunities in the clinical deployment of AI tools in radiology. This study describes a radiology software platform, NeoMedSys, that enables efficient deployment and refinement of AI models. We evaluated the feasibility and effectiveness of running NeoMedSys for three months in real-world clinical settings, focusing on improving the performance of an in-house AI model (VIOLA-AI) designed for intracranial hemorrhage (ICH) detection. Methods: NeoMedSys integrates tools for deploying, testing, and optimizing AI models with a web-based medical image viewer, an annotation system, and hospital-wide radiology information systems. A pragmatic investigation was conducted using clinical cases of patients presenting to the largest Emergency Department in Norway (site 1) with suspected traumatic brain injury (TBI) or patients with suspected stroke (site 2). We assessed ICH classification performance as VIOLA-AI encountered new data and underwent pre-planned model retraining. Performance metrics included sensitivity, specificity, accuracy, and the area under the receiver operating characteristic curve (AUC). Results: NeoMedSys facilitated iterative improvements in the AI model, significantly enhancing its diagnostic accuracy. Automated bleed detection and segmentation were reviewed in near real-time to facilitate retraining of VIOLA-AI. The iterative refinement process yielded a marked improvement in classification sensitivity, rising to 90.3% (from 79.2%), and specificity, which reached 89.3% (from 80.7%). The bleed-detection ROC analysis for the entire sample demonstrated a high AUC of 0.949 (from 0.873). Model refinement stages were associated with notable gains, highlighting the value of real-time radiologist feedback.
This paper addresses the problem of motion planning for differential drive micro-mobility platforms. This class of vehicle is designed to perform small-distance transportation of passengers and goods in structured environments. Our approach leverages mixed-integer linear programming (MILP) to compute globally optimal collision-free trajectories that take into account the kinematics and dynamics of the vehicle. We propose novel constraints for intersample collision avoidance and demonstrate their effectiveness using pick-up and delivery missions and statistical analysis of Monte Carlo simulations. The results show that the novel formulation provides the best trajectories in terms of time expenditure and control effort when compared to two state-of-the-art approaches.
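The abstract does not spell out the constraints, so the following is only a sketch of one standard MILP encoding of intersample collision avoidance: the big-M disjunction selecting which obstacle half-plane is active is shared by both endpoints of a segment, and because a half-plane is convex, every intersample point on the segment then avoids the obstacle as well. It uses the PuLP library; all bounds and costs are illustrative.

```python
import pulp

N = 8                     # number of trajectory segments
M = 100.0                 # big-M constant
xmin, xmax, ymin, ymax = 3.0, 5.0, -1.0, 1.0   # axis-aligned obstacle box

prob = pulp.LpProblem("intersample_avoidance", pulp.LpMinimize)
x = [pulp.LpVariable(f"x{k}", -10, 10) for k in range(N + 1)]
y = [pulp.LpVariable(f"y{k}", -10, 10) for k in range(N + 1)]

prob += x[0] == 0
prob += y[0] == 0
prob += x[N] == 8          # goal lies straight "through" the obstacle
prob += y[N] == 0

abs_dy = []
for k in range(N):
    # one binary per half-plane, shared by BOTH endpoints of segment k:
    # if b[j] = 0, both waypoints lie in half-plane j, and since a half-plane
    # is convex, every intersample point on the segment lies there too.
    b = [pulp.LpVariable(f"b{k}_{j}", cat="Binary") for j in range(4)]
    prob += pulp.lpSum(b) <= 3                 # at least one half-plane active
    for p in (k, k + 1):
        prob += x[p] <= xmin + M * b[0]        # left of the box
        prob += -x[p] <= -xmax + M * b[1]      # right of the box
        prob += y[p] <= ymin + M * b[2]        # below the box
        prob += -y[p] <= -ymax + M * b[3]      # above the box
    dy = pulp.LpVariable(f"dy{k}", 0)          # |y[k+1] - y[k]| via epigraph
    prob += dy >= y[k + 1] - y[k]
    prob += dy >= y[k] - y[k + 1]
    abs_dy.append(dy)

prob += pulp.lpSum(abs_dy)                     # minimize lateral detour
prob.solve(pulp.PULP_CBC_CMD(msg=0))
print([(x[k].value(), y[k].value()) for k in range(N + 1)])
```

Without sharing the binaries across segment endpoints, the optimizer could place consecutive waypoints on opposite sides of the obstacle, letting the continuous path cut straight through it between samples.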
In recent years, diffusion models have shown their potential across diverse domains, from vision generation to language modeling. Transferring their capabilities to modern autonomous driving systems has also emerged as a promising direction. In this work, we propose TransDiffuser, an encoder-decoder based generative trajectory planning model for end-to-end autonomous driving. The encoded scene information serves as the multi-modal conditional input of the denoising decoder. To tackle the mode collapse dilemma in generating high-quality diverse trajectories, we introduce a simple yet effective multi-modal representation decorrelation optimization mechanism during training. TransDiffuser achieves a PDMS of 94.85 on the NAVSIM benchmark, surpassing previous state-of-the-art methods without any anchor-based prior trajectories.
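The abstract does not specify the decorrelation mechanism, so the following is only a plausible sketch of one common formulation: penalizing the off-diagonal entries of the feature covariance so that representation dimensions carry non-redundant information.

```python
import torch

def decorrelation_loss(z):
    """Penalize off-diagonal entries of the feature covariance so that
    representation dimensions stay decorrelated (one common recipe for
    fighting mode collapse in generative trajectory heads).

    z: (batch, dim) pooled multi-modal representation
    """
    z = z - z.mean(dim=0, keepdim=True)               # center features
    cov = (z.T @ z) / (z.shape[0] - 1)                # (dim, dim) covariance
    off_diag = cov - torch.diag(torch.diagonal(cov))  # zero out the diagonal
    return (off_diag ** 2).sum() / z.shape[1]

z = torch.randn(256, 128, requires_grad=True)
loss = decorrelation_loss(z)
loss.backward()                                       # differentiable penalty
print(loss.item())
```

In practice a term like this would be added to the denoising objective with a small weight, trading a little reconstruction accuracy for greater diversity across generated trajectories.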
Medical imaging is a cornerstone of modern healthcare, driving advancements in diagnosis, treatment planning, and patient care. Among its various tasks, segmentation remains one of the most challenging problems due to factors such as data accessibility, annotation complexity, structural variability, variation in medical imaging modalities, and privacy constraints. Despite recent progress, achieving robust generalization and domain adaptation remains a significant hurdle, particularly given the resource-intensive nature of some proposed models and their reliance on domain expertise. This survey explores cutting-edge advancements in medical image segmentation, focusing on methodologies such as Generative AI, Few-Shot Learning, Foundation Models, and Universal Models. These approaches offer promising solutions to longstanding challenges. We provide a comprehensive overview of the theoretical foundations, state-of-the-art techniques, and recent applications of these methods. Finally, we discuss inherent limitations, unresolved issues, and future research directions aimed at enhancing the practicality and accessibility of segmentation models in medical imaging. We maintain a \href{https://github.com/faresbougourzi/Awesome-DL-for-Medical-Imaging-Segmentation}{GitHub Repository} to continue tracking and updating innovations in this field.
Recovering a drone on a disturbed water surface remains a significant challenge in maritime robotics. In this paper, we propose a unified framework for Robot-Assisted Drone Recovery on a Wavy Surface that addresses two major tasks: first, accurate prediction of a moving drone's position under wave-induced disturbances using an Error-State Kalman Filter (ESKF), and second, effective motion planning for a manipulator via Receding Horizon Control (RHC). Specifically, the ESKF predicts the drone's position 0.5 s ahead, while the manipulator plans a capture trajectory in real time, thus overcoming not only wave-induced base motions but also limited torque constraints. We provide a system design that comprises a manipulator subsystem and a UAV subsystem. On the UAV side, we detail how position control and suspended payload strategies are implemented. On the manipulator side, we show how an RHC scheme outperforms traditional low-level control algorithms. Simulation and real-world experiments using wave-disturbed motion data demonstrate that our approach achieves a high success rate (above 95%) and outperforms conventional baseline methods by up to 10% in efficiency and 20% in precision. The results underscore the feasibility and robustness of our system, which achieves state-of-the-art (SOTA) performance and offers a practical solution for maritime drone operations.
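As a concrete illustration of the prediction task, here is a minimal sketch of propagating a state estimate 0.5 s ahead with a constant-velocity Kalman prediction step; the actual ESKF additionally tracks error states and wave-induced dynamics, and the model and noise values below are assumptions.

```python
import numpy as np

def predict_ahead(x, P, dt=0.01, horizon=0.5, q=0.5):
    """Propagate state mean/covariance forward by `horizon` seconds with a
    constant-velocity model; a stand-in for the ESKF's prediction stage.

    x: [px, py, pz, vx, vy, vz], P: 6x6 covariance
    """
    F = np.eye(6)
    F[:3, 3:] = dt * np.eye(3)                 # position integrates velocity
    Q = q * dt * np.eye(6)                     # simple process noise
    for _ in range(int(horizon / dt)):
        x = F @ x
        P = F @ P @ F.T + Q                    # uncertainty grows with horizon
    return x, P

x0 = np.array([0.0, 0.0, 2.0, 0.3, -0.1, 0.0])   # drone bobbing on waves
x_pred, P_pred = predict_ahead(x0, np.eye(6) * 0.01)
print(x_pred[:3])    # predicted position 0.5 s ahead: [0.15, -0.05, 2.0]
```

The growing covariance P_pred is what makes a 0.5 s horizon a practical sweet spot: long enough for the manipulator to plan, short enough that the prediction remains tight.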
Accurate system modeling is crucial for safe, effective control, as misidentification can lead to accumulated errors, especially under partial observability. We address this problem by formulating informative input design (IID) and model identification adaptive control (MIAC) as belief space planning problems, modeled as partially observable Markov decision processes with belief-dependent rewards ($\rho$-POMDPs). We treat system parameters as hidden state variables that must be localized while simultaneously controlling the system. We solve this problem with an adapted belief-space iterative Linear Quadratic Regulator (BiLQR). We demonstrate it on fully and partially observable tasks for cart-pole and steady aircraft flight domains. Our method outperforms baselines such as regression, filtering, and local optimal control methods, even under instantaneous disturbances to system parameters.
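The abstract's central idea, treating unknown system parameters as hidden states to be localized while the system is controlled, can be illustrated with a joint (augmented-state) extended Kalman filter on a toy scalar system with unknown damping. This sketches only the estimation side; the paper's BiLQR additionally plans controls over the belief.

```python
import numpy as np

rng = np.random.default_rng(0)
b_true, dt, R = 0.8, 0.05, 0.05          # true damping, time step, obs noise

# augmented state z = [x, b]: the unknown parameter b is a hidden state
z = np.array([1.0, 0.3])                 # initial parameter guess is off
P = np.diag([0.1, 1.0])
x_true = 1.0
for _ in range(300):
    x_true += -b_true * x_true * dt      # simulate the true system
    y = x_true + rng.normal(0, np.sqrt(R))

    # EKF predict for x' = x - b*x*dt, b' = b (random-walk parameter model)
    F = np.array([[1 - z[1] * dt, -z[0] * dt],
                  [0.0, 1.0]])
    z = np.array([z[0] - z[1] * z[0] * dt, z[1]])
    P = F @ P @ F.T + np.diag([1e-5, 1e-5])

    # EKF update with observation y = x + noise
    H = np.array([[1.0, 0.0]])
    S = H @ P @ H.T + R
    K = (P @ H.T) / S                    # Kalman gain, shape (2, 1)
    z = z + (K * (y - z[0])).ravel()
    P = (np.eye(2) - K @ H) @ P

print(f"estimated damping: {z[1]:.2f} (true {b_true})")
# the estimate tightens only while x still excites the dynamics, which is
# exactly why informative input design matters
```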
Advance Care Planning (ACP) allows individuals to specify their preferred end-of-life life-sustaining treatments before they become incapacitated by injury or terminal illness (e.g., coma, cancer, dementia). While online ACP offers high accessibility, it lacks key benefits of clinical consultations, including personalized value exploration and immediate clarification of decision consequences. To bridge this gap, we conducted two formative studies: 1) we shadowed and interviewed 3 ACP teams consisting of physicians, nurses, and social workers (18 patients total), and 2) we interviewed 14 users of ACP websites. Building on these insights, we designed PreCare in collaboration with 6 ACP professionals. PreCare is a website with 3 AI-driven assistants designed to guide users through exploring personal values, gaining ACP knowledge, and supporting informed decision-making. A usability study (n=12) showed that PreCare achieved a System Usability Scale (SUS) rating of excellent. A comparative evaluation (n=12) showed that PreCare's AI assistants significantly improved exploration of personal values, knowledge, and decisional confidence, and were preferred by 92% of participants.
Carefully constructed experimental designs are essential for drawing valid, generalizable conclusions from scientific studies. Unfortunately, experimental design plans can be difficult to specify, communicate clearly, and relate to alternatives. In response, we introduce a grammar of experimental design that provides composable operators for constructing assignment procedures (e.g., Latin square). We implement this grammar in PLanet, a domain-specific language (DSL) that constructs assignment plans in three stages: experimental unit specification, trial-order construction, and order-to-unit mapping. We evaluate PLanet's expressivity by taking a purposive sample of recent CHI and UIST publications, representing their experiments as programs in PLanet, and identifying ambiguities and alternatives. In our evaluation, PLanet could express 11 out of 12 experiments found in the sampled papers. Additionally, we found that PLanet constructs helped make complex design choices explicit when researchers omitted technical language describing their study designs.
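PLanet's actual syntax is not shown in the abstract; the sketch below just illustrates the three stages it names (unit specification, trial-order construction via a Latin square, and order-to-unit mapping) in plain Python.

```python
import random

def latin_square(conditions, seed=0):
    """Trial-order construction: each condition appears exactly once per
    row (one order per unit) and once per column (trial position)."""
    n = len(conditions)
    rows = [[conditions[(i + j) % n] for j in range(n)] for i in range(n)]
    random.Random(seed).shuffle(rows)    # randomize which unit gets which row
    return rows

# stage 1: experimental unit specification
units = [f"P{i}" for i in range(1, 5)]
# stage 2: trial-order construction (Latin square over four conditions)
orders = latin_square(["A", "B", "C", "D"])
# stage 3: order-to-unit mapping
assignment = dict(zip(units, orders))
for unit, order in assignment.items():
    print(unit, order)
```

Shuffling whole rows preserves the Latin square property, since each column still contains every condition exactly once; a grammar of composable operators lets such guarantees be checked and communicated explicitly.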
Motion prediction, the anticipation of future agent states or scene evolution, is rooted in human cognition, bridging perception and decision-making. It enables intelligent systems, such as robots and self-driving cars, to act safely in dynamic, human-involved environments, and informs broader time-series reasoning challenges. With advances in methods, representations, and datasets, the field has seen rapid progress, reflected in quickly evolving benchmark results. Yet, when state-of-the-art methods are deployed in the real world, they often struggle to generalize to open-world conditions and fall short of deployment standards. This reveals a gap between research benchmarks, which are often idealized or ill-posed, and real-world complexity. To address this gap, this survey revisits the generalization and deployability of motion prediction models, with an emphasis on applications in robotics, autonomous driving, and human motion. We first offer a comprehensive taxonomy of motion prediction methods, covering representations, modeling strategies, application domains, and evaluation protocols. We then study two key challenges: (1) how to make motion prediction models meet realistic deployment standards, where motion prediction does not act in a vacuum but functions as one module of a closed-loop autonomy stack, taking input from localization and perception and informing downstream planning and control; and (2) how to generalize motion prediction models from limited seen scenarios and datasets to open-world settings. Throughout the paper, we highlight critical open challenges to guide future work, aiming to recalibrate the community's efforts and foster progress that is not only measurable but also meaningful for real-world applications.
The construction industry is a major contributor to global greenhouse gas emissions, with embodied carbon being a key component. This study develops EcoSphere, an innovative software tool designed to evaluate and balance embodied and operational carbon emissions against construction and environmental costs in urban planning. Using high-resolution data from the National Structure Inventory, combined with computer vision and natural language processing applied to Google Street View and satellite imagery, EcoSphere categorizes buildings by structural and material characteristics with a bottom-up approach, creating a baseline emissions dataset. By simulating policy scenarios and mitigation strategies, EcoSphere provides policymakers and non-experts with actionable insights for sustainable urban development and a view of the environmental and financial consequences of their decisions. Case studies in Chicago and Indianapolis showcase how EcoSphere aids in assessing policy impacts on carbon emissions and costs, supporting data-driven progress toward carbon neutrality.
This work presents a Hierarchical Multi-Agent Reinforcement Learning framework for analyzing simulated air combat scenarios involving heterogeneous agents. The objective is to identify effective Courses of Action that lead to mission success within preset simulations, thereby enabling the exploration of real-world defense scenarios at low cost and in a safe-to-fail setting. Applying deep Reinforcement Learning in this context poses specific challenges, such as complex flight dynamics, the exponential size of the state and action spaces in multi-agent systems, and the capability to integrate real-time control of individual units with look-ahead planning. To address these challenges, the decision-making process is split into two levels of abstraction: low-level policies control individual units, while a high-level commander policy issues macro commands aligned with the overall mission targets. This hierarchical structure facilitates the training process by exploiting policy symmetries of individual agents and by separating control from command tasks. The low-level policies are trained for individual combat control in a curriculum of increasing complexity. The high-level commander is then trained on mission targets given pre-trained control policies. The empirical validation confirms the advantages of the proposed framework.
Automated poultry processing lines still rely on humans to lift slippery, easily bruised carcasses onto a shackle conveyor. Deformability, anatomical variance, and strict hygiene rules make conventional suction and scripted motions unreliable. We present ChicGrasp, an end-to-end hardware-software co-design for this task. An independently actuated dual-jaw pneumatic gripper clamps both chicken legs, while a conditional diffusion-policy controller, trained from only 50 multi-view teleoperation demonstrations (RGB + proprioception), plans 5-DoF end-effector motion, including jaw commands, in one shot. On individually presented raw broiler carcasses, our system achieves a 40.6\% grasp-and-lift success rate and completes the pick-to-shackle cycle in 38 s, whereas state-of-the-art implicit behaviour cloning (IBC) and LSTM-GMM baselines fail entirely. All CAD, code, and datasets will be open-source. ChicGrasp shows that imitation learning can bridge the gap between rigid hardware and variable bio-products, offering a reproducible benchmark and a public dataset for researchers in agricultural engineering and robot learning.
In randomized controlled trials (RCTs), subgroup analyses are often planned to evaluate the heterogeneity of treatment effects within pre-specified subgroups of interest. However, these analyses frequently have small sample sizes, reducing the power to detect heterogeneous effects. One way to increase power is to borrow external data from similar RCTs or observational studies. In this project, we target the conditional average treatment effect (CATE) as the estimand of interest, provide identification assumptions, and propose a doubly robust estimator that uses machine learning and Bayesian nonparametric techniques. Borrowing data, however, may introduce practical violations of the positivity assumption: the conditional probability of receiving treatment in the external data source may be small, leading to large inverse weights and erroneous inferences, thus negating the potential power gains from borrowing external data. To overcome this challenge, we also propose a covariate balancing approach, an automated debiased machine learning (DML) estimator, and a calibrated DML estimator. We show improved power in various simulations and offer practical recommendations for applying the proposed methods. Finally, we apply them to evaluate the effectiveness of citalopram, a drug commonly used to treat depression, for negative symptoms in first-episode schizophrenia patients across subgroups defined by duration of untreated psychosis, using data from two RCTs and an observational study.
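For context, here is a minimal sketch of the doubly robust (AIPW-style) approach to CATE estimation that the abstract builds on: fit outcome and propensity models, form pseudo-outcomes, and regress them on covariates. Cross-fitting, the Bayesian nonparametric components, covariate balancing, and the calibrated DML estimator are all omitted; simple clipping stands in here for the paper's more principled handling of positivity violations.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier

def dr_cate(X, A, Y, clip=0.05):
    """Doubly robust CATE: build AIPW pseudo-outcomes from outcome models
    and a (clipped) propensity model, then regress them on covariates.
    Clipping guards against the near-positivity violations that borrowing
    external data can introduce."""
    mu1 = GradientBoostingRegressor().fit(X[A == 1], Y[A == 1]).predict(X)
    mu0 = GradientBoostingRegressor().fit(X[A == 0], Y[A == 0]).predict(X)
    e = GradientBoostingClassifier().fit(X, A).predict_proba(X)[:, 1]
    e = np.clip(e, clip, 1 - clip)          # tame extreme inverse weights
    pseudo = (mu1 - mu0
              + A * (Y - mu1) / e
              - (1 - A) * (Y - mu0) / (1 - e))
    return GradientBoostingRegressor().fit(X, pseudo)   # tau_hat(x)

rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 3))
A = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))          # confounded treatment
Y = X[:, 1] + (1.0 + X[:, 2]) * A + rng.normal(size=n)   # true CATE = 1 + x3
tau = dr_cate(X, A, Y)
print(tau.predict(X[:5]))
```

The pseudo-outcome is unbiased for the CATE if either the outcome models or the propensity model is correct, which is the "doubly robust" property the abstract invokes.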
This work aims to leverage instructional video to solve complex multi-step task-and-motion planning problems in robotics. Toward this goal, we propose an extension of the well-established Rapidly-Exploring Random Tree (RRT) planner that simultaneously grows multiple trees around grasp and release states extracted from the guiding video. Our key novelty lies in combining contact states and 3D object poses extracted from the guiding video with a traditional planning algorithm, which allows us to solve tasks with sequential dependencies, for example, when an object needs to be placed at a specific location to be grasped later. We also investigate the generalization capabilities of our approach beyond the scene depicted in the instructional video. To demonstrate the benefits of the proposed video-guided planning approach, we design a new benchmark with three challenging tasks: (i) 3D rearrangement of multiple objects between a table and a shelf, (ii) multi-step transfer of an object through a tunnel, and (iii) transferring objects using a tray, similar to how a waiter transfers dishes. We demonstrate the effectiveness of our planning algorithm on several robots, including the Franka Emika Panda and the KUKA KMR iiwa. For a seamless transfer of the obtained plans to the real robot, we develop a trajectory refinement approach formulated as an optimal control problem (OCP).
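A toy 2D sketch of the multi-tree idea: one tree is seeded at each grasp/release state extracted from the video, all trees are grown in parallel, and consecutive trees are stitched when they come close. The real planner operates over robot configurations with contact states and collision checking, which are omitted here.

```python
import math, random

def rrt_multi_tree(keyframes, n_iter=4000, step=0.3, connect_tol=0.5, seed=0):
    """Grow one tree per grasp/release keyframe and stitch consecutive
    trees together when they come within connect_tol (2D toy, no obstacles)."""
    rng = random.Random(seed)
    trees = [[(q, None)] for q in keyframes]      # node: (point, parent index)
    bridges = [None] * (len(trees) - 1)
    for _ in range(n_iter):
        t = rng.randrange(len(trees))
        q_rand = (rng.uniform(0, 10), rng.uniform(0, 10))
        # extend tree t toward the random sample (standard RRT step)
        i_near, (q_near, _) = min(enumerate(trees[t]),
                                  key=lambda n: math.dist(n[1][0], q_rand))
        d = math.dist(q_near, q_rand)
        q_new = q_rand if d <= step else tuple(
            a + step * (b - a) / d for a, b in zip(q_near, q_rand))
        trees[t].append((q_new, i_near))
        # try to bridge this tree to the next keyframe's tree
        if t + 1 < len(trees) and bridges[t] is None:
            j, (q_other, _) = min(enumerate(trees[t + 1]),
                                  key=lambda n: math.dist(n[1][0], q_new))
            if math.dist(q_new, q_other) < connect_tol:
                bridges[t] = (len(trees[t]) - 1, j)
    return trees, bridges

keyframes = [(1, 1), (5, 5), (9, 2)]   # grasp/release states from the video
trees, bridges = rrt_multi_tree(keyframes)
print("segments bridged:", [b is not None for b in bridges])
```

Seeding trees at the video's keyframes is what lets the planner handle sequential dependencies: each bridge corresponds to one segment of the demonstrated manipulation sequence.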
Child presence detection (CPD) is a vital technology for vehicles to prevent heat-related fatalities or injuries by detecting the presence of a child left unattended. Regulatory agencies around the world are planning to mandate CPD systems in the near future. However, existing solutions have limitations in terms of accuracy, coverage, and additional device requirements. While WiFi-based solutions can overcome these limitations, existing approaches struggle to reliably distinguish between adult and child presence, leading to frequent false alarms, and are often sensitive to environmental variations. In this paper, we present DeepCPD, a novel deep learning framework designed for accurate child presence detection in smart vehicles. DeepCPD utilizes an environment-independent feature, the auto-correlation function (ACF) derived from WiFi channel state information (CSI), to capture human-related signatures while mitigating environmental distortions. A Transformer-based architecture, followed by a multilayer perceptron (MLP), is employed to differentiate adults from children by modeling motion patterns and subtle body size differences. To address the limited availability of in-vehicle child and adult data, we introduce a two-stage learning strategy that significantly enhances model generalization. Extensive experiments conducted across more than 25 car models and over 500 hours of data collection demonstrate that DeepCPD achieves an overall accuracy of 92.86%, substantially outperforming a CNN baseline (79.55%). Additionally, the model attains a 91.45% detection rate for children while maintaining a low false alarm rate of 6.14%.
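A minimal sketch of the environment-independent feature: the auto-correlation function of CSI power over time, normalized per subcarrier so static environment gains cancel. The synthetic data, sampling rate, and lag range below are illustrative assumptions, not DeepCPD's exact preprocessing.

```python
import numpy as np

def csi_acf(csi, max_lag=100):
    """Auto-correlation function of CSI power over time, averaged across
    subcarriers. Per-subcarrier normalization cancels static channel gains,
    which is what makes the feature largely environment-independent.

    csi: (T, S) complex CSI, T time samples, S subcarriers
    """
    power = np.abs(csi) ** 2
    power = power - power.mean(axis=0)         # drop the static component
    var = (power ** 2).mean(axis=0) + 1e-12
    acf = np.empty((max_lag, power.shape[1]))
    for lag in range(max_lag):
        acf[lag] = (power[:len(power) - lag] * power[lag:]).mean(axis=0) / var
    return acf.mean(axis=1)                    # (max_lag,) feature vector

# synthetic example: breathing-like amplitude modulation at 0.3 Hz, 100 Hz CSI
rng = np.random.default_rng(0)
t = np.arange(2000) / 100.0
motion = np.sin(2 * np.pi * 0.3 * t)
csi = (1.0 + 0.5 * motion[:, None]) * np.exp(1j * 0.1 * rng.normal(size=(2000, 30)))
print(csi_acf(csi)[:5])    # near 1 at small lags, oscillating with the motion
```

A sequence of such ACF vectors, rather than raw CSI, is what a downstream Transformer would consume, so the model sees motion statistics instead of room-specific channel responses.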
High-quality 3D reconstruction of pulmonary segments plays a crucial role in segmentectomy and surgical treatment planning for lung cancer. Due to the resolution requirements of the target reconstruction, conventional deep learning-based methods often suffer from computational resource constraints or limited granularity. In contrast, implicit modeling is favored for its computational efficiency and continuous representation at any resolution. We propose a neural implicit function-based method to learn a 3D surface for anatomy-aware, precise pulmonary segment reconstruction, represented as a shape obtained by deforming a learnable template. Additionally, we introduce two clinically relevant evaluation metrics to assess the reconstruction comprehensively. Further, due to the absence of publicly available shape datasets for benchmarking reconstruction algorithms, we developed a shape dataset named Lung3D, comprising the 3D models of 800 labeled pulmonary segments and the corresponding airways, arteries, veins, and intersegmental veins. We demonstrate that the proposed approach outperforms existing methods, providing a new perspective for pulmonary segment reconstruction. Code and data will be available at https://github.com/M3DV/ImPulSe.
This thesis presents novel algorithms to advance robotic object rearrangement, a critical task for autonomous systems in applications like warehouse automation and household assistance. Addressing challenges of high-dimensional planning, complex object interactions, and computational demands, our work integrates deep learning for interaction prediction, tree search for action sequencing, and parallelized computation for efficiency. Key contributions include the Deep Interaction Prediction Network (DIPN) for accurate push motion forecasting (over 90% accuracy), its synergistic integration with Monte Carlo Tree Search (MCTS) for effective non-prehensile object retrieval (100% completion in specific challenging scenarios), and the Parallel MCTS with Batched Simulations (PMBS) framework, which achieves substantial planning speed-up while maintaining or improving solution quality. The research further explores combining diverse manipulation primitives, validated extensively through simulated and real-world experiments.
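A minimal sketch of the batched-simulation idea behind PMBS: select several leaves per round using a virtual loss so the batch spreads over the tree, evaluate them together (standing in for parallel physics simulations), then backpropagate. The toy evaluator and constants are placeholders.

```python
import math, random

class Node:
    def __init__(self, parent=None):
        self.parent, self.children = parent, {}
        self.N, self.W, self.vloss = 0, 0.0, 0   # visits, value sum, virtual loss

def select_leaf(root, actions, c=1.4):
    """Descend by UCB, applying a virtual loss so the next selection in the
    same batch is steered toward a different part of the tree."""
    node = root
    while node.children:
        node = max(node.children.values(),
                   key=lambda ch: ch.W / (ch.N + ch.vloss + 1e-9)
                   + c * math.sqrt(math.log(node.N + 1) / (ch.N + ch.vloss + 1e-9)))
        node.vloss += 1
    for a in actions:                            # expand the leaf
        node.children[a] = Node(node)
    return node

def backup(node, value):
    """Propagate the simulation result and clear virtual losses."""
    while node is not None:
        node.N += 1
        node.W += value
        node.vloss = max(0, node.vloss - 1)
        node = node.parent

root, actions, rng = Node(), range(4), random.Random(0)
for _ in range(50):                              # 50 planning rounds
    leaves = [select_leaf(root, actions) for _ in range(8)]   # batch of 8
    values = [rng.random() for _ in leaves]      # stand-in for a batch of
    for leaf, v in zip(leaves, values):          # parallel physics simulations
        backup(leaf, v)
print("root visits:", root.N)                    # 400 = 50 rounds x batch of 8
```

Because the expensive step in rearrangement planning is the physics simulation, evaluating eight leaves per round in parallel rather than one is where the speed-up comes from, while the virtual loss keeps the batch from collapsing onto a single branch.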
Aerial Visual Object Search (AVOS) tasks in urban environments require Unmanned Aerial Vehicles (UAVs) to autonomously search for and identify target objects using visual and textual cues without external guidance. Existing approaches struggle in complex urban environments due to redundant semantic processing, similar object distinction, and the exploration-exploitation dilemma. To bridge this gap and support the AVOS task, we introduce CityAVOS, the first benchmark dataset for autonomous search of common urban objects. This dataset comprises 2,420 tasks across six object categories with varying difficulty levels, enabling comprehensive evaluation of UAV agents' search capabilities. To solve the AVOS tasks, we also propose PRPSearcher (Perception-Reasoning-Planning Searcher), a novel agentic method powered by multi-modal large language models (MLLMs) that mimics human three-tier cognition. Specifically, PRPSearcher constructs three specialized maps: an object-centric dynamic semantic map enhancing spatial perception, a 3D cognitive map based on semantic attraction values for target reasoning, and a 3D uncertainty map for balanced exploration-exploitation search. Also, our approach incorporates a denoising mechanism to mitigate interference from similar objects and utilizes an Inspiration Promote Thought (IPT) prompting mechanism for adaptive action planning. Experimental results on CityAVOS demonstrate that PRPSearcher surpasses existing baselines in both success rate and search efficiency (on average: +37.69% SR, +28.96% SPL, -30.69% MSS, and -46.40% NE). While promising, the performance gap compared to humans highlights the need for better semantic reasoning and spatial exploration capabilities in AVOS tasks. This work establishes a foundation for future advances in embodied target search. Dataset and source code are available at https://anonymous.4open.science/r/CityAVOS-3DF8.
In observational causal inference, it is common to encounter multiple adjustment sets that appear equally plausible. It is often untestable which of these adjustment sets are valid to adjust for (i.e., satisfy ignorability). This discrepancy can pose practical challenges, as it is typically unclear how to reconcile multiple, possibly conflicting estimates of the average treatment effect (ATE). A naive approach is to report the whole range (convex hull of the union) of the resulting confidence intervals. However, the width of this interval might not shrink to zero in large samples and can be unnecessarily wide in real applications. To address this issue, we propose a summary procedure that generates a single estimate and one confidence interval, and identifies a set of units for which the causal effect estimate remains valid, provided at least one adjustment set is valid. The width of our proposed confidence interval shrinks to zero with sample size at the $n^{-1/2}$ rate, unlike the original range, which is of constant order. Thus, our assumption-robust approach enables reliable causal inference on the ATE even in scenarios where most of the adjustment sets are invalid. Admittedly, this robustness comes at a cost: our inferential guarantees apply to a target population close to, but different from, the one originally intended. We use synthetic and real-data examples to demonstrate that our proposed procedure provides substantially tighter confidence intervals for the ATE than the whole range. In particular, for a real-world dataset on 401(k) retirement plans, our method produces a confidence interval 50\% shorter than the whole range of confidence intervals based on multiple adjustment sets.
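A small numerical illustration of the problem the summary procedure addresses, under an assumed data-generating process: with one valid and one invalid adjustment set, the naive whole-range interval stays wide as n grows because the biased interval concentrates away from the truth. The paper's proposed single-interval procedure is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def ate_ci(n, adjust_confounder):
    """ATE estimate +/- 1.96 SE under one of two candidate adjustment sets.
    True ATE is 1; ignoring the confounder U biases the estimate upward."""
    U = rng.normal(size=n)                               # confounder
    A = rng.binomial(1, 1 / (1 + np.exp(-U)))            # treatment
    Y = 1.0 * A + U + rng.normal(size=n)                 # outcome
    if adjust_confounder:                                # valid: regress on U
        X = np.column_stack([np.ones(n), A, U])
        est = np.linalg.lstsq(X, Y, rcond=None)[0][1]    # coefficient on A
    else:                                                # invalid: ignores U
        est = Y[A == 1].mean() - Y[A == 0].mean()
    se = 2.0 / np.sqrt(n)                                # rough SE, for scale
    return est - 1.96 * se, est + 1.96 * se

for n in (1_000, 100_000):
    cis = [ate_ci(n, adj) for adj in (True, False)]
    lo, hi = min(c[0] for c in cis), max(c[1] for c in cis)
    print(f"n={n:>6}: whole range = [{lo:.2f}, {hi:.2f}]")
# each interval shrinks, but the whole range stays roughly constant-width
# because the invalid adjustment set concentrates away from the truth
```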
To perform autonomous driving maneuvers, such as parallel or perpendicular parking, a vehicle requires continual speed and steering adjustments to follow a generated path. Consequently, the path's quality is a limiting factor in the maneuver's performance. While most path planning approaches focus on finding a collision-free route, optimal trajectory planning involves solving for the best transition from initial to final states, minimizing the action over all paths permitted by a kinematic model. Here we propose a novel method based on sequential convex optimization that permits flexible and efficient optimal trajectory generation. The objective is to achieve the fastest time, shortest distance, and fewest path segments that satisfy the motion requirements while avoiding sensor blind spots. In our approach, vehicle kinematics are represented by a discretized Dubins model. To avoid collisions, each waypoint is constrained by linear inequalities representing the closest distance from obstacles to a polygon specifying the vehicle's extent. To promote smooth and valid trajectories, the solved kinematic state and control variables are constrained and regularized by penalty terms in the model's cost function, which enforce physical restrictions including limits on steering angle, acceleration, and speed. In this paper, we analyze trajectories obtained for several parking scenarios. The results demonstrate efficient and collision-free motion generated by the proposed technique.
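A sketch of two ingredients the abstract names: the discretized Dubins/bicycle kinematics linking consecutive waypoints, and obstacle clearance expressed as linear inequalities at a waypoint (as would be re-linearized at each sequential convex optimization iteration). Dimensions and margins are illustrative, and the convex solver itself is omitted.

```python
import numpy as np

def dubins_step(state, v, steer, dt=0.1, wheelbase=2.7):
    """Discretized Dubins/bicycle kinematics used as equality constraints
    linking consecutive waypoints in the trajectory optimization."""
    x, y, theta = state
    return np.array([x + v * np.cos(theta) * dt,
                     y + v * np.sin(theta) * dt,
                     theta + v * np.tan(steer) / wheelbase * dt])

def clearance_constraints(waypoint_xy, obstacle_pts, margin=0.3):
    """Linear inequalities a @ p >= b: the waypoint must stay on the far
    side of a separating half-plane from each obstacle point (the convex
    approximation re-linearized at every iteration)."""
    rows = []
    for o in obstacle_pts:
        d = waypoint_xy - o
        a = d / (np.linalg.norm(d) + 1e-9)     # separating direction
        rows.append((a, a @ o + margin))       # constraint: a @ p >= b
    return rows

state = np.array([0.0, 0.0, 0.0])
for _ in range(20):                            # roll out a gentle left turn
    state = dubins_step(state, v=2.0, steer=0.2)
print(state)                                   # pose after 2 s
print(clearance_constraints(state[:2], np.array([[3.0, 1.0]]))[0])
```

Because both the dynamics and the clearance constraints are linear (or linearized) in the decision variables, each iteration of the sequential scheme remains a convex program that standard solvers handle efficiently.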
Analyzing how the publication records of scientists and research groups have evolved over the years is crucial for assessing their expertise, and it can support the management of academic environments by assisting with career planning and evaluation. We introduce VizCV, a novel web-based end-to-end visual analytics framework that enables the interactive exploration of researchers' scientific trajectories. It incorporates AI-assisted analysis and supports automated reporting of career evolution. Our system models career progression through three key dimensions: a) research topic evolution, to detect and visualize shifts in scholarly focus over time; b) publication record and its corresponding impact; and c) collaboration dynamics, depicting the growth and transformation of a researcher's co-authorship network. AI-driven insights provide automated explanations of career transitions, detecting significant shifts in research direction, impact surges, or collaboration expansions. The system also supports comparative analysis between researchers, allowing users to compare topic trajectories and impact growth. Our interactive, multi-tab and multi-view system allows for the exploratory analysis of career milestones under different perspectives, such as the most impactful articles, emerging research themes, or a detailed analysis of a researcher's contribution to a subfield. The key contributions include AI/ML techniques for: a) topic analysis; b) dimensionality reduction for visualizing patterns and trends; and c) the interactive creation of textual descriptions of facets of the data, including key indicators, through configurable prompt generation and large language models, to help users understand the career development of individuals or groups.
Lattice radiotherapy (LATTICE) is a form of spatially fractionated radiation therapy (SFRT) designed to deliver high doses to tumor regions while sparing surrounding tissues. Traditional LATTICE uses rigid vertex patterns, limiting adaptability for irregular tumors or those near critical organs. This study introduces a novel planning method with flexible vertex placement and joint optimization of vertex positions and dose distribution, enhancing treatment precision. The method integrates vertex positioning with other treatment variables within a constrained optimization framework, allowing dynamic adjustments. Plans generated with the new method demonstrated quality superior or comparable to conventional LATTICE plans, with improvements in the optimization objective and the peak-to-valley dose ratio (PVDR). This approach offers significant improvements in target dose conformity and organ-at-risk (OAR) sparing, providing an enhanced LATTICE technique.
For effective multi-agent trajectory planning, it is important to consider lightweight communication and its potential asynchrony. This paper presents a distributed trajectory planning algorithm for a quadrotor swarm that operates asynchronously and requires no communication except during the initial planning phase. Moreover, our algorithm guarantees no deadlock under asynchronous updates and absence of communication during flight. To achieve this, we build two main modules: a coordination state updater and a trajectory optimizer. The coordination state updater computes waypoints for each agent toward its goal and performs subgoal optimization while considering deadlocks, as well as safety constraints with respect to neighboring agents and obstacles. The trajectory optimizer then generates a trajectory that ensures collision avoidance even under the asynchronous planning updates of neighboring agents. We provide a theoretical guarantee of collision avoidance with deadlock resolution and evaluate the effectiveness of our method in complex simulation environments, including random forests and narrow-gap mazes. Additionally, to reduce the total mission time, we design a faster coordination state update using lightweight communication. Lastly, our approach is validated through extensive simulations and real-world experiments in cluttered environment scenarios.
The Any Light Particle Search II (ALPS II) experiment at DESY, Hamburg, is a Light-Shining-through-a-Wall (LSW) experiment aiming to probe the existence of axions and axion-like particles (ALPs), which are candidates for dark matter. Data collection in ALPS II is underway using a heterodyne-based detection scheme. A complementary run, for confirmation or as an alternative method, is planned using single-photon detection, requiring a sensor capable of measuring low-energy photons ($1064\,\mathrm{nm}$, $1.165\,\mathrm{eV}$) with high efficiency (above $50\,\%$) and a low background rate (below $7.7\cdot10^{-6}\,\mathrm{cps}$). To meet these requirements, we are investigating a tungsten Transition Edge Sensor (TES) provided by NIST, which operates in its superconducting transition region at millikelvin temperatures and exploits the drastic change in resistance caused by the absorption of a single photon. We find that the background observed in the setup with a fiber-coupled TES is consistent with Black Body Radiation (BBR) as the primary background contributor. A framework was developed to simulate BBR propagation to the TES under realistic conditions. The framework not only allows the exploration of background reduction strategies, such as improving the TES energy resolution, but also reproduces, within uncertainties, the spectral distribution of the observed background. These simulations have been validated with experimental data, in agreement with the modeled background distribution, and show that improved energy resolution reduces the background rate in the $1064\,\mathrm{nm}$ signal region by one order of magnitude, to approximately $10^{-4}\,\mathrm{cps}$. However, this rate must be reduced further to meet the ALPS II requirements.
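A back-of-the-envelope sketch of why black-body radiation dominates and why better energy resolution helps: integrate the single-mode Bose occupation over the accepted energy window around 1.165 eV. Fiber coupling, detector efficiency, and realistic filter shapes are ignored; the window widths and temperature are illustrative only, not the simulation framework's parameters.

```python
import numpy as np

h = 6.62607015e-34      # Planck constant, J s
k = 1.380649e-23        # Boltzmann constant, J / K
eV = 1.602176634e-19    # J per eV

def bbr_rate(T_K, E_lo_eV, E_hi_eV, n=200_000):
    """Photon rate (counts/s) from black-body radiation in a single-mode,
    single-polarization fiber channel within an energy acceptance window.
    The per-mode spectral photon rate equals the Bose occupation
    1/(exp(h nu / k T) - 1) per unit bandwidth; coupling losses ignored."""
    nu = np.linspace(E_lo_eV * eV / h, E_hi_eV * eV / h, n)
    occupation = 1.0 / np.expm1(h * nu / (k * T_K))
    return occupation.sum() * (nu[1] - nu[0])     # simple Riemann sum

# acceptance windows around the 1064 nm signal energy (1.165 eV), mimicking
# a poorer vs. a better TES energy resolution
for half_width in (0.2, 0.05):
    rate = bbr_rate(300.0, 1.165 - half_width, 1.165 + half_width)
    print(f"window +/- {half_width} eV: {rate:.1e} cps")
# tightening the accepted window cuts the room-temperature BBR leakage
# into the signal region by orders of magnitude
```

Because the occupation falls off roughly as exp(-E/kT), shrinking the acceptance window's lower edge even slightly suppresses the BBR rate exponentially, which matches the order-of-magnitude gain from improved energy resolution reported above.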
Children with Autism commonly face difficulties in vocabulary acquisition, which can have an impact on their social communication. Using digital tools for vocabulary learning can prove beneficial for these children, as they can provide a predictable environment and effective individualized feedback. While existing work has explored the use of technology-assisted vocabulary learning for children with Autism, no study has incorporated turn-taking to facilitate learning and use of vocabulary similar to that used in real-world social contexts. To address this gap, we propose the design of a cooperative two-player vocabulary learning game, CoVoL. CoVoL allows children to engage in game-based vocabulary learning useful for real-world social communication scenarios. We discuss our first prototype and its evaluation. Additionally, we present planned features which are based on feedback obtained through ten interviews with researchers and therapists, as well as an evaluation plan for the final release of CoVoL.