planning - 2025-06-22

Particle-Grid Neural Dynamics for Learning Deformable Object Models from RGB-D Videos

Authors:Kaifeng Zhang, Baoyu Li, Kris Hauser, Yunzhu Li

Date:2025-06-18 17:59:38

Modeling the dynamics of deformable objects is challenging due to their diverse physical properties and the difficulty of estimating states from limited visual information. We address these challenges with a neural dynamics framework that combines object particles and spatial grids in a hybrid representation. Our particle-grid model captures global shape and motion information while predicting dense particle movements, enabling the modeling of objects with varied shapes and materials. Particles represent object shapes, while the spatial grid discretizes the 3D space to ensure spatial continuity and enhance learning efficiency. Coupled with Gaussian Splatting for visual rendering, our framework achieves a fully learning-based digital twin of deformable objects and generates 3D action-conditioned videos. Through experiments, we demonstrate that our model learns the dynamics of diverse objects, such as ropes, cloths, stuffed animals, and paper bags, from sparse-view RGB-D recordings of robot-object interactions, while also generalizing at the category level to unseen instances. Our approach outperforms state-of-the-art learning-based and physics-based simulators, particularly in scenarios with limited camera views. Furthermore, we showcase the utility of our learned models in model-based planning, enabling goal-conditioned object manipulation across a range of tasks. The project page is available at https://kywind.github.io/pgnd.

Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence

Authors:Yining Hong, Rui Sun, Bingxuan Li, Xingcheng Yao, Maxine Wu, Alexander Chien, Da Yin, Ying Nian Wu, Zhecan James Wang, Kai-Wei Chang

Date:2025-06-18 17:58:17

AI agents today are mostly siloed: they either retrieve and reason over vast amounts of digital information and knowledge obtained online, or interact with the physical world through embodied perception, planning, and action, but rarely both. This separation limits their ability to solve tasks that require integrated physical and digital intelligence, such as cooking from online recipes, navigating with dynamic map data, or interpreting real-world landmarks using web knowledge. We introduce Embodied Web Agents, a novel paradigm for AI agents that fluidly bridge embodiment and web-scale reasoning. To operationalize this concept, we first develop the Embodied Web Agents task environments, a unified simulation platform that tightly integrates realistic 3D indoor and outdoor environments with functional web interfaces. Building upon this platform, we construct and release the Embodied Web Agents Benchmark, which encompasses a diverse suite of tasks, including cooking, navigation, shopping, tourism, and geolocation, all of which require coordinated reasoning across physical and digital realms, enabling systematic assessment of cross-domain intelligence. Experimental results reveal significant performance gaps between state-of-the-art AI systems and human capabilities, establishing both challenges and opportunities at the intersection of embodied cognition and web-scale knowledge access. All datasets, code, and websites are publicly available at our project page https://embodied-web-agent.github.io/.

SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence

Authors:Yao Zhang, Chenyang Lin, Shijie Tang, Haokun Chen, Shijie Zhou, Yunpu Ma, Volker Tresp

Date:2025-06-18 17:54:55

The rapid progress of Large Language Models has advanced agentic systems in decision-making, coordination, and task execution. Yet existing agentic system generation frameworks lack full autonomy: they do not support from-scratch agent generation, self-optimizing agent functionality, or collaboration, which limits their adaptability and scalability. We propose SwarmAgentic, a framework for fully automated agentic system generation that constructs agentic systems from scratch and jointly optimizes agent functionality and collaboration as interdependent components through language-driven exploration. To enable efficient search over system-level structures, SwarmAgentic maintains a population of candidate systems and evolves them via feedback-guided updates, drawing inspiration from Particle Swarm Optimization (PSO). We evaluate our method on six real-world, open-ended, and exploratory tasks involving high-level planning, system-level coordination, and creative reasoning. Given only a task description and an objective function, SwarmAgentic outperforms all baselines, achieving a +261.8% relative improvement over ADAS on the TravelPlanner benchmark, highlighting the effectiveness of full automation in structurally unconstrained tasks. This framework marks a significant step toward scalable and autonomous agentic system design, bridging swarm intelligence with fully automated multi-agent system generation. Our code is publicly released at https://yaoz720.github.io/SwarmAgentic/.
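
SwarmAgentic applies PSO-style, feedback-guided updates to a population of candidate agentic systems. The classical numeric form it draws inspiration from can be sketched as follows. This is standard PSO over real-valued vectors, not SwarmAgentic's language-driven update; the function name and the hyperparameters (w, c1, c2, bounds) are illustrative assumptions.

```python
import random


def pso_minimize(f, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5,
                 bounds=(-5.0, 5.0), seed=0):
    """Classical numeric Particle Swarm Optimization (minimization).

    Illustrative only: SwarmAgentic evolves agentic systems through
    language-driven updates, not through this vector arithmetic.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Personal bests (one per particle) and the swarm-wide global best.
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp the particle to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

For example, pso_minimize(lambda x: sum(v * v for v in x), dim=2) drives the sphere function close to zero. In SwarmAgentic the candidates are whole agentic systems updated through language feedback, so this sketch only conveys the shared structure: inertia plus pulls toward personal and global bests.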

FindingDory: A Benchmark to Evaluate Memory in Embodied Agents

Authors:Karmesh Yadav, Yusuf Ali, Gunshi Gupta, Yarin Gal, Zsolt Kira

Date:2025-06-18 17:06:28

Large vision-language models have recently demonstrated impressive performance in planning and control tasks, driving interest in their application to real-world robotics. However, deploying these models for reasoning in embodied contexts is constrained by how well they can incorporate long-term experience collected across multiple days and represented by vast collections of images. Current VLMs typically struggle to process more than a few hundred images concurrently, highlighting the need for more efficient mechanisms to handle long-term memory in embodied settings. To effectively evaluate these models for long-horizon control, a benchmark must specifically target scenarios where memory is crucial for success. Existing long-video QA benchmarks overlook embodied challenges like object manipulation and navigation, which demand low-level skills and fine-grained reasoning over past interactions. Moreover, effective memory integration in embodied agents involves both recalling relevant historical information and executing actions based on that information, making it essential to study these aspects together rather than in isolation. In this work, we introduce a new benchmark for long-range embodied tasks in the Habitat simulator. This benchmark evaluates memory-based capabilities across 60 tasks requiring sustained engagement and contextual awareness in an environment. The tasks can also be procedurally extended to longer and more challenging versions, enabling scalable evaluation of memory and reasoning. We also present baselines that integrate state-of-the-art VLMs with low-level navigation policies, assess their performance on these memory-intensive tasks, and highlight areas for improvement.

Automated MRI Tumor Segmentation using hybrid U-Net with Transformer and Efficient Attention

Authors:Syed Haider Ali, Asrar Ahmad, Muhammad Ali, Asifullah Khan, Muhammad Shahban, Nadeem Shaukat

Date:2025-06-18 15:36:37

Cancer is an abnormal growth with the potential to invade locally and metastasize to distant organs. Accurate auto-segmentation of the tumor and surrounding normal tissues is required for radiotherapy treatment plan optimization. Recent AI-based segmentation models are generally trained on large public datasets, which lack the heterogeneity of local patient populations. While these studies advance AI-based medical image segmentation, research on local datasets is necessary to develop and integrate AI tumor segmentation models directly into hospital software for efficient and accurate oncology treatment planning and execution. This study enhances tumor segmentation using computationally efficient hybrid UNet-Transformer models on magnetic resonance imaging (MRI) datasets acquired from a local hospital under strict privacy protection. We developed a robust data pipeline for seamless DICOM extraction and preprocessing, followed by extensive image augmentation to ensure model generalization across diverse clinical settings, resulting in a total dataset of 6080 images for training. Our novel architecture integrates UNet-based convolutional neural networks with a transformer bottleneck and complementary attention modules, including efficient attention, Squeeze-and-Excitation (SE) blocks, Convolutional Block Attention Module (CBAM), and ResNeXt blocks. To accelerate convergence and reduce computational demands, we used a maximum batch size of 8 and initialized the encoder with pretrained ImageNet weights, training the model on dual NVIDIA T4 GPUs and using checkpointing to overcome Kaggle's runtime limits. Quantitative evaluation on the local MRI dataset yielded a Dice similarity coefficient of 0.764 and an Intersection over Union (IoU) of 0.736, demonstrating competitive performance despite limited data and underscoring the importance of site-specific model development for clinical deployment.
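
The two reported metrics can be illustrated for a single pair of binary masks. The sketch below uses the standard definitions, Dice = 2|A∩B| / (|A| + |B|) and IoU = |A∩B| / |A∪B|; the function name and the flat 0/1 mask representation are assumptions, and the empty-mask convention is a choice rather than something stated in the abstract.

```python
def dice_and_iou(pred, target):
    """Dice coefficient and IoU for two binary masks given as flat 0/1 sequences."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    p_sum = sum(1 for p in pred if p)
    t_sum = sum(1 for t in target if t)
    if p_sum + t_sum == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0, 1.0
    union = p_sum + t_sum - inter
    dice = 2 * inter / (p_sum + t_sum)
    iou = inter / union
    return dice, iou
```

For example, dice_and_iou([1, 1, 1, 0, 0, 0], [0, 1, 1, 1, 0, 0]) returns (0.666..., 0.5). For any single mask pair IoU = Dice / (2 - Dice), so per-mask IoU never exceeds Dice; scores averaged over many images need not satisfy that identity exactly.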

CLAIM: Clinically-Guided LGE Augmentation for Realistic and Diverse Myocardial Scar Synthesis and Segmentation

Authors:Farheen Ramzan, Yusuf Kiberu, Nikesh Jathanna, Shahnaz Jamil-Copley, Richard H. Clayton, Chen Chen

Date:2025-06-18 15:21:34

Deep learning-based myocardial scar segmentation from late gadolinium enhancement (LGE) cardiac MRI has shown great potential for accurate and timely diagnosis and treatment planning for structural cardiac diseases. However, the limited availability and variability of LGE images with high-quality scar labels restrict the development of robust segmentation models. To address this, we introduce CLAIM (Clinically-Guided LGE Augmentation for Realistic and Diverse Myocardial Scar Synthesis and Segmentation), a framework for anatomically grounded scar generation and segmentation. At its core is the SMILE module (Scar Mask generation guided by cLinical knowledgE), which conditions a diffusion-based generator on the clinically adopted AHA 17-segment model to synthesize images with anatomically consistent and spatially diverse scar patterns. In addition, CLAIM employs a joint training strategy in which the scar segmentation network is optimized alongside the generator, aiming to enhance both the realism of synthesized scars and scar segmentation accuracy. Experimental results show that CLAIM produces anatomically coherent scar patterns and achieves higher Dice similarity with real scar distributions than baseline models. Our approach enables controllable and realistic myocardial scar synthesis and has demonstrated utility for downstream medical imaging tasks.

Aerial Grasping via Maximizing Delta-Arm Workspace Utilization

Authors:Haoran Chen, Weiliang Deng, Biyu Ye, Yifan Xiong, Ximin Lyu

Date:2025-06-18 15:13:49

The workspace limits the operational capabilities and range of motion of systems with robotic arms. Maximizing workspace utilization has the potential to provide better solutions for aerial manipulation tasks, increasing the system's flexibility and operational efficiency. In this paper, we introduce a novel planning framework for aerial grasping that maximizes workspace utilization. We formulate an optimization problem over the aerial manipulator's trajectory that incorporates task constraints to achieve efficient manipulation. To incorporate the delta arm's non-convex workspace into the optimization constraints, we use a Multilayer Perceptron (MLP) to map position points to feasibility probabilities. Furthermore, we employ Reversible Residual Networks (RevNet) to approximate the complex forward kinematics of the delta arm, using efficient model gradients to eliminate workspace constraints. We validate our methods in simulations and real-world experiments to demonstrate their effectiveness.

Procedures for Constraining Robotic Fiber Positioning for Highly Multiplexed Spectroscopic Surveys: The Case of FPS for SDSS-V

Authors:Ilija Medan, Tom Dwelly, Kevin R. Covey, Eleonora Zari, Michael R. Blanton, Joleen K. Carlberg, S. Drew Chojnowski, Alexander Ji, Yue Shen, John Donor, José Sánchez-Gallego, Sean Morrison, Héctor J. Ibarra-Medel, Conor Sayres, Keivan G. Stassun

Date:2025-06-18 14:07:27

One crucial aspect of planning any large-scale astronomical survey is constructing an observing strategy that maximizes reduced data quality. This is especially important for surveys that are heterogeneous and broad-ranging in their science goals. The Sloan Digital Sky Survey V (SDSS-V), which now uses the Focal Plane System (FPS) to robotically place the fibers that feed its spectrographs, certainly meets these criteria. The addition of the FPS facilitates an increase in survey efficiency, number of targets, and target diversity, but it also means the positions of fibers must be constrained to allow simultaneous observations of sometimes competing programs. The constraints on fiber positions are driven by properties of the science targets, e.g., the type of target, its brightness, and its position relative to others in the field. The parameters used to describe these constraints also depend on the intended science goal of the observation, which varies with the types of objects requested for the particular observation and the planned sky conditions. In this work, we detail the SDSS-V data collection scenarios, which consist of sets of parameters that serve as the framework for constraining fiber placements. The numerical values of these parameters were set based either on past experience or on a series of new tests, which we describe in detail here. These parameters allow a survey like SDSS-V to be algorithmically planned to maximize science output while guaranteeing data quality throughout its operation.

Multi-dimensional evaluation on a rural integrated energy system including solar, wind, biomass and geothermal energy

Authors:Ruonan Lia, Chang Wena, Mingyu Yan, Congcong Wu, Ahmed Lotfy Elrefai, Xiaotong Zhang, Sahban Wael Saeed Alnaser

Date:2025-06-18 12:15:46

This study focuses on a novel municipal-scale rural integrated energy system (RIES) that encompasses both energy supply and application. By constructing a seven-dimensional evaluation system covering energy efficiency, energy supply, low-carbon sustainability, environmental impact, energy economy, social benefits, and integrated energy system development, this research combines the improved analytic hierarchy process (IAHP) and the entropy weight method (EWM) through a sum-of-squared-deviations approach to balance expert experience and data objectivity. Furthermore, a cloud model is introduced to handle the fuzziness and randomness in the evaluation. This method can quantify the differences in system performance before and after the planning is implemented. The results indicate that after planning, the comprehensive score increased from 83.12 to 87.55 and the entropy decreased from 6.931 to 5.336, indicating enhanced system stability, while the hyper-entropy dropped from 3.08 to 2.278, reflecting reduced uncertainty. The research findings provide a scientific basis for the planning optimization, policy-making, and sustainable development of rural integrated energy systems, offering both theoretical innovation and practical guidance.
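
The objective half of the weighting can be illustrated with the textbook entropy weight method. This sketch assumes a matrix of positive, benefit-type indicator values (rows are evaluated objects, columns are indicators) normalized by column proportions. It does not reproduce the paper's IAHP combination or its cloud model, and the function name is hypothetical.

```python
import math


def entropy_weights(matrix):
    """Textbook entropy weight method.

    Assumes an m x n matrix (m >= 2) of positive, benefit-type values:
    rows are evaluated objects, columns are indicators.
    """
    m, n = len(matrix), len(matrix[0])
    k = 1.0 / math.log(m)  # scales each column's entropy into [0, 1]
    divergences = []
    for j in range(n):
        col = [row[j] for row in matrix]
        total = sum(col)
        p = [x / total for x in col]
        # Shannon entropy of the column; 0 * ln 0 is taken as 0.
        e = -k * sum(pi * math.log(pi) for pi in p if pi > 0)
        divergences.append(1.0 - e)  # low entropy -> more discriminating
    s = sum(divergences)  # zero only if every column is constant
    return [d / s for d in divergences]
```

With entropy_weights([[1, 5], [1, 1], [1, 3]]), the first indicator is identical for every object and receives a weight of (numerically) zero, while the second receives all of the weight. The "entropy" here is the Shannon entropy of an indicator column, which is distinct from the cloud model's entropy and hyper-entropy reported in the results.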

Efficient Navigation Among Movable Obstacles using a Mobile Manipulator via Hierarchical Policy Learning

Authors:Taegeun Yang, Jiwoo Hwang, Jeil Jeong, Minsung Yoon, Sung-Eui Yoon

Date:2025-06-18 11:49:57

We propose a hierarchical reinforcement learning (HRL) framework for efficient Navigation Among Movable Obstacles (NAMO) using a mobile manipulator. Our approach combines interaction-based obstacle property estimation with structured pushing strategies, facilitating the dynamic manipulation of unforeseen obstacles while adhering to a pre-planned global path. The high-level policy generates pushing commands that consider environmental constraints and path-tracking objectives, while the low-level policy precisely and stably executes these commands through coordinated whole-body movements. Comprehensive simulation experiments show that, compared to baselines, our approach improves NAMO performance, with higher success rates, shorter traversed paths, and reduced goal-reaching times. Additionally, ablation studies assess the efficacy of each component, while a qualitative analysis further validates the accuracy and reliability of the real-time obstacle property estimation.

Comparison of Innovative Strategies for the Coverage Problem: Path Planning, Search Optimization, and Applications in Underwater Robotics

Authors:Ahmed Ibrahim, Francisco F. C. Rego, Éric Busvelle

Date:2025-06-18 11:46:27

In many applications, including underwater robotics, the coverage problem requires an autonomous vehicle to systematically explore a defined area while minimizing redundancy and avoiding obstacles. This paper investigates coverage path planning strategies to enhance the efficiency of underwater gliders, particularly in maximizing the probability of detecting a radioactive source while ensuring safe navigation. We evaluate three path-planning approaches: the Traveling Salesman Problem (TSP), Minimum Spanning Tree (MST), and Optimal Control Problem (OCP). Simulations were conducted in MATLAB, comparing processing time, uncovered areas, path length, and traversal time. Results indicate that OCP is preferable when traversal time is constrained, although it incurs significantly higher computational costs. Conversely, MST-based approaches provide faster but less optimal solutions. These findings offer insights into selecting appropriate algorithms based on mission priorities, balancing efficiency and computational feasibility.
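
The abstract does not specify how the MST-based planner orders its waypoints. One common construction, assumed here purely for illustration, builds a Euclidean minimum spanning tree over candidate waypoints with Prim's algorithm and visits them in preorder. For metric distances, closing that walk back to the start gives a tour at most twice the optimal TSP length, which matches the "faster but less optimal" trade-off. The function name and the (x, y) waypoint representation are hypothetical.

```python
import math


def mst_tour(points):
    """Visit order from a preorder walk of a Euclidean MST built with Prim's algorithm.

    Closing the walk back to the start yields a tour at most twice the
    optimal TSP tour length for metric distances.
    """
    n = len(points)
    if n == 0:
        return []

    def dist(a, b):
        return math.dist(points[a], points[b])

    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge connecting each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    children = [[] for _ in range(n)]
    for _ in range(n):
        # O(n^2) Prim: add the closest node not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and dist(u, v) < best[v]:
                best[v] = dist(u, v)
                parent[v] = u
    # Depth-first preorder traversal of the tree from node 0.
    order, stack = [], [0]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(reversed(children[u]))
    return order
```

For collinear waypoints [(0, 0), (1, 0), (2, 0), (3, 0)] the tree is a chain and the visit order is [0, 1, 2, 3]. Avoiding obstacles and modelling sensor footprints, both part of the paper's coverage setting, would require additional steps not shown here.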

OpenPath: Open-Set Active Learning for Pathology Image Classification via Pre-trained Vision-Language Models

Authors:Lanfeng Zhong, Xin Liao, Shichuan Zhang, Shaoting Zhang, Guotai Wang

Date:2025-06-18 09:47:45

Pathology image classification plays a crucial role in accurate medical diagnosis and treatment planning. Training high-performance models for this task typically requires large-scale annotated datasets, which are both expensive and time-consuming to acquire. Active Learning (AL) offers a solution by iteratively selecting the most informative samples for annotation, thereby reducing the labeling effort. However, most AL methods assume a closed-set scenario, where all the unannotated images belong to target classes. In real-world clinical environments, the unlabeled pool often contains a substantial amount of Out-Of-Distribution (OOD) data, which lowers the annotation efficiency of traditional AL methods. Furthermore, most existing AL methods start with random selection in the first query round, which wastes significant labeling cost in open-set scenarios. To address these challenges, we propose OpenPath, a novel open-set active learning approach for pathological image classification that leverages a pre-trained Vision-Language Model (VLM). In the first query, we propose task-specific prompts that combine target and relevant non-target class prompts to effectively select In-Distribution (ID) and informative samples from the unlabeled pool. In subsequent queries, we propose Diverse Informative ID Sampling (DIS), which includes Prototype-based ID candidate Selection (PIS) and Entropy-Guided Stochastic Sampling (EGSS), to ensure both purity and informativeness in a query while avoiding the selection of OOD samples. Experiments on two public pathology image datasets show that OpenPath significantly enhances the model's performance owing to the high purity of its selected samples, and outperforms several state-of-the-art open-set AL methods. The code is available at https://github.com/HiLab-git/OpenPath.
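
The abstract does not give EGSS's exact sampling rule. As an illustrative assumption only, the sketch below draws k distinct candidates with probability proportional to the model's predictive entropy, so uncertain samples are favoured while randomness keeps the batch diverse. In OpenPath this step would operate on candidates already filtered by PIS; that filtering is not shown, and the function names are hypothetical.

```python
import math
import random


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_weighted_sample(prob_rows, k, seed=0):
    """Pick k distinct indices; each draw is proportional to the remaining entropies.

    Hypothetical illustration of entropy-guided stochastic sampling, not OpenPath's EGSS.
    """
    rng = random.Random(seed)
    weights = [entropy(row) for row in prob_rows]
    remaining = list(range(len(prob_rows)))
    chosen = []
    for _ in range(min(k, len(remaining))):
        total = sum(weights[i] for i in remaining)
        if total <= 0:
            # Every remaining candidate is fully confident: fall back to uniform.
            idx = rng.choice(remaining)
        else:
            idx = rng.choices(remaining, weights=[weights[i] for i in remaining], k=1)[0]
        remaining.remove(idx)
        chosen.append(idx)
    return chosen
```

A candidate whose prediction is fully confident, such as [1.0, 0.0], has zero entropy and is only picked once every uncertain candidate has been taken.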

MapFM: Foundation Model-Driven HD Mapping with Multi-Task Contextual Learning

Authors:Leonid Ivanov, Vasily Yuryev, Dmitry Yudin

Date:2025-06-18 09:42:30

In autonomous driving, high-definition (HD) maps and semantic maps in bird's-eye view (BEV) are essential for accurate localization, planning, and decision-making. This paper introduces an enhanced end-to-end model named MapFM for online vectorized HD map generation. We show that incorporating a powerful foundation model to encode camera images significantly boosts feature representation quality. To further enrich the model's understanding of the environment and improve prediction quality, we integrate auxiliary prediction heads for semantic segmentation in the BEV representation. This multi-task learning approach provides richer contextual supervision, leading to a more comprehensive scene representation and ultimately resulting in higher accuracy and improved quality of the predicted vectorized HD maps. The source code is available at https://github.com/LIvanoff/MapFM.

New Physics Opportunities at Neutrino Facilities: BSM Physics at Accelerator, Atmospheric, and Reactor Neutrino Experiments

Authors:Koun Choi, Doojin Kim, Jong-Chul Park, Seodong Shin, Pouya Bakhti, Ki-Young Choi, Chang Hyon Ha, Kazumi Hata, Wooyoung Jang, Yu Seon Jeong, Young Ju Ko, Hyun Su Lee, Weijun Li, Yu-Feng Li, Mehedi Masud, Kenny C. Y. Ng, Jungsic Park, Min-Gwa Park, Komninos-John Plows, Meshkat Rajaee, Eunil Won, Byeongsu Yang, Seong Moon Yoo, Jaehoon Yu, Seokhoon Yun

Date:2025-06-18 09:36:48

Since the discovery of the Higgs boson, the long-standing task at hand in particle physics is the search for new physics beyond the Standard Model, which accounts for only about 5% of the Universe. In light of this situation, the neutrino sector has drawn significant attention due to neutrino oscillations, which require physics beyond the Standard Model and have prompted a wide array of active and planned experimental programs. Notably, neutrino facilities offer substantial potential to search for new physics beyond neutrino oscillations, owing to their precision measurement capabilities, diverse experimental configurations, and various neutrino sources. This paper provides a review of the landscape of new physics that can be probed at current and future neutrino experiments, categorized into laboratory-produced and cosmogenic signals. We discuss recent experimental results interpreted through the lens of new physics, as well as detailed plans and projected sensitivities of next-generation facilities. This review is based on presentations from the 4th Workshop on New Physics Opportunities in Neutrino Facilities (NPN 2024), held at IBS in Daejeon, Korea, on June 3-5, 2024. Particular emphasis is placed on accelerator-based neutrino experiments and a range of neutrino programs in East Asia. We also outline key tasks necessary to realize the promising new physics opportunities ahead.

Designing Intent: A Multimodal Framework for Human-Robot Cooperation in Industrial Workspaces

Authors:Francesco Chiossi, Julian Rasch, Robin Welsch, Albrecht Schmidt, Florian Michahelles

Date:2025-06-18 09:23:54

As robots enter collaborative workspaces, ensuring mutual understanding between human workers and robotic systems becomes a prerequisite for trust, safety, and efficiency. In this position paper, we draw on the cooperation scenario of the AIMotive project in which a human and a cobot jointly perform assembly tasks to argue for a structured approach to intent communication. Building on the Situation Awareness-based Agent Transparency (SAT) framework and the notion of task abstraction levels, we propose a multidimensional design space that maps intent content (SAT1, SAT3), planning horizon (operational to strategic), and modality (visual, auditory, haptic). We illustrate how this space can guide the design of multimodal communication strategies tailored to dynamic collaborative work contexts. With this paper, we lay the conceptual foundation for a future design toolkit aimed at supporting transparent human-robot interaction in the workplace. We highlight key open questions and design challenges, and propose a shared agenda for multimodal, adaptive, and trustworthy robotic collaboration in hybrid work environments.

DOVA-PATBM: An Intelligent, Adaptive, and Scalable Framework for Optimizing Large-Scale EV Charging Infrastructure

Authors:Chuan Li, Shunyu Zhao, Vincent Gauthier, Hassine Moungla

Date:2025-06-18 09:15:18

The accelerating uptake of battery-electric vehicles demands infrastructure planning tools that are both data-rich and geographically scalable. Whereas most prior studies optimise charging locations for single cities, state-wide and national networks must reconcile the conflicting requirements of dense metropolitan cores, car-dependent exurbs, and power-constrained rural corridors. We present DOVA-PATBM (Deployment Optimisation with Voronoi-oriented, Adaptive, POI-Aware Temporal Behaviour Model), a geo-computational framework that unifies these contexts in a single pipeline. The method rasterises heterogeneous data (roads, population, night lights, POIs, and feeder lines) onto a hierarchical H3 grid, infers intersection importance with a zone-normalised graph neural network centrality model, and overlays a Voronoi tessellation that guarantees at least one five-port DC fast charger within every 30 km radius. Hourly arrival profiles, learned from loop-detector and floating-car traces, feed a finite M/M/c queue to size ports under feeder-capacity and outage-risk constraints. A greedy maximal-coverage heuristic with income-weighted penalties then selects the minimum number of sites that satisfy coverage and equity targets. Applied to the State of Georgia, USA, DOVA-PATBM (i) increases 30 km tile coverage by 12 percentage points, (ii) halves the mean distance that low-income residents travel to the nearest charger, and (iii) meets sub-transmission headroom everywhere, all while remaining computationally tractable for national-scale roll-outs. These results demonstrate that a tightly integrated, GNN-driven, multi-resolution approach can bridge the gap between academic optimisation and deployable infrastructure policy.
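
The port-sizing step can be illustrated with the standard Erlang C formula for an M/M/c queue: given an hourly arrival rate, a per-port service rate, and c ports, it gives the probability that an arriving vehicle must wait. This sketch rests on stated assumptions. It uses the infinite-buffer M/M/c model (the paper's "finite" variant and its feeder-capacity and outage-risk constraints are not modelled), computes Erlang C through the numerically stable Erlang B recursion, and the function names and waiting-probability target are hypothetical.

```python
def erlang_c(arrival_rate, service_rate, ports):
    """Probability that an arrival must wait in an M/M/c queue (Erlang C)."""
    a = arrival_rate / service_rate  # offered load in Erlangs
    if a >= ports:
        return 1.0  # unstable queue: the backlog grows without bound
    # Erlang B recursion: B(0) = 1, B(k) = a*B(k-1) / (k + a*B(k-1)).
    b = 1.0
    for k in range(1, ports + 1):
        b = a * b / (k + a * b)
    # Convert Erlang B to Erlang C for the same load and port count.
    return ports * b / (ports - a * (1.0 - b))


def size_ports(arrival_rate, service_rate, max_wait_prob, max_ports=100):
    """Smallest port count whose waiting probability meets the target, or None."""
    for c in range(1, max_ports + 1):
        if erlang_c(arrival_rate, service_rate, c) <= max_wait_prob:
            return c
    return None
```

For instance, with 4 arrivals per hour, a mean charging session of one hour (service rate 1 per hour), and a 20% waiting-probability target, size_ports(4.0, 1.0, 0.2) returns 7. Running this per hour of a learned arrival profile and taking the peak would be one simple way to turn time-varying demand into a port count.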

Multi-Agent Reinforcement Learning for Autonomous Multi-Satellite Earth Observation: A Realistic Case Study

Authors:Mohamad A. Hady, Siyi Hu, Mahardhika Pratama, Jimmy Cao, Ryszard Kowalczyk

Date:2025-06-18 07:42:11

The exponential growth of Low Earth Orbit (LEO) satellites has revolutionised Earth Observation (EO) missions, addressing challenges in climate monitoring, disaster management, and more. However, autonomous coordination in multi-satellite systems remains a fundamental challenge. Traditional optimisation approaches struggle to handle the real-time decision-making demands of dynamic EO missions, necessitating the use of Reinforcement Learning (RL) and Multi-Agent Reinforcement Learning (MARL). In this paper, we investigate RL-based autonomous EO mission planning by modelling single-satellite operations and extending to multi-satellite constellations using MARL frameworks. We address key challenges, including energy and data storage limitations, uncertainties in satellite observations, and the complexities of decentralised coordination under partial observability. By leveraging a near-realistic satellite simulation environment, we evaluate the training stability and performance of state-of-the-art MARL algorithms, including PPO, IPPO, MAPPO, and HAPPO. Our results demonstrate that MARL can effectively balance imaging and resource management while addressing non-stationarity and reward interdependency in multi-satellite coordination. The insights gained from this study provide a foundation for autonomous satellite operations, offering practical guidelines for improving policy learning in decentralised EO missions.

Advanced approach for Agile/Scrum Process: RetroAI++

Authors:Maria Spichkova, Kevin Iwan, Madeleine Zwart, Hina Lee, Yuwon Yoon, Xiaohan Qin

Date:2025-06-18 06:38:43

In Agile/Scrum software development, sprint planning and retrospective analysis are key elements of project management. The aim of our work is to support software developers in these activities. In this paper, we present our prototype tool RetroAI++, based on emerging intelligent technologies. Leveraging AI insights, RetroAI++ aims to automate and refine the many processes involved in the Sprint Planning, Development, and Retrospective stages of Agile/Scrum development projects, offering intelligent suggestions for sprint organisation as well as meaningful insights for retrospective reflection.

Transit for All: Mapping Equitable Bike2Subway Connection using Region Representation Learning

Authors:Min Namgung, JangHyeon Lee, Fangyi Ding, Yao-Yi Chiang

Date:2025-06-18 03:31:07

Ensuring equitable public transit access remains challenging, particularly in densely populated cities like New York City (NYC), where low-income and minority communities often face limited transit accessibility. Bike-sharing systems (BSS) can bridge these equity gaps by providing affordable first- and last-mile connections. However, strategically expanding BSS into underserved neighborhoods is difficult due to uncertain bike-sharing demand at newly planned ("cold-start") station locations and limitations in traditional accessibility metrics that may overlook realistic bike usage potential. We introduce Transit for All (TFA), a spatial computing framework designed to guide the equitable expansion of BSS through three components: (1) spatially-informed bike-sharing demand prediction at cold-start stations using region representation learning that integrates multimodal geospatial data, (2) comprehensive transit accessibility assessment using our novel weighted Public Transport Accessibility Level (wPTAL), which combines predicted bike-sharing demand with conventional transit accessibility metrics, and (3) strategic recommendations for new bike station placements that consider potential ridership and equity enhancement. Using NYC as a case study, we identify transit accessibility gaps that disproportionately impact low-income and minority communities in historically underserved neighborhoods. Our results show that strategically placing new stations guided by wPTAL notably reduces disparities in transit access related to economic and demographic factors. Our study demonstrates that TFA provides practical guidance for urban planners to promote equitable transit and enhance the quality of life in underserved urban communities.

Time-Optimized Safe Navigation in Unstructured Environments through Learning Based Depth Completion

Authors:Jeffrey Mao, Raghuram Cauligi Srinivas, Steven Nogar, Giuseppe Loianno

Date:2025-06-17 21:01:05

Quadrotors hold significant promise for applications such as agriculture, search and rescue, and infrastructure inspection. Achieving autonomous operation requires systems to navigate safely through complex and unfamiliar environments. This level of autonomy is particularly challenging due to the complexity of such environments and the need for real-time decision-making, especially for platforms constrained by size, weight, and power (SWaP), which limits flight time and precludes the use of bulky sensors like Light Detection and Ranging (LiDAR) for mapping. Furthermore, computing globally optimal, collision-free paths and translating them into time-optimized, safe trajectories in real time adds significant computational complexity. To address these challenges, we present a fully onboard, real-time navigation system that relies solely on lightweight onboard sensors. Our system constructs a dense 3D map of the environment using a novel visual depth estimation approach that fuses stereo and monocular learning-based depth, yielding longer-range, denser, and less noisy depth maps than conventional stereo methods. Building on this map, we introduce a novel planning and trajectory generation framework capable of rapidly computing time-optimal global trajectories. As the map is incrementally updated with new depth information, our system continuously refines the trajectory to maintain safety and optimality. Both our planner and our trajectory generator outperform state-of-the-art methods in computational efficiency and guarantee obstacle-free trajectories. We validate our system through robust autonomous flight experiments in diverse indoor and outdoor environments, demonstrating its effectiveness for safe navigation in previously unknown settings.

Recursive Variational Autoencoders for 3D Blood Vessel Generative Modeling

Authors:Paula Feldman, Miguel Fainstein, Viviana Siless, Claudio Delrieux, Emmanuel Iarussi

Date:2025-06-17 18:47:27

Anatomical trees play an important role in clinical diagnosis and treatment planning. Yet, accurately representing these structures poses significant challenges owing to their intricate and varied topology and geometry. Most existing methods for synthesizing vasculature are rule-based, and despite providing some degree of control and variation in the structures produced, they fail to capture the diversity and complexity of actual anatomical data. We developed a Recursive variational Neural Network (RvNN) that fully exploits the hierarchical organization of the vessel and learns a low-dimensional manifold encoding branch connectivity along with geometry features describing the target surface. After training, the RvNN latent space can be sampled to generate new vessel geometries. Using this generative model, we produce 3D models of blood vessels that are both accurate and diverse, which is crucial for medical and surgical training, hemodynamic simulations, and many other purposes. The generated vessels closely resemble real data, achieving high similarity in vessel radii, length, and tortuosity across various datasets, including those with aneurysms. To the best of our knowledge, this work is the first to use this technique for synthesizing blood vessels.
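
To make the idea of a recursive network over a vessel tree concrete, the sketch below encodes a binary branching tree bottom-up: each branch's code is a tanh layer applied to its own features concatenated with its children's codes, so trees of any size map to a fixed-length vector. The per-branch features, the zero code for a missing child, and the random untrained weights are assumptions; the variational part (predicting a mean and log-variance at the root) and the recursive decoder described in the paper are omitted.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Branch:
    features: List[float]              # hypothetical per-branch geometry, e.g. [radius, length]
    left: Optional["Branch"] = None    # child branches (None = no child)
    right: Optional["Branch"] = None


class RecursiveEncoder:
    """Bottom-up encoder: code(node) = tanh(W @ [features, code(left), code(right)])."""

    def __init__(self, n_features: int, code_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.code_dim = code_dim
        n_in = n_features + 2 * code_dim
        # Random, untrained weights: this only illustrates the recursion.
        self.weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                        for _ in range(code_dim)]

    def encode(self, node: Optional[Branch]) -> List[float]:
        if node is None:
            return [0.0] * self.code_dim       # fixed code for a missing child
        x = node.features + self.encode(node.left) + self.encode(node.right)
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.weights]
```

For example, `RecursiveEncoder(n_features=2, code_dim=4).encode(root)` returns four numbers whether `root` has three branches or three hundred; sampling such a fixed-size latent space is what makes generation of new trees possible once a decoder is trained.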

CDP: Towards Robust Autoregressive Visuomotor Policy Learning via Causal Diffusion

Authors:Jiahua Ma, Yiran Qin, Yixiong Li, Xuanqi Liao, Yulan Guo, Ruimao Zhang

Date:2025-06-17 17:59:12

Diffusion Policy (DP) enables robots to learn complex behaviors by imitating expert demonstrations through action diffusion. However, in practical applications, hardware limitations often degrade data quality, while real-time constraints restrict model inference to instantaneous state and scene observations. These limitations seriously reduce the efficacy of learning from expert demonstrations, resulting in failures in object localization, grasp planning, and long-horizon task execution. To address these challenges, we propose Causal Diffusion Policy (CDP), a novel transformer-based diffusion model that enhances action prediction by conditioning on historical action sequences, thereby enabling more coherent and context-aware visuomotor policy learning. To further mitigate the computational cost associated with autoregressive inference, a caching mechanism is also introduced to store attention key-value pairs from previous timesteps, substantially reducing redundant computations during execution. Extensive experiments in both simulated and real-world environments, spanning diverse 2D and 3D manipulation tasks, demonstrate that CDP uniquely leverages historical action sequences to achieve significantly higher accuracy than existing methods. Moreover, even when faced with degraded input observation quality, CDP maintains remarkable precision by reasoning through temporal continuity, which highlights its practical robustness for robotic control under realistic, imperfect conditions.
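
The key-value caching idea can be illustrated independently of CDP's architecture. The sketch below is a single-head causal self-attention in plain Python: the cached version projects only the newest token and reuses stored keys and values, while the uncached reference recomputes them for every prefix. The weights and inputs are arbitrary toy values; CDP's transformer, action tokens, and diffusion denoising are not modeled.

```python
import math
from typing import List

Vec = List[float]


def matvec(w: List[Vec], x: Vec) -> Vec:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def attend(q: Vec, keys: List[Vec], values: List[Vec]) -> Vec:
    """Scaled dot-product attention of one query over the given keys/values."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]          # numerically stable softmax
    total = sum(exps)
    return [sum(e / total * v[j] for e, v in zip(exps, values))
            for j in range(len(values[0]))]


class CachedCausalAttention:
    def __init__(self, wq: List[Vec], wk: List[Vec], wv: List[Vec]):
        self.wq, self.wk, self.wv = wq, wk, wv
        self.keys: List[Vec] = []
        self.values: List[Vec] = []

    def step(self, x: Vec) -> Vec:
        # Only the newest token is projected; earlier keys/values come from the cache.
        self.keys.append(matvec(self.wk, x))
        self.values.append(matvec(self.wv, x))
        return attend(matvec(self.wq, x), self.keys, self.values)


def uncached_causal_attention(wq, wk, wv, xs: List[Vec]) -> List[Vec]:
    outputs = []
    for t in range(len(xs)):
        keys = [matvec(wk, x) for x in xs[: t + 1]]    # recomputed at every step
        values = [matvec(wv, x) for x in xs[: t + 1]]
        outputs.append(attend(matvec(wq, xs[t]), keys, values))
    return outputs
```

Both functions produce the same outputs; the cached one performs one key and one value projection per step instead of t + 1, which is the redundant computation the abstract refers to.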

RobotSmith: Generative Robotic Tool Design for Acquisition of Complex Manipulation Skills

Authors:Chunru Lin, Haotian Yuan, Yian Wang, Xiaowen Qiu, Tsun-Hsuan Wang, Minghao Guo, Bohan Wang, Yashraj Narang, Dieter Fox, Chuang Gan

Date:2025-06-17 17:57:37

Endowing robots with tool design abilities is critical for enabling them to solve complex manipulation tasks that would otherwise be intractable. While recent generative frameworks can automatically synthesize task settings, such as 3D scenes and reward functions, they have not yet addressed tool-use scenarios. Simply retrieving human-designed tools might not be ideal, since many tools (e.g., a rolling pin) are difficult for robotic manipulators to handle. Furthermore, existing tool design approaches either rely on predefined templates with limited parameter tuning or apply generic 3D generation methods that are not optimized for tool creation. To address these limitations, we propose RobotSmith, an automated pipeline that leverages the implicit physical knowledge embedded in vision-language models (VLMs) alongside the more accurate physics provided by physics simulation to design and use tools for robotic manipulation. Our system (1) iteratively proposes tool designs using collaborative VLM agents, (2) generates low-level robot trajectories for tool use, and (3) jointly optimizes tool geometry and usage for task performance. We evaluate our approach across a wide range of manipulation tasks involving rigid, deformable, and fluid objects. Experiments show that our method consistently outperforms strong baselines in terms of both task success rate and overall performance. Notably, our approach achieves a 50.0% average success rate, significantly surpassing other baselines such as 3D generation (21.4%) and tool retrieval (11.1%). Finally, we deploy our system in real-world settings, demonstrating that the generated tools and their usage plans transfer effectively to physical execution, validating the practicality and generalization capabilities of our approach.

Efficient and Real-Time Motion Planning for Robotics Using Projection-Based Optimization

Authors:Xuemin Chi, Hakan Girgin, Tobias Löw, Yangyang Xie, Teng Xue, Jihao Huang, Cheng Hu, Zhitao Liu, Sylvain Calinon

Date:2025-06-17 17:56:42

Generating motions for robots interacting with objects of various shapes is a complex challenge, further complicated by the robot geometry and multiple desired behaviors. While current robot programming tools (such as inverse kinematics, collision avoidance, and manipulation planning) often treat these problems as constrained optimization, many existing solvers focus on specific problem domains or do not exploit geometric constraints effectively. We propose an efficient first-order method, Augmented Lagrangian Spectral Projected Gradient Descent (ALSPG), which leverages geometric projections via Euclidean projections, Minkowski sums, and basis functions. We show that by using geometric constraints rather than full constraints and gradients, ALSPG significantly improves real-time performance. Compared to second-order methods like iLQR, ALSPG remains competitive in the unconstrained case. We validate our method through toy examples and extensive simulations, and demonstrate its effectiveness on a 7-axis Franka robot, a 6-axis P-Rob robot, and a 1:10 scale car in real-world experiments. Source code, experimental data, and videos are available on the project webpage: https://sites.google.com/view/alspg-oc
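
The "spectral projected gradient" part of ALSPG refers to the classical SPG idea: take a gradient step, project back onto the feasible set with a cheap Euclidean projection, and choose the step length with the Barzilai-Borwein ("spectral") rule. The minimal sketch below projects onto a Euclidean ball and omits both the augmented Lagrangian outer loop and the nonmonotone line search of standard SPG, so it illustrates the projection idea rather than reproducing the authors' solver.

```python
import math
from typing import Callable, List

Vec = List[float]


def project_ball(x: Vec, center: Vec, radius: float) -> Vec:
    """Euclidean projection onto the ball ||x - center|| <= radius."""
    d = [xi - ci for xi, ci in zip(x, center)]
    norm = math.sqrt(sum(v * v for v in d))
    if norm <= radius:
        return list(x)
    return [ci + radius * v / norm for ci, v in zip(center, d)]


def spectral_projected_gradient(grad: Callable[[Vec], Vec], project: Callable[[Vec], Vec],
                                x0: Vec, iters: int = 100, alpha0: float = 0.1,
                                tol: float = 1e-12) -> Vec:
    x = project(x0)
    g = grad(x)
    alpha = alpha0
    for _ in range(iters):
        x_new = project([xi - alpha * gi for xi, gi in zip(x, g)])
        s = [a - b for a, b in zip(x_new, x)]
        ss = sum(v * v for v in s)
        if ss <= tol * tol:                      # iterate stopped moving
            return x_new
        g_new = grad(x_new)
        y = [a - b for a, b in zip(g_new, g)]
        sy = sum(a * b for a, b in zip(s, y))
        # Barzilai-Borwein step, clipped; fall back to alpha0 if curvature is non-positive.
        alpha = min(max(ss / sy, 1e-10), 1e10) if sy > 0 else alpha0
        x, g = x_new, g_new
    return x
```

For example, minimizing ||x - (3, 4)||^2 over the unit disk with `grad = lambda x: [2 * (x[0] - 3), 2 * (x[1] - 4)]` converges to (0.6, 0.8), the projection of the target onto the disk; the only constraint handling needed is the closed-form projection.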

Swarm-STL: A Framework for Motion Planning in Large-Scale, Multi-Swarm Systems

Authors:Shiyu Cheng, Luyao Niu, Bhaskar Ramasubramanian, Andrew Clark, Radha Poovendran

Date:2025-06-17 17:40:12

In multi-agent systems, signal temporal logic (STL) is widely used for path planning to accomplish complex objectives with formal safety guarantees. However, as the number of agents increases, existing approaches encounter significant computational challenges. Recognizing that many complex tasks require cooperation among multiple agents, we propose swarm STL specifications to describe the collective tasks that need to be achieved by a team of agents. Next, we address the motion planning problem for all the agents in two stages. First, we abstract a group of cooperating agents as a swarm and construct a reduced-dimension state space whose dimension does not increase with the number of agents. Path planning is performed at the swarm level, ensuring that the safety and swarm STL specifications are satisfied. Then, we design low-level control strategies for agents within each swarm based on the path synthesized in the first stage. The agent trajectories generated by this two-stage policy ensure satisfaction of the STL specifications. We evaluate our two-stage approach in both single-swarm and multi-swarm scenarios. The results demonstrate that all tasks are completed with safety guarantees. Compared to the baseline multi-agent planning approach, our method maintains computational efficiency as the number of agents increases, since its computation time scales with the number of swarms rather than the number of agents.
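
Two ingredients of the abstract can be sketched briefly: abstracting a group of agents into a fixed-size swarm state, and evaluating an STL specification on that state with the standard quantitative (robustness) semantics, where "always" takes a minimum over time, "eventually" a maximum, and conjunction a minimum. The centroid-plus-radius abstraction, the obstacle and goal specification, and the untimed (whole-horizon) operators are assumptions chosen for illustration; the paper's swarm STL syntax and state space may differ. A positive robustness value means the specification is satisfied.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
SwarmState = Tuple[float, float, float]  # (centroid x, centroid y, radius)


def swarm_state(positions: List[Point]) -> SwarmState:
    """Abstract any number of 2-D agents into a 3-number state (assumed abstraction)."""
    n = len(positions)
    cx = sum(p[0] for p in positions) / n
    cy = sum(p[1] for p in positions) / n
    radius = max(math.hypot(x - cx, y - cy) for x, y in positions)
    return cx, cy, radius


def always(signal: List[float]) -> float:      # robustness of G(phi) over the horizon
    return min(signal)


def eventually(signal: List[float]) -> float:  # robustness of F(phi) over the horizon
    return max(signal)


def robustness(traj: List[SwarmState], obstacle: Point, obstacle_r: float,
               goal: Point, goal_tol: float) -> float:
    """G(swarm disk avoids obstacle) AND F(centroid within goal_tol of goal)."""
    clearance = [math.hypot(cx - obstacle[0], cy - obstacle[1]) - obstacle_r - r
                 for cx, cy, r in traj]
    reach = [goal_tol - math.hypot(cx - goal[0], cy - goal[1]) for cx, cy, _ in traj]
    return min(always(clearance), eventually(reach))   # conjunction = min
```

Because the specification is evaluated on three numbers per time step rather than on every agent's position, its cost does not grow with swarm size; the computational benefit described in the abstract comes from planning over such a reduced state, which this sketch does not attempt.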

Linear Planar 3-SAT and Its Applications in Planning

Authors:Victorien Desbois, Ocan Sankur, François Schwarzentruber

Date:2025-06-17 16:51:03

Several fragments of the satisfiability problem have been studied in the literature. Among these, Linear 3-SAT is a satisfiability problem in which each clause (viewed as a set of literals) intersects with at most one other clause; moreover, any pair of clauses has at most one literal in common. Planar 3-SAT is a fragment which requires that the so-called variable-clause graph be planar. Both fragments are NP-complete and have applications in encoding NP-hard planning problems. In this paper, we investigate the complexity and applications of the fragment obtained by combining both features. We define Linear Planar 3-SAT and prove its NP-completeness. We also study the reconfiguration problem of Linear Planar 3-SAT and show that it is PSPACE-complete. As an application, we use these new results to prove the NP-completeness of Bounded Connected Multi-Agent Pathfinding and the PSPACE-completeness of Connected Multi-Agent Pathfinding in two-dimensional grids.
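
The linearity condition, as worded above, can be checked directly. In the sketch below, clauses are lists of nonzero integers in DIMACS style (negative for a negated variable) and are treated as sets of literals; the function rejects any pair of clauses sharing two or more literals and any clause that shares a literal with more than one other clause. Clause width and planarity of the variable-clause graph are not checked, and the representation is an assumption.

```python
from itertools import combinations
from typing import List


def is_linear(clauses: List[List[int]]) -> bool:
    """Check the linearity condition on a CNF given as lists of DIMACS-style literals."""
    sets = [frozenset(c) for c in clauses]
    neighbours = [0] * len(sets)          # how many other clauses each clause meets
    for i, j in combinations(range(len(sets)), 2):
        common = len(sets[i] & sets[j])   # literals shared by clauses i and j
        if common > 1:
            return False                  # a pair may share at most one literal
        if common == 1:
            neighbours[i] += 1
            neighbours[j] += 1
            if neighbours[i] > 1 or neighbours[j] > 1:
                return False              # a clause may meet at most one other clause
    return True
```

Note that `1` and `-1` are different literals, so clauses containing a variable with opposite signs do not intersect under this definition.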

A Digital Twin Framework for Adaptive Treatment Planning in Radiotherapy

Authors:Chih-Wei Chang, Sri Akkineni, Mingzhe Hu, Keyur D. Shah, Jun Zhou, Xiaofeng Yang

Date:2025-06-17 16:40:10

This study aims to develop and evaluate a digital twin (DT) framework to enhance adaptive proton therapy for prostate stereotactic body radiotherapy (SBRT), focusing on improving treatment precision for dominant intraprostatic lesions (DILs) while minimizing organ-at-risk (OAR) toxicity. We propose a DT framework combining deep learning (DL)-based deformable image registration (DIR) with a prior treatment database to generate synthetic CTs (sCTs) for predicting interfractional anatomical changes. Using daily cone-beam CT (CBCT) from five prostate SBRT patients with DILs, the framework precomputes multiple plans with high-similarity (DT-H) and low-similarity (DT-L) sCTs. Plan optimization is performed in RayStation 2023B, assuming a constant relative biological effectiveness (RBE) of 1.1 and robustly accounting for positional and range uncertainties. Plan quality is evaluated via a modified ProKnow score across two fractions, with reoptimization limited to 10 minutes. Daily CBCT evaluation showed that clinical plans often violated OAR constraints (e.g., bladder V20.8Gy, rectum V23Gy), with DIL V100 < 90% in 2 patients, indicating SIFB failure. DT-H plans, using high-similarity sCTs, achieved better or comparable DIL/CTV coverage and lower OAR doses, with reoptimization completed within 10 min (e.g., DT-H-REopt-A score: 154.3-165.9). DT-L plans showed variable outcomes; lower similarity correlated with reduced DIL coverage (e.g., Patient 4: 84.7%). DT-H consistently outperformed clinical plans within the time limit, while extended optimization brought DT-L and clinical plans closer to DT-H quality. This DT framework enables rapid, personalized adaptive proton therapy, improving DIL targeting and reducing toxicity. By addressing geometric uncertainties, it supports improved outcomes in ultra-hypofractionated prostate RT and lays the groundwork for future multimodal anatomical prediction.
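
Metrics such as V20.8Gy and DIL V100 in the abstract are dose-volume quantities: the portion of a structure's volume receiving at least a given dose, either an absolute dose in Gy or a percentage of the prescription dose, often reported as a percentage of the structure volume. A minimal sketch, assuming a structure is given as a list of per-voxel doses with equal voxel volumes; the function names and the 50 Gy prescription in the example are hypothetical, and clinical systems compute these from dose grids and contoured structures.

```python
from typing import Sequence


def v_dose(voxel_doses_gy: Sequence[float], threshold_gy: float) -> float:
    """Percentage of (equal-volume) voxels receiving at least threshold_gy."""
    if not voxel_doses_gy:
        raise ValueError("structure has no voxels")
    covered = sum(1 for d in voxel_doses_gy if d >= threshold_gy)
    return 100.0 * covered / len(voxel_doses_gy)


def v_percent_of_rx(voxel_doses_gy: Sequence[float], prescription_gy: float,
                    percent: float = 100.0) -> float:
    """E.g. V100: percentage of volume receiving >= 100% of the prescription dose."""
    return v_dose(voxel_doses_gy, prescription_gy * percent / 100.0)
```

For a toy lesion with voxel doses [48, 50, 51, 52, 49, 55, 60, 50, 47, 53] Gy and a 50 Gy prescription, 7 of 10 voxels reach the prescription, so V100 = 70%, which would fall below the 90% level the abstract uses to flag a failed boost.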

AGENTSAFE: Benchmarking the Safety of Embodied Agents on Hazardous Instructions

Authors:Aishan Liu, Zonghao Ying, Le Wang, Junjie Mu, Jinyang Guo, Jiakai Wang, Yuqing Ma, Siyuan Liang, Mingchuan Zhang, Xianglong Liu, Dacheng Tao

Date:2025-06-17 16:37:35

The rapid advancement of vision-language models (VLMs) and their integration into embodied agents have unlocked powerful capabilities for decision-making. However, as these systems are increasingly deployed in real-world environments, they face mounting safety concerns, particularly when responding to hazardous instructions. In this work, we propose AGENTSAFE, the first comprehensive benchmark for evaluating the safety of embodied VLM agents under hazardous instructions. AGENTSAFE simulates realistic agent-environment interactions within a simulation sandbox and incorporates a novel adapter module that bridges the gap between high-level VLM outputs and low-level embodied controls. Specifically, it maps recognized visual entities to manipulable objects and translates abstract plans into executable atomic actions in the environment. Building on this, we construct a risk-aware instruction dataset inspired by Asimov's Three Laws of Robotics, including base risky instructions and mutated jailbroken instructions. The benchmark includes 45 adversarial scenarios, 1,350 hazardous tasks, and 8,100 hazardous instructions, enabling systematic testing under adversarial conditions across the perception, planning, and action-execution stages.

Application of a modified commercial laser mass spectrometer as a science analog of the Mars Organic Molecule Analyzer (MOMA)

Authors:Zachary K. Garvin, Anaïs Roussel, Luoth Chou, Marco E. Castillo, Xiang Li, William B. Brinckerhoff, Sarah Stewart Johnson

Date:2025-06-17 16:29:09

The ESA/NASA Rosalind Franklin rover, planned for launch in 2028, will carry the first laser desorption ionization mass spectrometer (LDI-MS) to Mars as part of the Mars Organic Molecule Analyzer (MOMA) instrument. MOMA will contribute to the astrobiology goals of the mission through the analysis of potential organic biosignatures. Because comparable equipment is scarce, laboratory analyses using similar techniques and instrumentation have been limited. In this study, we present a modified commercial benchtop LDI-MS designed to replicate MOMA functionality and to enable rapid testing of samples for MOMA validation experiments. We demonstrate that our instrument can detect organic standards in mineral matrices, with MS/MS enabling structural identification even in complex mixtures. Performance was additionally validated against an existing LDI-MS prototype by comparing spectra of natural samples from a Mars analog site in the Atacama Desert. Lastly, analysis of Mars analog synthetic mineral mixes highlights the capacity of the instrument to characterize both the mineralogical and organic signals in mission-relevant samples. This modified benchtop instrument will serve as a platform for collaborative research to prepare for MOMA operations, test LDI parameters, and generate pre-flight reference data in support of the mission's science and astrobiology-specific goals.

The Effect of Photometric Errors on the Measured Width of the Main Sequence in Star Clusters

Authors:Steven R. Spangler

Date:2025-06-17 14:42:43

This paper deals with the effect of errors in the B and V magnitudes, or measurements in any other color system, on the width of the main sequence in a color-magnitude (Hertzsprung-Russell) diagram. The width is defined as the dispersion in apparent (or absolute) magnitude at a fixed, measured photometric color. I find that the dispersion is larger than might be thought, a priori. A statistical analysis is presented which demonstrates that the error in the magnitude residual from a linear approximation to the main sequence is Gaussian, but with a standard deviation which is much larger, in general, than the errors in the individual B and V magnitudes. This result is confirmed by a Monte Carlo simulation of a main sequence population with specified errors in B and V magnitudes, and can be explained on the basis of simple algebraic arguments.
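
One simple way to see why the width exceeds the individual errors, offered as an illustration rather than as the paper's derivation: write the main sequence locally as V = a(B - V) + b with slope a > 0. If the measured magnitudes are V + e_V and B + e_B, the measured color shifts by e_B - e_V, so the residual from the line at the measured color is r = (1 + a) e_V - a e_B. With independent Gaussian errors this residual is Gaussian with standard deviation sqrt((1 + a)^2 sigma_V^2 + a^2 sigma_B^2), which for a slope of order 5 is roughly eight times a common per-band error. The Monte Carlo sketch below checks this; the slope, color range, and error values are assumptions.

```python
import math
import random


def simulate_width(a=5.0, b=0.0, sigma_b=0.02, sigma_v=0.02, n=100_000, seed=1):
    """Return (simulated, predicted) std of V residuals about the line V = a*(B-V) + b."""
    rng = random.Random(seed)
    residuals = []
    for _ in range(n):
        color = rng.uniform(0.3, 1.0)           # true B-V (assumed range)
        v_true = a * color + b                  # star lies exactly on the main sequence
        b_true = v_true + color
        v_meas = v_true + rng.gauss(0.0, sigma_v)
        b_meas = b_true + rng.gauss(0.0, sigma_b)
        # Residual in V at the measured color, relative to the true line.
        residuals.append(v_meas - (a * (b_meas - v_meas) + b))
    mean = sum(residuals) / n
    simulated = math.sqrt(sum((r - mean) ** 2 for r in residuals) / (n - 1))
    predicted = math.sqrt((1 + a) ** 2 * sigma_v ** 2 + a ** 2 * sigma_b ** 2)
    return simulated, predicted
```

With the default values the predicted width is about 0.156 mag against 0.02 mag per band, and the simulated value agrees closely; because the residual does not depend on the true color, the spread is the same all along the sequence.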