planning - 2025-03-12

Task-Oriented Co-Design of Communication, Computing, and Control for Edge-Enabled Industrial Cyber-Physical Systems

Authors:Yufeng Diao, Yichi Zhang, Daniele De Martini, Philip Guodong Zhao, Emma Liying Li
Date:2025-03-11 17:50:23

This paper proposes a task-oriented co-design framework that integrates communication, computing, and control to address the key challenges of bandwidth limitations, noise interference, and latency in mission-critical industrial Cyber-Physical Systems (CPS). To improve communication efficiency and robustness, we design a task-oriented Joint Source-Channel Coding (JSCC) using Information Bottleneck (IB) to enhance data transmission efficiency by prioritizing task-specific information. To mitigate the perceived End-to-End (E2E) delays, we develop a Delay-Aware Trajectory-Guided Control Prediction (DTCP) strategy that integrates trajectory planning with control prediction, predicting commands based on E2E delay. Moreover, the DTCP is co-designed with task-oriented JSCC, focusing on transmitting task-specific information for timely and reliable autonomous driving. Experimental results in the CARLA simulator demonstrate that, under an E2E delay of 1 second (20 time slots), the proposed framework achieves a driving score of 48.12, which is 31.59 points higher than using Better Portable Graphics (BPG) while reducing bandwidth usage by 99.19%.

From Expert to Novice: An Empirical Study on Software Architecture Explanations

Authors:Satrio Adi Rukmono, Filip Zamfirov, Lina Ochoa, Michel Chaudron
Date:2025-03-11 17:16:03

The sharing of knowledge about software architecture is crucial in software development, particularly during the onboarding of new developers. However, existing documentation often falls short due to issues like incompleteness and ambiguity. Consequently, oral explanations are used for knowledge transfer. This study investigates what constitutes a good explanation of software architecture through an empirical study. It aims to explore how software architecture explanations are conducted, identify the main challenges, and suggest improvements. It addresses five key areas: relevant architectural concerns, explanation plans, supporting artefacts, typical questions, and expectations. An exploratory field study was conducted using semi-structured interviews with 17 software professionals, including 9 architecture explainers and 8 explainees. The study discovers that an explanation must balance both problem and technical domains while considering the explainee's role, experience, and the goal of the explanation. The concept of the explanation window, which adjusts the level of detail and scope, is introduced to address these variables. We also extend the Twin Peaks model to guide the interplay between problem and solution domains during architectural explanations by adding an emphasis to the context surrounding both domains. Future research should focus on developing better tools and processes to support architecture explanations.

Cross-Embodiment Robotic Manipulation Synthesis via Guided Demonstrations through CycleVAE and Human Behavior Transformer

Authors:Apan Dastider, Hao Fang, Mingjie Lin
Date:2025-03-11 17:02:08

Cross-embodiment robotic manipulation synthesis for complicated tasks is challenging, partially due to the scarcity of paired cross-embodiment datasets and the impediment of designing intricate controllers. Inspired by robotic learning via guided human expert demonstration, we here propose a novel cross-embodiment robotic manipulation algorithm via CycleVAE and human behavior transformer. First, we utilize unsupervised CycleVAE together with a bidirectional subspace alignment algorithm to align latent motion sequences between cross-embodiments. Second, we propose a casual human behavior transformer design to learn the intrinsic motion dynamics of human expert demonstrations. During the test case, we leverage the proposed transformer for the human expert demonstration generation, which will be aligned using CycleVAE for the final human-robotic manipulation synthesis. We validated our proposed algorithm through extensive experiments using a dexterous robotic manipulator with the robotic hand. Our results successfully generate smooth trajectories across intricate tasks, outperforming prior learning-based robotic motion planning algorithms. These results have implications for performing unsupervised cross-embodiment alignment and future autonomous robotics design. Complete video demonstrations of our experiments can be found in https://sites.google.com/view/humanrobots/home.

HiP-AD: Hierarchical and Multi-Granularity Planning with Deformable Attention for Autonomous Driving in a Single Decoder

Authors:Yingqi Tang, Zhuoran Xu, Zhaotie Meng, Erkang Cheng
Date:2025-03-11 16:52:45

Although end-to-end autonomous driving (E2E-AD) technologies have made significant progress in recent years, there remains an unsatisfactory performance on closed-loop evaluation. The potential of leveraging planning in query design and interaction has not yet been fully explored. In this paper, we introduce a multi-granularity planning query representation that integrates heterogeneous waypoints, including spatial, temporal, and driving-style waypoints across various sampling patterns. It provides additional supervision for trajectory prediction, enhancing precise closed-loop control for the ego vehicle. Additionally, we explicitly utilize the geometric properties of planning trajectories to effectively retrieve relevant image features based on physical locations using deformable attention. By combining these strategies, we propose a novel end-to-end autonomous driving framework, termed HiP-AD, which simultaneously performs perception, prediction, and planning within a unified decoder. HiP-AD enables comprehensive interaction by allowing planning queries to iteratively interact with perception queries in the BEV space while dynamically extracting image features from perspective views. Experiments demonstrate that HiP-AD outperforms all existing end-to-end autonomous driving methods on the closed-loop benchmark Bench2Drive and achieves competitive performance on the real-world dataset nuScenes.

Deformable Linear Object Surface Placement Using Elastica Planning and Local Shape Control

Authors:I. Grinberg, A. Levin, E. D. Rimon
Date:2025-03-11 15:33:36

Manipulation of deformable linear objects (DLOs) in constrained environments is a challenging task. This paper describes a two-layered approach for placing DLOs on a flat surface using a single robot hand. The high-level layer is a novel DLO surface placement method based on Euler's elastica solutions. During this process one DLO endpoint is manipulated by the robot gripper while a variable interior point of the DLO serves as the start point of the portion aligned with the placement surface. The low-level layer forms a pipeline controller. The controller estimates the DLO current shape using a Residual Neural Network (ResNet) and uses low-level feedback to ensure task execution in the presence of modeling and placement errors. The resulting DLO placement approach can recover from states where the high-level manipulation planner has failed as required by practical robot manipulation systems. The DLO placement approach is demonstrated with simulations and experiments that use silicon mock-up objects prepared for fresh food applications.

Segmentation-Guided CT Synthesis with Pixel-Wise Conformal Uncertainty Bounds

Authors:David Vallmanya Poch, Yorick Estievenart, Elnura Zhalieva, Sukanya Patra, Mohammad Yaqub, Souhaib Ben Taieb
Date:2025-03-11 15:07:16

Accurate dose calculations in proton therapy rely on high-quality CT images. While planning CTs (pCTs) serve as a reference for dosimetric planning, Cone Beam CT (CBCT) is used throughout Adaptive Radiotherapy (ART) to generate sCTs for improved dose calculations. Despite its lower cost and reduced radiation exposure advantages, CBCT suffers from severe artefacts and poor image quality, making it unsuitable for precise dosimetry. Deep learning-based CBCT-to-CT translation has emerged as a promising approach. Still, existing methods often introduce anatomical inconsistencies and lack reliable uncertainty estimates, limiting their clinical adoption. To bridge this gap, we propose STF-RUE, a novel framework integrating two key components. First, STF, a segmentation-guided CBCT-to-CT translation method that enhances anatomical consistency by leveraging segmentation priors extracted from pCTs. Second, RUE, a conformal prediction method that augments predicted CTs with pixel-wise conformal prediction intervals, providing clinicians with robust reliability indicator. Comprehensive experiments using UNet++ and Fast-DDPM on two benchmark datasets demonstrate that STF-RUE significantly improves translation accuracy, as measured by a novel soft-tissue-focused metric designed for precise dose computation. Additionally, STF-RUE provides better-calibrated uncertainty sets for synthetic CT, reinforcing trust in synthetic CTs. By addressing both anatomical fidelity and uncertainty quantification, STF-RUE marks a crucial step toward safer and more effective adaptive proton therapy. Code is available at https://anonymous.4open.science/r/cbct2ct_translation-B2D9/.

DISTINGUISH Workflow: A New Paradigm of Dynamic Well Placement Using Generative Machine Learning

Authors:Sergey Alyaev, Kristian Fossum, Hibat Errahmen Djecta, Jan Tveranger, Ahmed H. Elsheikh
Date:2025-03-11 15:00:13

The real-time process of directional changes while drilling, known as geosteering, is crucial for hydrocarbon extraction and emerging directional drilling applications such as geothermal energy, civil infrastructure, and CO2 storage. The geo-energy industry seeks an automatic geosteering workflow that continually updates the subsurface uncertainties and captures the latest geological understanding given the most recent observations in real-time. We propose "DISTINGUISH": a real-time, AI-driven workflow designed to transform geosteering by integrating Generative Adversarial Networks (GANs) for geological parameterization, ensemble methods for model updating, and global discrete dynamic programming (DDP) optimization for complex decision-making during directional drilling operations. The DISTINGUISH framework relies on offline training of a GAN model to reproduce relevant geology realizations and a Forward Neural Network (FNN) to model Logging-While-Drilling (LWD) tools' response for a given geomodel. This paper introduces a first-of-its-kind workflow that progressively reduces GAN-geomodel uncertainty around and ahead of the drilling bit and adjusts the well plan accordingly. The workflow automatically integrates real-time LWD data with a DDP-based decision support system, enhancing predictive models of geology ahead of drilling and leading to better steering decisions. We present a simple yet representative benchmark case and document the performance target achieved by the DISTINGUISH workflow prototype. This benchmark will be a foundation for future methodological advancements and workflow refinements.

A Multimodal Physics-Informed Neural Network Approach for Mean Radiant Temperature Modeling

Authors:Pouya Shaeri, Saud AlKhaled, Ariane Middel
Date:2025-03-11 14:36:08

Outdoor thermal comfort is a critical determinant of urban livability, particularly in hot desert climates where extreme heat poses challenges to public health, energy consumption, and urban planning. Mean Radiant Temperature ($T_{mrt}$) is a key parameter for evaluating outdoor thermal comfort, especially in urban environments where radiation dynamics significantly impact human thermal exposure. Traditional methods of estimating $T_{mrt}$ rely on field measurements and computational simulations, both of which are resource intensive. This study introduces a Physics-Informed Neural Network (PINN) approach that integrates shortwave and longwave radiation modeling with deep learning techniques. By leveraging a multimodal dataset that includes meteorological data, built environment characteristics, and fisheye image-derived shading information, our model enhances predictive accuracy while maintaining physical consistency. Our experimental results demonstrate that the proposed PINN framework outperforms conventional deep learning models, with the best-performing configurations achieving an RMSE of 3.50 and an $R^2$ of 0.88. This approach highlights the potential of physics-informed machine learning in bridging the gap between computational modeling and real-world applications, offering a scalable and interpretable solution for urban thermal comfort assessments.

Soft Actor-Critic-based Control Barrier Adaptation for Robust Autonomous Navigation in Unknown Environments

Authors:Nicholas Mohammad, Nicola Bezzo
Date:2025-03-11 14:33:55

Motion planning failures during autonomous navigation often occur when safety constraints are either too conservative, leading to deadlocks, or too liberal, resulting in collisions. To improve robustness, a robot must dynamically adapt its safety constraints to ensure it reaches its goal while balancing safety and performance measures. To this end, we propose a Soft Actor-Critic (SAC)-based policy for adapting Control Barrier Function (CBF) constraint parameters at runtime, ensuring safe yet non-conservative motion. The proposed approach is designed for a general high-level motion planner, low-level controller, and target system model, and is trained in simulation only. Through extensive simulations and physical experiments, we demonstrate that our framework effectively adapts CBF constraints, enabling the robot to reach its final goal without compromising safety.

Collaborative Dynamic 3D Scene Graphs for Open-Vocabulary Urban Scene Understanding

Authors:Tim Steinke, Martin Büchner, Niclas Vödisch, Abhinav Valada
Date:2025-03-11 14:21:59

Mapping and scene representation are fundamental to reliable planning and navigation in mobile robots. While purely geometric maps using voxel grids allow for general navigation, obtaining up-to-date spatial and semantically rich representations that scale to dynamic large-scale environments remains challenging. In this work, we present CURB-OSG, an open-vocabulary dynamic 3D scene graph engine that generates hierarchical decompositions of urban driving scenes via multi-agent collaboration. By fusing the camera and LiDAR observations from multiple perceiving agents with unknown initial poses, our approach generates more accurate maps compared to a single agent while constructing a unified open-vocabulary semantic hierarchy of the scene. Unlike previous methods that rely on ground truth agent poses or are evaluated purely in simulation, CURB-OSG alleviates these constraints. We evaluate the capabilities of CURB-OSG on real-world multi-agent sensor data obtained from multiple sessions of the Oxford Radar RobotCar dataset. We demonstrate improved mapping and object prediction accuracy through multi-agent collaboration as well as evaluate the environment partitioning capabilities of the proposed approach. To foster further research, we release our code and supplementary material at https://ov-curb.cs.uni-freiburg.de.

Ranking dynamics of urban mobility

Authors:Hao Wang
Date:2025-03-11 13:29:13

Human mobility, a pivotal aspect of urban dynamics, displays a profound and multifaceted relationship with urban sustainability. Despite considerable efforts analyzing mobility patterns over decades, the ranking dynamics of urban mobility has received limited attention. This study aims to contribute to the field by investigating changes in rank and size of hourly inflows to various locations across 60 Chinese cities throughout the day. We find that the rank-size distribution of hourly inflows over the course of the day is stable across cities. To uncover the microdynamics beneath the stable aggregate distribution amidst shifting location inflows, we analyzed consecutive-hour inflow size and ranking variations. Our findings reveal a dichotomy: locations with higher daily average inflow display a clear monotonic trend, with more pronounced increases or decreases in consecutive-hour inflow. In contrast, ranking variations exhibit a non-monotonic pattern, distinguished by the stability of not only the top and bottom rankings but also those in moderately-inflowed locations. Finally, we compare ranking dynamics across cities using a ranking metric, the rank turnover. The results advance our understanding of urban mobility dynamics, providing a basis for applications in urban planning and traffic engineering.

MetaFold: Language-Guided Multi-Category Garment Folding Framework via Trajectory Generation and Foundation Model

Authors:Haonan Chen, Junxiao Li, Ruihai Wu, Yiwei Liu, Yiwen Hou, Zhixuan Xu, Jingxiang Guo, Chongkai Gao, Zhenyu Wei, Shensi Xu, Jiaqi Huang, Lin Shao
Date:2025-03-11 12:30:21

Garment folding is a common yet challenging task in robotic manipulation. The deformability of garments leads to a vast state space and complex dynamics, which complicates precise and fine-grained manipulation. Previous approaches often rely on predefined key points or demonstrations, limiting their generalization across diverse garment categories. This paper presents a framework, MetaFold, that disentangles task planning from action prediction, learning each independently to enhance model generalization. It employs language-guided point cloud trajectory generation for task planning and a low-level foundation model for action prediction. This structure facilitates multi-category learning, enabling the model to adapt flexibly to various user instructions and folding tasks. Experimental results demonstrate the superiority of our proposed framework. Supplementary materials are available on our website: https://meta-fold.github.io/.

Reasoning in visual navigation of end-to-end trained agents: a dynamical systems approach

Authors:Steeven Janny, Hervé Poirier, Leonid Antsfeld, Guillaume Bono, Gianluca Monaci, Boris Chidlovskii, Francesco Giuliari, Alessio Del Bue, Christian Wolf
Date:2025-03-11 11:16:47

Progress in Embodied AI has made it possible for end-to-end-trained agents to navigate in photo-realistic environments with high-level reasoning and zero-shot or language-conditioned behavior, but benchmarks are still dominated by simulation. In this work, we focus on the fine-grained behavior of fast-moving real robots and present a large-scale experimental study involving \numepisodes{} navigation episodes in a real environment with a physical robot, where we analyze the type of reasoning emerging from end-to-end training. In particular, we study the presence of realistic dynamics which the agent learned for open-loop forecasting, and their interplay with sensing. We analyze the way the agent uses latent memory to hold elements of the scene structure and information gathered during exploration. We probe the planning capabilities of the agent, and find in its memory evidence for somewhat precise plans over a limited horizon. Furthermore, we show in a post-hoc analysis that the value function learned by the agent relates to long-term planning. Put together, our experiments paint a new picture on how using tools from computer vision and sequential decision making have led to new capabilities in robotics and control. An interactive tool is available at europe.naverlabs.com/research/publications/reasoning-in-visual-navigation-of-end-to-end-trained-agents.

Beyond Outlining: Heterogeneous Recursive Planning for Adaptive Long-form Writing with Language Models

Authors:Ruibin Xiong, Yimeng Chen, Dmitrii Khizbullin, Jürgen Schmidhuber
Date:2025-03-11 10:43:01

Long-form writing agents require flexible integration and interaction across information retrieval, reasoning, and composition. Current approaches rely on predetermined workflows and rigid thinking patterns to generate outlines before writing, resulting in constrained adaptability during writing. In this paper we propose a general agent framework that achieves human-like adaptive writing through recursive task decomposition and dynamic integration of three fundamental task types, i.e. retrieval, reasoning, and composition. Our methodology features: 1) a planning mechanism that interleaves recursive task decomposition and execution, eliminating artificial restrictions on writing workflow; and 2) integration of task types that facilitates heterogeneous task decomposition. Evaluations on both fiction writing and technical report generation show that our method consistently outperforms state-of-the-art approaches across all automatic evaluation metrics, which demonstrate the effectiveness and broad applicability of our proposed framework.

New Co-Simulation Variants for Emissions and Cost Reduction of Sustainable District Heating Planning

Authors:Haozhen Cheng, Verena Buccoliero, Alexander Kocher, Veit Hagenmeyer, Hüseyin K. Çakmak
Date:2025-03-11 09:43:36

Classical heating of residential areas is very energy-intensive, so alternatives are needed, including renewable energies and advanced heating technologies. Thus, the present paper introduces a new methodology for comprehensive variant analysis for future district heating planning, aiming at optimizing emissions and costs. For this, an extensive Modelica-based modeling study comprising models of heating center, heat grid pipelines and heating interface units to buildings are coupled in co-simulations. These enable a comparative analysis of the economic feasibility and sustainability for various technologies and energy carriers to be carried out. The new modular and highly parameterizable building model serves for validation of the introduced heat grid model. The results show that bio-methane as an energy source reduces carbon equivalent emissions by nearly 70% compared to conventional natural gas heating, and the use of hydrogen as an energy source reduces carbon equivalent emissions by 77% when equipped with a heat pump. In addition, the use of ground source heat pumps has a high economic viability when economic benefits are taken into account. The study findings highlight the importance of strategic planning and flexible design in the early stages of district development in order to achieve improved energy efficiency and a reduced carbon footprint.

Construction and Control of Validated Highly Configurable Multi-Physics Building Models for Multi-Energy System Analysis in a Co-Simulation Setup

Authors:Haozhen Cheng, Jan Stock, André Xhonneux, Hüseyin K. Çakmak, Veit Hagenmeyer
Date:2025-03-11 09:30:16

Improving energy efficiency by monitoring system behavior and predicting future energy scenarios in light of increased penetration of renewable energy sources are becoming increasingly important, especially for energy systems that distribute and provide heat. On this background, digital twins of cities become paramount in advancing urban energy system planning and infrastructure management. The use of recorded energy data from sensors in district digital twins in collaborative co-simulation platforms is a promising way to analyze detailed system behavior and estimate future scenarios. However, the development and coupling of multi-physics energy system models need to be validated before they can be used for further in-depth analyses. In the present paper, a new multi-physics/-modal and highly configurable building model is presented. Its accuracy and reliability are validated by comparison with data from the TABULA project, ensuring its relevance and applicability to real-world scenarios. The modularity and flexibility with regard to the system configurability of the developed building model is evaluated on various real building types. In addition, the applicability of the building model in a multi-energy system is highlighted by implementing the model in a collaborative co-simulation setup and by coupling it to a district heating grid model in yearly co-simulations. The simulation results for the proposed multi-physical/-modal building modeling concept show a very high level of agreement compared to published reference building data and can therefore be used individually as flexible and modular building models including both thermal and electrical systems for future sector-coupled energy system analyses in view of sustainability.

Control Barrier Functions for Prescribed-time Reach-Avoid-Stay Tasks using Spatiotemporal Tubes

Authors:Ratnangshu Das, Pranav Bakshi, Pushpak Jagtap
Date:2025-03-11 07:15:49

Prescribed-time reach-avoid-stay (PT-RAS) specifications are crucial in applications requiring precise timing, state constraints, and safety guarantees. While control carrier functions (CBFs) have emerged as a promising approach, providing formal guarantees of safety, constructing CBFs that satisfy PT-RAS specifications remains challenging. In this paper, we present a novel approach using a spatiotemporal tubes (STTs) framework to construct CBFs for PT-RAS tasks. The STT framework allows for the systematic design of CBFs that dynamically manage both spatial and temporal constraints, ensuring the system remains within a safe operational envelope while achieving the desired temporal objectives. The proposed method is validated with two case studies: temporal motion planning of an omnidirectional robot and temporal waypoint navigation of a drone with obstacles, using higher-order CBFs.

LATMOS: Latent Automaton Task Model from Observation Sequences

Authors:Weixiao Zhan, Qiyue Dong, Eduardo Sebastián, Nikolay Atanasov
Date:2025-03-11 06:46:49

Robot task planning from high-level instructions is an important step towards deploying fully autonomous robot systems in the service sector. Three key aspects of robot task planning present challenges yet to be resolved simultaneously, namely, (i) factorization of complex tasks specifications into simpler executable subtasks, (ii) understanding of the current task state from raw observations, and (iii) planning and verification of task executions. To address these challenges, we propose LATMOS, an automata-inspired task model that, given observations from correct task executions, is able to factorize the task, while supporting verification and planning operations. LATMOS combines an observation encoder to extract the features from potentially high-dimensional observations with automata theory to learn a sequential model that encapsulates an automaton with symbols in the latent feature space. We conduct extensive evaluations in three task model learning setups: (i) abstract tasks described by logical formulas, (ii) real-world human tasks described by videos and natural language prompts and (iii) a robot task described by image and state observations. The results demonstrate the improved plan generation and verification capabilities of LATMOS across observation modalities and tasks.

Electrifying Heavy-Duty Trucks: Battery-Swapping vs Fast Charging

Authors:Ruiting Wang, Antoine Martinez, Zaid Allybokus, Wente Zeng, Nicolas Obrecht, Scott Moura
Date:2025-03-11 06:23:28

The advantages and disadvantages of Battery Swapping Stations (BSS) for heavy-duty trucks are poorly understood, relative to Fast Charging Stations (FCS) systems. This study evaluates these two charging mechanisms for electric heavy-duty trucks, aiming to compare the systems' efficiency and identify the optimal design for each option. A model was developed to address the planning and operation of BSS in a charging network, considering in-station batteries as assets for various services. We assess performance metrics including transportation efficiency and battery utilization efficiency. Our evaluation reveals that BSS significantly increased transportation efficiency by reducing vehicle downtime compared to fast charging, but may require more batteries. BSS with medium-sized batteries offers improved transportation efficiency in terms of time and labor. FCS-reliant trucks require larger batteries to compensate for extended charging times. To understand the trade-off between these two metrics, a cost-benefit analysis was performed under different scenarios involving potential shifts in battery prices and labor costs. Additionally, BSS shows potential for significant $\text{CO}_2$ emission reductions and increased profitability through energy arbitrage and grid ancillary services. These findings emphasize the importance of integrating BSS into future electric truck charging networks and adopting carbon-aware operational frameworks.

Elastic Motion Policy: An Adaptive Dynamical System for Robust and Efficient One-Shot Imitation Learning

Authors:Tianyu Li, Sunan Sun, Shubhodeep Shiv Aditya, Nadia Figueroa
Date:2025-03-11 04:23:29

Behavior cloning (BC) has become a staple imitation learning paradigm in robotics due to its ease of teaching robots complex skills directly from expert demonstrations. However, BC suffers from an inherent generalization issue. To solve this, the status quo solution is to gather more data. Yet, regardless of how much training data is available, out-of-distribution performance is still sub-par, lacks any formal guarantee of convergence and success, and is incapable of allowing and recovering from physical interactions with humans. These are critical flaws when robots are deployed in ever-changing human-centric environments. Thus, we propose Elastic Motion Policy (EMP), a one-shot imitation learning framework that allows robots to adjust their behavior based on the scene change while respecting the task specification. Trained from a single demonstration, EMP follows the dynamical systems paradigm where motion planning and control are governed by first-order differential equations with convergence guarantees. We leverage Laplacian editing in full end-effector space, $\mathbb{R}^3\times SO(3)$, and online convex learning of Lyapunov functions, to adapt EMP online to new contexts, avoiding the need to collect new demonstrations. We extensively validate our framework in real robot experiments, demonstrating its robust and efficient performance in dynamic environments, with obstacle avoidance and multi-step task capabilities. Project Website: https://elastic-motion-policy.github.io/EMP/

Boundary Prompting: Elastic Urban Region Representation via Graph-based Spatial Tokenization

Authors:Haojia Zhu, Jiahui Jin, Dong Kan, Rouxi Shen, Ruize Wang, Xiangguo Sun, Jinghui Zhang
Date:2025-03-11 02:49:58

Urban region representation is essential for various applications such as urban planning, resource allocation, and policy development. Traditional methods rely on fixed, predefined region boundaries, which fail to capture the dynamic and complex nature of real-world urban areas. In this paper, we propose the Boundary Prompting Urban Region Representation Framework (BPURF), a novel approach that allows for elastic urban region definitions. BPURF comprises two key components: (1) A spatial token dictionary, where urban entities are treated as tokens and integrated into a unified token graph, and (2) a region token set representation model which utilize token aggregation and a multi-channel model to embed token sets corresponding to region boundaries. Additionally, we propose fast token set extraction strategy to enable online token set extraction during training and prompting. This framework enables the definition of urban regions through boundary prompting, supporting varying region boundaries and adapting to different tasks. Extensive experiments demonstrate the effectiveness of BPURF in capturing the complex characteristics of urban regions.

HEATS: A Hierarchical Framework for Efficient Autonomous Target Search with Mobile Manipulators

Authors:Hao Zhang, Yifei Wang, Weifan Zhang, Yu Wang, Haoyao Chen
Date:2025-03-11 02:41:03

Utilizing robots for autonomous target search in complex and unknown environments can greatly improve the efficiency of search and rescue missions. However, existing methods have shown inadequate performance due to hardware platform limitations, inefficient viewpoint selection strategies, and conservative motion planning. In this work, we propose HEATS, which enhances the search capability of mobile manipulators in complex and unknown environments. We design a target viewpoint planner tailored to the strengths of mobile manipulators, ensuring efficient and comprehensive viewpoint planning. Supported by this, a whole-body motion planner integrates global path search with local IPC optimization, enabling the mobile manipulator to safely and agilely visit target viewpoints, significantly improving search performance. We present extensive simulated and real-world tests, in which our method demonstrates reduced search time, higher target search completeness, and lower movement cost compared to classic and state-of-the-art approaches. Our method will be open-sourced for community benefit.

From Slices to Sequences: Autoregressive Tracking Transformer for Cohesive and Consistent 3D Lymph Node Detection in CT Scans

Authors:Qinji Yu, Yirui Wang, Ke Yan, Dandan Zheng, Dashan Ai, Dazhou Guo, Zhanghexuan Ji, Yanzhou Su, Yun Bian, Na Shen, Xiaowei Ding, Le Lu, Xianghua Ye, Dakai Jin
Date:2025-03-11 00:22:05

Lymph node (LN) assessment is an essential task in the routine radiology workflow, providing valuable insights for cancer staging, treatment planning and beyond. Identifying scatteredly-distributed and low-contrast LNs in 3D CT scans is highly challenging, even for experienced clinicians. Previous lesion and LN detection methods demonstrate effectiveness of 2.5D approaches (i.e, using 2D network with multi-slice inputs), leveraging pretrained 2D model weights and showing improved accuracy as compared to separate 2D or 3D detectors. However, slice-based 2.5D detectors do not explicitly model inter-slice consistency for LN as a 3D object, requiring heuristic post-merging steps to generate final 3D LN instances, which can involve tuning a set of parameters for each dataset. In this work, we formulate 3D LN detection as a tracking task and propose LN-Tracker, a novel LN tracking transformer, for joint end-to-end detection and 3D instance association. Built upon DETR-based detector, LN-Tracker decouples transformer decoder's query into the track and detection groups, where the track query autoregressively follows previously tracked LN instances along the z-axis of a CT scan. We design a new transformer decoder with masked attention module to align track query's content to the context of current slice, meanwhile preserving detection query's high accuracy in current slice. An inter-slice similarity loss is introduced to encourage cohesive LN association between slices. Extensive evaluation on four lymph node datasets shows LN-Tracker's superior performance, with at least 2.7% gain in average sensitivity when compared to other top 3D/2.5D detectors. Further validation on public lung nodule and prostate tumor detection tasks confirms the generalizability of LN-Tracker as it achieves top performance on both tasks. Datasets will be released upon acceptance.

Evaluating Path Planning Strategies for Efficient Nitrate Sampling in Crop Rows

Authors:Ruiji Liu, Abigail Breitfeld, Srinivasan Vijayarangan, George Kantor, Francisco Yandun
Date:2025-03-10 21:00:59

This paper presents a pipeline that combines high-resolution orthomosaic maps generated from UAS imagery with GPS-based global navigation to guide a skid-steered ground robot. We evaluated three path planning strategies: A* Graph search, Deep Q-learning (DQN) model, and Heuristic search, benchmarking them on planning time and success rate in realistic simulation environments. Experimental results reveal that the Heuristic search achieves the fastest planning times (0.28 ms) and a 100% success rate, while the A* approach delivers near-optimal performance, and the DQN model, despite its adaptability, incurs longer planning delays and occasional suboptimal routing. These results highlight the advantages of deterministic rule-based methods in geometrically constrained crop-row environments and lay the groundwork for future hybrid strategies in precision agriculture.

Safe Explicable Policy Search

Authors:Akkamahadevi Hanni, Jonathan Montaño, Yu Zhang
Date:2025-03-10 20:52:41

When users work with AI agents, they form conscious or subconscious expectations of them. Meeting user expectations is crucial for such agents to engage in successful interactions and teaming. However, users may form expectations of an agent that differ from the agent's planned behaviors. These differences lead to the consideration of two separate decision models in the planning process to generate explicable behaviors. However, little has been done to incorporate safety considerations, especially in a learning setting. We present Safe Explicable Policy Search (SEPS), which aims to provide a learning approach to explicable behavior generation while minimizing the safety risk, both during and after learning. We formulate SEPS as a constrained optimization problem where the agent aims to maximize an explicability score subject to constraints on safety and a suboptimality criterion based on the agent's model. SEPS innovatively combines the capabilities of Constrained Policy Optimization and Explicable Policy Search. We evaluate SEPS in safety-gym environments and with a physical robot experiment to show that it can learn explicable behaviors that adhere to the agent's safety requirements and are efficient. Results show that SEPS can generate safe and explicable behaviors while ensuring a desired level of performance w.r.t. the agent's objective, and has real-world relevance in human-AI teaming.

Operational route planning under uncertainty for Demand Adaptive Systems

Authors:Benedikt Lienkamp, Mike Hewitt, Axel Parmentier, Maximilian Schiffer
Date:2025-03-10 19:52:05

With an increasing need for more flexible mobility services, we consider an operational problem arising in the planning of Demand Adaptive Systems (DAS). Motivated by the decision of whether to accept or reject passenger requests in real time in a DAS, we introduce the operational route planning problem of DASs. To this end, we propose an algorithmic framework that allows an operator to plan which passengers to serve in a DAS in real-time. To do so, we model the operational route planning problem as a Markov decision process (MDP) and utilize a rolling horizon approach to approximate the MDP via a two-stage stochastic program in each timestep to decide on the next action. Furthermore, we determine the deterministic equivalent of our approximation through sample-based approximation. This allows us to decompose the deterministic equivalent of our two-stage stochastic program into several full information planning problems, which can be solved in parallel efficiently. Additionally, we propose a consensus-based heuristic and a myopic approach. We perform extensive numerical studies based on real-world data provided to us by the public transportation provider of Munich, Germany. We show that our exact decomposition yields the best results in under five seconds, and our heuristic approach reduces the serial computation time by 17 - 57% compared to our exact decomposition, with a solution quality decline of less than one percent. From a managerial perspective, we show that by switching a fixed-line bus route to a DAS, an operator can increase profit by up to 49% and the number of served passengers by up to 35% while only increasing the travel distance of the bus by 14%. Furthermore, we show that an operator can reduce their cost per passenger by 43 - 51% by increasing route flexibility and that incentivizing passengers to walk slightly longer distances reduces the cost per passenger by 83-85%.

Multi-layer Motion Planning with Kinodynamic and Spatio-Temporal Constraints

Authors:Jeel Chatrola, Abhiroop Ajith, Kevin Leahy, Constantinos Chamzas
Date:2025-03-10 18:34:08

We propose a novel, multi-layered planning approach for computing paths that satisfy both kinodynamic and spatiotemporal constraints. Our three-part framework first establishes potential sequences to meet spatial constraints, using them to calculate a geometric lead path. This path then guides an asymptotically optimal sampling-based kinodynamic planner, which minimizes an STL-robustness cost to jointly satisfy spatiotemporal and kinodynamic constraints. In our experiments, we test our method with a velocity-controlled Ackerman-car model and demonstrate significant efficiency gains compared to prior art. Additionally, our method is able to generate complex path maneuvers, such as crossovers, something that previous methods had not demonstrated.

AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning

Authors:Bo Jiang, Shaoyu Chen, Qian Zhang, Wenyu Liu, Xinggang Wang
Date:2025-03-10 17:59:42

OpenAI o1 and DeepSeek R1 achieve or even surpass human expert-level performance in complex domains like mathematics and science, with reinforcement learning (RL) and reasoning playing a crucial role. In autonomous driving, recent end-to-end models have greatly improved planning performance but still struggle with long-tailed problems due to limited common sense and reasoning abilities. Some studies integrate vision-language models (VLMs) into autonomous driving, but they typically rely on pre-trained models with simple supervised fine-tuning (SFT) on driving data, without further exploration of training strategies or optimizations specifically tailored for planning. In this paper, we propose AlphaDrive, a RL and reasoning framework for VLMs in autonomous driving. AlphaDrive introduces four GRPO-based RL rewards tailored for planning and employs a two-stage planning reasoning training strategy that combines SFT with RL. As a result, AlphaDrive significantly improves both planning performance and training efficiency compared to using only SFT or without reasoning. Moreover, we are also excited to discover that, following RL training, AlphaDrive exhibits some emergent multimodal planning capabilities, which is critical for improving driving safety and efficiency. To the best of our knowledge, AlphaDrive is the first to integrate GRPO-based RL with planning reasoning into autonomous driving. Code will be released to facilitate future research.

A Task and Motion Planning Framework Using Iteratively Deepened AND/OR Graph Networks

Authors:Hossein Karami, Antony Thomas, Fulvio Mastrogiovanni
Date:2025-03-10 17:28:22

In this paper, we present an approach for integrated task and motion planning based on an AND/OR graph network, which is used to represent task-level states and actions, and we leverage it to implement different classes of task and motion planning problems (TAMP). Several problems that fall under task and motion planning do not have a predetermined number of sub-tasks to achieve a goal. For example, while retrieving a target object from a cluttered workspace, in principle the number of object re-arrangements required to finally grasp it cannot be known ahead of time. To address this challenge, and in contrast to traditional planners, also those based on AND/OR graphs, we grow the AND/OR graph at run-time by progressively adding sub-graphs until grasping the target object becomes feasible, which yields a network of AND/OR graphs. The approach is extended to enable multi-robot task and motion planning, and (i) it allows us to perform task allocation while coordinating the activity of a given number of robots, and (ii) can handle multi-robot tasks involving an a priori unknown number of sub-tasks. The approach is evaluated and validated both in simulation and with a real dual-arm robot manipulator, that is, Baxter from Rethink Robotics. In particular, for the single-robot task and motion planning, we validated our approach in three different TAMP domains. Furthermore, we also use three different robots for simulation, namely, Baxter, Franka Emika Panda manipulators, and a PR2 robot. Experiments show that our approach can be readily scaled to scenarios with many objects and robots, and is capable of handling different classes of TAMP problems.

PIPE Planner: Pathwise Information Gain with Map Predictions for Indoor Robot Exploration

Authors:Seungjae Baek, Brady Moon, Seungchan Kim, Muqing Cao, Cherie Ho, Sebastian Scherer, Jeong hwan Jeon
Date:2025-03-10 16:27:00

Autonomous exploration in unknown environments requires estimating the information gain of an action to guide planning decisions. While prior approaches often compute information gain at discrete waypoints, pathwise integration offers a more comprehensive estimation but is often computationally challenging or infeasible and prone to overestimation. In this work, we propose the Pathwise Information Gain with Map Prediction for Exploration (PIPE) planner, which integrates cumulative sensor coverage along planned trajectories while leveraging map prediction to mitigate overestimation. To enable efficient pathwise coverage computation, we introduce a method to efficiently calculate the expected observation mask along the planned path, significantly reducing computational overhead. We validate PIPE on real-world floorplan datasets, demonstrating its superior performance over state-of-the-art baselines. Our results highlight the benefits of integrating predictive mapping with pathwise information gain for efficient and informed exploration.