planning - 2025-03-11

Force Aware Branch Manipulation To Assist Agricultural Tasks

Authors:Madhav Rijal, Rashik Shrestha, Trevor Smith, Yu Gu
Date:2025-03-10 16:13:47

This study presents a methodology to safely manipulate branches to aid various agricultural tasks. Humans in a real agricultural environment often manipulate branches to perform agricultural tasks effectively, but current agricultural robots lack this capability. The proposed strategy to manipulate branches can aid in different precision agriculture tasks, such as fruit picking in dense foliage, pollinating flowers under occlusion, and moving overhanging vines and branches for navigation. The proposed method modifies RRT* to plan a path that satisfies the branch's geometric constraints and respects its deformation characteristics. Re-planning is done to obtain a path that keeps the force exerted by the robot within a desired range so that branches are not damaged during manipulation. Experimentally, this method achieved a success rate of 78% across 50 trials, successfully moving a branch from different starting points to a target region.

A Review on Geometry and Surface Inspection in 3D Concrete Printing

Authors:K. Mawas, M. Maboudi, M. Gerke
Date:2025-03-10 15:48:17

Given the substantial growth in the use of additive manufacturing in construction (AMC), it is necessary to ensure the quality of printed specimens which can be much more complex than conventionally manufactured parts. This study explores the various aspects of geometry and surface quality control for 3D concrete printing (3DCP), with a particular emphasis on deposition-based methods, namely extrusion and shotcrete 3D printing (SC3DP). A comprehensive overview of existing quality control (QC) methods and strategies is provided and preceded by an in-depth discussion. Four categories of data capture technologies are investigated and their advantages and limitations in the context of AMC are discussed. Additionally, the effects of environmental conditions and objects' properties on data capture are also analyzed. The study extends to automated data capture planning methods for different sensors. Furthermore, various quality control strategies are explored across different stages of the fabrication cycle of the printed object including: (i) During printing, (ii) Layer-wise, (iii) Preassembly, and (iv) Assembly. In addition to reviewing the methods already applied in AMC, we also address various research gaps and future trends and highlight potential methodologies from adjacent domains that could be transferred to AMC.

CATPlan: Loss-based Collision Prediction in End-to-End Autonomous Driving

Authors:Ziliang Xiong, Shipeng Liu, Nathaniel Helgesen, Joakim Johnander, Per-Erik Forssen
Date:2025-03-10 15:10:40

In recent years, there has been increased interest in the design, training, and evaluation of end-to-end autonomous driving (AD) systems. One often overlooked aspect is the uncertainty of planned trajectories predicted by these systems, despite awareness of their own uncertainty being key to achieving safety and robustness. We propose to estimate this uncertainty by adapting loss prediction from the uncertainty quantification literature. To this end, we introduce a novel light-weight module, dubbed CATPlan, that is trained to decode motion and planning embeddings into estimates of the collision loss used to partially supervise end-to-end AD systems. During inference, these estimates are interpreted as collision risk. We evaluate CATPlan on the safety-critical, NeRF-based, closed-loop benchmark NeuroNCAP and find that it manages to detect collisions with a 54.8% relative improvement to average precision over a GMM-based baseline in which the predicted trajectory is compared to the forecasted trajectories of other road users. Our findings indicate that the addition of CATPlan can lead to safer end-to-end AD systems, and we hope that our work will spark increased interest in uncertainty quantification for such systems.

PER-DPP Sampling Framework and Its Application in Path Planning

Authors:Junzhe Wang
Date:2025-03-10 14:58:16

Autonomous navigation in intelligent mobile systems represents a core research focus within artificial intelligence-driven robotics. Contemporary path planning approaches face constraints in dynamic environmental responsiveness and multi-objective task scalability, limiting their capacity to address growing intelligent operation requirements. Decision-centric reinforcement learning frameworks, capitalizing on their unique strengths in adaptive environmental interaction and self-optimization, have gained prominence in advanced control system research. This investigation introduces methodological improvements to address sample homogeneity challenges in reinforcement learning experience replay mechanisms. By incorporating determinantal point processes (DPP) for diversity assessment, we develop a dual-criteria sampling framework with adaptive selection protocols. This approach resolves representation bias in conventional prioritized experience replay (PER) systems while preserving algorithmic interoperability, offering improved decision optimization for dynamic operational scenarios. The key contributions are twofold. First, we develop a hybrid sampling paradigm (PER-DPP) combining priority sequencing with diversity maximization. Second, building on this, we create an integrated optimization scheme (PER-DPP-Elastic DQN) merging diversity-aware sampling with adaptive step-size regulation. Comparative simulations in 2D navigation scenarios demonstrate that the elastic step-size component temporarily delays initial convergence but synergistically enhances final-stage optimization when integrated with PER-DPP. The synthesized method generates navigation paths with optimized length efficiency and directional stability.
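
The idea of combining priority with diversity can be illustrated with a small sketch. This is not the paper's PER-DPP algorithm: it assumes a standard quality-diversity L-ensemble kernel L[i][j] = q_i * S(i, j) * q_j, an RBF similarity S, and a greedy determinant-maximizing selection. The names `rbf_similarity`, `determinant`, and `greedy_priority_dpp`, along with the toy data, are hypothetical.

```python
import math


def rbf_similarity(a, b, bandwidth=1.0):
    """Similarity in (0, 1]; equals 1 when the two feature vectors coincide."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * bandwidth ** 2))


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def greedy_priority_dpp(features, priorities, batch_size):
    """Greedily pick indices that maximize det(L_S), trading priority against redundancy."""
    n = len(features)
    # Assumed kernel: priority acts as "quality", RBF similarity penalizes near-duplicates.
    kernel = [
        [priorities[i] * rbf_similarity(features[i], features[j]) * priorities[j]
         for j in range(n)]
        for i in range(n)
    ]
    selected = []
    for _ in range(min(batch_size, n)):
        best_idx, best_det = None, 0.0
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sub = [[kernel[r][c] for c in idx] for r in idx]
            d = determinant(sub)
            if d > best_det:
                best_idx, best_det = i, d
        if best_idx is None:  # every remaining item is redundant with the selection
            break
        selected.append(best_idx)
    return selected


# Toy replay buffer: transitions 0 and 1 are identical and both high priority.
features = [(0.0, 0.0), (0.0, 0.0), (5.0, 5.0), (0.1, 0.0)]
priorities = [1.0, 1.0, 0.5, 0.9]
batch = greedy_priority_dpp(features, priorities, batch_size=2)
print(batch)
```

In this toy buffer, sorting by priority alone would return the two identical transitions, whereas the determinant criterion skips the duplicate in favor of the distant, lower-priority sample.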

Temporal Triplane Transformers as Occupancy World Models

Authors:Haoran Xu, Peixi Peng, Guang Tan, Yiqian Chang, Yisen Zhao, Yonghong Tian
Date:2025-03-10 13:50:23

Recent years have seen significant advances in world models, which primarily focus on learning fine-grained correlations between an agent's motion trajectory and the resulting changes in its surrounding environment. However, existing methods often struggle to capture such fine-grained correlations and achieve real-time predictions. To address this, we propose a new 4D occupancy world model for autonomous driving, termed T$^3$Former. T$^3$Former begins by pre-training a compact triplane representation that efficiently compresses the 3D semantically occupied environment. Next, T$^3$Former extracts multi-scale temporal motion features from the historical triplane and employs an autoregressive approach to iteratively predict the next triplane changes. Finally, T$^3$Former combines the triplane changes with the previous ones to decode them into future occupancy results and ego-motion trajectories. Experimental results demonstrate the superiority of T$^3$Former, achieving 1.44$\times$ faster inference speed (26 FPS), while improving the mean IoU to 36.09 and reducing the mean absolute planning error to 1.0 meters.

Multi-Robot System for Cooperative Exploration in Unknown Environments: A Survey

Authors:Chuqi Wang, Chao Yu, Xin Xu, Yuman Gao, Xinyi Yang, Wenhao Tang, Shu'ang Yu, Yinuo Chen, Feng Gao, ZhuoZhu Jian, Xinlei Chen, Fei Gao, Boyu Zhou, Yu Wang
Date:2025-03-10 12:58:45

With the advancement of multi-robot technology, cooperative exploration tasks have garnered increasing attention. This paper presents a comprehensive review of multi-robot cooperative exploration systems. First, we review the evolution of robotic exploration and introduce a modular research framework tailored for multi-robot cooperative exploration. Based on this framework, we systematically categorize and summarize key system components. As a foundational module for multi-robot exploration, the localization and mapping module is primarily introduced by focusing on global and relative pose estimation, as well as multi-robot map merging techniques. The cooperative motion module is further divided into learning-based approaches and multi-stage planning, with the latter encompassing target generation, task allocation, and motion planning strategies. Given the communication constraints of real-world environments, we also analyze the communication module, emphasizing how robots exchange information within local communication ranges and under limited transmission capabilities. Finally, we discuss the challenges and future research directions for multi-robot cooperative exploration in light of real-world trends. This review aims to serve as a valuable reference for researchers and practitioners in the field.

Learning and planning for optimal synergistic human-robot coordination in manufacturing contexts

Authors:Samuele Sandrini, Marco Faroni, Nicola Pedrocchi
Date:2025-03-10 12:20:29

Collaborative robotics cells leverage heterogeneous agents to provide agile production solutions. Effective coordination is essential to prevent inefficiencies and risks for human operators working alongside robots. This paper proposes a human-aware task allocation and scheduling model based on Mixed Integer Nonlinear Programming to optimize efficiency and safety starting from task planning stages. The approach exploits synergies that encode the coupling effects between pairs of tasks executed in parallel by the agents, arising from the safety constraints imposed on robot agents. These terms are learned from previous executions using Bayesian estimation; the inference of the posterior probability distribution of the synergy coefficients is performed using the Markov Chain Monte Carlo method. The synergy enhances task planning by adapting the nominal duration of the plan according to the effect of the operator's presence. Simulations and experimental results demonstrate that the proposed method produces improved human-aware task plans, reducing unnecessary interference between agents, increasing human-robot distance, and achieving up to an 18% reduction in process execution time.

Discrete Gaussian Process Representations for Optimising UAV-based Precision Weed Mapping

Authors:Jacob Swindell, Madeleine Darbyshire, Marija Popovic, Riccardo Polvara
Date:2025-03-10 11:50:15

Accurate agricultural weed mapping using UAVs is crucial for precision farming applications. Traditional methods rely on orthomosaic stitching from rigid flight paths, which is computationally intensive and time-consuming. Gaussian Process (GP)-based mapping offers continuous modelling of the underlying variable (i.e. weed distribution) but requires discretisation for practical tasks like path planning or visualisation. Current implementations often default to quadtrees or gridmaps without systematically evaluating alternatives. This study compares five discretisation methods: quadtrees, wedgelets, top-down binary space partition (BSP) trees using least square error (LSE), bottom-up BSP trees using graph merging, and variable-resolution hexagonal grids. Evaluations on real-world weed distributions measure visual similarity, mean squared error (MSE), and computational efficiency. Results show quadtrees perform best overall, but alternatives excel in specific scenarios: hexagons or BSP LSE suit fields with large, dominant weed patches, while quadtrees are optimal for dispersed small-scale distributions. These findings highlight the need to tailor discretisation approaches to weed distribution patterns (patch size, density, coverage) rather than relying on default methods. By choosing representations based on the underlying distribution, we can improve mapping accuracy and efficiency for precision agriculture applications.
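
As a rough illustration of one of the compared discretisations, the sketch below builds a variance-driven quadtree over a small density grid and scores it with MSE. It is a minimal example under stated assumptions, not the study's implementation: the grid is square with a power-of-two side, a cell is split when its variance exceeds a threshold, and the names `quadtree_cells` and `mean_squared_error` are hypothetical.

```python
def quadtree_cells(grid, x0, y0, size, max_variance, min_size=1):
    """Split a square region into four until its values are nearly uniform.

    Returns a list of (x0, y0, size, mean) leaf cells.
    """
    values = [grid[y][x]
              for y in range(y0, y0 + size)
              for x in range(x0, x0 + size)]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    if variance <= max_variance or size <= min_size:
        return [(x0, y0, size, mean)]
    half = size // 2
    cells = []
    for dy in (0, half):
        for dx in (0, half):
            cells.extend(
                quadtree_cells(grid, x0 + dx, y0 + dy, half, max_variance, min_size)
            )
    return cells


def mean_squared_error(grid, cells):
    """MSE between the grid and its piecewise-constant quadtree approximation."""
    total = 0.0
    for x0, y0, size, mean in cells:
        for y in range(y0, y0 + size):
            for x in range(x0, x0 + size):
                total += (grid[y][x] - mean) ** 2
    return total / (len(grid) * len(grid[0]))


# Toy 4x4 weed-density map with one dense patch in the top-left corner.
grid = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
cells = quadtree_cells(grid, 0, 0, 4, max_variance=0.01)
print(len(cells), mean_squared_error(grid, cells))
```

Here the patch happens to align with a quadrant, so four cells reproduce the map exactly; a patch straddling quadrant boundaries would force deeper splits, which is the kind of case where the abstract reports other representations doing better.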

Generative AI in Transportation Planning: A Survey

Authors:Longchao Da, Tiejin Chen, Zhuoheng Li, Shreyas Bachiraju, Huaiyuan Yao, Xiyang Hu, Zhengzhong Tu, Yue Zhao, Dongjie Wang, Xuanyu Zhou, Ram Pendyala, Benjamin Stabler, Yezhou Yang, Xuesong Zhou, Hua Wei
Date:2025-03-10 10:33:31

The integration of generative artificial intelligence (GenAI) into transportation planning has the potential to revolutionize tasks such as demand forecasting, infrastructure design, policy evaluation, and traffic simulation. However, there is a critical need for a systematic framework to guide the adoption of GenAI in this interdisciplinary domain. In this survey, we, a multidisciplinary team of researchers spanning computer science and transportation engineering, present the first comprehensive framework for leveraging GenAI in transportation planning. Specifically, we introduce a new taxonomy that categorizes existing applications and methodologies into two perspectives: transportation planning tasks and computational techniques. From the transportation planning perspective, we examine the role of GenAI in automating descriptive, predictive, generative, simulation, and explainable tasks to enhance mobility systems. From the computational perspective, we detail advancements in data preparation, domain-specific fine-tuning, and inference strategies, such as retrieval-augmented generation and zero-shot learning tailored to transportation applications. Additionally, we address critical challenges, including data scarcity, explainability, bias mitigation, and the development of domain-specific evaluation frameworks that align with transportation goals like sustainability, equity, and system efficiency. This survey aims to bridge the gap between traditional transportation planning methodologies and modern AI techniques, fostering collaboration and innovation. By addressing these challenges and opportunities, we seek to inspire future research that ensures ethical, equitable, and impactful use of generative AI in transportation planning.

Hierarchical Neuro-Symbolic Decision Transformer

Authors:Ali Baheri, Cecilia O. Alm
Date:2025-03-10 10:22:13

We present a hierarchical neuro-symbolic control framework that couples classical symbolic planning with transformer-based policies to address complex, long-horizon decision-making tasks. At the high level, a symbolic planner constructs an interpretable sequence of operators based on logical propositions, ensuring systematic adherence to global constraints and goals. At the low level, each symbolic operator is translated into a sub-goal token that conditions a decision transformer to generate a fine-grained sequence of actions in uncertain, high-dimensional environments. We provide theoretical analysis showing how approximation errors from both the symbolic planner and the neural execution layer accumulate. Empirical evaluations in grid-worlds with multiple keys, locked doors, and item-collection tasks show that our hierarchical approach outperforms a purely end-to-end neural approach in success rates and policy efficiency.

VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation

Authors:Hanzhi Chen, Boyang Sun, Anran Zhang, Marc Pollefeys, Stefan Leutenegger
Date:2025-03-10 10:04:58

Future robots are envisioned as versatile systems capable of performing a variety of household tasks. The big question remains: how can we bridge the embodiment gap while minimizing physical robot learning, which fundamentally does not scale well? We argue that learning from in-the-wild human videos offers a promising solution for robotic manipulation tasks, as vast amounts of relevant data already exist on the internet. In this work, we present VidBot, a framework enabling zero-shot robotic manipulation using learned 3D affordance from in-the-wild monocular RGB-only human videos. VidBot leverages a pipeline to extract explicit representations from these videos, namely 3D hand trajectories, combining a depth foundation model with structure-from-motion techniques to reconstruct temporally consistent, metric-scale 3D affordance representations agnostic to embodiments. We introduce a coarse-to-fine affordance learning model that first identifies coarse actions from the pixel space and then generates fine-grained interaction trajectories with a diffusion model, conditioned on coarse actions and guided by test-time constraints for context-aware interaction planning, enabling substantial generalization to novel scenes and embodiments. Extensive experiments demonstrate the efficacy of VidBot, which significantly outperforms counterparts across 13 manipulation tasks in zero-shot settings and can be seamlessly deployed across robot systems in real-world environments. VidBot paves the way for leveraging everyday human videos to make robot learning more scalable.

Learning Decision Trees as Amortized Structure Inference

Authors:Mohammed Mahfoud, Ghait Boukachab, Michał Koziarski, Alex Hernandez-Garcia, Stefan Bauer, Yoshua Bengio, Nikolay Malkin
Date:2025-03-10 07:05:07

Building predictive models for tabular data presents fundamental challenges, notably in scaling consistently, i.e., more resources translating to better performance, and generalizing systematically beyond the training data distribution. Designing decision tree models remains especially challenging given the intractably large search space, and most existing methods rely on greedy heuristics, while deep learning inductive biases expect a temporal or spatial structure not naturally present in tabular data. We propose a hybrid amortized structure inference approach to learn predictive decision tree ensembles given data, formulating decision tree construction as a sequential planning problem. We train a deep reinforcement learning (GFlowNet) policy to solve this problem, yielding a generative model that samples decision trees from the Bayesian posterior. We show that our approach, DT-GFN, outperforms state-of-the-art decision tree and deep learning methods on standard classification benchmarks derived from real-world data, in robustness to distribution shifts, and in anomaly detection, all while yielding interpretable models with shorter description lengths. Samples from the trained DT-GFN model can be ensembled to construct a random forest, and we further show that performance scales consistently with ensemble size, yielding ensembles of predictors that continue to generalize systematically.

EnCortex: A General, Extensible and Scalable Framework for Decision Management in New-age Energy Systems

Authors:Millend Roy, Vaibhav Balloli, Anupam Sobti, Srinivasan Iyengar, Shivkumar Kalyanaraman, Tanuja Ganu, Akshay Nambi
Date:2025-03-10 06:13:23

With increased global warming, there has been a significant emphasis on replacing fossil fuel-dependent energy sources with clean, renewable sources. These new-age energy systems are becoming more complex with an increasing proportion of renewable energy sources (like solar and wind), energy storage systems (like batteries), and demand side control in the mix. Most new-age sources, being highly dependent on weather and climate conditions, bring about high variability and uncertainty. Energy operators rely on such uncertain data to make different planning and operations decisions periodically, and sometimes in real-time, to maintain grid stability and optimize their objectives (cost savings, carbon footprint, etc.). Hitherto, operators have mostly relied on domain knowledge, heuristics, or solving point problems to make decisions. These approaches fall short because of their specific assumptions and limitations. Further, there is a lack of a unified framework for both research and production environments at scale. In this paper, we propose EnCortex to address these challenges. EnCortex provides a general, easy-to-use, extensible, and scalable energy decision framework that enables operators to plan, build and execute their real-world scenarios efficiently. We show that using EnCortex, we can define and compose complex new-age scenarios, owing to industry-standard abstractions of energy entities and the modularity of the framework. EnCortex provides a foundational structure to support several state-of-the-art optimizers with minimal effort. EnCortex supports both quick development of research prototypes and scaling the solutions to production environments. We demonstrate the utility of EnCortex with three complex new-age real-world scenarios and show that significant cost and carbon footprint savings can be achieved.

Handle Object Navigation as Weighted Traveling Repairman Problem

Authors:Ruimeng Liu, Xinhang Xu, Shenghai Yuan, Lihua Xie
Date:2025-03-10 05:32:45

Zero-Shot Object Navigation (ZSON) requires agents to navigate to objects specified via open-ended natural language without predefined categories or prior environmental knowledge. While recent methods leverage foundation models or multi-modal maps, they often rely on 2D representations and greedy strategies or require additional training or modules with high computation load, limiting performance in complex environments and real applications. We propose WTRP-Searcher, a novel framework that formulates ZSON as a Weighted Traveling Repairman Problem (WTRP), minimizing the weighted waiting time of viewpoints. Using a Vision-Language Model (VLM), we score viewpoints based on object-description similarity, projected onto a 2D map with depth information. An open-vocabulary detector identifies targets, dynamically updating goals, while a 3D embedding feature map enhances spatial awareness and environmental recall. WTRP-Searcher outperforms existing methods, offering efficient global planning and improved performance in complex ZSON tasks. Code and more demos will be available at https://github.com/lrm20011/WTRP_Searcher.
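
The weighted traveling repairman objective can be made concrete with a tiny example. The sketch below is not WTRP-Searcher itself: it assumes straight-line travel at unit speed, treats the viewpoint scores as given weights, and finds the best visiting order by brute force, which is only feasible for a handful of viewpoints. The names `weighted_latency` and `best_order` are hypothetical.

```python
import math
from itertools import permutations


def weighted_latency(start, order, points, weights):
    """Sum over viewpoints of weight times arrival time along the given order."""
    elapsed = 0.0
    position = start
    total = 0.0
    for i in order:
        elapsed += math.dist(position, points[i])  # unit speed: distance equals time
        total += weights[i] * elapsed
        position = points[i]
    return total


def best_order(start, points, weights):
    """Exhaustive search over visiting orders; exponential, for illustration only."""
    return min(
        permutations(range(len(points))),
        key=lambda order: weighted_latency(start, order, points, weights),
    )


# Two candidate viewpoints: a nearby low-score one and a farther high-score one.
start = (0.0, 0.0)
points = [(1.0, 0.0), (-2.0, 0.0)]
weights = [1.0, 5.0]
order = best_order(start, points, weights)
print(order, weighted_latency(start, order, points, weights))
```

A nearest-first greedy strategy would visit viewpoint 0 first, for a weighted latency of 21; visiting the high-score viewpoint first costs 15, which is why the objective favors it despite the longer first leg.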

Co-optimization of Short- and Long-term Decisions for the Transmission Grid's Resilience to Flooding

Authors:Ashutosh Shukla, Erhan Kutanoglu, John Hasenbein
Date:2025-03-10 04:23:36

We present and analyze a three-stage stochastic optimization model that integrates output from a geoscience-based flood model with a power flow model for transmission grid resilience planning against flooding. The proposed model coordinates the decisions made across multiple stages of resilience planning and recommends an optimal allocation of the overall resilience investment budget across short- and long-term measures. While doing so, the model balances the cost of investment in both short- and long-term measures against the cost of load shed that results from unmitigated flooding forcing grid components out of service. We also present a case study for the Texas Gulf Coast region to demonstrate how the proposed model can provide insights into various grid resilience questions. Specifically, we demonstrate that for a comprehensive yet reasonable range of economic values assigned to load loss, we should make significant investments in the permanent hardening of substations such that we achieve near-zero load shed. We also show that not accounting for short-term measures while making decisions about long-term measures can lead to significant overspending. Furthermore, we demonstrate that a technological development enabling the protection of substations on short notice before imminent hurricanes could vastly influence and reduce the total investment budget that would otherwise be allocated for more expensive substation hardening. Lastly, we also show that for a wide range of values associated with the cost of mitigative long-term measures, the proportion allocated to such measures dominates the overall resilience spending.

Unlocking Generalization for Robotics via Modularity and Scale

Authors:Murtaza Dalal
Date:2025-03-10 00:38:31

How can we build generalist robot systems? Scale may not be enough due to the significant multimodality of robotics tasks, lack of easily accessible data and the challenges of deploying on physical hardware. Meanwhile, most deployed robotic systems today are inherently modular and can leverage the independent generalization capabilities of each module to perform well. Therefore, this thesis seeks to tackle the task of building generalist robot agents by integrating these components into one: combining modularity with large-scale learning for general purpose robot control. The first question we consider is: how can we build modularity and hierarchy into learning systems? Our key insight is that rather than having the agent learn hierarchy and low-level control end-to-end, we can enforce modularity via planning to enable more efficient and capable robot learners. Next, we come to the role of scale in building generalist robot systems. To scale, neural networks require vast amounts of diverse data, expressive architectures to fit the data and a source of supervision to generate the data. We leverage a powerful supervision source: classical planning, which can generalize, but is expensive to run and requires access to privileged information to perform well in practice. We use these planners to supervise large-scale policy learning in simulation to produce generalist agents. Finally, we consider how to unify modularity with large-scale policy learning to build real-world robot systems capable of performing zero-shot manipulation. We do so by tightly integrating key ingredients of modular high and mid-level planning, learned local control, procedural scene generation and large-scale policy learning for sim2real transfer. We demonstrate that this recipe can produce a single, generalist agent that can solve challenging long-horizon manipulation tasks in the real world.

Interactive Tumor Progression Modeling via Sketch-Based Image Editing

Authors:Gexin Huang, Ruinan Jin, Yucheng Tang, Can Zhao, Tatsuya Harada, Xiaoxiao Li, Gu Lin
Date:2025-03-10 00:04:19

Accurately visualizing and editing tumor progression in medical imaging is crucial for diagnosis, treatment planning, and clinical communication. To address the challenges of subjectivity and limited precision in existing methods, we propose SkEditTumor, a sketch-based diffusion model for controllable tumor progression editing. By leveraging sketches as structural priors, our method enables precise modifications of tumor regions while maintaining structural integrity and visual realism. We evaluate SkEditTumor on four public datasets - BraTS, LiTS, KiTS, and MSD-Pancreas - covering diverse organs and imaging modalities. Experimental results demonstrate that our method outperforms state-of-the-art baselines, achieving superior image fidelity and segmentation accuracy. Our contributions include a novel integration of sketches with diffusion models for medical image editing, fine-grained control over tumor progression visualization, and extensive validation across multiple datasets, setting a new benchmark in the field.

Chance-Constrained Trajectory Planning with Multimodal Environmental Uncertainty

Authors:Kai Ren, Heejin Ahn, Maryam Kamgarpour
Date:2025-03-09 21:18:35

We tackle safe trajectory planning under Gaussian mixture model (GMM) uncertainty. Specifically, we use a GMM to model the multimodal behaviors of obstacles' uncertain states. Then, we develop a mixed-integer conic approximation to the chance-constrained trajectory planning problem with deterministic linear systems and polyhedral obstacles. When the GMM moments are estimated via finite samples, we develop a tight concentration bound to ensure the chance constraint with a desired confidence. Moreover, to limit the amount of constraint violation, we develop a Conditional Value-at-Risk (CVaR) approach corresponding to the chance constraints and derive a tractable approximation for known and estimated GMM moments. We verify our methods with state-of-the-art trajectory prediction algorithms and autonomous driving datasets.
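
For a single linear constraint, the probability under a GMM has a closed form: if x is drawn from components (pi_k, mu_k, Sigma_k), then P(a^T x <= b) is the sum over k of pi_k * Phi((b - a^T mu_k) / sqrt(a^T Sigma_k a)), where Phi is the standard normal CDF. The sketch below only evaluates this quantity and checks it against a risk level. It does not reproduce the paper's mixed-integer conic approximation, concentration bound, or CVaR formulation, and the names `normal_cdf` and `halfspace_probability` and the toy numbers are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def halfspace_probability(a, b, weights, means, covariances):
    """P(a^T x <= b) when x follows a Gaussian mixture."""
    dim = len(a)
    prob = 0.0
    for w, mu, cov in zip(weights, means, covariances):
        mean_proj = sum(a[i] * mu[i] for i in range(dim))
        var_proj = sum(a[i] * cov[i][j] * a[j]
                       for i in range(dim) for j in range(dim))
        prob += w * normal_cdf((b - mean_proj) / math.sqrt(var_proj))
    return prob


# Toy bimodal obstacle position: it may end up left or right of the line x1 = 0.
identity = [[1.0, 0.0], [0.0, 1.0]]
weights = [0.5, 0.5]
means = [(-1.0, 0.0), (1.0, 0.0)]
covariances = [identity, identity]

p_left = halfspace_probability((1.0, 0.0), 0.0, weights, means, covariances)
epsilon = 0.05  # assumed risk level for the check below
print(p_left, p_left >= 1.0 - epsilon)
```

With two symmetric modes the probability is 0.5, so the constraint fails at a 5% risk level even though one mode on its own satisfies it with probability of about 0.84. A single Gaussian fitted to such data would hide this multimodal structure.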

pRRTC: GPU-Parallel RRT-Connect for Fast, Consistent, and Low-Cost Motion Planning

Authors:Chih H. Huang, Pranav Jadhav, Brian Plancher, Zachary Kingston
Date:2025-03-09 20:23:12

Sampling-based motion planning algorithms, like the Rapidly-Exploring Random Tree (RRT) and its widely used variant, RRT-Connect, provide efficient solutions for high-dimensional planning problems faced by real-world robots. However, these methods remain computationally intensive, particularly in complex environments that require many collision checks. As such, to improve performance, recent efforts have explored parallelizing specific components of RRT, such as collision checking or running multiple planners independently, but no prior work has integrated parallelism at multiple levels of the algorithm for robotic manipulation. In this work, we present pRRTC, a GPU-accelerated implementation of RRT-Connect that achieves parallelism across the entire algorithm through multithreaded expansion and connection, SIMT-optimized collision checking, and hierarchical parallelism optimization, improving efficiency, consistency, and initial solution cost. We evaluate the effectiveness of pRRTC on the MotionBenchMaker dataset using robots with 7, 8, and 14 degrees-of-freedom, demonstrating up to 6x average speedup on constrained reaching tasks at high collision checking resolution compared to state-of-the-art. pRRTC also demonstrates a 5x reduction in solution time variance and 1.5x improvement in initial path costs compared to state-of-the-art motion planners in complex environments across all robots.

Probing the Design Space of InSb Topological Superconductor Nanowires for the Realization of Majorana Zero Modes

Authors:Mirko Poljak
Date:2025-03-09 19:25:33

Non-Abelian anyons such as Majorana zero modes (MZMs) have the potential to enable fault-tolerant quantum computing through topological protection. Experimentally reported InSb topological superconductor nanowires (TSNW) are investigated theoretically and numerically to evaluate their suitability to host MZMs. We employ eigenspectra analysis and quantum transport based on the non-equilibrium Green's function (NEGF) formalism to investigate the eigenenergies, Majorana wave functions via local density of states, transmission spectra for Andreev processes, and zero-bias conductance peaks (ZBCPs) in InSb TSNWs. For 1.6 µm- and 2.2 µm-long InSb TSNWs we demonstrate the existence of the optimum design space defined by the applied magnetic field and electrochemical potential, which leads to clear ZBCP signatures with a Majorana localization length down to ~340 nm.

CLAD: Constrained Latent Action Diffusion for Vision-Language Procedure Planning

Authors:Lei Shi, Andreas Bulling
Date:2025-03-09 14:31:46

We propose CLAD -- a Constrained Latent Action Diffusion model for vision-language procedure planning in instructional videos. Procedure planning is the challenging task of predicting intermediate actions given a visual observation of a start and a goal state. However, future interactive AI systems must also be able to plan procedures using multi-modal input, e.g., where visual observations are augmented with language descriptions. To tackle this vision-language procedure planning task, our method uses a Variational Autoencoder (VAE) to learn the latent representation of actions and observations as constraints and integrate them into the diffusion process. This approach exploits that the latent space of diffusion models already has semantics that can be used. We use the latent constraints to steer the diffusion model to better generate actions. We report extensive experiments on the popular CrossTask, Coin, and NIV datasets and show that our method outperforms state-of-the-art methods by a large margin. By evaluating ablated versions of our method, we further show that the proposed integration of the action and observation representations learnt in the VAE latent space is key to these performance improvements.

Quantum Speedup in Dissecting Roots and Solving Nonlinear Algebraic Equations

Authors:Nhat A. Nghiem
Date:2025-03-09 13:27:11

It is shown that a quantum computer can detect the existence of a root of a function almost exponentially more efficiently than its classical counterpart. It is also shown that a quantum computer can produce a quantum state corresponding to the solution of nonlinear algebraic equations quadratically faster than the best known classical approach. Various applications and implications are discussed, including a quantum algorithm for solving dense linear systems with quadratic speedup, determining equilibrium states, simulating the dynamics of nonlinear coupled oscillators, estimating the Lyapunov exponent, an improved quantum partial differential equation solver, and a quantum-enhanced collision detector for robotic motion planning. This provides further evidence of quantum advantage without requiring coherent quantum access to classical data, delivering meaningful real-world applications.

Non-Equilibrium MAV-Capture-MAV via Time-Optimal Planning and Reinforcement Learning

Authors:Canlun Zheng, Zhanyu Guo, Zikang Yin, Chunyu Wang, Zhikun Wang, Shiyu Zhao
Date:2025-03-09 12:16:30

The capture of flying MAVs (micro aerial vehicles) has garnered increasing research attention due to its intriguing challenges and promising applications. Despite recent advancements, a key limitation of existing work is that capture strategies are often relatively simple and constrained by platform performance. This paper addresses control strategies capable of capturing high-maneuverability targets. The unique challenge of achieving target capture under unstable conditions distinguishes this task from traditional pursuit-evasion and guidance problems. In this study, we transition from larger MAV platforms to a specially designed, compact capture MAV equipped with a custom launching device while maintaining high maneuverability. We explore both time-optimal planning (TOP) and reinforcement learning (RL) methods. Simulations demonstrate that TOP offers highly maneuverable and shorter trajectories, while RL excels in real-time adaptability and stability. Moreover, the RL method has been tested in real-world scenarios, successfully achieving target capture even in unstable states.

GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks

Authors:Haoqiang Kang, Enna Sachdeva, Piyush Gupta, Sangjae Bae, Kwonjoon Lee
Date:2025-03-09 08:38:10

Vision-Language Models (VLMs) have recently shown promising advancements in sequential decision-making tasks through task-specific fine-tuning. However, common fine-tuning methods, such as Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) techniques like Proximal Policy Optimization (PPO), present notable limitations: SFT assumes Independent and Identically Distributed (IID) data, while PPO focuses on maximizing cumulative rewards. These limitations often restrict solution diversity and hinder generalization in multi-step reasoning tasks. To address these challenges, we introduce GFlowVLM, a novel framework that fine-tunes VLMs using Generative Flow Networks (GFlowNets) to promote the generation of diverse solutions for complex reasoning tasks. GFlowVLM models the environment as a non-Markovian decision process, allowing it to capture long-term dependencies essential for real-world applications. It takes observations and task descriptions as inputs to prompt chain-of-thought (CoT) reasoning, which subsequently guides action selection. We use task-based rewards to fine-tune the VLM with GFlowNets. This approach enables VLMs to outperform prior fine-tuning methods, including SFT and RL. Empirical results demonstrate the effectiveness of GFlowVLM on complex tasks such as card games (NumberLine, BlackJack) and embodied planning tasks (ALFWorld), showing enhanced training efficiency, solution diversity, and stronger generalization capabilities across both in-distribution and out-of-distribution scenarios.
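
As a rough illustration of how a GFlowNet can be trained from task rewards, the sketch below computes the trajectory balance loss, one commonly used GFlowNet objective. The abstract does not state which objective GFlowVLM uses, so this choice is an assumption, and the function name and arguments (`log_z`, `log_pf_steps`, `log_pb_steps`, `reward`) are hypothetical names introduced only for this example.

```python
import math


def trajectory_balance_loss(log_z, log_pf_steps, log_pb_steps, reward):
    """Trajectory balance loss for a single sampled trajectory.

    log_z        : learned log partition function (a scalar parameter)
    log_pf_steps : log-probabilities of each forward step taken by the policy
    log_pb_steps : log-probabilities of each step under the backward policy
    reward       : strictly positive terminal reward of the trajectory
    """
    if reward <= 0:
        raise ValueError("reward must be strictly positive")
    # Flow pushed forward along the trajectory, in log space.
    forward = log_z + sum(log_pf_steps)
    # Reward-weighted flow traced backward along the same trajectory.
    backward = math.log(reward) + sum(log_pb_steps)
    # The loss is zero exactly when the two flows agree.
    return (forward - backward) ** 2


# Two forward steps with probability 0.5 each, a deterministic backward
# policy (log-probability 0 per step), Z = 1 and reward 0.25: balanced.
balanced = trajectory_balance_loss(
    0.0, [math.log(0.5), math.log(0.5)], [0.0, 0.0], 0.25
)
# Same trajectory with a larger reward: the policy under-samples it.
unbalanced = trajectory_balance_loss(
    0.0, [math.log(0.5), math.log(0.5)], [0.0, 0.0], 1.0
)
```

Minimizing this quantity pushes the policy to sample trajectories with probability proportional to their reward, which is the property that encourages diverse solutions rather than a single reward-maximizing one.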

Socioeconomic centers in cities worldwide

Authors:Shuai Pang, Junlong Zhang, Lei Dong
Date:2025-03-09 05:02:28

Urban centers serve as engines of regional development, yet accurately defining and identifying the socioeconomic centers of cities globally remains a big challenge. Existing mapping efforts are often limited to large cities in developed regions and rely on data sources that are unavailable in many developing countries. This data scarcity hinders the establishment of consistent urban indicators, such as accessibility, to assess progress towards the United Nations Sustainable Development Goals (SDGs). Here, we develop and validate a global map of the socioeconomic centers of cities for 2020 by integrating nighttime light and population density data within an advanced geospatial modeling framework. Our analysis reveals that monocentric cities -- the standard urban model -- still dominate our planet, accounting for over 80% of cities worldwide. However, these monocentric cities encompass only approximately 20% of the total urbanized area, urban population, and nighttime light intensity; this 80/20 pattern underscores significant disparities in urban development. Further analysis, combined with socioeconomic datasets, reveals a marked difference between developed and developing regions: high-income countries exhibit greater polycentricity than low-income countries, demonstrating a positive correlation between urban sprawl and economic growth. Our global dataset and findings provide critical insights into urban structure and development, with important implications for urban planning, policymaking, and the formulation of indicators for urban sustainability assessment.

Exponential-polynomial divergence based inference for nondestructive one-shot devices under progressive stress model

Authors:Shanya Baghel, Shuvashree Mondal
Date:2025-03-09 03:16:21

Nondestructive one-shot device (NOSD) testing plays a crucial role in engineering, particularly in the reliability assessment of high-stakes systems such as aerospace components, medical devices, and semiconductor technologies. Accurate reliability prognosis from NOSD testing data is essential for ensuring product durability, safety, and performance optimization. Conventional estimation methods such as maximum likelihood estimation (MLE) are sensitive to data contamination, leading to biased results. Consequently, this study develops a robust inferential analysis for NOSD testing data under a progressive stress model. The lifetime of an NOSD is assumed to follow a log-logistic distribution. The estimation procedure addresses robustness by incorporating the Exponential-polynomial divergence (EPD). Equipped with three tuning parameters, EPD-based estimation is proven to be more flexible than the density power divergence estimation frequently used for one-shot device testing data analysis. Further, we explore the asymptotic behaviour of the minimum EPD estimator (MEPDE) for large sample sizes. The robustness of the MEPDE is studied analytically through its influence function. Since the tradeoff between efficiency and robustness of EPD-based estimation is governed by three tuning parameters, a novel approach leveraging Concrete Score Matching (CSM) is introduced to optimize the tuning parameters of the MEPDE. Moreover, a comparative study with the existing methods of finding tuning parameters is conducted through extensive simulation experiments and data analysis. Another aspect of this study is determining an optimal plan to ensure a successful accelerated life testing (ALT) experiment within specified budget and time constraints. The plan is designed according to the A-optimality criterion subject to the given constraints and is executed using the constraint particle swarm optimization (CPSO) algorithm.
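
To make two of the ingredients above concrete, the sketch below evaluates a log-logistic lifetime distribution and the density power divergence (DPD) that the abstract names as the comparison baseline. It does not implement the EPD itself, whose form the abstract does not give. The scale/shape parameterization, the function names, and the example probabilities are assumptions chosen for illustration, not values from the paper.

```python
def loglogistic_cdf(t, scale, shape):
    """P(T <= t) for a log-logistic lifetime with scale > 0 and shape > 0."""
    if t <= 0:
        return 0.0
    return 1.0 / (1.0 + (t / scale) ** (-shape))


def loglogistic_reliability(t, scale, shape):
    """Probability that the device is still functioning at time t."""
    return 1.0 - loglogistic_cdf(t, scale, shape)


def density_power_divergence(observed, model, alpha):
    """DPD between two discrete distributions, for a tuning parameter alpha > 0.

    observed : empirical cell probabilities (sum to 1)
    model    : model cell probabilities (sum to 1)
    Larger alpha downweights cells the model considers unlikely, trading
    efficiency for robustness to contamination.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    total = 0.0
    for g, f in zip(observed, model):
        total += (
            f ** (1 + alpha)
            - (1 + 1 / alpha) * g * f ** alpha
            + (1 / alpha) * g ** (1 + alpha)
        )
    return total


# Hypothetical numbers: failure probability by an inspection time equal to
# the scale parameter, and a comparison of observed vs. model cell
# probabilities (failed, survived) at one inspection.
p_fail = loglogistic_cdf(100.0, scale=100.0, shape=2.0)
d_same = density_power_divergence([0.5, 0.5], [0.5, 0.5], alpha=0.5)
d_diff = density_power_divergence([0.5, 0.5], [0.9, 0.1], alpha=0.5)
```

The divergence is zero when the model matches the observed proportions and positive otherwise, so a minimum-divergence estimator picks the parameters that bring the model cell probabilities closest to the data under this measure.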

Reduced-Order Model-Based Gait Generation for Snake Robot Locomotion using NMPC

Authors:Adarsh Salagame, Eric Sihite, Milad Ramezani, Alireza Ramezani
Date:2025-03-09 02:44:32

This paper presents an optimization-based motion planning methodology for snake robots operating in constrained environments. By using a reduced-order model, the proposed approach simplifies the planning process, enabling the optimizer to autonomously generate gaits while constraining the robot's footprint within tight spaces. The method is validated through high-fidelity simulations that accurately model contact dynamics and the robot's motion. Key locomotion strategies are identified and further demonstrated through hardware experiments, including successful navigation through narrow corridors.

Efficient Gradient-Based Inference for Manipulation Planning in Contact Factor Graphs

Authors:Jeongmin Lee, Sunkyung Park, Minji Lee, Dongjun Lee
Date:2025-03-08 18:25:26

This paper presents a framework designed to tackle a range of planning problems that arise in manipulation, which typically involve complex geometric-physical reasoning related to contact and dynamic constraints. We introduce the Contact Factor Graph (CFG) to graphically model these diverse factors, enabling us to perform inference on the graphs to approximate the distribution and sample appropriate solutions. We propose a novel approach that can incorporate various phenomena of contact manipulation as differentiable factors, and develop an efficient inference algorithm for CFG that leverages this differentiability along with the conditional probabilities arising from the structured nature of contact. Our results demonstrate the capability of our framework in generating viable samples and approximating posterior distributions for various manipulation scenarios.

Production of $1^{-+}$ exotic charmonium-like states in electron-positron collisions

Authors:Xiao-Yu Zhang, Pan-Pan Shi, Feng-Kun Guo
Date:2025-03-08 16:11:56

The absence of observed charmonium-like states with the exotic quantum numbers $J^{PC}=1^{-+}$ has prompted us to investigate the production rates of the $1^{-+}$ $D^*\bar D_1(2420)$ and $D^*\bar D_2^*(2460)$ hadronic molecules, which we refer to as $\eta_{c1}$ and $\eta_{c1}^{\prime}$, respectively, in electron-positron collisions. Assuming a hadronic molecular nature for the vector charmonium-like states $\psi(4360)$ and $\psi(4415)$, we evaluate the radiative decay widths of $\psi(4360)\to\gamma\eta_{c1}$ and $\psi(4415)\to\gamma\eta_{c1}^{\prime}$. Using these decay widths, we estimate the cross sections for producing $\eta_{c1}$ and $\eta_{c1}^{\prime}$ in electron-positron annihilations, as well as the event numbers at the planned Super $\tau$-Charm Facility. Our results suggest that the ideal energy region for observing these states is around $4.44$ and $4.50$ GeV, just above the $D^* \bar D_1(2420)$ and $D^*\bar D_2^*(2460)$ thresholds, respectively.

Dynamically evolving segment anything model with continuous learning for medical image segmentation

Authors:Zhaori Liu, Mengyang Li, Hu Han, Enli Zhang, Shiguang Shan, Zhiming Zhao
Date:2025-03-08 14:37:52

Medical image segmentation is essential for clinical diagnosis, surgical planning, and treatment monitoring. Traditional approaches typically strive to tackle all medical image segmentation scenarios via one-time learning. However, in practical applications, the diversity of scenarios and tasks in medical image segmentation continues to expand, necessitating models that can dynamically evolve to meet the demands of various segmentation tasks. Here, we introduce EvoSAM, a dynamically evolving medical image segmentation model that continuously accumulates new knowledge from an ever-expanding array of scenarios and tasks, enhancing its segmentation capabilities. Extensive evaluations on surgical image blood vessel segmentation and multi-site prostate MRI segmentation demonstrate that EvoSAM not only improves segmentation accuracy but also mitigates catastrophic forgetting. Further experiments conducted by surgical clinicians on blood vessel segmentation confirm that EvoSAM enhances segmentation efficiency based on user prompts, highlighting its potential as a promising tool for clinical applications.