planning - 2025-05-29

LabUtopia: High-Fidelity Simulation and Hierarchical Benchmark for Scientific Embodied Agents

Authors:Rui Li, Zixuan Hu, Wenxi Qu, Jinouwen Zhang, Zhenfei Yin, Sha Zhang, Xuantuo Huang, Hanqing Wang, Tai Wang, Jiangmiao Pang, Wanli Ouyang, Lei Bai, Wangmeng Zuo, Ling-Yu Duan, Dongzhan Zhou, Shixiang Tang

Date:2025-05-28 17:50:53

Scientific embodied agents play a crucial role in modern laboratories by automating complex experimental workflows. Compared to typical household environments, laboratory settings impose significantly higher demands on perception of physical-chemical transformations and long-horizon planning, making them an ideal testbed for advancing embodied intelligence. However, the development of such agents has long been hampered by the lack of suitable simulators and benchmarks. In this paper, we address this gap by introducing LabUtopia, a comprehensive simulation and benchmarking suite designed to facilitate the development of generalizable, reasoning-capable embodied agents in laboratory settings. Specifically, it integrates i) LabSim, a high-fidelity simulator supporting multi-physics and chemically meaningful interactions; ii) LabScene, a scalable procedural generator for diverse scientific scenes; and iii) LabBench, a hierarchical benchmark spanning five levels of complexity from atomic actions to long-horizon mobile manipulation. LabUtopia supports 30 distinct tasks and includes more than 200 scene and instrument assets, enabling large-scale training and principled evaluation in high-complexity environments. We demonstrate that LabUtopia offers a powerful platform for advancing the integration of perception, planning, and control in scientific-purpose agents and provides a rigorous testbed for exploring the practical capabilities and generalization limits of embodied intelligence in future research.

HDDLGym: A Tool for Studying Multi-Agent Hierarchical Problems Defined in HDDL with OpenAI Gym

Authors:Ngoc La, Ruaridh Mon-Williams, Julie A. Shah

Date:2025-05-28 17:10:43

In recent years, reinforcement learning (RL) methods have been widely tested using tools like OpenAI Gym, though many tasks in these environments could also benefit from hierarchical planning. However, no existing tool enables seamless integration of hierarchical planning with RL. Hierarchical Domain Definition Language (HDDL), used in classical planning, introduces a structured approach well-suited for model-based RL to address this gap. To enable this integration, we introduce HDDLGym, a Python-based tool that automatically generates OpenAI Gym environments from HDDL domains and problems. HDDLGym serves as a link between RL and hierarchical planning, supporting multi-agent scenarios and enabling collaborative planning among agents. This paper provides an overview of HDDLGym's design and implementation, highlighting the challenges and design choices involved in integrating HDDL with the Gym interface and in applying RL policies to support hierarchical planning. We also provide detailed instructions and demonstrations for using the HDDLGym framework, including how to work with existing HDDL domains and problems from International Planning Competitions, exemplified by the Transport domain. Additionally, we offer guidance on creating new HDDL domains for multi-agent scenarios and demonstrate the practical use of HDDLGym in the Overcooked domain. By leveraging the advantages of HDDL and Gym, HDDLGym aims to be a valuable tool for studying RL in hierarchical planning, particularly in multi-agent contexts.

Articulatory modeling of the S-shaped F2 trajectories observed in Öhman's spectrographic analysis of VCV syllables

Authors:Frédéric Berthommier

Date:2025-05-28 15:12:53

The synthesis of Öhman's VCV sequences with intervocalic plosive consonants was first achieved 30 years ago using the DRM model. However, this approach remains primarily acoustic and lacks articulatory constraints. In this study, the same 75 VCVs are analyzed, but generated with the Maeda model, using trajectory planning that differentiates vowel-to-vowel transitions from consonantal influences. Synthetic data exhibit similar characteristics to Öhman's sequences, including the presence of S-shaped F2 trajectories. Furthermore, locus equations (LEs) for F2 and F3 are computed from synthetic CV data to investigate their underlying determinism, leading to a reassessment of conventional interpretations. The findings indicate that, although articulatory planning is structured separately for vowel and consonant groups, S-shaped F2 trajectories emerge from a composite mechanism governed by the coordinated synergy of all articulators.

GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control

Authors:Anthony Chen, Wenzhao Zheng, Yida Wang, Xueyang Zhang, Kun Zhan, Peng Jia, Kurt Keutzer, Shangbang Zhang

Date:2025-05-28 14:46:51

Recent advancements in world models have revolutionized dynamic environment simulation, allowing systems to foresee future states and assess potential actions. In autonomous driving, these capabilities help vehicles anticipate the behavior of other road users, perform risk-aware planning, accelerate training in simulation, and adapt to novel scenarios, thereby enhancing safety and reliability. Current approaches struggle to maintain robust 3D geometric consistency or accumulate artifacts during occlusion handling, and both capabilities are critical for reliable safety assessment in autonomous navigation tasks. To address this, we introduce GeoDrive, which explicitly integrates robust 3D geometry conditions into driving world models to enhance spatial understanding and action controllability. Specifically, we first extract a 3D representation from the input frame and then obtain its 2D rendering based on the user-specified ego-car trajectory. To enable dynamic modeling, we propose a dynamic editing module during training to enhance the renderings by editing the positions of the vehicles. Extensive experiments demonstrate that our method significantly outperforms existing models in both action accuracy and 3D spatial awareness, leading to more realistic, adaptable, and reliable scene modeling for safer autonomous driving. Additionally, our model can generalize to novel trajectories and offers interactive scene editing capabilities, such as object editing and object trajectory control.

Complete Catalog of Laser Locking Configurations for LISA

Authors:Gerhard Heinzel, Javier Álvarez-Vizoso, Miguel Dovale-Álvarez

Date:2025-05-28 14:37:13

The Laser Interferometer Space Antenna (LISA) will enable direct observations of low-frequency gravitational waves, offering unprecedented insight into astrophysical and cosmological phenomena. LISA's heterodyne interferometric measurement system requires phase-locking five of its six onboard lasers with tunable frequency offsets to ensure that all beatnotes remain within the metrology system's operational range, despite Doppler-induced frequency shifts. The selection of these offset frequencies -- collectively forming a frequency plan -- is a complex optimization problem constrained by the spacecraft's orbital dynamics and instrument limitations. While previous work established an algorithmic solution for deriving time-dependent frequency plans, this study takes a complementary approach by systematically analyzing and cataloging all possible laser locking configurations. We present an automated method to explore, validate, and classify viable locking schemes, identifying 36 unique non-frequency-swapping configurations and 72 additional frequency-swapping configurations for an arbitrary choice of primary laser. This exhaustive classification provides a foundation for frequency planning across the full range of operational scenarios.

Probing nuclear structure in relativistic p-O and O-O collisions at the LHC through the measurement of anisotropic flow coefficients

Authors:Aswathy Menon K R, Suraj Prasad, Neelkamal Mallick, Raghunath Sahoo, Gergely Gábor Barnaföldi

Date:2025-05-28 13:55:02

RHIC and LHC plan to inject $^{16}\rm O$ nuclei with a focus on investigating collectivity and the origin of quark-gluon plasma signatures in small collision systems. The $^{16}\rm O$ nucleus is known to possess clusters of $\alpha$-particles ($^{4}\rm He$). In this paper, we study the anisotropic flow coefficients such as elliptic flow ($v_2$) and triangular flow ($v_3$), which are sensitive to the nuclear geometry of colliding nuclei, for p-O and O-O collisions at $\sqrt{s_{\rm NN}}=9.61$ TeV and 7 TeV respectively. The study is performed employing a hybrid model encompassing IPGlasma + MUSIC + iSS + UrQMD. The results of the clustered nuclear geometry are compared with those of the Woods-Saxon nuclear profile. Both initial and final state anisotropies are explored. This study is thus one of the first of its kind, where the study of anisotropic flow coefficients for p-O and O-O collisions is presented using a hybrid hydrodynamics model. We find a small effect of $\alpha$-clustering in p-O collisions, but a significant one in O-O collisions. It is also observed that the magnitude of the effect correlates with the size of the $^{4}$He.
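
As a rough illustration of the observables named above: flow coefficients are commonly defined as Fourier harmonics of the azimuthal particle distribution, $v_n = \langle \cos[n(\phi - \Psi_n)] \rangle$. The sketch below estimates $v_n$ from a list of azimuthal angles, assuming the symmetry-plane angle $\Psi_n$ is already known. The function name and the toy samples are hypothetical and are not taken from the paper, which relies on a full hybrid simulation chain.

```python
import math

def flow_coefficient(phis, n, psi_n=0.0):
    """Estimate v_n = <cos(n * (phi - psi_n))> from azimuthal angles in radians.

    Assumption: psi_n (the n-th order symmetry-plane angle) is known.
    """
    if not phis:
        raise ValueError("need at least one particle")
    return sum(math.cos(n * (phi - psi_n)) for phi in phis) / len(phis)

# Hypothetical toy samples, for illustration only.
in_plane = [0.0, math.pi, 0.0, math.pi]                 # back-to-back emission
uniform = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]  # evenly spread emission

v2_in_plane = flow_coefficient(in_plane, 2)  # strong elliptic pattern
v2_uniform = flow_coefficient(uniform, 2)    # no elliptic pattern
```

In this toy setup the back-to-back sample gives the maximal elliptic coefficient, while the evenly spread sample gives a coefficient that vanishes up to rounding.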

VME: A Satellite Imagery Dataset and Benchmark for Detecting Vehicles in the Middle East and Beyond

Authors:Noora Al-Emadi, Ingmar Weber, Yin Yang, Ferda Ofli

Date:2025-05-28 13:34:05

Detecting vehicles in satellite images is crucial for traffic management, urban planning, and disaster response. However, current models struggle with real-world diversity, particularly across different regions. This challenge is amplified by geographic bias in existing datasets, which often focus on specific areas and overlook regions like the Middle East. To address this gap, we present the Vehicles in the Middle East (VME) dataset, designed explicitly for vehicle detection in high-resolution satellite images from Middle Eastern countries. Sourced from Maxar, the VME dataset spans 54 cities across 12 countries, comprising over 4,000 image tiles and more than 100,000 vehicles, annotated using both manual and semi-automated methods. Additionally, we introduce the largest benchmark dataset for Car Detection in Satellite Imagery (CDSI), combining images from multiple sources to enhance global car detection. Our experiments demonstrate that models trained on existing datasets perform poorly on Middle Eastern images, while the VME dataset significantly improves detection accuracy in this region. Moreover, state-of-the-art models trained on CDSI achieve substantial improvements in global car detection.

Let's Predict Sentence by Sentence

Authors:Hyeonbin Hwang, Byeongguk Jeon, Seungone Kim, Jiyeon Kim, Hoyeon Chang, Sohee Yang, Seungpil Won, Dohaeng Lee, Youbin Ahn, Minjoon Seo

Date:2025-05-28 10:28:35

Autoregressive language models (LMs) generate one token at a time, yet human reasoning operates over higher-level abstractions - sentences, propositions, and concepts. This contrast raises a central question: can LMs likewise learn to reason over structured semantic units rather than raw token sequences? In this work, we investigate whether pretrained LMs can be lifted into such abstract reasoning spaces by building on their learned representations. We present a framework that adapts a pretrained token-level LM to operate in sentence space by autoregressively predicting continuous embeddings of next sentences. We explore two embedding paradigms inspired by classical representation learning: 1) semantic embeddings, learned via autoencoding to preserve surface meaning; and 2) contextual embeddings, trained via next-sentence prediction to encode anticipatory structure. We evaluate both under two inference regimes: Discretized, which decodes each predicted embedding into text before re-encoding; and Continuous, which reasons entirely in embedding space for improved efficiency. Across four domains - mathematics, logic, commonsense, and planning - contextual embeddings under continuous inference show competitive performance with Chain-of-Thought (CoT) while reducing inference-time FLOPs on average by half. We also present early signs of scalability and modular adaptation. Finally, to visualize latent trajectories, we introduce SentenceLens, a diagnostic tool that decodes intermediate model states into interpretable sentences. Together, our results indicate that pretrained LMs can effectively transition to abstract, structured reasoning within latent embedding spaces.

Convergence of the $ppp$ correlation function within the hyperspherical adiabatic basis

Authors:E. Garrido, A. Kievsky, R. Del Grande, L. Serksnyte, M. Viviani, L. E. Marcucci

Date:2025-05-28 10:02:38

The computation of the three-particle correlation function involving three hadrons began only recently, after the first publications of ALICE measurements. Key elements to be considered are the correct description of the asymptotics, antisymmetrization issues and, in most cases, the treatment of the Coulomb interaction. In the case of the $ppp$ correlation function, a first analysis was done where the hyperspherical adiabatic method was used to determine the $ppp$ wave function at different energies. Although the asymptotic behavior, antisymmetrization issues and the treatment of the Coulomb interaction were discussed in detail, the convergence properties of the adiabatic basis were studied only at low energies around the formation of the correlation peak determined mainly by the $J^\pi=1/2^-$ and $3/2^-$ three-body states. Since many very precise data have been taken or are planned to be measured at energies beyond the peak, we present an analysis of the convergence characteristics of the basis as the energy of the process increases. We show that in order to describe correctly the correlation tail it is necessary to consider three-body states up to $J^\pi=21/2^-$ whereas higher states can be considered as free. Once those states are incorporated by solving the associated dynamical equations, the agreement with the experimental data is found to be excellent.

Attention-Enhanced Prompt Decision Transformers for UAV-Assisted Communications with AoI

Authors:Chi Lu, Yiyang Ni, Zhe Wang, Xiaoli Shi, Jun Li, Shi Jin

Date:2025-05-28 09:41:10

Decision Transformer (DT) has recently demonstrated strong generalizability in dynamic resource allocation within unmanned aerial vehicle (UAV) networks, compared to conventional deep reinforcement learning (DRL). However, its performance is hindered by zero-padding for varying state dimensions, an inability to manage a long-term energy constraint, and challenges in acquiring expert samples for few-shot fine-tuning in new scenarios. To overcome these limitations, we propose an attention-enhanced prompt Decision Transformer (APDT) framework to optimize trajectory planning and user scheduling, aiming to minimize the average age of information (AoI) under a long-term energy constraint in UAV-assisted Internet of Things (IoT) networks. Specifically, we enhance the conventional DT framework by incorporating an attention mechanism to accommodate varying numbers of terrestrial users, introducing a prompt mechanism based on short trajectory demonstrations for rapid adaptation to new scenarios, and designing a token-assisted method to address the UAV's long-term energy constraint. The APDT framework is first pre-trained on offline datasets and then efficiently generalized to new scenarios. Simulations demonstrate that APDT converges twice as fast and reduces average AoI by $8\%$ compared to conventional DT.
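
To make the optimization target concrete, the sketch below computes a time-averaged AoI for a toy scheduling sequence. It assumes a simplified slotted model that is not taken from the paper: every user's age starts at 1, grows by 1 per slot, and resets to 1 in the slot where that user is served. The function name and the schedules are hypothetical.

```python
def average_aoi(num_users, schedule):
    """Time-averaged age of information across users.

    schedule[t] is the index of the user served in slot t, or None if no user
    is served. Assumed model: ages start at 1, grow by 1 per slot, and reset
    to 1 when the user is served.
    """
    if num_users <= 0 or not schedule:
        raise ValueError("need at least one user and one slot")
    ages = [1] * num_users
    total = 0
    for served in schedule:
        ages = [1 if user == served else age + 1 for user, age in enumerate(ages)]
        total += sum(ages)
    return total / (num_users * len(schedule))

# Hypothetical schedules for two users over four slots.
round_robin = average_aoi(2, [0, 1, 0, 1])  # alternate between users
starving = average_aoi(2, [0, 0, 0, 0])     # user 1 is never served
```

Under this toy model the alternating schedule yields a lower average AoI than the schedule that never serves the second user, which is the kind of trade-off a scheduling policy has to balance against energy use.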

Lifted Forward Planning in Relational Factored Markov Decision Processes with Concurrent Actions

Authors:Florian Andreas Marwitz, Tanya Braun, Ralf Möller, Marcel Gehrke

Date:2025-05-28 09:08:27

Decision making is a central problem in AI that can be formalized using a Markov Decision Process. A problem is that, with increasing numbers of (indistinguishable) objects, the state space grows exponentially. To compute policies, the state space has to be enumerated. Even more possibilities have to be enumerated if the size of the action space depends on the size of the state space, especially if we allow concurrent actions. To tackle the exponential blow-up in the action and state space, we present a first-order representation to store the spaces in polynomial instead of exponential size in the number of objects and introduce Foreplan, a relational forward planner, which uses this representation to efficiently compute policies for numerous indistinguishable objects and actions. Additionally, we introduce an even faster approximate version of Foreplan. Moreover, Foreplan identifies how many objects an agent should act on to achieve a certain task given restrictions. Further, we provide a theoretical analysis and an empirical evaluation of Foreplan, demonstrating a speedup of at least four orders of magnitude.
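
The exponential-versus-polynomial gap can be illustrated with a standard counting argument. If $n$ objects each take one of $k$ values, a ground representation has $k^n$ joint states, whereas treating the objects as indistinguishable only requires counting how many objects take each value, giving $\binom{n+k-1}{k-1}$ count vectors, which is polynomial in $n$ for fixed $k$. The sketch below is an assumption about the kind of symmetry being exploited and is not Foreplan's actual representation; the function names are hypothetical.

```python
from math import comb

def ground_state_count(num_objects, values_per_object):
    """Number of joint states when every object is tracked individually."""
    return values_per_object ** num_objects

def lifted_state_count(num_objects, values_per_object):
    """Number of count vectors (histograms) over indistinguishable objects.

    Stars-and-bars: C(n + k - 1, k - 1) ways to split n objects over k values.
    """
    return comb(num_objects + values_per_object - 1, values_per_object - 1)

# Compare both counts for binary-valued objects.
for n in (5, 10, 20):
    print(n, ground_state_count(n, 2), lifted_state_count(n, 2))
```

For binary-valued objects the lifted count is simply $n + 1$, while the ground count doubles with every added object.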

Multi-period Mean-Buffered Probability of Exceedance in Defined Contribution Portfolio Optimization

Authors:Duy-Minh Dang, Chang Chen

Date:2025-05-28 08:47:54

We investigate multi-period mean-risk portfolio optimization for long-horizon Defined Contribution plans, focusing on buffered Probability of Exceedance (bPoE), a more intuitive, dollar-based alternative to Conditional Value-at-Risk (CVaR). We formulate both pre-commitment and time-consistent Mean-bPoE and Mean-CVaR portfolio optimization problems under realistic investment constraints (e.g., no leverage, no short selling) and jump-diffusion dynamics. These formulations are naturally framed as bilevel optimization problems, with an outer search over the shortfall threshold and an inner optimization over rebalancing decisions. We establish an equivalence between the pre-commitment formulations through a one-to-one correspondence of their scalarization optimal sets, while showing that no such equivalence holds in the time-consistent setting. We develop provably convergent numerical schemes for the value functions associated with both pre-commitment and time-consistent formulations of these mean-risk control problems. Using nearly a century of market data, we find that time-consistent Mean-bPoE strategies closely resemble their pre-commitment counterparts. In particular, they maintain alignment with investors' preferences for a minimum acceptable terminal wealth level, unlike time-consistent Mean-CVaR, which often leads to counterintuitive control behavior. We further show that bPoE, as a strictly tail-oriented measure, prioritizes guarding against catastrophic shortfalls while allowing meaningful upside exposure, making it especially appealing for long-horizon wealth security. These findings highlight bPoE's practical advantages for long-horizon retirement planning.
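
For readers unfamiliar with bPoE, the sketch below contrasts it with the plain probability of exceedance on an empirical sample. It uses the usual loss convention (larger values are worse), so a wealth shortfall would be passed in as a negated or shifted quantity; that convention, the function names, and the toy sample are assumptions and are not the paper's formulation. The mean of the $k$ largest of $n$ samples is the empirical CVaR at level $1 - k/n$, and bPoE at a threshold is approximated here, on that grid of levels only, as the largest tail fraction whose mean still reaches the threshold.

```python
def tail_mean(losses, k):
    """Mean of the k largest losses, i.e. empirical CVaR at level 1 - k/n."""
    return sum(sorted(losses, reverse=True)[:k]) / k

def poe(losses, threshold):
    """Plain probability of exceedance: fraction of losses above the threshold."""
    return sum(1 for x in losses if x > threshold) / len(losses)

def bpoe(losses, threshold):
    """Grid approximation of buffered probability of exceedance.

    Returns the largest k/n such that the mean of the k largest losses is
    still at least the threshold (0.0 if even the worst loss falls below it).
    """
    n = len(losses)
    best = 0
    for k in range(1, n + 1):
        if tail_mean(losses, k) >= threshold:
            best = k
    return best / n

# Hypothetical toy sample.
sample = [1.0, 2.0, 3.0, 4.0]
plain = poe(sample, 3.5)
buffered = bpoe(sample, 3.5)
```

On this sample the buffered value is larger than the plain one, reflecting that bPoE accounts for how far the tail outcomes lie beyond the threshold and not only how often they cross it.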

Learning to Route Queries Across Knowledge Bases for Step-wise Retrieval-Augmented Reasoning

Authors:Chunyi Peng, Zhipeng Xu, Zhenghao Liu, Yishan Li, Yukun Yan, Shuo Wang, Zhiyuan Liu, Yu Gu, Minghe Yu, Ge Yu, Maosong Sun

Date:2025-05-28 08:17:57

Multimodal Retrieval-Augmented Generation (MRAG) has shown promise in mitigating hallucinations in Multimodal Large Language Models (MLLMs) by incorporating external knowledge during generation. Existing MRAG methods typically adopt a static retrieval pipeline that fetches relevant information from multiple Knowledge Bases (KBs), followed by a refinement step. However, these approaches overlook the reasoning and planning capabilities of MLLMs to dynamically determine how to interact with different KBs during the reasoning process. To address this limitation, we propose R1-Router, a novel MRAG framework that learns to decide when and where to retrieve knowledge based on the evolving reasoning state. Specifically, R1-Router can generate follow-up queries according to the current reasoning step, routing these intermediate queries to the most suitable KB, and integrating external knowledge into a coherent reasoning trajectory to answer the original query. Furthermore, we introduce Step-wise Group Relative Policy Optimization (Step-GRPO), a tailored reinforcement learning algorithm that assigns step-specific rewards to optimize the reasoning behavior of MLLMs. Experimental results on various open-domain QA benchmarks across multiple modalities demonstrate that R1-Router outperforms baseline models by over 7%. Further analysis shows that R1-Router can adaptively and effectively leverage diverse KBs, reducing unnecessary retrievals and improving both efficiency and accuracy.
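
The abstract does not spell out how Step-GRPO assigns its step-specific rewards, so the sketch below shows only the generic group-relative normalization that GRPO-style methods build on: several responses are sampled for the same query, and each reward is standardized against the group mean and standard deviation. The function name, the use of the population standard deviation, and the `eps` stabilizer are assumptions.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled responses.

    Assumption: population standard deviation plus a small eps to avoid
    division by zero when all rewards in the group are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards for four sampled responses to the same query.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Responses that score above the group average receive positive advantages and the rest receive negative ones, so no separate value network is needed to provide a baseline.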

ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning

Authors:Tonghe Zhang, Yu Chao, Sicang Su, Yu Wang

Date:2025-05-28 08:17:16

We propose ReinFlow, a simple yet effective online reinforcement learning (RL) framework that fine-tunes a family of flow matching policies for continuous robotic control. Derived from rigorous RL theory, ReinFlow injects learnable noise into a flow policy's deterministic path, converting the flow into a discrete-time Markov Process for exact and straightforward likelihood computation. This conversion facilitates exploration and ensures training stability, enabling ReinFlow to fine-tune diverse flow model variants, including Rectified Flow [35] and Shortcut Models [19], particularly at very few or even one denoising step. We benchmark ReinFlow in representative locomotion and manipulation tasks, including long-horizon planning with visual input and sparse reward. The episode reward of Rectified Flow policies obtained an average net growth of 135.36% after fine-tuning in challenging legged locomotion tasks while saving denoising steps and 82.63% of wall time compared to the state-of-the-art diffusion RL fine-tuning method DPPO [43]. The success rate of the Shortcut Model policies in state and visual manipulation tasks achieved an average net increase of 40.34% after fine-tuning with ReinFlow at four or even one denoising step, with performance comparable to fine-tuned DDIM policies while saving an average of 23.20% of computation time. Project Webpage: https://reinflow.github.io/

Reinforced Reasoning for Embodied Planning

Authors:Di Wu, Jiaxin Fan, Junzhe Zang, Guanbo Wang, Wei Yin, Wenhao Li, Bo Jin

Date:2025-05-28 07:21:37

Embodied planning requires agents to make coherent multi-step decisions based on dynamic visual observations and natural language goals. While recent vision-language models (VLMs) excel at static perception tasks, they struggle with the temporal reasoning, spatial understanding, and commonsense grounding needed for planning in interactive environments. In this work, we introduce a reinforcement fine-tuning framework that brings R1-style reasoning enhancement into embodied planning. We first distill a high-quality dataset from a powerful closed-source model and perform supervised fine-tuning (SFT) to equip the model with structured decision-making priors. We then design a rule-based reward function tailored to multi-step action quality and optimize the policy via Generalized Reinforced Preference Optimization (GRPO). Our approach is evaluated on Embench, a recent benchmark for interactive embodied tasks, covering both in-domain and out-of-domain scenarios. Experimental results show that our method significantly outperforms models of similar or larger scale, including GPT-4o-mini and 70B+ open-source baselines, and exhibits strong generalization to unseen environments. This work highlights the potential of reinforcement-driven reasoning to advance long-horizon planning in embodied AI.

Differentiable Generalized Sliced Wasserstein Plans

Authors:Laetitia Chapel, Romain Tavenard, Samuel Vaiter

Date:2025-05-28 07:18:08

Optimal Transport (OT) has attracted significant interest in the machine learning community, not only for its ability to define meaningful distances between probability distributions -- such as the Wasserstein distance -- but also for its formulation of OT plans. Its computational complexity remains a bottleneck, though, and slicing techniques have been developed to scale OT to large datasets. Recently, a novel slicing scheme, dubbed min-SWGG, lifts a single one-dimensional plan back to the original multidimensional space, finally selecting the slice that yields the lowest Wasserstein distance as an approximation of the full OT plan. Despite its computational and theoretical advantages, min-SWGG inherits typical limitations of slicing methods: (i) the number of required slices grows exponentially with the data dimension, and (ii) it is constrained to linear projections. Here, we reformulate min-SWGG as a bilevel optimization problem and propose a differentiable approximation scheme to efficiently identify the optimal slice, even in high-dimensional settings. We furthermore define its generalized extension to accommodate data living on manifolds. Finally, we demonstrate the practical value of our approach in various applications, including gradient flows on manifolds and high-dimensional spaces, as well as a novel sliced OT-based conditional flow matching for image generation -- where fast computation of transport plans is essential.
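
The slicing idea described above can be sketched for two equal-size point clouds with uniform weights: project both clouds onto a direction, sort the projections, pair points by rank (the one-dimensional optimal plan), and evaluate that pairing's cost in the original space. The sketch below selects the best direction by plain random search, which is a stand-in for illustration and not the differentiable scheme the paper proposes; all function names are hypothetical.

```python
import math
import random

def lifted_plan_cost(xs, ys, theta):
    """Pair points by rank of their projection on theta and return
    (mean squared distance in the original space, list of index pairs).

    Assumption: xs and ys have the same number of points, uniform weights.
    """
    def project(p):
        return sum(a * b for a, b in zip(p, theta))
    ix = sorted(range(len(xs)), key=lambda i: project(xs[i]))
    iy = sorted(range(len(ys)), key=lambda j: project(ys[j]))
    pairs = list(zip(ix, iy))
    cost = sum(math.dist(xs[i], ys[j]) ** 2 for i, j in pairs) / len(pairs)
    return cost, pairs

def best_random_slice(xs, ys, num_slices=50, seed=0):
    """Random search over unit directions; keeps the lowest-cost lifted plan."""
    rng = random.Random(seed)
    dim = len(xs[0])
    best = None
    for _ in range(num_slices):
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in g))
        theta = [v / norm for v in g]
        cost, pairs = lifted_plan_cost(xs, ys, theta)
        if best is None or cost < best[0]:
            best = (cost, pairs, theta)
    return best

# Hypothetical toy clouds: two parallel rows of three points.
xs = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
ys = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
cost_along_x, pairs_along_x = lifted_plan_cost(xs, ys, (1.0, 0.0))
```

Because every lifted plan is a valid pairing, its cost is an upper bound on the true OT cost, and the minimum over slices tightens that bound. Random search needs many directions in high dimensions, which is the limitation the abstract points to.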

Learning World Models for Interactive Video Generation

Authors:Taiye Chen, Xun Hu, Zihan Ding, Chi Jin

Date:2025-05-28 05:55:44

Foundational world models must be both interactive and preserve spatiotemporal coherence for effective future planning with action choices. However, present models for long video generation have limited inherent world modeling capabilities due to two main challenges: compounding errors and insufficient memory mechanisms. We enhance image-to-video models with interactive capabilities through additional action conditioning and an autoregressive framework, and reveal that compounding error is inherently irreducible in autoregressive video generation, while insufficient memory mechanisms lead to incoherence of world models. We propose video retrieval augmented generation (VRAG) with explicit global state conditioning, which significantly reduces long-term compounding errors and increases spatiotemporal consistency of world models. In contrast, naive autoregressive generation with extended context windows and retrieval-augmented generation prove less effective for video generation, primarily due to the limited in-context learning capabilities of current video models. Our work illuminates the fundamental challenges in video world models and establishes a comprehensive benchmark for improving video generation models with internal world modeling capabilities.

DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation

Authors:Tianjun Gu, Linfeng Li, Xuhong Wang, Chenghua Gong, Jingyu Gong, Zhizhong Zhang, Yuan Xie, Lizhuang Ma, Xin Tan

Date:2025-05-28 04:46:13

Adaptive navigation in unfamiliar environments is crucial for household service robots but remains challenging due to the need for both low-level path planning and high-level scene understanding. While recent vision-language model (VLM) based zero-shot approaches reduce dependence on prior maps and scene-specific training data, they face significant limitations: spatiotemporal discontinuity from discrete observations, unstructured memory representations, and insufficient task understanding leading to navigation failures. We propose DORAEMON (Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation), a novel cognitive-inspired framework consisting of Ventral and Dorsal Streams that mimics human navigation capabilities. The Dorsal Stream implements the Hierarchical Semantic-Spatial Fusion and Topology Map to handle spatiotemporal discontinuities, while the Ventral Stream combines RAG-VLM and Policy-VLM to improve decision-making. Our approach also develops Nav-Ensurance to ensure navigation safety and efficiency. We evaluate DORAEMON on the HM3D, MP3D, and GOAT datasets, where it achieves state-of-the-art performance on both success rate (SR) and success weighted by path length (SPL) metrics, significantly outperforming existing methods. We also introduce a new evaluation metric (AORI) to assess navigation intelligence better. Comprehensive experiments demonstrate DORAEMON's effectiveness in zero-shot autonomous navigation without requiring prior map building or pre-training.
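
For reference, the two standard metrics named above can be computed as follows. SPL is commonly defined as $\frac{1}{N}\sum_i S_i \, \ell_i / \max(p_i, \ell_i)$, where $S_i$ indicates success, $\ell_i$ is the shortest-path length, and $p_i$ is the length of the path the agent actually took. The function names and episode tuples below are hypothetical, and the new AORI metric is not covered because the abstract does not define it.

```python
def success_rate(episodes):
    """Fraction of successful episodes.

    episodes: list of (success, shortest_path_length, agent_path_length).
    """
    return sum(1 for success, _, _ in episodes if success) / len(episodes)

def spl(episodes):
    """Success weighted by path length, averaged over all episodes.

    Failed episodes contribute 0; successful ones contribute the ratio of the
    shortest path to the longer of the two paths. Shortest-path lengths are
    assumed to be positive.
    """
    total = 0.0
    for success, shortest, taken in episodes:
        if success:
            total += shortest / max(taken, shortest)
    return total / len(episodes)

# Hypothetical episodes: optimal success, detour success, failure.
episodes = [(True, 10.0, 10.0), (True, 10.0, 20.0), (False, 10.0, 5.0)]
sr_value = success_rate(episodes)
spl_value = spl(episodes)
```

SPL can never exceed SR, since each successful episode contributes at most 1, and the gap between the two reflects how much longer the agent's paths are than the shortest ones.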

Enhanced SIRRT*: A Structure-Aware RRT* for 2D Path Planning with Hybrid Smoothing and Bidirectional Rewiring

Authors:Hyejeong Ryu

Date:2025-05-28 04:45:25

Sampling-based motion planners such as Rapidly-exploring Random Tree* (RRT*) and its informed variant IRRT* are widely used for optimal path planning in complex environments. However, these methods often suffer from slow convergence and high variance due to their reliance on random sampling, particularly when initial solution discovery is delayed. This paper presents Enhanced SIRRT* (E-SIRRT*), a structure-aware planner that improves upon the original SIRRT* framework by introducing two key enhancements: hybrid path smoothing and bidirectional rewiring. Hybrid path smoothing refines the initial path through spline fitting and collision-aware correction, while bidirectional rewiring locally optimizes tree connectivity around the smoothed path to improve cost propagation. Experimental results demonstrate that E-SIRRT* consistently outperforms IRRT* and SIRRT* in terms of initial path quality, convergence rate, and robustness across 100 trials. Unlike IRRT*, which exhibits high variability due to stochastic initialization, E-SIRRT* achieves repeatable and efficient performance through deterministic skeleton-based initialization and structural refinement.

Mastering Agile Tasks with Limited Trials

Authors:Yihang Hu, Pingyue Sheng, Shengjie Wang, Yang Gao

Date:2025-05-28 03:03:38

Embodied robots can already handle many real-world manipulation tasks. However, certain other real-world tasks (e.g., shooting a basketball into a hoop) are highly agile and require high execution precision, presenting additional challenges for methods primarily designed for quasi-static manipulation tasks. This leads to increased efforts in costly data collection, laborious reward design, or complex motion planning. Such tasks, however, are far less challenging for humans. For example, a novice basketball player typically needs only $\sim$10 attempts to make their first successful shot, by roughly imitating a motion prior and then iteratively adjusting their motion based on past outcomes. Inspired by this human learning paradigm, we propose the Adaptive Diffusion Action Planning (ADAP) algorithm, a simple and scalable approach that iteratively refines its action plan through a few real-world trials within a learned prior motion pattern until it reaches a specific goal. Experiments demonstrate that ADAP can learn and accomplish a wide range of goal-conditioned agile dynamic tasks with human-level precision and efficiency directly in the real world, such as throwing a basketball into the hoop in fewer than 10 trials. Project website: https://adap-robotics.github.io/

Assessing EV Charging Impacts on Power Distribution Systems: A Unified Co-Simulation Framework

Authors:Mohammadreza Iranpour, Mohammad Rasoul Narimani, Xudong Jia

Date:2025-05-27 21:16:43

The growing adoption of electric vehicles (EVs) is expected to significantly increase demand on electric power distribution systems, many of which are already nearing capacity. To address this, the paper presents a comprehensive framework for analyzing the impact of large-scale EV integration on distribution networks. Using the open-source simulator OpenDSS, the framework builds detailed, scalable models of electric distribution systems, incorporating high-fidelity synthetic data from the SMART-DS project. The study models three feeders from an urban substation in San Francisco down to the household level. A key contribution is the framework's ability to identify critical system components likely to require upgrades due to increased EV loads. It also incorporates advanced geospatial visualization through QGIS, which aids in understanding how charging demands affect specific grid areas, helping stakeholders target infrastructure reinforcements. To ensure realistic load modeling, the framework uses EV load profiles based on U.S. Department of Energy projections, factoring in vehicle types, charging behaviors, usage patterns, and adoption rates. By leveraging large-scale synthetic data, the model remains relevant for real-world utility planning. It supports diverse simulation scenarios, from light to heavy EV charging loads and distributed vs. centralized charging patterns, offering a practical planning tool for utilities and policymakers. Additionally, its modular design enables easy adaptation to different geographic regions, feeder setups, and adoption scenarios, making it suitable for future studies on evolving grid conditions.

PartInstruct: Part-level Instruction Following for Fine-grained Robot Manipulation

Authors:Yifan Yin, Zhengtao Han, Shivam Aarya, Jianxin Wang, Shuhang Xu, Jiawei Peng, Angtian Wang, Alan Yuille, Tianmin Shu

Date:2025-05-27 18:25:42

Fine-grained robot manipulation, such as lifting and rotating a bottle to display the label on the cap, requires robust reasoning about object parts and their relationships with intended tasks. Despite recent advances in training general-purpose robot manipulation policies guided by language instructions, there is a notable lack of large-scale datasets for fine-grained manipulation tasks with part-level instructions and diverse 3D object instances annotated with part-level labels. In this work, we introduce PartInstruct, the first large-scale benchmark for training and evaluating fine-grained robot manipulation models using part-level instructions. PartInstruct comprises 513 object instances across 14 categories, each annotated with part-level information, and 1302 fine-grained manipulation tasks organized into 16 task classes. Our training set consists of over 10,000 expert demonstrations synthesized in a 3D simulator, where each demonstration is paired with a high-level task instruction, a chain of base part-based skill instructions, and ground-truth 3D information about the object and its parts. Additionally, we designed a comprehensive test suite to evaluate the generalizability of learned policies across new states, objects, and tasks. We evaluated several state-of-the-art robot manipulation approaches, including end-to-end vision-language policy learning and bi-level planning models for robot manipulation on our benchmark. The experimental results reveal that current models struggle to robustly ground part concepts and predict actions in 3D space, and face challenges when manipulating object parts in long-horizon tasks.

Detection of the Geminga pulsar at energies down to 20 GeV with the LST-1 of CTAO

Authors:The CTAO-LST Project: K. Abe, S. Abe, A. Abhishek, F. Acero, A. Aguasca-Cabot, I. Agudo, C. Alispach, D. Ambrosino, F. Ambrosino, L. A. Antonelli, C. Aramo, A. Arbet-Engels, C. Arcaro, T. T. H. Arnesen, K. Asano, P. Aubert, A. Baktash, M. Balbo, A. Bamba, A. Baquero Larriva, U. Barres de Almeida, J. A. Barrio, L. Barrios Jiménez, I. Batkovic, J. Baxter, J. Becerra González, E. Bernardini, J. Bernete, A. Berti, I. Bezshyiko, C. Bigongiari, E. Bissaldi, O. Blanch, G. Bonnoli, P. Bordas, G. Borkowski, G. Brunelli, A. Bulgarelli, M. Bunse, I. Burelli, L. Burmistrov, M. Cardillo, S. Caroff, A. Carosi, R. Carraro, M. S. Carrasco, F. Cassol, N. Castrejón, D. Cerasole, G. Ceribella, A. Cerviño Cortínez, Y. Chai, K. Cheng, A. Chiavassa, M. Chikawa, G. Chon, L. Chytka, G. M. Cicciari, A. Cifuentes, J. L. Contreras, J. Cortina, H. Costantini, M. Dalchenko, P. Da Vela, F. Dazzi, A. De Angelis, M. de Bony de Lavergne, R. Del Burgo, C. Delgado, J. Delgado Mengual, M. Dellaiera, D. della Volpe, B. De Lotto, L. Del Peral, R. de Menezes, G. De Palma, C. Díaz, A. Di Piano, F. Di Pierro, R. Di Tria, L. Di Venere, R. M. Dominik, D. Dominis Prester, A. Donini, D. Dore, D. Dorner, M. Doro, L. Eisenberger, D. Elsässer, G. Emery, J. Escudero, V. Fallah Ramazani, F. Ferrarotto, A. Fiasson, L. Foffano, S. Fröse, Y. Fukazawa, S. Gallozzi, R. Garcia López, S. Garcia Soto, C. Gasbarra, D. Gasparrini, D. Geyer, J. Giesbrecht Paiva, N. Giglietto, F. Giordano, N. Godinovic, T. Gradetzke, R. Grau, D. Green, J. Green, S. Gunji, P. Günther, J. Hackfeld, D. Hadasch, A. Hahn, M. Hashizume, T. Hassan, K. Hayashi, L. Heckmann, M. Heller, J. Herrera Llorente, K. Hirotani, D. Hoffmann, D. Horns, J. Houles, M. Hrabovsky, D. Hrupec, D. Hui, M. Iarlori, R. Imazawa, T. Inada, Y. Inome, S. Inoue, K. Ioka, M. Iori, T. Itokawa, A. Iuliano, J. Jahanvi, I. Jimenez Martinez, J. Jimenez Quiles, I. Jorge Rodrigo, J. Jurysek, M. Kagaya, O. Kalashev, V. Karas, H. Katagiri, D. Kerszberg, T. Kiyomot, Y. Kobayashi, K. Kohri, A. Kong, P. Kornecki, H. Kubo, J. Kushida, B. Lacave, M. Lainez, G. Lamanna, A. Lamastra, L. Lemoigne, M. Linhoff, S. Lombardi, F. Longo, R. López-Coto, M. López-Moya, A. López-Oramas, S. Loporchio, A. Lorini, J. Lozano Bahilo, F. Lucarelli, H. Luciani, P. L. Luque-Escamilla, P. Majumdar, M. Makariev, M. Mallamaci, D. Mandat, M. Manganaro, D. K. Maniadakis, G. Manicò, K. Mannheim, S. Marchesi, F. Marini, M. Mariotti, P. Marquez, G. Marsella, J. Martí, O. Martinez, G. Martínez, M. Martínez, A. Mas-Aguilar, M. Massa, G. Maurin, D. Mazin, J. Méndez-Gallego, S. Menon, E. Mestre Guillen, S. Micanovic, D. Miceli, T. Miener, J. M. Miranda, R. Mirzoyan, M. Mizote, T. Mizuno, M. Molero Gonzalez, E. Molina, T. Montaruli, A. Moralejo, D. Morcuende, A. Moreno Ramos, A. Morselli, V. Moya, H. Muraishi, S. Nagataki, T. Nakamori, A. Neronov, D. Nieto Castaño, M. Nievas Rosillo, L. Nikolic, K. Nishijima, K. Noda, D. Nosek, V. Novotny, S. Nozaki, M. Ohishi, Y. Ohtani, T. Oka, A. Okumura, R. Orito, L. Orsini, J. Otero-Santos, P. Ottanelli, M. Palatiello, G. Panebianco, D. Paneque, F. R. Pantaleo, R. Paoletti, J. M. Paredes, M. Pech, M. Pecimotika, M. Peresano, F. Pfeifle, E. Pietropaolo, M. Pihet, G. Pirola, C. Plard, F. Podobnik, M. Polo, E. Prandini, M. Prouza, S. Rainò, R. Rando, W. Rhode, M. Ribó, V. Rizi, G. Rodriguez Fernandez, M. D. Rodríguez Frías, P. Romano, A. Roy, A. Ruina, E. Ruiz-Velasco, T. Saito, S. Sakurai, D. A. Sanchez, H. Sano, T. Šarić, Y. Sato, F. G. Saturni, V. Savchenko, F. Schiavone, B. Schleicher, F. Schmuckermaier, J. L. Schubert, F. Schussler, T. Schweizer, M. Seglar Arroyo, T. Siegert, G. Silvestri, A. Simongini, J. Sitarek, V. Sliusar, A. Stamerra, J. Strišković, M. Strzys, Y. Suda, A. Sunny, H. Tajima, M. Takahashi, J. Takata, R. Takeishi, P. H. T. Tam, S. J. Tanaka, D. Tateishi, T. Tavernier, P. Temnikov, Y. Terada, K. Terauchi, T. Terzic, M. Teshima, M. Tluczykont, F. Tokanai, T. Tomura, D. F. Torres, F. Tramonti, P. Travnicek, G. Tripodo, A. Tutone, M. Vacula, J. van Scherpenberg, M. Vázquez Acosta, S. Ventura, S. Vercellone, G. Verna, I. Viale, A. Vigliano, C. F. Vigorito, E. Visentin, V. Vitale, V. Voitsekhovskyi, G. Voutsinas, I. Vovk, T. Vuillaume, R. Walter, L. Wan, M. Will, J. Wójtowicz, T. Yamamoto, R. Yamazaki, Y. Yao, P. K. H. Yeung, T. Yoshida, T. Yoshikoshi, W. Zhang

Date:2025-05-27 18:04:48

Geminga is the third gamma-ray pulsar firmly detected by imaging atmospheric Cherenkov telescopes (IACTs) after the Crab and the Vela pulsars. Most of its emission is expected at tens of GeV, and, out of the planned telescopes of the upcoming Cherenkov Telescope Array Observatory (CTAO), the Large-Sized Telescopes (LSTs) are the only ones with optimised sensitivity at these energies. We aim to characterise the gamma-ray pulse shape and spectrum of Geminga as observed by the first LST (hereafter LST-1) of the CTAO-North. Furthermore, this study confirms the great performance and the improved energy threshold of the telescope, as low as 10 GeV for pulsar analysis, with respect to current-generation Cherenkov telescopes. We analysed 60 hours of good-quality data taken by the LST-1 at zenith angles below 50$^\circ$. Additionally, a new Fermi-LAT analysis of 16.6 years of data was carried out to extend the spectral analysis down to 100 MeV. Lastly, a detailed study of the systematic effects was performed. We report the detection of Geminga in the energy range between 20 and 65 GeV. Of the two peaks of the phaseogram, the second one, P2, is detected with a significance of 12.2$\sigma$, while the first (P1) reaches a significance level of 2.6$\sigma$. The best-fit model for the spectrum of P2 was found to be a power law with $\Gamma = (4.5 \pm 0.4_{stat})^{+0.2_{sys}}_{-0.6_{sys}}$, compatible with the previous results obtained by MAGIC. No evidence of curvature is found in the LST-1 energy range. The joint fit with Fermi data confirms a preference for a sub-exponential cut-off over a pure exponential, even though both models fail to reproduce the data above several tens of GeV. The overall results presented in this paper prove that the LST-1 is an excellent telescope for the observation of pulsars, and improved sensitivity is expected to be achieved with the full CTAO-North.
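
For readers unfamiliar with the spectral shapes named in the abstract, the following is a sketch of the commonly used parametrizations, not necessarily the exact forms fitted in the paper. The symbols $N_0$ (normalization), $E_0$ (reference energy), $E_c$ (cut-off energy) and $b$ (cut-off sharpness) are assumptions introduced here for illustration; only the spectral index $\Gamma$ appears in the abstract.

```latex
% Pure power law (the shape reported for P2 in the LST-1 range):
\frac{dN}{dE} = N_0 \left(\frac{E}{E_0}\right)^{-\Gamma}

% Power law with a generalized exponential cut-off:
\frac{dN}{dE} = N_0 \left(\frac{E}{E_0}\right)^{-\Gamma}
                \exp\!\left[-\left(\frac{E}{E_c}\right)^{b}\right]
% b = 1 : pure exponential cut-off
% b < 1 : sub-exponential cut-off (a more gradual roll-off at high energies)
```

With a positive index such as $\Gamma \approx 4.5$, the first form falls steeply with energy, which is why sensitivity at tens of GeV matters for this source.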

Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO

Authors:Muzhi Zhu, Hao Zhong, Canyu Zhao, Zongze Du, Zheng Huang, Mingyu Liu, Hao Chen, Cheng Zou, Jingdong Chen, Ming Yang, Chunhua Shen

Date:2025-05-27 17:29:31

Active vision, also known as active perception, refers to the process of actively selecting where and how to look in order to gather task-relevant information. It is a critical component of efficient perception and decision-making in humans and advanced embodied agents. Recently, the use of Multimodal Large Language Models (MLLMs) as central planning and decision-making modules in robotic systems has gained extensive attention. However, despite the importance of active perception in embodied intelligence, there is little to no exploration of how MLLMs can be equipped with or learn active perception capabilities. In this paper, we first provide a systematic definition of MLLM-based active perception tasks. We point out that the recently proposed GPT-o3 model's zoom-in search strategy can be regarded as a special case of active perception; however, it still suffers from low search efficiency and inaccurate region selection. To address these issues, we propose ACTIVE-O3, a purely reinforcement learning based training framework built on top of GRPO, designed to equip MLLMs with active perception capabilities. We further establish a comprehensive benchmark suite to evaluate ACTIVE-O3 across both general open-world tasks, such as small-object and dense-object grounding, and domain-specific scenarios, including small-object detection in remote sensing and autonomous driving, as well as fine-grained interactive segmentation. In addition, ACTIVE-O3 demonstrates strong zero-shot reasoning abilities on the V* Benchmark, without relying on any explicit reasoning data. We hope that our work can provide a simple codebase and evaluation protocol to facilitate future research on active perception in MLLMs.
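
As background on GRPO (Group Relative Policy Optimization), which the framework builds on, the sketch below shows the group-relative advantage step as it is commonly described: several responses are sampled for the same input, and each reward is normalized against the mean and standard deviation of its group. This is a minimal illustration, not the ACTIVE-O3 implementation; the function name, the `eps` guard, the use of the population standard deviation, and the example rewards are all assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of one sampled group against the group statistics.

    Assumption: population standard deviation is used; some implementations
    use the sample standard deviation instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four responses sampled for the same query.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that score above the group mean receive a positive advantage and those below it a negative one, so no separate learned value function is needed to form the baseline.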

Automation of a Matching On-Shell Calculator

Authors:Javier López Miras, Fuensanta Vilches

Date:2025-05-27 15:46:42

We introduce $\texttt{mosca}$, a $\texttt{Mathematica}$ package designed to facilitate on-shell calculations in effective field theories (EFTs). This initial release focuses on the reduction of Green's bases to physical bases, as well as transformations between arbitrary operator bases. The core of the package is based on a diagrammatic on-shell matching procedure, grounded in the equivalence of physical observables derived from both redundant and non-redundant Lagrangians. $\texttt{mosca}$ offers a complete set of tools for performing basis transformations, diagram isomorphism detection, numerical substitution of kinematic configurations, and symbolic manipulation of algebraic expressions. Planned future developments include extension to one-loop computations, thus providing support for EFT renormalization directly in a physical basis and automated computation of one-loop finite matching, including contributions from evanescent operators. The package, along with example notebooks and documentation, is available at: https://gitlab.com/matchingonshell/mosca.

robostrategy: Field and Target Assignment Optimization in the Sloan Digital Sky Survey V

Authors:Michael R. Blanton, Joleen K. Carlberg, Tom Dwelly, Ilija Medan, S. Drew Chojnowski, Kevin Covey, Megan C. Davis, John Donor, Pramod Gupta, Alexander Ji, Jennifer A. Johnson, Juna A. Kollmeier, Jose Sanchez-Gallego, Conor Sayres, Eleonora Zari

Date:2025-05-27 15:23:49

We present an algorithmic method for efficiently planning a long-term, large-scale multi-object spectroscopy program. The Sloan Digital Sky Survey V (SDSS-V) Focal Plane System performs multi-object spectroscopy using 500 robotic positioners to place fibers feeding optical and infrared spectrographs across a wide field. SDSS-V uses this system to observe targets throughout the year at two observatories in support of the science goals of its Milky Way Mapper and Black Hole Mapper programs. These science goals require observations of objects over time with preferred temporal spacings (referred to as "cadences"), which can differ from object to object even in the same area of sky. robostrategy is the software we use to construct our planned observations so that they can best achieve the desired goals given the time available as a function of sky brightness and local sidereal time, and to assign fibers to targets during specific observations. We use linear programming techniques to seek optimal allocations of time under the constraints given. We present the methods and example results obtained with this software.

Collision Probability Estimation for Optimization-based Vehicular Motion Planning

Authors:Leon Tolksdorf, Arturo Tejada, Christian Birkner, Nathan van de Wouw

Date:2025-05-27 13:16:03

Many motion planning algorithms for automated driving require estimating the probability of collision (POC) to account for uncertainties in the measurement and estimation of the motion of road users. Common POC estimation techniques often utilize sampling-based methods that suffer from computational inefficiency and a non-deterministic estimation, i.e., each estimation result for the same inputs is slightly different. In contrast, optimization-based motion planning algorithms require computationally efficient POC estimation, ideally using deterministic estimation, such that typical optimization algorithms for motion planning retain feasibility. Estimating the POC analytically, however, is challenging because it depends on understanding the collision conditions (e.g., vehicle's shape) and characterizing the uncertainty in motion prediction. In this paper, we propose an approach in which we estimate the POC between two vehicles by over-approximating their shapes by a multi-circular shape approximation. The position and heading of the predicted vehicle are modelled as random variables, contrasting with the literature, where the heading angle is often neglected. We guarantee that the provided POC is an over-approximation, which is essential in providing safety guarantees, and present a computationally efficient algorithm for computing the POC estimate for Gaussian uncertainty in the position and heading. This algorithm is then used in a path-following stochastic model predictive controller (SMPC) for motion planning. With the proposed algorithm, the SMPC generates reproducible trajectories while the controller retains its feasibility in the presented test cases and demonstrates the ability to handle varying levels of uncertainty.
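
The multi-circular shape approximation mentioned above can be illustrated with a small geometric sketch. It covers a rectangular vehicle footprint with equal circles placed along its longitudinal axis and runs a deterministic overlap check between two vehicles. This shows only the shape over-approximation, not the paper's probability estimate; the function names, the choice of three circles, and the vehicle dimensions are assumptions made for illustration.

```python
import math


def cover_circles(cx, cy, heading, length, width, n=3):
    """Return (centers, radius) of n equal circles that together contain
    a length x width rectangle centered at (cx, cy) with the given heading."""
    seg = length / n
    # Each circle must reach the corners of its seg x width slice,
    # so the radius is half the diagonal of that slice.
    radius = math.hypot(seg / 2.0, width / 2.0)
    centers = []
    for i in range(n):
        offset = -length / 2.0 + (i + 0.5) * seg  # along the vehicle axis
        centers.append((cx + offset * math.cos(heading),
                        cy + offset * math.sin(heading)))
    return centers, radius


def circles_overlap(shape_a, shape_b):
    """True if any circle of one shape touches any circle of the other."""
    (centers_a, r_a), (centers_b, r_b) = shape_a, shape_b
    return any(math.dist(p, q) <= r_a + r_b
               for p in centers_a for q in centers_b)


# Hypothetical vehicles: 4.5 m long, 1.8 m wide, both heading along +x.
ego = cover_circles(0.0, 0.0, 0.0, 4.5, 1.8)
near = cover_circles(5.0, 0.0, 0.0, 4.5, 1.8)   # 0.5 m bumper-to-bumper gap
far = cover_circles(10.0, 0.0, 0.0, 4.5, 1.8)

print(circles_overlap(ego, near), circles_overlap(ego, far))
```

In this example the nearby vehicle is flagged even though the true rectangles are 0.5 m apart, which is the conservative behavior an over-approximation is meant to provide: it may report a collision that does not occur, but never misses one that does.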

A Reduction-Driven Local Search for the Generalized Independent Set Problem

Authors:Yiping Liu, Yi Zhou, Zhenxiang Xu, Mingyu Xiao, Jin-Kao Hao

Date:2025-05-27 11:39:05

The Generalized Independent Set (GIS) problem extends the classical maximum independent set problem by incorporating profits for vertices and penalties for edges. This generalized problem has been identified in diverse applications in fields such as forest harvest planning, competitive facility location, social network analysis, and even machine learning. However, solving the GIS problem in large-scale, real-world networks remains computationally challenging. In this paper, we explore data reduction techniques to address this challenge. We first propose 14 reduction rules that can reduce the input graph with rigorous optimality guarantees. We then present a reduction-driven local search (RLS) algorithm that integrates these reduction rules into the pre-processing, the initial solution generation, and the local search components in a computationally efficient way. The RLS is empirically evaluated on 278 graphs arising from different application scenarios. The results indicate that the RLS is highly competitive: for most graphs, it achieves significantly superior solutions compared to other known solvers, and it effectively provides solutions for graphs exceeding 260 million edges, a task at which every other known method fails. Analysis also reveals that the data reduction plays a key role in achieving such a competitive performance.
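
To make the objective concrete, here is a minimal sketch of how a GIS solution might be scored, with a brute-force search for tiny instances. It assumes the usual formulation in which the net benefit is the total profit of the selected vertices minus the penalties of edges whose endpoints are both selected, and in which some "permanent" edges may not have both endpoints selected at all; the abstract itself states only profits and penalties, so the permanent-edge handling is an assumption. All names and the example graph are hypothetical, and this is neither the paper's reduction rules nor its RLS algorithm.

```python
from itertools import combinations


def net_benefit(selected, profit, penalty, permanent=frozenset()):
    """Score a vertex set; return None if it violates a permanent edge."""
    chosen = set(selected)
    for u, v in permanent:
        if u in chosen and v in chosen:
            return None  # assumed hard constraint: both endpoints selected
    total = sum(profit[v] for v in chosen)
    # Pay the penalty of every removable edge with both endpoints selected.
    total -= sum(w for (u, v), w in penalty.items()
                 if u in chosen and v in chosen)
    return total


def brute_force(profit, penalty, permanent=frozenset()):
    """Exhaustive search over all vertex subsets; only viable for tiny graphs."""
    best_set, best_val = frozenset(), 0
    vertices = sorted(profit)
    for k in range(1, len(vertices) + 1):
        for subset in combinations(vertices, k):
            val = net_benefit(subset, profit, penalty, permanent)
            if val is not None and val > best_val:
                best_set, best_val = frozenset(subset), val
    return best_set, best_val


# Hypothetical instance: three vertices, one penalized edge, one permanent edge.
profit = {"a": 5, "b": 4, "c": 3}
penalty = {("a", "b"): 2}
permanent = {("b", "c")}

best_set, best_val = brute_force(profit, penalty, permanent)
print(sorted(best_set), best_val)
```

The exhaustive search grows as $2^n$ in the number of vertices, which is the scaling problem that motivates reduction rules and local search on large graphs.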

Cardiac Digital Twins at Scale from MRI: Open Tools and Representative Models from ~55000 UK Biobank Participants

Authors:Devran Ugurlu, Shuang Qian, Elliot Fairweather, Charlene Mauger, Bram Ruijsink, Laura Dal Toso, Yu Deng, Marina Strocchi, Reza Razavi, Alistair Young, Pablo Lamata, Steven Niederer, Martin Bishop

Date:2025-05-27 10:52:52

A cardiac digital twin is a virtual replica of a patient's heart for screening, diagnosis, prognosis, risk assessment, and treatment planning of cardiovascular diseases. This requires an anatomically accurate patient-specific 3D structural representation of the heart, suitable for electro-mechanical simulations or study of disease mechanisms. However, generation of cardiac digital twins at scale is demanding and there are no public repositories of models across demographic groups. We describe an automatic open-source pipeline for creating patient-specific left and right ventricular meshes from cardiovascular magnetic resonance images, its application to a large cohort of ~55000 participants from UK Biobank, and the construction of the most comprehensive cohort of adult heart models to date, comprising 1423 representative meshes across sex (male, female), body mass index (range: 16 - 42 kg/m$^2$) and age (range: 49 - 80 years). Our code is available at https://github.com/cdttk/biv-volumetric-meshing/tree/plos2025 , and pre-trained networks, representative volumetric meshes with fibers and UVCs will be made available soon.

RefAV: Towards Planning-Centric Scenario Mining

Authors:Cainan Davidson, Deva Ramanan, Neehar Peri

Date:2025-05-27 10:14:35

Autonomous Vehicles (AVs) collect and pseudo-label terabytes of multi-modal data localized to HD maps during normal fleet testing. However, identifying interesting and safety-critical scenarios from uncurated driving logs remains a significant challenge. Traditional scenario mining techniques are error-prone and prohibitively time-consuming, often relying on hand-crafted structured queries. In this work, we revisit spatio-temporal scenario mining through the lens of recent vision-language models (VLMs) to detect whether a described scenario occurs in a driving log and, if so, precisely localize it in both time and space. To address this problem, we introduce RefAV, a large-scale dataset of 10,000 diverse natural language queries that describe complex multi-agent interactions relevant to motion planning derived from 1000 driving logs in the Argoverse 2 Sensor dataset. We evaluate several referential multi-object trackers and present an empirical analysis of our baselines. Notably, we find that naively repurposing off-the-shelf VLMs yields poor performance, suggesting that scenario mining presents unique challenges. Our code and dataset are available at https://github.com/CainanD/RefAV/ and https://argoverse.github.io/user-guide/tasks/scenario_mining.html