planning - 2025-06-02

ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL

Authors:Yu Zhang, Yunqi Li, Yifan Yang, Rui Wang, Yuqing Yang, Dai Qi, Jianmin Bao, Dongdong Chen, Chong Luo, Lili Qiu
Date:2025-05-30 17:59:48

Although chain-of-thought reasoning and reinforcement learning (RL) have driven breakthroughs in NLP, their integration into generative vision models remains underexplored. We introduce ReasonGen-R1, a two-stage framework that first imbues an autoregressive image generator with explicit text-based "thinking" skills via supervised fine-tuning on a newly generated reasoning dataset of written rationales, and then refines its outputs using Group Relative Policy Optimization. To enable the model to reason through text before generating images, We automatically generate and release a corpus of model crafted rationales paired with visual prompts, enabling controlled planning of object layouts, styles, and scene compositions. Our GRPO algorithm uses reward signals from a pretrained vision language model to assess overall visual quality, optimizing the policy in each update. Evaluations on GenEval, DPG, and the T2I benchmark demonstrate that ReasonGen-R1 consistently outperforms strong baselines and prior state-of-the-art models. More: aka.ms/reasongen.

GenSpace: Benchmarking Spatially-Aware Image Generation

Authors:Zehan Wang, Jiayang Xu, Ziang Zhang, Tianyu Pan, Chao Du, Hengshuang Zhao, Zhou Zhao
Date:2025-05-30 17:59:26

Humans can intuitively compose and arrange scenes in the 3D space for photography. However, can advanced AI image generators plan scenes with similar 3D spatial awareness when creating images from text or image prompts? We present GenSpace, a novel benchmark and evaluation pipeline to comprehensively assess the spatial awareness of current image generation models. Furthermore, standard evaluations using general Vision-Language Models (VLMs) frequently fail to capture the detailed spatial errors. To handle this challenge, we propose a specialized evaluation pipeline and metric, which reconstructs 3D scene geometry using multiple visual foundation models and provides a more accurate and human-aligned metric of spatial faithfulness. Our findings show that while AI models create visually appealing images and can follow general instructions, they struggle with specific 3D details like object placement, relationships, and measurements. We summarize three core limitations in the spatial perception of current state-of-the-art image generation models: 1) Object Perspective Understanding, 2) Egocentric-Allocentric Transformation and 3) Metric Measurement Adherence, highlighting possible directions for improving spatial intelligence in image generation.

RealDrive: Retrieval-Augmented Driving with Diffusion Models

Authors:Wenhao Ding, Sushant Veer, Yuxiao Chen, Yulong Cao, Chaowei Xiao, Marco Pavone
Date:2025-05-30 17:15:03

Learning-based planners generate natural human-like driving behaviors by learning to reason about nuanced interactions from data, overcoming the rigid behaviors that arise from rule-based planners. Nonetheless, data-driven approaches often struggle with rare, safety-critical scenarios and offer limited controllability over the generated trajectories. To address these challenges, we propose RealDrive, a Retrieval-Augmented Generation (RAG) framework that initializes a diffusion-based planning policy by retrieving the most relevant expert demonstrations from the training dataset. By interpolating between current observations and retrieved examples through a denoising process, our approach enables fine-grained control and safe behavior across diverse scenarios, leveraging the strong prior provided by the retrieved scenario. Another key insight we produce is that a task-relevant retrieval model trained with planning-based objectives results in superior planning performance in our framework compared to a task-agnostic retriever. Experimental results demonstrate improved generalization to long-tail events and enhanced trajectory diversity compared to standard learning-based planners -- we observe a 40% reduction in collision rate on the Waymo Open Motion dataset with RAG.

Unsupervised Evolutionary Cell Type Matching via Entropy-Minimized Optimal Transport

Authors:Mu Qiao
Date:2025-05-30 16:20:00

Identifying evolutionary correspondences between cell types across species is a fundamental challenge in comparative genomics and evolutionary biology. Existing approaches often rely on either reference-based matching, which imposes asymmetry by designating one species as the reference, or projection-based matching, which may increase computational complexity and obscure biological interpretability at the cell-type level. Here, we present OT-MESH, an unsupervised computational framework leveraging entropy-regularized optimal transport (OT) to systematically determine cross-species cell type homologies. Our method uniquely integrates the Minimize Entropy of Sinkhorn (MESH) technique to refine the OT plan. It begins by selecting genes with high Signal-to-Noise Ratio (SNR) to capture the most informative features, from which a cost matrix is constructed using cosine distances between cell-type centroids. Importantly, the MESH procedure iteratively refines the cost matrix, leading to a transport plan with significantly enhanced sparsity and interpretability of the resulting correspondence matrices. Applied to retinal bipolar cells (BCs) and retinal ganglion cells (RGCs) from mouse and macaque, OT-MESH accurately recovers known evolutionary relationships and uncovers novel correspondences, one of which was independently validated experimentally. Thus, our framework offers a principled, scalable, symmetric, and interpretable solution for evolutionary cell type mapping, facilitating deeper insights into cellular specialization and conservation across species.

EL-AGHF: Extended Lagrangian Affine Geometric Heat Flow

Authors:Sangmin Kim, Hae-Won Park
Date:2025-05-30 16:12:20

We propose a constrained Affine Geometric Heat Flow (AGHF) method that evolves so as to suppress the dynamics gaps associated with inadmissible control directions. AGHF provides a unified framework applicable to a wide range of motion planning problems, including both holonomic and non-holonomic systems. However, to generate admissible trajectories, it requires assigning infinite penalties to inadmissible control directions. This design choice, while theoretically valid, often leads to high computational cost or numerical instability when the penalty becomes excessively large. To overcome this limitation, we extend AGHF in an Augmented Lagrangian method approach by introducing a dual trajectory related to dynamics gaps in inadmissible control directions. This method solves the constrained variational problem as an extended parabolic partial differential equation defined over both the state and dual trajectorys, ensuring the admissibility of the resulting trajectory. We demonstrate the effectiveness of our algorithm through simulation examples.

Assessing Future Wind Energy Potential under Climate Change: The Critical Role of Multi-Model Ensembles in Robustness Assessment

Authors:Andrea Lira-Loarca, Francesco Ferrari, Andrea Mazzino
Date:2025-05-30 11:07:18

Accurate projections of wind energy potential under climate change are critical for effective long-term energy planning. While previous studies have highlighted the value of multi-model ensembles, they often fall short in capturing the full spectrum of uncertainties and temporal dynamics relevant to wind resource reliability. This paper presents one of the most comprehensive assessments to date, leveraging a large ensemble of 21 high-resolution RCM-GCM combinations from the EURO-CORDEX initiative to evaluate future wind energy conditions across Europe under the RCP8.5 scenario. Moving beyond mean values, we incorporate a novel event-based framework to analyze persistent high- and low-wind episodes using ERA5-derived percentile thresholds -- capturing operationally critical conditions that influence turbine performance and grid stability. To ensure statistical rigor, we apply the IPCC AR6 `Approach C' for robustness assessment, distinguishing climate signals from internal variability and quantifying model agreement. Crucially, we demonstrate that projections based on limited sub-ensembles can lead to contradictory or misleading conclusions, underscoring the essential role of ensemble diversity. The combination of spatial granularity, temporal detail, and formal uncertainty quantification makes this study a significant advancement in climate-informed wind energy research and a valuable tool for resilient energy system design.

SAH-Drive: A Scenario-Aware Hybrid Planner for Closed-Loop Vehicle Trajectory Generation

Authors:Yuqi Fan, Zhiyong Cui, Zhenning Li, Yilong Ren, Haiyang Yu
Date:2025-05-30 09:19:39

Reliable planning is crucial for achieving autonomous driving. Rule-based planners are efficient but lack generalization, while learning-based planners excel in generalization yet have limitations in real-time performance and interpretability. In long-tail scenarios, these challenges make planning particularly difficult. To leverage the strengths of both rule-based and learning-based planners, we proposed the Scenario-Aware Hybrid Planner (SAH-Drive) for closed-loop vehicle trajectory planning. Inspired by human driving behavior, SAH-Drive combines a lightweight rule-based planner and a comprehensive learning-based planner, utilizing a dual-timescale decision neuron to determine the final trajectory. To enhance the computational efficiency and robustness of the hybrid planner, we also employed a diffusion proposal number regulator and a trajectory fusion module. The experimental results show that the proposed method significantly improves the generalization capability of the planning system, achieving state-of-the-art performance in interPlan, while maintaining computational efficiency without incurring substantial additional runtime.

Spatiotemporal Analysis of Forest Machine Operations Using 3D Video Classification

Authors:Maciej Wielgosz, Simon Berg, Heikki Korpunen, Stephan Hoffmann
Date:2025-05-30 09:07:57

This paper presents a deep learning-based framework for classifying forestry operations from dashcam video footage. Focusing on four key work elements - crane-out, cutting-and-to-processing, driving, and processing - the approach employs a 3D ResNet-50 architecture implemented with PyTorchVideo. Trained on a manually annotated dataset of field recordings, the model achieves strong performance, with a validation F1 score of 0.88 and precision of 0.90. These results underscore the effectiveness of spatiotemporal convolutional networks for capturing both motion patterns and appearance in real-world forestry environments. The system integrates standard preprocessing and augmentation techniques to improve generalization, but overfitting is evident, highlighting the need for more training data and better class balance. Despite these challenges, the method demonstrates clear potential for reducing the manual workload associated with traditional time studies, offering a scalable solution for operational monitoring and efficiency analysis in forestry. This work contributes to the growing application of AI in natural resource management and sets the foundation for future systems capable of real-time activity recognition in forest machinery. Planned improvements include dataset expansion, enhanced regularization, and deployment trials on embedded systems for in-field use.

Imitation Learning-Based Path Generation for the Complex Assembly of Deformable Objects

Authors:Yitaek Kim, Christoffer Sloth
Date:2025-05-30 08:29:03

This paper investigates how learning can be used to ease the design of high-quality paths for the assembly of deformable objects. Object dynamics plays an important role when manipulating deformable objects; thus, detailed models are often used when conducting motion planning for deformable objects. We propose to use human demonstrations and learning to enable motion planning of deformable objects with only simple dynamical models of the objects. In particular, we use the offline collision-free path planning, to generate a large number of reference paths based on a simple model of the deformable object. Subsequently, we execute the collision-free paths on a robot with a compliant control such that a human can slightly modify the path to complete the task successfully. Finally, based on the virtual path data sets and the human corrected ones, we use behavior cloning (BC) to create a dexterous policy that follows one reference path to finish a given task.

DTR: Delaunay Triangulation-based Racing for Scaled Autonomous Racing

Authors:Luca Tognoni, Neil Reichlin, Edoardo Ghignone, Nicolas Baumann, Steven Marty, Liam Boyle, Michele Magno
Date:2025-05-30 08:02:49

Reactive controllers for autonomous racing avoid the computational overhead of full ee-Think-Act autonomy stacks by directly mapping sensor input to control actions, eliminating the need for localization and planning. A widely used reactive strategy is FTG, which identifies gaps in LiDAR range measurements and steers toward a chosen one. While effective on fully bounded circuits, FTG fails in scenarios with incomplete boundaries and is prone to driving into dead-ends, known as FTG-traps. This work presents DTR, a reactive controller that combines Delaunay triangulation, from raw LiDAR readings, with track boundary segmentation to extract a centerline while systematically avoiding FTG-traps. Compared to FTG, the proposed method achieves lap times that are 70\% faster and approaches the performance of map-dependent methods. With a latency of 8.95 ms and CPU usage of only 38.85\% on the robot's OBC, DTR is real-time capable and has been successfully deployed and evaluated in field experiments.

SSCard: Substring Cardinality Estimation using Suffix Tree-Guided Learned FM-Index

Authors:Yirui Zhan, Wen Nie, Jun Gao
Date:2025-05-30 07:52:06

Accurate cardinality estimation of substring queries, which are commonly expressed using the SQL LIKE predicate, is crucial for query optimization in database systems. While both rule-based methods and machine learning-based methods have been developed to optimize various aspects of cardinality estimation, their absence of error bounds may result in substantial estimation errors, leading to suboptimal execution plans. In this paper, we propose SSCard, a novel SubString Cardinality estimator that leverages a space-efficient FM-Index into flexible database applications. SSCard first extends the FM-Index to support multiple strings naturally, and then organizes the FM-index using a pruned suffix tree. The suffix tree structure enables precise cardinality estimation for short patterns and achieves high compression via a pushup operation, especially on a large alphabet with skewed character distributions. Furthermore, SSCard incorporates a spline interpolation method with an error bound to balance space usage and estimation accuracy. Additional innovations include a bidirectional estimation algorithm and incremental update strategies. Extensive experimental results in five real-life datasets show that SSCard outperforms both traditional methods and recent learning-based methods, which achieves an average reduction of 20% in the average q-error, 80% in the maximum q-error, and 50% in the construction time, compared with second-best approaches.

Generative AI for Urban Design: A Stepwise Approach Integrating Human Expertise with Multimodal Diffusion Models

Authors:Mingyi He, Yuebing Liang, Shenhao Wang, Yunhan Zheng, Qingyi Wang, Dingyi Zhuang, Li Tian, Jinhua Zhao
Date:2025-05-30 06:33:48

Urban design is a multifaceted process that demands careful consideration of site-specific constraints and collaboration among diverse professionals and stakeholders. The advent of generative artificial intelligence (GenAI) offers transformative potential by improving the efficiency of design generation and facilitating the communication of design ideas. However, most existing approaches are not well integrated with human design workflows. They often follow end-to-end pipelines with limited control, overlooking the iterative nature of real-world design. This study proposes a stepwise generative urban design framework that integrates multimodal diffusion models with human expertise to enable more adaptive and controllable design processes. Instead of generating design outcomes in a single end-to-end process, the framework divides the process into three key stages aligned with established urban design workflows: (1) road network and land use planning, (2) building layout planning, and (3) detailed planning and rendering. At each stage, multimodal diffusion models generate preliminary designs based on textual prompts and image-based constraints, which can then be reviewed and refined by human designers. We design an evaluation framework to assess the fidelity, compliance, and diversity of the generated designs. Experiments using data from Chicago and New York City demonstrate that our framework outperforms baseline models and end-to-end approaches across all three dimensions. This study underscores the benefits of multimodal diffusion models and stepwise generation in preserving human control and facilitating iterative refinements, laying the groundwork for human-AI interaction in urban design solutions.

Safety-Aware Robust Model Predictive Control for Robotic Arms in Dynamic Environments

Authors:Sanghyeon Nam, Dongmin Kim, Seung-Hwan Choi, Chang-Hyun Kim, Hyoeun Kwon, Hiroaki Kawamoto, Suwoong Lee
Date:2025-05-30 04:41:28

Robotic manipulators are essential for precise industrial pick-and-place operations, yet planning collision-free trajectories in dynamic environments remains challenging due to uncertainties such as sensor noise and time-varying delays. Conventional control methods often fail under these conditions, motivating the development of Robust MPC (RMPC) strategies with constraint tightening. In this paper, we propose a novel RMPC framework that integrates phase-based nominal control with a robust safety mode, allowing smooth transitions between safe and nominal operations. Our approach dynamically adjusts constraints based on real-time predictions of moving obstacles\textemdash whether human, robot, or other dynamic objects\textemdash thus ensuring continuous, collision-free operation. Simulation studies demonstrate that our controller improves both motion naturalness and safety, achieving faster task completion than conventional methods.

S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Modelwith Spatio-Temporal Visual Representation

Authors:Yichen Xie, Runsheng Xu, Tong He, Jyh-Jing Hwang, Katie Luo, Jingwei Ji, Hubert Lin, Letian Chen, Yiren Lu, Zhaoqi Leng, Dragomir Anguelov, Mingxing Tan
Date:2025-05-30 02:20:14

The latest advancements in multi-modal large language models (MLLMs) have spurred a strong renewed interest in end-to-end motion planning approaches for autonomous driving. Many end-to-end approaches rely on human annotations to learn intermediate perception and prediction tasks, while purely self-supervised approaches--which directly learn from sensor inputs to generate planning trajectories without human annotations often underperform the state of the art. We observe a key gap in the input representation space: end-to-end approaches built on MLLMs are often pretrained with reasoning tasks in 2D image space rather than the native 3D space in which autonomous vehicles plan. To this end, we propose S4-Driver, a scalable self-supervised motion planning algorithm with spatio-temporal visual representation, based on the popular PaLI multimodal large language model. S4-Driver uses a novel sparse volume strategy to seamlessly transform the strong visual representation of MLLMs from perspective view to 3D space without the need to finetune the vision encoder. This representation aggregates multi-view and multi-frame visual inputs and enables better prediction of planning trajectories in 3D space. To validate our method, we run experiments on both nuScenes and Waymo Open Motion Dataset (with in-house camera data). Results show that S4-Driver performs favorably against existing supervised multi-task approaches while requiring no human annotations. It also demonstrates great scalability when pretrained on large volumes of unannotated driving logs.

Humanoid Loco-Manipulations Pattern Generation and Stabilization Control

Authors:Masaki Murooka, Kevin Chappellet, Arnaud Tanguy, Mehdi Benallegue, Iori Kumagai, Mitsuharu Morisawa, Fumio Kanehiro, Abderrahmane Kheddar
Date:2025-05-30 01:27:09

In order for a humanoid robot to perform loco-manipulation such as moving an object while walking, it is necessary to account for sustained or alternating external forces other than ground-feet reaction, resulting from humanoid-object contact interactions. In this letter, we propose a bipedal control strategy for humanoid loco-manipulation that can cope with such external forces. First, the basic formulas of the bipedal dynamics, i.e., linear inverted pendulum mode and divergent component of motion, are derived, taking into account the effects of external manipulation forces. Then, we propose a pattern generator to plan center of mass trajectories consistent with the reference trajectory of the manipulation forces, and a stabilizer to compensate for the error between desired and actual manipulation forces. The effectiveness of our controller is assessed both in simulation and loco-manipulation experiments with real humanoid robots.

Distributed Neural Policy Gradient Algorithm for Global Convergence of Networked Multi-Agent Reinforcement Learning

Authors:Pengcheng Dai, Yuanqiu Mo, Wenwu Yu, Wei Ren
Date:2025-05-30 01:23:14

This paper studies the networked multi-agent reinforcement learning (NMARL) problem, where the objective of agents is to collaboratively maximize the discounted average cumulative rewards. Different from the existing methods that suffer from poor expression due to linear function approximation, we propose a distributed neural policy gradient algorithm that features two innovatively designed neural networks, specifically for the approximate Q-functions and policy functions of agents. This distributed neural policy gradient algorithm consists of two key components: the distributed critic step and the decentralized actor step. In the distributed critic step, agents receive the approximate Q-function parameters from their neighboring agents via a time-varying communication networks to collaboratively evaluate the joint policy. In contrast, in the decentralized actor step, each agent updates its local policy parameter solely based on its own approximate Q-function. In the convergence analysis, we first establish the global convergence of agents for the joint policy evaluation in the distributed critic step. Subsequently, we rigorously demonstrate the global convergence of the overall distributed neural policy gradient algorithm with respect to the objective function. Finally, the effectiveness of the proposed algorithm is demonstrated by comparing it with a centralized algorithm through simulation in the robot path planning environment.

A SHAP-based explainable multi-level stacking ensemble learning method for predicting the length of stay in acute stroke

Authors:Zhenran Xu
Date:2025-05-30 01:08:26

Length of stay (LOS) prediction in acute stroke is critical for improving care planning. Existing machine learning models have shown suboptimal predictive performance, limited generalisability, and have overlooked system-level factors. We aimed to enhance model efficiency, performance, and interpretability by refining predictors and developing an interpretable multi-level stacking ensemble model. Data were accessed from the biennial Stroke Foundation Acute Audit (2015, 2017, 2019, 2021) in Australia. Models were developed for ischaemic and haemorrhagic stroke separately. The outcome was prolonged LOS (the LOS above the 75th percentile). Candidate predictors (ischaemic: n=89; haemorrhagic: n=83) were categorised into patient, clinical, and system domains. Feature selection with correlation-based approaches was used to refine key predictors. The evaluation of models included discrimination (AUC), calibration curves, and interpretability (SHAP plots). In ischaemic stroke (N=12,575), prolonged LOS was >=9 days, compared to >=11 days in haemorrhagic stroke (N=1,970). The ensemble model achieved superior performance [AUC: 0.824 (95% CI: 0.801-0.846)] and statistically outperformed logistic regression [AUC: 0.805 (95% CI: 0.782-0.829); P=0.0004] for ischaemic. However, the model [AUC: 0.843 (95% CI: 0.790-0.895)] did not statistically outperform logistic regression [AUC: 0.828 (95% CI: 0.774-0.882); P=0.136] for haemorrhagic. SHAP analysis identified shared predictors for both types of stroke: rehabilitation assessment, urinary incontinence, stroke unit care, inability to walk independently, physiotherapy, and stroke care coordinators involvement. An explainable ensemble model effectively predicted the prolonged LOS in ischaemic stroke. Further validation in larger cohorts is needed for haemorrhagic stroke.

A Hetero-functional Graph Theory Perspective of Engineering Management of Mega-Projects

Authors:Amirreza Hosseini, Amro M. Farid
Date:2025-05-29 22:28:03

Megaprojects are large-scale, complex, and one-off engineering endeavors that require significant investments from a public or private sector. Such projects generally cost more than a billion dollars, take many years to develop and construct, involve stakeholders both in the public and private sectors, and impact millions of people. Most of the extant megaproject research is concerned with understanding why the engineering management of megaprojects fails so frequently and which dimensions make them so difficult to manage, including size, uncertainty, complexity, urgency, and institutional structure \cite{denicol:2020:00}. Recently, the literature on mega-projects has advocated for a convergence of the engineering management and production system management literature. To that end, this paper proposes the use of Model-Based System Engineering (MBSE) and Hetero-Functional Graph Theory (HFGT), where the latter, quite interestingly, finds its origins in the mass-customized production system literature. More specifically, HFGT was developed so that the physical and informatic parts of production system planning, operations, and decision-making are readily reconfigured to support production customization at scale. As the literature on megaprojects is rapidly evolving with a significant amount of divergence between authors, this report builds upon the recent and extensive megaproject literature review provided by Denicol et. al. \cite{denicol:2020:00}. The paper concludes that MBSE and HFGT provide a means for addressing many of the concluding recommendations provided by Denicol et. al. MBSE and HFGT not only align with current research on megaprojects but also push the boundaries of how the engineering management of megaprojects can gain a unified theoretical foundation.

Exploiting Euclidean Distance Field Properties for Fast and Safe 3D planning with a modified Lazy Theta*

Authors:Jose A. Cobano, L. Merino, F. Caballero
Date:2025-05-29 21:51:02

Graph search planners have been widely used for 3D path planning in the literature, and Euclidean Distance Fields (EDFs) are increasingly being used as a representation of the environment. However, to the best of our knowledge, the integration of EDFs into heuristic planning has been carried out in a loosely coupled fashion, dismissing EDF properties that can be used to accelerate/improve the planning process and enhance the safety margins of the resultant trajectories. This paper presents a fast graph search planner based on a modified Lazy Theta* planning algorithm for aerial robots in challenging 3D environments that exploits the EDF properties. The proposed planner outperforms classic graph search planners in terms of path smoothness and safety. It integrates EDFs as environment representation and directly generates fast and smooth paths avoiding the use of post-processing methods; it also considers the analytical properties of EDFs to obtain an approximation of the EDF cost along the line-of-sight segments and to reduce the number of visibility neighbours, which directly impacts the computation time. Moreover, we demonstrate that the proposed EDF-based cost function satisfies the triangle inequality, which reduces calculations during exploration and, hence, computation time. Many experiments and comparatives are carried out in 3D challenging indoor and outdoor simulation environments to evaluate and validate the proposed planner. The results show an efficient and safe planner in these environments.

Improved Accuracy in Pelvic Tumor Resections Using a Real-Time Vision-Guided Surgical System

Authors:Vahid Danesh, Paul Arauz, Maede Boroji, Andrew Zhu, Mia Cottone, Elaine Gould, Fazel A. Khan, Imin Kao
Date:2025-05-29 20:26:32

Pelvic bone tumor resections remain significantly challenging due to complex three-dimensional anatomy and limited surgical visualization. Current navigation systems and patient-specific instruments, while accurate, present limitations including high costs, radiation exposure, workflow disruption, long production time, and lack of reusability. This study evaluates a real-time vision-guided surgical system combined with modular jigs to improve accuracy in pelvic bone tumor resections. A vision-guided surgical system combined with modular cutting jigs and real-time optical tracking was developed and validated. Five female pelvis sawbones were used, with each hemipelvis randomly assigned to either the vision-guided and modular jig system or traditional freehand method. A total of twenty resection planes were analyzed for each method. Accuracy was assessed by measuring distance and angular deviations from the planned resection planes. The vision-guided and modular jig system significantly improved resection accuracy compared to the freehand method, reducing the mean distance deviation from 2.07 $\pm$ 1.71 mm to 1.01 $\pm$ 0.78 mm (p=0.0193). In particular, all specimens resected using the vision-guided system exhibited errors of less than 3 mm. Angular deviations also showed significant improvements with roll angle deviation reduced from 15.36 $\pm$ 17.57$^\circ$ to 4.21 $\pm$ 3.46$^\circ$ (p=0.0275), and pitch angle deviation decreased from 6.17 $\pm$ 4.58$^\circ$ to 1.84 $\pm$ 1.48$^\circ$ (p<0.001). The proposed vision-guided and modular jig system significantly improves the accuracy of pelvic bone tumor resections while maintaining workflow efficiency. This cost-effective solution provides real-time guidance without the need for referencing external monitors, potentially improving surgical outcomes in complex pelvic bone tumor cases.

Physics beyond the Standard Model with the DSA-2000

Authors:Kim V. Berghaus, Yufeng Du, Vincent S. H. Lee, Anirudh Prabhu, Robert Reischke, Liam Connor, Kathryn M. Zurek
Date:2025-05-29 18:00:00

The upcoming Deep Synoptic Array 2000 (DSA-2000) will map the radio sky at $0.7-2$ GHz ($2.9 - 8.3 \, \mu$eV) with unprecedented sensitivity. This will enable searches for dark matter and other physics beyond the Standard Model, of which we study four cases: axions, dark photons, dark matter subhalos and neutrino masses. We forecast DSA-2000's potential to detect axions through two mechanisms in neutron star magnetospheres: photon conversion of axion dark matter and radio emission from axion clouds, developing the first analytical treatment of the latter. We also forecast DSA-2000's sensitivity to discover kinetically mixed dark photons from black hole superradiance, constrain dark matter substructure and fifth forces through pulsar timing, and improve cosmological neutrino mass inference through fast radio burst dispersion measurements. Our analysis indicates that in its planned five year run the DSA-2000 could reach sensitivity to QCD axion parameters, improve current limits on compact dark matter by an order of magnitude, and enhance cosmological weak lensing neutrino mass constraints by a factor of three.

Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models

Authors:Haohan Chi, Huan-ang Gao, Ziming Liu, Jianing Liu, Chenyu Liu, Jinwei Li, Kaisen Yang, Yangcheng Yu, Zeda Wang, Wenyi Li, Leichen Wang, Xingtao Hu, Hao Sun, Hang Zhao, Hao Zhao
Date:2025-05-29 17:59:46

Vision-Language-Action (VLA) models for autonomous driving show promise but falter in unstructured corner case scenarios, largely due to a scarcity of targeted benchmarks. To address this, we introduce Impromptu VLA. Our core contribution is the Impromptu VLA Dataset: over 80,000 meticulously curated video clips, distilled from over 2M source clips sourced from 8 open-source large-scale datasets. This dataset is built upon our novel taxonomy of four challenging unstructured categories and features rich, planning-oriented question-answering annotations and action trajectories. Crucially, experiments demonstrate that VLAs trained with our dataset achieve substantial performance gains on established benchmarks--improving closed-loop NeuroNCAP scores and collision rates, and reaching near state-of-the-art L2 accuracy in open-loop nuScenes trajectory prediction. Furthermore, our Q&A suite serves as an effective diagnostic, revealing clear VLM improvements in perception, prediction, and planning. Our code, data and models are available at https://github.com/ahydchh/Impromptu-VLA.

Computerized Modeling of Electrophysiology and Pathoelectrophysiology of the Atria -- How Much Detail is Needed?

Authors:Olaf Dössel, Axel Loewe
Date:2025-05-29 17:51:40

This review focuses on the computerized modeling of the electrophysiology of the human atria, emphasizing the simulation of common arrhythmias such as atrial flutter (AFlut) and atrial fibrillation (AFib). Which components of the model are necessary to accurately model arrhythmogenic tissue modifications, including remodeling, cardiomyopathy, and fibrosis, to ensure reliable simulations? The central question explored is the level of detail required for trustworthy simulations for a specific context of use. The review discusses the balance between model complexity and computational efficiency, highlighting the risks of oversimplification and excessive detail. It covers various aspects of atrial modeling, from cellular to whole atria levels, including the influence of atrial geometry, fiber direction, anisotropy, and wall thickness on simulation outcomes. The article also examines the impact of different modeling approaches, such as volumetric 3D models, bilayer models, and single surface models, on the realism of simulations. In addition, it reviews the latest advances in the modeling of fibrotic tissue and the verification and validation of atrial models. The intended use of these models in planning and optimization of atrial ablation strategies is discussed, with a focus on personalized modeling for individual patients and cohort-based approaches for broader applications. The review concludes by emphasizing the importance of integrating experimental data and clinical validation to enhance the utility of computerized atrial models to improve patient outcomes.

Dual-Task Graph Neural Network for Joint Seizure Onset Zone Localization and Outcome Prediction using Stereo EEG

Authors:Syeda Abeera Amir, Artur Agaronyan, William Gaillard, Chima Oluigbo, Syed Muhammad Anwar
Date:2025-05-29 17:14:28

Accurately localizing the brain regions that triggers seizures and predicting whether a patient will be seizure-free after surgery are vital for surgical planning and patient management in drug-resistant epilepsy. Stereo-electroencephalography (sEEG) delivers high-fidelity intracranial recordings that enable clinicians to precisely locate epileptogenic networks. However, the clinical identification is subjective and dependent on the expertise of the clinical team. Data driven approaches in this domain are sparse, despite the fact that sEEG offers high temporal-fidelity related to seizure dynamics that can be leveraged using graph structures ideal for imitating brain networks. In this study, we introduce a dual-task graph-neural network (GNN) framework that operates on windowed sEEG recordings to jointly predict seizure-freedom outcomes and identify seizure-onset-zone (SOZ) channels. We assemble non-overlapping 10 second windows from 51 clinical seizures spread across 20 pediatric patients, with sEEG data annotated by clinical experts. For each temporal window we construct a functional connectivity graph via thresholded Pearson correlations and extract rich node features (spectral, statistical, wavelet, Hjorth and local graph features), alongside six global graph descriptors. We optimize a combined cross-entropy loss with a tunable task-weight, and select model hyper-parameters via Optuna. Under window-level 10-fold cross-validation, the model achieves a mean graph-level accuracy of $89.31 \pm 0.0976 \%$ for seizure-freedom prediction and a node-level SOZ localization accuracy of $94.72. \pm 0.0041 \%$. For the best performing model, we ran additive and leave-one-out ablation studies to explore feature importance for graph and node-level accuracy.

Inference-time Scaling of Diffusion Models through Classical Search

Authors:Xiangcheng Zhang, Haowei Lin, Haotian Ye, James Zou, Jianzhu Ma, Yitao Liang, Yilun Du
Date:2025-05-29 16:22:40

Classical search algorithms have long underpinned modern artificial intelligence. In this work, we tackle the challenge of inference-time control in diffusion models -- adapting generated outputs to meet diverse test-time objectives -- using principles from classical search. We propose a general framework that orchestrates local and global search to efficiently navigate the generative space. It employs a theoretically grounded local search via annealed Langevin MCMC and performs compute-efficient global exploration using breadth-first and depth-first tree search. We evaluate our approach on a range of challenging domains, including planning, offline reinforcement learning, and image generation. Across all tasks, we observe significant gains in both performance and efficiency. These results show that classical search provides a principled and practical foundation for inference-time scaling in diffusion models. Project page at diffusion-inference-scaling.github.io.

MAPLE: A Mobile Assistant with Persistent Finite State Machines for Recovery Reasoning

Authors:Linqiang Guo, Wei Liu, Yi Wen Heng, Tse-Hsun, Chen, Yang Wang
Date:2025-05-29 16:08:51

Mobile GUI agents aim to autonomously complete user-instructed tasks across mobile apps. Recent advances in Multimodal Large Language Models (MLLMs) enable these agents to interpret UI screens, identify actionable elements, and perform interactions such as tapping or typing. However, existing agents remain reactive: they reason only over the current screen and lack a structured model of app navigation flow, limiting their ability to understand context, detect unexpected outcomes, and recover from errors. We present MAPLE, a state-aware multi-agent framework that abstracts app interactions as a Finite State Machine (FSM). We computationally model each UI screen as a discrete state and user actions as transitions, allowing the FSM to provide a structured representation of the app execution. MAPLE consists of specialized agents responsible for four phases of task execution: planning, execution, verification, error recovery, and knowledge retention. These agents collaborate to dynamically construct FSMs in real time based on perception data extracted from the UI screen, allowing the GUI agents to track navigation progress and flow, validate action outcomes through pre- and post-conditions of the states, and recover from errors by rolling back to previously stable states. Our evaluation results on two challenging cross-app benchmarks, Mobile-Eval-E and SPA-Bench, show that MAPLE outperforms the state-of-the-art baseline, improving task success rate by up to 12%, recovery success by 13.8%, and action accuracy by 6.5%. Our results highlight the importance of structured state modeling in guiding mobile GUI agents during task execution. Moreover, our FSM representation can be integrated into future GUI agent architectures as a lightweight, model-agnostic memory layer to support structured planning, execution verification, and error recovery.

Individual differences in the cognitive mechanisms of planning strategy discovery

Authors:Ruiqi He, Falk Lieder
Date:2025-05-29 14:57:34

People employ efficient planning strategies. But how are these strategies acquired? Previous research suggests that people can discover new planning strategies through learning from reinforcements, a process known as metacognitive reinforcement learning (MCRL). While prior work has shown that MCRL models can learn new planning strategies and explain more participants' experience-driven discovery better than alternative mechanisms, it also revealed significant individual differences in metacognitive learning. Furthermore, when fitted to human data, these models exhibit a slower rate of strategy discovery than humans. In this study, we investigate whether incorporating cognitive mechanisms that might facilitate human strategy discovery can bring models of MCRL closer to human performance. Specifically, we consider intrinsically generated metacognitive pseudo-rewards, subjective effort valuation, and termination deliberation. Analysis of planning task data shows that a larger proportion of participants used at least one of these mechanisms, with significant individual differences in their usage and varying impacts on strategy discovery. Metacognitive pseudo-rewards, subjective effort valuation, and learning the value of acting without further planning were found to facilitate strategy discovery. While these enhancements provided valuable insights into individual differences and the effect of these mechanisms on strategy discovery, they did not fully close the gap between model and human performance, prompting further exploration of additional factors that people might use to discover new planning strategies.

Humanoid Loco-manipulation Planning based on Graph Search and Reachability Maps

Authors:Masaki Murooka, Iori Kumagai, Mitsuharu Morisawa, Fumio Kanehiro, Abderrahmane Kheddar
Date:2025-05-29 14:48:25

In this letter, we propose an efficient and highly versatile loco-manipulation planning for humanoid robots. Loco-manipulation planning is a key technological brick enabling humanoid robots to autonomously perform object transportation by manipulating them. We formulate planning of the alternation and sequencing of footsteps and grasps as a graph search problem with a new transition model that allows for a flexible representation of loco-manipulation. Our transition model is quickly evaluated by relocating and switching the reachability maps depending on the motion of both the robot and object. We evaluate our approach by applying it to loco-manipulation use-cases, such as a bobbin rolling operation with regrasping, where the motion is automatically planned by our framework.

Long Duration Inspection of GNSS-Denied Environments with a Tethered UAV-UGV Marsupial System

Authors:Simón Martínez-Rozas, David Alejo, José Javier Carpio, Fernando Caballero, Luis Merino
Date:2025-05-29 14:05:25

Unmanned Aerial Vehicles (UAVs) have become essential tools in inspection and emergency response operations due to their high maneuverability and ability to access hard-to-reach areas. However, their limited battery life significantly restricts their use in long-duration missions. This paper presents a novel tethered marsupial robotic system composed of a UAV and an Unmanned Ground Vehicle (UGV), specifically designed for autonomous, long-duration inspection tasks in Global Navigation Satellite System (GNSS)-denied environments. The system extends the UAV's operational time by supplying power through a tether connected to high-capacity battery packs carried by the UGV. We detail the hardware architecture based on off-the-shelf components to ensure replicability and describe our full-stack software framework, which is composed of open-source components and built upon the Robot Operating System (ROS). The proposed software architecture enables precise localization using a Direct LiDAR Localization (DLL) method and ensures safe path planning and coordinated trajectory tracking for the integrated UGV-tether-UAV system. We validate the system through three field experiments: (1) a manual flight endurance test to estimate the operational duration, (2) an autonomous navigation test, and (3) an inspection mission to demonstrate autonomous inspection capabilities. Experimental results confirm the robustness and autonomy of the system, its capacity to operate in GNSS-denied environments, and its potential for long-endurance, autonomous inspection and monitoring tasks.

Agentic Robot: A Brain-Inspired Framework for Vision-Language-Action Models in Embodied Agents

Authors:Zhejian Yang, Yongchao Chen, Xueyang Zhou, Jiangyue Yan, Dingjie Song, Yinuo Liu, Yuting Li, Yu Zhang, Pan Zhou, Hechang Chen, Lichao Sun
Date:2025-05-29 13:56:49

Long-horizon robotic manipulation poses significant challenges for autonomous systems, requiring extended reasoning, precise execution, and robust error recovery across complex sequential tasks. Current approaches, whether based on static planning or end-to-end visuomotor policies, suffer from error accumulation and lack effective verification mechanisms during execution, limiting their reliability in real-world scenarios. We present Agentic Robot, a brain-inspired framework that addresses these limitations through Standardized Action Procedures (SAP)--a novel coordination protocol governing component interactions throughout manipulation tasks. Drawing inspiration from Standardized Operating Procedures (SOPs) in human organizations, SAP establishes structured workflows for planning, execution, and verification phases. Our architecture comprises three specialized components: (1) a large reasoning model that decomposes high-level instructions into semantically coherent subgoals, (2) a vision-language-action executor that generates continuous control commands from real-time visual inputs, and (3) a temporal verifier that enables autonomous progression and error recovery through introspective assessment. This SAP-driven closed-loop design supports dynamic self-verification without external supervision. On the LIBERO benchmark, Agentic Robot achieves state-of-the-art performance with an average success rate of 79.6\%, outperforming SpatialVLA by 6.1\% and OpenVLA by 7.4\% on long-horizon tasks. These results demonstrate that SAP-driven coordination between specialized components enhances both performance and interpretability in sequential manipulation, suggesting significant potential for reliable autonomous systems. Project Github: https://agentic-robot.github.io.