planning - 2025-08-18

Optimal CO2 storage management considering safety constraints in multi-stakeholder multi-site CCS projects: a game theoretic perspective

Authors:Jungang Chen, Seyyed A. Hosseini

Date:2025-08-15 17:36:25

Carbon capture and storage (CCS) projects typically involve a diverse array of stakeholders or players from public, private, and regulatory sectors, each with different objectives and responsibilities. Given the complexity, scale, and long-term nature of CCS operations, determining whether individual stakeholders can independently maximize their interests or whether collaborative coalition agreements are needed remains a central question for effective CCS project planning and management. CCS projects are often implemented in geologically connected sites, where shared geological features such as pressure space and reservoir pore capacity can lead to competitive behavior among stakeholders. Furthermore, CO2 storage sites are often located in geologically mature basins that previously served as sites for hydrocarbon extraction or wastewater disposal in order to leverage existing infrastructure, which makes unilateral optimization even more complicated and unrealistic. In this work, we propose a paradigm based on Markov games to quantitatively investigate how different coalition structures affect the goals of stakeholders. We frame this multi-stakeholder multi-site problem as a multi-agent reinforcement learning problem with safety constraints. Our approach enables agents to learn optimal strategies while remaining compliant with safety regulations. We present an example where multiple operators are injecting CO2 into their respective project areas in a geologically connected basin. To address the high computational cost of repeated simulations of high-fidelity models, a previously developed surrogate model based on the Embed-to-Control (E2C) framework is employed. Our results demonstrate the effectiveness of the proposed framework in addressing optimal management of CO2 storage when multiple stakeholders with various objectives and goals are involved.

Investigating Sensors and Methods in Grasp State Classification in Agricultural Manipulation

Authors:Benjamin Walt, Jordan Westphal, Girish Krishnan

Date:2025-08-15 16:47:42

Effective and efficient agricultural manipulation and harvesting depend on accurately understanding the current state of the grasp. The agricultural environment presents unique challenges due to its complexity, clutter, and occlusion. Additionally, fruit is physically attached to the plant, requiring precise separation during harvesting. Selecting appropriate sensors and modeling techniques is critical for obtaining reliable feedback and correctly identifying grasp states. This work investigates a set of key sensors, namely inertial measurement units (IMUs), infrared (IR) reflectance, tension, tactile sensors, and RGB cameras, integrated into a compliant gripper to classify grasp states. We evaluate the individual contribution of each sensor and compare the performance of two widely used classification models: Random Forest and Long Short-Term Memory (LSTM) networks. Our results demonstrate that a Random Forest classifier, trained in a controlled lab environment and tested on real cherry tomato plants, achieved 100% accuracy in identifying slip, grasp failure, and successful picks, marking a substantial improvement over baseline performance. Furthermore, we identify a minimal viable sensor combination, namely IMU and tension sensors, that effectively classifies grasp states. This classifier enables the planning of corrective actions based on real-time feedback, thereby enhancing the efficiency and reliability of fruit harvesting operations.

Nominal Evaluation Of Automatic Multi-Sections Control Potential In Comparison To A Simpler One- Or Two-Sections Alternative With Predictive Spray Switching

Authors:Mogens Plessen

Date:2025-08-15 16:27:44

Automatic Section Control (ASC) is a long-standing trend for spraying in agriculture. It promises to minimise spray overlap areas. The core idea is to (i) switch off spray nozzles on areas that have already been sprayed, and (ii) to dynamically adjust nozzle flow rates along the boom bar that holds the spray nozzles when velocities of boom sections vary during turn maneuvers. ASC is not possible without sensors, in particular for accurate positioning data. Spraying and the movement of modern wide boom bars are highly dynamic processes. In addition, many uncertainty factors have an effect, such as cross wind drift, boom height, nozzle clogging in open-field conditions, and so forth. In view of this complexity, the natural question arises whether a simpler alternative exists. Therefore, an Automatic Multi-Sections Control method is compared to a proposed simpler one- or two-sections alternative that uses predictive spray switching. The comparison is provided under nominal conditions. Agricultural spraying is intrinsically linked to area coverage path planning and spray switching logic. Combinations of two area coverage path planning and switching logics as well as three sections-setups are compared. The three sections-setups differ by controlling 48 sections, 2 sections or controlling all nozzles uniformly with the same control signal as one single section. Methods are evaluated on 10 diverse real-world field examples, including non-convex field contours, freeform mainfield lanes and multiple obstacle areas. A preferred method is suggested that (i) minimises area coverage pathlength, (ii) offers intermediate overlap, (iii) is suitable for manual driving by following a pre-planned predictive spray switching logic for an area coverage path plan, and (iv) in contrast to ASC, can be implemented sensor-free and therefore at low cost.

A Comparative Study of Floating-Base Space Parameterizations for Agile Whole-Body Motion Planning

Authors:Evangelos Tsiatsianas, Chairi Kiourt, Konstantinos Chatzilygeroudis

Date:2025-08-15 15:00:25

Automatically generating agile whole-body motions for legged and humanoid robots remains a fundamental challenge in robotics. While numerous trajectory optimization approaches have been proposed, there is no clear guideline on how the choice of floating-base space parameterization affects performance, especially for agile behaviors involving complex contact dynamics. In this paper, we present a comparative study of different parameterizations for direct transcription-based trajectory optimization of agile motions in legged systems. We systematically evaluate several common choices under identical optimization settings to ensure a fair comparison. Furthermore, we introduce a novel formulation based on the tangent space of SE(3) for representing the robot's floating-base pose, which, to our knowledge, has not received attention in the literature. This approach enables the use of mature off-the-shelf numerical solvers without requiring specialized manifold optimization techniques. We hope that our experiments and analysis will provide meaningful insights for selecting the appropriate floating-base representation for agile whole-body motion generation.

Landmark-Assisted Monte Carlo Planning

Authors:David H. Chan, Mark Roberts, Dana S. Nau

Date:2025-08-15 14:16:14

Landmarks – conditions that must be satisfied at some point in every solution plan – have contributed to major advancements in classical planning, but they have seldom been used in stochastic domains. We formalize probabilistic landmarks and adapt the UCT algorithm to leverage them as subgoals to decompose MDPs; core to the adaptation is balancing between greedy landmark achievement and final goal achievement. Our results in benchmark domains show that well-chosen landmarks can significantly improve the performance of UCT in online probabilistic planning, while the best balance of greedy versus long-term goal achievement is problem-dependent. The results suggest that landmarks can provide helpful guidance for anytime algorithms solving MDPs.
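
For readers unfamiliar with UCT, the sketch below shows the standard UCB1 action-selection rule that plain UCT applies at each tree node. It is not the authors' landmark-aware adaptation; the function name `ucb1_select`, the layout of `stats`, and the default exploration constant `c` are assumptions made purely for illustration.

```python
import math


def ucb1_select(stats, c=math.sqrt(2)):
    """Pick an action with the UCB1 rule used by standard UCT.

    stats maps action -> (visit_count, total_return). This layout is a
    hypothetical choice for the sketch, not taken from the paper.
    """
    # Actions that were never tried are explored first.
    for action, (visits, _) in stats.items():
        if visits == 0:
            return action

    total_visits = sum(visits for visits, _ in stats.values())

    def score(item):
        _, (visits, total_return) = item
        mean = total_return / visits  # exploitation term
        bonus = c * math.sqrt(math.log(total_visits) / visits)  # exploration term
        return mean + bonus

    return max(stats.items(), key=score)[0]
```

A rarely visited action receives a large exploration bonus, so it can be selected even when its average return is lower than that of a heavily visited action.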

Relative Position Matters: Trajectory Prediction and Planning with Polar Representation

Authors:Bozhou Zhang, Nan Song, Bingzhao Gao, Li Zhang

Date:2025-08-15 14:15:11

Trajectory prediction and planning in autonomous driving are highly challenging due to the complexity of predicting surrounding agents' movements and planning the ego agent's actions in dynamic environments. Existing methods encode map and agent positions and decode future trajectories in Cartesian coordinates. However, modeling the relationships between the ego vehicle and surrounding traffic elements in Cartesian space can be suboptimal, as it does not naturally capture the varying influence of different elements based on their relative distances and directions. To address this limitation, we adopt the Polar coordinate system, where positions are represented by radius and angle. This representation provides a more intuitive and effective way to model spatial changes and relative relationships, especially in terms of distance and directional influence. Based on this insight, we propose Polaris, a novel method that operates entirely in Polar coordinates, distinguishing itself from conventional Cartesian-based approaches. By leveraging the Polar representation, this method explicitly models distance and direction variations and captures relative relationships through dedicated encoding and refinement modules, enabling more structured and spatially aware trajectory prediction and planning. Extensive experiments on the challenging prediction (Argoverse 2) and planning benchmarks (nuPlan) demonstrate that Polaris achieves state-of-the-art performance.
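
To make the radius-and-angle representation concrete, the following minimal sketch converts an agent's Cartesian position into polar coordinates relative to the ego vehicle. It only illustrates the coordinate change and is not the Polaris encoder; the function name `to_ego_polar` and the convention of measuring the angle from the ego heading are assumptions.

```python
import math


def to_ego_polar(ego_xy, ego_heading, agent_xy):
    """Return (radius, angle) of an agent relative to the ego vehicle.

    The angle is measured from the ego heading, in radians, and wrapped
    to the interval [-pi, pi]. This convention is an assumption of the sketch.
    """
    dx = agent_xy[0] - ego_xy[0]
    dy = agent_xy[1] - ego_xy[1]
    radius = math.hypot(dx, dy)
    angle = math.atan2(dy, dx) - ego_heading
    # Wrap the angle so that left and right of the heading have opposite signs.
    angle = math.atan2(math.sin(angle), math.cos(angle))
    return radius, angle
```

With this convention, an agent directly ahead of the ego vehicle has an angle near zero regardless of the ego's orientation in the map frame, and its distance is read off directly as the radius.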

Perception in Plan: Coupled Perception and Planning for End-to-End Autonomous Driving

Authors:Bozhou Zhang, Jingyu Li, Nan Song, Li Zhang

Date:2025-08-15 14:05:57

End-to-end autonomous driving has achieved remarkable advancements in recent years. Existing methods primarily follow a perception-planning paradigm, where perception and planning are executed sequentially within a fully differentiable framework for planning-oriented optimization. We further advance this paradigm through a perception-in-plan framework design, which integrates perception into the planning process. This design facilitates targeted perception guided by evolving planning objectives over time, ultimately enhancing planning performance. Building on this insight, we introduce VeteranAD, a coupled perception and planning framework for end-to-end autonomous driving. By incorporating multi-mode anchored trajectories as planning priors, the perception module is specifically designed to gather traffic elements along these trajectories, enabling comprehensive and targeted perception. Planning trajectories are then generated based on both the perception results and the planning priors. To make perception fully serve planning, we adopt an autoregressive strategy that progressively predicts future trajectories while focusing on relevant regions for targeted perception at each step. With this simple yet effective design, VeteranAD fully unleashes the potential of planning-oriented end-to-end methods, leading to more accurate and reliable driving behavior. Extensive experiments on the NAVSIM and Bench2Drive datasets demonstrate that our VeteranAD achieves state-of-the-art performance.

EvoPSF: Online Evolution of Autonomous Driving Models via Planning-State Feedback

Authors:Jiayue Jin, Lang Qian, Jingyu Zhang, Chuanyu Ju, Liang Song

Date:2025-08-15 13:00:55

Recent years have witnessed remarkable progress in autonomous driving, with systems evolving from modular pipelines to end-to-end architectures. However, most existing methods are trained offline and lack mechanisms to adapt to new environments during deployment. As a result, their generalization ability diminishes when faced with unseen variations in real-world driving scenarios. In this paper, we break away from the conventional "train once, deploy forever" paradigm and propose EvoPSF, a novel online Evolution framework for autonomous driving based on Planning-State Feedback. We argue that planning failures are primarily caused by inaccurate object-level motion predictions, and such failures are often reflected in the form of increased planner uncertainty. To address this, we treat planner uncertainty as a trigger for online evolution, using it as a diagnostic signal to initiate targeted model updates. Rather than performing blind updates, we leverage the planner's agent-agent attention to identify the specific objects that the ego vehicle attends to most, which are primarily responsible for the planning failures. For these critical objects, we compute a targeted self-supervised loss by comparing their predicted waypoints from the prediction module with their actual future positions, selected from the perception module's outputs with high confidence scores. This loss is then backpropagated to adapt the model online. As a result, our method improves the model's robustness to environmental changes, leads to more precise motion predictions, and therefore enables more accurate and stable planning behaviors. Experiments on both cross-region and corrupted variants of the nuScenes dataset demonstrate that EvoPSF consistently improves planning performance under challenging conditions.

Inside Knowledge: Graph-based Path Generation with Explainable Data Augmentation and Curriculum Learning for Visual Indoor Navigation

Authors:Daniel Airinei, Elena Burceanu, Marius Leordeanu

Date:2025-08-15 12:54:13

Indoor navigation is a difficult task, as it generally comes with poor GPS access, forcing solutions to rely on other sources of information. While significant progress continues to be made in this area, deployment to production applications is still lacking, given the complexity and additional requirements of current solutions. Here, we introduce an efficient, real-time and easily deployable deep learning approach, based on visual input only, that can predict the direction towards a target from images captured by a mobile device. Our technical approach, based on a novel graph-based path generation method, combined with explainable data augmentation and curriculum learning, includes contributions that make the process of data collection, annotation and training as automatic as possible, efficient and robust. On the practical side, we introduce a novel large-scale dataset, with video footage inside a relatively large shopping mall, in which each frame is annotated with the correct next direction towards different specific target destinations. Different from current methods, ours relies solely on vision, avoiding the need for special sensors, additional markers placed along the path, knowledge of the scene map or internet access. We also created an easy-to-use application for Android, which we plan to make publicly available. We make all our data and code available along with visual demos on our project site.

ImagiDrive: A Unified Imagination-and-Planning Framework for Autonomous Driving

Authors:Jingyu Li, Bozhou Zhang, Xin Jin, Jiankang Deng, Xiatian Zhu, Li Zhang

Date:2025-08-15 12:06:55

Autonomous driving requires rich contextual comprehension and precise predictive reasoning to navigate dynamic and complex environments safely. Vision-Language Models (VLMs) and Driving World Models (DWMs) have independently emerged as powerful recipes addressing different aspects of this challenge. VLMs provide interpretability and robust action prediction through their ability to understand multi-modal context, while DWMs excel in generating detailed and plausible future driving scenarios essential for proactive planning. Integrating VLMs with DWMs is an intuitive, promising, yet understudied strategy to exploit the complementary strengths of accurate behavioral prediction and realistic scene generation. Nevertheless, this integration presents notable challenges, particularly in effectively connecting action-level decisions with high-fidelity pixel-level predictions and maintaining computational efficiency. In this paper, we propose ImagiDrive, a novel end-to-end autonomous driving framework that integrates a VLM-based driving agent with a DWM-based scene imaginer to form a unified imagination-and-planning loop. The driving agent predicts initial driving trajectories based on multi-modal inputs, guiding the scene imaginer to generate corresponding future scenarios. These imagined scenarios are subsequently utilized to iteratively refine the driving agent's planning decisions. To address efficiency and predictive accuracy challenges inherent in this integration, we introduce an early stopping mechanism and a trajectory selection strategy. Extensive experimental validation on the nuScenes and NAVSIM datasets demonstrates the robustness and superiority of ImagiDrive over previous alternatives under both open-loop and closed-loop conditions.

ReachVox: Clutter-free Reachability Visualization for Robot Motion Planning in Virtual Reality

Authors:Steffen Hauck, Diar Abdlkarim, John Dudley, Per Ola Kristensson, Eyal Ofek, Jens Grubert

Date:2025-08-15 12:04:52

Human-Robot-Collaboration can enhance workflows by leveraging the mutual strengths of human operators and robots. Planning and understanding robot movements remain major challenges in this domain. This problem is prevalent in dynamic environments that might need constant robot motion path adaptation. In this paper, we investigate whether a minimalistic encoding of the reachability of a point near an object of interest, which we call ReachVox, can aid the collaboration between a remote operator and a robotic arm in VR. Through a user study (n=20), we indicate the strength of the visualization relative to a point-based reachability check-up.

Principles of Physiological Closed-Loop Controllers in Neuromodulation

Authors:Victoria S. Marks, Joram vanRheede, Dean Karantonis, Rosana Esteller, David Dinsmoor, John Fleming, Barrett Larson, Lane Desborough, Peter Single, Robert Raike, Pierre-Francois DHaese, Dario J. Englot, Scott Lempka, Richard North, Lawrence Poree, Marom Bikson, Tim J. Denison

Date:2025-08-15 11:57:35

As neurostimulation devices increasingly incorporate closed-loop functionality, the greater design complexity brings additional requirements for risk management and special considerations to optimise benefit. This manuscript creates a common framework upon which all current and planned neuromodulation-based physiological closed-loop controllers (PCLCs) can be mapped including integration of the Technical Considerations of Medical Devices with Physiologic Closed-Loop Control Technology guidance published in 2023 by the United States Food and Drug Administration (FDA), a classification of feedback (reactive) and feedforward (predictive) biomarkers, and control systems theory. We explain risk management in the context of this framework and illustrate its applications for three exemplary technologies. This manuscript serves as guidance to the emerging field of PCLCs in neuromodulation, mitigating risk through standardized nomenclature and a systematic outline for rigorous device development, testing, and implementation.

Scene Graph-Guided Proactive Replanning for Failure-Resilient Embodied Agent

Authors:Che Rin Yu, Daewon Chae, Dabin Seo, Sangwon Lee, Hyeongwoo Im, Jinkyu Kim

Date:2025-08-15 07:48:51

When humans perform everyday tasks, we naturally adjust our actions based on the current state of the environment. For instance, if we intend to put something into a drawer but notice it is closed, we open it first. However, many autonomous robots lack this adaptive awareness. They often follow pre-planned actions that may overlook subtle yet critical changes in the scene, which can result in actions being executed under outdated assumptions and eventual failure. While replanning is critical for robust autonomy, most existing methods respond only after failures occur, when recovery may be inefficient or infeasible. While proactive replanning holds promise for preventing failures in advance, current solutions often rely on manually designed rules and extensive supervision. In this work, we present a proactive replanning framework that detects and corrects failures at subtask boundaries by comparing scene graphs constructed from current RGB-D observations against reference graphs extracted from successful demonstrations. When the current scene fails to align with reference trajectories, a lightweight reasoning module is activated to diagnose the mismatch and adjust the plan. Experiments in the AI2-THOR simulator demonstrate that our approach detects semantic and spatial mismatches before execution failures occur, significantly improving task success and robustness.

Learning Differentiable Reachability Maps for Optimization-based Humanoid Motion Generation

Authors:Masaki Murooka, Iori Kumagai, Mitsuharu Morisawa, Fumio Kanehiro

Date:2025-08-15 07:22:57

To reduce the computational cost of humanoid motion generation, we introduce a new approach to representing robot kinematic reachability: the differentiable reachability map. This map is a scalar-valued function defined in the task space that takes positive values only in regions reachable by the robot's end-effector. A key feature of this representation is that it is continuous and differentiable with respect to task-space coordinates, enabling its direct use as constraints in continuous optimization for humanoid motion planning. We describe a method to learn such differentiable reachability maps from a set of end-effector poses generated using a robot's kinematic model, using either a neural network or a support vector machine as the learning model. By incorporating the learned reachability map as a constraint, we formulate humanoid motion generation as a continuous optimization problem. We demonstrate that the proposed approach efficiently solves various motion planning problems, including footstep planning, multi-contact motion planning, and loco-manipulation planning for humanoid robots.
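
As a toy illustration of the idea, the sketch below defines an analytic reachability value for a planar two-link arm: it is positive strictly inside the reachable annulus, negative outside it, and differentiable everywhere. This closed-form function stands in for the learned neural network or support vector machine described above and is not the authors' model; the name `reach_value_and_grad` and the link lengths are assumptions.

```python
def reach_value_and_grad(p, l1=1.0, l2=0.7):
    """Toy differentiable reachability value for a planar two-link arm.

    p is an (x, y) point in the arm's base frame. The link lengths l1 and
    l2 are hypothetical. Returns (value, gradient), where value > 0 only
    for points strictly inside the reachable annulus.
    """
    x, y = p
    r2 = x * x + y * y
    inner = (l1 - l2) ** 2  # squared inner radius of the reachable annulus
    outer = (l1 + l2) ** 2  # squared outer radius
    value = (r2 - inner) * (outer - r2)
    # Chain rule: d(value)/d(r2) = inner + outer - 2*r2 and d(r2)/dp = 2*p.
    scale = 2.0 * (inner + outer - 2.0 * r2)
    return value, (scale * x, scale * y)
```

Because both the value and its gradient are available in closed form, a constraint such as `value >= 0` could be passed directly to a gradient-based optimizer, which is the role the learned map plays in the abstract.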

Embodied Edge Intelligence Meets Near Field Communication: Concept, Design, and Verification

Authors:Guoliang Li, Xibin Jin, Yujie Wan, Chenxuan Liu, Tong Zhang, Shuai Wang, Chengzhong Xu

Date:2025-08-15 05:43:41

Realizing embodied artificial intelligence is challenging due to the huge computation demands of large models (LMs). To support LMs while ensuring real-time inference, embodied edge intelligence (EEI) is a promising paradigm, which leverages an LM edge to provide computing power in close proximity to embodied robots. Due to embodied data exchange, EEI requires higher spectral efficiency, enhanced communication security, and reduced inter-user interference. To meet these requirements, near-field communication (NFC), which leverages extremely large antenna arrays as its hardware foundation, is an ideal solution. Therefore, this paper advocates the integration of EEI and NFC, resulting in a near-field EEI (NEEI) paradigm. However, NEEI also introduces new challenges that cannot be adequately addressed by isolated EEI or NFC designs, creating research opportunities for joint optimization of both functionalities. To this end, we propose radio-friendly embodied planning for EEI-assisted NFC scenarios and view-guided beam-focusing for NFC-assisted EEI scenarios. We also elaborate on how to realize resource-efficient NEEI through opportunistic collaborative navigation. Experimental results are provided to confirm the superiority of the proposed techniques compared with various benchmarks.

Fluid Dynamics and Domain Reconstruction from Noisy Flow Images Using Physics-Informed Neural Networks and Quasi-Conformal Mapping

Authors:Han Zhang, Xue-Cheng Tai, Jean-Michel Morel, Raymond H. Chan

Date:2025-08-15 04:49:07

Blood flow imaging provides important information for hemodynamic behavior within the vascular system and plays an essential role in medical diagnosis and treatment planning. However, obtaining high-quality flow images remains a significant challenge. In this work, we address the problem of denoising flow images that may suffer from artifacts due to short acquisition times or device-induced errors. We formulate this task as an optimization problem, where the objective is to minimize the discrepancy between the modeled velocity field, constrained to satisfy the Navier-Stokes equations, and the observed noisy velocity data. To solve this problem, we decompose it into two subproblems: a fluid subproblem and a geometry subproblem. The fluid subproblem leverages a Physics-Informed Neural Network to reconstruct the velocity field from noisy observations, assuming a fixed domain. The geometry subproblem aims to infer the underlying flow region by optimizing a quasi-conformal mapping that deforms a reference domain. These two subproblems are solved in an alternating Gauss-Seidel fashion, iteratively refining both the velocity field and the domain. Upon convergence, the framework yields a high-quality reconstruction of the flow image. We validate the proposed method through experiments on synthetic flow data in a converging channel geometry under varying levels of Gaussian noise, and on real-like flow data in an aortic geometry with signal-dependent noise. The results demonstrate the effectiveness and robustness of the approach. Additionally, ablation studies are conducted to assess the influence of key hyperparameters.

Geometry-Aware Predictive Safety Filters on Humanoids: From Poisson Safety Functions to CBF Constrained MPC

Authors:Ryan M. Bena, Gilbert Bahati, Blake Werner, Ryan K. Cosner, Lizhi Yang, Aaron D. Ames

Date:2025-08-15 00:35:27

Autonomous navigation through unstructured and dynamically-changing environments is a complex task that continues to present many challenges for modern roboticists. In particular, legged robots typically possess manipulable asymmetric geometries which must be considered during safety-critical trajectory planning. This work proposes a predictive safety filter: a nonlinear model predictive control (MPC) algorithm for online trajectory generation with geometry-aware safety constraints based on control barrier functions (CBFs). Critically, our method leverages Poisson safety functions to numerically synthesize CBF constraints directly from perception data. We extend the theoretical framework for Poisson safety functions to incorporate temporal changes in the domain by reformulating the static Dirichlet problem for Poisson's equation as a parameterized moving boundary value problem. Furthermore, we employ Minkowski set operations to lift the domain into a configuration space that accounts for robot geometry. Finally, we implement our real-time predictive safety filter on humanoid and quadruped robots in various safety-critical scenarios. The results highlight the versatility of Poisson safety functions, as well as the benefit of CBF constrained model predictive safety-critical controllers.
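
For context, the standard continuous-time control barrier function condition on which such constraints build can be written as follows. The symbols $f$, $g$, $h$ and $\alpha$ follow the conventional CBF notation and are assumed here rather than taken from the abstract; the paper's MPC formulation and its Poisson-based construction of $h$ may differ in detail. For a control-affine system $\dot{x} = f(x) + g(x)u$ with safe set $\mathcal{C} = \{x : h(x) \ge 0\}$, the function $h$ is a CBF if there exists an extended class-$\mathcal{K}$ function $\alpha$ such that

$$\sup_{u} \left[ L_f h(x) + L_g h(x)\,u \right] \ge -\alpha(h(x)),$$

and a controller keeps $\mathcal{C}$ forward invariant by choosing inputs that satisfy

$$L_f h(x) + L_g h(x)\,u \ge -\alpha(h(x)).$$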

Hell or High Water: Evaluating Agentic Recovery from External Failures

Authors:Andrew Wang, Sophia Hager, Adi Asija, Daniel Khashabi, Nicholas Andrews

Date:2025-08-14 19:21:09

As language model agents are applied to real world problems of increasing complexity, they will be expected to formulate plans across large search spaces. If those plans fail for reasons beyond their control, how well do language agents search for alternative ways to achieve their goals? We devise a specialized agentic planning benchmark to study this question. Each planning problem is solved via combinations of function calls. The agent searches for relevant functions from a set of over four thousand possibilities, and observes environmental feedback in the form of function outputs or error messages. Our benchmark confronts the agent with external failures in its workflow, such as functions that suddenly become unavailable. At the same time, even with the introduction of these failures, we guarantee that the task remains solvable. Ideally, an agent's performance on the planning task should not be affected by the presence of external failures. Overall, we find that language agents struggle to formulate and execute backup plans in response to environment feedback. While state-of-the-art models are often able to identify the correct function to use in the right context, they struggle to adapt to feedback from the environment and often fail to pursue alternate courses of action, even when the search space is artificially restricted. We provide a systematic analysis of the failures of both open-source and commercial models, examining the effects of search space size, as well as the benefits of scaling model size in our setting. Our analysis identifies key challenges for current generative models as well as promising directions for future work.

3D FlowMatch Actor: Unified 3D Policy for Single- and Dual-Arm Manipulation

Authors:Nikolaos Gkanatsios, Jiahe Xu, Matthew Bronars, Arsalan Mousavian, Tsung-Wei Ke, Katerina Fragkiadaki

Date:2025-08-14 18:07:40

We present 3D FlowMatch Actor (3DFA), a 3D policy architecture for robot manipulation that combines flow matching for trajectory prediction with 3D pretrained visual scene representations for learning from demonstration. 3DFA leverages 3D relative attention between action and visual tokens during action denoising, building on prior work in 3D diffusion-based single-arm policy learning. Through a combination of flow matching and targeted system-level and architectural optimizations, 3DFA achieves over 30x faster training and inference than previous 3D diffusion-based policies, without sacrificing performance. On the bimanual PerAct2 benchmark, it establishes a new state of the art, outperforming the next-best method by an absolute margin of 41.4%. In extensive real-world evaluations, it surpasses strong baselines with up to 1000x more parameters and significantly more pretraining. In unimanual settings, it sets a new state of the art on 74 RLBench tasks by directly predicting dense end-effector trajectories, eliminating the need for motion planning. Comprehensive ablation studies underscore the importance of our design choices for both policy effectiveness and efficiency.

TLE-Based A2C Agent for Terrestrial Coverage Orbital Path Planning

Authors:Anantha Narayanan, Battu Bhanu Teja, Pruthwik Mishra

Date:2025-08-14 17:44:51

The increasing congestion of Low Earth Orbit (LEO) poses persistent challenges to the efficient deployment and safe operation of Earth observation satellites. Mission planners must now account not only for mission-specific requirements but also for the increasing collision risk with active satellites and space debris. This work presents a reinforcement learning framework using the Advantage Actor-Critic (A2C) algorithm to optimize satellite orbital parameters for precise terrestrial coverage within predefined surface radii. By formulating the problem as a Markov Decision Process (MDP) within a custom OpenAI Gymnasium environment, our method simulates orbital dynamics using classical Keplerian elements. The agent progressively learns to adjust five of the orbital parameters - semi-major axis, eccentricity, inclination, right ascension of ascending node, and the argument of perigee - to achieve targeted terrestrial coverage. Comparative evaluation against Proximal Policy Optimization (PPO) demonstrates A2C's superior performance, achieving 5.8x higher cumulative rewards (10.0 vs 9.263025) while converging in 31.5x fewer timesteps (2,000 vs 63,000). The A2C agent consistently meets mission objectives across diverse target coordinates while maintaining computational efficiency suitable for real-time mission planning applications. Key contributions include: (1) a TLE-based orbital simulation environment incorporating physics constraints, (2) validation of actor-critic methods' superiority over trust region approaches in continuous orbital control, and (3) demonstration of rapid convergence enabling adaptive satellite deployment. This approach establishes reinforcement learning as a computationally efficient alternative for scalable and intelligent LEO mission planning.
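
The five adjustable parameters named above can be grouped into a small container, as in the sketch below, which also derives two quantities from them using textbook two-body relations. This is not the authors' environment; the class name `KeplerianElements`, its field names, and the constants used are assumptions for illustration.

```python
import math
from dataclasses import dataclass

MU_EARTH_KM3_S2 = 398600.4418  # Earth's gravitational parameter, km^3/s^2
EARTH_RADIUS_KM = 6378.137     # equatorial radius, km


@dataclass
class KeplerianElements:
    """The five orbital parameters listed in the abstract (hypothetical names)."""
    semi_major_axis_km: float
    eccentricity: float
    inclination_deg: float
    raan_deg: float          # right ascension of ascending node
    arg_perigee_deg: float   # argument of perigee

    def period_s(self):
        # Kepler's third law for a two-body orbit: T = 2*pi*sqrt(a^3 / mu).
        a = self.semi_major_axis_km
        return 2.0 * math.pi * math.sqrt(a ** 3 / MU_EARTH_KM3_S2)

    def perigee_altitude_km(self):
        # Perigee radius is a*(1 - e); subtract Earth's radius for altitude.
        return self.semi_major_axis_km * (1.0 - self.eccentricity) - EARTH_RADIUS_KM
```

For example, `KeplerianElements(6778.0, 0.001, 51.6, 0.0, 0.0).period_s()` gives an orbital period of roughly 92 to 93 minutes, which is typical for a LEO satellite.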

Local structure of centred tangent cones in the Wasserstein space

Authors:Averil Aussedat

Date:2025-08-14 17:03:04

This article investigates the geometric tangent cone to a probability measure with finite second moment. It is known that the tangent elements induced by a map belong to the $L^2_{\mu}$ closure of smooth gradients. We show that, conversely, the elements that have barycenter 0 are characterized by a local condition, i.e. as the barycenter-free measures that are concentrated on a family of vector subspaces attached to any point. Our results rely on a decomposition of a measure into $d+1$ components, each allowing optimal plans to split mass in a fixed number of directions. We conclude by giving some links with Preiss tangent measures and illustrating the difference with Alberti and Marchese's decomposability bundle.

UI-Venus Technical Report: Building High-performance UI Agents with RFT

Authors:Zhangxuan Gu, Zhengwen Zeng, Zhenyu Xu, Xingran Zhou, Shuheng Shen, Yunfei Liu, Beitong Zhou, Changhua Meng, Tianyu Xia, Weizhi Chen, Yue Wen, Jingya Dou, Fei Tang, Jinzhen Lin, Yulin Liu, Zhenlin Guo, Yichen Gong, Heng Jia, Changlong Gao, Yuan Guo, Yong Deng, Zhenyu Guo, Liang Chen, Weiqiang Wang

Date:2025-08-14 16:58:07

We present UI-Venus, a native UI agent, based on a multimodal large language model, that takes only screenshots as input. UI-Venus achieves SOTA performance on both UI grounding and navigation tasks using only several hundred thousand high-quality training samples through reinforcement fine-tuning (RFT) based on Qwen2.5-VL. Specifically, the 7B and 72B variants of UI-Venus obtain 94.1% / 50.8% and 95.3% / 61.9% on the standard grounding benchmarks, i.e., Screenspot-V2 / Pro, surpassing the previous SOTA baselines including open-source GTA1 and closed-source UI-TARS-1.5. To show UI-Venus's summarization and planning ability, we also evaluate it on AndroidWorld, an online UI navigation arena, on which our 7B and 72B variants achieve 49.1% and 65.9% success rates, also beating existing models. To achieve this, we introduce carefully designed reward functions for both UI grounding and navigation tasks and corresponding efficient data cleaning strategies. To further boost navigation performance, we propose Self-Evolving Trajectory History Alignment & Sparse Action Enhancement, which refines historical reasoning traces and balances the distribution of sparse but critical actions, leading to more coherent planning and better generalization in complex UI tasks. Our contributions include the release of SOTA open-source UI agents, comprehensive data cleaning protocols, and a novel self-evolving framework for improving navigation performance, to encourage further research and development in the community. Code is available at https://github.com/inclusionAI/UI-Venus.

Accelerating Stochastic Energy System Optimization Models: Temporally Split Benders Decomposition

Authors:Shima Sasanpour, Manuel Wetzel, Karl-Kiên Cao, Hans Christian Gils, Andrés Ramos

Date:2025-08-14 16:15:17

Stochastic programming can be applied to consider uncertainties in energy system optimization models for capacity expansion planning. However, these models become increasingly large and time-consuming to solve, even without considering uncertainties. For two-stage stochastic capacity expansion planning problems, Benders decomposition is often applied to ensure that the problem remains solvable. Since stochastic scenarios can be optimized independently within subproblems, their optimization can be parallelized. However, hourly-resolved capacity expansion planning problems typically have a larger temporal than scenario cardinality. Therefore, we present a temporally split Benders decomposition that further exploits the parallelization potential of stochastic expansion planning problems. A compact reformulation of the storage level constraint into linking variables ensures that long-term storage operation can still be optimized despite the temporal decomposition. We demonstrate this novel approach with model instances of the German power system with up to 87 million rows and columns. Our results show a reduction in computing times of up to 60% and reduced memory requirements. Additional enhancement strategies and the use of distributed memory on high-performance computers further improve the computing time by over 80%.

Generating Compilers for Qubit Mapping and Routing

Authors:Abtin Molavi, Amanda Xu, Ethan Cecchetti, Swamit Tannu, Aws Albarghouthi

Date:2025-08-14 16:07:07

Quantum computers promise to solve important problems faster than classical computers, potentially unlocking breakthroughs in materials science, chemistry, and beyond. Optimizing compilers are key to realizing this potential, as they minimize expensive resource usage and limit error rates. A critical compilation step is qubit mapping and routing (QMR), which finds mappings from circuit qubits to qubits on a target device and plans instruction execution while satisfying the device's connectivity constraints. The challenge is that the landscape of quantum architectures is incredibly diverse and fast-evolving. Given this diversity, hundreds of papers have addressed the QMR problem for different qubit hardware, connectivity constraints, and quantum error correction schemes. We present an approach for automatically generating qubit mapping and routing compilers for arbitrary quantum architectures. Though each QMR problem is different, we identify a common core structure, the device state machine, that we use to formulate an abstract QMR problem. Our formulation naturally leads to a domain-specific language, Marol, for specifying QMR problems; for example, the well-studied NISQ mapping and routing problem requires only 12 lines of Marol. We demonstrate that QMR problems, defined in Marol, can be solved with a powerful parametric solver that can be instantiated for any Marol program. We evaluate our approach through case studies of important QMR problems from prior and recent work, covering noisy and fault-tolerant quantum architectures on all major hardware platforms. Our thorough evaluation shows that generated compilers are competitive with handwritten, specialized compilers in terms of runtime and solution quality. We envision that our approach will simplify development of future quantum compilers as new quantum architectures continue to emerge.

FROGENT: An End-to-End Full-process Drug Design Agent

Authors:Qihua Pan, Dong Xu, Jenna Xinyi Yao, Lijia Ma, Zexuan Zhu, Junkai Ji

Date:2025-08-14 15:45:53

Powerful AI tools for drug discovery reside in isolated web apps, desktop programs, and code libraries. Such fragmentation forces scientists to manage incompatible interfaces and specialized scripts, which can be a cumbersome and repetitive process. To address this issue, a Full-pROcess druG dEsign ageNT, named FROGENT, has been proposed. Specifically, FROGENT utilizes a Large Language Model and the Model Context Protocol to integrate multiple dynamic biochemical databases, extensible tool libraries, and task-specific AI models. This agentic framework allows FROGENT to execute complicated drug discovery workflows dynamically, including component tasks such as target identification, molecule generation and retrosynthetic planning. FROGENT has been evaluated on eight benchmarks that cover various aspects of drug discovery, such as knowledge retrieval, property prediction, virtual screening, mechanistic analysis, molecular design, and synthesis. It was compared against six increasingly advanced ReAct-style agents that support code execution and literature searches. Empirical results demonstrated that FROGENT triples the best baseline performance in hit-finding and doubles it in interaction profiling, significantly outperforming both the open-source model Qwen3-32B and the commercial model GPT-4o. In addition, real-world cases have been utilized to validate the practicability and generalization of FROGENT. This development suggests that streamlining the agentic drug discovery pipeline can significantly enhance researcher productivity.

Scaling Up without Fading Out: Goal-Aware Sparse GNN for RL-based Generalized Planning

Authors:Sangwoo Jeon, Juchul Shin, Gyeong-Tae Kim, YeonJe Cho, Seongwoo Kim

Date:2025-08-14 15:30:28

Generalized planning using deep reinforcement learning (RL) combined with graph neural networks (GNNs) has shown promising results in various symbolic planning domains described by PDDL. However, existing approaches typically represent planning states as fully connected graphs, leading to a combinatorial explosion in edge information and substantial sparsity as problem scales grow, especially evident in large grid-based environments. This dense representation results in diluted node-level information, exponentially increases memory requirements, and ultimately makes learning infeasible for larger-scale problems. To address these challenges, we propose a sparse, goal-aware GNN representation that selectively encodes relevant local relationships and explicitly integrates spatial features related to the goal. We validate our approach by designing novel drone mission scenarios based on PDDL within a grid world, effectively simulating realistic mission execution environments. Our experimental results demonstrate that our method scales effectively to larger grid sizes previously infeasible with dense graph representations and substantially improves policy generalization and success rates. Our findings provide a practical foundation for addressing realistic, large-scale generalized planning tasks.
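
To make the scaling argument in the abstract above concrete, the following sketch compares the number of undirected edges in a fully connected graph over all grid cells with the number in a plain 4-neighbour grid graph. The function names are hypothetical, and 4-neighbour adjacency is only a stand-in: the paper's sparse, goal-aware representation may select edges differently.

```python
def dense_edge_count(rows: int, cols: int) -> int:
    """Undirected edges when every cell is connected to every other cell."""
    n = rows * cols
    return n * (n - 1) // 2


def grid_edge_count(rows: int, cols: int) -> int:
    """Undirected edges when each cell is linked only to its 4-neighbours."""
    horizontal = rows * (cols - 1)
    vertical = cols * (rows - 1)
    return horizontal + vertical


for size in (10, 50, 100):
    dense = dense_edge_count(size, size)
    sparse = grid_edge_count(size, size)
    print(f"{size}x{size}: dense={dense:,} sparse={sparse:,} ratio={dense / sparse:.0f}x")
```

On a 50x50 grid the dense graph has 3,123,750 edges against 4,900 for the 4-neighbour graph. The dense count grows quadratically in the number of cells, while the local count grows linearly.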

Traffic Intersection Simulation Using Turning Movement Count Data in SUMO: A Case Study of Toronto Intersections

Authors:Harshit Maheshwari, Li Yang, Richard W Pazzi

Date:2025-08-14 15:12:50

Urban traffic simulation is vital in planning, modeling, and analyzing road networks. However, the realism of a simulation depends extensively on the quality of input data. This paper presents an intersection traffic simulation tool that leverages real-world vehicle turning movement count (TMC) data from the City of Toronto to model traffic in an urban environment at one or multiple intersections using Simulation of Urban MObility (SUMO). The simulation performed in this research focuses specifically on intersection-level traffic generation without creating full vehicle routes through the network. This also helps keep the network's complexity to a minimum. The simulated traffic is evaluated against actual data to show that the simulation closely reproduces real intersection flows. This validates that the real data can drive practical simulations, and these scenarios can replace synthetic or randomly generated data, which is prominently used in developing new traffic-related methodologies. This is the first tool to integrate TMC data from Toronto into SUMO via an easy-to-use Graphical User Interface. This work contributes to the research and traffic planning community on data-driven traffic simulation. It provides transportation engineers with a framework to evaluate intersection design and traffic signal optimization strategies using readily available aggregate traffic data.

ChatENV: An Interactive Vision-Language Model for Sensor-Guided Environmental Monitoring and Scenario Simulation

Authors:Hosam Elgendy, Ahmed Sharshar, Ahmed Aboeitta, Mohsen Guizani

Date:2025-08-14 13:33:44

Understanding environmental changes from aerial imagery is vital for climate resilience, urban planning, and ecosystem monitoring. Yet, current vision language models (VLMs) overlook causal signals from environmental sensors, rely on single-source captions prone to stylistic bias, and lack interactive scenario-based reasoning. We present ChatENV, the first interactive VLM that jointly reasons over satellite image pairs and real-world sensor data. Our framework: (i) creates a 177k-image dataset forming 152k temporal pairs across 62 land-use classes in 197 countries with rich sensor metadata (e.g., temperature, PM10, CO); (ii) annotates data using GPT-4o and Gemini 2.0 for stylistic and semantic diversity; and (iii) fine-tunes Qwen-2.5-VL using efficient Low-Rank Adaptation (LoRA) adapters for chat purposes. ChatENV achieves strong performance in temporal and "what-if" reasoning (e.g., BERT-F1 0.903) and rivals or outperforms state-of-the-art temporal models, while supporting interactive scenario-based analysis. This positions ChatENV as a powerful tool for grounded, sensor-aware environmental monitoring.

FIND-Net -- Fourier-Integrated Network with Dictionary Kernels for Metal Artifact Reduction

Authors:Farid Tasharofi, Fuxin Fan, Melika Qahqaie, Mareike Thies, Andreas Maier

Date:2025-08-14 13:13:54

Metal artifacts, caused by high-density metallic implants in computed tomography (CT) imaging, severely degrade image quality, complicating diagnosis and treatment planning. While existing deep learning algorithms have achieved notable success in Metal Artifact Reduction (MAR), they often struggle to suppress artifacts while preserving structural details. To address this challenge, we propose FIND-Net (Fourier-Integrated Network with Dictionary Kernels), a novel MAR framework that integrates frequency and spatial domain processing to achieve superior artifact suppression and structural preservation. FIND-Net incorporates Fast Fourier Convolution (FFC) layers and trainable Gaussian filtering, treating MAR as a hybrid task operating in both spatial and frequency domains. This approach enhances global contextual understanding and frequency selectivity, effectively reducing artifacts while maintaining anatomical structures. Experiments on synthetic datasets show that FIND-Net achieves statistically significant improvements over state-of-the-art MAR methods, with a 3.07% MAE reduction, 0.18% SSIM increase, and 0.90% PSNR improvement, confirming robustness across varying artifact complexities. Furthermore, evaluations on real-world clinical CT scans confirm FIND-Net's ability to minimize modifications to clean anatomical regions while effectively suppressing metal-induced distortions. These findings highlight FIND-Net's potential for advancing MAR performance, offering superior structural preservation and improved clinical applicability. Code is available at https://github.com/Farid-Tasharofi/FIND-Net

A Unified Framework from Boltzmann Transport to Proton Treatment Planning

Authors:Andreas E. Kyprianou, Aaron Pim, Tristan Pryer

Date:2025-08-14 12:37:25

This work develops a rigorous mathematical formulation of proton transport by integrating both deterministic and stochastic perspectives. The deterministic framework is based on the Boltzmann-Fokker-Planck equation, formulated as an operator equation in a suitable functional setting. The stochastic approach models proton evolution via a track-length parameterised diffusion process, whose infinitesimal generator provides an alternative description of transport. A key result is the duality between the stochastic and deterministic formulations, established through the adjoint relationship between the transport operator and the stochastic generator. We prove that the resolvent of the stochastic process corresponds to the Green's function of the deterministic equation, providing a natural link between fluence-based and particle-based transport descriptions. The theory is applied to dose computation, where we show that the classical relation: dose = (fluence * mass stopping power) arises consistently in both approaches. Building on this foundation, we formulate a hybrid optimisation framework for treatment planning, in which dose is computed using a stochastic model while optimisation proceeds via adjoint-based PDE methods. We prove existence and differentiability of the objective functional and derive the first-order optimality system. This framework bridges stochastic simulation with deterministic control theory and provides a foundation for future work in constrained, adaptive and uncertainty-aware optimisation in proton therapy.
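
For reference, the classical relation quoted in the abstract above (dose = fluence * mass stopping power) is commonly written in the following form for charged particles. The symbols are assumptions introduced here, not the paper's notation: $\Phi(x, E)$ is the proton fluence differential in energy at position $x$, $S(x, E)$ is the stopping power, and $\rho(x)$ is the mass density of the medium.

$$
D(x) = \int_0^{E_{\max}} \Phi(x, E)\, \frac{S(x, E)}{\rho(x)}\, \mathrm{d}E
$$

For a monoenergetic beam this reduces to $D = \Phi \cdot S/\rho$. With $\Phi$ in $\mathrm{m^{-2}}$ and $S/\rho$ in $\mathrm{J\,m^2\,kg^{-1}}$, the dose comes out in $\mathrm{J\,kg^{-1}}$, i.e. gray.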