planning - 2025-05-22

Solving General-Utility Markov Decision Processes in the Single-Trial Regime with Online Planning

Authors:Pedro P. Santos, Alberto Sardinha, Francisco S. Melo
Date:2025-05-21 17:32:23

In this work, we contribute the first approach to solve infinite-horizon discounted general-utility Markov decision processes (GUMDPs) in the single-trial regime, i.e., when the agent's performance is evaluated based on a single trajectory. First, we provide some fundamental results regarding policy optimization in the single-trial regime, investigating which class of policies suffices for optimality, casting our problem as a particular MDP that is equivalent to our original problem, as well as studying the computational hardness of policy optimization in the single-trial regime. Second, we show how we can leverage online planning techniques, in particular a Monte-Carlo tree search algorithm, to solve GUMDPs in the single-trial regime. Third, we provide experimental results showcasing the superior performance of our approach in comparison to relevant baselines.
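As a rough illustration of what "evaluated based on a single trajectory" can mean, the sketch below computes a discounted empirical state occupancy from one trajectory and scores it with a utility function. This is not the authors' algorithm: the normalization over a truncated trajectory, the entropy utility, and the names `empirical_occupancy` and `entropy` are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict

def empirical_occupancy(states, gamma=0.9):
    """Discounted empirical state occupancy of ONE trajectory.

    Each visit at time t contributes weight gamma**t; the weights are
    normalized to sum to 1 (an assumption for finite, truncated rollouts).
    """
    weights = defaultdict(float)
    for t, s in enumerate(states):
        weights[s] += gamma ** t
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}

def entropy(occupancy):
    """Example general utility: Shannon entropy of the occupancy."""
    return -sum(p * math.log(p) for p in occupancy.values() if p > 0)

# Score a single trajectory rather than an expectation over many.
trajectory = ["a", "b", "a"]
d_hat = empirical_occupancy(trajectory, gamma=0.5)
score = entropy(d_hat)
```

The point of the sketch is only that the utility is applied to the occupancy of one sampled trajectory, which in general differs from applying it to the expected occupancy.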

Improving planning and MBRL with temporally-extended actions

Authors:Palash Chatterjee, Roni Khardon
Date:2025-05-21 16:59:32

Continuous-time systems are often modeled using discrete-time dynamics, but this requires a small simulation step to maintain accuracy. In turn, this requires a large planning horizon, which leads to computationally demanding planning problems and reduced performance. Previous work in model-free reinforcement learning has partially addressed this issue using action repeats, where a policy is learned to determine a discrete action duration. Instead, we propose to control the continuous decision timescale directly by using temporally-extended actions and letting the planner treat the duration of the action as an additional optimization variable along with the standard action variables. This additional structure has multiple advantages. It speeds up simulation time of trajectories and, importantly, it allows for deep horizon search in terms of primitive actions while using a shallow search depth in the planner. In addition, in the model-based reinforcement learning (MBRL) setting, it reduces compounding errors from model learning and improves training time for models. We show that this idea is effective and that the range for action durations can be automatically selected using a multi-armed bandit formulation and integrated into the MBRL framework. An extensive experimental evaluation, both in planning and in MBRL, shows that our approach yields faster planning, better solutions, and that it enables solutions to problems that are not solved in the standard formulation.
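To make the duration-as-decision-variable idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a hypothetical 1-D point mass integrated with a fixed simulation step `dt`; each planner decision is an (acceleration, duration) pair, so a plan of depth two expands into many primitive simulation steps.

```python
def simulate(plan, dt=0.01, x=0.0, v=0.0):
    """Roll out a plan of (acceleration, duration) pairs on a 1-D point mass.

    Returns the final position, final velocity, and the number of
    primitive simulation steps the plan expanded into.
    """
    primitive_steps = 0
    for accel, duration in plan:
        # One planner decision covers round(duration / dt) primitive steps.
        n = max(1, round(duration / dt))
        for _ in range(n):
            v += accel * dt  # semi-implicit Euler: update velocity first
            x += v * dt      # then position with the new velocity
            primitive_steps += 1
    return x, v, primitive_steps

# Two decisions (accelerate, then brake) span 100 primitive steps.
plan = [(1.0, 0.5), (-1.0, 0.5)]
x, v, steps = simulate(plan)
```

A planner searching over such pairs keeps its search depth at `len(plan)` while the effective horizon in primitive steps is `steps`.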

Distributionally Robust Planning of Hydrogen-Electrical Microgrids for Sea Islands

Authors:Yuchen Dong, Zhengsong Lu, Xiaoyu Cao, Zhengwen He, Tanveer Hossain Bhuiyan, Bo Zeng
Date:2025-05-21 16:39:36

This paper presents a distributionally robust planning method for hydrogen-electrical microgrids over islands, where the cross-island energy exchange is supported by a maritime hydrogen transport network. This planning problem is complicated due to heterogeneous off-shore wind-driven uncertainties (i.e., renewable power, transport availability, demand fluctuations, and grid faulting), a subset of which exhibit endogenous uncertainty, as they can be affected by proactive measures (e.g., grid hardening) or infrastructure investment. To capture these features, a two-stage distributionally robust optimization (DRO) model is developed considering decision-dependent uncertainty (DDU), which encompasses variation of the underlying distributional ambiguity due to the change of the first stage decisions. Notably, the complete recourse property is missing, which is often neglected in existing DRO studies. Nevertheless, different from the case for land-based microgrids, this issue is critical and fundamental for sea island systems due to their particular physical and logistical requirements. To address these issues, we develop a C&CG algorithm that is customized with strong cutting planes to handle DRO with a varying DDU ambiguity set and feasibility requirements. Numerical results demonstrate the cost-effectiveness and resilience of the proposed planning framework, along with the nontrivial improvements of the algorithm in both solution accuracy and computational efficiency.

UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning

Authors:Xiangyu Wang, Donglin Yang, Yue Liao, Wenhao Zheng, wenjun wu, Bin Dai, Hongsheng Li, Si Liu
Date:2025-05-21 16:31:28

Unmanned Aerial Vehicles (UAVs) are evolving into language-interactive platforms, enabling more intuitive forms of human-drone interaction. While prior works have primarily focused on high-level planning and long-horizon navigation, we shift attention to language-guided fine-grained trajectory control, where UAVs execute short-range, reactive flight behaviors in response to language instructions. We formalize this problem as the Flying-on-a-Word (Flow) task and introduce UAV imitation learning as an effective approach. In this framework, UAVs learn fine-grained control policies by mimicking expert pilot trajectories paired with atomic language instructions. To support this paradigm, we present UAV-Flow, the first real-world benchmark for language-conditioned, fine-grained UAV control. It includes a task formulation, a large-scale dataset collected in diverse environments, a deployable control framework, and a simulation suite for systematic evaluation. Our design enables UAVs to closely imitate the precise, expert-level flight trajectories of human pilots and supports direct deployment without a sim-to-real gap. We conduct extensive experiments on UAV-Flow, benchmarking VLN and VLA paradigms. Results show that VLA models are superior to VLN baselines and highlight the critical role of spatial grounding in the fine-grained Flow setting.

Path Planning Algorithm Comparison Analysis for Wireless AUVs Energy Sharing System

Authors:Zhengji Feng, Hengxiang Chen, Liqun Chen, Heyan Li, Xiaolin Mou
Date:2025-05-21 16:02:13

Autonomous underwater vehicles (AUVs) are increasingly used in marine research, military applications, and undersea exploration. However, their operational range is significantly affected by battery performance. In this paper, a framework for a wireless energy sharing system among AUVs is proposed, enabling rapid energy replenishment. Path planning plays a crucial role in the energy-sharing process and autonomous navigation, as it must generate feasible trajectories toward designated goals. This article focuses on efficient obstacle avoidance in complex underwater environments, including irregularly shaped obstacles and narrow passages. The proposed method combines Rapidly-exploring Random Trees Star (RRT*) with Particle Swarm Optimization (PSO) to improve path planning efficiency. Comparative analysis of the two algorithms is presented through simulation results in both random and irregular obstacle environments. Index Terms: Wireless charging, autonomous underwater vehicles (AUVs), path planning, irregular obstacles, narrow passages, RRT*, particle swarm optimization (PSO).

SwarmDiff: Swarm Robotic Trajectory Planning in Cluttered Environments via Diffusion Transformer

Authors:Kang Ding, Chunxuan Jiao, Yunze Hu, Kangjie Zhou, Pengying Wu, Yao Mu, Chang Liu
Date:2025-05-21 15:56:55

Swarm robotic trajectory planning faces challenges in computational efficiency, scalability, and safety, particularly in complex, obstacle-dense environments. To address these issues, we propose SwarmDiff, a hierarchical and scalable generative framework for swarm robots. We model the swarm's macroscopic state using Probability Density Functions (PDFs) and leverage conditional diffusion models to generate risk-aware macroscopic trajectory distributions, which then guide the generation of individual robot trajectories at the microscopic level. To ensure a balance between the swarm's optimal transportation and risk awareness, we integrate Wasserstein metrics and Conditional Value at Risk (CVaR). Additionally, we introduce a Diffusion Transformer (DiT) to improve sampling efficiency and generation quality by capturing long-range dependencies. Extensive simulations and real-world experiments demonstrate that SwarmDiff outperforms existing methods in computational efficiency, trajectory validity, and scalability, making it a reliable solution for swarm robotic trajectory planning.
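For readers unfamiliar with Conditional Value at Risk, the sketch below shows a simple sample-based estimator: the mean of the worst (1 - alpha) fraction of losses. It is a generic illustration only; how SwarmDiff combines CVaR with Wasserstein metrics is not specified in the abstract, and the function name `cvar` and the tail-size rounding are assumptions.

```python
import math

def cvar(losses, alpha=0.9):
    """Empirical CVaR at level alpha: mean of the worst (1 - alpha) tail."""
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    ordered = sorted(losses, reverse=True)  # largest (worst) losses first
    # Round before ceil so floating-point noise does not inflate the tail size.
    k = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    return sum(ordered[:k]) / k

# Example: collision-risk samples for one candidate trajectory.
samples = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
tail_risk = cvar(samples, alpha=0.8)  # averages the two worst samples
```

With `alpha = 0` the estimator reduces to the plain mean, so raising `alpha` shifts the score from average-case toward worst-case behavior.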

Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets

Authors:Kaiyuan Chen, Shuangyu Xie, Zehan Ma, Ken Goldberg
Date:2025-05-21 13:42:52

Vision-Language Models (VLMs) acquire real-world knowledge and general reasoning ability through Internet-scale image-text corpora. They can augment robotic systems with scene understanding and task planning, and assist visuomotor policies that are trained on robot trajectory data. We explore the reverse paradigm: using rich, real, multi-modal robot trajectory data to enhance and evaluate VLMs. In this paper, we present Robo2VLM, a Visual Question Answering (VQA) dataset generation framework for VLMs. Given a human tele-operated robot trajectory, Robo2VLM derives ground truth from non-visual and non-descriptive sensory modalities, such as end-effector pose, gripper aperture, and force sensing. Based on these modalities, it segments the robot trajectory into a sequence of manipulation phases. At each phase, Robo2VLM uses scene and interaction understanding to identify 3D properties of the robot, task goal, and the target object. The properties are used to generate representative VQA queries (images with textual multiple-choice questions) based on spatial, goal-conditioned, and interaction reasoning question templates. We curate Robo2VLM-1, a large-scale in-the-wild dataset with 684,710 questions covering 463 distinct scenes and 3,396 robotic manipulation tasks from 176k real robot trajectories. Results suggest that Robo2VLM-1 can benchmark and improve VLM capabilities in spatial and interaction reasoning.

Directional Non-Commutative Monoidal Structures for Compositional Embeddings in Machine Learning

Authors:Mahesh Godavarti
Date:2025-05-21 13:27:14

We introduce a new algebraic structure for multi-dimensional compositional embeddings, built on directional non-commutative monoidal operators. The core contribution of this work is this novel framework, which exhibits appealing theoretical properties (associativity along each dimension and an interchange law ensuring global consistency) while remaining compatible with modern machine learning architectures. Our construction defines a distinct composition operator $\circ_i$ for each axis $i$, ensuring associative combination along each axis without imposing global commutativity. Importantly, all axis-specific operators commute with one another, enforcing a global interchange law that enables consistent cross-axis compositions. This is, to our knowledge, the first approach that provides a common foundation that generalizes classical sequence-modeling paradigms (e.g., structured state-space models (SSMs) and transformer self-attention) to a unified multi-dimensional framework. For example, specific one-dimensional instances of our framework can recover the familiar affine transformation algebra, vanilla self-attention, and the SSM-style recurrence. The higher-dimensional generalizations naturally support recursive, structure-aware operations in embedding spaces. We outline several potential applications unlocked by this structure (including structured positional encodings in Transformers, directional image embeddings, and symbolic modeling of sequences or grids), indicating that it could inform future deep learning model designs. We formally establish the algebraic properties of our framework and discuss efficient implementations. Finally, as our focus is theoretical, we include no experiments here and defer empirical validation to future work, which we plan to undertake.

Coloring Between the Lines: Personalization in the Null Space of Planning Constraints

Authors:Tom Silver, Rajat Kumar Jenamani, Ziang Liu, Ben Dodson, Tapomayukh Bhattacharjee
Date:2025-05-21 13:24:05

Generalist robots must personalize in-the-wild to meet the diverse needs and preferences of long-term users. How can we enable flexible personalization without sacrificing safety or competency? This paper proposes Coloring Between the Lines (CBTL), a method for personalization that exploits the null space of constraint satisfaction problems (CSPs) used in robot planning. CBTL begins with a CSP generator that ensures safe and competent behavior, then incrementally personalizes behavior by learning parameterized constraints from online interaction. By quantifying uncertainty and leveraging the compositionality of planning constraints, CBTL achieves sample-efficient adaptation without environment resets. We evaluate CBTL in (1) three diverse simulation environments; (2) a web-based user study; and (3) a real-robot assisted feeding system, finding that CBTL consistently achieves more effective personalization with fewer interactions than baselines. Our results demonstrate that CBTL provides a unified and practical approach for continual, flexible, active, and safe robot personalization. Website: https://emprise.cs.cornell.edu/cbtl/

Seeing Through Deception: Uncovering Misleading Creator Intent in Multimodal News with Vision-Language Models

Authors:Jiaying Wu, Fanxiao Li, Min-Yen Kan, Bryan Hooi
Date:2025-05-21 13:14:32

The real-world impact of misinformation stems from the underlying misleading narratives that creators seek to convey. As such, interpreting misleading creator intent is essential for multimodal misinformation detection (MMD) systems aimed at effective information governance. In this paper, we introduce an automated framework that simulates real-world multimodal news creation by explicitly modeling creator intent through two components: the desired influence and the execution plan. Using this framework, we construct DeceptionDecoded, a large-scale benchmark comprising 12,000 image-caption pairs aligned with trustworthy reference articles. The dataset captures both misleading and non-misleading intents and spans manipulations across visual and textual modalities. We conduct a comprehensive evaluation of 14 state-of-the-art vision-language models (VLMs) on three intent-centric tasks: (1) misleading intent detection, (2) misleading source attribution, and (3) creator desire inference. Despite recent advances, we observe that current VLMs fall short in recognizing misleading intent, often relying on spurious cues such as superficial cross-modal consistency, stylistic signals, and heuristic authenticity hints. Our findings highlight the pressing need for intent-aware modeling in MMD and open new directions for developing systems capable of deeper reasoning about multimodal misinformation.

LET-modifying joint optimization for mixed-modality photon-proton treatment planning

Authors:Lisa Seckler, Amit Ben Antony Bennan, Niklas Wahl
Date:2025-05-21 13:01:22

As depth increases, linear energy transfer (LET) rises toward the distal edge of the Bragg peak, boosting the radiobiological effectiveness (RBE). To manage the biological variation and limit normal-tissue damage, LET-modifying objective functions on, e.g., dose-weighted LET or dirty dose and/or usage of variable RBE models were introduced. Because shaping LET by proton irradiation alone has its limits, this work proposes to jointly optimize mixed-modality proton-photon treatments based on directly LET-modifying objective functions. The investigated objective functions rely on either dose-weighted LET or dirty dose concepts. To formulate a consistent combined optimization problem, the contribution of secondary electron LET in photon treatments is considered (and discussed) as well. Combined dose/LET calculation and optimization are realized in the open-source toolkit matRad. Phantom plans as well as patient plans are optimized to analyze the method, combining five proton fractions with 25 photon fractions. Dose-optimized combined plans are used as a reference. The reference plan shows that protons are, in general, dosimetrically superior and thus preferred, with photons aiding in achieving conformity. The introduction of LET-modifying objectives locally modifies the proton contribution in the targeted regions of interest. Especially at the distal edge, the photon contribution increases to move high-LET/dirty dose out of the organs at risk (OARs). Dirty dose objectives seem to allow a more comprehensive steering of the high-LET regions compared to LETxDose. Incorporating LET-based objectives into a jointly optimized proton-photon system allows for improved dose conformity and reduced high-LET exposure in critical regions in proximity to the distal proton edge. This approach enables the utilization of modality-specific strengths and can contribute to safer, more effective treatment plans.

Impact of Wind Generation on Risk-based Security Assessment of Power System

Authors:Umair Shahzad
Date:2025-05-21 11:24:28

The electric power system is one of the largest and most intricate infrastructures. Therefore, it is critical to assess and maintain its security. A power system security assessment is indispensable for identifying post-contingency issues, taking corrective measures, and protecting the system from blackouts. This paper examined the impact of wind generation on the risk-based security assessment of a power transmission network in the context of planning. DIgSILENT PowerFactory software was used to conduct the analysis using a combination of the brute force technique and the nonsequential Monte Carlo (MC) simulation method on the IEEE 39-bus transmission test system. Optimal power flow (OPF) was used to quantify security, considering (N-1), (N-2), and (N-3) line outages and an (N-1) bus outage. Moreover, the average cost deviation from the mean optimal system operating cost was proposed as a novel security indicator. The results obtained accurately depicted the effects of changing wind generation levels on system security in terms of risk. The most and least critical line(s) and bus in the system, for different wind generation levels, were also determined. Moreover, the worst-case wind-generation threshold level using two different cost functions for wind was identified.

A Risk-Based Probabilistic Transient Stability Approach for Ranking of Circuit Breakers in a Power System

Authors:Umair Shahzad
Date:2025-05-21 11:08:44

Power systems are more complex than ever and consequently operate close to their stability limits. Moreover, with the increasing demand for renewable wind generation and the requirement to maintain a secure power system, the importance of transient stability cannot be overstated. Current deterministic industry practices of transient stability assessment ignore the probabilistic nature of the variables involved. With increasing system uncertainties and widespread electricity market deregulation, there is a strong need to incorporate probabilistic transient stability analysis. Circuit breakers play a critical role in fault clearing and consequently in determining the system transient stability. It is important that they undergo timely and appropriate maintenance procedures based on some criterion. Considering the need to incorporate risk in modern power systems, this paper proposes a risk-based probabilistic transient stability approach for ranking of circuit breakers in a power system. A novel priority index was proposed to rank the circuit breakers based on the system transient stability risk. DIgSILENT PowerFactory software was used to conduct the required simulations on the IEEE 14-bus system. The proposed risk-based framework was deemed efficient in identifying the circuit breakers based on their priority rank index, which can aid the power system planning process.

Reconsider the Template Mesh in Deep Learning-based Mesh Reconstruction

Authors:Fengting Zhang, Boxu Liang, Qinghao Liu, Min Liu, Xiang Chen, Yaonan Wang
Date:2025-05-21 09:10:31

Mesh reconstruction is a cornerstone process across various applications, including in-silico trials, digital twins, surgical planning, and navigation. Recent advancements in deep learning have notably enhanced mesh reconstruction speeds. Yet, traditional methods predominantly rely on deforming a standardised template mesh for individual subjects, which overlooks the unique anatomical variations between them, and may compromise the fidelity of the reconstructions. In this paper, we propose an adaptive-template-based mesh reconstruction network (ATMRN), which generates adaptive templates from the given images for the subsequent deformation, moving beyond the constraints of a singular, fixed template. Our approach, validated on cortical magnetic resonance (MR) images from the OASIS dataset, sets a new benchmark in voxel-to-cortex mesh reconstruction, achieving an average symmetric surface distance of 0.267mm across four cortical structures. Our proposed method is generic and can be easily transferred to other image modalities and anatomical structures.
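The average symmetric surface distance (ASSD) quoted above can be illustrated with a short sketch. It uses one common definition over sampled surface points: sum each point's distance to the nearest point on the other surface, in both directions, and divide by the total number of points. The paper's exact evaluation protocol (point sampling, point-to-triangle distances) is not given in the abstract, so treat this as an assumption-laden illustration.

```python
import math

def _sum_nearest_distances(src, dst):
    """Sum, over points in src, of the distance to the nearest point in dst."""
    return sum(min(math.dist(p, q) for q in dst) for p in src)

def assd(surface_a, surface_b):
    """Average symmetric surface distance between two point-sampled surfaces."""
    if not surface_a or not surface_b:
        raise ValueError("both surfaces need at least one point")
    total = (_sum_nearest_distances(surface_a, surface_b)
             + _sum_nearest_distances(surface_b, surface_a))
    return total / (len(surface_a) + len(surface_b))

# Two parallel two-point "surfaces" one unit apart (units: e.g. mm).
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
distance = assd(predicted, reference)
```

This brute-force version is quadratic in the number of points; real meshes would normally use a spatial index such as a k-d tree.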

Zero-Shot Gaze-based Volumetric Medical Image Segmentation

Authors:Tatyana Shmykova, Leila Khaertdinova, Ilya Pershin
Date:2025-05-21 08:34:13

Accurate segmentation of anatomical structures in volumetric medical images is crucial for clinical applications, including disease monitoring and cancer treatment planning. Contemporary interactive segmentation models, such as Segment Anything Model 2 (SAM-2) and its medical variant (MedSAM-2), rely on manually provided prompts like bounding boxes and mouse clicks. In this study, we introduce eye gaze as a novel informational modality for interactive segmentation, marking the application of eye-tracking for 3D medical image segmentation. We evaluate the performance of using gaze-based prompts with SAM-2 and MedSAM-2 using both synthetic and real gaze data. Compared to bounding boxes, gaze-based prompts offer a time-efficient interaction approach with slightly lower segmentation quality. Our findings highlight the potential of using gaze as a complementary input modality for interactive 3D medical image segmentation.

EndoVLA: Dual-Phase Vision-Language-Action Model for Autonomous Tracking in Endoscopy

Authors:Chi Kit Ng, Long Bai, Guankun Wang, Yupeng Wang, Huxin Gao, Kun Yuan, Chenhan Jin, Tieyong Zeng, Hongliang Ren
Date:2025-05-21 07:35:00

In endoscopic procedures, autonomous tracking of abnormal regions and following circumferential cutting markers can significantly reduce the cognitive burden on endoscopists. However, conventional model-based pipelines are fragile, because each component (e.g., detection, motion planning) requires manual tuning and struggles to incorporate high-level endoscopic intent, leading to poor generalization across diverse scenes. Vision-Language-Action (VLA) models, which integrate visual perception, language grounding, and motion planning within an end-to-end framework, offer a promising alternative by semantically adapting to surgeon prompts without manual recalibration. Despite their potential, applying VLA models to robotic endoscopy presents unique challenges due to the complex and dynamic anatomical environments of the gastrointestinal (GI) tract. To address this, we introduce EndoVLA, designed specifically for continuum robots in GI interventions. Given endoscopic images and surgeon-issued tracking prompts, EndoVLA performs three core tasks: (1) polyp tracking, (2) delineation and following of abnormal mucosal regions, and (3) adherence to circular markers during circumferential cutting. To tackle data scarcity and domain shifts, we propose a dual-phase strategy comprising supervised fine-tuning on our EndoVLA-Motion dataset and reinforcement fine-tuning with task-aware rewards. Our approach significantly improves tracking performance in endoscopy and enables zero-shot generalization in diverse scenes and complex sequential tasks.

A Stochastic Programming Model for Anticipative Planning of Integrated Electricity and Gas Systems with Bidirectional Energy Flows under Fuel and CO2 Price Uncertainty

Authors:Giovanni Micheli, Maria Teresa Vespucci, Alessia Cortazzi, Cinzia Puglisi
Date:2025-05-21 06:33:50

A two-stage multi-period mixed-integer linear stochastic programming model is proposed to assist qualified operators in long-term generation and transmission expansion planning of electricity and gas systems to meet policy objectives. The first-stage decisions concern investments in new plants, new connections in the electricity and gas sectors, and the decommissioning of existing thermal power plants; the second-stage variables represent operational decisions, with uncertainty about future fuel and CO2 prices represented by scenarios. The main features of the model are: (i) the bidirectional conversion between electricity and gas enabled by Power-to-Gas and thermal power plants, (ii) a detailed representation of short-term operation, crucial for addressing challenges associated with integrating large shares of renewables in the energy mix, and (iii) an integrated planning framework to evaluate the operation of flexibility resources, their ability to manage non-programmable generation, and their economic viability. Given the computational complexity of the proposed model, in this paper we also implement a solution algorithm based on Benders decomposition to compute near-optimal solutions. A case study on the decarbonisation of the Italian integrated energy system demonstrates the effectiveness of the model. The numerical results show: (i) the importance of including a detailed system representation for obtaining reliable results, and (ii) the need to consider price uncertainty to design adequate systems and reduce overall costs.

Cascaded Diffusion Models for Neural Motion Planning

Authors:Mohit Sharma, Adam Fishman, Vikash Kumar, Chris Paxton, Oliver Kroemer
Date:2025-05-21 06:21:50

Robots in the real world need to perceive and move to goals in complex environments without collisions. Avoiding collisions is especially difficult when relying on sensor perception and when goals are among clutter. Diffusion policies and other generative models have shown strong performance in solving local planning problems, but often struggle to avoid all of the subtle constraint violations that characterize truly challenging global motion planning problems. In this work, we propose an approach for learning global motion planning using diffusion policies, allowing the robot to generate full trajectories through complex scenes and reason about multiple obstacles along the path. Our approach uses cascaded hierarchical models which unify global prediction and local refinement together with online plan repair to ensure the trajectories are collision-free. Our method outperforms (by ~5%) a wide variety of baselines on challenging tasks in multiple domains including navigation and manipulation.

From Pixels to Images: Deep Learning Advances in Remote Sensing Image Semantic Segmentation

Authors:Quanwei Liu, Tao Huang, Yanni Dong, Jiaqi Yang, Wei Xiang
Date:2025-05-21 06:02:57

Remote sensing images (RSIs) capture both natural and human-induced changes on the Earth's surface, serving as essential data for environmental monitoring, urban planning, and resource management. Semantic segmentation (SS) of RSIs enables the fine-grained interpretation of surface features, making it a critical task in remote sensing analysis. With the increasing diversity and volume of RSIs collected by sensors on various platforms, traditional processing methods struggle to maintain efficiency and accuracy. In response, deep learning (DL) has emerged as a transformative approach, enabling substantial advances in remote sensing image semantic segmentation (RSISS) by automating feature extraction and improving segmentation accuracy across diverse modalities. This paper revisits the evolution of DL-based RSISS by categorizing existing approaches into four stages: the early pixel-based methods, the prevailing patch-based and tile-based techniques, and the emerging image-based strategies enabled by foundation models. We analyze these developments from the perspective of feature extraction and learning strategies, revealing the field's progression from pixel-level to tile-level and from unimodal to multimodal segmentation. Furthermore, we conduct a comprehensive evaluation of nearly 40 advanced techniques on a unified dataset to quantitatively characterize their performance and applicability. This review offers a holistic view of DL-based SS for RS, highlighting key advancements, comparative insights, and open challenges to guide future research.

iPad: Iterative Proposal-centric End-to-End Autonomous Driving

Authors:Ke Guo, Haochen Liu, Xiaojun Wu, Jia Pan, Chen Lv
Date:2025-05-21 05:05:38

End-to-end (E2E) autonomous driving systems offer a promising alternative to traditional modular pipelines by reducing information loss and error accumulation, with significant potential to enhance both mobility and safety. However, most existing E2E approaches directly generate plans based on dense bird's-eye view (BEV) grid features, leading to inefficiency and limited planning awareness. To address these limitations, we propose iterative Proposal-centric autonomous driving (iPad), a novel framework that places proposals - a set of candidate future plans - at the center of feature extraction and auxiliary tasks. Central to iPad is ProFormer, a BEV encoder that iteratively refines proposals and their associated features through proposal-anchored attention, effectively fusing multi-view image data. Additionally, we introduce two lightweight, proposal-centric auxiliary tasks - mapping and prediction - that improve planning quality with minimal computational overhead. Extensive experiments on the NAVSIM and CARLA Bench2Drive benchmarks demonstrate that iPad achieves state-of-the-art performance while being significantly more efficient than prior leading methods.

Data Augmentation and Resolution Enhancement using GANs and Diffusion Models for Tree Segmentation

Authors:Alessandro dos Santos Ferreira, Ana Paula Marques Ramos, José Marcato Junior, Wesley Nunes Gonçalves
Date:2025-05-21 03:57:10

Urban forests play a key role in enhancing environmental quality and supporting biodiversity in cities. Mapping and monitoring these green spaces are crucial for urban planning and conservation, yet accurately detecting trees is challenging due to complex landscapes and the variability in image resolution caused by different satellite sensors or UAV flight altitudes. While deep learning architectures have shown promise in addressing these challenges, their effectiveness remains strongly dependent on the availability of large and manually labeled datasets, which are often expensive and difficult to obtain in sufficient quantity. In this work, we propose a novel pipeline that integrates domain adaptation with GANs and Diffusion models to enhance the quality of low-resolution aerial images. Our proposed pipeline enhances low-resolution imagery while preserving semantic content, enabling effective tree segmentation without requiring large volumes of manually annotated data. Leveraging models such as pix2pix, Real-ESRGAN, Latent Diffusion, and Stable Diffusion, we generate realistic and structurally consistent synthetic samples that expand the training dataset and unify scale across domains. This approach not only improves the robustness of segmentation models across different acquisition conditions but also provides a scalable and replicable solution for remote sensing scenarios with scarce annotation resources. Experimental results demonstrated an improvement of over 50% in IoU for low-resolution images, highlighting the effectiveness of our method compared to traditional pipelines.
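Since the headline result is reported in IoU (intersection over union), a minimal sketch of the metric for binary segmentation masks follows. The nested-list mask format and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not details taken from the paper.

```python
def iou(pred_mask, true_mask):
    """Intersection over union of two equally sized binary masks."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                intersection += 1  # pixel labeled tree in both masks
            if p or t:
                union += 1         # pixel labeled tree in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0

# 2x2 example: one shared tree pixel, three tree pixels overall.
predicted = [[1, 1],
             [0, 0]]
annotated = [[1, 0],
             [1, 0]]
score = iou(predicted, annotated)
```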

Agentic Feature Augmentation: Unifying Selection and Generation with Teaming, Planning, and Memories

Authors:Nanxu Gong, Sixun Dong, Haoyue Bai, Xinyuan Wang, Wangyang Ying, Yanjie Fu
Date:2025-05-21 03:49:24

As a widely used and practical tool, feature engineering transforms raw data into discriminative features to advance AI model performance. However, existing methods usually apply feature selection and generation separately, failing to strike a balance between reducing redundancy and adding meaningful dimensions. To fill this gap, we propose an agentic feature augmentation concept, where the unification of feature generation and selection is modeled as agentic teaming and planning. Specifically, we develop a Multi-Agent System with Long and Short-Term Memory (MAGS), comprising a selector agent to eliminate redundant features, a generator agent to produce informative new dimensions, and a router agent that strategically coordinates their actions. We leverage in-context learning with short-term memory for immediate feedback refinement and long-term memory for globally optimal guidance. Additionally, we employ offline Proximal Policy Optimization (PPO) reinforcement fine-tuning to train the router agent for effective decision-making to navigate a vast discrete feature space. Extensive experiments demonstrate that this unified agentic framework consistently achieves superior task performance by intelligently orchestrating feature selection and generation.

The Pursuit of Empathy: Evaluating Small Language Models for PTSD Dialogue Support

Authors:Suhas BN, Yash Mahajan, Dominik Mattioli, Andrew M. Sherrill, Rosa I. Arriaga, Chris W. Wiese, Saeed Abdullah
Date:2025-05-21 03:32:46

Can small language models with 0.5B to 5B parameters meaningfully engage in trauma-informed, empathetic dialogue for individuals with PTSD? We address this question by introducing TIDE, a dataset of 10,000 two-turn dialogues spanning 500 diverse PTSD client personas and grounded in a three-factor empathy model: emotion recognition, distress normalization, and supportive reflection. All scenarios and reference responses were reviewed for realism and trauma sensitivity by a clinical psychologist specializing in PTSD. We evaluate eight small language models before and after fine-tuning, comparing their outputs to a frontier model (Claude Sonnet 3.5). Our IRB-approved human evaluation and automatic metrics show that fine-tuning generally improves perceived empathy, but gains are highly scenario- and user-dependent, with smaller models facing an empathy ceiling. Demographic analysis shows older adults value distress validation and graduate-educated users prefer nuanced replies, while gender effects are minimal. We highlight the limitations of automatic metrics and the need for context- and user-aware system design. Our findings, along with the planned release of TIDE, provide a foundation for building safe, resource-efficient, and ethically sound empathetic AI to supplement, not replace, clinical mental health care.

Histo-Planner: A Real-time Local Planner for MAVs Teleoperation based on Histogram of Obstacle Distribution

Authors:Ze Wang, Zhenyu Gao, Jingang Qu, Pascal Morin
Date:2025-05-21 02:53:12

This paper concerns real-time obstacle avoidance for micro aerial vehicles (MAVs). Motivated by teleoperation applications in cluttered environments with limited computational power, we propose a local planner that does not require the knowledge or construction of a global map of the obstacles. The proposed solution consists of a real-time trajectory planning algorithm that relies on the histogram of obstacle distribution and a planner manager that triggers different planning modes depending on the location of obstacles around the MAV. The proposed solution is validated, for a teleoperation application, with both simulations and indoor experiments. Benchmark comparisons based on a designed simulation platform are also provided.
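The abstract does not describe how the histogram of obstacle distribution is built or used. As a generic illustration only (not the authors' algorithm), the sketch below bins 2D obstacle points into angular sectors around a vehicle at the origin and picks the empty sector closest to a desired heading. All names and parameters (`obstacle_histogram`, `freest_heading`, `num_bins`, `max_range`) are hypothetical.

```python
import math


def obstacle_histogram(points, num_bins=12, max_range=5.0):
    """Count obstacle points per angular sector around a vehicle at the origin."""
    hist = [0] * num_bins
    width = 2 * math.pi / num_bins
    for x, y in points:
        if math.hypot(x, y) > max_range:
            continue  # ignore obstacles beyond the local sensing range
        angle = math.atan2(y, x) % (2 * math.pi)  # map to [0, 2*pi)
        hist[int(angle / width) % num_bins] += 1
    return hist


def freest_heading(hist, desired):
    """Return the center angle of the empty sector closest to `desired`, or None."""
    width = 2 * math.pi / len(hist)
    best = None
    for i, count in enumerate(hist):
        if count:
            continue  # sector contains at least one obstacle
        center = (i + 0.5) * width
        # Smallest absolute angular difference, wrapped to [0, pi].
        diff = abs((center - desired + math.pi) % (2 * math.pi) - math.pi)
        if best is None or diff < best[0]:
            best = (diff, center)
    return None if best is None else best[1]


# Two nearby obstacles straight ahead, one far away (ignored).
points = [(1.0, 0.1), (2.0, 0.2), (0.0, 10.0)]
hist = obstacle_histogram(points, num_bins=4)
heading = freest_heading(hist, desired=0.0)
```

A local planner of this kind needs only the current sensor points, which is consistent with the abstract's stated goal of avoiding a global obstacle map.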

Sensitivity of the Hyper-Kamiokande experiment to neutrino oscillation parameters using accelerator neutrinos

Authors:Hyper-Kamiokande Collaboration
Date:2025-05-21 01:56:34

This paper describes the analysis to estimate the sensitivity of the Hyper-Kamiokande experiment to long-baseline neutrino oscillation parameters using accelerator (anti)neutrinos. Results are presented for the CPV discovery sensitivity and precision measurements of the oscillation parameters $\delta_{CP}$, $\sin^2\theta_{23}$, $\Delta m^2_{32}$ and $\sin^2\theta_{13}$. With the assumed Hyper-Kamiokande running plan, a $5\sigma$ CPV discovery is possible in less than three years in the case of maximal CPV and known MO. In the absence of external constraints on the MO, considering the MO sensitivity of the Hyper-Kamiokande measurement using atmospheric neutrinos, the time for a CPV discovery could be estimated to be around six years. Using the nominal final exposure of $27 \times 10^{21}$ protons on target, corresponding to 10 years, with a ratio of 1:3 in neutrino to antineutrino beam mode, we expect to select approximately 10000 charged current, quasi-elastic-like, muon neutrino events, and a similar number of muon antineutrino events. In the electron (anti)neutrino appearance channels, we expect approximately 2000 charged current, quasi-elastic-like electron neutrino events and 800 electron antineutrino events. These large event samples will allow Hyper-Kamiokande to exclude CP conservation at the $5\sigma$ significance level for over 60% of the possible true values of $\delta_{CP}$.

Shape-Adaptive Planning and Control for a Deformable Quadrotor

Authors:Yuze Wu, Zhichao Han, Xuankang Wu, Yuan Zhou, Junjie Wang, Zheng Fang, Fei Gao
Date:2025-05-21 01:29:29

Drones have become essential in various applications, but conventional quadrotors face limitations in confined spaces and complex tasks. Deformable drones, which can adapt their shape in real time, offer a promising solution to overcome these challenges, while also enhancing maneuverability and enabling novel tasks like object grasping. This paper presents a novel approach to autonomous motion planning and control for deformable quadrotors. We introduce a shape-adaptive trajectory planner that incorporates deformation dynamics into path generation, using a scalable kinodynamic A* search to handle deformation parameters in complex environments. The backend spatio-temporal optimization is capable of generating optimally smooth trajectories that incorporate shape deformation. Additionally, we propose an enhanced control strategy that compensates for external forces and torque disturbances, achieving a 37.3% reduction in trajectory tracking error compared to our previous work. Our approach is validated through simulations and real-world experiments, demonstrating its effectiveness in narrow-gap traversal and multi-modal deformable tasks.

Scan, Materialize, Simulate: A Generalizable Framework for Physically Grounded Robot Planning

Authors:Amine Elhafsi, Daniel Morton, Marco Pavone
Date:2025-05-20 21:55:01

Autonomous robots must reason about the physical consequences of their actions to operate effectively in unstructured, real-world environments. We present Scan, Materialize, Simulate (SMS), a unified framework that combines 3D Gaussian Splatting for accurate scene reconstruction, visual foundation models for semantic segmentation, vision-language models for material property inference, and physics simulation for reliable prediction of action outcomes. By integrating these components, SMS enables generalizable physical reasoning and object-centric planning without the need to re-learn foundational physical dynamics. We empirically validate SMS in a billiards-inspired manipulation task and a challenging quadrotor landing scenario, demonstrating robust performance on both simulated domain transfer and real-world experiments. Our results highlight the potential of bridging differentiable rendering for scene reconstruction, foundation models for semantic understanding, and physics-based simulation to achieve physically grounded robot planning across diverse settings.

Coordinated motion control of a wire arc additive manufacturing robotic system for multi-directional building parts

Authors:Fernando Coutinho, Nicolas Lizarralde, Fernando Lizarralde
Date:2025-05-20 19:43:19

This work investigates the manufacturing of complex-shaped parts with wire arc additive manufacturing (WAAM). In order to guarantee the integrity and quality of each deposited layer that composes the final piece, the deposition process is usually carried out in a flat position. However, for complex-geometry parts with non-flat surfaces, this strategy causes unsupported overhangs and a staircase effect, which contribute to a poor surface finish. Generally, the build direction is not constant for every deposited section or layer in complex-geometry parts. As a result, there is an additional concern to ensure the build direction is aligned with gravity, thus improving the quality of the final part. This paper proposes an algorithm to control the torch motion with respect to a deposition substrate as well as the torch orientation with respect to an inertial frame. The control scheme is based on task augmentation applied to an extended kinematic chain composed of two robots, which constitutes a coordinated control problem, and allows the deposition trajectory to be planned with respect to the deposition substrate coordinate frame while aligning each layer's buildup direction with gravity (or any other direction defined for an inertial frame). Parts with complex geometry have been produced in a WAAM cell composed of two robots (a manipulator with a welding torch and a positioning table holding the workpiece) in order to validate the proposed approach.

Voice to Vision: Enhancing Civic Decision-Making through Co-Designed Data Infrastructure

Authors:Maggie Hughes, Cassandra Overney, Ashima Kamra, Jasmin Tepale, Elizabeth Hamby, Mahmood Jasim, Deb Roy
Date:2025-05-20 19:32:29

Trust and transparency in civic decision-making processes, like neighborhood planning, are eroding as community members frequently report sending feedback "into a void" without understanding how, or whether, their input influences outcomes. To address this gap, we introduce Voice to Vision, a sociotechnical system that bridges community voices and planning outputs through a structured yet flexible data infrastructure and complementary interfaces for both community members and planners. Through a five-month iterative design process with 21 stakeholders and subsequent field evaluation involving 24 participants, we examine how this system facilitates shared understanding across the civic ecosystem. Our findings reveal that while planners value systematic sensemaking tools that find connections across diverse inputs, community members prioritize seeing themselves reflected in the process, discovering patterns within feedback, and observing the rigor behind decisions, while emphasizing the importance of actionable outcomes. We contribute insights into participatory design for civic contexts, a complete sociotechnical system with an interoperable data structure for civic decision-making, and empirical findings that inform how digital platforms can promote shared understanding among elected or appointed officials, planners, and community members by enhancing transparency and legitimacy.

Robust and Efficient AI-Based Attack Recovery in Autonomous Drones

Authors:Diego Ortiz Barbosa, Luis Burbano, Siwei Yang, Zijun Wang, Alvaro A. Cardenas, Cihang Xie, Yinzhi Cao
Date:2025-05-20 18:57:38

We introduce an autonomous attack recovery architecture to add common sense reasoning to plan a recovery action after an attack is detected. We outline use-cases of our architecture using drones, and then discuss how to implement this architecture efficiently and securely in edge devices.