planning - 2025-11-02

Clone Deterministic 3D Worlds with Geometrically-Regularized World Models

Authors:Zaishuo Xia, Yukuan Lu, Xinyi Li, Yifan Xu, Yubei Chen

Date:2025-10-30 17:56:43

A world model is an internal model that simulates how the world evolves. Given past observations and actions, it predicts the future of both the embodied agent and its environment. Accurate world models are essential for enabling agents to think, plan, and reason effectively in complex, dynamic settings. Despite rapid progress, current world models remain brittle and degrade over long horizons. We argue that a central cause is representation quality: exteroceptive inputs (e.g., images) are high-dimensional, and lossy or entangled latents make dynamics learning unnecessarily hard. We therefore ask whether improving representation learning alone can substantially improve world-model performance. In this work, we take a step toward building a truly accurate world model by addressing a fundamental yet open problem: constructing a model that can fully clone and overfit to a deterministic 3D world. We propose Geometrically-Regularized World Models (GRWM), which enforces that consecutive points along a natural sensory trajectory remain close in latent representation space. This approach yields significantly improved latent representations that align closely with the true topology of the environment. GRWM is plug-and-play, requires only minimal architectural modification, scales with trajectory length, and is compatible with diverse latent generative backbones. Across deterministic 3D settings and long-horizon prediction tasks, GRWM significantly increases rollout fidelity and stability. Analyses show that its benefits stem from learning a latent manifold with superior geometric structure. These findings support a clear takeaway: improving representation learning is a direct and useful path to robust world models, delivering reliable long-horizon predictions without enlarging the dynamics module.
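The core idea, that consecutive observations along a trajectory should map to nearby latents, can be written as a temporal-smoothness penalty on encoder outputs. This is a minimal sketch under assumptions: the squared-Euclidean distance, the averaging over steps, and the weighted sum with a reconstruction term (`regularized_loss`, `weight`) are hypothetical choices, not GRWM's published loss.

```python
from typing import Sequence


def temporal_smoothness_loss(latents: Sequence[Sequence[float]]) -> float:
    """Mean squared Euclidean distance between consecutive latent vectors.

    latents[t] is the (assumed) encoder output for frame t of one trajectory.
    """
    if len(latents) < 2:
        return 0.0
    total = 0.0
    for z_prev, z_next in zip(latents[:-1], latents[1:]):
        total += sum((a - b) ** 2 for a, b in zip(z_prev, z_next))
    return total / (len(latents) - 1)


def regularized_loss(recon_loss: float, latents: Sequence[Sequence[float]],
                     weight: float = 0.1) -> float:
    # Hypothetical combination: a reconstruction objective plus a weighted geometric term.
    return recon_loss + weight * temporal_smoothness_loss(latents)
```

On its own, this term is minimized by mapping every frame to the same point, so a practical version needs another objective (such as reconstruction) to prevent collapse; how GRWM handles this is not stated in the abstract.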

Pareto-Optimal Sampling and Resource Allocation for Timely Communication in Shared-Spectrum Low-Altitude Networks

Authors:Bowen Li, Jiping Luo, Themistoklis Charalambous, Nikolaos Pappas

Date:2025-10-30 17:09:09

Guaranteeing stringent data freshness for low-altitude unmanned aerial vehicles (UAVs) in shared spectrum forces a critical trade-off between two operational costs: the UAV's own energy consumption and the occupation of terrestrial channel resources. The core challenge is to satisfy the aerial data freshness while finding a Pareto-optimal balance between these costs. Leveraging predictive channel models and predictive UAV trajectories, we formulate a bi-objective Pareto optimization problem over a long-term planning horizon to jointly optimize the sampling timing for aerial traffic and the power and spectrum allocation for fair coexistence. However, the problem's non-convex, mixed-integer nature renders classical methods incapable of fully characterizing the complete Pareto frontier. Notably, we show monotonicity properties of the frontier, building on which we transform the bi-objective problem into several single-objective problems. We then propose a new graph-based algorithm and prove that it can find the complete set of Pareto optima with low complexity, linear in the horizon and near-quadratic in the resource block (RB) budget. Numerical comparisons show that our approach meets the stringent timeliness requirement and achieves a six-fold reduction in RB utilization or a 6 dB energy saving compared to benchmarks.
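A Pareto-optimal solution here is one where neither UAV energy nor resource-block (RB) usage can be reduced without increasing the other. As a minimal illustration of non-dominance only, not of the paper's graph-based algorithm, the sketch below filters a finite list of candidate (energy, RB) costs, both minimized; the tuple layout is an assumption.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (energy cost, RB cost); both are minimized


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by the first objective."""
    front: List[Point] = []
    best_second = float("inf")
    # Sort by energy (ties broken by RB cost), then sweep once:
    # a point survives only if it strictly improves the best RB cost seen so far.
    for energy, rb in sorted(points):
        if rb < best_second:
            front.append((energy, rb))
            best_second = rb
    return front
```

The single sweep after sorting runs in O(n log n); the paper's contribution is recovering the complete frontier of a non-convex mixed-integer problem, where candidates cannot simply be enumerated like this.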

Hybrid DQN-TD3 Reinforcement Learning for Autonomous Navigation in Dynamic Environments

Authors:Xiaoyi He, Danggui Chen, Zhenshuo Zhang, Zimeng Bai

Date:2025-10-30 16:12:01

This paper presents a hierarchical path-planning and control framework that combines a high-level Deep Q-Network (DQN) for discrete sub-goal selection with a low-level Twin Delayed Deep Deterministic Policy Gradient (TD3) controller for continuous actuation. The high-level module selects behaviors and sub-goals; the low-level module executes smooth velocity commands. We design a practical reward shaping scheme (direction, distance, obstacle avoidance, action smoothness, collision penalty, time penalty, and progress), together with a LiDAR-based safety gate that prevents unsafe motions. The system is implemented in ROS + Gazebo (TurtleBot3) and evaluated with PathBench metrics, including success rate, collision rate, path efficiency, and re-planning efficiency, in dynamic and partially observable environments. Experiments show improved success rate and sample efficiency over single-algorithm baselines (DQN or TD3 alone) and rule-based planners, with better generalization to unseen obstacle configurations and reduced abrupt control changes. Code and evaluation scripts are available at the project repository.
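The reward terms and the LiDAR safety gate listed above can be sketched as follows. All weights, the `SAFE_DIST` threshold, and the `StepInfo` fields are hypothetical; the abstract names the terms but not their form or values.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class StepInfo:
    prev_goal_dist: float  # distance to the current sub-goal before the step (m)
    goal_dist: float       # distance to the sub-goal after the step (m)
    heading_error: float   # angle between robot heading and sub-goal direction (rad)
    min_lidar: float       # closest LiDAR return (m)
    action_change: float   # magnitude of the change in commanded velocity
    collided: bool


# Hypothetical weights and threshold; not the paper's values.
W_PROGRESS, W_DIRECTION, W_DISTANCE = 10.0, 0.5, 0.1
W_OBSTACLE, W_SMOOTH, W_TIME, W_COLLISION = 0.2, 0.1, 0.01, 100.0
SAFE_DIST = 0.3  # m


def shaped_reward(s: StepInfo) -> float:
    r = W_PROGRESS * (s.prev_goal_dist - s.goal_dist)  # progress toward the sub-goal
    r += W_DIRECTION * math.cos(s.heading_error)        # reward facing the sub-goal
    r -= W_DISTANCE * s.goal_dist                       # remaining distance
    if s.min_lidar < SAFE_DIST:                         # obstacle-proximity penalty
        r -= W_OBSTACLE * (SAFE_DIST - s.min_lidar) / SAFE_DIST
    r -= W_SMOOTH * s.action_change                     # discourage abrupt commands
    r -= W_TIME                                         # constant per-step time penalty
    if s.collided:
        r -= W_COLLISION
    return r


def safety_gate(linear: float, angular: float, min_front_lidar: float) -> Tuple[float, float]:
    """Zero forward velocity when an obstacle is closer than SAFE_DIST ahead; still allow rotation."""
    if min_front_lidar < SAFE_DIST and linear > 0.0:
        return 0.0, angular
    return linear, angular
```

The gate sits between the TD3 output and the velocity command, so it overrides unsafe actions regardless of what the learned policy proposes.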

Putting a Price on Immobility: Food Deliveries and Pricing Approaches

Authors:Runyu Wang, Haotian Zhong

Date:2025-10-30 16:05:21

Urban food delivery services have become an integral part of daily life, yet their mobility and environmental externalities remain poorly addressed by planners. Most studies neglect whether consumers pay enough to internalize the broader social costs of these services. This study quantifies the value of access to and use of food delivery services in Beijing, China, through two discrete choice experiments. The first measures willingness to accept compensation for giving up access, with a median value of CNY588 (approximately USD80). The second captures willingness to pay for reduced waiting time and improved reliability, showing valuations far exceeding typical delivery fees (e.g., CNY96.6/hour and CNY4.83/min at work). These results suggest a substantial consumer surplus and a clear underpricing problem. These findings highlight the need for urban planning to integrate digital service economies into pricing and mobility frameworks. We propose a quantity-based pricing model that targets delivery speed rather than order volume, addressing the primary source of externalities while maintaining net welfare gains. This approach offers a pragmatic, equity-conscious strategy to curb delivery-related congestion, emissions, and safety risks, especially in dense urban cores.
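The monetary valuations above are marginal rates of substitution between an attribute and cost. The abstract does not give the model specification; assuming a standard linear-in-parameters utility as in a conditional logit, the willingness to pay for a one-unit increase in attribute $k$ is the ratio of its coefficient to the cost coefficient:

```latex
U_{ij} = \beta_{\text{cost}}\,\text{cost}_{ij} + \sum_k \beta_k\, x_{ijk} + \varepsilon_{ij},
\qquad
\text{WTP}_k = -\frac{\partial U_{ij} / \partial x_{ijk}}{\partial U_{ij} / \partial\, \text{cost}_{ij}}
             = -\frac{\beta_k}{\beta_{\text{cost}}}
```

For a disliked attribute such as waiting time ($\beta_{\text{time}} < 0$, $\beta_{\text{cost}} < 0$), the willingness to pay for a one-unit reduction is $\beta_{\text{time}} / \beta_{\text{cost}} > 0$.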

Fraction-variant VMAT planning for patients with complex gynecological and head-and-neck cancer

Authors:Nathan Torelli, Madalyne Day, Jan Unkelbach

Date:2025-10-30 14:53:19

Background and Purpose: Increasing the number of arcs in volumetric modulated arc therapy (VMAT) allows for better intensity modulation and may improve plan quality. However, this leads to longer delivery times, which may cause patient discomfort and increase intra-fractional motion. In this study, it was investigated whether the delivery of different VMAT plans in different fractions may improve the dosimetric quality and delivery efficiency for the treatment of patients with complex tumor geometries. Materials and Methods: A direct aperture optimization algorithm was developed which allows for the simultaneous optimization of different VMAT plans to be delivered in different fractions, based on their cumulative physical dose. Each VMAT plan is constrained to deliver a uniform dose within the target volume, such that the entire treatment does not alter the fractionation scheme and is robust against inter-fractional setup errors. This approach was evaluated in-silico for ten patients with gynecological and head-and-neck cancer. Results: For all patients, fraction-variant treatments achieved better target coverage and reduced the dose to critical organs-at-risk compared to fraction-invariant treatments that deliver the same plan in every fraction, where the dosimetric benefit was shown to increase with the number of different plans to be delivered. In addition, 1-arc and 2-arc fraction-variant treatments could approximate the dosimetric quality of 3-arc fraction-invariant treatments, while reducing the delivery time from 180 s to 60 s and 120 s, respectively. Conclusions: Fraction-variant VMAT treatments may achieve excellent dosimetric quality for patients with complex tumor geometries, while keeping the delivery time per fraction viable.
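Why alternating plans can help is easiest to see with cumulative physical dose, which simply adds up voxel by voxel across fractions. The toy numbers and the `PLANS` layout below are hypothetical and only illustrate the principle; the paper optimizes the plans jointly with direct aperture optimization.

```python
from typing import Dict, List

# Per-fraction physical dose (Gy) in a few voxels; toy numbers, not from the paper.
# Both plans give the same uniform dose to the target but spare different OAR regions.
PLANS: Dict[str, Dict[str, List[float]]] = {
    "A": {"target": [2.0, 2.0, 2.0], "oar": [1.2, 0.4]},
    "B": {"target": [2.0, 2.0, 2.0], "oar": [0.4, 1.2]},
}


def cumulative_dose(schedule: List[str], structure: str) -> List[float]:
    """Sum the per-fraction dose of each scheduled plan, voxel by voxel."""
    n_voxels = len(PLANS[schedule[0]][structure])
    total = [0.0] * n_voxels
    for plan in schedule:
        for i, dose in enumerate(PLANS[plan][structure]):
            total[i] += dose
    return total


invariant = cumulative_dose(["A"] * 4, "oar")            # same plan every fraction
variant = cumulative_dose(["A", "B", "A", "B"], "oar")   # fraction-variant schedule
```

Over four fractions the target receives 8 Gy per voxel either way, while the maximum OAR dose drops from about 4.8 Gy to about 3.2 Gy. Because every plan delivers a uniform target dose, a setup error in any single fraction does not break the cumulative target dose.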

RoboOS-NeXT: A Unified Memory-based Framework for Lifelong, Scalable, and Robust Multi-Robot Collaboration

Authors:Huajie Tan, Cheng Chi, Xiansheng Chen, Yuheng Ji, Zhongxia Zhao, Xiaoshuai Hao, Yaoxu Lyu, Mingyu Cao, Junkai Zhao, Huaihai Lyu, Enshen Zhou, Ning Chen, Yankai Fu, Cheng Peng, Wei Guo, Dong Liang, Zhuo Chen, Mengsi Lyu, Chenrui He, Yulong Ao, Yonghua Lin, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang

Date:2025-10-30 14:26:40

The proliferation of collaborative robots across diverse tasks and embodiments presents a central challenge: achieving lifelong adaptability, scalable coordination, and robust scheduling in multi-agent systems. Existing approaches, from vision-language-action (VLA) models to hierarchical frameworks, fall short due to their reliance on limited or individual-agent memory. This fundamentally constrains their ability to learn over long horizons, scale to heterogeneous teams, or recover from failures, highlighting the need for a unified memory representation. To address these limitations, we introduce RoboOS-NeXT, a unified memory-based framework for lifelong, scalable, and robust multi-robot collaboration. At the core of RoboOS-NeXT is the novel Spatio-Temporal-Embodiment Memory (STEM), which integrates spatial scene geometry, temporal event history, and embodiment profiles into a shared representation. This memory-centric design is integrated into a brain-cerebellum framework, where a high-level brain model performs global planning by retrieving and updating STEM, while low-level controllers execute actions locally. This closed loop between cognition, memory, and execution enables dynamic task allocation, fault-tolerant collaboration, and consistent state synchronization. We conduct extensive experiments spanning complex coordination tasks in restaurants, supermarkets, and households. Our results demonstrate that RoboOS-NeXT achieves superior performance across heterogeneous embodiments, validating its effectiveness in enabling lifelong, scalable, and robust multi-robot collaboration. Project website: https://flagopen.github.io/RoboOS/
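A shared memory combining spatial, temporal, and embodiment information could be organized roughly as below. This is a hypothetical data structure for illustration only (`SharedMemory`, `Embodiment`, `Event` and their fields are invented names), not the STEM implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Embodiment:
    robot_id: str
    kind: str                # e.g. "arm", "wheeled", "humanoid"
    skills: List[str]
    available: bool = True


@dataclass
class Event:
    time: float
    robot_id: str
    description: str


@dataclass
class SharedMemory:
    """Toy spatial + temporal + embodiment memory shared by all robots."""
    objects: Dict[str, Tuple[float, float]] = field(default_factory=dict)  # name -> (x, y)
    history: List[Event] = field(default_factory=list)
    robots: Dict[str, Embodiment] = field(default_factory=dict)

    def record(self, event: Event) -> None:
        self.history.append(event)

    def mark_failed(self, robot_id: str, time: float) -> None:
        # A failure updates both the embodiment profile and the event history.
        self.robots[robot_id].available = False
        self.record(Event(time, robot_id, "failure"))

    def candidates(self, skill: str) -> List[str]:
        """Available robots that have the required skill, for task (re)allocation."""
        return sorted(r.robot_id for r in self.robots.values()
                      if r.available and skill in r.skills)
```

Keeping failures in the same store the planner queries is what lets a high-level planner reassign work without a separate recovery channel.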

Metacognition and Confidence Dynamics in Advice Taking from Generative AI

Authors:Clara Colombatto, Sean Rintel, Lev Tankelevitch

Date:2025-10-30 14:01:52

Generative Artificial Intelligence (GenAI) can aid humans in a wide range of tasks, but its effectiveness critically depends on users being able to evaluate the accuracy of GenAI outputs and their own expertise. Here we asked how confidence in self and GenAI contributes to decisions to seek and rely on advice from GenAI ('prospective confidence'), and how advice-taking in turn shapes this confidence ('retrospective confidence'). In a novel paradigm involving text generation, participants formulated plans for events, and could request advice from a GenAI (Study 1; N=200) or were randomly assigned to receive advice (Study 2; N=300), which they could rely on or ignore. Advice requests in Study 1 were related to higher prospective confidence in GenAI and lower confidence in self. Advice-seekers showed increased retrospective confidence in GenAI, while those who declined advice showed increased confidence in self. Random assignment in Study 2 revealed that advice exposure increases confidence in GenAI and in self, suggesting that GenAI advice-taking causally boosts retrospective confidence. These results were mirrored in advice reliance, operationalised as the textual similarity between GenAI advice and participants' responses, with reliance associated with increased retrospective confidence in both GenAI and self. Critically, participants who chose to obtain/rely on advice provided more detailed responses (likely due to the output's verbosity), but failed to check the output thoroughly, missing key information. These findings underscore a key role for confidence in interactions with GenAI, shaped by both prior beliefs about oneself and the reliability of AI, and context-dependent exposure to advice.
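Reliance is operationalised as textual similarity between the GenAI advice and a participant's response. The abstract does not name the metric; this sketch uses bag-of-words cosine similarity purely as an assumed stand-in.

```python
import math
import re
from collections import Counter


def _tokens(text: str) -> Counter:
    # Lowercase word counts; punctuation is dropped.
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of word-count vectors, in [0, 1]; 0.0 if either text is empty."""
    ta, tb = _tokens(a), _tokens(b)
    dot = sum(ta[w] * tb[w] for w in ta.keys() & tb.keys())
    norm = math.sqrt(sum(v * v for v in ta.values())) * math.sqrt(sum(v * v for v in tb.values()))
    return dot / norm if norm else 0.0
```

A bag-of-words measure rewards copied vocabulary rather than copied ideas; embedding-based similarity would capture paraphrased reliance at the cost of interpretability.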

Co-Evolving Latent Action World Models

Authors:Yucen Wang, Fengming Zhang, De-Chuan Zhan, Li Zhao, Kaixin Wang, Jiang Bian

Date:2025-10-30 12:28:40

Adapting pre-trained video generation models into controllable world models via latent actions is a promising step towards creating generalist world models. The dominant paradigm adopts a two-stage approach that trains the latent action model (LAM) and the world model separately, resulting in redundant training and limiting their potential for co-adaptation. A conceptually simple and appealing idea is to directly replace the forward dynamics model in the LAM with a powerful world model and train them jointly, but this is non-trivial and prone to representational collapse. In this work, we propose CoLA-World, which for the first time successfully realizes this synergistic paradigm, resolving the core challenge in joint learning through a critical warm-up phase that effectively aligns the representations of the from-scratch LAM with the pre-trained world model. This unlocks a co-evolution cycle: the world model acts as a knowledgeable tutor, providing gradients to shape a high-quality LAM, while the LAM offers a more precise and adaptable control interface to the world model. Empirically, CoLA-World matches or outperforms prior two-stage methods in both video simulation quality and downstream visual planning, establishing a robust and efficient new paradigm for the field.

Beyond Imitation: Constraint-Aware Trajectory Generation with Flow Matching For End-to-End Autonomous Driving

Authors:Lin Liu, Guanyi Yu, Ziying Song, Junqiao Li, Caiyan Jia, Feiyang Jia, Peiliang Wu, Yandan Luo

Date:2025-10-30 09:24:34

Planning is a critical component of end-to-end autonomous driving. However, prevailing imitation learning methods often suffer from mode collapse, failing to produce diverse trajectory hypotheses. Meanwhile, existing generative approaches struggle to incorporate crucial safety and physical constraints directly into the generative process, necessitating an additional optimization stage to refine their outputs. To address these limitations, we propose CATG, a novel planning framework that leverages Constrained Flow Matching. Concretely, CATG explicitly models the flow matching process, which inherently mitigates mode collapse and allows for flexible guidance from various conditioning signals. Our primary contribution is the novel imposition of explicit constraints directly within the flow matching process, ensuring that the generated trajectories adhere to vital safety and kinematic rules. Secondly, CATG parameterizes driving aggressiveness as a control signal during generation, enabling precise manipulation of trajectory style. Notably, on the NavSim v2 challenge, CATG achieved 2nd place with an EPDMS score of 51.31 and was honored with the Innovation Award.
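Sampling from a flow-matching model integrates a velocity field from noise (t = 0) to a trajectory (t = 1). One common way to enforce hard constraints during generation is to project the state back onto the feasible set after every integration step. The sketch below assumes that projection scheme, a toy 1-D speed profile, and a hypothetical speed/acceleration constraint (`speed_limit`); the abstract does not specify how CATG imposes its constraints, and the real velocity field is a learned network conditioned on the scene and an aggressiveness signal.

```python
from typing import Callable, List

Traj = List[float]  # toy trajectory: longitudinal speed at each future waypoint (m/s)


def euler_sample(x0: Traj, velocity: Callable[[Traj, float], Traj],
                 project: Callable[[Traj], Traj], steps: int = 10) -> Traj:
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with explicit Euler steps,
    projecting onto the constraint set after every step."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        x = project(x)
    return x


def speed_limit(v_max: float, a_max: float, dt: float) -> Callable[[Traj], Traj]:
    """Hypothetical kinematic constraint: clip speed to [0, v_max] and the
    change in speed between waypoints to a_max * dt."""
    def proj(x: Traj) -> Traj:
        out: Traj = []
        prev = None
        for s in x:
            s = min(max(s, 0.0), v_max)
            if prev is not None:
                s = min(max(s, prev - a_max * dt), prev + a_max * dt)
            out.append(s)
            prev = s
        return out
    return proj


# Toy velocity field: straight-line flow toward a fixed target profile.
def toward(target: Traj) -> Callable[[Traj, float], Traj]:
    return lambda x, t: [(g - xi) / (1.0 - t) for g, xi in zip(target, x)]
```

With an identity projection the sampler reaches the target profile; with `speed_limit(8.0, 100.0, 0.5)` and a target of 10 m/s, every waypoint ends at the 8 m/s limit instead.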

Simultaneous optimization of non-coplanar beam orientations and cumulative EQD2 distribution for high-dose reirradiation of locoregionally recurrent non-small cell lung cancer

Authors:Nathan Torelli, Jonas Willmann, Katja Daehler, Madalyne Day, Nicolaus Andratschke, Jan Unkelbach

Date:2025-10-30 08:56:29

Background and Purpose: Reirradiation for non-small cell lung cancer (NSCLC) is commonly delivered using coplanar techniques. In this study, we developed a beam orientation optimization algorithm for reirradiation planning to investigate whether the selection of favorable non-coplanar beam orientations may limit cumulative doses to critical organs-at-risk (OARs) and thus improve the therapeutic window. Materials and Methods: Fifteen cases of challenging high-dose reirradiation for locoregionally recurrent NSCLC were included in this in-silico study. For each patient, the dose distribution from the previous treatment was first mapped to the reirradiation planning CT using rigid dose registration, and subsequently converted to equivalent dose in 2 Gy fractions (EQD2). A 2-arc non-coplanar reirradiation plan, combining dynamic gantry and couch rotation, was then generated using an EQD2-based direct aperture optimization algorithm, which allows for the simultaneous optimization of the dynamic gantry-couch path and the cumulative EQD2 distribution. Non-coplanar reirradiation plans were benchmarked against 2-arc coplanar VMAT plans, which mimic state-of-the-art practice for reirradiation of NSCLC. Results: Non-coplanar reirradiation plans could reduce the maximum cumulative EQD2 to critical OARs such as bronchial tree, esophagus, thoracic wall and trachea by at least 5 Gy2 for 6 out of 15 patients compared to coplanar reirradiation plans. At the same time, target coverage and lung EQD2 metrics were comparable for both methods. Conclusions: The automated selection of favorable non-coplanar beam orientations may reduce the maximum cumulative EQD2 to critical OARs in challenging thoracic reirradiation cases. This makes it possible to explore either better OAR sparing or dose escalation in future clinical studies.
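The EQD2 conversion follows the linear-quadratic model: a total dose D delivered in fractions of size d corresponds to EQD2 = D (d + α/β) / (2 + α/β). The sketch below applies it voxel-wise to two courses and sums them. It assumes uniform fractionation within each course, no correction for tissue recovery between courses (the abstract does not say whether one is used), and α/β = 3 Gy, a conventional value for late-responding tissue.

```python
from typing import List, Sequence


def eqd2(total_dose: float, dose_per_fraction: float, alpha_beta: float) -> float:
    """Equivalent dose in 2 Gy fractions: EQD2 = D * (d + alpha/beta) / (2 + alpha/beta)."""
    return total_dose * (dose_per_fraction + alpha_beta) / (2.0 + alpha_beta)


def voxel_eqd2(total_dose: float, n_fractions: int, alpha_beta: float) -> float:
    # Assumes the voxel received the same dose in every fraction of the course.
    return eqd2(total_dose, total_dose / n_fractions, alpha_beta)


def cumulative_eqd2(prev_dose: Sequence[float], n_prev: int,
                    reirr_dose: Sequence[float], n_reirr: int,
                    alpha_beta: float = 3.0) -> List[float]:
    """Voxel-wise sum of the EQD2 of the previous course and the reirradiation course."""
    return [voxel_eqd2(p, n_prev, alpha_beta) + voxel_eqd2(r, n_reirr, alpha_beta)
            for p, r in zip(prev_dose, reirr_dose)]
```

For example, 60 Gy in 30 fractions stays 60 Gy EQD2, while a reirradiation of 50 Gy in 10 fractions counts as 80 Gy EQD2 at α/β = 3 Gy, so that voxel accumulates 140 Gy EQD2. This nonlinearity in fraction size is why the optimizer works on the cumulative EQD2 distribution rather than on physical dose.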

Adaptive Trajectory Refinement for Optimization-based Local Planning in Narrow Passages

Authors:Hahjin Lee, Young J. Kim

Date:2025-10-30 04:53:08

Trajectory planning for mobile robots in cluttered environments remains a major challenge due to narrow passages, where conventional methods often fail or generate suboptimal paths. To address this issue, we propose the adaptive trajectory refinement algorithm, which consists of two main stages. First, to ensure safety at the path-segment level, a segment-wise conservative collision test is applied, where risk-prone trajectory path segments are recursively subdivided until collision risks are eliminated. Second, to guarantee pose-level safety, pose correction based on penetration direction and line search is applied, ensuring that each pose in the trajectory is collision-free and maximally clear from obstacles. Simulation results demonstrate that the proposed method achieves up to 1.69x higher success rates and up to 3.79x faster planning times than state-of-the-art approaches. Furthermore, real-world experiments confirm that the robot can safely pass through narrow passages while maintaining rapid planning performance.
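The segment-level stage, recursive subdivision until a conservative test passes, can be sketched as below. The circular obstacles, the disc-shaped robot, and the midpoint-clearance bound are assumptions chosen for illustration; the abstract does not specify the collision test, and the pose-correction stage is not shown.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Circle = Tuple[float, float, float]  # (cx, cy, radius)


def clearance(p: Point, obstacles: List[Circle]) -> float:
    """Distance from p to the nearest obstacle boundary (negative inside an obstacle)."""
    return min(math.hypot(p[0] - cx, p[1] - cy) - r for cx, cy, r in obstacles)


def segment_is_safe(a: Point, b: Point, obstacles: List[Circle], robot_radius: float,
                    min_len: float = 1e-2) -> bool:
    """Conservative test: every point of segment ab lies within half its length of the
    midpoint, so if the midpoint's clearance exceeds half-length + robot radius the
    whole segment is collision-free. Otherwise split in two and recurse."""
    mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    half = math.dist(a, b) / 2
    c = clearance(mid, obstacles)
    if c > half + robot_radius:
        return True
    if c <= robot_radius or 2 * half <= min_len:
        return False  # midpoint already in collision, or too short to refine further
    return (segment_is_safe(a, mid, obstacles, robot_radius, min_len)
            and segment_is_safe(mid, b, obstacles, robot_radius, min_len))


def risky_segments(path: List[Point], obstacles: List[Circle],
                   robot_radius: float) -> List[int]:
    """Indices of path segments that still fail the test after subdivision."""
    return [i for i in range(len(path) - 1)
            if not segment_is_safe(path[i], path[i + 1], obstacles, robot_radius)]
```

Subdivision only spends effort near obstacles: a segment far from everything passes on the first check, while one grazing a narrow passage is split until each piece can be certified or is flagged for correction.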

GUI Knowledge Bench: Revealing the Knowledge Gap Behind VLM Failures in GUI Tasks

Authors:Chenrui Shi, Zedong Yu, Zhi Gao, Ruining Feng, Enqi Liu, Yuwei Wu, Yunde Jia, Liuyu Xiang, Zhaofeng He, Qing Li

Date:2025-10-30 03:22:30

Large vision language models (VLMs) have advanced graphical user interface (GUI) task automation but still lag behind humans. We hypothesize this gap stems from missing core GUI knowledge, which existing training schemes (such as supervised fine-tuning and reinforcement learning) alone cannot fully address. By analyzing common failure patterns in GUI task execution, we distill GUI knowledge into three dimensions: (1) interface perception, knowledge about recognizing widgets and system states; (2) interaction prediction, knowledge for reasoning about the state transitions that actions cause; and (3) instruction understanding, knowledge about planning, verifying, and assessing task completion progress. We further introduce GUI Knowledge Bench, a benchmark with multiple-choice and yes/no questions across six platforms (Web, Android, macOS, Windows, Linux, iOS) and 292 applications. Our evaluation shows that current VLMs identify widget functions but struggle with perceiving system states, predicting actions, and verifying task completion. Experiments on real-world GUI tasks further validate the close link between GUI knowledge and task success. By providing a structured framework for assessing GUI knowledge, our work supports the selection of VLMs with greater potential prior to downstream training and provides insights for building more capable GUI agents.

FlexICL: A Flexible Visual In-context Learning Framework for Elbow and Wrist Ultrasound Segmentation

Authors:Yuyue Zhou, Jessica Knight, Shrimanti Ghosh, Banafshe Felfeliyan, Jacob L. Jaremko, Abhilash R. Hareendranathan

Date:2025-10-30 00:53:26

Elbow and wrist fractures are the most common fractures in pediatric populations. Automatic segmentation of musculoskeletal structures in ultrasound (US) can improve diagnostic accuracy and treatment planning. Fractures appear as cortical defects but require expert interpretation. Deep learning (DL) can provide real-time feedback and highlight key structures, helping lightly trained users perform exams more confidently. However, pixel-wise expert annotations for training remain time-consuming and costly. To address this challenge, we propose FlexICL, a novel and flexible in-context learning (ICL) framework for segmenting bony regions in US images. We apply it to an intra-video segmentation setting, where experts annotate only a small subset of frames, and the model segments unseen frames. We systematically investigate various image concatenation techniques and training strategies for visual ICL and introduce novel concatenation methods that significantly enhance model performance with limited labeled data. By integrating multiple augmentation strategies, FlexICL achieves robust segmentation performance across four wrist and elbow US datasets while requiring only 5% of the training images. It outperforms state-of-the-art visual ICL models like Painter, MAE-VQGAN, and conventional segmentation models like U-Net and TransUNet by 1-27% Dice coefficient on 1,252 US sweeps. These initial results highlight the potential of FlexICL as an efficient and scalable solution for US image segmentation well suited for medical imaging use cases where labeled data is scarce.
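The Dice coefficient used for the comparison measures overlap between a predicted mask P and the reference mask T as 2|P ∩ T| / (|P| + |T|). A minimal sketch on flattened binary masks; the small `eps` term is a common convention (an assumption here) that makes two empty masks score 1.0 instead of dividing by zero.

```python
from typing import Sequence


def dice(pred: Sequence[int], target: Sequence[int], eps: float = 1e-7) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    return (2.0 * inter + eps) / (total + eps)
```

A one-pixel overlap between a two-pixel prediction and a one-pixel reference gives 2·1 / 3 ≈ 0.67.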

Budget Forecasting and Integrated Strategic Planning for Leaders

Authors:Matt Salehi

Date:2025-10-30 00:22:50

This study explored how advanced budgeting techniques and economic indicators influence funding levels and strategic alignment in California Community Colleges (CCCs). Despite widespread implementation of budgeting reforms, many CCCs continue to face challenges aligning financial planning with institutional missions, particularly in supporting diversity, equity, and inclusion (DEI) initiatives. The study used a quantitative correlational design, analyzing 30 years of publicly available economic data, including unemployment rates, GDP growth, and CPI, in relation to CCC funding trends. Results revealed a strong positive correlation between GDP growth and CCC funding levels, as well as between CPI and funding levels, underscoring the predictive value of macroeconomic indicators in budget planning. These findings emphasize the need for educational leaders to integrate economic forecasting into budget planning processes to safeguard institutional effectiveness and sustain programs serving underrepresented student populations.
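The correlational design relates annual economic indicators to funding levels. The abstract does not name the coefficient; the sketch below computes Pearson's r as an assumed example.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Python 3.10 and later also provide `statistics.correlation` for the same quantity. With 30 years of data, series such as CPI and nominal funding both trend upward over time, so a high r between levels can partly reflect the shared trend rather than a year-to-year relationship.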

Climate Adaptation-Aware Flood Prediction for Coastal Cities Using Deep Learning

Authors:Bilal Hassan, Areg Karapetyan, Aaron Chung Hin Chow, Samer Madanat

Date:2025-10-29 23:23:11

Climate change and sea-level rise (SLR) pose escalating threats to coastal cities, intensifying the need for efficient and accurate methods to predict potential flood hazards. Traditional physics-based hydrodynamic simulators, although precise, are computationally expensive and impractical for city-scale coastal planning applications. Deep Learning (DL) techniques offer promising alternatives; however, they are often constrained by challenges such as data scarcity and high-dimensional output requirements. Leveraging a recently proposed vision-based, low-resource DL framework, we develop a novel, lightweight Convolutional Neural Network (CNN)-based model designed to predict coastal flooding under variable SLR projections and shoreline adaptation scenarios. Furthermore, we demonstrate the ability of the model to generalize across diverse geographical contexts by utilizing datasets from two distinct regions: Abu Dhabi and San Francisco. Our findings demonstrate that the proposed model significantly outperforms state-of-the-art methods, reducing the mean absolute error (MAE) in predicted flood depth maps on average by nearly 20%. These results highlight the potential of our approach to serve as a scalable and practical tool for coastal flood management, empowering decision-makers to develop effective mitigation strategies in response to the growing impacts of climate change. Project Page: https://caspiannet.github.io/
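The reported improvement is a relative reduction in MAE over per-cell flood depths. A minimal sketch, assuming both maps share one grid given as nested lists; the function names are illustrative.

```python
from typing import Sequence


def mae(pred: Sequence[Sequence[float]], truth: Sequence[Sequence[float]]) -> float:
    """Mean absolute error between two flood-depth maps on the same grid."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("maps must have the same shape")
    errs = [abs(p - t) for prow, trow in zip(pred, truth) for p, t in zip(prow, trow)]
    return sum(errs) / len(errs)


def relative_reduction(baseline_mae: float, new_mae: float) -> float:
    """Fractional MAE reduction of a new model versus a baseline (0.2 means 20%)."""
    return (baseline_mae - new_mae) / baseline_mae
```

"Nearly 20% on average" corresponds to `relative_reduction` values around 0.2, for example a baseline MAE of 0.5 m falling to 0.4 m.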

Estimating cognitive biases with attention-aware inverse planning

Authors:Sounak Banerjee, Daphne Cornelisse, Deepak Gopinath, Emily Sumner, Jonathan DeCastro, Guy Rosman, Eugene Vinitsky, Mark K. Ho

Date:2025-10-29 20:50:04

People's goal-directed behaviors are influenced by their cognitive biases, and autonomous systems that interact with people should be aware of this. For example, people's attention to objects in their environment will be biased in a way that systematically affects how they perform everyday tasks such as driving to work. Here, building on recent work in computational cognitive science, we formally articulate the attention-aware inverse planning problem, in which the goal is to estimate a person's attentional biases from their actions. We demonstrate how attention-aware inverse planning systematically differs from standard inverse reinforcement learning and how cognitive biases can be inferred from behavior. Finally, we present an approach to attention-aware inverse planning that combines deep reinforcement learning with computational cognitive modeling. We use this approach to infer the attentional strategies of RL agents in real-life driving scenarios selected from the Waymo Open Dataset, demonstrating the scalability of estimating cognitive biases with attention-aware inverse planning.
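The core inference, recovering an attentional parameter from observed choices, can be illustrated with a much simpler toy than the paper's deep RL approach: a Boltzmann-rational agent perceives hazards only in proportion to a scalar `attention` value, and a grid posterior over that value is computed from its choices. The option encoding, the scalar attention model, and `beta` are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Option = Tuple[float, float]  # (progress value, hazard level)


def choice_probs(options: Sequence[Option], attention: float, beta: float = 3.0) -> List[float]:
    """Softmax choice where hazards are perceived in proportion to attention."""
    utils = [prog - attention * hazard for prog, hazard in options]
    m = max(utils)
    exps = [math.exp(beta * (u - m)) for u in utils]  # subtract max for stability
    z = sum(exps)
    return [e / z for e in exps]


def posterior_attention(decisions: Sequence[Tuple[Sequence[Option], int]],
                        grid: Sequence[float]) -> Dict[float, float]:
    """Grid posterior over the attention parameter under a uniform prior."""
    log_post = {a: 0.0 for a in grid}
    for options, chosen in decisions:
        for a in grid:
            log_post[a] += math.log(choice_probs(options, a)[chosen])
    m = max(log_post.values())
    unnorm = {a: math.exp(lp - m) for a, lp in log_post.items()}
    z = sum(unnorm.values())
    return {a: v / z for a, v in unnorm.items()}
```

An agent that repeatedly picks the faster but more hazardous option gets most posterior mass on low attention, which is the kind of bias estimate the abstract describes, here reduced to a single parameter.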

FinOps Agent -- A Use-Case for IT Infrastructure and Cost Optimization

Authors:Ngoc Phuoc An Vo, Manish Kesarwani, Ruchi Mahindru, Chandrasekhar Narayanaswami

Date:2025-10-29 19:34:14

FinOps (Finance + Operations) represents an operational framework and cultural practice which maximizes cloud business value through collaborative financial accountability across engineering, finance, and business teams. FinOps practitioners face a fundamental challenge: billing data arrives in heterogeneous formats, taxonomies, and metrics from multiple cloud providers and internal systems, and must eventually be synthesized into actionable insights that support time-sensitive decisions. To address this challenge, we propose leveraging autonomous, goal-driven AI agents for FinOps automation. In this paper, we build a FinOps agent for a typical IT infrastructure and cost optimization use case. The system simulates a realistic end-to-end industry process, from retrieving data from various sources to consolidating and analyzing it and generating optimization recommendations. We defined a set of metrics to evaluate our agent using several open-source and closed-source language models; the results show that the agent was able to understand, plan, and execute tasks as well as an actual FinOps practitioner.
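The consolidation step, mapping heterogeneous billing exports onto one schema before analysis, is the kind of tool such an agent would call. The provider names and field mappings below are hypothetical; real billing exports have their own schemas, and a real pipeline would also need a shared service taxonomy and currency conversion.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical per-source field mappings onto a common schema.
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "provider_a": {"service": "serviceName", "cost": "costUSD", "period": "month"},
    "provider_b": {"service": "product", "cost": "amount_usd", "period": "billing_period"},
}


def normalize(source: str, record: dict) -> dict:
    """Rename a raw billing record's fields to the common schema."""
    m = FIELD_MAP[source]
    return {"source": source, "service": record[m["service"]],
            "cost": float(record[m["cost"]]), "period": record[m["period"]]}


def cost_by_service(rows: List[dict]) -> Dict[str, float]:
    totals: Dict[str, float] = defaultdict(float)
    for r in rows:
        totals[r["service"]] += r["cost"]
    return dict(totals)


def top_costs(rows: List[dict], n: int = 3) -> List[Tuple[str, float]]:
    """Largest cost drivers first: a natural starting point for optimization advice."""
    return sorted(cost_by_service(rows).items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Keeping normalization deterministic and leaving planning and recommendation to the language model limits the places where the agent can misread a billing field.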

Quantum Stochastic Gradient Descent in its continuous-time limit based on the Wigner formulation of Open Quantum Systems

Authors:Jose A. Morales Escalante

Date:2025-10-29 19:24:57

The main ideas behind a research plan to use the Wigner formulation as a bridge between classical and quantum probabilistic algorithms are presented, focusing on a particular case: the Quantum analog of Stochastic Gradient Descent in its continuous-time limit based on the Wigner formulation of Open Quantum Systems.

Targeted Resilient Zoning for High Impact Events via Multi Circuit Polelines

Authors:Hritik Gopal Shah, Gregory Giustino, Elli Ntakou

Date:2025-10-29 18:32:54

The increasing frequency and severity of high-impact, low-probability events such as hurricanes and windstorms pose significant challenges to the resilience of electrical power distribution systems, particularly in New England, where much of the overhead infrastructure runs through heavily vegetated areas. Traditional reliability-focused planning is insufficient to address the systemic vulnerabilities exposed by such extreme events. This paper presents a novel risk-based framework for long-term resilience planning of active overhead distribution systems, with a specific focus on mitigating the impacts of high-wind and hurricane-induced outages.

One Join Order Does Not Fit All: Reducing Intermediate Results with Per-Split Query Plans

Authors:Yujun He, Hangdong Zhao, Simon Frisk, Yifei Yang, Kevin Kristensen, Paraschos Koutris, Xiangyao Yu

Date:2025-10-29 16:47:41

Minimizing intermediate results is critical for efficient multi-join query processing. Although the seminal Yannakakis algorithm offers strong guarantees for acyclic queries, cyclic queries remain an open challenge. In this paper, we propose SplitJoin, a framework that introduces split as a first-class query operator. By partitioning input tables into heavy and light parts, SplitJoin allows different data partitions to use distinct query plans, with the goal of reducing intermediate sizes using existing binary join engines. We systematically explore the design space for split-based optimizations, including threshold selection, split strategies, and join ordering after splits. Implemented as a front-end to DuckDB and Umbra, SplitJoin achieves substantial improvements: on DuckDB, SplitJoin completes 43 social network queries (vs. 29 natively), achieving 2.1x faster runtime and 7.9x smaller intermediates on average (up to 13.6x and 74x, respectively); on Umbra, it completes 45 queries (vs. 35), achieving 1.3x speedups and 1.2x smaller intermediates on average (up to 6.1x and 2.1x, respectively).
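The split operator partitions a table by join-key degree: heavy keys occur at least `threshold` times, light keys fewer. Because the two parts are disjoint and cover the table, R ⋈ S equals (R_heavy ⋈ S) ∪ (R_light ⋈ S), so each part can use its own plan. A minimal sketch with an in-memory hash join; the function names and threshold are illustrative, not SplitJoin's interface.

```python
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence, Tuple

Row = Tuple[Hashable, ...]


def split_heavy_light(rows: Sequence[Row], key_index: int,
                      threshold: int) -> Tuple[List[Row], List[Row]]:
    """Partition rows by how often their join-key value occurs in the table."""
    degree = Counter(r[key_index] for r in rows)
    heavy = [r for r in rows if degree[r[key_index]] >= threshold]
    light = [r for r in rows if degree[r[key_index]] < threshold]
    return heavy, light


def hash_join(r: Sequence[Row], s: Sequence[Row], ri: int, si: int) -> List[Row]:
    """Equi-join r.ri = s.si, returning concatenated tuples."""
    index = defaultdict(list)
    for row in s:
        index[row[si]].append(row)
    return [a + b for a in r for b in index.get(a[ri], [])]
```

Heavy keys are where intermediate results blow up, so routing them through a different join order (for example, joining them last) is where the savings come from; choosing the threshold is one of the design dimensions the abstract mentions.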

Collision avoidance and path finding in a robotic mobile fulfillment system using multi-objective meta-heuristics

Authors:Ahmad Kokhahi, Mary Kurz

Date:2025-10-29 16:10:58

Multi-Agent Path Finding (MAPF) has gained significant attention, with most research focusing on minimizing collisions and travel time. This paper also considers energy consumption in the path planning of automated guided vehicles (AGVs). It addresses two main challenges: i) resolving collisions between AGVs and ii) assigning tasks to AGVs. We propose a new collision avoidance strategy that takes both energy use and travel time into account. For task assignment, we present two multi-objective algorithms: Non-Dominated Sorting Genetic Algorithm (NSGA) and Adaptive Large Neighborhood Search (ALNS). Comparative evaluations show that these proposed methods perform better than existing approaches in both collision avoidance and task assignment.
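Any collision-avoidance strategy first has to detect conflicts between AGV paths. This sketch checks the two standard MAPF conflict types on a grid, vertex conflicts (same cell, same time step) and swap conflicts (two AGVs exchanging cells). How conflicts are then resolved, weighing energy against travel time, is the paper's contribution and is not shown; the grid and path representation are assumptions.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def position(path: List[Cell], t: int) -> Cell:
    """An AGV stays at its last cell after finishing its path."""
    return path[t] if t < len(path) else path[-1]


def first_conflict(paths: Dict[str, List[Cell]]) -> Optional[Tuple[str, str, int, str]]:
    """Return (agent1, agent2, time, kind) for the earliest vertex or swap conflict."""
    agents = sorted(paths)
    horizon = max(len(p) for p in paths.values())
    for t in range(horizon):
        for i, a in enumerate(agents):
            for b in agents[i + 1:]:
                pa, pb = paths[a], paths[b]
                if position(pa, t) == position(pb, t):
                    return (a, b, t, "vertex")
                if (t + 1 < horizon and position(pa, t) == position(pb, t + 1)
                        and position(pa, t + 1) == position(pb, t)):
                    return (a, b, t, "swap")
    return None
```

Returning the earliest conflict matches how many resolution schemes work: fix one conflict (by waiting or rerouting), then re-check.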

Citizen science dataset on residents' urban heat perception in outdoor public spaces of climate-vulnerable neighborhoods

Authors:Ferran Larroya, Isabelle Bonhoure, Femke Min, Josep Perelló

Date:2025-10-29 16:05:58

We present a dataset generated to investigate urban heat and thermal perception across five neighborhoods in the Barcelona metropolitan area. In collaboration with 14 non-academic partner organizations, we conducted a series of citizen science campaigns involving 439 residents as co-researchers engaged throughout all stages of the research process. Participants, residents of areas classified as highly or very highly climate-vulnerable, identified 210 public outdoor sites relevant to their daily lives. These locations were subsequently characterized using a range of spatial and environmental indicators pertinent to urban heat island effects, urban health, and climate resilience. Over the course of 48 thermal walks, participants carried portable, low-cost sensors that continuously recorded air temperature, relative humidity, and geolocation, resulting in 296,286 processed microclimatic data points. At pre-defined sites, individuals completed standardized surveys to report their Thermal Sensation Votes and Thermal Comfort Votes, yielding 5,169 self-reported entries. Sociodemographic data were also collected to further contextualize participants' responses. The resulting dataset integrates objective environmental measurements with subjective perceptions of heat, enabling point-by-point analysis of thermal experience within the urban fabric. It offers a novel, multi-dimensional resource to support research on heat, thermal inequality, and the experiential dimensions of climate vulnerability, and is intended to inform evidence-based decision-making in urban planning, public health, and climate adaptation.

Learning to Plan & Schedule with Reinforcement-Learned Bimanual Robot Skills

Authors:Weikang Wan, Fabio Ramos, Xuning Yang, Caelan Garrett

Date:2025-10-29 15:39:53

Long-horizon contact-rich bimanual manipulation presents a significant challenge, requiring complex coordination involving a mixture of parallel execution and sequential collaboration between arms. In this paper, we introduce a hierarchical framework that frames this challenge as an integrated skill planning & scheduling problem, going beyond purely sequential decision-making to support simultaneous skill invocation. Our approach is built upon a library of single-arm and bimanual primitive skills, each trained using Reinforcement Learning (RL) in GPU-accelerated simulation. We then train a Transformer-based planner on a dataset of skill compositions to act as a high-level scheduler, simultaneously predicting the discrete schedule of skills as well as their continuous parameters. We demonstrate that our method achieves higher success rates on complex, contact-rich tasks than end-to-end RL approaches and produces more efficient, coordinated behaviors than traditional sequential-only planners.

Using VLM Reasoning to Constrain Task and Motion Planning

Authors:Muyang Yan, Miras Mengdibayev, Ardon Floros, Weihang Guo, Lydia E. Kavraki, Zachary Kingston

Date:2025-10-29 14:12:45

In task and motion planning (TAMP), high-level task planning is done over an abstraction of the world to enable efficient search in long-horizon robotics problems. However, the feasibility of these task-level plans relies on the downward refinability of the abstraction into continuous motion. When a domain's refinability is poor, task-level plans that appear valid may ultimately fail during motion planning, requiring replanning and resulting in slower overall performance. Prior works mitigate this by encoding refinement issues as constraints to prune infeasible task plans. However, these approaches only add constraints upon refinement failure, expending significant search effort on infeasible branches. We propose VIZ-COAST, a method that leverages the common-sense spatial reasoning of large pretrained Vision-Language Models to identify issues with downward refinement a priori, bypassing the need to fix these failures during planning. Experiments on two challenging TAMP domains show that our approach is able to extract plausible constraints from images and domain descriptions, drastically reducing planning times and, in some cases, eliminating downward refinement failures altogether, while generalizing to a diverse range of instances from the broader domain.

Comparative Study of UNet-based Architectures for Liver Tumor Segmentation in Multi-Phase Contrast-Enhanced Computed Tomography

Authors:Doan-Van-Anh Ly, Thi-Thu-Hien Pham, Thanh-Hai Le

Date:2025-10-29 13:46:19

Segmentation of liver structures in multi-phase contrast-enhanced computed tomography (CECT) plays a crucial role in computer-aided diagnosis and treatment planning for liver diseases, including tumor detection. In this study, we investigate the performance of UNet-based architectures for liver tumor segmentation, starting from the original UNet and extending to UNet3+ with various backbone networks. We evaluate ResNet, Transformer-based, and State-space (Mamba) backbones, all initialized with pretrained weights. Surprisingly, despite advances in modern architectures, ResNet-based models consistently outperform Transformer- and Mamba-based alternatives across multiple evaluation metrics. To further improve segmentation quality, we introduce attention mechanisms into the backbone and observe that incorporating the Convolutional Block Attention Module (CBAM) yields the best performance. ResNetUNet3+ with the CBAM module not only produced the best overlap metrics, with a Dice score of 0.755 and an IoU of 0.662, but also achieved the most precise boundary delineation, evidenced by the lowest HD95 distance of 77.911. The model's superiority was further cemented by its leading overall accuracy of 0.925 and specificity of 0.926, showcasing its robust capability in accurately identifying both lesion and healthy tissue. To further enhance interpretability, Grad-CAM visualizations were employed to highlight the regions most influential to the model's predictions, providing insights into its decision-making process. These findings demonstrate that classical ResNet architectures, when combined with modern attention modules, remain highly competitive for medical image segmentation tasks, offering a promising direction for liver tumor detection in clinical practice.
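For reference, the two overlap metrics quoted above have standard definitions for a predicted mask P and a ground-truth mask G:

- Dice = 2|P ∩ G| / (|P| + |G|)
- IoU = |P ∩ G| / |P ∪ G|

Below is a minimal pure-Python sketch that computes both for a single pair of 2-D binary masks. It is not the paper's evaluation code, and the convention for two empty masks is an assumption. For one pair of masks, Dice = 2 IoU / (1 + IoU) holds exactly. Scores averaged over many cases need not satisfy this identity.

```python
from typing import List, Tuple

Mask = List[List[int]]  # 2-D binary mask: 1 = tumor pixel, 0 = background


def dice_and_iou(pred: Mask, target: Mask) -> Tuple[float, float]:
    """Return (Dice, IoU) for two binary masks of the same shape."""
    p = [bool(v) for row in pred for v in row]
    g = [bool(v) for row in target for v in row]
    if len(p) != len(g):
        raise ValueError("masks must have the same shape")
    inter = sum(a and b for a, b in zip(p, g))  # |P intersect G|
    union = sum(a or b for a, b in zip(p, g))  # |P union G|
    total = sum(p) + sum(g)  # |P| + |G|
    # Convention assumed here: two empty masks count as a perfect match.
    dice = 2.0 * inter / total if total else 1.0
    iou = inter / union if union else 1.0
    return dice, iou


pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
dice, iou = dice_and_iou(pred, target)
print(round(dice, 3), round(iou, 3))  # 0.667 0.5
```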

Multi-Objective Search: Algorithms, Applications, and Emerging Directions

Authors:Oren Salzman, Carlos Hernández Ulloa, Ariel Felner, Sven Koenig

Date:2025-10-29 13:30:01

Multi-objective search (MOS) has emerged as a unifying framework for planning and decision-making problems where multiple, often conflicting, criteria must be balanced. While the problem has been studied for decades, recent years have seen renewed interest in the topic across AI applications such as robotics, transportation, and operations research, reflecting the reality that real-world systems rarely optimize a single measure. This paper surveys developments in MOS while highlighting cross-disciplinary opportunities, and outlines open challenges that define the emerging frontier of MOS.

A Vector-Based Algorithm for Generating Complete Balanced Reaction Sets with Arbitrary Numbers of Reagents

Authors:Nataliia Yilmaz, Pavlo Kozub, Svitlana Kozub

Date:2025-10-29 13:27:47

We present a vector-based method to balance chemical reactions. The algorithm builds candidates deterministically, removes duplicates, and always prints coefficients in lowest whole-number form. For redox cases, electrons and protons/hydroxide are treated explicitly, so both mass and charge are balanced. We also outline the basic principles of the vector formulation of stoichiometry, which interprets reactions as integer vectors in composition space; this geometric view supports compact visualizations of reagent-product interactions and helps surface distinct reaction families. The method enumerates valid balances for arbitrary user-specified species lists without special-case balancing rules or symbolic tricks, and it provides a clean foundation for developing new algorithmic variants (e.g., alternative objectives or constraints). On representative examples (neutralization, double displacement, decomposition, classical redox, small multicomponent sets) and a negative control, the method produced correct integer balances. When multiple balances exist, we report a canonical one (the balance minimizing the total coefficient sum, with a simple tie-breaker) without claiming global optimality beyond the solutions the search enumerates. The procedure applies per reaction and extends to reaction networks via consistent per-reaction application. We do not report runtimes; broader benchmarking and a code/data release are planned.
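The abstract does not spell out the vector algorithm itself. The sketch below is a deliberately brute-force illustration of the same vector view, not the authors' method. Each species is an integer column in composition space, with charge treated as an extra "element". A coefficient vector balances the reaction when the signed sum of the columns is zero.

Several details are assumptions made for this sketch:

- the function name `balance`;
- the `max_coef` search bound;
- requiring every coefficient to be at least 1;
- the lexicographic tie-breaker.

Because it enumerates max_coef^n vectors for n species, it is only practical for small species lists.

```python
from functools import reduce
from itertools import product
from math import gcd
from typing import Dict, List, Optional, Tuple

# A species is a composition vector: element symbol (or "charge") -> count.
Species = Dict[str, int]


def balance(reactants: List[Species], products: List[Species],
            max_coef: int = 10) -> Optional[Tuple[int, ...]]:
    """Return the balance with the smallest coefficient sum, or None if none is found."""
    species = reactants + products
    # Reactants contribute +coefficient, products -coefficient.
    signs = [1] * len(reactants) + [-1] * len(products)
    keys = sorted({k for s in species for k in s})
    columns = [[s.get(k, 0) for k in keys] for s in species]

    best: Optional[Tuple[int, Tuple[int, ...]]] = None
    for coefs in product(range(1, max_coef + 1), repeat=len(species)):
        if reduce(gcd, coefs) != 1:
            continue  # skip integer multiples of a smaller coefficient vector
        balanced = all(
            sum(sign * c * col[row] for sign, c, col in zip(signs, coefs, columns)) == 0
            for row in range(len(keys))
        )
        # Minimize the coefficient sum; break ties on the coefficient tuple itself.
        if balanced and (best is None or (sum(coefs), coefs) < best):
            best = (sum(coefs), coefs)
    return None if best is None else best[1]


# H2 + O2 -> H2O
print(balance([{"H": 2}, {"O": 2}], [{"H": 2, "O": 1}]))  # (2, 1, 2)
```

Charge is just another row of the composition vector here. A half-reaction such as Fe3+ + e- -> Fe2+ can therefore be written as `[{"Fe": 1, "charge": 3}, {"charge": -1}]` -> `[{"Fe": 1, "charge": 2}]`. An impossible reaction returns `None`, which plays the role of a negative control.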

Agentic AI: A Comprehensive Survey of Architectures, Applications, and Future Directions

Authors:Mohamad Abou Ali, Fadi Dornaika

Date:2025-10-29 12:11:34

Agentic AI represents a transformative shift in artificial intelligence, but its rapid advancement has led to a fragmented understanding, often conflating modern neural systems with outdated symbolic models, a practice known as conceptual retrofitting. This survey cuts through this confusion by introducing a novel dual-paradigm framework that categorizes agentic systems into two distinct lineages: the Symbolic/Classical (relying on algorithmic planning and persistent state) and the Neural/Generative (leveraging stochastic generation and prompt-driven orchestration). Through a systematic PRISMA-based review of 90 studies (2018-2025), we provide a comprehensive analysis structured around this framework across three dimensions: (1) the theoretical foundations and architectural principles defining each paradigm; (2) domain-specific implementations in healthcare, finance, and robotics, demonstrating how application constraints dictate paradigm selection; and (3) paradigm-specific ethical and governance challenges, revealing divergent risks and mitigation strategies. Our analysis reveals that the choice of paradigm is strategic: symbolic systems dominate safety-critical domains (e.g., healthcare), while neural systems prevail in adaptive, data-rich environments (e.g., finance). Furthermore, we identify critical research gaps, including a significant deficit in governance models for symbolic systems and a pressing need for hybrid neuro-symbolic architectures. The findings culminate in a strategic roadmap arguing that the future of Agentic AI lies not in the dominance of one paradigm, but in their intentional integration to create systems that are both adaptable and reliable. This work provides the essential conceptual toolkit to guide future research, development, and policy toward robust and trustworthy hybrid intelligent systems.

What Challenges Do Developers Face in AI Agent Systems? An Empirical Study on Stack Overflow

Authors:Ali Asgari, Annibale Panichella, Pouria Derakhshanfar, Mitchell Olsthoorn

Date:2025-10-29 11:44:21

AI agents have rapidly gained popularity across research and industry as systems that extend large language models with additional capabilities to plan, use tools, remember, and act toward specific goals. Yet despite their promise, developers face persistent and often underexplored challenges when building, deploying, and maintaining these emerging systems. To identify these challenges, we study developer discussions on Stack Overflow, the world's largest developer-focused Q&A platform with about 60 million questions and answers and 30 million users. We construct a taxonomy of developer challenges through tag expansion and filtering, apply LDA-MALLET for topic modeling, and manually validate and label the resulting themes. Our analysis reveals seven major areas of recurring issues encompassing 77 distinct technical challenges related to runtime integration, dependency management, orchestration complexity, and evaluation reliability. We further quantify topic popularity and difficulty to identify which issues are most common and hardest to resolve, map the tools and programming languages used in agent development, and track their evolution from 2021 to 2025 in relation to major AI model and framework releases. Finally, we present the implications of our results, offering concrete guidance for practitioners, researchers, and educators on agent reliability and developer support.

Solving the Right Problem with Multi-Robot Formations

Authors:Chaz Cornwall, Jeremy P. Bos

Date:2025-10-29 11:42:19

Formation control simplifies minimizing multi-robot cost functions by encoding a cost function as a shape the robots maintain. However, by reducing complex cost functions to formations, discrepancies arise between maintaining the shape and minimizing the original cost function. For example, a Diamond or Box formation shape is often used for protecting all members of the formation. When more information about the surrounding environment becomes available, a static shape often no longer minimizes the original protection cost. We propose a formation planner to reduce mismatch between a formation and the cost function while still leveraging efficient formation controllers. Our formation planner is a two-step optimization problem that identifies desired relative robot positions. We first solve a constrained problem to estimate non-linear and non-differentiable costs with a weighted sum of surrogate cost functions. We theoretically analyze this problem and identify situations where weights do not need to be updated. The weighted, surrogate cost function is then minimized using relative positions between robots. The desired relative positions are realized using a non-cooperative formation controller derived from Lyapunov's direct approach. We then demonstrate the efficacy of this approach for military-like costs such as protection and obstacle avoidance. In simulations, we show a formation planner can reduce a single cost by over 75%. When minimizing a variety of cost functions simultaneously, using a formation planner with adaptive weights can reduce the cost by 20-40%. Formation planning provides better performance by minimizing a surrogate cost function that closely approximates the original cost function instead of relying on a shape abstraction.
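The abstract leaves the constrained weight-estimation problem unspecified. One plausible reading, assumed here purely for illustration, is a non-negative least-squares fit:

1. Sample candidate formations.
2. Evaluate the original cost and each surrogate cost on those formations.
3. Choose non-negative weights so that the weighted surrogate sum best matches the original cost.

The sketch below solves that problem by projected gradient descent with NumPy. The function name, the non-negativity constraint, the inputs, and the sample values are all hypothetical. SciPy's `scipy.optimize.nnls` solves the same non-negative least-squares problem directly.

```python
import numpy as np


def fit_surrogate_weights(S, c, steps: int = 5000) -> np.ndarray:
    """Minimize 0.5 * ||S w - c||^2 subject to w >= 0 (assumed formulation).

    S[i, j]: value of surrogate cost j at sampled formation i (hypothetical inputs).
    c[i]:    value of the original, possibly non-differentiable, cost at formation i.
    """
    S = np.asarray(S, dtype=float)
    c = np.asarray(c, dtype=float)
    # Step size 1/L, where L = sigma_max(S)^2 is the Lipschitz constant of the gradient.
    lr = 1.0 / np.linalg.norm(S, 2) ** 2
    w = np.zeros(S.shape[1])
    for _ in range(steps):
        grad = S.T @ (S @ w - c)
        w = np.maximum(w - lr * grad, 0.0)  # project back onto w >= 0
    return w


# Two surrogate costs evaluated at four sampled formations (made-up numbers).
S = [[1, 0], [0, 1], [1, 1], [2, 1]]
c = [2.0, 0.5, 2.5, 4.5]
print(np.round(fit_surrogate_weights(S, c), 3))
```

Under this reading, the non-negativity constraint keeps every surrogate acting as a penalty rather than a reward. The fitted weighted sum is then what a planner would minimize over relative robot positions.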