Rich and accurate medical image segmentation is poised to underpin the next generation of AI-defined clinical practice by delineating critical anatomy for pre-operative planning, guiding real-time intra-operative navigation, and supporting precise post-operative assessment. However, commonly used learning methods for medical and surgical image segmentation tasks penalise all errors equivalently and thus fail to exploit any inter-class semantics in the label space. This becomes particularly problematic as the cardinality and richness of labels increase to include subtly different classes. In this work, we propose two tree-based semantic loss functions that take advantage of a hierarchical organisation of the labels. We further incorporate our losses into a recently proposed approach for training with sparse, background-free annotations to extend their applicability. Extensive experiments are reported on two medical and surgical image segmentation tasks, namely head MRI for whole brain parcellation (WBP) with full supervision and neurosurgical hyperspectral imaging (HSI) for scene understanding with sparse annotations. Results demonstrate that our proposed method reaches state-of-the-art performance in both cases.
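To make the idea of a tree-based semantic loss concrete, the following is a minimal PyTorch sketch of one plausible instantiation (our own, not the paper's exact formulation): per-pixel class probabilities are penalised by their tree distance to the ground-truth label, so confusions between semantically close classes cost less than confusions between distant ones. The toy label hierarchy, function names, and weighting are illustrative assumptions.

```python
# Illustrative sketch only: a tree-distance-weighted segmentation loss.
# The label hierarchy, leaf ordering, and weighting are assumptions, not the
# paper's actual loss definition.
import torch

def tree_distance_matrix(parents, leaf_ids):
    """Pairwise path lengths between leaf classes of a label tree.

    parents: dict mapping each node to its parent (root maps to None).
    leaf_ids: leaf class names in the order used by the network's logits.
    """
    def path_to_root(c):
        path = [c]
        while parents[c] is not None:
            c = parents[c]
            path.append(c)
        return path

    D = torch.zeros(len(leaf_ids), len(leaf_ids))
    for i, a in enumerate(leaf_ids):
        for j, b in enumerate(leaf_ids):
            pa, pb = path_to_root(a), path_to_root(b)
            lca = next(x for x in pa if x in pb)          # lowest common ancestor
            D[i, j] = pa.index(lca) + pb.index(lca)       # tree distance
    return D

def tree_semantic_loss(logits, target, D):
    """Expected tree distance between the predicted distribution and the label.

    logits: (B, C, H, W) raw scores; target: (B, H, W) integer class ids.
    """
    probs = torch.softmax(logits, dim=1)                  # (B, C, H, W)
    cost = D.to(logits)[target].permute(0, 3, 1, 2)       # per-pixel cost rows
    return (cost * probs).sum(dim=1).mean()

# Toy hierarchy: background vs. two sibling grey-matter classes.
parents = {"root": None, "background": "root", "gm": "root",
           "cortical_gm": "gm", "deep_gm": "gm"}
D = tree_distance_matrix(parents, ["background", "cortical_gm", "deep_gm"])
```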
Recent work has demonstrated the promise of conversational AI systems for diagnostic dialogue. However, assuring patient safety in the real world means that providing individual diagnoses and treatment plans remains a regulated activity reserved for licensed professionals. Furthermore, physicians commonly oversee other team members in such activities, including nurse practitioners (NPs) and physician assistants/associates (PAs). Inspired by this, we propose a framework for effective, asynchronous oversight of the Articulate Medical Intelligence Explorer (AMIE) AI system. We propose guardrailed-AMIE (g-AMIE), a multi-agent system that performs history taking within guardrails, abstaining from individualized medical advice. Afterwards, g-AMIE conveys its assessments to an overseeing primary care physician (PCP) in a clinician cockpit interface. The PCP provides oversight and retains accountability for the clinical decision. This effectively decouples oversight from intake, allowing oversight to happen asynchronously. In a randomized, blinded virtual Objective Structured Clinical Examination (OSCE) of text consultations with asynchronous oversight, we compared g-AMIE to NPs/PAs and to a group of PCPs working under the same guardrails. Across 60 scenarios, g-AMIE outperformed both groups in performing high-quality intake, summarizing cases, and proposing diagnoses and management plans for the overseeing PCP to review. This resulted in higher-quality composite decisions. PCP oversight of g-AMIE was also more time-efficient than standalone PCP consultations in prior work. While our study does not replicate existing clinical practice and likely underestimates clinicians' capabilities, our results demonstrate the promise of asynchronous oversight as a feasible paradigm under which diagnostic AI systems can operate with expert human oversight to enhance real-world care.
Sampling-based algorithms are widely used for motion planning in high-dimensional configuration spaces. However, due to low sampling efficiency, their performance often diminishes in complex configuration spaces with narrow corridors. Existing approaches address this issue using handcrafted or learned heuristics to guide sampling toward useful regions. Unfortunately, these strategies often lack generalizability to varied problems or require extensive prior training. In this paper, we propose a simple yet efficient sampling-based planning framework, along with its bidirectional version, that overcomes these issues by integrating different levels of planning granularity. Our approach probes configuration spaces with uniform random samples at varying resolutions and explores these multi-resolution samples online, with a bias towards sparse samples when traversing large free regions of the configuration space. By seamlessly transitioning between sparse and dense samples, our approach can navigate complex configuration spaces while maintaining planning speed and completeness. Simulation results demonstrate that our approach outperforms several state-of-the-art sampling-based planners in $\mathbb{SE}(2)$, $\mathbb{SE}(3)$, and $\mathbb{R}^{14}$ with challenging terrains. Furthermore, experiments conducted with the Franka Emika Panda robot operating in a constrained workspace provide additional evidence of the superiority of the proposed method.
Purpose: Accurate segmentation of glioma subregions in multi-parametric MRI (MP-MRI) is essential for diagnosis and treatment planning but remains challenging due to tumor heterogeneity and ambiguous boundaries. This study proposes an uncertainty-guided hybrid framework integrating spherical projection-based 2D modeling with targeted 3D refinement to enhance segmentation accuracy and interpretability. Methods: Using the BraTS2020 dataset (369 patients, four-modality MP-MRI), three 2D U-Nets were trained to segment enhancing tumor (ET), tumor core (TC), and whole tumor (WT). Voxel-wise uncertainty was quantified via a spherical projection-based 2D nnU-Net, capturing prediction variance across deformed inputs. A 3D sliding window was used to identify high-uncertainty regions, which were refined using a dedicated 3D nnU-Net. Final outputs combined 2D and 3D predictions through a weighted fusion optimized via Particle Swarm Optimization. Results: The proposed method outperformed standalone 2D and 3D baselines, achieving Dice scores of 0.8124 (ET), 0.7499 (TC), and 0.9055 (WT), with consistent gains in sensitivity and visual coherence. Conclusion: This work presents a novel uncertainty-aware segmentation strategy that adaptively integrates 2D and 3D modeling. By focusing refinement on ambiguous regions, it improves both efficiency and accuracy, offering broad applicability to precision neuro-oncology and other high-stakes medical imaging tasks.
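As a rough illustration of the final fusion step, the sketch below (our own simplification, not the study's implementation) uses a tiny Particle Swarm Optimization loop to pick a single scalar weight that blends 2D and 3D probability maps so as to maximise Dice on a validation case; the actual pipeline presumably optimises richer fusion parameters.

```python
# Hedged sketch: PSO over one fusion weight for blending 2D and 3D probability
# maps. Hyperparameters and the single-weight parameterisation are assumptions.
import numpy as np

def dice(pred, gt, eps=1e-6):
    inter = np.logical_and(pred, gt).sum()
    return (2 * inter + eps) / (pred.sum() + gt.sum() + eps)

def pso_fusion_weight(p2d, p3d, gt, n_particles=10, iters=30,
                      w_inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Find w in [0, 1] maximising Dice of (w * p2d + (1 - w) * p3d > 0.5)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 1, n_particles)          # particle positions (weights)
    v = np.zeros(n_particles)                   # velocities

    def fitness(w):
        return dice((w * p2d + (1 - w) * p3d) > 0.5, gt)

    pbest = x.copy()
    pbest_val = np.array([fitness(w) for w in x])
    gbest = pbest[np.argmax(pbest_val)]
    for _ in range(iters):
        r1 = rng.uniform(size=n_particles)
        r2 = rng.uniform(size=n_particles)
        v = w_inertia * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, 0, 1)
        vals = np.array([fitness(w) for w in x])
        improved = vals > pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[np.argmax(pbest_val)]
    return gbest
```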
Autonomous navigation of vehicle-trailer systems is crucial in environments like airports, supermarkets, and concert venues, where various types of trailers must be navigated with different payloads and under different conditions. However, accurately modeling such systems remains challenging, especially for trailers with castor wheels. In this work, we propose a novel universal vehicle-trailer navigation system that integrates a hybrid nominal kinematic model, combining classical nonholonomic constraints for the vehicle with neural network-based trailer kinematics, together with a lightweight online residual learning module that corrects modeling discrepancies and disturbances in real time. Additionally, we develop a model predictive control framework with a weighted model combination strategy that improves long-horizon prediction accuracy and ensures safer motion planning. Our approach is validated through extensive real-world experiments involving multiple trailer types and varying payload conditions, demonstrating robust performance without manual tuning or trailer-specific calibration.
Medical image segmentation is crucial for many healthcare tasks, including disease diagnosis and treatment planning. One key area is the segmentation of skin lesions, which is vital for diagnosing skin cancer and monitoring patients. In this context, this paper introduces SegDT, a new segmentation model based on the diffusion transformer (DiT). SegDT is designed to run on low-cost hardware and incorporates Rectified Flow, which improves generation quality with fewer inference steps while maintaining the flexibility of standard diffusion models. Our method is evaluated on three benchmark datasets and compared against several existing works, achieving state-of-the-art results while maintaining fast inference speeds. This makes the proposed model appealing for real-world medical applications. This work advances the performance and capabilities of deep learning models in medical image analysis, enabling faster, more accurate diagnostic tools for healthcare professionals. The code is made publicly available at \href{https://github.com/Bekhouche/SegDT}{GitHub}.
As part of Saudi Arabia's Vision 2030 and under the guidance of the Royal Commission for AlUla (RCU), efforts are underway to establish AlUla Manara as the Kingdom's first major astronomical observatory. This study presents a preliminary assessment of the site based on ECMWF ERA5 reanalysis data to evaluate its suitability for hosting a 4m-class optical-IR telescope. AlUla Manara is located on a remote plateau 74 km north of the historic town of AlUla and was recently designated as an International Dark Sky Park. The analysis focuses on key astro-meteorological parameters such as seeing, temperature regimes, wind patterns, cloud cover, and precipitable water vapor (PWV). Results show a median nighttime seeing of 1.5 arcsec, a median cold-season PWV of 3.2 mm, and over 79% of nighttime hours with clear-sky conditions. Wind regimes are generally mild, posing no constraints on infrastructure. The analysis also covers three further sites in the Kingdom, namely Volcanic Top, Ward Mountain, and Dubba Mountain. These sites exhibit better turbulence conditions but lie outside RCU jurisdiction. Nevertheless, AlUla Manara remains a competitive candidate thanks to its alignment with broader regional development goals. To validate these preliminary results, a dedicated Astronomical Site Monitor has been deployed on site to support the design and operational planning of the observatory.
The availability of downstream resources plays a critical role in planning the admission of patients undergoing elective surgery, with inpatient beds being one of the most crucial resources. When planning patient admissions, predictions of their length-of-stay (LOS) made by machine learning (ML) models are used to ensure bed availability. However, the actual LOS for each patient may differ considerably from the predicted value, potentially making the schedule infeasible. To address such infeasibilities, rescheduling strategies that take advantage of operational flexibility can be implemented. For example, adjustments may include postponing admission dates, relocating patients to different wards, or even transferring patients who are already admitted. The common assumption is that more accurate LOS predictions reduce the impact of rescheduling. However, training ML models that can make such accurate predictions can be costly. Building on previous work that proposed simulated ML for evaluating data-driven approaches, this paper explores the relationship between LOS prediction accuracy and rescheduling flexibility across various corrective policies. Specifically, we examine the most effective patient rescheduling strategies under LOS prediction errors to prevent bed overflows while optimizing resource utilization.
Multi-agent workflows have become an effective strategy for tackling complicated tasks by decomposing them into multiple sub-tasks and assigning them to specialized agents. However, designing optimal workflows remains challenging due to the vast and intricate design space. Current practices rely heavily on the intuition and expertise of practitioners, often resulting in design fixation or unstructured, time-consuming trial-and-error exploration. To address these challenges, this work introduces FLOWFORGE, an interactive visualization tool that facilitates the creation of multi-agent workflows through i) a structured visual exploration of the design space and ii) in-situ guidance informed by established design patterns. Based on formative studies and a literature review, FLOWFORGE organizes the workflow design process into three hierarchical levels (i.e., task planning, agent assignment, and agent optimization), ranging from abstract to concrete. This structured visual exploration enables users to move seamlessly from high-level planning to detailed design decisions and implementations, while comparing alternative solutions across multiple performance metrics. Additionally, drawing from established workflow design patterns, FLOWFORGE provides context-aware, in-situ suggestions at each level as users navigate the design space, enhancing the workflow creation process with practical guidance. Use cases and user studies demonstrate the usability and effectiveness of FLOWFORGE, while also yielding valuable insights into how practitioners explore design spaces and leverage guidance during workflow development.
We explore the connection between two seemingly distant fields: the set of cyclic functions $f$ in a Hilbert space of analytic functions over the unit disc $\mathbb{D}$, on the one hand, and the families of orthogonal polynomials for a weight on the unit circle $\mathbb{T}$ (OPUC), on the other. This link is established by so-called Optimal Polynomial Approximants (OPA) to $1/f$, that is, polynomials $p_n$ minimizing the norm of $1-p_nf$ among all polynomials $p_n$ of degree up to a given $n$. Here, we focus on the particular case of the Hardy space, and an electrostatic interpretation of the zeros of those OPA (and thus, of the corresponding OPUC) is studied. We find the electrostatic laws explaining the position of such zeros for a reduced but significant class of examples. This represents the first step towards a research plan proposed over a decade ago to understand zeros of OPA through their potential-theoretic properties.
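For readers unfamiliar with OPAs, a small numerical sketch (our own, under the standard $H^2$ norm $\|\sum_k a_k z^k\|^2=\sum_k |a_k|^2$) shows how the degree-$n$ approximant can be computed by least squares; the example polynomial $f(z)=1-z$ and the code interface are illustrative assumptions, not taken from the paper.

```python
# Numerical sketch (assumption of ours): degree-n optimal polynomial
# approximant to 1/f in the Hardy space H^2, computed by least squares on
# Taylor coefficients.
import numpy as np

def opa_coefficients(f_coeffs, n):
    """Coefficients of the degree-n OPA p_n minimising ||1 - p_n f||_{H^2}.

    f_coeffs: coefficients of a polynomial f, lowest degree first.
    """
    f = np.asarray(f_coeffs, dtype=complex)
    m = len(f) - 1                               # degree of f
    # Column j of A holds the coefficients of z^j * f(z).
    A = np.zeros((m + n + 1, n + 1), dtype=complex)
    for j in range(n + 1):
        A[j:j + m + 1, j] = f
    e0 = np.zeros(m + n + 1, dtype=complex)
    e0[0] = 1.0                                  # Taylor coefficients of the constant 1
    c, *_ = np.linalg.lstsq(A, e0, rcond=None)
    return c                                     # coefficients of p_n, lowest degree first

# Example: f(z) = 1 - z, a cyclic function in H^2.
p3 = opa_coefficients([1.0, -1.0], n=3)
print(np.roots(p3[::-1]))                        # zeros of the OPA lie outside the closed disc
```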
The increasing demand for autonomous systems in complex and dynamic environments has driven significant research into intelligent path planning methodologies. For decades, graph-based search algorithms, linear programming techniques, and evolutionary computation methods have served as foundational approaches in this domain. Recently, deep reinforcement learning (DRL) has emerged as a powerful method for enabling autonomous agents to learn optimal navigation strategies through interaction with their environments. This survey provides a comprehensive overview of traditional approaches as well as the recent advancements in DRL applied to path planning tasks, focusing on autonomous vehicles, drones, and robotic platforms. Key algorithms across both conventional and learning-based paradigms are categorized, with their innovations and practical implementations highlighted. This is followed by a thorough discussion of their respective strengths and limitations in terms of computational efficiency, scalability, adaptability, and robustness. The survey concludes by identifying key open challenges and outlining promising avenues for future research. Special attention is given to hybrid approaches that integrate DRL with classical planning techniques to leverage the benefits of both learning-based adaptability and deterministic reliability, offering promising directions for robust and resilient autonomous navigation.
We propose a mesh-free policy iteration framework that combines classical dynamic programming with physics-informed neural networks (PINNs) to solve high-dimensional, nonconvex Hamilton--Jacobi--Isaacs (HJI) equations arising in stochastic differential games and robust control. The method alternates between solving linear second-order PDEs under fixed feedback policies and updating the controls via pointwise minimax optimization using automatic differentiation. Under standard Lipschitz and uniform ellipticity assumptions, we prove that the value function iterates converge locally uniformly to the unique viscosity solution of the HJI equation. The analysis establishes equi-Lipschitz regularity of the iterates, enabling provable stability and convergence without requiring convexity of the Hamiltonian. Numerical experiments demonstrate the accuracy and scalability of the method. In a two-dimensional stochastic path-planning game with a moving obstacle, our method matches finite-difference benchmarks with relative $L^2$-errors below $10^{-2}$. In five- and ten-dimensional publisher-subscriber differential games with anisotropic noise, the proposed approach consistently outperforms direct PINN solvers, yielding smoother value functions and lower residuals. Our results suggest that integrating PINNs with policy iteration is a practical and theoretically grounded method for solving high-dimensional, nonconvex HJI equations, with potential applications in robotics, finance, and multi-agent reinforcement learning.
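To illustrate the policy-improvement half of the iteration, the snippet below is a hedged sketch under assumptions the paper may not make: for a control-affine system with quadratic control cost, the pointwise minimising control has the closed form $u(x) = -R^{-1}B^{\top}\nabla V(x)$, which can be evaluated with automatic differentiation of the current value network; the PINN-based policy-evaluation step is omitted.

```python
# Sketch of the pointwise policy-improvement step only, assuming a
# control-affine system dx = (a(x) + B u) dt + sigma dW with running cost
# 0.5 u^T R u. The linear-PDE policy evaluation via a PINN is not shown.
import torch

def control_update(value_net, x, B, R):
    """Closed-form minimising control u(x) = -R^{-1} B^T grad V(x).

    x: (N, d) batch of states; B: (d, m) tensor; R: (m, m) symmetric PD tensor.
    """
    x = x.clone().requires_grad_(True)
    grad_V = torch.autograd.grad(value_net(x).sum(), x)[0]   # (N, d) value gradients
    return -grad_V @ B @ torch.linalg.inv(R)                  # (N, m) feedback controls
```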
Offline reinforcement learning (RL) enables agents to learn policies from fixed datasets, avoiding costly or unsafe environment interactions. However, its effectiveness is often limited by dataset sparsity and the lack of transition overlap between suboptimal and expert trajectories, which makes long-horizon planning particularly challenging. Prior solutions based on synthetic data augmentation or trajectory stitching often fail to generalize to novel states and rely on heuristic stitching points. To address these challenges, we propose Retrieval High-quAlity Demonstrations (RAD) for decision-making, which combines non-parametric retrieval with diffusion-based generative modeling. RAD dynamically retrieves high-return states from the offline dataset as target states, based on state similarity and return estimation, and plans toward them using a condition-guided diffusion model. Such retrieval-guided generation enables flexible trajectory stitching and improves generalization when encountering underrepresented or out-of-distribution states. Extensive experiments confirm that RAD achieves competitive or superior performance compared to baselines across diverse benchmarks, validating its effectiveness.
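The retrieval step can be pictured in a few lines of NumPy (a simplified sketch under our own scoring rule, not RAD's exact criterion): among the k nearest dataset states, pick the one whose estimated return, traded off against distance, is highest, and hand it to the diffusion planner as the target state.

```python
# Illustrative sketch of the retrieval step only (not the diffusion planner).
# The k-NN scoring and the trade-off weight are assumptions.
import numpy as np

def retrieve_target_state(current_state, dataset_states, dataset_returns,
                          k=32, return_weight=1.0):
    """Pick a target state among the k nearest neighbours of the current state,
    scored by negative distance plus a weighted return estimate.

    dataset_states: (N, d) array; dataset_returns: (N,) return estimates.
    """
    dists = np.linalg.norm(dataset_states - current_state, axis=1)
    knn = np.argsort(dists)[:k]
    scores = -dists[knn] + return_weight * dataset_returns[knn]
    return dataset_states[knn[np.argmax(scores)]]
```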
High-resolution volumetric computed tomography (CT) is essential for accurate diagnosis and treatment planning in thoracic diseases; however, it is limited by radiation dose and hardware costs. We present the Transformer Volumetric Super-Resolution Network (\textbf{TVSRN-V2}), a transformer-based super-resolution (SR) framework designed for practical deployment in clinical lung CT analysis. Built from scalable components, including Through-Plane Attention Blocks (TAB) and Swin Transformer V2, our model effectively reconstructs fine anatomical details in low-dose CT volumes and integrates seamlessly with downstream analysis pipelines. We evaluate its effectiveness on three critical lung cancer tasks -- lobe segmentation, radiomics, and prognosis -- across multiple clinical cohorts. To enhance robustness across variable acquisition protocols, we introduce pseudo-low-resolution augmentation, which simulates scanner diversity without requiring private data. TVSRN-V2 demonstrates a significant improvement in segmentation accuracy (+4\% Dice), higher radiomic feature reproducibility, and enhanced predictive performance (+0.06 C-index and AUC). These results indicate that SR-driven recovery of structural detail substantially enhances clinical decision support, positioning TVSRN-V2 as a well-engineered, clinically viable system for dose-efficient imaging and quantitative analysis in real-world CT workflows.
Scene understanding and risk-aware attention are crucial for human drivers to make safe and effective driving decisions. To imitate this cognitive ability in urban autonomous driving while ensuring transparency and interpretability, we propose a vision-language model (VLM)-enhanced unified decision-making and motion control framework, named VLM-UDMC. This framework incorporates scene reasoning and risk-aware insights into an upper-level slow system, which dynamically reconfigures the optimal motion planning for the downstream fast system. The reconfiguration is based on real-time environmental changes, which are encoded through context-aware potential functions. More specifically, the upper-level slow system employs a two-step reasoning policy with Retrieval-Augmented Generation (RAG), leveraging foundation models to process multimodal inputs and retrieve contextual knowledge, thereby generating risk-aware insights. Meanwhile, a lightweight multi-kernel decomposed LSTM provides real-time short-horizon trajectory predictions for heterogeneous traffic participants by extracting smoother trend representations. The effectiveness of the proposed VLM-UDMC framework is verified via both simulations and real-world experiments with a full-size autonomous vehicle. The results demonstrate that VLM-UDMC effectively leverages scene understanding and attention decomposition for rational driving decisions, thus improving overall urban driving performance. Our open-source project is available at https://github.com/henryhcliu/vlmudmc.git.
Accurate segmentation of pheochromocytoma (PCC) in abdominal CT scans is essential for tumor burden estimation, prognosis, and treatment planning. It may also help infer genetic clusters, reducing reliance on expensive testing. This study systematically evaluates anatomical priors to identify configurations that improve deep learning-based PCC segmentation. We employed the nnU-Net framework to evaluate eleven annotation strategies for accurate 3D segmentation of pheochromocytoma, introducing a set of novel multi-class schemes based on organ-specific anatomical priors. These priors were derived from adjacent organs commonly surrounding adrenal tumors (e.g., liver, spleen, kidney, aorta, adrenal gland, and pancreas), and were compared against a broad body-region prior used in previous work. The framework was trained and tested on 105 contrast-enhanced CT scans from 91 patients at the NIH Clinical Center. Performance was measured using Dice Similarity Coefficient (DSC), Normalized Surface Distance (NSD), and instance-wise F1 score. Among all strategies, the Tumor + Kidney + Aorta (TKA) annotation achieved the highest segmentation accuracy, significantly outperforming the previously used Tumor + Body (TB) annotation across DSC (p = 0.0097), NSD (p = 0.0110), and F1 score (25.84% improvement at an IoU threshold of 0.5), measured on a 70-30 train-test split. The TKA model also showed superior tumor burden quantification (R^2 = 0.968) and strong segmentation across all genetic subtypes. In five-fold cross-validation, TKA consistently outperformed TB across IoU thresholds (0.1 to 0.5), reinforcing its robustness and generalizability. These findings highlight the value of incorporating relevant anatomical context in deep learning models to achieve precise PCC segmentation, supporting clinical assessment and longitudinal monitoring.
Visual Planning for Assistance (VPA) aims to predict a sequence of user actions required to achieve a specified goal based on a video showing the user's progress. Although recent advances in multimodal large language models (MLLMs) have shown promising results in video understanding, long-horizon visual planning remains a challenging problem. We identify two challenges in training large MLLMs for video-based planning tasks: (1) the scarcity of procedural annotations, which limits the model's ability to learn procedural task dynamics effectively, and (2) the inefficiency of the next-token prediction objective in explicitly capturing the structured action space of visual planning, compared to free-form natural language. To tackle data scarcity, we introduce Auxiliary Task Augmentation: we design and train our model on auxiliary tasks relevant to long-horizon video-based planning (e.g., goal prediction) to augment the model's planning ability. To more explicitly model the structured action space unique to visual planning tasks, we leverage Multi-token Prediction, extending traditional next-token prediction by using multiple heads to predict multiple future tokens during training. Our approach, VideoPlan, achieves state-of-the-art VPA performance on the COIN and CrossTask datasets, surpassing prior methods by 7.3% and 3.4%, respectively, when predicting 3 future actions. We further extend our method to the challenging Ego4D Long-term Action Anticipation task and show that it is on par with state-of-the-art approaches despite not using specialized egocentric features. Code will be made available.
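A compact PyTorch sketch of the multi-token prediction idea follows (the head count, backbone interface, and loss averaging are our assumptions, not VideoPlan's exact architecture): K linear heads share the backbone's hidden states, and head k is supervised with the token k steps ahead of each position.

```python
# Illustrative multi-token prediction heads over a shared backbone.
# Backbone, vocabulary, and number of heads are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenHead(nn.Module):
    def __init__(self, hidden_dim, vocab_size, num_future=3):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Linear(hidden_dim, vocab_size) for _ in range(num_future))

    def forward(self, hidden, targets):
        """hidden: (B, T, H) backbone states; targets: (B, T) token ids.

        Head k is trained to predict the token k steps ahead of each position.
        """
        loss = 0.0
        for k, head in enumerate(self.heads, start=1):
            logits = head(hidden[:, :-k])                 # positions with a k-step target
            loss = loss + F.cross_entropy(
                logits.reshape(-1, logits.size(-1)),
                targets[:, k:].reshape(-1))
        return loss / len(self.heads)
```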
Standard automated planning employs first-order formulas under closed-world semantics to achieve a goal with a given set of actions from an initial state. We follow a line of research that aims to incorporate background knowledge into automated planning problems, for example, by means of ontologies, which are usually interpreted under open-world semantics. We present a new approach for planning with DL-Lite ontologies that combines the advantages of ontology-based action conditions provided by explicit-input knowledge and action bases (eKABs) and ontology-aware action effects under the coherence update semantics. We show that the complexity of the resulting formalism is not higher than that of previous approaches and provide an implementation via a polynomial compilation into classical planning. An evaluation of existing and new benchmarks examines the performance of a planning system on different variants of our compilation.
In this paper, we propose a search-based interactive motion planning scheme for autonomous vehicles (AVs), using a game-theoretic approach. In contrast to traditional search-based approaches, the newly developed approach considers other road users (e.g., drivers and pedestrians) as intelligent agents rather than static obstacles. This leads to the generation of a more realistic path for the AV. Owing to its low computational time, the proposed motion planning scheme is implementable in real-time applications. The performance of the developed motion planning scheme is compared with existing motion planning techniques and validated through experiments using WATonoBus, an electric all-weather autonomous shuttle bus.
Flow-matching policies have emerged as a powerful paradigm for generalist robotics. These models are trained to imitate an action chunk, conditioned on sensor observations and textual instructions. Often, training demonstrations are generated by a suboptimal policy, such as a human operator. This work explores training flow-matching policies via reinforcement learning to surpass the performance of the original demonstration policy. We particularly note minimum-time control as a key application and present a simple scheme for variable-horizon flow-matching planning. We then introduce two families of approaches: a simple Reward-Weighted Flow Matching (RWFM) scheme and a Group Relative Policy Optimization (GRPO) approach with a learned reward surrogate. Our policies are trained on an illustrative suite of simulated unicycle dynamics tasks, and we show that both approaches dramatically improve upon the suboptimal demonstrator's performance, with the GRPO approach in particular generally incurring between $50\%$ and $85\%$ less cost than a naive Imitation Learning Flow Matching (ILFM) approach.
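A minimal sketch of how a reward-weighted flow-matching objective could look is given below (our reading of RWFM, with an assumed linear interpolation path, velocity-network interface, and softmax weighting): each demonstration's flow-matching regression loss is scaled by a weight that grows with its reward.

```python
# Hedged sketch of a reward-weighted flow-matching (RWFM) loss.
# velocity_net(x_t, t, obs) and the temperature are assumed interfaces.
import torch

def rwfm_loss(velocity_net, actions, obs, rewards, temperature=1.0):
    """actions: (B, D) flattened action chunks; obs: (B, O); rewards: (B,)."""
    b = actions.size(0)
    t = torch.rand(b, 1, device=actions.device)           # flow time in [0, 1]
    noise = torch.randn_like(actions)
    x_t = (1 - t) * noise + t * actions                   # linear interpolation path
    target_v = actions - noise                            # constant target velocity
    pred_v = velocity_net(x_t, t, obs)
    weights = torch.softmax(rewards / temperature, dim=0) * b   # mean weight ~ 1
    per_sample = ((pred_v - target_v) ** 2).mean(dim=1)
    return (weights * per_sample).mean()
```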
Industrial warehouses are congested with moving forklifts, shelves, and personnel, making robot teleoperation particularly risky and demanding for blind and low-vision (BLV) operators. Although accessible teleoperation plays a key role in inclusive workforce participation, systematic research on its use in industrial environments is limited, and the few existing studies rarely address multimodal guidance designed for BLV users. We present a novel multimodal guidance simulator that enables BLV users to control a mobile robot through a high-fidelity warehouse environment while simultaneously receiving synchronized visual, auditory, and haptic feedback. The system combines a navigation mesh with regular re-planning, so routes remain accurate and collision-free as forklifts and human avatars move around the warehouse. Users with low vision are guided by a visible path line towards the destination; navigational voice cues with clockwise directions announce upcoming turns; and proximity-based haptic feedback notifies users of static and moving obstacles along the path. This real-time, closed-loop system offers a repeatable testbed and an algorithmic reference for accessible teleoperation research. The simulator's design principles can be readily adapted to real robots, since its navigation, speech, and haptic modules align with commercial hardware, supporting rapid feasibility studies and the deployment of inclusive telerobotic tools in actual warehouses.
Long-term electricity demand forecasting is essential for grid and operations planning, as well as for the analysis and planning of energy transition strategies. However, accurate long-term load forecasting with high temporal resolution remains challenging, as most existing approaches focus on aggregated forecasts, which require accurate prediction of numerous variables for bottom-up sectoral forecasts. In this study, we propose a parsimonious methodology that employs t-tests to verify load stability and the correlation of load with gross domestic product (GDP) to produce a long-term hourly load forecast. Applying this method to Singapore's electricity demand, analysis of multi-year historical data (2004-2022) reveals that its relative hourly load has remained statistically stable, with an overall percentage deviation of 4.24% across seasonality indices. Utilizing these stability findings, five-year-ahead total yearly forecasts were generated using GDP as a predictor, and hourly loads were forecasted using hourly seasonality index fractions. The maximum Mean Absolute Percentage Error (MAPE) across multiple experiments for six-year-ahead forecasts was 6.87%. The methodology was further applied to Belgium (an OECD country) and Bulgaria (a non-OECD country), yielding MAPE values of 6.81% and 5.64%, respectively. Additionally, stability results were incorporated into a short-term forecasting model based on exponential smoothing, demonstrating comparable or improved accuracy relative to existing machine learning-based methods. These findings indicate that parsimonious approaches can effectively produce long-term, high-resolution forecasts.
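The parsimonious recipe can be sketched in a few lines (a simplification under our own column names and a plain OLS regression of yearly energy on GDP; the study's exact t-test design may differ): check that hourly load shares are stable across years, then allocate a GDP-driven yearly forecast over hours using the average seasonality fractions.

```python
# Simplified sketch of the stability check and hourly allocation.
# DataFrame columns and the OLS choice are assumptions.
import numpy as np
import pandas as pd
from scipy import stats

def hourly_fractions(df):
    """df: columns ['year', 'hour_of_year', 'load']. Per-year hourly load shares."""
    yearly = df.groupby('year')['load'].transform('sum')
    return df.assign(frac=df['load'] / yearly).pivot(
        index='hour_of_year', columns='year', values='frac')

def stability_pvalues(frac):
    """t-test of each year's hourly shares against the pooled shares of other years."""
    return {y: stats.ttest_ind(frac[y], frac.drop(columns=y).values.ravel()).pvalue
            for y in frac.columns}

def forecast_hourly(frac, yearly_energy, gdp_hist, gdp_future):
    """Regress yearly energy on GDP, then allocate the prediction by mean shares."""
    slope, intercept, *_ = stats.linregress(gdp_hist, yearly_energy)
    total = intercept + slope * gdp_future
    return frac.mean(axis=1) * total
```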
Abductive reasoning is a popular non-monotonic paradigm that aims to explain observed symptoms and manifestations. It has many applications, such as diagnosis and planning in artificial intelligence, as well as database updates. In propositional abduction, knowledge is specified by a propositional formula. The computational complexity of tasks in propositional abduction has been systematically characterized, even with detailed classifications for Boolean fragments. Unsurprisingly, the most insightful reasoning problems (counting and enumeration) are computationally highly challenging. Therefore, we consider reasoning between decisions and counting, allowing us to understand explanations better while maintaining favorable complexity. We introduce facets for propositional abduction, which are literals that occur in some explanation (relevant) but not in all explanations (dispensable). Reasoning with facets provides a more fine-grained understanding of the variability among explanations (heterogeneity). In addition, we consider the distance between two explanations, enabling a better understanding of heterogeneity/homogeneity. We comprehensively analyze facets of propositional abduction in various settings, including an almost complete characterization in Post's framework.
Forensic mental health care involves the treatment of individuals with severe mental disorders who have committed violent offences. These settings are often characterized by high levels of bureaucracy, risk avoidance, and restricted autonomy. Patients frequently experience a profound loss of control over their lives, leading to heightened psychological stress, sometimes resulting in isolation as a safety measure. In this study, we explore how co-design can be used to collaboratively develop a companion robot that helps monitor and regulate stress while tracking patients' interaction behaviours for long-term intervention. We conducted four co-design workshops in a forensic psychiatric clinic with patients, caregivers, and therapists. Our process began with the presentation of an initial speculative prototype to therapists, enabling reflection on shared concerns, ethical risks, and desirable features. This was followed by a creative ideation session with patients and a third workshop focused on defining desired functions and emotional responses; a final prototype demo is planned to gather direct patient feedback. Our findings emphasize the importance of empowering patients in the design process, adapting proposals to their current emotional state, and ensuring that each patient's voice is heard.
Autonomous driving demands reliable and efficient solutions to closely related problems such as decision-making and motion planning. In this work, decision-making refers specifically to highway lane selection, while motion planning involves generating control commands (such as speed and steering) to reach the chosen lane. In the context of Connected Autonomous Vehicles (CAVs), achieving both flexible and safe lane selection alongside precise trajectory execution remains a significant challenge. This paper proposes a framework called Cohesive Decision-Guided Motion Planning (CDGMP), which tightly integrates decision-making and motion planning using a Mixture of Experts (MoE) inspired architecture combined with multi-policy reinforcement learning. By coordinating multiple specialized sub-networks through a gating mechanism, the method decomposes the complex driving task into modular components. Each sub-network focuses on a specific aspect of driving, improving efficiency by activating only the most relevant modules during inference. This design also enhances safety through modular specialization. CDGMP improves the adaptability and robustness of CAVs across diverse traffic scenarios, offering a scalable solution to real-world autonomy challenges. The architectural principles behind CDGMP, especially the use of MoE, also provide a strong foundation for other high-dimensional decision and control tasks. Simulation results (available at https://youtu.be/_-4OXNHV0UY) demonstrate reliable performance in both lane selection and motion planning.
This study investigates the strategic and epistemically responsible integration of AI-powered chatbots into physics teacher education by employing a TPACK-guided SWOT framework across three structured learning activities. Conducted within a university-level capstone course on innovative tools for physics instruction, the activities targeted key intersections of technological, pedagogical, and content knowledge (TPACK) through chatbot-assisted tasks: simplifying abstract physics concepts, constructing symbolic concept maps, and designing instructional scenarios. Drawing on participant reflections, classroom artifacts, and iterative feedback, the results highlight internal strengths such as enhanced information-seeking behavior, scaffolded pedagogical planning, and support for symbolic reasoning. At the same time, internal weaknesses emerged, including domain-specific inaccuracies, symbolic limitations (e.g., LaTeX misrendering), and risks of overreliance on AI outputs. External opportunities were found in promoting inclusive education, multilingual engagement, and expanded zones of proximal development (ZPD), while external threats included prompt injection risks, institutional access gaps, and cybersecurity vulnerabilities. By extending existing TPACK-based models with constructs such as AI literacy, prompt-crafting competence, and epistemic verification protocols, this research offers a theoretically grounded and practically actionable roadmap for embedding AI in STEM teacher preparation. The findings affirm that, when critically scaffolded, AI chatbots can support metacognitive reflection, ethical reasoning, and instructional innovation in physics education if implementation is paired with digital fluency training and institutional support.
This study evaluates the role of grid-connected hydrogen electrolyzers in advancing a cost-effective and, in particular, equitable green hydrogen industry in Kenya that serves both domestic and international markets. Using a multi-nodal capacity expansion model with county-level spatial resolution, we assess how electrolyzer deployment affects electricity cost, grid flexibility, and carbon intensity under various renewable and demand scenarios. Results show that electrolyzers enable up to a 30 percent reduction in the levelized cost of electricity (LCOE) and US\$460 million in cumulative system cost savings by 2050 compared to a business-as-usual scenario. As a flexible demand available to absorb surplus generation, electrolyzers reduce curtailment and support large-scale wind integration while still requiring a diverse mix of renewable electricity. The resulting hydrogen reaches a levelized cost of \$3.2 per kg by 2050, and its carbon intensity from electricity use falls below one kg of carbon dioxide per kg of hydrogen, suggesting likely compliance with international certification thresholds. Benefits persist across all demand trajectories, though their scale depends on the pace of wind expansion. Spatial analyses reveal an unequal distribution of infrastructure gains, underscoring the need for equity-oriented planning. These findings suggest that grid-integrated hydrogen, if planned in coordination with wind investment, transmission, and equitable infrastructure deployment, can reduce costs, support certification, and promote a more equitable model of hydrogen development. In other words, connecting electrolyzers to the grid will make green hydrogen not only in Kenya but also for Kenya.
Safe navigation in unknown and cluttered environments remains a challenging problem in robotics. Model Predictive Contour Control (MPCC) has shown promise for performant obstacle avoidance by enabling precise and agile trajectory tracking; however, existing methods lack formal safety assurances. To address this issue, we propose a general Control Lyapunov Function (CLF) and Control Barrier Function (CBF) enabled MPCC framework that enforces safety constraints derived from a free-space corridor around the planned trajectory. To enhance feasibility, we dynamically adapt the CBF parameters at runtime using a Soft Actor-Critic (SAC) policy. The approach is validated through extensive simulations and an experiment on mobile robot navigation in unknown, cluttered environments.
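To make the CBF ingredient concrete, here is a hedged single-integrator sketch (the dynamics, the circular free-space constraint, and the cvxpy QP formulation are our assumptions, not the paper's MPCC formulation): a QP minimally modifies a nominal command so that $\dot h + \alpha h \ge 0$ holds, where $\alpha$ is the class-$\mathcal{K}$ gain that a SAC policy would adapt at runtime.

```python
# Illustrative CBF safety filter for a 2D single integrator; the MPCC,
# corridor construction, and SAC adaptation of `alpha` are not shown.
import cvxpy as cp
import numpy as np

def cbf_filter(x, u_nom, obstacle, radius, alpha=1.0):
    """Minimally modify the nominal velocity command u_nom at position x so the
    circular safety constraint h(x) = ||x - obstacle||^2 - radius^2 >= 0 is kept."""
    u = cp.Variable(2)
    h = np.sum((x - obstacle) ** 2) - radius ** 2         # h(x) >= 0 means safe
    grad_h = 2 * (x - obstacle)
    constraints = [grad_h @ u + alpha * h >= 0]           # hdot + alpha * h >= 0
    prob = cp.Problem(cp.Minimize(cp.sum_squares(u - u_nom)), constraints)
    prob.solve()
    return u.value
```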
Story point estimation is an essential part of agile software development. Story points are unitless, project-specific effort estimates that help developers plan their sprints. Traditionally, developers estimate story points collaboratively using planning poker or other manual techniques. While initially calibrating the estimates to each project is helpful, once a team has converged on a set of precedents, story point estimation can become tedious and labor-intensive. Machine learning can reduce this burden, but only with enough context from the historical decisions made by the project team. That is, state-of-the-art models, such as GPT2SP and FastText-SVM, only make accurate (within-project) predictions when trained on data from the same project. The goal of this work is to streamline story point estimation by evaluating a comparative learning-based framework for calibrating project-specific story point prediction models. Instead of assigning a specific story point value to every backlog item, developers are presented with pairs of items and indicate which item requires more effort. Using these comparative judgments, a machine learning model is trained to predict story point estimates. We empirically evaluated our technique using 23,313 manual estimates from 16 projects. The model learned from comparative judgments achieves, on average, a 0.34 Spearman's rank correlation coefficient between its predictions and the ground-truth story points. This is similar to, if not better than, the performance of a regression model learned from the ground-truth story points. The proposed comparative learning approach is therefore more efficient than state-of-the-art regression-based approaches according to the law of comparative judgments: providing comparative judgments imposes a lower cognitive burden on humans than providing ratings or categorical labels.
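A minimal PyTorch sketch of the comparative-learning idea follows (the feature extractor, network size, and Bradley-Terry-style loss are our assumptions, not the paper's exact model): a scorer is trained so that the item judged to need more effort receives the higher score, and the learned scores can then be calibrated to the project's story point scale.

```python
# Hedged sketch: learning an effort scorer from pairwise judgments.
# Item features are assumed to come from some upstream text encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EffortScorer(nn.Module):
    def __init__(self, feat_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)          # scalar effort score per item

def pairwise_loss(scorer, harder, easier):
    """harder, easier: (B, feat_dim) features of the two backlog items in each pair,
    where `harder` was judged to require more effort."""
    margin = scorer(harder) - scorer(easier)
    return F.binary_cross_entropy_with_logits(margin, torch.ones_like(margin))
```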
Online optimal control of quadrupedal robots would enable them to plan their movement in novel scenarios. Linear Model Predictive Control (LMPC) has emerged as a practical approach for real-time control. In LMPC, an optimization problem with a quadratic cost and linear constraints is formulated over a finite horizon and solved on the fly. However, LMPC relies on linearizing the equations of motion (EOM), which may lead to poor solution quality. In this paper, we use Koopman operator theory and Extended Dynamic Mode Decomposition (EDMD) to create a linear model of the system in a high-dimensional space, thus retaining the nonlinearity of the EOM. We model the aerial phase and the ground contact phases using different linear models. Then, using LMPC, we demonstrate bounding, trotting, and bound-to-trot and trot-to-bound gait transitions on level and rough terrains. The main novelty is the use of Koopman operator theory to create hybrid models of a quadrupedal system and to demonstrate the online generation of multiple gaits and gait transitions.
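For intuition, a bare-bones EDMD sketch is shown below (the monomial observables and snapshot-pair interface are our own choices; the paper's lifted coordinates for the aerial and contact phases are not specified here): given snapshot pairs $(x_k, y_k)$ with $y_k = F(x_k)$, a linear operator is fitted on lifted features by least squares, and one such model would be identified per contact mode.

```python
# Minimal EDMD sketch with monomial observables (an illustrative choice).
import numpy as np
from itertools import combinations_with_replacement

def lift(X, degree=2):
    """Monomial observables of the state up to a given degree, plus a constant.

    X: (n_samples, state_dim) array of states.
    """
    n, d = X.shape
    feats = [np.ones(n)]
    for deg in range(1, degree + 1):
        for idx in combinations_with_replacement(range(d), deg):
            feats.append(np.prod(X[:, idx], axis=1))
    return np.column_stack(feats)

def edmd(X, Y, degree=2):
    """Least-squares Koopman matrix A such that lift(Y) ~ lift(X) @ A.T,
    fitted from snapshot pairs (X, Y) with Y[k] = F(X[k])."""
    PhiX, PhiY = lift(X, degree), lift(Y, degree)
    A, *_ = np.linalg.lstsq(PhiX, PhiY, rcond=None)
    return A.T                                   # acts on column feature vectors
```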