planning - 2025-11-03

Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning

Authors:Qiusi Zhan, Hyeonjeong Ha, Rui Yang, Sirui Xu, Hanyang Chen, Liang-Yan Gui, Yu-Xiong Wang, Huan Zhang, Heng Ji, Daniel Kang
Date:2025-10-31 16:50:49

Multimodal large language models (MLLMs) have advanced embodied agents by enabling direct perception, reasoning, and planning task-oriented actions from visual inputs. However, such vision driven embodied agents open a new attack surface: visual backdoor attacks, where the agent behaves normally until a visual trigger appears in the scene, then persistently executes an attacker-specified multi-step policy. We introduce BEAT, the first framework to inject such visual backdoors into MLLM-based embodied agents using objects in the environments as triggers. Unlike textual triggers, object triggers exhibit wide variation across viewpoints and lighting, making them difficult to implant reliably. BEAT addresses this challenge by (1) constructing a training set that spans diverse scenes, tasks, and trigger placements to expose agents to trigger variability, and (2) introducing a two-stage training scheme that first applies supervised fine-tuning (SFT) and then our novel Contrastive Trigger Learning (CTL). CTL formulates trigger discrimination as preference learning between trigger-present and trigger-free inputs, explicitly sharpening the decision boundaries to ensure precise backdoor activation. Across various embodied agent benchmarks and MLLMs, BEAT achieves attack success rates up to 80%, while maintaining strong benign task performance, and generalizes reliably to out-of-distribution trigger placements. Notably, compared to naive SFT, CTL boosts backdoor activation accuracy up to 39% under limited backdoor data. These findings expose a critical yet unexplored security risk in MLLM-based embodied agents, underscoring the need for robust defenses before real-world deployment.

Beyond Demographics: Behavioural Segmentation and Spatial Analytics to Enhance Visitor Experience at The British Museum

Authors:Naomi Muggleton, Timothy Monteath, Taha Yasseri
Date:2025-10-31 15:16:50

This study explores visitor behaviour at The British Museum using data science methods applied to novel sources, including audio guide usage logs and TripAdvisor reviews. Analysing 42,000 visitor journeys and over 50,000 reviews, we identify key drivers of satisfaction, segment visitors by behavioural patterns, examine tour engagement, model spatial navigation, and investigate room popularity. Behavioural clustering uncovered four distinct visitor types: Committed Trekkers, Leisurely Explorers, Targeted Visitors, and Speedy Samplers, each characterised by different levels of engagement and movement. Tour usage analysis revealed high drop-off rates and variation in completion rates across different language groups. Spatial flow modelling revealed that accessibility and proximity, particularly aversion to stairs, shaped visitor paths more than thematic organisation. Room popularity was more strongly predicted by physical accessibility than curatorial content. We propose practical strategies for improving engagement and flow, offering a scalable framework for visitor-centred, data-informed museum planning.

Context-Gated Cross-Modal Perception with Visual Mamba for PET-CT Lung Tumor Segmentation

Authors:Elena Mulero Ayllón, Linlin Shen, Pierangelo Veltri, Fabrizia Gelardi, Arturo Chiti, Paolo Soda, Matteo Tortora
Date:2025-10-31 14:29:52

Accurate lung tumor segmentation is vital for improving diagnosis and treatment planning, and effectively combining anatomical and functional information from PET and CT remains a major challenge. In this study, we propose vMambaX, a lightweight multimodal framework integrating PET and CT scan images through a Context-Gated Cross-Modal Perception Module (CGM). Built on the Visual Mamba architecture, vMambaX adaptively enhances inter-modality feature interaction, emphasizing informative regions while suppressing noise. Evaluated on the PCLT20K dataset, the model outperforms baseline models while maintaining lower computational complexity. These results highlight the effectiveness of adaptive cross-modal gating for multimodal tumor segmentation and demonstrate the potential of vMambaX as an efficient and scalable framework for advanced lung cancer analysis. The code is available at https://github.com/arco-group/vMambaX.

A Multi-tiered Human-in-the-loop Approach for Interactive School Mapping Using Earth Observation and Machine Learning

Authors:Casper Fibaek, Abi Riley, Kelsey Doerksen, Do-Hyung Kim, Rochelle Schneider
Date:2025-10-31 13:15:22

This paper presents a multi-tiered human-in-the-loop framework for interactive school mapping designed to improve the accuracy and completeness of educational facility records, particularly in developing regions where such data may be scarce and infrequently updated. The first tier involves a machine learning based analysis of population density, land cover, and existing infrastructure compared with known school locations. The first tier identifies potential gaps and "mislabelled" schools. In subsequent tiers, medium-resolution satellite imagery (Sentinel-2) is investigated to pinpoint regions with a high likelihood of school presence, followed by the application of very high-resolution (VHR) imagery and deep learning models to generate detailed candidate locations for schools within these prioritised areas. The medium-resolution approach was later removed due to insignificant improvements. The medium and VHR resolution models build upon global pre-trained steps to improve generalisation. A key component of the proposed approach is an interactive interface to allow human operators to iteratively review, validate, and refine the mapping results. Preliminary evaluations indicate that the multi-tiered strategy provides a scalable and cost-effective solution for educational infrastructure mapping to support planning and resource allocation.

Bi-martingale optimal transport and its applications

Authors:Karol Bołbotowski
Date:2025-10-31 13:00:38

We introduce a new non-linear optimal transport formulation for a pair of probability measures on $\mathbb{R}^d$ sharing a common barycentre, in which admissible transference plans satisfy two martingale-type constraints. This bi-martingale framework underlies and interconnects several variational problems on the space of probability measures. For the quadratic cost, it provides an optimal transport interpretation of the second Zolotarev distance on $\mathrm{P}_2(\mathbb{R}^d)$. For a broader class of convex costs, it leads to optimization problems under convex order constraints, encompassing in particular the Zolotarev projection onto the cone of dominating probability measures. As a main application, we construct a $\Gamma$-convergent bi-martingale approximation of the classical martingale optimal transport problem. This scheme robustly accommodates deviations from convex order between the marginal distributions and overcomes the well-known instability of MOT with respect to variations of the marginals in higher dimensions.

MVeLMA: Multimodal Vegetation Loss Modeling Architecture for Predicting Post-fire Vegetation Loss

Authors:Meenu Ravi, Shailik Sarkar, Yanshen Sun, Vaishnavi Singh, Chang-Tien Lu
Date:2025-10-31 12:49:33

Understanding post-wildfire vegetation loss is critical for developing effective ecological recovery strategies and is often challenging due to the extended time and effort required to capture the evolving ecosystem features. Recent works in this area have not fully explored all the contributing factors, their modalities, and interactions with each other. Furthermore, most research in this domain is limited by a lack of interpretability in predictive modeling, making it less useful in real-world settings. In this work, we propose a novel end-to-end ML pipeline called MVeLMA (\textbf{M}ultimodal \textbf{Ve}getation \textbf{L}oss \textbf{M}odeling \textbf{A}rchitecture) to predict county-wise vegetation loss from fire events. MVeLMA uses a multimodal feature integration pipeline and a stacked ensemble-based architecture to capture different modalities while also incorporating uncertainty estimation through probabilistic modeling. Through comprehensive experiments, we show that our model outperforms several state-of-the-art (SOTA) and baseline models in predicting post-wildfire vegetation loss. Furthermore, we generate vegetation loss confidence maps to identify high-risk counties, thereby helping targeted recovery efforts. The findings of this work have the potential to inform future disaster relief planning, ecological policy development, and wildlife recovery management.

Assessing the metal and rare earth element mining potential of undifferentiated asteroids through the study of carbonaceous chondrites

Authors:Josep M. Trigo-Rodríguez, Pau Grèbol-Tomàs, Jordi Ibáñez-Insa, Jacinto Alonso-Azcárate, Maria Gritsevich
Date:2025-10-31 11:04:06

Undifferentiated asteroids, particularly the parent bodies of carbon-rich chondrite groups, might be promising candidates for future space resource utilization due to their primitive composition and potential to host valuable metals and rare earth elements. However, our understanding of their bulk elemental composition remains limited, as most data are derived from reflectance spectra with low mineralogical resolution. Sample return missions have started to change that, as returned materials are already available to study. Still the available meteorites provide a valuable source of information about the diversity of undifferentiated asteroids in the interplanetary space. To improve compositional insights, we conducted Inductively Coupled Plasma Mass Spectrometry (ICP-MS) and ICP-AES (Inductively coupled Plasma Atomic Emission Spectroscopy) analyses on a representative suite of carbonaceous chondrites. These meteorites, considered analogs of undifferentiated asteroids, preserve materials from the early solar system and provide a geochemical record of their parent bodies. Our results highlight the abundance and distribution of transition metals, siderophile elements, and rare earth elements across several chondrite groups. These findings support the view that C-type asteroids may serve as viable sources of critical materials, while also informing future mission planning, extraction strategies, and the development of new technologies for low-gravity resource operations.

CASR-Net: An Image Processing-focused Deep Learning-based Coronary Artery Segmentation and Refinement Network for X-ray Coronary Angiogram

Authors:Alvee Hassan, Rusab Sarmun, Muhammad E. H. Chowdhury, M. Murugappan, Md. Sakib Abrar Hossain, Sakib Mahmud, Abdulrahman Alqahtani, Sohaib Bassam Zoghoul, Amith Khandakar, Susu M. Zughaier, Somaya Al-Maadeed, Anwarul Hasan
Date:2025-10-31 09:40:29

Early detection of coronary artery disease (CAD) is critical for reducing mortality and improving patient treatment planning. While angiographic image analysis from X-rays is a common and cost-effective method for identifying cardiac abnormalities, including stenotic coronary arteries, poor image quality can significantly impede clinical diagnosis. We present the Coronary Artery Segmentation and Refinement Network (CASR-Net), a three-stage pipeline comprising image preprocessing, segmentation, and refinement. A novel multichannel preprocessing strategy combining CLAHE and an improved Ben Graham method provides incremental gains, increasing Dice Score Coefficient (DSC) by 0.31-0.89% and Intersection over Union (IoU) by 0.40-1.16% compared with using the techniques individually. The core innovation is a segmentation network built on a UNet with a DenseNet121 encoder and a Self-organized Operational Neural Network (Self-ONN) based decoder, which preserves the continuity of narrow and stenotic vessel branches. A final contour refinement module further suppresses false positives. Evaluated with 5-fold cross-validation on a combination of two public datasets that contain both healthy and stenotic arteries, CASR-Net outperformed several state-of-the-art models, achieving an IoU of 61.43%, a DSC of 76.10%, and clDice of 79.36%. These results highlight a robust approach to automated coronary artery segmentation, offering a valuable tool to support clinicians in diagnosis and treatment planning.

Mask-to-Height: A YOLOv11-Based Architecture for Joint Building Instance Segmentation and Height Classification from Satellite Imagery

Authors:Mahmoud El Hussieni, Bahadır K. Güntürk, Hasan F. Ateş, Oğuz Hanoğlu
Date:2025-10-31 06:37:08

Accurate building instance segmentation and height classification are critical for urban planning, 3D city modeling, and infrastructure monitoring. This paper presents a detailed analysis of YOLOv11, the recent advancement in the YOLO series of deep learning models, focusing on its application to joint building extraction and discrete height classification from satellite imagery. YOLOv11 builds on the strengths of earlier YOLO models by introducing a more efficient architecture that better combines features at different scales, improves object localization accuracy, and enhances performance in complex urban scenes. Using the DFC2023 Track 2 dataset -- which includes over 125,000 annotated buildings across 12 cities -- we evaluate YOLOv11's performance using metrics such as precision, recall, F1 score, and mean average precision (mAP). Our findings demonstrate that YOLOv11 achieves strong instance segmentation performance with 60.4\% mAP@50 and 38.3\% mAP@50--95 while maintaining robust classification accuracy across five predefined height tiers. The model excels in handling occlusions, complex building shapes, and class imbalance, particularly for rare high-rise structures. Comparative analysis confirms that YOLOv11 outperforms earlier multitask frameworks in both detection accuracy and inference speed, making it well-suited for real-time, large-scale urban mapping. This research highlights YOLOv11's potential to advance semantic urban reconstruction through streamlined categorical height modeling, offering actionable insights for future developments in remote sensing and geospatial intelligence.

Vectorized Online POMDP Planning

Authors:Marcus Hoerger, Muhammad Sudrajat, Hanna Kurniawati
Date:2025-10-31 05:21:39

Planning under partial observability is an essential capability of autonomous robots. The Partially Observable Markov Decision Process (POMDP) provides a powerful framework for planning under partial observability problems, capturing the stochastic effects of actions and the limited information available through noisy observations. POMDP solving could benefit tremendously from massive parallelization of today's hardware, but parallelizing POMDP solvers has been challenging. They rely on interleaving numerical optimization over actions with the estimation of their values, which creates dependencies and synchronization bottlenecks between parallel processes that can quickly offset the benefits of parallelization. In this paper, we propose Vectorized Online POMDP Planner (VOPP), a novel parallel online solver that leverages a recent POMDP formulation that analytically solves part of the optimization component, leaving only the estimation of expectations for numerical computation. VOPP represents all data structures related to planning as a collection of tensors and implements all planning steps as fully vectorized computations over this representation. The result is a massively parallel solver with no dependencies and synchronization bottlenecks between parallel computations. Experimental results indicate that VOPP is at least 20X more efficient in computing near-optimal solutions compared to an existing state-of-the-art parallel online solver.

A Hierarchical Deep Learning Model for Predicting Pedestrian-Level Urban Winds

Authors:Reda Snaiki, Jiachen Lu, Shaopeng Li, Negin Nazarian
Date:2025-10-31 01:55:39

Deep learning-based surrogate models offer a computationally efficient alternative to high-fidelity computational fluid dynamics (CFD) simulations for predicting urban wind flow. However, conventional approaches usually only yield low-frequency predictions (essentially averaging values from proximate pixels), missing critical high-frequency details such as sharp gradients and peak wind speeds. This study proposes a hierarchical approach for accurately predicting pedestrian-level urban winds, which adopts a two-stage predictor-refiner framework. In the first stage, a U-Net architecture generates a baseline prediction from urban geometry. In the second stage, a conditional Generative Adversarial Network (cGAN) refines this baseline by restoring the missing high-frequency content. The cGAN's generator incorporates a multi-scale architecture with stepwise kernel sizes, enabling simultaneous learning of global flow structures and fine-grained local features. Trained and validated on the UrbanTALES dataset with comprehensive urban configurations, the proposed hierarchical framework significantly outperforms the baseline predictor. With a marked qualitative improvement in resolving high-speed wind jets and complex turbulent wakes as well as wind statistics, the results yield quantitative enhancement in prediction accuracy (e.g., RMSE reduced by 76% for the training set and 60% for the validation set). This work presents an effective and robust methodology for enhancing the prediction fidelity of surrogate models in urban planning, pedestrian comfort assessment, and wind safety analysis. The proposed model will be integrated into an interactive web platform as Feilian Version 2.

A Cloud-Based Spatio-Temporal GNN-Transformer Hybrid Model for Traffic Flow Forecasting with External Feature Integration

Authors:Zhuo Zheng, Lingran Meng, Ziyu Lin
Date:2025-10-30 22:57:58

Accurate traffic flow forecasting is essential for the development of intelligent transportation systems (ITS), supporting tasks such as traffic signal optimization, congestion management, and route planning. Traditional models often fail to effectively capture complex spatial-temporal dependencies in large-scale road networks, especially under the influence of external factors such as weather, holidays, and traffic accidents. To address this challenge, this paper proposes a cloud-based hybrid model that integrates Spatio-Temporal Graph Neural Networks (ST-GNN) with a Transformer architecture for traffic flow prediction. The model leverages the strengths of GNNs in modeling spatial correlations across road networks and the Transformers' ability to capture long-term temporal dependencies. External contextual features are incorporated via feature fusion to enhance predictive accuracy. The proposed model is deployed on a cloud computing platform to achieve scalability and real-time adaptability. Experimental evaluation of the dataset shows that our model outperforms baseline methods (LSTM, TCN, GCN, pure Transformer) with an RMSE of only 17.92 and a MAE of only 10.53. These findings suggest that the hybrid GNN-Transformer approach provides an effective and scalable solution for cloud-based ITS applications, offering methodological advancements for traffic flow forecasting and practical implications for congestion mitigation.

Gradient Descent as Loss Landscape Navigation: a Normative Framework for Deriving Learning Rules

Authors:John J. Vastola, Samuel J. Gershman, Kanaka Rajan
Date:2025-10-30 20:56:35

Learning rules -- prescriptions for updating model parameters to improve performance -- are typically assumed rather than derived. Why do some learning rules work better than others, and under what assumptions can a given rule be considered optimal? We propose a theoretical framework that casts learning rules as policies for navigating (partially observable) loss landscapes, and identifies optimal rules as solutions to an associated optimal control problem. A range of well-known rules emerge naturally within this framework under different assumptions: gradient descent from short-horizon optimization, momentum from longer-horizon planning, natural gradients from accounting for parameter space geometry, non-gradient rules from partial controllability, and adaptive optimizers like Adam from online Bayesian inference of loss landscape shape. We further show that continual learning strategies like weight resetting can be understood as optimal responses to task uncertainty. By unifying these phenomena under a single objective, our framework clarifies the computational structure of learning and offers a principled foundation for designing adaptive algorithms.

RepV: Safety-Separable Latent Spaces for Scalable Neurosymbolic Plan Verification

Authors:Yunhao Yang, Neel P. Bhatt, Pranay Samineni, Rohan Siva, Zhanyang Wang, Ufuk Topcu
Date:2025-10-30 18:46:34

As AI systems migrate to safety-critical domains, verifying that their actions comply with well-defined rules remains a challenge. Formal methods provide provable guarantees but demand hand-crafted temporal-logic specifications, offering limited expressiveness and accessibility. Deep learning approaches enable evaluation of plans against natural-language constraints, yet their opaque decision process invites misclassifications with potentially severe consequences. We introduce RepV, a neurosymbolic verifier that unifies both views by learning a latent space where safe and unsafe plans are linearly separable. Starting from a modest seed set of plans labeled by an off-the-shelf model checker, RepV trains a lightweight projector that embeds each plan, together with a language model-generated rationale, into a low-dimensional space; a frozen linear boundary then verifies compliance for unseen natural-language rules in a single forward pass. Beyond binary classification, RepV provides a probabilistic guarantee on the likelihood of correct verification based on its position in the latent space. This guarantee enables a guarantee-driven refinement of the planner, improving rule compliance without human annotations. Empirical evaluations show that RepV improves compliance prediction accuracy by up to 15% compared to baseline methods while adding fewer than 0.2M parameters. Furthermore, our refinement framework outperforms ordinary fine-tuning baselines across various planning domains. These results show that safety-separable latent spaces offer a scalable, plug-and-play primitive for reliable neurosymbolic plan verification. Code and data are available at: https://repv-project.github.io/.

The Denario project: Deep knowledge AI agents for scientific discovery

Authors:Francisco Villaescusa-Navarro, Boris Bolliet, Pablo Villanueva-Domingo, Adrian E. Bayer, Aidan Acquah, Chetana Amancharla, Almog Barzilay-Siegal, Pablo Bermejo, Camille Bilodeau, Pablo Cárdenas Ramírez, Miles Cranmer, Urbano L. França, ChangHoon Hahn, Yan-Fei Jiang, Raul Jimenez, Jun-Young Lee, Antonio Lerario, Osman Mamun, Thomas Meier, Anupam A. Ojha, Pavlos Protopapas, Shimanto Roy, David N. Spergel, Pedro Tarancón-Álvarez, Ujjwal Tiwari, Matteo Viel, Digvijay Wadekar, Chi Wang, Bonny Y. Wang, Licong Xu, Yossi Yovel, Shuwen Yue, Wen-Han Zhou, Qiyao Zhu, Jiajun Zou, Íñigo Zubeldia
Date:2025-10-30 18:00:12

We present Denario, an AI multi-agent system designed to serve as a scientific research assistant. Denario can perform many different tasks, such as generating ideas, checking the literature, developing research plans, writing and executing code, making plots, and drafting and reviewing a scientific paper. The system has a modular architecture, allowing it to handle specific tasks, such as generating an idea, or carrying out end-to-end scientific analysis using Cmbagent as a deep-research backend. In this work, we describe in detail Denario and its modules, and illustrate its capabilities by presenting multiple AI-generated papers generated by it in many different scientific disciplines such as astrophysics, biology, biophysics, biomedical informatics, chemistry, material science, mathematical physics, medicine, neuroscience and planetary science. Denario also excels at combining ideas from different disciplines, and we illustrate this by showing a paper that applies methods from quantum physics and machine learning to astrophysical data. We report the evaluations performed on these papers by domain experts, who provided both numerical scores and review-like feedback. We then highlight the strengths, weaknesses, and limitations of the current system. Finally, we discuss the ethical implications of AI-driven research and reflect on how such technology relates to the philosophy of science. We publicly release the code at https://github.com/AstroPilot-AI/Denario. A Denario demo can also be run directly on the web at https://huggingface.co/spaces/astropilot-ai/Denario, and the full app will be deployed on the cloud.

Clone Deterministic 3D Worlds with Geometrically-Regularized World Models

Authors:Zaishuo Xia, Yukuan Lu, Xinyi Li, Yifan Xu, Yubei Chen
Date:2025-10-30 17:56:43

A world model is an internal model that simulates how the world evolves. Given past observations and actions, it predicts the future of both the embodied agent and its environment. Accurate world models are essential for enabling agents to think, plan, and reason effectively in complex, dynamic settings. Despite rapid progress, current world models remain brittle and degrade over long horizons. We argue that a central cause is representation quality: exteroceptive inputs (e.g., images) are high-dimensional, and lossy or entangled latents make dynamics learning unnecessarily hard. We therefore ask whether improving representation learning alone can substantially improve world-model performance. In this work, we take a step toward building a truly accurate world model by addressing a fundamental yet open problem: constructing a model that can fully clone and overfit to a deterministic 3D world. We propose Geometrically-Regularized World Models (GRWM), which enforces that consecutive points along a natural sensory trajectory remain close in latent representation space. This approach yields significantly improved latent representations that align closely with the true topology of the environment. GRWM is plug-and-play, requires only minimal architectural modification, scales with trajectory length, and is compatible with diverse latent generative backbones. Across deterministic 3D settings and long-horizon prediction tasks, GRWM significantly increases rollout fidelity and stability. Analyses show that its benefits stem from learning a latent manifold with superior geometric structure. These findings support a clear takeaway: improving representation learning is a direct and useful path to robust world models, delivering reliable long-horizon predictions without enlarging the dynamics module.

Pareto-Optimal Sampling and Resource Allocation for Timely Communication in Shared-Spectrum Low-Altitude Networks

Authors:Bowen Li, Jiping Luo, Themistoklis Charalambous, Nikolaos Pappas
Date:2025-10-30 17:09:09

Guaranteeing stringent data freshness for low-altitude unmanned aerial vehicles (UAVs) in shared spectrum forces a critical trade-off between two operational costs: the UAV's own energy consumption and the occupation of terrestrial channel resources. The core challenge is to satisfy the aerial data freshness while finding a Pareto-optimal balance between these costs. Leveraging predictive channel models and predictive UAV trajectories, we formulate a bi-objective Pareto optimization problem over a long-term planning horizon to jointly optimize the sampling timing for aerial traffic and the power and spectrum allocation for fair coexistence. However, the problem's non-convex, mixed-integer nature renders classical methods incapable of fully characterizing the complete Pareto frontier. Notably, we show monotonicity properties of the frontier, building on which we transform the bi-objective problem into several single-objective problems. We then propose a new graph-based algorithm and prove that it can find the complete set of Pareto optima with low complexity, linear in the horizon and near-quadratic in the resource block (RB) budget. Numerical comparisons show that our approach meets the stringent timeliness requirement and achieves a six-fold reduction in RB utilization or a 6 dB energy saving compared to benchmarks.

Hybrid DQN-TD3 Reinforcement Learning for Autonomous Navigation in Dynamic Environments

Authors:Xiaoyi He, Danggui Chen, Zhenshuo Zhang, Zimeng Bai
Date:2025-10-30 16:12:01

This paper presents a hierarchical path-planning and control framework that combines a high-level Deep Q-Network (DQN) for discrete sub-goal selection with a low-level Twin Delayed Deep Deterministic Policy Gradient (TD3) controller for continuous actuation. The high-level module selects behaviors and sub-goals; the low-level module executes smooth velocity commands. We design a practical reward shaping scheme (direction, distance, obstacle avoidance, action smoothness, collision penalty, time penalty, and progress), together with a LiDAR-based safety gate that prevents unsafe motions. The system is implemented in ROS + Gazebo (TurtleBot3) and evaluated with PathBench metrics, including success rate, collision rate, path efficiency, and re-planning efficiency, in dynamic and partially observable environments. Experiments show improved success rate and sample efficiency over single-algorithm baselines (DQN or TD3 alone) and rule-based planners, with better generalization to unseen obstacle configurations and reduced abrupt control changes. Code and evaluation scripts are available at the project repository.

Putting a Price on Immobility: Food Deliveries and Pricing Approaches

Authors:Runyu Wang, Haotian Zhong
Date:2025-10-30 16:05:21

Urban food delivery services have become an integral part of daily life, yet their mobility and environmental externalities remain poorly addressed by planners. Most studies neglect whether consumers pay enough to internalize the broader social costs of these services. This study quantifies the value of access to and use of food delivery services in Beijing, China, through two discrete choice experiments. The first measures willingness to accept compensation for giving up access, with a median value of CNY588 (approximately USD80). The second captures willingness to pay for reduced waiting time and improved reliability, showing valuations far exceeding typical delivery fees (e.g., CNY96.6/hour and CNY4.83/min at work). These results suggest a substantial consumer surplus and a clear underpricing problem. These findings highlight the need for urban planning to integrate digital service economies into pricing and mobility frameworks. We propose a quantity-based pricing model that targets delivery speed rather than order volume, addressing the primary source of externalities while maintaining net welfare gains. This approach offers a pragmatic, equity-conscious strategy to curb delivery-related congestion, emissions, and safety risks, especially in dense urban cores.

Fraction-variant VMAT planning for patients with complex gynecological and head-and-neck cancer

Authors:Nathan Torelli, Madalyne Day, Jan Unkelbach
Date:2025-10-30 14:53:19

Background and Purpose: Increasing the number of arcs in volumetric modulated arc therapy (VMAT) allows for better intensity modulation and may improve plan quality. However, this leads to longer delivery times, which may cause patient discomfort and increase intra-fractional motion. In this study, it was investigated whether the delivery of different VMAT plans in different fractions may improve the dosimetric quality and delivery efficiency for the treatment of patients with complex tumor geometries. Materials and Methods: A direct aperture optimization algorithm was developed which allows for the simultaneous optimization of different VMAT plans to be delivered in different fractions, based on their cumulative physical dose. Each VMAT plan is constrained to deliver a uniform dose within the target volume, such that the entire treatment does not alter the fractionation scheme and is robust against inter-fractional setup errors. This approach was evaluated in-silico for ten patients with gynecological and head-and-neck cancer. Results: For all patients, fraction-variant treatments achieved better target coverage and reduced the dose to critical organs-at-risk compared to fraction-invariant treatments that deliver the same plan in every fraction, where the dosimetric benefit was shown to increase with the number of different plans to be delivered. In addition, 1-arc and 2-arc fraction-variant treatments could approximate the dosimetric quality of 3-arc fraction-invariant treatments, while reducing the delivery time from 180 s to 60 s and 120 s, respectively. Conclusions: Fraction-variant VMAT treatments may achieve excellent dosimetric quality for patients with complex tumor geometries, while keeping the delivery time per fraction viable.

RoboOS-NeXT: A Unified Memory-based Framework for Lifelong, Scalable, and Robust Multi-Robot Collaboration

Authors:Huajie Tan, Cheng Chi, Xiansheng Chen, Yuheng Ji, Zhongxia Zhao, Xiaoshuai Hao, Yaoxu Lyu, Mingyu Cao, Junkai Zhao, Huaihai Lyu, Enshen Zhou, Ning Chen, Yankai Fu, Cheng Peng, Wei Guo, Dong Liang, Zhuo Chen, Mengsi Lyu, Chenrui He, Yulong Ao, Yonghua Lin, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang
Date:2025-10-30 14:26:40

The proliferation of collaborative robots across diverse tasks and embodiments presents a central challenge: achieving lifelong adaptability, scalable coordination, and robust scheduling in multi-agent systems. Existing approaches, from vision-language-action (VLA) models to hierarchical frameworks, fall short due to their reliance on limited or dividual-agent memory. This fundamentally constrains their ability to learn over long horizons, scale to heterogeneous teams, or recover from failures, highlighting the need for a unified memory representation. To address these limitations, we introduce RoboOS-NeXT, a unified memory-based framework for lifelong, scalable, and robust multi-robot collaboration. At the core of RoboOS-NeXT is the novel Spatio-Temporal-Embodiment Memory (STEM), which integrates spatial scene geometry, temporal event history, and embodiment profiles into a shared representation. This memory-centric design is integrated into a brain-cerebellum framework, where a high-level brain model performs global planning by retrieving and updating STEM, while low-level controllers execute actions locally. This closed loop between cognition, memory, and execution enables dynamic task allocation, fault-tolerant collaboration, and consistent state synchronization. We conduct extensive experiments spanning complex coordination tasks in restaurants, supermarkets, and households. Our results demonstrate that RoboOS-NeXT achieves superior performance across heterogeneous embodiments, validating its effectiveness in enabling lifelong, scalable, and robust multi-robot collaboration. Project website: https://flagopen.github.io/RoboOS/

Metacognition and Confidence Dynamics in Advice Taking from Generative AI

Authors:Clara Colombatto, Sean Rintel, Lev Tankelevitch
Date:2025-10-30 14:01:52

Generative Artificial Intelligence (GenAI) can aid humans in a wide range of tasks, but its effectiveness critically depends on users being able to evaluate the accuracy of GenAI outputs and their own expertise. Here we asked how confidence in self and GenAI contributes to decisions to seek and rely on advice from GenAI ('prospective confidence'), and how advice-taking in turn shapes this confidence ('retrospective confidence'). In a novel paradigm involving text generation, participants formulated plans for events, and could request advice from a GenAI (Study 1; N=200) or were randomly assigned to receive advice (Study 2; N=300), which they could rely on or ignore. Advice requests in Study 1 were related to higher prospective confidence in GenAI and lower confidence in self. Advice-seekers showed increased retrospective confidence in GenAI, while those who declined advice showed increased confidence in self. Random assignment in Study 2 revealed that advice exposure increases confidence in GenAI and in self, suggesting that GenAI advice-taking causally boosts retrospective confidence. These results were mirrored in advice reliance, operationalised as the textual similarity between GenAI advice and participants' responses, with reliance associated with increased retrospective confidence in both GenAI and self. Critically, participants who chose to obtain/rely on advice provided more detailed responses (likely due to the output's verbosity), but failed to check the output thoroughly, missing key information. These findings underscore a key role for confidence in interactions with GenAI, shaped by both prior beliefs about oneself and the reliability of AI, and context-dependent exposure to advice.

Co-Evolving Latent Action World Models

Authors:Yucen Wang, Fengming Zhang, De-Chuan Zhan, Li Zhao, Kaixin Wang, Jiang Bian
Date:2025-10-30 12:28:40

Adapting pre-trained video generation models into controllable world models via latent actions is a promising step towards creating generalist world models. The dominant paradigm adopts a two-stage approach that trains latent action model (LAM) and the world model separately, resulting in redundant training and limiting their potential for co-adaptation. A conceptually simple and appealing idea is to directly replace the forward dynamic model in LAM with a powerful world model and training them jointly, but it is non-trivial and prone to representational collapse. In this work, we propose CoLA-World, which for the first time successfully realizes this synergistic paradigm, resolving the core challenge in joint learning through a critical warm-up phase that effectively aligns the representations of the from-scratch LAM with the pre-trained world model. This unlocks a co-evolution cycle: the world model acts as a knowledgeable tutor, providing gradients to shape a high-quality LAM, while the LAM offers a more precise and adaptable control interface to the world model. Empirically, CoLA-World matches or outperforms prior two-stage methods in both video simulation quality and downstream visual planning, establishing a robust and efficient new paradigm for the field.

Beyond Imitation: Constraint-Aware Trajectory Generation with Flow Matching For End-to-End Autonomous Driving

Authors:Lin Liu, Guanyi Yu, Ziying Song, Junqiao Li, Caiyan Jia, Feiyang Jia, Peiliang Wu, Yandan Luo
Date:2025-10-30 09:24:34

Planning is a critical component of end-to-end autonomous driving. However, prevailing imitation learning methods often suffer from mode collapse, failing to produce diverse trajectory hypotheses. Meanwhile, existing generative approaches struggle to incorporate crucial safety and physical constraints directly into the generative process, necessitating an additional optimization stage to refine their outputs. To address these limitations, we propose CATG, a novel planning framework that leverages Constrained Flow Matching. Concretely, CATG explicitly models the flow matching process, which inherently mitigates mode collapse and allows for flexible guidance from various conditioning signals. Our primary contribution is the novel imposition of explicit constraints directly within the flow matching process, ensuring that the generated trajectories adhere to vital safety and kinematic rules. Secondly, CATG parameterizes driving aggressiveness as a control signal during generation, enabling precise manipulation of trajectory style. Notably, on the NavSim v2 challenge, CATG achieved 2nd place with an EPDMS score of 51.31 and was honored with the Innovation Award.

Simultaneous optimization of non-coplanar beam orientations and cumulative EQD2 distribution for high-dose reirradiation of locoregionally recurrent non-small cell lung cancer

Authors:Nathan Torelli, Jonas Willmann, Katja Daehler, Madalyne Day, Nicolaus Andratschke, Jan Unkelbach
Date:2025-10-30 08:56:29

Background and Purpose: Reirradiation for non-small cell lung cancer (NSCLC) is commonly delivered using coplanar techniques. In this study, we developed a beam orientation optimization algorithm for reirradiation planning to investigate whether the selection of favorable non-coplanar beam orientations may limit cumulative doses to critical organs-at-risk (OARs) and thus improve the therapeutic window. Materials and Methods: Fifteen cases of challenging high-dose reirradiation for locoregionally recurrent NSCLC were included in this in-silico study. For each patient, the dose distribution from the previous treatment was first mapped to the reirradiation planning CT using rigid dose registration, and subsequently converted to equivalent dose in 2 Gy fractions (EQD2). A 2-arc non-coplanar reirradiation plan, combining dynamic gantry and couch rotation, was then generated using an EQD2-based direct aperture optimization algorithm, which allows for the simultaneous optimization of the dynamic gantry-couch path and the cumulative EQD2 distribution. Non-coplanar reirradiation plans were benchmarked against 2-arc coplanar VMAT plans, which mimic state-of-the-art practice for reirradiation of NSCLC. Results: Non-coplanar reirradiation plans could reduce the maximum cumulative EQD2 to critical OARs such as bronchial tree, esophagus, thoracic wall and trachea by at least 5 Gy2 for 6 out of 15 patients compared to coplanar reirradiation plans. At the same time, target coverage and lung EQD2 metrics were comparable for both methods. Conclusions: The automated selection of favorable non-coplanar beam orientations may reduce the maximum cumulative EQD2 to critical OARs in challenging thoracic reirradiation cases. This allows to explore either better OAR sparing or dose-escalation in future clinical studies.

Adaptive Trajectory Refinement for Optimization-based Local Planning in Narrow Passages

Authors:Hahjin Lee, Young J. Kim
Date:2025-10-30 04:53:08

Trajectory planning for mobile robots in cluttered environments remains a major challenge due to narrow passages, where conventional methods often fail or generate suboptimal paths. To address this issue, we propose the adaptive trajectory refinement algorithm, which consists of two main stages. First, to ensure safety at the path-segment level, a segment-wise conservative collision test is applied, where risk-prone trajectory path segments are recursively subdivided until collision risks are eliminated. Second, to guarantee pose-level safety, pose correction based on penetration direction and line search is applied, ensuring that each pose in the trajectory is collision-free and maximally clear from obstacles. Simulation results demonstrate that the proposed method achieves up to 1.69x higher success rates and up to 3.79x faster planning times than state-of-the-art approaches. Furthermore, real-world experiments confirm that the robot can safely pass through narrow passages while maintaining rapid planning performance.

GUI Knowledge Bench: Revealing the Knowledge Gap Behind VLM Failures in GUI Tasks

Authors:Chenrui Shi, Zedong Yu, Zhi Gao, Ruining Feng, Enqi Liu, Yuwei Wu, Yunde Jia, Liuyu Xiang, Zhaofeng He, Qing Li
Date:2025-10-30 03:22:30

Large vision language models (VLMs) have advanced graphical user interface (GUI) task automation but still lag behind humans. We hypothesize this gap stems from missing core GUI knowledge, which existing training schemes (such as supervised fine tuning and reinforcement learning) alone cannot fully address. By analyzing common failure patterns in GUI task execution, we distill GUI knowledge into three dimensions: (1) interface perception, knowledge about recognizing widgets and system states; (2) interaction prediction, knowledge about reasoning action state transitions; and (3) instruction understanding, knowledge about planning, verifying, and assessing task completion progress. We further introduce GUI Knowledge Bench, a benchmark with multiple choice and yes/no questions across six platforms (Web, Android, MacOS, Windows, Linux, IOS) and 292 applications. Our evaluation shows that current VLMs identify widget functions but struggle with perceiving system states, predicting actions, and verifying task completion. Experiments on real world GUI tasks further validate the close link between GUI knowledge and task success. By providing a structured framework for assessing GUI knowledge, our work supports the selection of VLMs with greater potential prior to downstream training and provides insights for building more capable GUI agents.

FlexICL: A Flexible Visual In-context Learning Framework for Elbow and Wrist Ultrasound Segmentation

Authors:Yuyue Zhou, Jessica Knight, Shrimanti Ghosh, Banafshe Felfeliyan, Jacob L. Jaremko, Abhilash R. Hareendranathan
Date:2025-10-30 00:53:26

Elbow and wrist fractures are the most common fractures in pediatric populations. Automatic segmentation of musculoskeletal structures in ultrasound (US) can improve diagnostic accuracy and treatment planning. Fractures appear as cortical defects but require expert interpretation. Deep learning (DL) can provide real-time feedback and highlight key structures, helping lightly trained users perform exams more confidently. However, pixel-wise expert annotations for training remain time-consuming and costly. To address this challenge, we propose FlexICL, a novel and flexible in-context learning (ICL) framework for segmenting bony regions in US images. We apply it to an intra-video segmentation setting, where experts annotate only a small subset of frames, and the model segments unseen frames. We systematically investigate various image concatenation techniques and training strategies for visual ICL and introduce novel concatenation methods that significantly enhance model performance with limited labeled data. By integrating multiple augmentation strategies, FlexICL achieves robust segmentation performance across four wrist and elbow US datasets while requiring only 5% of the training images. It outperforms state-of-the-art visual ICL models like Painter, MAE-VQGAN, and conventional segmentation models like U-Net and TransUNet by 1-27% Dice coefficient on 1,252 US sweeps. These initial results highlight the potential of FlexICL as an efficient and scalable solution for US image segmentation well suited for medical imaging use cases where labeled data is scarce.

Budget Forecasting and Integrated Strategic Planning for Leaders

Authors:Matt Salehi
Date:2025-10-30 00:22:50

This study explored how advanced budgeting techniques and economic indicators influence funding levels and strategic alignment in California Community Colleges (CCCs). Despite widespread implementation of budgeting reforms, many CCCs continue to face challenges aligning financial planning with institutional missions, particularly in supporting diversity, equity, and inclusion (DEI) initiatives. The study used a quantitative correlational design, analyzing 30 years of publicly available economic data, including unemployment rates, GDP growth, and CPI, in relation to CCC funding trends. Results revealed a strong positive correlation between GDP growth and CCC funding levels, as well as between CPI and funding levels, underscoring the predictive value of macroeconomic indicators in budget planning. These findings emphasize the need for educational leaders to integrate economic forecasting into budget planning processes to safeguard institutional effectiveness and sustain programs serving underrepresented student populations.

Climate Adaptation-Aware Flood Prediction for Coastal Cities Using Deep Learning

Authors:Bilal Hassan, Areg Karapetyan, Aaron Chung Hin Chow, Samer Madanat
Date:2025-10-29 23:23:11

Climate change and sea-level rise (SLR) pose escalating threats to coastal cities, intensifying the need for efficient and accurate methods to predict potential flood hazards. Traditional physics-based hydrodynamic simulators, although precise, are computationally expensive and impractical for city-scale coastal planning applications. Deep Learning (DL) techniques offer promising alternatives, however, they are often constrained by challenges such as data scarcity and high-dimensional output requirements. Leveraging a recently proposed vision-based, low-resource DL framework, we develop a novel, lightweight Convolutional Neural Network (CNN)-based model designed to predict coastal flooding under variable SLR projections and shoreline adaptation scenarios. Furthermore, we demonstrate the ability of the model to generalize across diverse geographical contexts by utilizing datasets from two distinct regions: Abu Dhabi and San Francisco. Our findings demonstrate that the proposed model significantly outperforms state-of-the-art methods, reducing the mean absolute error (MAE) in predicted flood depth maps on average by nearly 20%. These results highlight the potential of our approach to serve as a scalable and practical tool for coastal flood management, empowering decision-makers to develop effective mitigation strategies in response to the growing impacts of climate change. Project Page: https://caspiannet.github.io/