Humans develop an understanding of intuitive physics through active interaction with the world. This approach is in stark contrast to current video models, such as Sora, which rely on passive observation and therefore struggle to grasp physical causality. This observation leads to our central hypothesis: authentic physical intuition in a world model must be grounded in extensive, causally rich interactions with the real world. To test this hypothesis, we present WoW, a 14-billion-parameter generative world model trained on 2 million robot interaction trajectories. Our findings reveal that the model's understanding of physics is a probabilistic distribution of plausible outcomes, leading to stochastic instabilities and physical hallucinations. Furthermore, we demonstrate that this emergent capability can be actively constrained toward physical realism by SOPHIA, where vision-language model agents evaluate the DiT-generated output and guide its refinement by iteratively evolving the language instructions. In addition, a co-trained Inverse Dynamics Model translates these refined plans into executable robotic actions, thus closing the imagination-to-action loop. We establish WoWBench, a new benchmark focused on physical consistency and causal reasoning in video, where WoW achieves state-of-the-art performance in both human and autonomous evaluation, demonstrating strong capabilities in physical causality, collision dynamics, and object permanence. Our work provides systematic evidence that large-scale, real-world interaction is a cornerstone for developing physical intuition in AI. Models, data, and benchmarks will be open-sourced.
In mixed-traffic environments, where autonomous vehicles (AVs) interact with diverse human-driven vehicles (HVs), unpredictable intentions and heterogeneous behaviors make safe and efficient lane change maneuvers highly challenging. Existing methods often oversimplify these interactions by assuming uniform patterns. We propose an intention-driven lane change framework that integrates driving-style recognition, cooperation-aware decision-making, and coordinated motion planning. A deep learning classifier trained on the NGSIM dataset identifies human driving styles in real time. A cooperation score with intrinsic and interactive components estimates surrounding drivers' intentions and quantifies their willingness to cooperate with the ego vehicle. Decision-making combines behavior cloning with inverse reinforcement learning (IRL) to determine whether a lane change should be initiated. For trajectory generation, model predictive control is integrated with IRL-based intention inference to produce collision-free and socially compliant maneuvers. Experiments show that the proposed model achieves 94.2\% accuracy and 94.3\% F1-score, outperforming rule-based and learning-based baselines by 4-15\% in lane change recognition. These results highlight the benefit of modeling inter-driver heterogeneity and demonstrate the potential of the framework to advance context-aware and human-like autonomous driving in complex traffic environments.
Mutual understanding of artificial agents' decisions is key to ensuring a trustworthy and successful human-robot interaction. Hence, robots are expected to make reasonable decisions and communicate them to humans when needed. In this article, the focus is on an approach to modeling and reasoning about the comparison of two competing plans, so that robots can later explain the divergent result. First, a novel ontological model is proposed to formalize and reason about the differences between competing plans, enabling the classification of the most appropriate one (e.g., the shortest, the safest, the closest to human preferences, etc.). This work also investigates the limitations of a baseline algorithm for ontology-based explanatory narration. To address these limitations, a novel algorithm is presented, leveraging divergent knowledge between plans and facilitating the construction of contrastive narratives. Through empirical evaluation, it is observed that the generated explanations outperform those of the baseline method.
Declining interest rates and economic stabilization have heightened the importance of accurate mortality rate forecasting, particularly in insurance and pension markets. Multi-step-ahead predictions are crucial for public health, demographic planning, and insurance risk assessments; however, they face challenges when data are limited. Hybrid systems that combine statistical and Machine Learning (ML) models offer a promising solution for handling both linear and nonlinear patterns. This study evaluated the impact of different multi-step forecasting approaches (Recursive, Direct, and Multi-Input Multi-Output) and ML models on the accuracy of hybrid systems. Results from 12 datasets and 21 models show that the selection of both the multi-step approach and the ML model is essential for improving performance, with the ARIMA-LSTM hybrid using a recursive approach outperforming other models in most cases.
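To make the recursive strategy concrete, the minimal Python sketch below rolls any fitted one-step model forward by feeding each prediction back into the input window; the `one_step_model` interface and `window` length are illustrative assumptions, not the study's ARIMA-LSTM implementation. The Direct strategy would instead train a separate model per horizon step, and MIMO would predict the whole horizon jointly.

```python
import numpy as np

def recursive_forecast(one_step_model, history, horizon, window):
    """Recursive multi-step forecasting: each prediction is fed back as input.

    `one_step_model` is any fitted model exposing a scikit-learn-style
    .predict(X) that maps a window of past values to the next value
    (an assumed interface, not the paper's hybrid model itself).
    """
    history = list(history)
    preds = []
    for _ in range(horizon):
        x = np.array(history[-window:]).reshape(1, -1)
        y_hat = float(one_step_model.predict(x)[0])
        preds.append(y_hat)
        history.append(y_hat)  # the forecast becomes part of the next input window
    return np.array(preds)
```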
The Heliophysics Big Year was an extended year during which major solar events engaged the public. It included two eclipses (annular on October 14, 2023 and total on April 8, 2024), plus solar maximum and the Parker Solar Probe perihelion on December 24, 2024. After the eclipse of 2017, many millions more Americans planned to view the solar corona. We expanded our eclipse website with activities, citizen science projects, resources, training videos, equipment, and external links. We were the Southwest Regional Coordinator for the Citizen CATE 2024 project, training the state coordinators and their teams with the equipment and procedures. We trained teachers at local, regional, national, and international workshops, providing eclipse viewing cards, lenses to make solar cup projectors, a safe viewing screen pattern, and access to the training materials. We made presentations to the media and hosted public events to demonstrate safe eclipse viewing techniques. HMNS hosted live viewing for the annular and total eclipses plus solstice and equinox events, reaching tens of thousands of people. HMNS also secured a grant to provide 100 eclipse viewing cards for every public school (8,800+) in Texas. We distributed another 57,000 eclipse viewers to teachers and the public. We appeared in media both in advance of the eclipses and as live commentators. The most lasting and impactful product was our planetarium show Totality, which was given away free and shown in various formats (flatscreen, fisheye, or prewarped). Over 180,000 views of the show and its animations have been documented. We continued to improve our space weather forecasting site, which correctly predicted the major solar storms of May 10-11 and October 8-10, 2024. In total, we reached nearly two million learners.
Recent advances in image acquisition and scene reconstruction have enabled the generation of high-quality structural urban scene geometry, given sufficient site information. However, current capture techniques often overlook the crucial importance of texture quality, resulting in noticeable visual artifacts in the textured models. In this work, we introduce the urban geometry and texture co-capture problem under limited prior knowledge before a site visit. The only inputs are a 2D building contour map of the target area and a safe flying altitude above the buildings. We propose an innovative aerial path planning framework designed to co-capture images for reconstructing both structured geometry and high-fidelity textures. To evaluate and guide view planning, we introduce a comprehensive texture quality assessment system, including two novel metrics tailored for building facades. First, our method generates high-quality vertical dipping views and horizontal planar views to effectively capture both geometric and textural details. A multi-objective optimization strategy is then proposed to jointly maximize texture fidelity, improve geometric accuracy, and minimize the cost associated with aerial views. Furthermore, we present a sequential path planning algorithm that accounts for texture consistency during image capture. Extensive experiments on large-scale synthetic and real-world urban datasets demonstrate that our approach effectively produces image sets suitable for concurrent geometric and texture reconstruction, enabling the creation of realistic, textured scene proxies at low operational cost.
Purpose: Accurate prediction of beam delivery time (BDT) is essential for operational efficiency, 4D dose calculations, and advanced proton therapy techniques. Despite its importance, no machine-specific BDT model exists for Mevion systems. Methods: We developed the first machine learning-based BDT model for the Mevion S250i Hyperscan system. Institutional machine log files from 11 patients (1120 files) were used to extract features including spot position, energy layer changes, Adaptive Aperture (AA) movements, and spot charge. Inter-pulse time ($\Delta$T) was the target variable. A Random Forest model was trained with cross-validation and tested on held-out data. SHAP (Shapley Additive Explanations) analysis was used to quantify feature contributions. Results: The model achieved mean absolute errors (MAE) ranging from 0.9 ms for short intervals (<50 ms) to 222 ms for long delays (>1000 ms). AA movements were the dominant global predictor for $\Delta$T > 50 ms, while spot positions and pulse charge influenced short intervals. Energy changes had minor global impact but locally contributed to large $\Delta$T values, consistent with range modulator physics. The model was tested in two clinical applications: volumetric repainting and 4D dose recalculation for interplay evaluation. Predicted cumulative delivery times deviated by only -1.7% from machine log data, and dosimetric metrics (D98, D95, V95) remained within intrinsic delivery variability. Conclusions: This study presents the first machine-specific BDT model for the Mevion S250i, accurately capturing temporal delivery dynamics with strong predictive performance. SHAP analysis provided insight into system behavior, highlighting the roles of AA adjustments, energy switching, and spot positioning. The model supports applications in interplay assessment, 4D dose calculation, and delivery time-based plan optimization.
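A hedged sketch of such a pipeline is shown below; the file name and feature columns are hypothetical placeholders distilled from the described log-file features, not the study's actual data schema or code.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score, train_test_split

# Hypothetical feature table extracted from machine log files (illustrative columns).
df = pd.read_csv("mevion_log_features.csv")
features = ["spot_x", "spot_y", "energy_change", "aa_leaf_travel", "spot_charge"]
X, y = df[features], df["delta_t_ms"]  # inter-pulse time as the target variable

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=300, random_state=0)
print(cross_val_score(model, X_train, y_train,
                      scoring="neg_mean_absolute_error", cv=5).mean())
model.fit(X_train, y_train)

# SHAP attributes each prediction to the input features; the mean absolute value
# per column gives a global importance ranking.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
print(np.abs(shap_values).mean(axis=0))
```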
The effectiveness of personalized oncology treatments ultimately depends on whether outcomes can be causally attributed to the treatment. Advances in precision oncology have improved molecular profiling of individuals, and tailored therapies have led to more effective treatments for select patient groups. However, treatment responses still vary among individuals. As cancer is a heterogeneous and dynamic disease with varying treatment outcomes across different molecular types and resistance mechanisms, it requires customized approaches to identify cause-and-effect relationships. N-of-1 trials, or single-subject clinical trials, are designed to evaluate individual treatment effects. Several works have described different causal frameworks to identify treatment effects in N-of-1 trials, yet whether these approaches can be extended to single-cancer patient settings remains unclear. To explore this possibility, a longitudinal dataset from a single metastatic cancer patient with adaptively chosen treatments was considered. The dataset consisted of a detailed treatment plan as well as biomarker and lesion measurements recorded over time. After data processing, a treatment period with sufficient data points to conduct causal inference was selected. Under this setting, a causal framework was applied to define an estimand, identify causal relationships and assumptions, and calculate an individual-specific treatment effect using a time-varying g-formula. Through this application, we illustrate explicitly when and how causal treatment effects can be estimated in single-patient oncology settings. Our findings not only demonstrate the feasibility of applying causal methods in a single-cancer patient setting but also offer a blueprint for using causal methods across a broader spectrum of cancer types in individualized settings.
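For reference, one standard form of the time-varying g-formula for a fixed treatment history $\bar a = (a_0, \dots, a_T)$ with time-varying covariate history $\bar L$ is
\[
E\left[Y^{\bar a}\right] \;=\; \sum_{\bar l} E\left[Y \mid \bar A = \bar a,\, \bar L = \bar l\right] \prod_{t=0}^{T} P\left(L_t = l_t \mid \bar A_{t-1} = \bar a_{t-1},\, \bar L_{t-1} = \bar l_{t-1}\right),
\]
which identifies the counterfactual mean outcome under the usual assumptions of consistency, sequential exchangeability, and positivity.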
Multi-objective search (MOS) has become essential in robotics, as real-world robotic systems need to simultaneously balance multiple, often conflicting objectives. Recent works explore complex interactions between objectives, leading to problem formulations that do not allow the usage of out-of-the-box state-of-the-art MOS algorithms. In this paper, we suggest a generalized problem formulation that optimizes solution objectives via aggregation functions of hidden (search) objectives. We show that our formulation supports the application of standard MOS algorithms, requiring only that several core operations be properly extended to reflect the specific aggregation functions employed. We demonstrate our approach in several diverse robotics planning problems, spanning motion planning for navigation, manipulation, and planning for medical systems under obstacle uncertainty, as well as inspection planning and route planning with different road types. We solve the problems using state-of-the-art MOS algorithms after properly extending their core operations, and provide empirical evidence that they outperform by orders of magnitude the vanilla versions of the algorithms applied to the same problems but without objective aggregation.
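As a minimal illustration of the idea (not the paper's formal definitions), the sketch below maps hidden search objectives to solution objectives through per-objective aggregation functions and checks Pareto dominance on the aggregated values.

```python
from typing import Callable, Sequence

Aggregator = Callable[[Sequence[float]], float]

def aggregate(hidden_objectives: Sequence[Sequence[float]],
              aggs: Sequence[Aggregator]) -> list:
    """Map hidden (search) objectives to solution objectives, e.g. by sum or max."""
    return [agg(h) for agg, h in zip(aggs, hidden_objectives)]

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Standard Pareto dominance on (minimized) solution objectives."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# Example: path length aggregates per-edge costs by summation, while collision
# risk along the path aggregates per-edge risks by taking the maximum.
aggs = [sum, max]
sol_a = aggregate([[1.0, 2.0, 0.5], [0.1, 0.3]], aggs)  # -> [3.5, 0.3]
sol_b = aggregate([[1.5, 2.5, 0.5], [0.2, 0.3]], aggs)  # -> [4.5, 0.3]
print(dominates(sol_a, sol_b))  # True
```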
In-situ robotic exploration is an important tool for advancing knowledge of geological processes that describe the Earth and other planetary bodies. To inform and enhance operations for these roving laboratories, it is imperative to understand the terramechanical properties of their environments, especially for traversing on loose, deformable substrates. Recent research has suggested that legged robots with direct-drive and low-gear-ratio actuators can sensitively detect external forces, and therefore possess the potential to measure terrain properties with their legs during locomotion, providing unprecedented sampling speed and density while accessing terrains previously too risky to sample. This paper explores these ideas by investigating the impact of gait on proprioceptive terrain sensing accuracy, particularly comparing a sensing-oriented gait, Crawl N' Sense, with a locomotion-oriented gait, Trot-Walk. Each gait's ability to measure the strength and texture of deformable substrates is quantified as the robot locomotes over a laboratory transect consisting of a rigid surface, loose sand, and loose sand with synthetic surface crusts. Our results suggest that with both the sensing-oriented crawling gait and locomotion-oriented trot gait, the robot can measure a consistent difference in the strength (in terms of penetration resistance) between the low- and high-resistance substrates; however, the locomotion-oriented trot gait exhibits larger magnitude and variance in its measurements. Furthermore, the slower crawl gait can detect brittle ruptures of the surface crusts with significantly higher accuracy than the faster trot gait. Our results offer new insights that inform legged robot "sensing during locomotion" gait design and planning for scouting the terrain and producing scientific measurements on other worlds to advance our understanding of their geology and formation.
Healthcare robotics requires robust multimodal perception and reasoning to ensure safety in dynamic clinical environments. Current Vision-Language Models (VLMs) demonstrate strong general-purpose capabilities but remain limited in temporal reasoning, uncertainty estimation, and structured outputs needed for robotic planning. We present a lightweight agentic multimodal framework for video-based scene understanding. Combining the Qwen2.5-VL-3B-Instruct model with a SmolAgent-based orchestration layer, it supports chain-of-thought reasoning, speech-vision fusion, and dynamic tool invocation. The framework generates structured scene graphs and leverages a hybrid retrieval module for interpretable and adaptive reasoning. Evaluations on the Video-MME benchmark and a custom clinical dataset show competitive accuracy and improved robustness compared to state-of-the-art VLMs, demonstrating its potential for applications in robot-assisted surgery, patient monitoring, and decision support.
Constructing robots to accomplish long-horizon tasks is a long-standing challenge within artificial intelligence. Approaches using generative methods, particularly Diffusion Models, have gained attention due to their ability to model continuous robotic trajectories for planning and control. However, we show that these models struggle with long-horizon tasks that involve complex decision-making and, in general, are prone to confusing different modes of behavior, leading to failure. To remedy this, we propose to augment continuous trajectory generation by simultaneously generating a high-level symbolic plan. We show that this requires a novel mix of discrete variable diffusion and continuous diffusion, which dramatically outperforms the baselines. In addition, we illustrate how this hybrid diffusion process enables flexible trajectory synthesis, allowing us to condition synthesized actions on partial and complete symbolic conditions.
Learning-based planners are sensitive to the long-tailed distribution of driving data. Common maneuvers dominate datasets, while dangerous or rare scenarios are sparse. This imbalance can bias models toward the frequent cases and degrade performance on critical scenarios. To tackle this problem, we compare balancing strategies for sampling training data and find reweighting by trajectory pattern an effective approach. We then present FlowDrive, a flow-matching trajectory planner that learns a conditional rectified flow to map noise directly to trajectory distributions with few flow-matching steps. We further introduce moderated, in-the-loop guidance that injects small perturbations between flow steps to systematically increase trajectory diversity while remaining scene-consistent. On nuPlan and the interaction-focused interPlan benchmarks, FlowDrive achieves state-of-the-art results among learning-based planners and approaches the performance of methods with rule-based refinements. After adding moderated guidance and light post-processing (FlowDrive*), it achieves overall state-of-the-art performance across nearly all benchmark splits.
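The sketch below illustrates the general shape of few-step rectified-flow sampling with small inter-step perturbations; the `velocity_net` interface, waypoint dimensions, and noise scale are assumptions for illustration, not FlowDrive's implementation.

```python
import torch

def sample_trajectories(velocity_net, scene_ctx, n_steps=4, noise_scale=0.05):
    """Few-step rectified-flow sampling with moderated perturbations (illustrative).

    `velocity_net(x, t, scene_ctx)` is assumed to predict the velocity that
    transports noise (t = 0) toward the trajectory distribution (t = 1).
    """
    batch = scene_ctx.shape[0]
    x = torch.randn(batch, 16, 2)  # e.g. 16 future waypoints in (x, y)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((batch,), i * dt)
        x = x + dt * velocity_net(x, t, scene_ctx)     # Euler step along the learned flow
        if i < n_steps - 1:
            x = x + noise_scale * torch.randn_like(x)  # small perturbation for diversity
    return x
```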
Deep learning models in robotics often output point estimates with poorly calibrated confidences, offering no native mechanism to quantify predictive reliability under novel, noisy, or out-of-distribution inputs. Conformal prediction (CP) addresses this gap by providing distribution-free coverage guarantees, yet its reliance on fixed nonconformity scores ignores context and can yield intervals that are overly conservative or unsafe. We address this with Learnable Conformal Prediction (LCP), which replaces fixed scores with a lightweight neural function that leverages geometric, semantic, and task-specific features to produce context-aware uncertainty sets. LCP maintains CP's theoretical guarantees while reducing prediction set sizes by 18% in classification, tightening detection intervals by 52%, and improving path planning safety from 72% to 91% success with minimal overhead. Across three robotic tasks on seven benchmarks, LCP consistently outperforms Standard CP and ensemble baselines. In classification on CIFAR-100 and ImageNet, it achieves smaller set sizes (4.7-9.9% reduction) at target coverage. For object detection on COCO, BDD100K, and Cityscapes, it produces 46-54% tighter bounding boxes. In path planning through cluttered environments, it improves success to 91.5% with only 4.5% path inflation, compared to 12.2% for Standard CP. The method is lightweight (approximately 4.8% runtime overhead, 42 KB memory) and supports online adaptation, making it well suited to resource-constrained autonomous systems. Hardware evaluation shows LCP adds less than 1% memory and 15.9% inference overhead, yet sustains 39 FPS on detection tasks while being 7.4 times more energy-efficient than ensembles.
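The sketch below shows how a learned, context-dependent scaler can be plugged into standard split-conformal calibration; the learned function `g` is an assumed interface, and the code is illustrative rather than the paper's LCP implementation. Coverage is preserved provided `g` is trained on data disjoint from the calibration split, so exchangeability of calibration and test scores still holds.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected quantile used in split conformal prediction."""
    n = len(scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, level, method="higher")

def calibrate(g, feats_cal, y_cal, y_pred_cal, alpha=0.1):
    """Calibrate with a learned scaler g: residuals are normalized by predicted difficulty."""
    scores = np.abs(y_cal - y_pred_cal) / g(feats_cal)  # context-aware nonconformity
    return conformal_quantile(scores, alpha)

def predict_interval(g, feats, y_pred, q_hat):
    """Context-dependent interval: the half-width adapts to the features via g."""
    half_width = q_hat * g(feats)
    return y_pred - half_width, y_pred + half_width
```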
Multi-subject image generation aims to synthesize user-provided subjects in a single image while preserving subject fidelity, ensuring prompt consistency, and aligning with human aesthetic preferences. However, existing methods, particularly those built on the In-Context-Learning paradigm, are limited by their reliance on simple reconstruction-based objectives, leading to severe attribute leakage that compromises subject fidelity and to misalignment with nuanced human preferences. To address this, we propose MultiCrafter, a framework that ensures high-fidelity, preference-aligned generation. First, we find that the root cause of attribute leakage is a significant entanglement of attention between different subjects during the generation process. Therefore, we introduce explicit positional supervision to separate the attention regions of each subject, effectively mitigating attribute leakage. To enable the model to accurately plan the attention region of different subjects in diverse scenarios, we employ a Mixture-of-Experts architecture to enhance the model's capacity, allowing different experts to focus on different scenarios. Finally, we design a novel online reinforcement learning framework to align the model with human preferences, featuring a scoring mechanism to accurately assess multi-subject fidelity and a more stable training strategy tailored for the MoE architecture. Experiments validate that our framework significantly improves subject fidelity while aligning better with human preferences.
Diffusion-based generative methods have shown promising potential for modeling trajectories from offline reinforcement learning (RL) datasets, and hierarchical diffusion has been introduced to mitigate variance accumulation and computational challenges in long-horizon planning tasks. However, existing approaches typically assume a fixed two-layer diffusion hierarchy with a single predefined temporal scale, which limits adaptability to diverse downstream tasks and reduces flexibility in decision making. In this work, we propose SIHD, a novel Structural Information-based Hierarchical Diffusion framework for effective and stable offline policy learning in long-horizon environments with sparse rewards. Specifically, we analyze structural information embedded in offline trajectories to construct the diffusion hierarchy adaptively, enabling flexible trajectory modeling across multiple temporal scales. Rather than relying on reward predictions from localized sub-trajectories, we quantify the structural information gain of each state community and use it as a conditioning signal within the corresponding diffusion layer. To reduce overreliance on offline datasets, we introduce a structural entropy regularizer that encourages exploration of underrepresented states while avoiding extrapolation errors from distributional shifts. Extensive evaluations on challenging offline RL tasks show that SIHD significantly outperforms state-of-the-art baselines in decision-making performance and demonstrates superior generalization across diverse scenarios.
Cone-beam computed tomography (CBCT) is widely used for image-guided radiotherapy (IGRT). It provides real-time visualization at low cost and dose. However, photon scattering and beam hardening cause artifacts in CBCT, including inaccurate Hounsfield Units (HU), reducing reliability for dose calculation and adaptive planning. By contrast, computed tomography (CT) offers better image quality and accurate HU calibration but is usually acquired offline and fails to capture intra-treatment anatomical changes. Thus, accurate CBCT-to-CT synthesis is needed to close the imaging-quality gap in adaptive radiotherapy workflows. To address this, we propose a novel diffusion-based conditional generative model, coined EqDiff-CT, to synthesize high-quality CT images from CBCT. EqDiff-CT employs a denoising diffusion probabilistic model (DDPM) to iteratively inject noise and learn latent representations that enable reconstruction of anatomically consistent CT images. A group-equivariant conditional U-Net backbone, implemented with e2cnn steerable layers, enforces rotational equivariance (cyclic C4 symmetry), helping preserve fine structural details while minimizing noise and artifacts. The system was trained and validated on the SynthRAD2025 dataset, comprising CBCT-CT scans across multiple head-and-neck anatomical sites, and we compared it with advanced methods such as CycleGAN and DDPM. EqDiff-CT provided substantial gains in structural fidelity, HU accuracy and quantitative metrics. Visual findings further confirm improved detail recovery, sharper soft-tissue boundaries, and realistic bone reconstructions. The findings suggest that the diffusion model offers a robust and generalizable framework for CBCT enhancement. The proposed solution improves image quality as well as clinical confidence in CBCT-guided treatment planning and dose calculation.
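For reference, the standard conditional DDPM components underlying such a model are the forward noising distribution and the noise-prediction objective, with the CBCT image entering as the conditioning signal $c_{\mathrm{CBCT}}$:
\[
q(x_t \mid x_0) = \mathcal{N}\left(x_t;\ \sqrt{\bar\alpha_t}\, x_0,\ (1 - \bar\alpha_t) I\right), \qquad
\mathcal{L} = \mathbb{E}_{x_0,\, \epsilon,\, t}\left[\left\lVert \epsilon - \epsilon_\theta\!\left(x_t,\ t,\ c_{\mathrm{CBCT}}\right)\right\rVert^2\right],
\]
where $\bar\alpha_t$ is the cumulative product of the noise schedule and $\epsilon_\theta$ is the (here group-equivariant) conditional U-Net.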
Dynamic job shop scheduling, a fundamental combinatorial optimisation problem in various industrial sectors, poses substantial challenges for effective scheduling due to frequent disruptions caused by the arrival of new jobs. State-of-the-art methods employ machine learning to learn scheduling policies offline, enabling rapid responses to dynamic events. However, these offline policies are often imperfect, necessitating the use of planning techniques such as Monte Carlo Tree Search (MCTS) to improve performance at online decision time. The unpredictability of new job arrivals complicates online planning, as decisions based on incomplete problem information are vulnerable to disturbances. To address this issue, we propose the Dynamic Robust MCTS (DyRo-MCTS) approach, which integrates action robustness estimation into MCTS. DyRo-MCTS guides the production environment toward states that not only yield good scheduling outcomes but are also easily adaptable to future job arrivals. Extensive experiments show that DyRo-MCTS significantly improves the performance of offline-learned policies with negligible additional online planning time. Moreover, DyRo-MCTS consistently outperforms vanilla MCTS across various scheduling scenarios. Further analysis reveals that its ability to make robust scheduling decisions leads to long-term, sustainable performance gains under disturbances.
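One illustrative way to fold a robustness estimate into online planning (the paper's exact estimator is not reproduced here) is an additive bonus in the UCT selection rule:

```python
import math

def select_child(node, c_uct=1.4, lam=0.5):
    """UCT selection with an additive robustness bonus (illustrative only).

    Each child node is assumed to track: visits, total_value, and robustness,
    where robustness estimates how well the resulting schedule state can absorb
    future job arrivals.
    """
    def score(child):
        exploit = child.total_value / child.visits
        explore = c_uct * math.sqrt(math.log(node.visits) / child.visits)
        return exploit + explore + lam * child.robustness
    return max(node.children, key=score)
```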
Embodied action planning is a core challenge in robotics, requiring models to generate precise actions from visual observations and language instructions. While video generation world models are promising, their reliance on pixel-level reconstruction often introduces visual redundancies that hinder action decoding and generalization. Latent world models offer a compact, motion-aware representation, but overlook the fine-grained details critical for precise manipulation. To overcome these limitations, we propose MoWM, a mixture-of-world-model framework that fuses representations from hybrid world models for embodied action planning. Our approach uses motion-aware representations from a latent model as a high-level prior, which guides the extraction of fine-grained visual features from the pixel space model. This design allows MoWM to highlight the informative visual details needed for action decoding. Extensive evaluations on the CALVIN benchmark demonstrate that our method achieves state-of-the-art task success rates and superior generalization. We also provide a comprehensive analysis of the strengths of each feature space, offering valuable insights for future research in embodied planning. The code is available at: https://github.com/tsinghua-fib-lab/MoWM.
Developing and testing user interfaces (UIs) and training AI agents to interact with them are challenging due to the dynamic and diverse nature of real-world mobile environments. Existing methods often rely on cumbersome physical devices or limited static analysis of screenshots, which hinders scalable testing and the development of intelligent UI agents. We introduce UISim, a novel image-based UI simulator that offers a dynamic and interactive platform for exploring mobile phone environments purely from screen images. Our system employs a two-stage method: given an initial phone screen image and a user action, it first predicts the abstract layout of the next UI state, then synthesizes a new, visually consistent image based on this predicted layout. This approach enables the realistic simulation of UI transitions. UISim provides immediate practical benefits for UI testing, rapid prototyping, and synthetic data generation. Furthermore, its interactive capabilities pave the way for advanced applications, such as UI navigation task planning for AI agents. Our experimental results show that UISim outperforms end-to-end UI generation baselines in generating realistic and coherent subsequent UI states, highlighting its fidelity and potential to streamline UI development and enhance AI agent training.
Current music generators capture local textures but often fail to model long-range structure, leading to off-beat outputs, weak section transitions, and limited editing capability. We present MusicWeaver, a music generation model conditioned on a beat-aligned structural plan. This plan serves as an editable intermediate between the input prompt and the generated music, preserving global form and enabling professional, localized edits. MusicWeaver consists of a planner, which translates prompts into a structural plan encoding musical form and compositional cues, and a diffusion-based generator, which synthesizes music under the plan's guidance. To assess generation and editing quality, we introduce two metrics: the Structure Coherence Score (SCS) for evaluating long-range form and timing, and the Edit Fidelity Score (EFS) for measuring the accuracy of realizing plan edits. Experiments demonstrate that MusicWeaver achieves state-of-the-art fidelity and controllability, producing music closer to human-composed works. Music results can be found on our project page: https://musicweaver.github.io/.
Physiologically based pharmacokinetic (PBPK) models provide a mechanistic framework for simulating radiopharmaceutical kinetics and estimating patient-specific absorbed doses (ADs). PBPK models incorporate prior knowledge of patient physiology and drug-specific properties, which can enhance the models' predictive performance. PBPK models can ultimately be used to predict treatment response and thereby enable theranostic digital twins (TDTs) for personalized treatment planning in radiopharmaceutical therapies (RPTs). To achieve this potential of precision RPT, however, the reliability of the underlying modeling, including the PBPK-based dosimetry, must be established through rigorous verification, validation, and uncertainty quantification (VVUQ). This review outlines the role of VVUQ in ensuring the credibility and clinical applicability of PBPK models in radiotheranostics. Key methodologies for PBPK model VVUQ are discussed, including goodness-of-fit (GOF) assessment, prediction evaluation, and uncertainty propagation.
Mandible Osteoradionecrosis (ORN) is one of the most severe adverse events (AEs) for head and neck (H&N) cancer radiotherapy. Previous retrospective investigations on real-world data relied on conventional statistical models that primarily elucidate correlation rather than establishing causal relationships. Using novel causal machine learning, we aim to obtain empirical relative biological effectiveness (RBE) for ORN in H&N cancer patients treated with pencil-beam-scanning proton therapy (PBSPT). 335 patients treated by PBSPT and 931 patients treated by volumetric-modulated arc therapy (VMAT) were included. We used 1:1 case-matching to minimize the imbalance in clinical factors between PBSPT and VMAT. A bias test based on standardized mean differences (SMD) was applied to the case-matched patient cohorts. The causal machine learning method, causal forest (CF), was adopted to investigate the causal effects of dosimetric factors on the incidence of ORN. The dose volume constraints (DVCs) for VMAT and PBSPT were derived based on causal effects. RBE values were further empirically derived based on tolerance curves formed from DVCs. 335 VMAT patients were case-matched to 335 PBSPT patients; however, SMD analysis revealed persistent covariate imbalances within each group, indicating residual confounding influence. Using CF modeling, we identified DVCs of mandible ORN and found that PBSPT had lower critical volumes than those of VMAT, leading to empirical RBE exceeding 1.1 in the moderate dose range (1.61 at 40 Gy[RBE=1.1], 1.30 at 50 Gy, and 1.13 at 60 Gy). This study presents a novel application of causal machine learning to evaluate mandible ORN in radiotherapy. The results indicate that proton RBE may significantly exceed 1.1 in the moderate dose range, underscoring the importance of incorporating the variable RBE into PBSPT treatment planning to mitigate the risk of ORN.
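A hedged sketch of a causal-forest analysis in this spirit, using the econml package, is shown below; the file name, column names, and choice of treatment variable are hypothetical placeholders rather than the study's data or code.

```python
import pandas as pd
from econml.dml import CausalForestDML

# Hypothetical matched-cohort table; the treatment here is an example dosimetric
# factor and the covariates are illustrative clinical effect modifiers.
df = pd.read_csv("matched_cohort.csv")
Y = df["orn"]                                    # ORN incidence (outcome)
T = df["mandible_v40"]                           # example dosimetric factor (treatment)
X = df[["age", "smoking", "dental_extraction"]]  # effect modifiers

cf = CausalForestDML(n_estimators=500, random_state=0)
cf.fit(Y, T, X=X)
cate = cf.effect(X)  # heterogeneous effect of the dosimetric factor on ORN risk
print(cate.mean())
```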
Safety, trust and Artificial General Intelligence (AGI) are aspirational goals in artificial intelligence (AI) systems, and there are several informal interpretations of these notions. In this paper, we propose strict, mathematical definitions of safety, trust, and AGI, and demonstrate a fundamental incompatibility between them. We define safety of a system as the property that it never makes any false claims, trust as the assumption that the system is safe, and AGI as the property of an AI system always matching or exceeding human capability. Our core finding is that -- for our formal definitions of these notions -- a safe and trusted AI system cannot be an AGI system: for such a safe, trusted system there are task instances which are easily and provably solvable by a human but not by the system. We note that we consider strict mathematical definitions of safety and trust, and it is possible for real-world deployments to instead rely on alternate, practical interpretations of these notions. We show our results for program verification, planning, and graph reachability. Our proofs draw parallels to G\"odel's incompleteness theorems and Turing's proof of the undecidability of the halting problem, and can be regarded as interpretations of G\"odel's and Turing's results.
Multi-robot target tracking is a fundamental problem that requires coordinated monitoring of dynamic entities in applications such as precision agriculture, environmental monitoring, disaster response, and security surveillance. While Federated Learning (FL) has the potential to enhance learning across multiple robots without centralized data aggregation, its use in multi-Unmanned Aerial Vehicle (UAV) target tracking remains largely underexplored. Key challenges include limited onboard computational resources, significant data heterogeneity in FL due to varying targets and fields of view, and the need for tight coupling between trajectory prediction and multi-robot planning. In this paper, we introduce DroneFL, the first federated learning framework specifically designed for efficient multi-UAV target tracking. We design a lightweight local model to predict target trajectories from sensor inputs, using a frozen YOLO backbone and a shallow transformer for efficient onboard training. The updated models are periodically aggregated in the cloud for global knowledge sharing. To alleviate the data heterogeneity that hinders FL convergence, DroneFL introduces a position-invariant model architecture with altitude-based adaptive instance normalization. Finally, we fuse predictions from multiple UAVs in the cloud and generate optimal trajectories that balance target prediction accuracy and overall tracking performance. Our results show that DroneFL reduces prediction error by 6%-83% and tracking distance by 0.4%-4.6% compared to a distributed non-FL framework. In terms of efficiency, DroneFL runs in real time on a Raspberry Pi 5 and requires an average data rate of just 1.56 KBps to the cloud.
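The layer below sketches the idea of altitude-conditioned adaptive instance normalization in PyTorch; its exact parameterization is an assumption for illustration, not DroneFL's architecture.

```python
import torch
import torch.nn as nn

class AltitudeAdaIN(nn.Module):
    """Altitude-conditioned adaptive instance normalization (illustrative sketch)."""

    def __init__(self, num_features):
        super().__init__()
        self.norm = nn.InstanceNorm1d(num_features, affine=False)
        self.to_scale_shift = nn.Linear(1, 2 * num_features)  # altitude -> (gamma, beta)

    def forward(self, x, altitude):
        # x: (batch, channels, time) feature sequence; altitude: (batch, 1) scalar per UAV
        gamma, beta = self.to_scale_shift(altitude).chunk(2, dim=-1)
        x = self.norm(x)
        return x * (1 + gamma.unsqueeze(-1)) + beta.unsqueeze(-1)
```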
Commuting contributes substantially to urban greenhouse gas emissions and represents a critical focus for climate mitigation efforts. This paper explores the multifaceted nature of commuting-related carbon dioxide emissions by analyzing the influence of urban form, socio-economic attributes, and individual behaviors. It reviews analytical approaches including structural equation modeling, multi-objective optimization, and agent-based simulations that have been employed to understand and mitigate emissions. Building on these insights, the paper develops a conceptual framework for carbon-negative commuting that integrates spatial planning, behavioral interventions, technological innovations, and carbon offsetting strategies. Case studies from diverse global contexts illustrate both the feasibility and challenges of implementing these interventions. The discussion highlights key trade-offs, equity considerations, and governance barriers while identifying co-benefits such as improved public health and urban resilience. The paper concludes by emphasizing the need for interdisciplinary research and adaptive policymaking to operationalize carbon-negative commuting and align urban mobility systems with global decarbonization goals.
We propose $\text{GMP}^{3}$, a multiphase global path planning framework that generates dynamically feasible three-dimensional trajectories for unmanned aerial vehicles (UAVs) operating in cluttered environments. The framework extends traditional path planning from Euclidean position spaces to the Lie group $\mathrm{SE}(3)$, allowing joint learning of translational motion and rotational dynamics. A modified Bellman-based operator is introduced to support reinforcement learning (RL) policy updates while leveraging prior trajectory information for improved convergence. $\text{GMP}^{3}$ is designed as a distributed framework in which agents influence each other and share policy information along the trajectory: each agent refines its assigned segment and shares it with its neighbors via a consensus-based scheme, enabling cooperative policy updates and convergence toward a globally shaped path even under kinematic constraints. We also propose DroneManager, a modular ground control software that interfaces the planner with real UAV platforms via the MAVLink protocol, supporting real-time deployment and feedback. Simulation studies and indoor flight experiments validate the effectiveness of the proposed method in constrained 3D environments, demonstrating reliable obstacle avoidance and smooth, feasible trajectories across both position and orientation. The open-source implementation is available at https://github.com/Domattee/DroneManager
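As an illustration of consensus-based sharing (not $\text{GMP}^{3}$'s exact update rule), each agent can nudge its policy parameters toward the average of its neighbors' parameters:

```python
import numpy as np

def consensus_update(theta, neighbor_thetas, eps=0.2):
    """One consensus step: move this agent's policy parameters toward its neighbors'.

    theta: this agent's parameter vector; neighbor_thetas: list of neighbors' vectors;
    eps: step size controlling how strongly neighbors influence the local policy.
    """
    if not neighbor_thetas:
        return theta
    return theta + eps * np.mean([tj - theta for tj in neighbor_thetas], axis=0)
```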
Air pollution, a pressing global problem, threatens public health, environmental sustainability, and climate stability. Achieving accurate and scalable forecasting across spatially distributed monitoring stations is challenging due to intricate multi-pollutant interactions, evolving meteorological conditions, and region-specific spatial heterogeneity. To address this challenge, we propose AirPCM, a novel deep spatiotemporal forecasting model that integrates multi-region, multi-pollutant dynamics with explicit meteorology-pollutant causality modeling. Unlike existing methods limited to single pollutants or localized regions, AirPCM employs a unified architecture to jointly capture cross-station spatial correlations, temporal auto-correlations, and meteorology-pollutant dynamic causality. This empowers fine-grained, interpretable multi-pollutant forecasting across varying geographic and temporal scales, including sudden pollution episodes. Extensive evaluations on multi-scale real-world datasets demonstrate that AirPCM consistently surpasses state-of-the-art baselines in both predictive accuracy and generalization capability. Moreover, the long-term forecasting capability of AirPCM provides actionable insights into future air quality trends and potential high-risk windows, offering timely support for evidence-based environmental governance and carbon mitigation planning.
We investigate the gravitational-wave background predicted by a two-scalar-field cosmological model that aims to unify primordial inflation with the dark sector, namely late-time dark energy and dark matter, in a single and self-consistent theoretical framework. The model is constructed from an action inspired by several extensions of general relativity and string-inspired scenarios and features a non-minimal interaction between the two scalar fields, while both remain minimally coupled to gravity. In this context, we derive the gravitational-wave energy spectrum over wavelengths ranging from today's Hubble horizon to those at the end of inflation. We employ the continuous Bogoliubov coefficient formalism, originally introduced to describe particle creation in an expanding Universe, here generalized to gravitons in analogy to the well-established mechanism of gravitational particle production. Using this method, which enables an accurate description of graviton creation across all cosmological epochs, we find that inflation provides the dominant gravitational-wave contribution, while subdominant features arise at the inflation-radiation, radiation-matter, and matter-dark energy transitions, i.e., epochs naturally encoded in our scalar-field picture. The resulting energy density spectrum is thus compared with the sensitivity curves of the planned next-generation ground- and space-based gravitational-wave observatories. The comparison identifies frequency bands where the predicted signal could be probed, delineating the windows associated with potentially detectable signals as bounded by our analysis. The consequences of our model are then compared with numerical results, and the corresponding physical properties are discussed in detail.
Electric-vehicle (EV) battery disassembly requires precise multi-robot coordination, short and reliable motions, and robust collision safety in cluttered, dynamic scenes. We propose a four-layer task-and-motion planning (TAMP) framework that couples symbolic task planning and cost- and accessibility-aware allocation with a TP-GMM-guided motion planner learned from demonstrations. Stereo vision with YOLOv8 provides real-time component localization, while OctoMap-based 3D mapping and FCL (Flexible Collision Library) checks in MoveIt unify predictive digital-twin collision checking with reactive, vision-based avoidance. Validated on two UR10e robots across cable, busbar, service plug, and three leaf-cell removals, the approach yields substantially more compact and safer motions than a default RRTConnect baseline under identical perception and task assignments: average end-effector path length drops by $63.3\%$ and makespan by $8.1\%$; per-arm swept volumes shrink (R1: $0.583\rightarrow0.139\,\mathrm{m}^3$; R2: $0.696\rightarrow0.252\,\mathrm{m}^3$), and mutual overlap decreases by $47\%$ ($0.064\rightarrow0.034\,\mathrm{m}^3$). These results highlight improved autonomy, precision, and safety for multi-robot EV battery disassembly in unstructured, dynamic environments.