We propose a novel method, basis vector model material indexing (BVM-MI), for predicting atomic composition and mass density from two independent basis vector model weights derived from dual-energy CT (DECT) for Monte Carlo (MC) dose planning. BVM-MI employs multiple linear regression on BVM weights and their quotient to predict elemental composition and mass density for 70 representative tissues. Predicted values were imported into the TOPAS MC code to simulate proton dose deposition to a uniform cylinder phantom composed of each tissue type. The performance of BVM-MI was compared to the conventional Hounsfield Unit material indexing method (HU-MI), which estimates elemental composition and density based on CT numbers (HU). Evaluation metrics included absolute errors in predicted elemental compositions and relative percent errors in calculated mass density and mean excitation energy. Dose distributions were assessed by quantifying absolute error in the depth of 80% maximum scored dose (R80) and relative percent errors in stopping power (SP) between MC simulations using HU-MI, BVM-MI, and benchmark compositions. Lateral dose profiles were analyzed at R80 and Bragg Peak (RBP) depths for three tissues showing the largest discrepancies in R80 depth. BVM-MI outperformed HU-MI in elemental composition predictions, with mean RMSEs of 1.30% (soft tissue) and 0.1% (bony tissue), compared to 4.20% and 1.9% for HU-MI. R80 depth RMSEs were 0.2 mm (soft) and 0.1 mm (bony) for BVM-MI, vs. 1.8 mm and 0.7 mm for HU-MI. Lateral dose profile analysis showed overall smaller dose errors for BVM-MI across core, halo, and proximal aura regions. In conclusion, fully utilizing the two-parameter BVM space for material indexing led to significantly improved TOPAS MC dose calculations over the conventional HU-MI method, demonstrating BVM-MI's potential to enhance proton therapy planning.
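As an illustration of the material-indexing regression step only (a minimal sketch with placeholder data, not the authors' implementation), the snippet below fits a multiple linear regression from the two BVM weights and their quotient to hypothetical tissue targets:

```python
# Minimal BVM-MI-style regression sketch (illustrative only; not the authors' code).
# Features: the two BVM weights and their quotient. Targets: placeholder values
# standing in for elemental mass fractions / mass density of reference tissues.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
w1 = rng.uniform(0.2, 1.0, 70)                      # hypothetical BVM weight 1 for 70 tissues
w2 = rng.uniform(0.0, 0.8, 70)                      # hypothetical BVM weight 2
X = np.column_stack([w1, w2, w1 / (w2 + 1e-6)])     # weights plus their quotient
y = rng.uniform(0.0, 0.2, (70, 2))                  # placeholder targets, e.g. two elemental fractions

model = LinearRegression().fit(X, y)
print(model.predict(X[:1]))                         # predicted composition for one tissue
```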
This paper describes our research on AI agents embodied in visual, virtual, or physical forms, which enables them to interact with both users and their environments. These agents, which include virtual avatars, wearable devices, and robots, are designed to perceive, learn, and act within their surroundings, making their learning and interaction with the environment more human-like than that of disembodied agents. We propose that the development of world models is central to the reasoning and planning of embodied AI agents: world models allow these agents to understand and predict their environment and to grasp user intentions and social contexts, thereby enhancing their ability to perform complex tasks autonomously. World modeling encompasses the integration of multimodal perception, planning through reasoning for action and control, and memory, creating a comprehensive understanding of the physical world. Beyond the physical world, we also propose learning a mental world model of users to enable better human-agent collaboration.
Power and sample size calculations for Wald tests in generalized linear models (GLMs) are often limited to specific cases like logistic regression. More general methods typically require detailed study parameters that are difficult to obtain during planning. We introduce two new effect size measures for estimating power, sample size, or the minimally detectable effect size in studies using Wald tests across any GLM. These measures accommodate any number of predictors or adjusters and require only basic study information. We provide practical guidance for interpreting and applying these measures to approximate a key parameter in power calculations. We also derive asymptotic bounds on the relative error of these approximations, showing that accuracy depends on features of the GLM such as the nonlinearity of the link function. To complement this analysis, we conduct simulation studies across common model specifications, identifying best use cases and opportunities for improvement. Finally, we test the methods in finite samples to confirm their practical utility.
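The paper's effect size measures are not reproduced here; as background, the sketch below shows the standard mechanics such measures would feed into, namely Wald-test power computed from a noncentrality parameter via the noncentral chi-square distribution:

```python
# Approximate power of a Wald test given its noncentrality parameter and degrees
# of freedom (background sketch only; the paper's effect size measures used to
# approximate the noncentrality parameter are not reproduced here).
from scipy.stats import chi2, ncx2

def wald_power(noncentrality, df=1, alpha=0.05):
    crit = chi2.ppf(1 - alpha, df)            # critical value under the null
    return ncx2.sf(crit, df, noncentrality)   # rejection probability under the alternative

print(wald_power(noncentrality=7.85, df=1))   # ~0.80 for a single 1-df Wald test
```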
Purpose: Aortic dissections are life-threatening cardiovascular conditions requiring accurate segmentation of the true lumen (TL), false lumen (FL), and false lumen thrombosis (FLT) from CTA images for effective management. Manual segmentation is time-consuming and variable, necessitating automated solutions. Materials and Methods: We developed four deep learning-based pipelines for Type B aortic dissection (TBAD) segmentation: a single-step model, a sequential model, a sequential multi-task model, and an ensemble model, utilizing 3D U-Net and Swin UNETR architectures. A dataset of 100 retrospective CTA images was split into training (n=80), validation (n=10), and testing (n=10) sets. Performance was assessed using the Dice Coefficient and Hausdorff Distance. Results: Our approach achieved superior segmentation accuracy, with Dice Coefficients of 0.91 $\pm$ 0.07 for TL, 0.88 $\pm$ 0.18 for FL, and 0.47 $\pm$ 0.25 for FLT, outperforming Yao et al. (1), who reported 0.78 $\pm$ 0.20, 0.68 $\pm$ 0.18, and 0.25 $\pm$ 0.31, respectively. Conclusion: The proposed pipelines provide accurate segmentation of TBAD features, enabling derivation of morphological parameters for surveillance and treatment planning.
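For reference, the Dice Coefficient used for evaluation is the standard overlap measure; a minimal sketch on binary masks (not the study's evaluation code):

```python
# Dice coefficient between a predicted and a reference binary mask (standard definition).
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-8) -> float:
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    return (2.0 * inter + eps) / (pred.sum() + truth.sum() + eps)
```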
This work presents KnotDLO, a method for one-handed Deformable Linear Object (DLO) knot tying that is robust to occlusion, repeatable across varying initial rope configurations, interpretable in the motion policies it generates, and free of human demonstrations or training. Grasp and target waypoints for future DLO states are planned from the current DLO shape. Grasp poses are computed by indexing the tracked piecewise-linear curve representing the DLO state according to the current curve shape, and they vary piecewise-continuously with that shape. KnotDLO computes intermediate waypoints from the geometry of the current DLO state and the desired next state. The system decouples visual reasoning from control. In 16 trials of knot tying, KnotDLO achieves a 50% success rate in tying an overhand knot from previously unseen configurations.
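As a hedged illustration of indexing a tracked polyline (not KnotDLO's actual grasp-pose computation), the sketch below returns the point at a given arc-length fraction along a piecewise-linear curve:

```python
# Point at arc-length fraction s in [0, 1] along a piecewise-linear curve (illustrative sketch).
import numpy as np

def point_at_fraction(curve: np.ndarray, s: float) -> np.ndarray:
    seg = np.linalg.norm(np.diff(curve, axis=0), axis=1)     # segment lengths
    cum = np.concatenate([[0.0], np.cumsum(seg)])            # cumulative arc length
    target = s * cum[-1]
    i = min(np.searchsorted(cum, target, side="right") - 1, len(seg) - 1)
    t = (target - cum[i]) / max(seg[i], 1e-9)                # position within the segment
    return (1 - t) * curve[i] + t * curve[i + 1]

rope = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])        # toy 2D polyline
print(point_at_fraction(rope, 0.75))                         # -> [1.0, 0.5]
```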
The Dijkstra algorithm is a classic path planning method that operates in a discrete graph space to determine the shortest path from a specified source point to a target node, or to all other nodes, based on non-negative edge weights. Numerous studies have focused on the Dijkstra algorithm because of its wide range of potential applications. However, its application to surface path planning for mobile robots remains largely unexplored. In this letter, a surface optimal path planning algorithm called RM-Dijkstra is proposed, based on a Riemannian metric model. By constructing a new Riemannian metric on the 2D projection plane, the surface optimal path planning problem is transformed into a geometric problem on the 2D plane equipped with the new metric. Induced by the standard Euclidean metric on the surface, the constructed metric reflects the environmental information of the robot and ensures that the projection map is an isometric immersion. A series of simulation tests demonstrates that the RM-Dijkstra algorithm not only effectively solves the optimal path planning problem on surfaces but also outperforms traditional path planning algorithms in terms of path accuracy and smoothness, particularly in complex scenarios.
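To make the search step concrete, here is a plain Dijkstra sketch over a weighted graph; in an RM-Dijkstra-style setting the edge weights would be edge lengths measured under the constructed Riemannian metric on the projection plane (the metric construction itself is not reproduced here):

```python
# Standard Dijkstra over a weighted adjacency dict; weights stand in for metric edge lengths.
import heapq

def dijkstra(adj, source):
    dist = {source: 0.0}
    pq = [(0.0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue                              # stale queue entry
        for v, w in adj[u]:                       # w >= 0: edge length under the chosen metric
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

adj = {"a": [("b", 1.0), ("c", 2.5)], "b": [("c", 1.0)], "c": []}
print(dijkstra(adj, "a"))   # {'a': 0.0, 'b': 1.0, 'c': 2.0}
```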
Industrial assembly of deformable linear objects (DLOs) such as cables offers great potential for many industries. However, DLOs pose several challenges for robot-based automation due to the inherent complexity of deformation and, consequently, the difficulty of anticipating the behavior of DLOs in dynamic situations. Although existing studies have addressed isolated subproblems like shape tracking, grasping, and shape control, there has been limited exploration of integrated workflows that combine these individual processes. To address this gap, we propose an object-centric perception and planning framework to achieve a comprehensive DLO assembly process throughout the industrial value chain. The framework utilizes visual and tactile information to track the DLO's shape as well as its contact state across different stages, which facilitates effective planning of robot actions. Our approach encompasses robot-based bin picking of DLOs from cluttered environments, followed by a coordinated handover to two additional robots that mount the DLOs onto designated fixtures. Real-world experiments employing a setup with multiple robots demonstrate the effectiveness of the approach and its relevance to industrial scenarios.
We address the problem of generating long-horizon videos for robotic manipulation tasks. Text-to-video diffusion models have made significant progress in photorealism, language understanding, and motion generation but struggle with long-horizon robotic tasks. Recent works use video diffusion models for high-quality simulation data and predictive rollouts in robot planning. However, these works predict short sequences of the robot achieving one task and employ an autoregressive paradigm to extend to the long horizon, leading to error accumulation in both the generated video and the execution. To overcome these limitations, we propose a novel pipeline that bypasses the need for autoregressive generation. We achieve this through a threefold contribution: 1) we first decompose the high-level goals into smaller atomic tasks and generate keyframes aligned with these instructions. A second diffusion model then interpolates between each pair of consecutive keyframes, producing the long-horizon video. 2) We propose a semantics-preserving attention module to maintain consistency between the keyframes. 3) We design a lightweight policy model to regress the robot joint states from generated videos. Our approach achieves state-of-the-art results on two benchmarks in video quality and consistency while outperforming previous policy models on long-horizon tasks.
We propose a mixed-integer linear program (MILP) for multi-agent motion planning that embeds Polytopic Action-based Motion Planning (PAAMP) into a sequence-then-solve pipeline. Region sequences confine each agent to adjacent convex polytopes, while a big-M hyperplane model enforces inter-agent separation. Collision constraints are applied only to agents sharing or neighboring a region, which reduces binary variables exponentially compared with naive formulations. An L1 path-length-plus-acceleration cost yields smooth trajectories. We prove finite-time convergence and demonstrate on representative multi-agent scenarios with obstacles that our formulation produces collision-free trajectories an order of magnitude faster than an unstructured MILP baseline.
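A minimal sketch of a big-M separation constraint between two agents along a single axis at one time step (a simplified stand-in for the full PAAMP formulation, using PuLP and an arbitrary placeholder cost):

```python
# Big-M separation between two agents on one axis at a single time step (simplified sketch,
# not the paper's formulation): the binary z selects which side of the hyperplane each agent takes.
import pulp

M, d = 100.0, 1.0                                  # big-M constant, required clearance
prob = pulp.LpProblem("separation_sketch", pulp.LpMinimize)
x1 = pulp.LpVariable("x1", lowBound=0, upBound=10)
x2 = pulp.LpVariable("x2", lowBound=0, upBound=10)
z = pulp.LpVariable("z", cat="Binary")

prob += x1 + x2                                    # placeholder cost (the paper uses L1 length plus acceleration)
prob += x1 - x2 >= d - M * z                       # active when z = 0: agent 1 on the far side
prob += x2 - x1 >= d - M * (1 - z)                 # active when z = 1: agent 2 on the far side
prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.value(x1), pulp.value(x2), pulp.value(z))
```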
Drought is a significant natural phenomenon with profound environmental, economic, and societal impacts. Effective monitoring of drought characteristics -- such as intensity, magnitude, and duration -- is crucial for resilience and mitigation strategies. This study proposes the use of non-parametric Time Between Events and Amplitude (TBEA) control charts for detecting changes in drought characteristics, specifically applying them to the Standardized Precipitation and Evapotranspiration Index. Without claiming to be exhaustive, we considered two non-parametric change-point control charts based on the Mann-Whitney and Kolmogorov-Smirnov statistics, respectively. We studied the in-control statistical performance of the change-point control charts in the time between events and amplitude framework through a simulation study. Furthermore, we assessed the coherence of the results with those of a distribution-free upper-sided Exponentially Weighted Moving Average control chart specifically designed for monitoring TBEA data. The findings suggest that the proposed methods may serve as valuable tools for climate resilience planning and water resource management.
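As an illustrative sketch of the change-point idea (not the charts' exact design or control limits), the snippet below scans all splits of a TBEA-style series and reports the largest standardized Mann-Whitney statistic:

```python
# Largest standardized two-sample Mann-Whitney statistic over all candidate split points
# of a series (illustrative change-point statistic; control limits are not shown).
import numpy as np
from scipy.stats import mannwhitneyu

def max_mw_statistic(x, min_seg=5):
    best = 0.0
    for k in range(min_seg, len(x) - min_seg):
        u = mannwhitneyu(x[:k], x[k:], alternative="two-sided").statistic
        m, n = k, len(x) - k
        z = abs(u - m * n / 2.0) / np.sqrt(m * n * (m + n + 1) / 12.0)   # standardize U (no tie correction)
        best = max(best, z)
    return best

rng = np.random.default_rng(1)
series = np.concatenate([rng.exponential(1.0, 30), rng.exponential(2.0, 30)])
print(max_mw_statistic(series))   # large values signal a change in the series
```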
This study explores the application potential of a deep learning model based on the CNN-LSTM framework in forecasting the sales volume of cancer drugs, with a focus on modeling complex time series data. As advancements in medical technology and cancer treatment continue, the demand for oncology medications is steadily increasing. Accurate forecasting of cancer drug sales plays a critical role in optimizing production planning, supply chain management, and healthcare policy formulation. The dataset used in this research comprises quarterly sales records of a specific cancer drug in Egypt from 2015 to 2024, including multidimensional information such as date, drug type, pharmaceutical company, price, sales volume, effectiveness, and drug classification. To improve prediction accuracy, a hybrid deep learning model combining Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks is employed. The CNN component is responsible for extracting local temporal features from the sales data, while the LSTM component captures long-term dependencies and trends. Model performance is evaluated using two widely adopted metrics: Mean Squared Error (MSE) and Root Mean Squared Error (RMSE). The results demonstrate that the CNN-LSTM model performs well on the test set, achieving an MSE of 1.150 and an RMSE of 1.072, indicating its effectiveness in handling nonlinear and volatile sales data. This research provides theoretical and technical support for data-driven decision-making in pharmaceutical marketing and healthcare resource planning.
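A minimal Keras sketch of a CNN-LSTM of this kind, with illustrative layer sizes and synthetic data rather than the paper's exact configuration:

```python
# Minimal CNN-LSTM forecaster for a univariate quarterly series (illustrative architecture only).
import numpy as np
from tensorflow.keras import layers, models

timesteps, features = 8, 1                    # e.g. 8 past quarters, 1 feature per step
model = models.Sequential([
    layers.Input(shape=(timesteps, features)),
    layers.Conv1D(32, kernel_size=3, padding="causal", activation="relu"),  # local temporal features
    layers.LSTM(64),                                                        # long-term dependencies
    layers.Dense(1),                                                        # next-quarter sales volume
])
model.compile(optimizer="adam", loss="mse")

X = np.random.rand(100, timesteps, features).astype("float32")              # synthetic stand-in data
y = np.random.rand(100, 1).astype("float32")
model.fit(X, y, epochs=2, batch_size=16, verbose=0)
```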
High-stakes decision-making involves navigating multiple competing objectives with expensive evaluations. For instance, in brachytherapy, clinicians must balance maximizing tumor coverage (e.g., an aspirational target or soft bound of >95% coverage) against strict organ dose limits (e.g., a non-negotiable hard bound of <601 cGy to the bladder), with each plan evaluation being resource-intensive. Selecting Pareto-optimal solutions that match implicit preferences is challenging, as exhaustive Pareto frontier exploration is computationally and cognitively prohibitive, necessitating interactive frameworks to guide users. While decision-makers (DMs) often possess domain knowledge to narrow the search via such soft-hard bounds, current methods often lack systematic approaches to iteratively refine these multi-faceted preference structures. Critically, DMs must trust their final decision, confident they haven't missed superior alternatives; this trust is paramount in high-consequence scenarios. We present Active-MoSH, an interactive local-global framework designed for this process. Its local component integrates soft-hard bounds with probabilistic preference learning, maintaining distributions over DM preferences and bounds for adaptive Pareto subset refinement. This is guided by an active sampling strategy that optimizes the exploration-exploitation trade-off while minimizing cognitive burden. To build DM trust, Active-MoSH's global component, T-MoSH, leverages multi-objective sensitivity analysis to identify potentially overlooked, high-value points beyond immediate feedback. We demonstrate Active-MoSH's performance benefits through diverse synthetic and real-world applications. A user study on AI-generated image selection further validates our hypotheses regarding the framework's ability to improve convergence, enhance DM trust, and provide expressive preference articulation, enabling more effective decision-making.
We introduce Massively Multi-Task Model-Based Policy Optimization (M3PO), a scalable model-based reinforcement learning (MBRL) framework designed to address sample inefficiency in single-task settings and poor generalization in multi-task domains. Existing model-based approaches like DreamerV3 rely on pixel-level generative models that neglect control-centric representations, while model-free methods such as PPO suffer from high sample complexity and weak exploration. M3PO integrates an implicit world model, trained to predict task outcomes without observation reconstruction, with a hybrid exploration strategy that combines model-based planning and model-free uncertainty-driven bonuses. This eliminates the bias-variance trade-off in prior methods by using discrepancies between model-based and model-free value estimates to guide exploration, while maintaining stable policy updates through a trust-region optimizer. M3PO provides an efficient and robust alternative to existing model-based policy optimization approaches and achieves state-of-the-art performance across multiple benchmarks.
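One plausible reading of the discrepancy-driven bonus, sketched with placeholder values (not the M3PO implementation):

```python
# Discrepancy-based exploration bonus: the gap between model-based and model-free
# value estimates is used as an intrinsic reward (illustrative reading of the idea).
import torch

def exploration_bonus(v_model: torch.Tensor, v_free: torch.Tensor, beta: float = 0.1):
    return beta * (v_model - v_free).abs()      # larger disagreement -> stronger exploration incentive

v_model = torch.tensor([1.0, 0.2, -0.5])        # values from planning in the implicit world model
v_free = torch.tensor([0.8, 0.9, -0.4])         # values from the model-free critic
print(exploration_bonus(v_model, v_free))       # tensor([0.0200, 0.0700, 0.0100])
```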
Recent progress in vision-language segmentation has significantly advanced grounded visual understanding. However, these models often exhibit hallucinations by producing segmentation masks for objects not grounded in the image content or by incorrectly labeling irrelevant regions. Existing evaluation protocols for segmentation hallucination primarily focus on label or textual hallucinations without manipulating the visual context, limiting their capacity to diagnose critical failures. In response, we introduce HalluSegBench, the first benchmark specifically designed to evaluate hallucinations in visual grounding through the lens of counterfactual visual reasoning. Our benchmark consists of a novel dataset of 1340 counterfactual instance pairs spanning 281 unique object classes, and a set of newly introduced metrics that quantify hallucination sensitivity under visually coherent scene edits. Experiments on HalluSegBench with state-of-the-art vision-language segmentation models reveal that vision-driven hallucinations are significantly more prevalent than label-driven ones, with models often persisting in false segmentation, highlighting the need for counterfactual reasoning to diagnose grounding fidelity.
Turning movement count (TMC) data is crucial for traffic signal design, intersection geometry planning, traffic flow, and congestion analysis. This work proposes three methods, called dynamic, static, and hybrid configuration, for TMC-based traffic signals. A vision-based tracking system is developed to estimate the TMC of six intersections in Las Vegas using traffic cameras. The intersection design, route (e.g., vehicle movement directions), and signal configuration files with compatible formats are synthesized and imported into Simulation of Urban MObility (SUMO) for signal evaluation with realistic data. The initial experimental results based on estimated waiting times indicate that cycle times of 90 and 120 seconds work best for all intersections. In addition, four intersections show better performance with the dynamic signal timing configuration, and the other two, with lower performance, have a lower ratio of total vehicle count to total lanes of the intersection leg. Since daily traffic flow often exhibits a bimodal pattern, we propose a hybrid signal method that switches between the dynamic and static methods, adapting to peak and off-peak traffic conditions for improved flow management. Accordingly, a built-in traffic generator module creates vehicle routes for 4 hours, including peak hours, and a signal design module produces signal schedule cycles according to the static, dynamic, and hybrid methods. Vehicle count distributions are weighted differently for each zone (i.e., West, North, East, South) to generate diverse traffic patterns. The extended experimental results for six intersections with 4 hours of simulation time indicate that zone-based traffic pattern distributions affect signal design selection. Although the static method works well for evenly distributed zone-based traffic, the hybrid method works well for highly weighted traffic at intersection pairs of the West-East and North-South zones.
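A minimal sketch of the hybrid switching idea, using a hypothetical flow threshold rather than the paper's actual peak/off-peak criterion:

```python
# Hybrid signal mode selection: switch between dynamic and static timing depending on
# whether the current flow looks like a peak period (illustrative rule, hypothetical threshold).
def choose_signal_mode(vehicles_per_hour: float, peak_threshold: float = 800.0) -> str:
    return "dynamic" if vehicles_per_hour >= peak_threshold else "static"

for flow in (350.0, 1200.0):
    print(flow, "->", choose_signal_mode(flow))   # off-peak -> static, peak -> dynamic
```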
2D scene graphs provide a structural and explainable framework for scene understanding. However, current work still struggles with the lack of accurate scene graph data. To overcome this data bottleneck, we present CoPa-SG, a synthetic scene graph dataset with highly precise ground truth and exhaustive relation annotations between all objects. Moreover, we introduce parametric and proto-relations, two new fundamental concepts for scene graphs. The former provides a much more fine-grained representation than its traditional counterpart by enriching relations with additional parameters such as angles or distances. The latter encodes hypothetical relations in a scene graph and describes how relations would form if new objects are placed in the scene. Using CoPa-SG, we compare the performance of various scene graph generation models. We demonstrate how our new relation types can be integrated in downstream applications to enhance planning and reasoning capabilities.
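One possible way to represent the two new relation types in code (an illustrative schema; CoPa-SG's actual data format may differ):

```python
# Illustrative data structures for parametric relations and proto-relations.
from dataclasses import dataclass, field

@dataclass
class ParametricRelation:
    subject: str
    predicate: str                                # e.g. "left of"
    obj: str
    params: dict = field(default_factory=dict)    # e.g. {"angle_deg": 12.5, "distance_m": 0.4}

@dataclass
class ProtoRelation:
    anchor: str                                   # existing object the hypothetical relation is anchored to
    predicate: str
    placeholder_class: str                        # class of an object that could be placed, e.g. "chair"
    params: dict = field(default_factory=dict)

rel = ParametricRelation("mug", "on", "table", {"distance_m": 0.02})
proto = ProtoRelation("table", "next to", "chair", {"angle_deg": 90.0})
```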
We investigate the Dispersive Art Gallery Problem with vertex guards and rectangular visibility ($r$-visibility) for a class of orthogonal polygons that reflect the properties of real-world floor plans: these office-like polygons consist of rectangular rooms and corridors. In the dispersive variant of the Art Gallery Problem, the objective is not to minimize the number of guards but to maximize the minimum geodesic $L_1$-distance between any two guards, called the dispersion distance. Our main contributions are as follows. We prove that determining whether a vertex guard set can achieve a dispersion distance of $4$ in office-like polygons is NP-complete, where vertices of the polygon are restricted to integer coordinates. Additionally, we present a simple worst-case optimal algorithm that guarantees a dispersion distance of $3$ in polynomial time. Our complexity result extends to polyominoes, resolving an open question posed by Rieck and Scheffer (CGTA 2024). When vertex coordinates are allowed to be rational, we establish analogous results, proving that achieving a dispersion distance of $2+\varepsilon$ is NP-hard for any $\varepsilon > 0$, while the classic Art Gallery Problem remains solvable in polynomial time for this class of polygons. Furthermore, we give a straightforward polynomial-time algorithm that computes worst-case optimal solutions with a dispersion distance of $2$. On the other hand, for the more restricted class of hole-free independent office-like polygons, we propose a dynamic programming approach that computes optimal solutions. Moreover, we demonstrate that the problem is practically tractable for arbitrary orthogonal polygons. To this end, we compare solvers based on SAT, CP, and MIP formulations. Notably, SAT solvers efficiently compute optimal solutions for randomly generated instances with up to $1600$ vertices in under $15$s.
This study presents and publicly releases the Suzhou Urban Road Acoustic Dataset (SZUR-Acoustic Dataset), which is accompanied by comprehensive data-acquisition protocols and annotation guidelines to ensure transparency and reproducibility of the experimental workflow. To model the coupling between vehicular noise and driving speed, we propose a bimodal-feature-fusion deep convolutional neural network (BMCNN). During preprocessing, an adaptive denoising and normalization strategy is applied to suppress environmental background interference; in the network architecture, parallel branches extract Mel-frequency cepstral coefficients (MFCCs) and wavelet-packet energy features, which are subsequently fused via a cross-modal attention mechanism in the intermediate feature space to fully exploit time-frequency information. Experimental results demonstrate that BMCNN achieves a classification accuracy of 87.56% on the SZUR-Acoustic Dataset and 96.28% on the public IDMT-Traffic dataset. Ablation studies and robustness tests on the Suzhou dataset further validate the contributions of each module to performance improvement and overfitting mitigation. The proposed acoustics-based speed classification method can be integrated into smart-city traffic management systems for real-time noise monitoring and speed estimation, thereby optimizing traffic flow control, reducing roadside noise pollution, and supporting sustainable urban planning.
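An illustrative sketch of the two feature branches on a synthetic tone, with assumed parameter choices (13 MFCCs, a depth-4 db4 wavelet packet) rather than the paper's exact preprocessing:

```python
# Two feature branches for audio: MFCCs and wavelet-packet band energies (illustrative parameters).
import numpy as np
import librosa
import pywt

sr = 16000
t = np.linspace(0, 1.0, sr, endpoint=False)
y = np.sin(2 * np.pi * 440 * t).astype(np.float32)           # placeholder 1-second tone

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)            # branch 1: MFCC time-frequency features

wp = pywt.WaveletPacket(data=y, wavelet="db4", maxlevel=4)    # branch 2: wavelet-packet energies
energies = np.array([np.sum(node.data ** 2) for node in wp.get_level(4, order="freq")])
energies /= energies.sum()                                    # normalized band energies

print(mfcc.shape, energies.shape)                             # (13, n_frames) and (16,)
```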
As Multimodal Large Language Models (MLLMs) advance, multimodal agents show promise in real-world tasks like web navigation and embodied intelligence. However, due to the lack of external feedback, these agents struggle with self-correction and generalization. A promising approach is to use reward models as external feedback, but there is no clear guidance on how to select reward models for agents. Thus, there is an urgent need for a reward benchmark targeted at agents. To address these challenges, we propose Agent-RewardBench, a benchmark designed to evaluate the reward modeling ability of MLLMs. The benchmark is characterized by three key features: (1) Multiple dimensions and real-world agent scenarios. It covers perception, planning, and safety across 7 scenarios; (2) Step-level reward evaluation. It allows for the assessment of agent capabilities at the individual steps of a task, providing a more granular view of performance during the planning process; and (3) Appropriate difficulty and high quality. We carefully sample from 10 diverse models, control difficulty to keep tasks challenging, and manually verify the data to ensure its integrity. Experiments demonstrate that even state-of-the-art multimodal models show limited performance, highlighting the need for specialized training in agent reward modeling. Code is available on GitHub.
Large Vision-Language Models (LVLMs) show promise for embodied planning tasks but struggle with complex scenarios involving unfamiliar environments and multi-step goals. Current approaches rely on environment-agnostic imitation learning that disconnects instructions from environmental contexts, causing models to struggle with context-sensitive instructions and to rely on supplementary cues rather than visual reasoning during long-horizon interactions. In this work, we propose World-Aware Planning Narrative Enhancement (WAP), a framework that infuses LVLMs with comprehensive environmental understanding through four cognitive capabilities (visual appearance modeling, spatial reasoning, functional abstraction, and syntactic grounding), while developing and evaluating models using only raw visual observations through curriculum learning. Evaluations on the EB-ALFRED benchmark demonstrate substantial improvements, with Qwen2.5-VL achieving a 60.7-point absolute improvement in task success rate, particularly in commonsense reasoning (+60.0) and long-horizon planning (+70.0). Notably, our enhanced open-source models outperform proprietary systems like GPT-4o and Claude-3.5-Sonnet by a large margin.
The accurate segmentation of myocardial scars from cardiac MRI is essential for clinical assessment and treatment planning. In this study, we propose a robust deep-learning pipeline for fully automated myocardial scar detection and segmentation by fine-tuning state-of-the-art models. The method explicitly addresses challenges of label noise from semi-automatic annotations, data heterogeneity, and class imbalance through the use of Kullback-Leibler loss and extensive data augmentation. We evaluate the model's performance on both acute and chronic cases and demonstrate its ability to produce accurate and smooth segmentations despite noisy labels. In particular, our approach outperforms state-of-the-art models like nnU-Net and shows strong generalizability in an out-of-distribution test set, highlighting its robustness across various imaging conditions and clinical tasks. These results establish a reliable foundation for automated myocardial scar quantification and support the broader clinical adoption of deep learning in cardiac imaging.
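As a sketch of the loss ingredient only (not the paper's exact training objective), a Kullback-Leibler loss against soft label maps in PyTorch:

```python
# Kullback-Leibler loss against soft (possibly noisy) label maps (illustrative sketch).
import torch
import torch.nn.functional as F

logits = torch.randn(4, 2, 64, 64)                               # model output: batch x classes x H x W
soft_target = torch.softmax(torch.randn(4, 2, 64, 64), dim=1)    # soft label map

log_probs = F.log_softmax(logits, dim=1)
loss = F.kl_div(log_probs, soft_target, reduction="batchmean")   # KL(target || prediction)
print(loss.item())
```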
Trajectory prediction for surrounding agents is a challenging task in autonomous driving due to its inherent uncertainty and underlying multimodality. Unlike prevailing data-driven methods that primarily rely on supervised learning, in this paper, we introduce a novel Graph-oriented Inverse Reinforcement Learning (GoIRL) framework, an IRL-based predictor equipped with vectorized context representations. We develop a feature adaptor to effectively aggregate lane-graph features into grid space, enabling seamless integration with the maximum entropy IRL paradigm to infer the reward distribution and obtain the policy that can be sampled to induce multiple plausible plans. Furthermore, conditioned on the sampled plans, we implement a hierarchical parameterized trajectory generator with a refinement module to enhance prediction accuracy and a probability fusion strategy to boost prediction confidence. Extensive experimental results show that our approach not only achieves state-of-the-art performance on the large-scale Argoverse & nuScenes motion forecasting benchmarks but also exhibits superior generalization abilities compared to existing supervised models.
Ensuring robust planning and decision-making under rare, diverse, and visually degraded long-tail scenarios remains a fundamental challenge for autonomous driving in urban environments. This issue becomes more critical in cooperative settings, where vehicles and infrastructure jointly perceive and reason across complex environments. To address this challenge, we propose V2X-REALM, a vision-language model (VLM)-based framework with adaptive multimodal learning for robust cooperative autonomous driving under long-tail scenarios. V2X-REALM introduces three core innovations: (i) a prompt-driven long-tail scenario generation and evaluation pipeline that leverages foundation models to synthesize realistic long-tail conditions such as snow and fog across vehicle- and infrastructure-side views, enriching training diversity efficiently; (ii) a gated multi-scenario adaptive attention module that modulates the visual stream using scenario priors to recalibrate ambiguous or corrupted features; and (iii) a multi-task scenario-aware contrastive learning objective that improves multimodal alignment and promotes cross-scenario feature separability. Extensive experiments demonstrate that V2X-REALM significantly outperforms existing baselines in robustness, semantic reasoning, safety, and planning accuracy under complex, challenging driving conditions, advancing the scalability of end-to-end cooperative autonomous driving.
Long-horizon goal-conditioned tasks pose fundamental challenges for reinforcement learning (RL), particularly when goals are distant and rewards are sparse. While hierarchical and graph-based methods offer partial solutions, they often suffer from subgoal infeasibility and inefficient planning. We introduce Strict Subgoal Execution (SSE), a graph-based hierarchical RL framework that enforces single-step subgoal reachability by structurally constraining high-level decision-making. To enhance exploration, SSE employs a decoupled exploration policy that systematically traverses underexplored regions of the goal space. Furthermore, a failure-aware path refinement mechanism improves subgoal reliability by dynamically adjusting edge costs in graph-based planning according to observed low-level success rates. Experimental results across diverse long-horizon benchmarks demonstrate that SSE consistently outperforms existing goal-conditioned RL and hierarchical RL approaches in both efficiency and success rate.
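An illustrative cost rule for such failure-aware refinement, penalizing edges with low observed success rates (the paper's exact adjustment may differ):

```python
# Failure-aware edge costs for graph-based subgoal planning: edges whose low-level
# controller succeeds less often become more expensive (illustrative cost rule).
import networkx as nx

G = nx.DiGraph()
G.add_edge("s", "g1", base=1.0, success_rate=0.9)
G.add_edge("g1", "goal", base=1.0, success_rate=0.3)     # short but unreliable edge
G.add_edge("s", "g2", base=1.5, success_rate=0.95)
G.add_edge("g2", "goal", base=1.5, success_rate=0.9)

for _, _, data in G.edges(data=True):
    # expected number of attempts is roughly 1 / success rate
    data["cost"] = data["base"] / max(data["success_rate"], 1e-3)

print(nx.shortest_path(G, "s", "goal", weight="cost"))   # prefers the longer but reliable route via g2
```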
This paper integrates Austrian capital theory with repeated game theory to examine strategic miner behaviour under different institutional conditions in blockchain systems. It shows that when protocol rules are mutable, effective time preference rises, undermining rational long-term planning and cooperative equilibria. Using formal game-theoretic analysis and Austrian economic principles, the paper demonstrates how mutable protocols shift miner incentives from productive investment to political rent-seeking and influence games. The original Bitcoin protocol is interpreted as an institutional anchor: a fixed rule-set enabling calculability and low time preference. Drawing on the work of Böhm-Bawerk, Mises, and Hayek, the argument is made that protocol immutability is essential for restoring strategic coherence, entrepreneurial confidence, and sustainable network equilibrium.
Nonzero neutrino masses guarantee new physics and neutrinos are excellent probes of extreme environments in the Universe. The recent collider neutrino experimental program, including FASER$\nu$ and SND@LHC, along with the planned Forward Physics Facility at the High-Luminosity Large Hadron Collider, is opening a new window into neutrino physics and astrophysics. In this article, we review recent achievements and prospects of collider neutrino experiments, including key achievements such as the first measurements of collider neutrino interactions at unprecedented energies and the exploration of new physics scenarios, like dark matter candidates, sterile neutrinos, and non-standard neutrino interactions. For concreteness, we will focus on the significant scientific opportunities presented by the Forward Physics Facility, which will enable precision measurements of neutrino cross sections and proton structure at low parton momentum fraction. Furthermore, collider neutrino studies will substantially reduce systematic uncertainties in calculating atmospheric neutrino fluxes, thereby improving astrophysical neutrino observations as well as advancing our understanding of cosmic-ray interactions.
We consider an online variant of the fuel-constrained UAV routing problem with a ground-based mobile refueling station (FCURP-MRS), where targets incur unknown fuel costs. We develop a two-phase solution: an offline heuristic-based planner computes initial UAV and UGV paths, and a novel online planning algorithm dynamically adjusts rendezvous points based on real-time fuel consumption during target processing. Preliminary Gazebo simulations demonstrate the feasibility of our approach in maintaining UAV-UGV path validity, ensuring mission completion. Link to video: https://youtu.be/EmpVj-fjqNY
Microtransit offers a promising blend of rideshare flexibility and public transit efficiency. In practice, it faces unanticipated but spatially aligned requests from passengers seeking to join ongoing schedules, which leads to underutilized capacity and degraded service if not properly managed. At the same time, it must accommodate diverse passenger needs, from routine errands to time-sensitive trips such as medical appointments. To meet these expectations, incorporating time flexibility is essential. However, existing models seldom consider both spontaneous and heterogeneous demand, limiting their real-world applicability. We propose a robust and flexible microtransit framework that integrates time flexibility and demand uncertainty via a Chance-Constrained Dial-A-Ride Problem with Soft Time Windows (CCDARP-STW). Demand uncertainty is captured through nonlinear chance constraints with controllable violation probabilities, while time flexibility is modeled with soft time windows and penalized cost. We develop a bounded-support relaxation using limited statistical information to linearize the chance constraints and solve the model using a tailored Branch-and-Cut-and-Price (BCP) algorithm with a probabilistic dominance rule. This rule improves computational efficiency, reducing explored labels by 17.40% and CPU time by 22.27% in robust cases. A case study based on real-world Chicago data shows that our framework yields savings of 11.55 minutes and 11.13 miles versus conventional microtransit and achieves the highest service reliability (96.46%) among robust models.
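The paper's bounded-support relaxation is not reproduced here; as background, one standard way to turn a chance constraint on bounded, independent demands into a deterministic capacity check uses Hoeffding's inequality, sketched below:

```python
# Background sketch (not the paper's relaxation): for independent demands bounded in
# [lo_i, hi_i], Hoeffding's inequality implies P(sum demand > capacity) <= eps whenever
# capacity >= sum(means) + sqrt(ln(1/eps) * sum((hi_i - lo_i)^2) / 2).
import math

def capacity_needed(means, lows, highs, eps=0.05):
    spread = sum((h - l) ** 2 for l, h in zip(lows, highs))
    return sum(means) + math.sqrt(math.log(1.0 / eps) * spread / 2.0)

# three stops with mean demands 1.2, 0.8, 1.5 passengers, each bounded in [0, 3]
print(capacity_needed([1.2, 0.8, 1.5], [0, 0, 0], [3, 3, 3], eps=0.05))  # ~9.86
```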
In this paper, we propose a novel drift-adaptive slicing-based resource management scheme for cooperative integrated sensing and communication (ISAC) networks. Particularly, we establish two network slices to provide sensing and communication services, respectively. In the large-timescale planning for the slices, we partition the sensing region of interest (RoI) of each mobile device and reserve network resources accordingly, facilitating low-complexity distance-based sensing target assignment in small timescales. To cope with the non-stationary spatial distributions of mobile devices and sensing targets, which can result in the drift in modeling the distributions and ineffective planning decisions, we construct digital twins (DTs) of the slices. In each DT, a drift-adaptive statistical model and an emulation function are developed for the spatial distributions in the corresponding slice, which facilitates closed-form decision-making and efficient validation of a planning decision, respectively. Numerical results show that the proposed drift-adaptive slicing-based resource management scheme can increase the service satisfaction ratio by up to 18% and reduce resource consumption by up to 13.1% when compared with benchmark schemes.
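A minimal sketch of distance-based sensing target assignment within pre-planned partitions (illustrative only; the digital-twin-based planning itself is not shown):

```python
# Low-complexity distance-based assignment: each sensing target is assigned to the
# nearest mobile device (illustrative small-timescale step).
import numpy as np

devices = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.0]])      # device positions
targets = np.array([[1.0, 1.0], [9.0, 1.0], [5.0, 7.0], [4.0, 2.0]])

d = np.linalg.norm(targets[:, None, :] - devices[None, :, :], axis=-1)
assignment = d.argmin(axis=1)                                   # nearest device per target
print(assignment)                                               # [0 1 2 0]
```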
Diffusion large language models (dLLMs) are compelling alternatives to autoregressive (AR) models because their denoising models operate over the entire sequence. The global planning and iterative refinement features of dLLMs are particularly useful for code generation. However, current training and inference mechanisms for dLLMs in coding are still under-explored. To demystify the decoding behavior of dLLMs and unlock their potential for coding, we systematically investigate their denoising processes and reinforcement learning (RL) methods. We train a 7B dLLM, \textbf{DiffuCoder}, on 130B tokens of code. Using this model as a testbed, we analyze its decoding behavior, revealing how it differs from that of AR models: (1) dLLMs can decide how causal their generation should be without relying on semi-AR decoding, and (2) increasing the sampling temperature diversifies not only token choices but also their generation order. This diversity creates a rich search space for RL rollouts. For RL training, to reduce the variance of token log-likelihood estimates and maintain training efficiency, we propose \textbf{coupled-GRPO}, a novel sampling scheme that constructs complementary mask noise for completions used in training. In our experiments, coupled-GRPO significantly improves DiffuCoder's performance on code generation benchmarks (+4.4\% on EvalPlus) and reduces reliance on AR bias during decoding. Our work provides deeper insight into the machinery of dLLM generation and offers an effective, diffusion-native RL training framework. https://github.com/apple/ml-diffucoder.
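Our reading of the complementary-mask idea, as a toy sketch (see the linked repository for the actual coupled-GRPO implementation):

```python
# Complementary mask noise for completion tokens: every completion position is masked
# in exactly one of the two noised copies (toy sketch of the idea).
import torch

def complementary_masks(completion_len: int, p: float = 0.5, seed: int = 0):
    g = torch.Generator().manual_seed(seed)
    m1 = torch.rand(completion_len, generator=g) < p   # first mask
    m2 = ~m1                                           # complement covers the remaining positions
    return m1, m2

m1, m2 = complementary_masks(8)
print(m1.int().tolist(), m2.int().tolist())            # together they cover every position exactly once
```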