planning - 2025-03-14

Uncertainty in Action: Confidence Elicitation in Embodied Agents

Authors:Tianjiao Yu, Vedant Shah, Muntasir Wahed, Kiet A. Nguyen, Adheesh Juvekar, Tal August, Ismini Lourentzou
Date:2025-03-13 17:59:41

Expressing confidence is challenging for embodied agents navigating dynamic multimodal environments, where uncertainty arises from both perception and decision-making processes. We present the first work investigating embodied confidence elicitation in open-ended multimodal environments. We introduce Elicitation Policies, which structure confidence assessment across inductive, deductive, and abductive reasoning, along with Execution Policies, which enhance confidence calibration through scenario reinterpretation, action sampling, and hypothetical reasoning. Evaluating agents in calibration and failure prediction tasks within the Minecraft environment, we show that structured reasoning approaches, such as Chain-of-Thoughts, improve confidence calibration. However, our findings also reveal persistent challenges in distinguishing uncertainty, particularly under abductive settings, underscoring the need for more sophisticated embodied confidence elicitation methods.
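
As a rough illustration of what a calibration evaluation measures, the sketch below computes expected calibration error (ECE), a widely used calibration metric. The abstract does not say which metric the authors report, so the choice of ECE, the function name, and the 10-bin default are assumptions made here for illustration only.

```python
def expected_calibration_error(confidences, outcomes, n_bins=10):
    """Weighted mean gap between stated confidence and observed success rate.

    confidences: floats in [0, 1] reported by the agent (hypothetical input).
    outcomes:    1 if the corresponding action or answer succeeded, else 0.
    """
    if len(confidences) != len(outcomes) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")

    # Group predictions into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, outcome in zip(confidences, outcomes):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, outcome))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(o for _, o in members) / len(members)
        ece += (len(members) / total) * abs(mean_conf - accuracy)
    return ece
```

A well-calibrated agent that reports 0.9 confidence should succeed about 90% of the time, which gives an ECE near zero. Larger values mean stated confidence and actual success diverge.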

DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding

Authors:Ayesha Ishaq, Jean Lahoud, Ketan More, Omkar Thawakar, Ritesh Thawkar, Dinura Dissanayake, Noor Ahsan, Yuhao Li, Fahad Shahbaz Khan, Hisham Cholakkal, Ivan Laptev, Rao Muhammad Anwer, Salman Khan
Date:2025-03-13 17:59:01

While large multimodal models (LMMs) have demonstrated strong performance across various Visual Question Answering (VQA) tasks, certain challenges require complex multi-step reasoning to reach accurate answers. One particularly challenging task is autonomous driving, which demands thorough cognitive processing before decisions can be made. In this domain, a sequential and interpretive understanding of visual cues is essential for effective perception, prediction, and planning. Nevertheless, common VQA benchmarks often focus on the accuracy of the final answer while overlooking the reasoning process that enables the generation of accurate responses. Moreover, existing methods lack a comprehensive framework for evaluating step-by-step reasoning in realistic driving scenarios. To address this gap, we propose DriveLMM-o1, a new dataset and benchmark specifically designed to advance step-wise visual reasoning for autonomous driving. Our benchmark features over 18k VQA examples in the training set and more than 4k in the test set, covering diverse questions on perception, prediction, and planning, each enriched with step-by-step reasoning to ensure logical inference in autonomous driving scenarios. We further introduce a large multimodal model that is fine-tuned on our reasoning dataset, demonstrating robust performance in complex driving scenarios. In addition, we benchmark various open-source and closed-source methods on our proposed dataset, systematically comparing their reasoning capabilities for autonomous driving tasks. Our model achieves a +7.49% gain in final answer accuracy, along with a 3.62% improvement in reasoning score over the previous best open-source model. Our framework, dataset, and model are available at https://github.com/ayesha-ishaq/DriveLMM-o1.

World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning

Authors:Siyin Wang, Zhaoye Fei, Qinyuan Cheng, Shiduo Zhang, Panpan Cai, Jinlan Fu, Xipeng Qiu
Date:2025-03-13 15:49:56

Recent advances in large vision-language models (LVLMs) have shown promise for embodied task planning, yet they struggle with fundamental challenges like dependency constraints and efficiency. Existing approaches either solely optimize action selection or leverage world models during inference, overlooking the benefits of learning to model the world as a way to enhance planning capabilities. We propose Dual Preference Optimization (D$^2$PO), a new learning framework that jointly optimizes state prediction and action selection through preference learning, enabling LVLMs to understand environment dynamics for better planning. To automatically collect trajectories and stepwise preference data without human annotation, we introduce a tree search mechanism for extensive exploration via trial-and-error. Extensive experiments on VoTa-Bench demonstrate that our D$^2$PO-based method significantly outperforms existing methods and GPT-4o when applied to Qwen2-VL (7B), LLaVA-1.6 (7B), and LLaMA-3.2 (11B), achieving superior task success rates with more efficient execution paths.

Stratified Topological Autonomy for Long-Range Coordination (STALC)

Authors:Cora A. Dimmig, Adam Goertz, Adam Polevoy, Mark Gonzales, Kevin C. Wolfe, Bradley Woosley, John Rogers, Joseph Moore
Date:2025-03-13 15:45:27

Achieving unified multi-robot coordination and motion planning in complex environments is a challenging problem. In this paper, we present a hierarchical approach to long-range coordination, which we call Stratified Topological Autonomy for Long-Range Coordination (STALC). In particular, we look at the problem of minimizing visibility to observers and maximizing safety with a multi-robot team navigating through a hazardous environment. At its core, our approach relies on the notion of a dynamic topological graph, where the edge weights vary dynamically based on the locations of the robots in the graph. To create this dynamic topological graph, we evaluate the visibility of the robot team from a discrete set of observer locations (both adversarial and friendly), and construct a topological graph whose edge weights depend on both adversary position and robot team configuration. We then impose temporal constraints on the evolution of those edge weights based on robot team state and use Mixed-Integer Programming (MIP) to generate optimal multirobot plans through the graph. The visibility information also informs the lower layers of the autonomy stack to plan minimal visibility paths through the environment for the team of robots. Our approach presents methods to reduce the computational complexity for a team of robots that interact and coordinate across the team to accomplish a common goal. We demonstrate our approach in simulated and hardware experiments in forested and urban environments.

Social Media Harm Abatement: Mechanisms for Transparent Public Health Assessment

Authors:Nathaniel Lubin, Yuning Liu, Amanda Yarnell, S. Bryn Austin, Zachary J. Ward, Ravi Iyer, Jonathan Stray, Matthew Lawrence, Alissa Cooper, Peter Chapman
Date:2025-03-13 15:26:46

Social media platforms have been accused of causing a range of harms, resulting in dozens of lawsuits across jurisdictions. These lawsuits are situated within the context of a long history of American product safety litigation, suggesting opportunities for remediation outside of financial compensation. Anticipating that at least some of these cases may be successful and/or lead to settlements, this article outlines an implementable mechanism for an abatement and/or settlement plan capable of mitigating abuse. The paper describes the requirements of such a mechanism, implications for privacy and oversight, and tradeoffs that such a procedure would entail. The mechanism is framed to operate at the intersection of legal procedure, standards for transparent public health assessment, and the practical requirements of modern technology products.

Finetuning Generative Trajectory Model with Reinforcement Learning from Human Feedback

Authors:Derun Li, Jianwei Ren, Yue Wang, Xin Wen, Pengxiang Li, Leimeng Xu, Kun Zhan, Zhongpu Xia, Peng Jia, Xianpeng Lang, Ningyi Xu, Hang Zhao
Date:2025-03-13 14:56:17

Generating human-like and adaptive trajectories is essential for autonomous driving in dynamic environments. While generative models have shown promise in synthesizing feasible trajectories, they often fail to capture the nuanced variability of human driving styles due to dataset biases and distributional shifts. To address this, we introduce TrajHF, a human feedback-driven finetuning framework for generative trajectory models, designed to align motion planning with diverse driving preferences. TrajHF incorporates a multi-conditional denoiser and reinforcement learning with human feedback to refine multi-modal trajectory generation beyond conventional imitation learning. This enables better alignment with human driving preferences while maintaining safety and feasibility constraints. TrajHF achieves a PDMS of 93.95 on the NavSim benchmark, significantly exceeding other methods. TrajHF sets a new paradigm for personalized and adaptable trajectory generation in autonomous driving.

A nonlinear real time capable motion cueing algorithm based on deep reinforcement learning

Authors:Hendrik Scheidel, Camilo Gonzalez, Houshyar Asadi, Tobias Bellmann, Andreas Seefried, Shady Mohamed, Saeid Nahavandi
Date:2025-03-13 14:39:19

In motion simulation, motion cueing algorithms (MCAs) are used for the trajectory planning of the motion simulator platform (MSP), where workspace limitations prevent direct reproduction of reference trajectories. Strategies such as motion washout, which return the platform to its center, are crucial in these settings. For serial robotic MSPs with highly nonlinear workspaces, it is essential to maximize the efficient utilization of the MSP's kinematic and dynamic capabilities. Traditional approaches, including classical washout filtering and linear model predictive control, fail to consider platform-specific, nonlinear properties, while nonlinear model predictive control, though comprehensive, imposes high computational demands that hinder real-time, pilot-in-the-loop application without further simplification. To overcome these limitations, we introduce a novel approach using deep reinforcement learning (DRL) for motion cueing, demonstrated here for the first time in a 6-degree-of-freedom (6-DOF) setting with full consideration of the MSP's kinematic nonlinearities. Previous work by the authors successfully demonstrated the application of DRL to a simplified 2-DOF setup, which did not consider kinematic or dynamic constraints. This approach has been extended to all 6 DOF by incorporating a complete kinematic model of the MSP into the algorithm, a crucial step for enabling its application on a real motion simulator. The training of the DRL-MCA is based on Proximal Policy Optimization in an actor-critic implementation combined with an automated hyperparameter optimization. After detailing the necessary training framework and the algorithm itself, we provide a comprehensive validation, demonstrating that the DRL-MCA achieves competitive performance against established algorithms. Moreover, it generates feasible trajectories by respecting all system constraints and meets all real-time requirements with low...

Analysis of the Institutional Free Market in Accredited Medical Physics Graduate Programs

Authors:Brian W. Pogue, Alexander P. Niver
Date:2025-03-13 14:15:51

Medical Physics education is delivered through accredited programs with admissions and funding for students determined by individual institutions providing the educational experiences. Public data from accredited graduate programs, along with funding data, were used to analyze institutional trends in this educational market. Temporal trends from 2017 to 2023 show robust growth in MS graduates, increasing at an average of 17.7 per year, as compared to steady but modest growth in PhDs, increasing by 3.6 per year. Currently, there are nearly two MS graduates for every PhD graduate. Trends in funding show self-funding of students is a dominant pathway in domestic programs. Those programs dominated by accredited MS education have their largest fraction of faculty in radiation oncology departments, whereas those dominated by PhD education have their largest fraction of faculty in radiology departments. Overall NIH funding in the space of radiation diagnostics and therapeutics has been largely static over this timeframe, but with a notable 5-year rise in NCI funding. This can be contrasted to a substantial 5X-6X rise in NIH funding for engineering research in this same period, with significant increases in trainee funding there. Taken as a whole, this survey shows that growth in the field of medical physics education is dominated by MS graduates, presumably servicing the expanded growth needs for well-trained clinical physicists. However, the research infrastructure that supports PhD training in medical physics seems likely to be growing modestly and missing the growth trend of NIH funding that appears to show substantially more growth in non-accredited programs such as biomedical engineering. This data is useful for informing accreditation guidance on numbers of graduates to match the workforce needs or for inter-institutional planning around education goals.

Extractors: QLDPC Architectures for Efficient Pauli-Based Computation

Authors:Zhiyang He, Alexander Cowtan, Dominic J. Williamson, Theodore J. Yoder
Date:2025-03-13 14:07:40

In pursuit of large-scale fault-tolerant quantum computation, quantum low-density parity-check (QLDPC) codes have been established as promising candidates for low-overhead memory when compared to conventional approaches based on surface codes. Performing fault-tolerant logical computation on QLDPC memory, however, has been a long-standing challenge in theory and in practice. In this work, we propose a new primitive, which we call an $\textit{extractor system}$, that can augment any QLDPC memory into a computational block well-suited for Pauli-based computation. In particular, any logical Pauli operator supported on the memory can be fault-tolerantly measured in one logical cycle, consisting of $O(d)$ physical syndrome measurement cycles, without rearranging qubit connectivity. We further propose a fixed-connectivity, LDPC architecture built by connecting many extractor-augmented computational (EAC) blocks with bridge systems. When combined with any user-defined source of high fidelity $|T\rangle$ states, our architecture can implement universal quantum circuits via parallel logical measurements, such that all single-block Clifford gates are compiled away. The size of an extractor on an $n$ qubit code is $\tilde{O}(n)$, where the precise overhead has immense room for practical optimizations.

An Algebraic Foundation for Knowledge Graph Construction (Extended Version)

Authors:Sitt Min Oo, Olaf Hartig
Date:2025-03-13 14:03:35

Although they have existed for more than ten years, have attracted diverse implementations, and have been used successfully in a significant number of applications, declarative mapping languages for constructing knowledge graphs from heterogeneous types of data sources still lack a solid formal foundation. This makes it impossible to introduce implementation and optimization techniques that are provably correct and, in fact, has led to discrepancies between different implementations. Moreover, it precludes studying fundamental properties of different languages (e.g., expressive power). To address this gap, this paper introduces a language-agnostic algebra for capturing mapping definitions. As further contributions, we show that the popular mapping language RML can be translated into our algebra (by which we also provide a formal definition of the semantics of RML) and we prove several algebraic rewriting rules that can be used to optimize mapping plans based on our algebra.

LUMOS: Language-Conditioned Imitation Learning with World Models

Authors:Iman Nematollahi, Branton DeMoss, Akshay L Chandra, Nick Hawes, Wolfram Burgard, Ingmar Posner
Date:2025-03-13 13:48:24

We introduce LUMOS, a language-conditioned multi-task imitation learning framework for robotics. LUMOS learns skills by practicing them over many long-horizon rollouts in the latent space of a learned world model and transfers these skills zero-shot to a real robot. By learning on-policy in the latent space of the learned world model, our algorithm mitigates policy-induced distribution shift which most offline imitation learning methods suffer from. LUMOS learns from unstructured play data with fewer than 1% hindsight language annotations but is steerable with language commands at test time. We achieve this coherent long-horizon performance by combining latent planning with both image- and language-based hindsight goal relabeling during training, and by optimizing an intrinsic reward defined in the latent space of the world model over multiple time steps, effectively reducing covariate shift. In experiments on the difficult long-horizon CALVIN benchmark, LUMOS outperforms prior learning-based methods with comparable approaches on chained multi-task evaluations. To the best of our knowledge, we are the first to learn a language-conditioned continuous visuomotor control for a real-world robot within an offline world model. Videos, dataset and code are available at http://lumos.cs.uni-freiburg.de.

HALO: Fault-Tolerant Safety Architecture For High-Speed Autonomous Racing

Authors:Aron Harder, Amar Kulkarni, Madhur Behl
Date:2025-03-13 13:19:51

The field of high-speed autonomous racing has seen significant advances in recent years, with the rise of competitions such as RoboRace and the Indy Autonomous Challenge providing a platform for researchers to develop software stacks for autonomous race vehicles capable of reaching speeds in excess of 170 mph. Ensuring the safety of these vehicles requires the software to continuously monitor for different faults and erroneous operating conditions during high-speed operation, with the goal of mitigating any unreasonable risks posed by malfunctions in sub-systems and components. This paper presents a comprehensive overview of the HALO safety architecture, which has been implemented on a full-scale autonomous racing vehicle as part of the Indy Autonomous Challenge. The paper begins with a failure mode and criticality analysis of the perception, planning, control, and communication modules of the software stack. Specifically, we examine three different types of faults - node health, data health, and behavioral-safety faults. To mitigate these faults, the paper then outlines HALO safety archetypes and runtime monitoring methods. Finally, the paper demonstrates the effectiveness of the HALO safety architecture for each of the faults, through real-world data gathered from autonomous racing vehicle trials during multi-agent scenarios.

Enhanced View Planning for Robotic Harvesting: Tackling Occlusions with Imitation Learning

Authors:Lun Li, Hamidreza Kasaei
Date:2025-03-13 13:12:52

In agricultural automation, inherent occlusion presents a major challenge for robotic harvesting. We propose a novel imitation learning-based viewpoint planning approach to actively adjust camera viewpoint and capture unobstructed images of the target crop. Traditional viewpoint planners and existing learning-based methods, which depend on manually designed evaluation metrics or reward functions, often struggle to generalize to complex, unseen scenarios. Our method employs the Action Chunking with Transformer (ACT) algorithm to learn effective camera motion policies from expert demonstrations. This enables continuous six-degree-of-freedom (6-DoF) viewpoint adjustments that are smoother and more precise, and that reveal occluded targets. Extensive experiments in both simulated and real-world environments, featuring agricultural scenarios and a 6-DoF robot arm equipped with an RGB-D camera, demonstrate our method's superior success rate and efficiency, especially in complex occlusion conditions, as well as its ability to generalize across different crops without reprogramming. This study advances robotic harvesting by providing a practical "learn from demonstration" (LfD) solution to occlusion challenges, ultimately enhancing autonomous harvesting performance and productivity.

CODEI: Resource-Efficient Task-Driven Co-Design of Perception and Decision Making for Mobile Robots Applied to Autonomous Vehicles

Authors:Dejan Milojevic, Gioele Zardini, Miriam Elser, Andrea Censi, Emilio Frazzoli
Date:2025-03-13 12:12:44

This paper discusses the integration challenges and strategies for designing mobile robots, by focusing on the task-driven, optimal selection of hardware and software to balance safety, efficiency, and minimal usage of resources such as costs, energy, computational requirements, and weight. We emphasize the interplay between perception and motion planning in decision-making by introducing the concept of occupancy queries to quantify the perception requirements for sampling-based motion planners. Sensor and algorithm performance are evaluated using False Negative Rates (FNR) and False Positive Rates (FPR) across various factors such as geometric relationships, object properties, sensor resolution, and environmental conditions. By integrating perception requirements with perception performance, an Integer Linear Programming (ILP) approach is proposed for efficient sensor and algorithm selection and placement. This forms the basis for a co-design optimization that includes the robot body, motion planner, perception pipeline, and computing unit. We refer to this framework for solving the co-design problem of mobile robots as CODEI, short for Co-design of Embodied Intelligence. A case study on developing an Autonomous Vehicle (AV) for urban scenarios provides actionable information for designers, and shows that complex tasks escalate resource demands, with task performance affecting choices of the autonomy stack. The study demonstrates that resource prioritization influences sensor choice: cameras are preferred for cost-effective and lightweight designs, while lidar sensors are chosen for better energy and computational efficiency.
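
For readers unfamiliar with the two error rates used above to score perception, the sketch below computes the false negative rate and false positive rate from binary detections using the standard definitions. The function name and the encoding of detections as 0/1 lists are assumptions for illustration. The abstract does not describe how the authors collect or aggregate these rates.

```python
def detection_error_rates(predicted, actual):
    """Return (false_negative_rate, false_positive_rate) for binary detections.

    predicted: 1 where the sensor/algorithm reports an obstacle, else 0.
    actual:    1 where an obstacle is really present, else 0.
    A rate is None when its denominator is zero (no positives or no negatives).
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have equal length")

    tp = fp = tn = fn = 0
    for p, a in zip(predicted, actual):
        if a == 1 and p == 1:
            tp += 1
        elif a == 1 and p == 0:
            fn += 1  # missed obstacle
        elif a == 0 and p == 1:
            fp += 1  # phantom obstacle
        else:
            tn += 1

    fnr = fn / (fn + tp) if (fn + tp) else None
    fpr = fp / (fp + tn) if (fp + tn) else None
    return fnr, fpr
```

A missed obstacle (false negative) affects safety, while a phantom obstacle (false positive) mainly costs efficiency, which is why the two rates are tracked separately.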

PlanGen: Towards Unified Layout Planning and Image Generation in Auto-Regressive Vision Language Models

Authors:Runze He, Bo Cheng, Yuhang Ma, Qingxiang Jia, Shanyuan Liu, Ao Ma, Xiaoyu Wu, Liebucha Wu, Dawei Leng, Yuhui Yin
Date:2025-03-13 07:37:09

In this paper, we propose a unified layout planning and image generation model, PlanGen, which can pre-plan spatial layout conditions before generating images. Unlike previous diffusion-based models that treat layout planning and layout-to-image as two separate models, PlanGen jointly models the two tasks into one autoregressive transformer using only next-token prediction. PlanGen integrates layout conditions into the model as context without requiring specialized encoding of local captions and bounding box coordinates, which provides significant advantages over the previous embed-and-pool operations on layout conditions, particularly when dealing with complex layouts. Unified prompting allows PlanGen to perform multitasking training related to layout, including layout planning, layout-to-image generation, image layout understanding, etc. In addition, PlanGen can be seamlessly expanded to layout-guided image manipulation thanks to the well-designed modeling, with teacher-forcing content manipulation policy and negative layout guidance. Extensive experiments verify the effectiveness of our PlanGen in multiple layout-related tasks, showing its great potential. Code is available at: https://360cvgroup.github.io/PlanGen.

IMPACT: Intelligent Motion Planning with Acceptable Contact Trajectories via Vision-Language Models

Authors:Yiyang Ling, Karan Owalekar, Oluwatobiloba Adesanya, Erdem Bıyık, Daniel Seita
Date:2025-03-13 07:09:00

Motion planning involves determining a sequence of robot configurations to reach a desired pose, subject to movement and safety constraints. Traditional motion planning finds collision-free paths, but this is overly restrictive in clutter, where it may not be possible for a robot to accomplish a task without contact. In addition, contacts range from relatively benign (e.g., brushing a soft pillow) to more dangerous (e.g., toppling a glass vase). Due to this diversity, it is difficult to characterize which contacts may be acceptable or unacceptable. In this paper, we propose IMPACT, a novel motion planning framework that uses Vision-Language Models (VLMs) to infer environment semantics, identifying which parts of the environment can best tolerate contact based on object properties and locations. Our approach uses the VLM's outputs to produce a dense 3D "cost map" that encodes contact tolerances and seamlessly integrates with standard motion planners. We perform experiments using 20 simulation and 10 real-world scenes and assess using task success rate, object displacements, and feedback from human evaluators. Our results over 3620 simulation and 200 real-world trials suggest that IMPACT enables efficient contact-rich motion planning in cluttered settings while outperforming alternative methods and ablations. Supplementary material is available at https://impact-planning.github.io/.

Enhanced Route Planning with Calibrated Uncertainty Set

Authors:Lingxuan Tang, Rui Luo, Zhixin Zhou, Nicolo Colombo
Date:2025-03-13 06:31:42

This paper investigates the application of probabilistic prediction methodologies in route planning within a road network context. Specifically, we introduce the Conformalized Quantile Regression for Graph Autoencoders (CQR-GAE), which leverages the conformal prediction technique to offer a coverage guarantee, thus improving the reliability and robustness of our predictions. By incorporating uncertainty sets derived from CQR-GAE, we substantially improve the decision-making process in route planning under a robust optimization framework. We demonstrate the effectiveness of our approach by applying the CQR-GAE model to a real-world traffic scenario. The results indicate that our model significantly outperforms baseline methods, offering a promising avenue for advancing intelligent transportation systems.
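
The conformal step that gives the coverage guarantee mentioned above can be sketched as follows. This is the generic conformalized quantile regression (CQR) calibration procedure, not the authors' CQR-GAE code: the graph autoencoder is omitted, and the lower and upper quantile predictions are assumed to come from some already-trained model. Function names and the example numbers are hypothetical.

```python
import math


def cqr_margin(lower, upper, y_true, alpha=0.1):
    """Calibrate a margin so widened intervals cover with probability >= 1 - alpha.

    lower, upper: predicted lower/upper quantiles on a held-out calibration set.
    y_true:       observed values (e.g. travel times) for the same samples.
    """
    n = len(y_true)
    if n == 0 or len(lower) != n or len(upper) != n:
        raise ValueError("calibration inputs must be non-empty and of equal length")

    # Conformity score: how far each observation falls outside its interval
    # (negative when the observation lies inside).
    scores = sorted(max(lo - y, y - hi) for lo, hi, y in zip(lower, upper, y_true))

    # Finite-sample corrected rank of the (1 - alpha) quantile.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return scores[rank - 1]


def cqr_interval(lo, hi, margin):
    """Widen (or shrink, if margin is negative) a predicted interval."""
    return lo - margin, hi + margin


# Hypothetical calibration data for one road segment.
margin = cqr_margin([0, 0, 0], [10, 10, 10], [5, 12, 13], alpha=0.5)
print(cqr_interval(1.0, 9.0, margin))
```

In a robust route-planning setting, intervals calibrated this way could serve as the per-edge uncertainty set, with the planner optimizing against the worst case inside each interval.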

Impact of buckypaper on the mechanical properties and failure modes of composites

Authors:Kartik Tripathi, Mohamed H. Hamza, Aditi Chattopadhyay, Todd C. Henry, Asha Hall
Date:2025-03-13 05:43:01

Recently, there has been an interest in the incorporation of buckypaper (BP), or carbon nanotube (CNT) membranes, in composite laminates. Research has shown that using BP in contrast to nanotube doped resin enables the introduction of a higher CNT weight fraction which offers multiple benefits including higher piezo resistivity for health monitoring applications and enhanced mechanical response for structural applications. However, their impact on the deformation and failure mechanisms of composite laminates has not been investigated thoroughly. Understanding these issues experimentally would require a carefully executed test plan involving a multitude of design parameters such as BP geometry and placement, material anisotropy and variability, and laminate stacking sequence. This paper presents a deep learning (DL)-based surrogate model for studying the mechanical response of hybrid carbon fiber reinforced polymer (CFRP) composite laminates with BP interleaves under various mechanical loads. The surrogate model utilizes a long short-term memory architecture implemented within a DL framework and predicts the laminate global response for a given configuration, geometry, and loading condition. The DL framework training and cross-validation are performed via data acquisition from a series of three-point bend tests conducted through finite element analysis (FEA) and in-house experiments, respectively. The model predictions show good agreement with FEA simulations and experimental results, where CFRP with two BP interleaves showed enhanced flexural strength and modulus over pristine samples. This enhancement can be attributed to the excellent crack retardation capabilities of CNTs, particularly in the interlaminar region.

SmartWay: Enhanced Waypoint Prediction and Backtracking for Zero-Shot Vision-and-Language Navigation

Authors:Xiangyu Shi, Zerui Li, Wenqi Lyu, Jiatong Xia, Feras Dayoub, Yanyuan Qiao, Qi Wu
Date:2025-03-13 05:32:57

Vision-and-Language Navigation (VLN) in continuous environments requires agents to interpret natural language instructions while navigating unconstrained 3D spaces. Existing VLN-CE frameworks rely on a two-stage approach: a waypoint predictor to generate waypoints and a navigator to execute movements. However, current waypoint predictors struggle with spatial awareness, while navigators lack historical reasoning and backtracking capabilities, limiting adaptability. We propose a zero-shot VLN-CE framework integrating an enhanced waypoint predictor with a Multi-modal Large Language Model (MLLM)-based navigator. Our predictor employs a stronger vision encoder, masked cross-attention fusion, and an occupancy-aware loss for better waypoint quality. The navigator incorporates history-aware reasoning and adaptive path planning with backtracking, improving robustness. Experiments on R2R-CE and MP3D benchmarks show our method achieves state-of-the-art (SOTA) performance in zero-shot settings, demonstrating competitive results compared to fully supervised methods. Real-world validation on Turtlebot 4 further highlights its adaptability.

Post-disaster building indoor damage and survivor detection using autonomous path planning and deep learning with unmanned aerial vehicles

Authors:Xiao Pan, Sina Tavasoli, T. Y. Yang, Sina Poorghasem
Date:2025-03-13 04:13:48

Rapid response to natural disasters such as earthquakes is a crucial element in ensuring the safety of civil infrastructures and minimizing casualties. Traditional manual inspection is labour-intensive, time-consuming, and can be dangerous for inspectors and rescue workers. This paper proposes an autonomous inspection approach for structural damage inspection and survivor detection in the post-disaster building indoor scenario, which incorporates an autonomous navigation method, a deep learning-based damage and survivor detection method, and a customized low-cost micro aerial vehicle (MAV) with onboard sensors. Experimental studies in a pseudo-post-disaster office building have shown the proposed methodology can achieve high accuracy in structural damage inspection and survivor detection. Overall, the proposed inspection approach shows great potential to improve the efficiency of existing manual post-disaster building inspection.

Combining Cooperative Re-Routing with Intersection Coordination for Connected and Automated Vehicles in Urban Networks

Authors:Panagiotis Typaldos, Andreas A. Malikopoulos
Date:2025-03-13 03:25:35

In this paper, we present a hierarchical framework that integrates upper-level routing with low-level optimal trajectory planning for connected and automated vehicles (CAVs) traveling in an urban network. The upper-level controller efficiently distributes traffic flows by utilizing a dynamic re-routing algorithm that leverages real-time density information and the fundamental diagrams of each network edge. This re-routing approach predicts when each edge will reach critical density and proactively adjusts the routing algorithm's weights to prevent congestion before it occurs. The low-level controller coordinates CAVs as they cross signal-free intersections, generating optimal, fuel-efficient trajectories while ensuring safe passage by satisfying all relevant constraints. We formulate the problem as an optimal control problem and derive an analytical solution. Using the SUMO micro-simulation platform, we conduct simulation experiments on a realistic network. The results show that our hierarchical framework significantly enhances network performance compared to a baseline static routing approach. By dynamically re-routing vehicles, our approach successfully reduces total travel time and mitigates congestion before it develops.

A Pharmacy Benefit Manager Insurance Business Model

Authors:Lawrence W. Abrams
Date:2025-03-13 01:13:16

It is time to move on from attempts to make the pharmacy benefit manager (PBM) reseller business model more transparent. Time and time again the Big 3 PBMs have developed opaque alternatives to piecemeal 100% pass-through mandates. Time and time again PBMs have demonstrated expertise in finding loopholes in state government disclosure laws. The purpose of this paper is to provide quantitative estimates of two transparent insurance business models as a solution to the PBM agency issue. The key parameter used is an 8% gross profit margin figure disclosed by the Big 3 PBMs themselves. Based on reported drug trend delivered to plans, we use $1,200 to $1,500 per member per year (PMPY) as the range for this key performance indicator (KPI). We propose that discussions of PBM insurance business models start with the following figures: (1) a fixed premium model with medical loss ratio ranging from 92% to 85%; (2) a fee-for-service model ranging from $96 to $180 PMPY with risk sharing of deviations from a contracted PMPY delivered drug spend.
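
The abstract does not spell out how the endpoints of the two proposals are derived. One reading that reproduces them, offered here only as an assumption, is that the margin equals 100% minus the medical loss ratio and that the fee is that margin applied to a $1,200 PMPY drug spend. The function names below are hypothetical.

```python
def implied_margin_pct(mlr_pct: int) -> int:
    """Gross margin implied by a medical loss ratio (assumption: margin = 100 - MLR)."""
    return 100 - mlr_pct


def pmpy_fee(spend_pmpy: float, margin_pct: float) -> float:
    """Fee per member per year as a percentage of PMPY drug spend."""
    return spend_pmpy * margin_pct / 100


low_fee = pmpy_fee(1200, implied_margin_pct(92))   # 8% margin on $1,200
high_fee = pmpy_fee(1200, implied_margin_pct(85))  # 15% margin on $1,200
print(low_fee, high_fee)
```

Under this reading the 92% to 85% loss-ratio range and the $96 to $180 fee range describe the same 8% to 15% margin band; applying the 8% margin to the upper spend figure of $1,500 would instead give $120.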

Developing and Evaluating an AI-Assisted Prediction Model for Unplanned Intensive Care Admissions following Elective Neurosurgery using Natural Language Processing within an Electronic Healthcare Record System

Authors:Julia Ive, Olatomiwa Olukoya, Jonathan P. Funnell, James Booker, Sze H M Lam, Ugan Reddy, Kawsar Noor, Richard JB Dobson, Astri M. V. Luoma, Hani J Marcus
Date:2025-03-13 00:48:48

Introduction: Timely care in a specialised neuro-intensive therapy unit (ITU) reduces mortality and hospital stays, with planned admissions being safer than unplanned ones. However, post-operative care decisions remain subjective. This study used artificial intelligence (AI), specifically natural language processing (NLP), to analyse electronic health records (EHRs) and predict ITU admissions for elective surgery patients. Methods: This study analysed the EHRs of elective neurosurgery patients from University College London Hospital (UCLH) using NLP. Patients were categorised into planned high dependency unit (HDU) or ITU admission; unplanned HDU or ITU admission; or ward / overnight recovery (ONR). The Medical Concept Annotation Tool (MedCAT) was used to identify SNOMED-CT concepts within the clinical notes. We then explored the utility of these identified concepts for a range of AI algorithms trained to predict ITU admission. Results: The CogStack-MedCAT NLP model, initially trained on hospital-wide EHRs, underwent two refinements: first with data from patients with Normal Pressure Hydrocephalus (NPH) and then with data from Vestibular Schwannoma (VS) patients, achieving a concept detection F1-score of 0.93. This refined model was then used to extract concepts from EHR notes of 2,268 eligible neurosurgical patients. We integrated the extracted concepts into AI models, including a decision tree model and a neural time-series model. Using the simpler decision tree model, we achieved a recall of 0.87 (CI 0.82 - 0.91) for ITU admissions, reducing the proportion of unplanned ITU cases missed by human experts from 36% to 4%. Conclusion: The NLP model, refined for accuracy, has proven its efficiency in extracting relevant concepts, providing a reliable basis for predictive AI models to use in clinically valid applications.

Vi-LAD: Vision-Language Attention Distillation for Socially-Aware Robot Navigation in Dynamic Environments

Authors:Mohamed Elnoor, Kasun Weerakoon, Gershom Seneviratne, Jing Liang, Vignesh Rajagopal, Dinesh Manocha
Date:2025-03-12 20:38:23

We introduce Vision-Language Attention Distillation (Vi-LAD), a novel approach for distilling socially compliant navigation knowledge from a large Vision-Language Model (VLM) into a lightweight transformer model for real-time robotic navigation. Unlike traditional methods that rely on expert demonstrations or human-annotated datasets, Vi-LAD performs knowledge distillation and fine-tuning at the intermediate layer representation level (i.e., attention maps) by leveraging the backbone of a pre-trained vision-action model. These attention maps highlight key navigational regions in a given scene, which serve as implicit guidance for socially aware motion planning. Vi-LAD fine-tunes a transformer-based model using intermediate attention maps extracted from the pre-trained vision-action model, combined with attention-like semantic maps constructed from a large VLM. To achieve this, we introduce a novel attention-level distillation loss that fuses knowledge from both sources, generating augmented attention maps with enhanced social awareness. These refined attention maps are then utilized as a traversability costmap within a socially aware model predictive controller (MPC) for navigation. We validate our approach through real-world experiments on a Husky wheeled robot, demonstrating significant improvements over state-of-the-art (SOTA) navigation methods. Our results show improvements of 14.2% to 50% in success rate, which highlights the effectiveness of Vi-LAD in enabling socially compliant and efficient robot navigation.

Temporal Difference Flows

Authors:Jesse Farebrother, Matteo Pirotta, Andrea Tirinzoni, Rémi Munos, Alessandro Lazaric, Ahmed Touati
Date:2025-03-12 20:30:07

Predictive models of the future are fundamental for an agent's ability to reason and plan. A common strategy learns a world model and unrolls it step-by-step at inference, where small errors can rapidly compound. Geometric Horizon Models (GHMs) offer a compelling alternative by directly making predictions of future states, avoiding cumulative inference errors. While GHMs can be conveniently learned by a generative analog to temporal difference (TD) learning, existing methods are negatively affected by bootstrapping predictions at train time and struggle to generate high-quality predictions at long horizons. This paper introduces Temporal Difference Flows (TD-Flow), which leverages the structure of a novel Bellman equation on probability paths alongside flow-matching techniques to learn accurate GHMs at over 5x the horizon length of prior methods. Theoretically, we establish a new convergence result and primarily attribute TD-Flow's efficacy to reduced gradient variance during training. We further show that similar arguments can be extended to diffusion-based methods. Empirically, we validate TD-Flow across a diverse set of domains on both generative metrics and downstream tasks including policy evaluation. Moreover, integrating TD-Flow with recent behavior foundation models for planning over pre-trained policies demonstrates substantial performance gains, underscoring its promise for long-horizon decision-making.
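
To make the prediction target of a geometric horizon model concrete, the sketch below draws a future state whose time offset follows a geometric distribution with parameter 1 - gamma, which is how the discounted future-state distribution is commonly defined. This is plain Monte Carlo sampling from a recorded trajectory, not TD-Flow itself; the function names and the truncation at the end of the trajectory are assumptions made for illustration.

```python
import random


def sample_horizon(gamma: float, rng: random.Random) -> int:
    """Draw an offset k >= 1 with P(k) = (1 - gamma) * gamma ** (k - 1)."""
    k = 1
    while rng.random() < gamma:
        k += 1
    return k


def sample_discounted_future_state(trajectory, t: int, gamma: float, rng: random.Random):
    """Return a state from the trajectory at a geometrically distributed offset after step t."""
    k = sample_horizon(gamma, rng)
    # Assumption: offsets that run past the end are clipped to the last recorded state.
    index = min(t + k, len(trajectory) - 1)
    return trajectory[index]


rng = random.Random(0)
offsets = [sample_horizon(0.9, rng) for _ in range(20000)]
print(sum(offsets) / len(offsets))  # close to the effective horizon 1 / (1 - gamma)
```

The mean offset grows as 1 / (1 - gamma), so a model that samples such states in a single step sidesteps the step-by-step unrolling whose errors compound.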

Integrated Experiment and Simulation Co-Design: A Key Infrastructure for Predictive Mesoscale Materials Modeling

Authors:Shailendra P. Joshi, Ashley Bucsek, Darren C. Pagan, Samantha Daly, Suraj Ravindran, Jaime Marian, Miguel A. Bessa, Surya R. Kalidindi, Nikhil C. Admal, Celia Reina, Somnath Ghosh, Jorge Vinals, Ellad B. Tadmor
Date:2025-03-12 19:55:34

The design of structural and functional materials for specialized applications is being fueled by rapid advancements in materials synthesis, characterization, and manufacturing, together with sophisticated computational materials modeling frameworks that span a wide spectrum of length and time scales in the mesoscale between atomistic and continuum approaches. This is leading towards a systems-based design methodology that will replace traditional empirical approaches, embracing the principles of the Materials Genome Initiative. However, several gaps remain in this framework as it relates to advanced structural materials: (1) limited availability of and access to high-fidelity experimental and computational datasets, (2) lack of co-design of experiments and simulation aimed at computational model validation, (3) lack of on-demand access to verified and validated codes for simulation and for experimental analyses, and (4) limited opportunities for workforce training and educational outreach. These shortcomings stifle major innovations in structural materials design. This paper describes plans for a community-driven research initiative that addresses current gaps based on best-practice recommendations of leaders in mesoscale modeling, experimentation, and cyberinfrastructure obtained at an NSF-sponsored workshop dedicated to this topic. The proposal is to create a hub for Mesoscale Experimentation and Simulation co-Operation (hMESO) that will (I) provide curation and sharing of models, data, and codes, (II) foster co-design of experiments for model validation with systematic uncertainty quantification, and (III) provide a platform for education and workforce development. It will engage experimental and computational experts in mesoscale mechanics and plasticity, along with mathematicians and computer scientists with expertise in algorithms, data science, machine learning, and large-scale cyberinfrastructure initiatives.

Local Look-Ahead Guidance via Verifier-in-the-Loop for Automated Theorem Proving

Authors:Sara Rajaee, Kumar Pratik, Gabriele Cesa, Arash Behboodi
Date:2025-03-12 18:20:47

The most promising recent methods for AI reasoning require applying variants of reinforcement learning (RL) either to rolled-out trajectories from the model, even for step-wise rewards, or to large quantities of human-annotated trajectory data. The reliance on rolled-out trajectories renders the compute cost and time prohibitively high. In particular, the correctness of a reasoning trajectory can typically only be judged at its completion, leading to sparse rewards in RL or requiring expensive synthetic data generation in expert iteration-like methods. In this work, we focus on the Automatic Theorem Proving (ATP) task and propose a novel verifier-in-the-loop design which, unlike existing approaches that leverage feedback on the entire reasoning trajectory, employs an automated verifier to give intermediate feedback at each step of the reasoning process. Using Lean as the verifier, we empirically show that step-by-step local verification produces a global improvement in the model's reasoning accuracy and efficiency.

The benefit of ignorance for traffic through a random congestible network

Authors:Alican Saray, Calvin Pozderac, Ari Josephson, Brian Skinner
Date:2025-03-12 18:00:00

When traffic is routed through a network that is susceptible to congestion, the self-interested decisions made by individual users do not, in general, produce the optimal flow. This discrepancy is quantified by the so-called "price of anarchy." Here we consider whether the traffic produced by self-interested users is made better or worse when users have uncertain knowledge about the cost functions of the links in the network, and we define a parallel concept that we call the "price of ignorance." We introduce a simple model in which fast, congestible links and slow, incongestible links are mixed randomly in a large network and users plan their routes with finite uncertainty about which of the two cost functions describes each link. One of our key findings is that a small level of user ignorance universally improves traffic, regardless of the network composition. Further, there is an optimal level of ignorance which, in our model, causes the self-interested user behavior to coincide with the optimum. Many features of our model can be understood analytically, including the optimal level of user ignorance and the existence of critical scaling near the percolation threshold for fast links, where the potential benefit of user ignorance is greatest.
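
The "price of anarchy" mentioned above can be illustrated with the classic two-link Pigou example, which pairs one fast, congestible link with one slow, incongestible link. This is a textbook toy case and not the random-network model of the paper; the cost functions (cost equal to the traffic fraction on the fast link, constant cost 1 on the slow link) are assumptions chosen for simplicity.

```python
def average_cost(x_fast: float) -> float:
    """Mean travel cost when a fraction x_fast of users takes the congestible link.

    Fast link cost equals its traffic fraction; slow link cost is always 1.
    """
    return x_fast * x_fast + (1.0 - x_fast) * 1.0


# Selfish equilibrium: the fast link never costs more than 1, so no user
# gains by switching to the slow link and everyone ends up on the fast one.
selfish_cost = average_cost(1.0)

# Social optimum: search over all splits of the traffic between the links.
fractions = [i / 1000 for i in range(1001)]
optimal_fraction = min(fractions, key=average_cost)
optimal_cost = average_cost(optimal_fraction)

price_of_anarchy = selfish_cost / optimal_cost
print(optimal_fraction, optimal_cost, price_of_anarchy)
```

In this toy case the optimum sends half of the users to the slow link, so any mechanism that diverts some users from the fast link, including mistaken beliefs about which link is which, can lower the average cost relative to the selfish outcome.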

Auspex: Building Threat Modeling Tradecraft into an Artificial Intelligence-based Copilot

Authors:Andrew Crossman, Andrew R. Plummer, Chandra Sekharudu, Deepak Warrier, Mohammad Yekrangian
Date:2025-03-12 17:54:18

We present Auspex - a threat modeling system built using a specialized collection of generative artificial intelligence-based methods that capture threat modeling tradecraft. This new approach, called tradecraft prompting, centers on encoding the on-the-ground knowledge of threat modelers within the prompts that drive a generative AI-based threat modeling system. Auspex employs tradecraft prompts in two processing stages. The first stage centers on ingesting and processing system architecture information using prompts that encode threat modeling tradecraft knowledge pertaining to system decomposition and description. The second stage centers on chaining the resulting system analysis through a collection of prompts that encode tradecraft knowledge on threat identification, classification, and mitigation. The two-stage process yields a threat matrix for a system that specifies threat scenarios, threat types, information security categorizations and potential mitigations. Auspex produces formalized threat model output in minutes, relative to the weeks or months a manual process takes. More broadly, the focus on bespoke tradecraft prompting, as opposed to fine-tuning or agent-based add-ons, makes Auspex a lightweight, flexible, modular, and extensible foundational system capable of addressing the complexity, resource, and standardization limitations of both existing manual and automated threat modeling processes. In this connection, we establish the baseline value of Auspex to threat modelers through an evaluation procedure based on feedback collected from cybersecurity subject matter experts measuring the quality and utility of threat models generated by Auspex on real banking systems. We conclude with a discussion of system performance and plans for enhancements to Auspex.

The Value of Goal Commitment in Planning

Authors:Alberto Pozanco, Marianela Morales, Daniel Borrajo, Manuela Veloso
Date:2025-03-12 17:00:37

In this paper, we revisit the concept of goal commitment from early planners in the presence of current forward chaining heuristic planners. We present a compilation that extends the original planning task with commit actions that enforce the persistence of specific goals once achieved, thereby committing to them in the search sub-tree. This approach imposes a specific goal achievement order in parts of the search tree, potentially introducing dead-end states. This can reduce search effort if the goal achievement order is correct. Otherwise, the search algorithm can expand nodes in the open list where goals do not persist. Experimental results demonstrate that the reformulated tasks suit state-of-the-art agile planners, enabling them to find better