planning - 2025-03-20

Exploiting Prior Knowledge in Preferential Learning of Individualized Autonomous Vehicle Driving Styles

Authors:Lukas Theiner, Sebastian Hirt, Alexander Steinke, Rolf Findeisen
Date:2025-03-19 16:47:56

Trajectory planning for automated vehicles commonly employs optimization over a moving horizon (Model Predictive Control), where the cost function critically influences the resulting driving style. However, finding a suitable cost function that results in a driving style preferred by passengers remains an ongoing challenge. We employ preferential Bayesian optimization to learn the cost function by iteratively querying a passenger's preference. Due to the increasing dimensionality of the parameter space, preference learning approaches might struggle to find a suitable optimum with a limited number of experiments and expose the passenger to discomfort when exploring the parameter space. We address these challenges by incorporating prior knowledge into the preferential Bayesian optimization framework. Our method constructs a virtual decision maker from real-world human driving data to guide parameter sampling. In a simulation experiment, we achieve faster convergence of the prior-knowledge-informed learning procedure compared to existing preferential Bayesian optimization approaches and reduce the number of inadequate driving styles sampled.
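Preferential Bayesian optimization commonly models each pairwise answer with a probit likelihood on a latent utility. The sketch below shows that likelihood together with a toy virtual decision maker. The utility form (negative squared distance to reference MPC cost weights, assumed to be estimated from recorded human driving), the noise level, and all function names are hypothetical, not the authors' construction, and no Gaussian-process surrogate is included.

```python
import math
import random
from typing import Callable, Sequence, Tuple


def probit_preference(u_a: float, u_b: float, noise_std: float = 1.0) -> float:
    """P(a preferred over b) = Phi((u_a - u_b) / (sqrt(2) * noise_std))."""
    z = (u_a - u_b) / (math.sqrt(2.0) * noise_std)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF of z


def make_virtual_decision_maker(
    reference_weights: Sequence[float], noise_std: float = 0.1, seed: int = 0
) -> Tuple[Callable[[Sequence[float]], float],
           Callable[[Sequence[float], Sequence[float]], bool]]:
    """Hypothetical virtual decision maker over MPC cost weights.

    Latent utility: negative squared distance to reference weights (assumed
    to come from human driving data). Answers are sampled from the probit model.
    """
    rng = random.Random(seed)

    def utility(weights: Sequence[float]) -> float:
        return -sum((w - r) ** 2 for w, r in zip(weights, reference_weights))

    def prefers(a: Sequence[float], b: Sequence[float]) -> bool:
        # Noisy duel: True means candidate a is preferred over candidate b.
        return rng.random() < probit_preference(utility(a), utility(b), noise_std)

    return utility, prefers
```

Such a decision maker could be queried offline to steer which cost weights are proposed to a real passenger, which is one way to read "guide parameter sampling".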

Advancing MG Energy Management: A Rolling Horizon Optimization Framework for Three-Phase Unbalanced Networks Integrating Convex Formulations

Authors:Pablo Cortés, Alejandra Tabares, Fredy Franco
Date:2025-03-19 16:34:49

Real-world three-phase microgrids face two interconnected challenges: 1. time-varying uncertainty from renewable generation and demand, and 2. persistent phase imbalances caused by an uneven distribution of distributed energy resources (DERs), load asymmetries, and grid faults. Conventional energy management systems fail to address these challenges holistically: static optimization methods lack adaptability to real-time fluctuations, while balanced three-phase models ignore critical asymmetries that degrade voltage stability and efficiency. This work introduces a dynamic rolling horizon optimization framework specifically designed for unbalanced three-phase microgrids. Unlike traditional two-stage stochastic approaches that fix decisions for the entire horizon, the rolling horizon algorithm iteratively updates decisions in response to real-time data. By solving a sequence of shorter optimization windows, each incorporating the latest system state and forecasts, the method achieves three key advantages: adaptive uncertainty handling, by continuously re-planning operations to mitigate forecast errors; phase imbalance correction, by dynamically adjusting power flows across phases to minimize voltage deviations and losses caused by asymmetries; and computational tractability, since shorter optimization windows, combined with the mathematical model, enable better decision-making while maintaining accuracy. For comparison purposes, we derive three optimization models: a nonlinear nonconvex model for high-fidelity offline planning, a convex quadratic approximation for day-ahead scheduling, and a linearized model that is important for theoretical purposes such as decomposition algorithms.
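The rolling-horizon loop (solve a short window with the latest forecast, commit only the first decision, shift, repeat) can be sketched on a toy battery dispatch. Everything here is an assumption for illustration: a single storage unit with integer charge steps, a cost equal to price times grid import, brute-force search in place of the paper's nonlinear, quadratic, or linearized models, and no phase imbalance at all.

```python
import itertools
from typing import Callable, List, Sequence, Tuple

ACTIONS = (-1, 0, 1)  # discharge one unit, idle, charge one unit (hypothetical units)


def plan_window(soc: int, net_forecast: Sequence[float], prices: Sequence[float],
                capacity: int = 4) -> Tuple[int, ...]:
    """Brute-force the cheapest feasible action sequence over one short window."""
    best_cost, best_seq = float("inf"), None
    for seq in itertools.product(ACTIONS, repeat=len(net_forecast)):
        s, cost, feasible = soc, 0.0, True
        for a, load, price in zip(seq, net_forecast, prices):
            s += a
            if not 0 <= s <= capacity:  # state-of-charge limits
                feasible = False
                break
            cost += price * max(load + a, 0.0)  # pay only for grid imports
        if feasible and cost < best_cost:
            best_cost, best_seq = cost, seq
    return best_seq


def rolling_horizon(actual_net: Sequence[float],
                    forecast_fn: Callable[[int, int], Sequence[float]],
                    prices: Sequence[float], soc: int = 2, capacity: int = 4,
                    window: int = 3) -> Tuple[List[int], int, float]:
    """Re-plan at every step with the latest forecast; apply only the first action."""
    applied: List[int] = []
    realized_cost = 0.0
    for t in range(len(actual_net)):
        h = min(window, len(actual_net) - t)  # window shrinks near the end
        seq = plan_window(soc, forecast_fn(t, h), prices[t:t + h], capacity)
        a = seq[0]  # commit only the first decision
        soc += a
        realized_cost += prices[t] * max(actual_net[t] + a, 0.0)  # cost under actual load
        applied.append(a)
    return applied, soc, realized_cost
```

Passing a `forecast_fn` that returns noisy values, rather than the actual series, shows how re-planning at every step absorbs forecast errors.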

Reinforcement Learning for Robust Athletic Intelligence: Lessons from the 2nd 'AI Olympics with RealAIGym' Competition

Authors:Felix Wiebe, Niccolò Turcato, Alberto Dalla Libera, Jean Seong Bjorn Choe, Bumkyu Choi, Tim Lukas Faust, Habib Maraqten, Erfan Aghadavoodi, Marco Cali, Alberto Sinigaglia, Giulio Giacomuzzo, Diego Romeres, Jong-kook Kim, Gian Antonio Susto, Shubham Vyas, Dennis Mronga, Boris Belousov, Jan Peters, Frank Kirchner, Shivesh Kumar
Date:2025-03-19 15:10:02

In the field of robotics, many different approaches, ranging from classical planning and optimal control to reinforcement learning (RL), are developed or borrowed from other fields to achieve reliable control in diverse tasks. To get a clear understanding of their individual strengths and weaknesses and their applicability in real-world robotic scenarios, it is important to benchmark and compare their performance not only in simulation but also on real hardware. The '2nd AI Olympics with RealAIGym' competition was held at the IROS 2024 conference to contribute to this cause and evaluate different controllers according to their ability to solve a dynamic control problem on an underactuated double pendulum system with chaotic dynamics. This paper describes the four different RL methods submitted by the participating teams, presents their performance in the swing-up task on a real double pendulum, measured against various criteria, and discusses their transferability from simulation to real hardware and their robustness to external disturbances.

Perception-aware Planning for Quadrotor Flight in Unknown and Feature-limited Environments

Authors:Chenxin Yu, Zihong Lu, Jie Mei, Boyu Zhou
Date:2025-03-19 14:47:44

Various studies on perception-aware planning have been proposed to enhance the state estimation accuracy of quadrotors in visually degraded environments. However, many existing methods heavily rely on prior environmental knowledge and face significant limitations in previously unknown environments with sparse localization features, which greatly limits their practical application. In this paper, we present a perception-aware planning method for quadrotor flight in unknown and feature-limited environments that properly allocates perception resources among environmental information during navigation. We introduce a viewpoint transition graph that allows for the adaptive selection of local target viewpoints, which guide the quadrotor to efficiently navigate to the goal while maintaining sufficient localizability and without being trapped in feature-limited regions. During the local planning, a novel yaw trajectory generation method that simultaneously considers exploration capability and localizability is presented. It constructs a localizable corridor via feature co-visibility evaluation to ensure localization robustness in a computationally efficient way. Through validations conducted in both simulation and real-world experiments, we demonstrate the feasibility and real-time performance of the proposed method. The source code will be released to benefit the community.

DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation

Authors:Jiazhe Guo, Yikang Ding, Xiwu Chen, Shuo Chen, Bohan Li, Yingshuang Zou, Xiaoyang Lyu, Feiyang Tan, Xiaojuan Qi, Zhiheng Li, Hao Zhao
Date:2025-03-19 13:49:48

Current generative models struggle to synthesize dynamic 4D driving scenes that simultaneously support temporal extrapolation and spatial novel view synthesis (NVS) without per-scene optimization. A key challenge lies in finding an efficient and generalizable geometric representation that seamlessly connects temporal and spatial synthesis. To address this, we propose DiST-4D, the first disentangled spatiotemporal diffusion framework for 4D driving scene generation, which leverages metric depth as the core geometric representation. DiST-4D decomposes the problem into two diffusion processes: DiST-T, which predicts future metric depth and multi-view RGB sequences directly from past observations, and DiST-S, which enables spatial NVS by training only on existing viewpoints while enforcing cycle consistency. This cycle consistency mechanism introduces a forward-backward rendering constraint, reducing the generalization gap between observed and unseen viewpoints. Metric depth is essential for both reliable forecasting and accurate spatial NVS, as it provides a view-consistent geometric representation that generalizes well to unseen perspectives. Experiments demonstrate that DiST-4D achieves state-of-the-art performance in both temporal prediction and NVS tasks, while also delivering competitive performance in planning-related evaluations.

Exploring the Perspectives of Social VR-Aware Non-Parent Adults and Parents on Children's Use of Social Virtual Reality

Authors:Cristina Fiani, Pejman Saeghe, Mark McGill, Mohamed Khamis
Date:2025-03-19 10:57:20

Social Virtual Reality (VR), where people meet in virtual spaces via 3D avatars, is used by children and adults alike. Children experience new forms of harassment in social VR, which is often inaccessible to parental oversight. To date, there is limited understanding of how parents and non-parent adults within the child social VR ecosystem perceive the appropriateness of social VR for different age groups and the measures in place to safeguard children. We present results of a mixed-methods questionnaire (N=149 adults, including 79 parents) focusing on encounters with children in social VR and perspectives towards children's use of social VR. We draw novel insights on the frequency of social VR use by children under 13 and current use of, and future aspirations for, child protection interventions. Compared to non-parent adults, parents familiar with social VR propose lower minimum ages and are more likely to allow social VR without supervision. Adult users experience immaturity from children in social VR, while children face abuse, encounter age-inappropriate behaviours and self-disclose to adults. We present directions to enhance the safety of social VR through pre-planned controls, real-time oversight, post-event insight and the need for evidence-based guidelines to support parents and platforms around age-appropriate interventions.

Embedding spatial context in urban traffic forecasting with contrastive pre-training

Authors:Matthew Low, Arian Prabowo, Hao Xue, Flora Salim
Date:2025-03-19 08:21:22

Urban traffic forecasting is a commonly encountered problem, with wide-ranging applications in fields such as urban planning, civil engineering and transport. In this paper, we study the enhancement of traffic forecasting with pre-training, focusing on spatio-temporal graph methods. While various machine learning methods for traffic forecasting have been explored and extensively studied, a more contextual approach remains underexplored: studying how relevant non-traffic data can improve prediction performance on traffic forecasting problems. We call this data spatial context. We introduce a novel method of combining road and traffic information through the notion of a traffic quotient graph, a quotient graph formed from road geometry and traffic sensors. We also define a way to encode this relationship in the form of a geometric encoder, pre-trained using contrastive learning methods and enhanced with OpenStreetMap data. We introduce and discuss ways to integrate this geometric encoder with existing graph neural network (GNN)-based traffic forecasting models, using a contrastive pre-training paradigm. We demonstrate the potential for this hybrid model to improve generalisation and performance with zero additional traffic data. Code for this paper is available at https://github.com/mattchrlw/forecasting-on-new-roads.
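A quotient graph collapses each equivalence class of nodes into a single node. As a hypothetical illustration of a traffic quotient graph, the sketch below assumes every road-graph node has already been assigned to one traffic sensor (for example, its nearest one) and links two sensors whenever a road edge crosses between their classes. The assignment rule, the edge-count weights, and the function name are assumptions, not the paper's definition.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def traffic_quotient_graph(
    road_edges: Iterable[Tuple[str, str]],
    node_to_sensor: Dict[str, str],
) -> Dict[Tuple[str, str], int]:
    """Collapse road nodes into sensor classes (hypothetical construction).

    Two sensors are linked if at least one road edge joins their classes;
    the weight counts how many road edges cross between them.
    """
    weights: Counter = Counter()
    for u, v in road_edges:
        su, sv = node_to_sensor[u], node_to_sensor[v]
        if su != sv:  # edges inside one class collapse away
            weights[tuple(sorted((su, sv)))] += 1
    return dict(weights)
```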

A Language Vision Model Approach for Automated Tumor Contouring in Radiation Oncology

Authors:Yi Luo, Hamed Hooshangnejad, Xue Feng, Gaofeng Huang, Xiaojian Chen, Rui Zhang, Quan Chen, Wil Ngwa, Kai Ding
Date:2025-03-19 06:41:37

Background: Lung cancer ranks as the leading cause of cancer-related mortality worldwide. The complexity of tumor delineation, crucial for radiation therapy, requires expertise often unavailable in resource-limited settings. Artificial Intelligence (AI), particularly with advancements in deep learning (DL) and natural language processing (NLP), offers potential solutions yet is challenged by high false positive rates. Purpose: The Oncology Contouring Copilot (OCC) system is developed to leverage oncologist expertise for precise tumor contouring using textual descriptions, aiming to increase the efficiency of oncological workflows by combining the strengths of AI with human oversight. Methods: Our OCC system initially identifies nodule candidates from CT scans. Employing Language Vision Models (LVMs) like GPT-4V, OCC then reduces false positives using clinical descriptive texts, merging textual and visual data to automate tumor delineation, and is designed to elevate the quality of oncology care by incorporating knowledge from experienced domain experts. Results: Deploying the OCC system reduced the false discovery rate by 35.0% and false positives per scan by 72.4%, and yielded an F1-score of 0.652 across our dataset for unbiased evaluation. Conclusions: OCC represents a significant advance in oncology care, particularly through the use of the latest LVMs to improve contouring results by (1) streamlining oncology treatment workflows by optimizing tumor delineation, reducing manual processes; (2) offering a scalable and intuitive framework to reduce false positives in radiotherapy planning using LVMs; (3) introducing novel medical language vision prompt techniques to minimize LVM hallucinations, supported by an ablation study; and (4) conducting a comparative analysis of LVMs, highlighting their potential in addressing medical language vision challenges.
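The reported metrics follow standard definitions, which the sketch below spells out so the relationship between false discovery rate, false positives per scan, and F1 is explicit. The counts in the usage lines are hypothetical and only illustrate the arithmetic; they are not the paper's data.

```python
def detection_metrics(tp: int, fp: int, fn: int, n_scans: int) -> dict:
    """Standard detection metrics from true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "false_discovery_rate": fp / (tp + fp),  # equals 1 - precision
        "false_positives_per_scan": fp / n_scans,
    }


def relative_reduction(before: float, after: float) -> float:
    """Fractional decrease, e.g. 0.35 for a 35% reduction."""
    return (before - after) / before


# Hypothetical counts before and after false-positive filtering.
before = detection_metrics(tp=60, fp=140, fn=20, n_scans=50)
after = detection_metrics(tp=58, fp=40, fn=22, n_scans=50)
print(round(relative_reduction(before["false_positives_per_scan"],
                               after["false_positives_per_scan"]), 3))
```

Because the false discovery rate equals one minus precision, filtering that removes false positives while keeping most true positives lowers it and can raise F1 at the same time.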

Speed Optimization Algorithm based on Deterministic Markov Decision Process for Automated Highway Merge

Authors:Takeru Goto, Kosuke Toda, Takayasu Kumano
Date:2025-03-19 04:57:03

This study presents a robust optimization algorithm for automated highway merge. Merging is one of the challenging scenarios in automated driving, because it requires adjusting the ego vehicle's speed to match other vehicles before reaching the end point. We model the speed planning problem as a deterministic Markov decision process. The proposed scheme is able to compute each state value of the process and reliably derive the optimal sequence of actions. In our approach, we adopt jerk as the action of the process to prevent sudden changes in acceleration. However, since this expands the state space, we also consider ways to achieve real-time operation. We compare our scheme with a simple algorithm based on the Intelligent Driver Model. We evaluate the scheme not only in a simulation environment but also in real-world testing.
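Because the process is deterministic, state values can be computed by backward dynamic programming, and the optimal action sequence can be read off by following them forward. The sketch below uses jerk as the action, as the abstract describes. The integer state grid, the bounds, the unit time step, and the quadratic costs (jerk penalty plus terminal speed error) are assumptions. The merge gap, the surrounding vehicles, and the paper's real-time measures are omitted.

```python
from functools import lru_cache
from typing import Iterator, List, Tuple

JERKS = (-1, 0, 1)      # action: change in acceleration per step (integer units)
A_MIN, A_MAX = -2, 2    # acceleration bounds (assumed)
V_MIN, V_MAX = 0, 30    # speed bounds (assumed)


def plan_speed(v0: int, a0: int, v_target: int, horizon: int,
               jerk_weight: float = 1.0, terminal_weight: float = 10.0
               ) -> Tuple[float, List[int], int]:
    """Backward DP over a deterministic MDP with state (step, speed, acceleration)."""

    def successors(v: int, a: int) -> Iterator[Tuple[int, int, int]]:
        # Deterministic transition: a' = a + j, v' = v + a' (unit time step).
        for j in JERKS:
            a2, v2 = a + j, v + a + j
            if A_MIN <= a2 <= A_MAX and V_MIN <= v2 <= V_MAX:
                yield j, v2, a2

    @lru_cache(maxsize=None)
    def value(k: int, v: int, a: int) -> float:
        """Optimal cost-to-go from (v, a) at step k."""
        if k == horizon:
            return terminal_weight * (v - v_target) ** 2
        return min((jerk_weight * j * j + value(k + 1, v2, a2)
                    for j, v2, a2 in successors(v, a)), default=float("inf"))

    # Roll forward, always taking the action that attains the state value.
    actions, v, a = [], v0, a0
    for k in range(horizon):
        j, v, a = min(successors(v, a),
                      key=lambda s: jerk_weight * s[0] ** 2 + value(k + 1, s[1], s[2]))
        actions.append(j)
    return value(0, v0, a0), actions, v
```

For example, `plan_speed(10, 0, 14, 4)` raises speed from 10 to 14 with a single jerk step followed by constant acceleration. Memoizing `value` is what keeps the enlarged state space (speed and acceleration) tractable in this toy.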

Geometric Iterative Approach for Efficient Inverse Kinematics and Planning of Continuum Robots with a Floating Base Under Environment Constraints

Authors:Congjun Ma, Quan Xiao, Liangcheng Liu, Xingxing You, Songyi Dian
Date:2025-03-19 03:12:34

Continuum robots with floating bases demonstrate exceptional operational capabilities in confined spaces, such as those encountered in medical surgeries and equipment maintenance. However, developing low-cost solutions for their motion and planning problems remains a significant challenge in this field. This paper investigates the application of geometric iterative strategy methods to continuum robots and proposes an algorithm based on an improved two-layer geometric iterative strategy for motion planning. First, we thoroughly study the kinematics and effective workspace of a multi-segment tendon-driven continuum robot with a floating base. Then, generalized iterative algorithms for arbitrary-segment continuum robots are proposed to address problems such as the initial arm shape dependence exhibited by similar methods when applied to continuum robots. Further, the task scenario is extended to a follow-the-leader task that considers environmental factors, and a correspondingly extended algorithm is proposed. Simulation comparisons with similar methods demonstrate the effectiveness of the proposed method in eliminating the initial arm shape dependence and improving solution efficiency and accuracy. The experimental results further demonstrate that the method based on improved two-layer geometric iteration can be used for motion planning of a continuum robot with a floating base, with an average end-position deviation of about 4 mm and an average orientation deviation of no more than 1 degree, while reducing the average number of iterations by 127.4 and the average time cost by 72.6 ms compared with similar methods.
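The abstract does not specify the two-layer geometric iterative strategy in enough detail to reproduce. As a swapped-in illustration, the sketch below shows FABRIK, a well-known geometric iterative inverse-kinematics method for planar rigid-link chains. A floating-base continuum robot is not a rigid-link chain, so this only illustrates the general idea of alternating backward and forward geometric passes. For redundant chains, the solution it reaches depends on the initial arm shape.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _place(anchor: Point, toward: Point, length: float) -> Point:
    """Point at distance `length` from `anchor` along the direction to `toward`."""
    d = math.dist(anchor, toward)
    r = length / d  # assumes the two points do not coincide
    return (anchor[0] + r * (toward[0] - anchor[0]),
            anchor[1] + r * (toward[1] - anchor[1]))


def fabrik(joints: Sequence[Point], target: Point, tol: float = 1e-6,
           max_iter: int = 100) -> Tuple[List[Point], int]:
    """FABRIK for a planar rigid-link chain with a fixed base."""
    lengths = [math.dist(joints[i], joints[i + 1]) for i in range(len(joints) - 1)]
    base, pts = joints[0], list(joints)
    for it in range(1, max_iter + 1):
        # Backward pass: pin the end effector to the target, walk to the base.
        pts[-1] = target
        for i in range(len(pts) - 2, -1, -1):
            pts[i] = _place(pts[i + 1], pts[i], lengths[i])
        # Forward pass: re-pin the base and walk back out (restores link lengths).
        pts[0] = base
        for i in range(len(pts) - 1):
            pts[i + 1] = _place(pts[i], pts[i + 1], lengths[i])
        if math.dist(pts[-1], target) < tol:
            return pts, it
    return pts, max_iter
```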

Learning with Expert Abstractions for Efficient Multi-Task Continuous Control

Authors:Jeff Jewett, Sandhya Saisubramanian
Date:2025-03-19 00:44:23

Decision-making in complex, continuous multi-task environments is often hindered by the difficulty of obtaining accurate models for planning and the inefficiency of learning purely from trial and error. While precise environment dynamics may be hard to specify, human experts can often provide high-fidelity abstractions that capture the essential high-level structure of a task and user preferences in the target environment. Existing hierarchical approaches often target discrete settings and do not generalize across tasks. We propose a hierarchical reinforcement learning approach that addresses these limitations by dynamically planning over the expert-specified abstraction to generate subgoals for learning a goal-conditioned policy. To overcome the challenges of learning under sparse rewards, we shape the reward based on the optimal state value in the abstract model. This structured decision-making process enhances sample efficiency and facilitates zero-shot generalization. Our empirical evaluation on a suite of procedurally generated continuous control environments demonstrates that our approach outperforms existing hierarchical reinforcement learning methods in terms of sample efficiency, task completion rate, scalability to complex tasks, and generalization to novel scenarios.
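One common way to shape rewards with a state value is potential-based shaping, F = gamma * Phi(s') - Phi(s), which is known to preserve optimal policies. The sketch below takes Phi as the optimal value of the abstract state, computed by value iteration on a small deterministic abstract model. The potential-based form, the step reward, the discount, and the room-style abstraction are assumptions, not necessarily the paper's exact formulation.

```python
from typing import Callable, Dict, Hashable, List


def abstract_values(transitions: Dict[str, List[str]], goal: str, gamma: float = 0.9,
                    step_reward: float = -1.0, iters: int = 100) -> Dict[str, float]:
    """Value iteration on a deterministic abstract MDP: {state: [next states]}."""
    V = {s: 0.0 for s in transitions}
    for _ in range(iters):
        for s, nexts in transitions.items():
            # The goal is absorbing with value 0; elsewhere each step costs 1.
            V[s] = 0.0 if s == goal else max(step_reward + gamma * V[n] for n in nexts)
    return V


def shaped_reward(r: float, s: Hashable, s_next: Hashable,
                  abstraction: Callable[[Hashable], str],
                  V: Dict[str, float], gamma: float = 0.9) -> float:
    """Environment reward plus potential-based term using abstract optimal values."""
    return r + gamma * V[abstraction(s_next)] - V[abstraction(s)]
```

With rooms A, B, C (goal) chained in a line, a continuous transition that leaves room A for room B earns a positive shaping bonus even though the sparse environment reward is zero.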

Generative design of functional organic molecules for terahertz radiation detection

Authors:Zsuzsanna Koczor-Benda, Shayantan Chaudhuri, Joe Gilkes, Francesco Bartucca, Liming Li, Reinhard J. Maurer
Date:2025-03-18 21:23:01

Plasmonic nanocavities are molecule-nanoparticle junctions that offer a promising approach to upconvert terahertz radiation into visible or near-infrared light, enabling nanoscale detection at room temperature. However, the identification of molecules with strong terahertz-to-visible upconversion efficiency is limited by the availability of suitable compounds in commercial databases. Here, we employ the generative autoregressive deep neural network, G-SchNet, to perform property-driven design of novel monothiolated molecules tailored for terahertz radiation detection. To design functional organic molecules, we iteratively bias G-SchNet to drive molecular generation towards highly active and synthesizable molecules based on machine learning-based property predictors, including molecular fingerprints and state-of-the-art neural networks. We study the reliability of these property predictors for generated molecules and analyze the chemical space and properties of generated molecules to identify trends in activity. Finally, we filter generated molecules and plan retrosynthetic routes from commercially available reactants to identify promising novel compounds and their most active vibrational modes in terahertz-to-visible upconversion.

Risk-Aware Planning of Power Distribution Systems Using Scalable Cloud Technologies

Authors:Shiva Poudel, Poorva Sharma, Abhineet Parchure, Daniel Olsen, Sayantan Bhowmik, Tonya Martin, Dylan Locsin, Andrew P. Reiman
Date:2025-03-18 21:00:17

The uncertainty in distribution grid planning is driven by the unpredictable spatial and temporal patterns in adopting electric vehicles (EVs) and solar photovoltaic (PV) systems. This complexity, stemming from interactions among EVs, PV systems, customer behavior, and weather conditions, calls for a scalable framework to capture a full range of possible scenarios and analyze grid responses to factor in compound uncertainty. Although this process is challenging for many utilities today, the need to model numerous grid parameters as random variables and evaluate the impact on the system from many different perspectives will become increasingly essential to facilitate more strategic and well-informed planning investments. We present a scalable, stochastic-aware distribution system planning application that addresses these uncertainties by capturing spatial and temporal variability through a Markov model and conducting Monte Carlo simulations leveraging modular cloud-based architecture. The results show that 15,000 power flow scenarios generated from the Markov model, each an 8,760-hour time series run on the modified IEEE 123-bus test feeder, are completed in under an hour. The grid impact extracted from this huge volume of simulated data provides insights into the spatial and temporal effects of adopted technology, highlighting that planning solely for average conditions is inadequate, while worst-case scenario planning may lead to prohibitive expenses.
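The Markov-plus-Monte-Carlo workflow can be illustrated with a deliberately small model. Each customer follows a two-state adoption chain (not adopted, adopted) with an assumed yearly adoption probability, many scenarios are sampled, and the mean is compared with a high percentile and the worst case. The homogeneous probability, the absorbing chain, and the function names are assumptions. The paper's model captures spatial and temporal variability and feeds each scenario into a full power-flow time series, which is omitted here.

```python
import random
import statistics
from typing import Tuple


def simulate_adoption(n_customers: int, p_adopt: float, years: int,
                      rng: random.Random) -> int:
    """One scenario: each customer follows an absorbing two-state Markov chain."""
    adopted = [False] * n_customers
    for _ in range(years):
        for i in range(n_customers):
            if not adopted[i] and rng.random() < p_adopt:
                adopted[i] = True  # adoption is never reversed
    return sum(adopted)


def monte_carlo(n_scenarios: int, n_customers: int, p_adopt: float, years: int,
                seed: int = 0) -> Tuple[float, int, int]:
    """Return mean, 95th-percentile, and worst-case adopter counts."""
    rng = random.Random(seed)
    counts = sorted(simulate_adoption(n_customers, p_adopt, years, rng)
                    for _ in range(n_scenarios))
    p95 = counts[int(0.95 * (n_scenarios - 1))]
    return statistics.fmean(counts), p95, counts[-1]
```

Comparing the mean with the 95th percentile and the worst case mirrors the abstract's point: sizing for the average alone leaves many scenarios uncovered, while sizing for the single worst scenario is the most conservative choice.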

These Magic Moments: Differentiable Uncertainty Quantification of Radiance Field Models

Authors:Parker Ewen, Hao Chen, Seth Isaacson, Joey Wilson, Katherine A. Skinner, Ram Vasudevan
Date:2025-03-18 19:12:02

This paper introduces a novel approach to uncertainty quantification for radiance fields by leveraging higher-order moments of the rendering equation. Uncertainty quantification is crucial for downstream tasks including view planning and scene understanding, where safety and robustness are paramount. However, the high dimensionality and complexity of radiance fields pose significant challenges for uncertainty quantification, limiting the use of these uncertainty quantification methods in high-speed decision-making. We demonstrate that the probabilistic nature of the rendering process enables efficient and differentiable computation of higher-order moments for radiance field outputs, including color, depth, and semantic predictions. Our method outperforms existing radiance field uncertainty estimation techniques while offering a more direct, computationally efficient, and differentiable formulation without the need for post-processing. Beyond uncertainty quantification, we also illustrate the utility of our approach in downstream applications such as next-best-view (NBV) selection and active ray sampling for neural radiance field training. Extensive experiments on synthetic and real-world scenes confirm the efficacy of our approach, which achieves state-of-the-art performance while maintaining simplicity.
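In standard volume rendering, ray sample i at distance t_i receives weight w_i = T_i * (1 - exp(-sigma_i * delta_i)), where T_i is the accumulated transmittance. The rendered depth is sum(w_i * t_i). Treating the normalized weights as a distribution over depth yields higher-order moments directly, for instance a depth variance E[t^2] - E[t]^2. The sketch assumes this reading and renormalizes by the total weight. It illustrates the idea and is not the paper's derivation.

```python
import math
from typing import Sequence, Tuple


def depth_moments(ts: Sequence[float], sigmas: Sequence[float],
                  deltas: Sequence[float]) -> Tuple[float, float]:
    """Mean and variance of ray depth under normalized volume-rendering weights."""
    weights, transmittance = [], 1.0
    for sigma, delta in zip(sigmas, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha            # light surviving past the segment
    total = sum(weights)
    if total == 0.0:
        return 0.0, 0.0                         # empty ray: no depth information
    m1 = sum(w * t for w, t in zip(weights, ts)) / total
    m2 = sum(w * t * t for w, t in zip(weights, ts)) / total
    return m1, m2 - m1 * m1
```

Every operation is a smooth function of the densities `sigmas`, so an autodiff framework could differentiate the variance through the same expressions.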

Safety-Critical and Distributed Nonlinear Predictive Controllers for Teams of Quadrupedal Robots

Authors:Basit Muhammad Imran, Jeeseop Kim, Taizoon Chunawala, Alexander Leonessa, Kaveh Akbari Hamed
Date:2025-03-18 19:05:57

This paper presents a novel hierarchical, safety-critical control framework that integrates distributed nonlinear model predictive controllers (DNMPCs) with control barrier functions (CBFs) to enable cooperative locomotion of multi-agent quadrupedal robots in complex environments. While NMPC-based methods are widely adopted for enforcing safety constraints and navigating multi-robot systems (MRSs) through intricate environments, ensuring the safety of MRSs requires a formal definition grounded in the concept of invariant sets. CBFs, typically implemented via quadratic programs (QPs) at the planning layer, provide formal safety guarantees. However, their zero-control horizon limits their effectiveness for extended trajectory planning in inherently unstable, underactuated, and nonlinear legged robot models. Furthermore, the integration of CBFs into real-time NMPC for sophisticated MRSs, such as quadrupedal robot teams, remains underexplored. This paper develops computationally efficient, distributed NMPC algorithms that incorporate CBF-based collision safety guarantees within a consensus protocol, enabling longer planning horizons for safe cooperative locomotion under disturbances and rough terrain conditions. The optimal trajectories generated by the DNMPCs are tracked using full-order, nonlinear whole-body controllers at the low level. The proposed approach is validated through extensive numerical simulations with up to four Unitree A1 robots and hardware experiments involving two A1 robots subjected to external pushes, rough terrain, and uncertain obstacle information. Comparative analysis demonstrates that the proposed CBF-based DNMPCs achieve a 27.89% higher success rate than conventional NMPCs without CBF constraints.
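A discrete-time CBF condition that an MPC can impose along its horizon is h(x_{k+1}) >= (1 - gamma) * h(x_k) with 0 < gamma <= 1. If h(x_0) >= 0, this gives h(x_k) >= (1 - gamma)^k * h(x_0) >= 0, so the safe set {h >= 0} stays forward invariant. The sketch below checks the condition on predicted planar positions of two robots, using h = ||p_i - p_j||^2 - d_min^2. The barrier choice, gamma, and the function names are assumptions. In the paper's setting, the condition would be a constraint inside the distributed NMPC rather than a post-hoc check.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def barrier(p_i: Point, p_j: Point, d_min: float) -> float:
    """h >= 0 exactly when robots i and j are at least d_min apart."""
    return (p_i[0] - p_j[0]) ** 2 + (p_i[1] - p_j[1]) ** 2 - d_min ** 2


def dcbf_satisfied(traj_i: Sequence[Point], traj_j: Sequence[Point],
                   d_min: float, gamma: float) -> bool:
    """Check h(x_{k+1}) >= (1 - gamma) * h(x_k) at every step of the horizon."""
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    h = [barrier(p, q, d_min) for p, q in zip(traj_i, traj_j)]
    return all(h_next >= (1.0 - gamma) * h_now for h_now, h_next in zip(h, h[1:]))
```

A smaller gamma lets the barrier shrink more slowly per step, which forces earlier and gentler evasive motion over the horizon.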

Measurement of SiPM Dark Currents and Annealing Recovery for Fluences Expected in ePIC Calorimeters at the Electron-Ion Collider

Authors:Jiajun Huang, Sean Preins, Ryan Tsiao, Miguel Rodriguez, Barak Schmookler, Miguel Arratia
Date:2025-03-18 18:19:45

Silicon photomultipliers (SiPMs) will be used to read out all calorimeters in the ePIC experiment at the Electron-Ion Collider (EIC). A thorough characterization of the radiation damage expected for SiPMs under anticipated EIC fluences is essential for accurate simulations, detector design, and effective operational strategies. In this study, we evaluate radiation damage for the specific SiPM models chosen for ePIC across the complete fluence range anticipated at the EIC, $10^8$ to $10^{12}$ 1-MeV $n_{\mathrm{eq}}$/cm$^2$ per year, depending on the calorimeter location. The SiPMs were irradiated using a 64 MeV proton beam provided by the University of California, Davis 76" Cyclotron. We measured the SiPM dark-current as a function of fluence and bias voltage and investigated the effectiveness of high-temperature annealing to recover radiation damage. These results provide a comprehensive reference for the design, simulation, and operational planning of all ePIC calorimeter systems.

Tracking Meets Large Multimodal Models for Driving Scenario Understanding

Authors:Ayesha Ishaq, Jean Lahoud, Fahad Shahbaz Khan, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer
Date:2025-03-18 17:59:12

Large Multimodal Models (LMMs) have recently gained prominence in autonomous driving research, showcasing promising capabilities across various emerging benchmarks. LMMs specifically designed for this domain have demonstrated effective perception, planning, and prediction skills. However, many of these methods underutilize 3D spatial and temporal elements, relying mainly on image data. As a result, their effectiveness in dynamic driving environments is limited. We propose to integrate tracking information as an additional input to recover 3D spatial and temporal details that are not effectively captured in the images. We introduce a novel approach for embedding this tracking information into LMMs to enhance their spatiotemporal understanding of driving scenarios. By incorporating 3D tracking data through a track encoder, we enrich visual queries with crucial spatial and temporal cues while avoiding the computational overhead associated with processing lengthy video sequences or extensive 3D inputs. Moreover, we employ a self-supervised approach to pretrain the track encoder to provide LMMs with additional contextual information, significantly improving their performance in perception, planning, and prediction tasks for autonomous driving. Experimental results demonstrate the effectiveness of our approach, with a gain of 9.5% in accuracy, an increase of 7.04 points in the ChatGPT score, and a 9.4% increase in the overall score over baseline models on the DriveLM-nuScenes benchmark, along with a 3.7% final score improvement on DriveLM-CARLA. Our code is available at https://github.com/mbzuai-oryx/TrackingMeetsLMM

VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms

Authors:Seungwon Lim, Sungwoong Kim, Jihwan Yu, Sungjae Lee, Jiwan Chung, Youngjae Yu
Date:2025-03-18 16:59:09

Escape rooms present a unique cognitive challenge that demands exploration-driven planning: players should actively search their environment, continuously update their knowledge based on new discoveries, and connect disparate clues to determine which elements are relevant to their objectives. Motivated by this, we introduce VisEscape, a benchmark of 20 virtual escape rooms specifically designed to evaluate AI models under these challenging conditions, where success depends not only on solving isolated puzzles but also on iteratively constructing and refining spatial-temporal knowledge of a dynamically changing environment. On VisEscape, we observed that even state-of-the-art multimodal models generally fail to escape the rooms, showing considerable variation in their levels of progress and trajectories. To address this issue, we propose VisEscaper, which effectively integrates Memory, Feedback, and ReAct modules, demonstrating significant improvements by performing 3.7 times more effectively and 5.0 times more efficiently on average.

VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation

Authors:Shoubin Yu, Difan Liu, Ziqiao Ma, Yicong Hong, Yang Zhou, Hao Tan, Joyce Chai, Mohit Bansal
Date:2025-03-18 15:31:12

Recent video diffusion models have enhanced video editing, but it remains challenging to handle instructional editing and diverse tasks (e.g., adding, removing, changing) within a unified framework. In this paper, we introduce VEGGIE, a Video Editor with Grounded Generation from Instructions, a simple end-to-end framework that unifies video concept editing, grounding, and reasoning based on diverse user instructions. Specifically, given a video and text query, VEGGIE first utilizes an MLLM to interpret user intentions in instructions and ground them to the video contexts, generating frame-specific grounded task queries for pixel-space responses. A diffusion model then renders these plans and generates edited videos that align with user intent. To support diverse tasks and complex instructions, we employ a curriculum learning strategy: first aligning the MLLM and video diffusion model with large-scale instructional image editing data, followed by end-to-end fine-tuning on high-quality multitask video data. Additionally, we introduce a novel data synthesis pipeline to generate paired instructional video editing data for model training. It transforms static image data into diverse, high-quality video editing samples by leveraging Image-to-Video models to inject dynamics. VEGGIE shows strong performance in instructional video editing with different editing skills, outperforming the best instructional baseline as a versatile model, while other models struggle with multi-tasking. VEGGIE also excels in video object grounding and reasoning segmentation, where other baselines fail. We further reveal how the multiple tasks help each other and highlight promising applications like zero-shot multimodal instructional and in-context video editing.

ADAPT: An Autonomous Forklift for Construction Site Operation

Authors:Johannes Huemer, Markus Murschitz, Matthias Schörghuber, Lukas Reisinger, Thomas Kadiofsky, Christoph Weidinger, Mario Niedermeyer, Benedikt Widy, Marcel Zeilinger, Csaba Beleznai, Tobias Glück, Andreas Kugi, Patrik Zips
Date:2025-03-18 15:03:28

Efficient material logistics play a critical role in controlling costs and schedules in the construction industry. However, manual material handling remains prone to inefficiencies, delays, and safety risks. Autonomous forklifts offer a promising solution to streamline on-site logistics, reducing reliance on human operators and mitigating labor shortages. This paper presents the development and evaluation of the Autonomous Dynamic All-terrain Pallet Transporter (ADAPT), a fully autonomous off-road forklift designed for construction environments. Unlike structured warehouse settings, construction sites pose significant challenges, including dynamic obstacles, unstructured terrain, and varying weather conditions. To address these challenges, our system integrates AI-driven perception techniques with traditional approaches for decision making, planning, and control, enabling reliable operation in complex environments. We validate the system through extensive real-world testing, comparing its long-term performance against an experienced human operator across various weather conditions. We also provide a comprehensive analysis of challenges and key lessons learned, contributing to the advancement of autonomous heavy machinery. Our findings demonstrate that autonomous outdoor forklifts can operate near human-level performance, offering a viable path toward safer and more efficient construction logistics.

Risk-Sensitive Model Predictive Control for Interaction-Aware Planning -- A Sequential Convexification Algorithm

Authors:Renzi Wang, Mathijs Schuurmans, Panagiotis Patrinos
Date:2025-03-18 15:01:37

This paper considers risk-sensitive model predictive control for stochastic systems with a decision-dependent distribution. This class of systems is commonly found in human-robot interaction scenarios. We derive computationally tractable convex upper bounds to both the objective function and frequently used penalty terms for collision avoidance, allowing us to efficiently solve the generally nonconvex optimal control problem as a sequence of convex problems. Simulations of a robot navigating a corridor demonstrate the effectiveness and the computational advantage of the proposed approach.
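
The general idea of solving a nonconvex problem as a sequence of convex problems built from upper bounds can be illustrated on a toy scalar problem. The sketch below is a generic majorization-minimization (convex-concave procedure) loop, not the authors' risk-sensitive bounds. It minimizes the hypothetical objective f(x) = x^4 - 2x^2 by replacing the concave term -2x^2 with its tangent line, which is a convex upper bound that is tight at the current iterate. All names are illustrative.

```python
import math


def objective(x: float) -> float:
    """Hypothetical nonconvex objective f(x) = x^4 - 2x^2 (minima at x = -1 and x = 1)."""
    return x**4 - 2.0 * x**2


def solve_convex_surrogate(x_k: float) -> float:
    """Minimize the convex upper bound of f constructed at x_k.

    The concave term -2x^2 lies below its tangent at x_k:
        -2x^2 <= -2x_k^2 - 4x_k (x - x_k),
    so the surrogate x^4 - 4x_k x + 2x_k^2 is convex and touches f at x_k.
    Setting its derivative 4x^3 - 4x_k to zero gives x = cbrt(x_k).
    """
    return math.copysign(abs(x_k) ** (1.0 / 3.0), x_k)


def sequential_convexification(x0: float, tol: float = 1e-10, max_iter: int = 200):
    """Solve a sequence of convex surrogates; f never increases along the iterates."""
    history = [x0]
    x = x0
    for _ in range(max_iter):
        x_new = solve_convex_surrogate(x)
        history.append(x_new)
        converged = abs(x_new - x) < tol
        x = x_new
        if converged:
            break
    return x, history


x_star, hist = sequential_convexification(0.5)
print(round(x_star, 6))
```

Each step guarantees f(x_{k+1}) <= surrogate(x_{k+1}) <= surrogate(x_k) = f(x_k), which is why bounding the nonconvex terms from above yields descent. Starting exactly at x0 = 0, a stationary point, the iteration stays there: such schemes guarantee descent, not global optimality.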

An Assessment of the UK Government Clean Energy Strategy for the Year 2030

Authors:Anthony D. Stephens, David R. Walwyn
Date:2025-03-18 14:48:06

In 2024, the UK Government made two striking announcements on its plans to decarbonise the energy system; it pledged GBP22 billion to establish carbon capture and storage hubs on Teesside and Merseyside and released the Clean Power 2030 Action Plan. This paper questions the validity of both plans, arguing that they do not take adequate account of the consequences of the highly variable nature of wind and solar generation. Using dynamic models of future UK electricity systems which are designed to take account of these variabilities, it is shown that the Clean Power 2030 Action Plan overestimates the ability of wind and solar generation to decarbonise the electricity system as they increase in size relative to the demand of the electricity system. More importantly, the dynamic models show that most of the achievable decarbonization is the result of increasing wind generation from the current level of around 10 GW to around 20 GW. Increasing wind generation to only 20 GW, rather than to 30 GW as proposed in the Action Plan, should halve the proposed cost, a saving of perhaps GBP 120 billion, with little disbenefit in terms of reduced decarbonization. Furthermore, the dynamic modelling shows that UK gas storage capacity of 7.5 winter days looks hopelessly inadequate in comparison with the storage capacities deemed necessary by its continental neighbors. Concern is expressed that a consequence of the Climate Change Act of 2008 requiring the UK to meet arbitrary decarbonization targets is leading government advisors to propose several unproven and therefore highly risky technological solutions.

RoMedFormer: A Rotary-Embedding Transformer Foundation Model for 3D Genito-Pelvic Structure Segmentation in MRI and CT

Authors:Yuheng Li, Mingzhe Hu, Richard L. J. Qiu, Maria Thor, Andre Williams, Deborah Marshall, Xiaofeng Yang
Date:2025-03-18 14:45:05

Deep learning-based segmentation of genito-pelvic structures in MRI and CT is crucial for applications such as radiation therapy, surgical planning, and disease diagnosis. However, existing segmentation models often struggle to generalize across imaging modalities and anatomical variations. In this work, we propose RoMedFormer, a rotary-embedding transformer-based foundation model designed for 3D female genito-pelvic structure segmentation in both MRI and CT. RoMedFormer leverages self-supervised learning and rotary positional embeddings to enhance spatial feature representation and capture long-range dependencies in 3D medical data. We pre-train our model using a diverse dataset of 3D MRI and CT scans and fine-tune it for downstream segmentation tasks. Experimental results demonstrate that RoMedFormer achieves superior performance in segmenting genito-pelvic organs. Our findings highlight the potential of transformer-based architectures in medical image segmentation and pave the way for more transferable segmentation frameworks.
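
Rotary positional embeddings encode position by rotating pairs of feature dimensions through position-dependent angles, so that query-key dot products depend only on relative offsets. The abstract does not state how RoMedFormer extends them to 3D. The sketch below assumes one common choice: the feature vector is split into three equal chunks, one per axis (z, y, x), and standard 1D RoPE is applied to each chunk. The function names and the base 10000 are illustrative assumptions.

```python
import math
from typing import List, Sequence


def rope_1d(vec: Sequence[float], pos: float, base: float = 10000.0) -> List[float]:
    """Rotate each pair (v[i], v[i+1]), i even, by the angle pos * base^(-i/d)."""
    d = len(vec)
    assert d % 2 == 0, "feature chunk length must be even"
    out: List[float] = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = vec[i], vec[i + 1]
        out.extend([a * c - b * s, a * s + b * c])
    return out


def rope_3d(vec: Sequence[float], coord: Sequence[float], base: float = 10000.0) -> List[float]:
    """Hypothetical 3D extension: apply 1D RoPE per axis to three equal chunks of vec."""
    d = len(vec)
    assert d % 6 == 0, "length must split into three even-sized chunks"
    k = d // 3
    out: List[float] = []
    for axis in range(3):
        out.extend(rope_1d(vec[axis * k:(axis + 1) * k], coord[axis], base))
    return out


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))
```

Because each 2x2 rotation satisfies R(a)^T R(b) = R(b - a), shifting both voxel coordinates by the same offset leaves the attention score unchanged, which is the relative-position property RoPE is designed to provide.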

Stochastic Trajectory Prediction under Unstructured Constraints

Authors:Hao Ma, Zhiqiang Pu, Shijie Wang, Boyin Liu, Huimu Wang, Yanyan Liang, Jianqiang Yi
Date:2025-03-18 12:27:59

Trajectory prediction facilitates effective planning and decision-making, while constrained trajectory prediction integrates regulation into prediction. Recent advances in constrained trajectory prediction focus on structured constraints by constructing optimization objectives. However, handling unstructured constraints is challenging due to the lack of differentiable formal definitions. To address this, we propose a novel method for constrained trajectory prediction using a conditional generative paradigm, named Controllable Trajectory Diffusion (CTD). The key idea is that any trajectory corresponds to a degree of conformity to a constraint. By quantifying this degree and treating it as a condition, a model can implicitly learn to predict trajectories under unstructured constraints. CTD employs a pre-trained scoring model to predict the degree of conformity (i.e., a score), and uses this score as a condition for a conditional diffusion model to generate trajectories. Experimental results demonstrate that CTD achieves high accuracy on the ETH/UCY and SDD benchmarks. Qualitative analysis confirms that CTD ensures adherence to unstructured constraints and can predict trajectories that satisfy combinatorial constraints.

Variable Time-Step MPC for Agile Multi-Rotor UAV Interception of Dynamic Targets

Authors:Atharva Ghotavadekar, František Nekovář, Martin Saska, Jan Faigl
Date:2025-03-18 11:59:24

Agile trajectory planning can improve the efficiency of multi-rotor Uncrewed Aerial Vehicles (UAVs) in scenarios that combine task-oriented and kinematic trajectory planning, such as monitoring spatio-temporal phenomena or intercepting dynamic targets. Agile planning with existing nonlinear model predictive control methods is limited by the number of planning steps, because the computational demand grows with that number. This forces a shorter prediction horizon, which lowers solution quality. Moreover, a fixed time-step length limits how fully the available UAV dynamics can be exploited in the target neighborhood. In this paper, we propose to address these limitations by introducing variable time steps and coupling them with the prediction horizon length. A simplified point-mass motion primitive is used to leverage the differential flatness of quadrotor dynamics and to generate feasible trajectories in the flat output space. Based on the presented evaluation results and experimentally validated deployment, the proposed method improves solution quality by enabling planning over long flight segments while still allowing tightly sampled maneuvering.
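
One way to picture coupling variable time steps with the horizon length is to keep the number of MPC steps N fixed and let the step lengths grow geometrically, so that the steps at one end are finely sampled while the horizon still spans a long flight segment. This is an illustrative assumption, not the paper's scheme. Given N, a minimum step dt_min, and a desired horizon T, the sketch finds the growth ratio r by bisection such that dt_min * (1 + r + ... + r^(N-1)) = T. The function names are hypothetical.

```python
from typing import List


def geometric_steps(n_steps: int, dt_min: float, ratio: float) -> List[float]:
    """Step lengths dt_min, dt_min * r, dt_min * r^2, ..."""
    return [dt_min * ratio**k for k in range(n_steps)]


def variable_time_grid(n_steps: int, dt_min: float, horizon: float, tol: float = 1e-12) -> List[float]:
    """Return n_steps increasing step lengths starting at dt_min and summing to horizon.

    Requires n_steps >= 2 and horizon >= n_steps * dt_min (so the ratio is >= 1).
    The total duration is monotone in the ratio, so bisection on the ratio converges.
    """
    if n_steps < 2:
        raise ValueError("need at least two steps")
    if horizon < n_steps * dt_min:
        raise ValueError("horizon too short for the requested minimum step")
    lo, hi = 1.0, 2.0
    # Grow the upper bracket until it covers the requested horizon.
    while sum(geometric_steps(n_steps, dt_min, hi)) < horizon:
        hi *= 2.0
    # Bisection: lo always undershoots the horizon, hi always reaches it.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(geometric_steps(n_steps, dt_min, mid)) < horizon:
            lo = mid
        else:
            hi = mid
    return geometric_steps(n_steps, dt_min, hi)


steps = variable_time_grid(20, 0.05, 10.0)
print(len(steps), round(sum(steps), 6))
```

With the same 20 steps and a fixed 0.05 s step, the horizon would cover only 1 s; the variable grid covers 10 s at the same optimization size. Reversing the list would place the fine steps at the end of the horizon instead, if that is where precise maneuvering is needed.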

Bridging Past and Future: End-to-End Autonomous Driving with Historical Prediction and Planning

Authors:Bozhou Zhang, Nan Song, Xin Jin, Li Zhang
Date:2025-03-18 11:57:31

End-to-end autonomous driving unifies tasks in a differentiable framework, enabling planning-oriented optimization and attracting growing attention. Current methods aggregate historical information either through dense historical bird's-eye-view (BEV) features or by querying a sparse memory bank, following paradigms inherited from detection. However, we argue that these paradigms either omit historical information in motion planning or fail to align with its multi-step nature, which requires predicting or planning multiple future time steps. In line with the philosophy that the future is a continuation of the past, we propose BridgeAD, which reformulates motion and planning queries as multi-step queries to differentiate the queries for each future time step. This design enables the effective use of historical prediction and planning by applying them to the appropriate parts of the end-to-end system based on the time steps, which improves both perception and motion planning. Specifically, historical queries for the current frame are combined with perception, while queries for future frames are integrated with motion planning. In this way, we bridge the gap between past and future by aggregating historical insights at every time step, enhancing the overall coherence and accuracy of the end-to-end autonomous driving pipeline. Extensive experiments on the nuScenes dataset in both open-loop and closed-loop settings demonstrate that BridgeAD achieves state-of-the-art performance.

What elements should we focus when designing immersive virtual nature? A preliminary user study

Authors:Lin Ma, Qiyuan An, Jing Chen, Xinggang Hou, Yuan Feng, Dengkai Chen
Date:2025-03-18 11:39:31

Extensive research has confirmed the positive relationship between exposure to natural environments and human cognitive, behavioral, physical, and mental health. However, not everyone has easy access to nature. With advances in electronic information and simulation technology, digital nature experiences are widely used across various devices and scenarios. It is therefore essential to explore how to effectively select and utilize natural elements to guide the design of digital nature scenes. This paper examines critical elements in immersive virtual nature (IVN) and their impact on user perception. Through online surveys and design experiments, we identified specific natural elements that promote relaxation and proposed design strategies for virtual environments. We developed several immersive virtual nature scenes for further validation. Finally, we outline our future experimental plans and research directions in digital nature. Our research aims to provide HCI designers with insights into creating restorative, immersive virtual scenes.

GPU-Accelerated Motion Planning of an Underactuated Forestry Crane in Cluttered Environments

Authors:Minh Nhat Vu, Gerald Ebmer, Alexander Watcher, Marc-Philip Ecker, Giang Nguyen, Tobias Glueck
Date:2025-03-18 11:31:20

Autonomous large-scale machine operations require fast, efficient, and collision-free motion planning while addressing unique challenges such as hydraulic actuation limits and underactuated joint dynamics. This paper presents a novel two-step motion planning framework designed for an underactuated forestry crane. The first step employs GPU-accelerated stochastic optimization to rapidly compute a globally shortest collision-free path. The second step refines this path into a dynamically feasible trajectory using a trajectory optimizer that ensures compliance with system dynamics and actuation constraints. The proposed approach is benchmarked against conventional techniques, including RRT-based methods and purely optimization-based approaches. Simulation results demonstrate substantial improvements in computation speed and motion feasibility, making this method highly suitable for complex crane systems.

WebNav: An Intelligent Agent for Voice-Controlled Web Navigation

Authors:Trisanth Srinivasan, Santosh Patapati
Date:2025-03-18 02:33:27

The increasing reliance on web interfaces presents many challenges for visually impaired users, showing the need for more advanced assistive technologies. This paper introduces WebNav, a voice-controlled web navigation agent that leverages a ReAct-inspired architecture and generative AI to meet this need. WebNav has a hierarchical structure consisting of a Digital Navigation Module (DIGNAV) for high-level strategic planning, an Assistant Module for translating abstract commands into executable actions, and an Inference Module for low-level interaction. A key component is a dynamic labeling engine, implemented as a browser extension, that generates real-time labels for interactive elements, creating a mapping between voice commands and Document Object Model (DOM) components. Preliminary evaluations show that WebNav outperforms traditional screen readers in response time and task completion accuracy for visually impaired users. Future work will focus on extensive user evaluations, benchmark development, and refining the agent's adaptive capabilities for real-world deployment.
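
The dynamic labeling engine can be pictured as assigning short numeric labels to a page's interactive elements and resolving a spoken command against that label table. WebNav implements this as a browser extension. The sketch below is a hypothetical, offline Python stand-in that uses the standard-library `html.parser` to collect interactive elements from an HTML string and map a transcribed command such as "click 2" to an element. The tag set, command grammar, and function names are assumptions, not WebNav's implementation.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

# Assumed set of tags treated as interactive (illustrative only).
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}

Labels = Dict[int, Tuple[str, Dict[str, Optional[str]]]]


class InteractiveElementLabeler(HTMLParser):
    """Collect interactive elements in document order and give each a 1-based label."""

    def __init__(self) -> None:
        super().__init__()
        self.labels: Labels = {}

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # Self-closing tags such as <input/> also reach this method via handle_startendtag.
        if tag in INTERACTIVE_TAGS:
            self.labels[len(self.labels) + 1] = (tag, dict(attrs))


def label_page(html: str) -> Labels:
    parser = InteractiveElementLabeler()
    parser.feed(html)
    parser.close()
    return parser.labels


# Hypothetical command grammar: "<action> [number] <label>".
COMMAND_RE = re.compile(r"^\s*(click|open|focus)\s+(?:number\s+)?(\d+)\s*$", re.IGNORECASE)


def resolve_command(transcript: str, labels: Labels):
    """Map e.g. 'click 2' to (action, tag, attrs); return None if unparseable or unknown."""
    match = COMMAND_RE.match(transcript)
    if match is None:
        return None
    action, label = match.group(1).lower(), int(match.group(2))
    if label not in labels:
        return None
    tag, attrs = labels[label]
    return action, tag, attrs


page = '<a href="/home">Home</a><p>text</p><button id="go">Search</button>'
print(resolve_command("Click 2", label_page(page)))
```

In a real extension, the labels would be recomputed whenever the DOM changes and rendered next to each element so that the user can speak them; here the table is rebuilt only when `label_page` is called.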

Counterfactual experience augmented off-policy reinforcement learning

Authors:Sunbowen Lee, Yicheng Gong, Chao Deng
Date:2025-03-18 02:32:50

Reinforcement learning control algorithms face significant challenges from out-of-distribution data and inefficient exploration. While model-based reinforcement learning enhances the agent's reasoning and planning capabilities by constructing virtual environments, training such virtual environments can be very complex. To build an efficient inference model and make the learning data more representative, we propose the Counterfactual Experience Augmentation (CEA) algorithm. CEA leverages variational autoencoders to model the dynamic patterns of state transitions and introduces randomness to model non-stationarity. This approach focuses on expanding the learning data in the experience pool through counterfactual inference and performs exceptionally well in environments that follow the bisimulation assumption. Because environments with bisimulation properties are usually represented by discrete observation and action spaces, we propose a sampling method based on maximum kernel density estimation entropy to extend CEA to various environments. By providing reward signals for counterfactual state transitions based on real information, CEA constructs a complete counterfactual experience to alleviate the out-of-distribution problem of the learning data, and outperforms general SOTA algorithms in environments with different properties. Finally, we discuss the similarities, differences, and properties of generated counterfactual experiences and real experiences. The code is available at https://github.com/Aegis1863/CEA.
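
The abstract does not spell out the sampling rule, so the sketch below is one plausible reading, labeled as an assumption: estimate the entropy of the sampled set with a Gaussian kernel density estimate (resubstitution estimator, H ≈ -(1/n) Σ log p̂(x_i)) and greedily pick the candidate whose addition maximizes this estimate. This favors candidates in sparsely covered regions of a continuous space. The bandwidth and function names are hypothetical; the authors' repository is the authoritative source for CEA's actual procedure.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def gaussian_kde_logpdf(x: Point, data: List[Point], bandwidth: float) -> float:
    """Log density at x of an isotropic Gaussian KDE fitted to data."""
    d = len(x)
    log_norm = -0.5 * d * math.log(2.0 * math.pi * bandwidth**2) - math.log(len(data))
    exponents = [
        -sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * bandwidth**2) for y in data
    ]
    # Log-sum-exp over kernels for numerical stability.
    m = max(exponents)
    return log_norm + m + math.log(sum(math.exp(e - m) for e in exponents))


def kde_entropy(data: List[Point], bandwidth: float) -> float:
    """Resubstitution entropy estimate: -(1/n) * sum_i log p_hat(x_i)."""
    return -sum(gaussian_kde_logpdf(x, data, bandwidth) for x in data) / len(data)


def select_max_entropy(samples: List[Point], candidates: List[Point], bandwidth: float = 0.5) -> Point:
    """Return the candidate whose addition maximizes the KDE entropy estimate."""
    return max(candidates, key=lambda c: kde_entropy(list(samples) + [c], bandwidth))


samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
print(select_max_entropy(samples, [(0.05, 0.05), (3.0, 3.0)]))
```

The greedy rule picks the far-away candidate because it receives low density under the KDE, which raises the average negative log density of the augmented set.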