planning - 2025-05-05

How Effective are Large Time Series Models in Hydrology? A Study on Water Level Forecasting in Everglades

Authors:Rahuul Rangaraj, Jimeng Shi, Azam Shirali, Rajendra Paudel, Yanzhao Wu, Giri Narasimhan
Date:2025-05-02 17:48:20

The Everglades play a crucial role in flood and drought regulation, water resource planning, and ecosystem management in the surrounding regions. However, traditional physics-based and statistical methods for predicting water levels often face significant challenges, including high computational costs and limited adaptability to diverse or unforeseen conditions. Recent advancements in large time series models have demonstrated the potential to address these limitations, with state-of-the-art deep learning and foundation models achieving remarkable success in time series forecasting across various domains. Despite this progress, their application to critical environmental systems, such as the Everglades, remains underexplored. In this study, we fill this gap by investigating twelve task-specific models and five time series foundation models across six categories for a real-world application focused on water level prediction in the Everglades. Our primary results show that the foundation model Chronos significantly outperforms all other models, while the remaining foundation models exhibit relatively poor performance. Moreover, the performance of task-specific models varies with the model architecture. Lastly, we discuss possible reasons for the varying performance of the models.
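
As a concrete illustration of the zero-shot use of a time series foundation model like Chronos on stage (water level) data, here is a minimal sketch; it assumes the open-source chronos-forecasting package and its ChronosPipeline API, and uses a synthetic series as a stand-in for real gauge data.

    import torch
    from chronos import ChronosPipeline  # pip install chronos-forecasting

    # Zero-shot probabilistic forecast: no fine-tuning on Everglades data.
    pipeline = ChronosPipeline.from_pretrained(
        "amazon/chronos-t5-small", device_map="cpu", torch_dtype=torch.float32
    )
    context = torch.sin(torch.arange(200) / 10.0) + 2.0   # synthetic water-level series
    samples = pipeline.predict(context=context, prediction_length=24)
    median_forecast = samples.quantile(0.5, dim=1)        # point forecast from samples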

Non-Standard Neutrino Interactions at Neutrino Experiments and Colliders

Authors:Ayres Freitas, Matthew Low
Date:2025-05-02 17:24:12

The impact of new physics on the interactions of neutrinos with other particles can be parametrized by a set of effective four-fermion operators called non-standard neutrino interactions (NSIs). This NSI framework is useful for studying the complementarity between different types of neutrino experiments. In this work, we further compare the reach of neutrino experiments with high-energy collider experiments. Since high-energy colliders often probe the mass scale associated with the four-fermion operators, the effective field theory approach becomes invalid and explicit models must be utilized. We study a variety of representative simplified models including new U(1) gauge bosons, scalar leptoquarks, and heavy neutral leptons. For each of these, we examine the model parameter space constrained by NSI bounds from current and future neutrino experiments, and by data from the Large Hadron Collider and planned electron-positron and muon colliders. We find that in the models we study, with the possible exceptions of muon-philic leptoquarks and heavy neutral leptons mixing with electron or muon neutrinos, collider searches are more constraining than neutrino measurements. Additionally, we briefly comment on other model building possibilities for obtaining models where neutrino experiments are most constraining.

Dynamic Robot Tool Use with Vision Language Models

Authors:Noah Trupin, Zixing Wang, Ahmed H. Qureshi
Date:2025-05-02 17:20:46

Tool use enhances a robot's task capabilities. Recent advances in vision-language models (VLMs) have equipped robots with sophisticated cognitive capabilities for tool-use applications. However, existing methodologies focus on elementary quasi-static tool manipulations or high-level tool selection while neglecting the critical aspect of task-appropriate tool grasping. To address this limitation, we introduce inverse Tool-Use Planning (iTUP), a novel VLM-driven framework that enables grounded fine-grained planning for versatile robotic tool use. Through an integrated pipeline of VLM-based tool and contact point grounding, position-velocity trajectory planning, and physics-informed grasp generation and selection, iTUP demonstrates versatility across (1) quasi-static tasks as well as the more challenging (2) dynamic and (3) cluster tool-use tasks. To ensure robust planning, our framework integrates stable and safe task-aware grasping by reasoning over semantic affordances and physical constraints. We evaluate iTUP and baselines on a comprehensive range of realistic tool-use tasks including precision hammering, object scooping, and cluster sweeping. Experimental results demonstrate that iTUP ensures a thorough grounding of cognition and planning for challenging robot tool use across diverse environments.

An Efficient Real-Time Planning Method for Swarm Robotics Based on an Optimal Virtual Tube

Authors:Pengda Mao, Shuli Lv, Chen Min, Zhaolong Shen, Quan Quan
Date:2025-05-02 16:41:38

Swarm robotics navigating through unknown obstacle environments is an emerging research area that faces numerous challenges. Performing tasks in such environments requires swarms to achieve autonomous localization, perception, decision-making, control, and planning. The limited computational resources of onboard platforms present significant challenges for planning and control. Reactive planners offer low computational demands and high re-planning frequencies but lack predictive capabilities, often resulting in local minima. Long-horizon planners, on the other hand, can perform multi-step predictions to reduce deadlocks but incur substantial computational cost, leading to lower re-planning frequencies. This paper proposes a real-time optimal virtual tube planning method for swarm robotics in unknown environments, which generates approximate solutions for optimal trajectories through affine functions. As a result, the computational complexity of the approximate solutions is $O(n_t)$, where $n_t$ is the number of parameters in the trajectory, thereby significantly reducing the overall computational burden. By integrating reactive methods, the proposed method enables low-computation, safe swarm motion in unknown environments. The effectiveness of the proposed method is validated through several simulations and experiments.
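
To make the affine-function approximation concrete, here is a purely illustrative sketch (the feature vector, the offline-fit matrices, and all names are our assumptions, not the paper's): once an affine map from local planning features to trajectory parameters has been fit offline, each online re-plan costs a single matrix-vector product, i.e., time linear in the number of trajectory parameters $n_t$ for a fixed feature size.

    import numpy as np

    n_t, n_f = 30, 6                    # trajectory parameters, feature size
    rng = np.random.default_rng(0)
    A = rng.normal(size=(n_t, n_f))     # stand-ins for offline-fit affine terms
    b = rng.normal(size=n_t)

    def approx_optimal_trajectory(features):
        # Affine approximation theta ~= A @ z + b: O(n_t) work per re-plan.
        return A @ features + b

    theta = approx_optimal_trajectory(rng.normal(size=n_f))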

Power System Transition Planning: An Industry-Aligned Framework for Long-Term Optimization

Authors:Ahmed Al-Shafei, Nima Amjady, Hamidreza Zareipour, Yankai Cao
Date:2025-05-02 15:05:27

This work introduces the Power System Transition Planning category of optimization problems, which aims to shift power systems to emissions-free networks efficiently. Unlike comparable work, the framework presented here broadly applies to the industry's decision-making process. It defines a field-appropriate functional boundary focused on the economic efficiency of power systems. Namely, while imposing a wide range of planning factors in the decision space, the model maintains the structure and depth of conventional power system planning under uncertainty, which leads to a large-scale multistage stochastic programming formulation that becomes intractable in real-life cases. Thus, the framework invokes high-performance computing by default. In this comprehensive exposition, we present a guideline model, comparing its scope to existing formulations, supported by a fully detailed example problem that showcases the analytical value of the solution gained in a small test case. Then, the framework's viability for realistic applications is demonstrated by solving an extensive test case based on a realistic planning construct consistent with Alberta's power system practices for long-term planning studies. The framework resorts to Stochastic Dual Dynamic Programming as a decomposition method to achieve tractability, leveraging High-Performance Computing and parallel computation.

2DXformer: Dual Transformers for Wind Power Forecasting with Dual Exogenous Variables

Authors:Yajuan Zhang, Jiahai Jiang, Yule Yan, Liang Yang, Ping Zhang
Date:2025-05-02 14:00:48

Accurate wind power forecasting can help formulate scientific dispatch plans, which is of great significance for maintaining the safe, stable, and efficient operation of the power system. In recent years, wind power forecasting methods based on deep learning have focused on extracting the spatiotemporal correlations among data, achieving significant improvements in forecasting accuracy. However, they exhibit two limitations. First, there is a lack of modeling for the inter-variable relationships, which limits forecasting accuracy. Second, treating endogenous and exogenous variables equally leads to unnecessary interactions between them, increasing the complexity of the model. In this paper, we propose 2DXformer, which builds upon the previous work's focus on spatiotemporal correlations and addresses the two limitations above. Specifically, we classify the inputs of the model into three types: exogenous static variables, exogenous dynamic variables, and endogenous variables. First, we embed these variables as variable tokens in a channel-independent manner. Then, we use the attention mechanism to capture the correlations among exogenous variables. Finally, we employ a multi-layer perceptron with residual connections to model the impact of exogenous variables on endogenous variables. Experimental results on two real-world large-scale datasets indicate that our proposed 2DXformer can further improve the performance of wind power forecasting. The code is available at: https://github.com/jseaj/2DXformer
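
The division of labor the abstract describes (attention only among exogenous variable tokens, then a residual MLP carrying their influence into the endogenous tokens) can be sketched in a few lines of PyTorch; the module and all its names are our illustrative reconstruction, not the authors' code.

    import torch
    import torch.nn as nn

    class ExoEndoBlock(nn.Module):
        # Attention runs only among exogenous tokens; a residual MLP then
        # injects the pooled exogenous context into the endogenous tokens,
        # avoiding direct endogenous-exogenous attention.
        def __init__(self, d_model, n_heads=4):
            super().__init__()
            self.exo_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.mlp = nn.Sequential(
                nn.Linear(2 * d_model, d_model), nn.GELU(), nn.Linear(d_model, d_model)
            )

        def forward(self, exo, endo):
            # exo: (batch, n_exo, d_model), endo: (batch, n_endo, d_model)
            ctx, _ = self.exo_attn(exo, exo, exo)   # correlations among exogenous vars
            ctx = ctx.mean(dim=1, keepdim=True).expand(-1, endo.size(1), -1)
            return endo + self.mlp(torch.cat([endo, ctx], dim=-1))  # residual update

    out = ExoEndoBlock(d_model=32)(torch.randn(2, 5, 32), torch.randn(2, 1, 32))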

Essential Workers at Risk: An Agent-Based Model (SAFE-ABM) with Bayesian Uncertainty Quantification

Authors:Elizabeth B. Amona, Indranil Sahoo, Ya Su, Edward L. Boone, Gwendoline Nelis, Ryad Ghanam
Date:2025-05-02 13:09:29

Essential workers face elevated infection risks due to their critical roles during pandemics, and protecting them remains a significant challenge for public health planning. This study develops SAFE-ABM, a simulation-based framework using Agent-Based Modeling (ABM), to evaluate targeted intervention strategies, explicitly capturing structured interactions across families, workplaces, and schools. We simulate key scenarios such as unrestricted movement, school closures, mobility restrictions specific to essential workers, and workforce rotation, to assess their impact on disease transmission dynamics. To ensure robust uncertainty assessment, we integrate a novel Bayesian Uncertainty Quantification (UQ) framework, systematically capturing variability in transmission rates, recovery times, and mortality estimates. Our comparative analysis demonstrates that while general mobility restrictions reduce overall transmission, a workforce rotation strategy for essential workers, when combined with quarantine enforcement, most effectively limits workplace outbreaks and secondary family infections. Unlike other interventions, this approach preserves a portion of the susceptible population, resulting in a more controlled and sustainable epidemic trajectory. These findings offer critical insights for optimizing intervention strategies that mitigate disease spread while maintaining essential societal functions.

mwBTFreddy: A Dataset for Flash Flood Damage Assessment in Urban Malawi

Authors:Evelyn Chapuma, Grey Mengezi, Lewis Msasa, Amelia Taylor
Date:2025-05-02 13:06:19

This paper describes the mwBTFreddy dataset, a resource developed to support flash flood damage assessment in urban Malawi, specifically focusing on the impacts of Cyclone Freddy in 2023. The dataset comprises paired pre- and post-disaster satellite images sourced from Google Earth Pro, accompanied by JSON files containing labelled building annotations with geographic coordinates and damage levels (no damage, minor, major, or destroyed). Developed by the Kuyesera AI Lab at the Malawi University of Business and Applied Sciences, this dataset is intended to facilitate the development of machine learning models tailored to building detection and damage classification in African urban contexts. It also supports flood damage visualisation and spatial analysis to inform decisions on relocation, infrastructure planning, and emergency response in climate-vulnerable regions.

Can Foundation Models Really Segment Tumors? A Benchmarking Odyssey in Lung CT Imaging

Authors:Elena Mulero Ayllón, Massimiliano Mantegna, Linlin Shen, Paolo Soda, Valerio Guarrasi, Matteo Tortora
Date:2025-05-02 13:04:01

Accurate lung tumor segmentation is crucial for improving diagnosis, treatment planning, and patient outcomes in oncology. However, the complexity of tumor morphology, size, and location poses significant challenges for automated segmentation. This study presents a comprehensive benchmarking analysis of deep learning-based segmentation models, comparing traditional architectures such as U-Net and DeepLabV3, self-configuring models like nnUNet, and foundation models like MedSAM and MedSAM 2. Evaluating performance across two lung tumor segmentation datasets, we assess segmentation accuracy and computational efficiency under various learning paradigms, including few-shot learning and fine-tuning. The results reveal that while traditional models struggle with tumor delineation, foundation models, particularly MedSAM 2, outperform them in both accuracy and computational efficiency. These findings underscore the potential of foundation models for lung tumor segmentation, highlighting their applicability in improving clinical workflows and patient outcomes.

The CRESST experiment: towards the next-generation of sub-GeV direct dark matter detection

Authors:G. Angloher, S. Banik, A. Bento, A. Bertolini, R. Breier, C. Bucci, J. Burkhart, L. Canonica, E. R. Cipelli, S. Di Lorenzo, J. Dohm, F. Dominsky, L. Einfalt, A. Erb, E. Fascione, F. v. Feilitzsch, S. Fichtinger, D. Fuchs, V. M. Ghete, P. Gorla, P. V. Guillaumon, D. Hauff, M. Jeskovsky, J. Jochum, M. Kaznacheeva, H. Kluck, H. Kraus, B. v. Krosigk, A. Langenkaemper, M. Mancuso, B. Mauri, V. Mokina, C. Moore, P. Murali, M. Olmi, T. Ortmann, C. Pagliarone, L. Pattavina, F. Petricca, W. Potzel, P. Povinec, F. Proebst, F. Pucci, F. Reindl, J. Rothe, K. Schaeffner, J. Schieck, S. Schoenert, C. Schwertner, M. Stahlberg, L. Stodolsky, C. Strandhagen, R. Strauss, I. Usherov, D. Valdenaire, M. Zanirato, V. Zema
Date:2025-05-02 10:52:42

Direct detection experiments have established the most stringent constraints on potential interactions between particle candidates for relic, thermal dark matter and Standard Model particles. To surpass current exclusion limits, a new generation of experiments is being developed. The upcoming upgrade of the CRESST experiment will incorporate $\mathcal{O}$(100) detectors with different masses ranging from $\sim$2g to $\sim$24g, aiming to achieve unprecedented sensitivity to sub-GeV dark matter particles with a focus on spin-independent dark matter-nucleus scattering. This paper presents a comprehensive analysis of the planned upgrade, detailed experimental strategies, anticipated challenges, and projected sensitivities. Approaches to address and mitigate low-energy excess backgrounds, a key limitation in previous and current sub-GeV dark matter searches, are also discussed. In addition, a long-term roadmap for the next decade is outlined, including other potential scientific applications.

Helium Range Viability for Online Range Probing in Mixed Carbon-Helium Beams

Authors:Jennifer J. Hardt, Alexander A. Pryanichnikov, Oliver Jäkel, Joao Seco, Niklas Wahl
Date:2025-05-02 10:14:38

Background: Recently, mixed carbon-helium beams were proposed for range verification in carbon ion therapy: helium, with three times the range of carbon, serves as an on-line range probe and is mixed into a therapeutic carbon beam. Purpose: Treatment monitoring is of special interest for lung cancer therapy; however, the helium range might not always be sufficient to exit the patient distally. Therefore, mixed-beam use cases at several patient sites are considered. Methods: An extension to the open-source planning toolkit matRad allows for calculation and optimization of mixed-beam treatment plans. The use of the mixed-beam method in 15 patients with lung cancer, as well as in a prostate and a liver case, was investigated for various potential beam configurations. Planning strategies to optimize the residual helium range while considering the sensitive energy range of the imaging detector were developed. One strategy adds helium only to energies whose range is sufficient; another uses range shifters to increase the helium energy and thus its range. Results: In most patient cases, the residual helium range of at least one spot is too low. All investigated planning strategies can be used to ensure a sufficiently high helium range while still keeping a low helium dose and a satisfactory total mixed carbon-helium beam dose. The use of range shifters allows for the detection of more spots. Conclusion: The mixed-beam method shows promising results for online range monitoring. The use of range shifters ensures a sufficiently high helium range and more detectable spots, allowing for a wider-spread application.

Poster: Machine Learning for Vulnerability Detection as Target Oracle in Automated Fuzz Driver Generation

Authors:Gianpietro Castiglione, Marcello Maugeri, Giampaolo Bella
Date:2025-05-02 09:02:36

In vulnerability detection, machine learning has been used as an effective static analysis technique, although it suffers from a significant rate of false positives. Contextually, in vulnerability discovery, fuzzing has been used as an effective dynamic analysis technique, although it requires manually writing fuzz drivers. Fuzz drivers usually target a limited subset of functions in a library that must be chosen according to certain criteria, e.g., the depth of a function or the number of paths. These criteria are verified by components called target oracles. In this work, we propose an automated fuzz driver generation workflow composed of: (1) identifying a likely vulnerable function by leveraging a machine learning-based vulnerability detection model as the target oracle, (2) automatically generating fuzz drivers, and (3) fuzzing the target function to find bugs that could confirm the vulnerability inferred by the target oracle. We demonstrate our method on an existing vulnerability in libgd, with a plan for large-scale evaluation.

Model Tensor Planning

Authors:An T. Le, Khai Nguyen, Minh Nhat Vu, João Carvalho, Jan Peters
Date:2025-05-02 07:09:38

Sampling-based model predictive control (MPC) offers strong performance in nonlinear and contact-rich robotic tasks, yet often suffers from poor exploration due to locally greedy sampling schemes. We propose \emph{Model Tensor Planning} (MTP), a novel sampling-based MPC framework that introduces high-entropy control trajectory generation through structured tensor sampling. By sampling over randomized multipartite graphs and interpolating control trajectories with B-splines and Akima splines, MTP ensures smooth and globally diverse control candidates. We further propose a simple $\beta$-mixing strategy that blends local exploitative and global exploratory samples within the modified Cross-Entropy Method (CEM) update, balancing control refinement and exploration. Theoretically, we show that MTP achieves asymptotic path coverage and maximum entropy in the control trajectory space in the limit of infinite tensor depth and width. Our implementation is fully vectorized using JAX and compatible with MuJoCo XLA, supporting \emph{just-in-time} (JIT) compilation and batched rollouts for real-time control with online domain randomization. Through experiments on various challenging robotic tasks, ranging from dexterous in-hand manipulation to humanoid locomotion, we demonstrate that MTP outperforms standard MPC and evolutionary strategy baselines in task success and control robustness. Design and sensitivity ablations confirm the effectiveness of MTP's tensor sampling structure, spline interpolation choices, and mixing strategy. Altogether, MTP offers a scalable framework for robust exploration in model-based planning and control.
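
The $\beta$-mixing strategy is straightforward to sketch: a fraction $\beta$ of each CEM batch is drawn uniformly over the control bounds (global exploration) and the rest from the current sampling distribution (local exploitation). The snippet below is a generic sketch of that mixing step, not the paper's implementation (which is vectorized in JAX).

    import numpy as np

    def beta_mixed_samples(mu, sigma, n, beta, low, high, rng):
        # Blend local exploitative samples around the CEM mean with global
        # exploratory samples drawn uniformly over the control bounds.
        n_global = int(beta * n)
        local = rng.normal(mu, sigma, size=(n - n_global,) + mu.shape)
        glob = rng.uniform(low, high, size=(n_global,) + mu.shape)
        return np.concatenate([local, glob], axis=0)

    rng = np.random.default_rng(0)
    mu, sigma = np.zeros(10), np.ones(10)   # 10-step control trajectory
    batch = beta_mixed_samples(mu, sigma, n=64, beta=0.25, low=-2.0, high=2.0, rng=rng)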

Quantum Computing in Industrial Environments: Where Do We Stand and Where Are We Headed?

Authors:Eneko Osaba, Iñigo Perez Delgado, Alejandro Mata Ali, Pablo Miranda-Rodriguez, Aitor Moreno Fdez de Leceta, Luka Carmona Rivas
Date:2025-05-01 22:13:12

This article explores the current state and future prospects of quantum computing in industrial environments. Firstly, it describes three main paradigms in this field of knowledge: gate-based quantum computers, quantum annealers, and tensor networks. The article also examines specific industrial applications, such as bin packing, job shop scheduling, and route planning for robots and vehicles. These applications demonstrate the potential of quantum computing to solve complex problems in the industry. The article concludes by presenting a vision of the directions the field will take in the coming years, also discussing the current limitations of quantum technology. Despite these limitations, quantum computing is emerging as a powerful tool to address industrial challenges in the future.

IberFire -- a detailed creation of a spatio-temporal dataset for wildfire risk assessment in Spain

Authors:Julen Ercibengoa, Meritxell Gómez-Omella, Izaro Goienetxea
Date:2025-05-01 19:54:17

Wildfires pose a critical environmental issue to ecosystems, economies, and public safety, particularly in Mediterranean regions such as Spain. Accurate predictive models rely on high-resolution spatio-temporal data to capture the complex interplay of environmental and anthropogenic factors. To address the lack of localised and fine-grained datasets in Spain, this work introduces IberFire, a spatio-temporal datacube at 1 km x 1 km x 1-day resolution covering mainland Spain and the Balearic Islands from December 2007 to December 2024. IberFire integrates 260 features across eight main categories: auxiliary features, fire history, geography, topography, meteorology, vegetation indices, human activity, and land cover. All features are derived from open-access sources, ensuring transparency and real-time applicability. The data processing pipeline was implemented entirely using open-source tools, and the codebase has been made publicly available. This work not only enhances spatio-temporal granularity and feature diversity compared to existing European datacubes but also provides a reproducible methodology for constructing similar datasets. IberFire supports advanced wildfire risk modelling through Machine Learning (ML) and Deep Learning (DL) techniques, enables climate pattern analysis and informs strategic planning in fire prevention and land management. The dataset is publicly available on Zenodo to promote open research and collaboration.

A Novel Dynamic Bias-Correction Framework for Hurricane Risk Assessment under Climate Change

Authors:Reda Snaiki, Teng Wu
Date:2025-05-01 19:46:45

Conventional hurricane track generation methods typically depend on biased outputs from Global Climate Models (GCMs), which undermines their accuracy in the context of climate change. We present a novel dynamic bias correction framework that adaptively corrects biases in GCM outputs. Our approach employs machine learning to predict evolving GCM biases, allowing dynamic corrections that account for changing climate conditions. By combining dimensionality reduction with data-driven surrogate modeling, we capture the system's underlying dynamics to produce realistic spatial distributions of environmental parameters under future scenarios. Using the empirical Weibull plotting approach, we calculate return periods for wind speed and rainfall across coastal cities. Our results reveal significant differences in projected risks with and without dynamic bias correction, emphasizing the increased threat to critical infrastructure in hurricane-prone regions. This work highlights the necessity of adaptive techniques for accurately assessing future climate impacts, offering a critical advancement in hurricane risk modeling and resilience planning.
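
The empirical Weibull plotting approach mentioned above is a standard rank-based estimate: the i-th smallest of n annual maxima is assigned non-exceedance probability i/(n+1), so its return period is (n+1)/(n+1-i). A minimal sketch with synthetic wind maxima:

    import numpy as np

    def weibull_return_periods(annual_maxima):
        # Weibull plotting position: p_i = i / (n + 1); return period T = 1 / (1 - p_i).
        x = np.sort(np.asarray(annual_maxima, dtype=float))
        ranks = np.arange(1, x.size + 1)
        p = ranks / (x.size + 1.0)
        return x, 1.0 / (1.0 - p)

    maxima = [38.0, 42.0, 51.0, 45.0, 60.0, 48.0, 55.0, 41.0, 47.0, 53.0]  # synthetic, m/s
    levels, T = weibull_return_periods(maxima)   # e.g., 60 m/s has T = 11 years here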

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

Authors:Dongzhi Jiang, Ziyu Guo, Renrui Zhang, Zhuofan Zong, Hao Li, Le Zhuo, Shilin Yan, Pheng-Ann Heng, Hongsheng Li
Date:2025-05-01 17:59:46

Recent advancements in large language models have demonstrated how chain-of-thought (CoT) and reinforcement learning (RL) can improve performance. However, applying such reasoning strategies to the visual generation domain remains largely unexplored. In this paper, we present T2I-R1, a novel reasoning-enhanced text-to-image generation model, powered by RL with a bi-level CoT reasoning process. Specifically, we identify two levels of CoT that can be utilized to enhance different stages of generation: (1) the semantic-level CoT for high-level planning of the prompt and (2) the token-level CoT for low-level pixel processing during patch-by-patch generation. To better coordinate these two levels of CoT, we introduce BiCoT-GRPO with an ensemble of generation rewards, which seamlessly optimizes both generation CoTs within the same training step. By applying our reasoning strategies to the baseline model, Janus-Pro, we achieve superior performance with 13% improvement on T2I-CompBench and 19% improvement on the WISE benchmark, even surpassing the state-of-the-art model FLUX.1. Code is available at: https://github.com/CaraJ7/T2I-R1
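
For readers unfamiliar with GRPO-style updates, the core group-relative step can be written in one line: each sampled image's reward is normalized against the other samples generated for the same prompt. This generic sketch shows that normalization only; BiCoT-GRPO's reward ensemble and bi-level CoT coordination are not reproduced here.

    import torch

    # Rewards for a group of 4 images sampled from one prompt (toy values).
    rewards = torch.tensor([[0.2, 0.5, 0.9, 0.4]])
    # Group-relative advantage: normalize within the group (mean 0, unit std).
    adv = (rewards - rewards.mean(dim=1, keepdim=True)) / (rewards.std(dim=1, keepdim=True) + 1e-8)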

GeoDEx: A Unified Geometric Framework for Tactile Dexterous and Extrinsic Manipulation under Force Uncertainty

Authors:Sirui Chen, Sergio Aguilera Marinovic, Soshi Iba, Rana Soltani Zarrin
Date:2025-05-01 16:40:47

The sense of touch, which allows robots to detect contact and measure interaction forces, enables them to perform challenging tasks such as grasping fragile objects or using tools. In theory, tactile sensors can equip robots with such capabilities. However, the accuracy of the measured forces is not on a par with that of force sensors, due to calibration challenges and noise. This has limited the value these sensors can offer in manipulation applications that require force control. In this paper, we introduce GeoDEx, a unified estimation, planning, and control framework using geometric primitives such as planes, cones, and ellipsoids, which enables dexterous as well as extrinsic manipulation in the presence of uncertain force readings. Through various experimental results, we show that while relying on direct, inaccurate, and noisy force readings from tactile sensors results in unstable or failed manipulation, our method enables successful grasping and extrinsic manipulation of different objects. Additionally, compared to directly running optimization using SOCP (Second-Order Cone Programming), planning and force estimation using our framework achieve a 14x speed-up.
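
As an example of the geometric primitives involved, the cone test below checks whether an estimated contact force lies inside a Coulomb friction cone (tangential component at most $\mu$ times the normal component); this is a generic sketch of the kind of reasoning described, not GeoDEx's code.

    import numpy as np

    def in_friction_cone(f, n, mu):
        # f: contact force, n: contact normal, mu: friction coefficient.
        n = n / np.linalg.norm(n)
        f_n = float(f @ n)                 # normal component
        f_t = f - f_n * n                  # tangential component
        return f_n > 0 and np.linalg.norm(f_t) <= mu * f_n

    print(in_friction_cone(np.array([0.1, 0.0, 1.0]), np.array([0.0, 0.0, 1.0]), mu=0.5))  # True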

A Finite-State Controller Based Offline Solver for Deterministic POMDPs

Authors:Alex Schutz, Yang You, Matias Mattamala, Ipek Caliskanelli, Bruno Lacerda, Nick Hawes
Date:2025-05-01 15:30:26

Deterministic partially observable Markov decision processes (DetPOMDPs) often arise in planning problems where the agent is uncertain about its environmental state but can act and observe deterministically. In this paper, we propose DetMCVI, an adaptation of the Monte Carlo Value Iteration (MCVI) algorithm for DetPOMDPs, which builds policies in the form of finite-state controllers (FSCs). DetMCVI solves large problems with a high success rate, outperforming existing baselines for DetPOMDPs. We also verify the performance of the algorithm in a real-world mobile robot forest mapping scenario.
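
A finite-state controller is a compact policy: each node stores an action, and observations drive transitions between nodes. The sketch below shows the data structure and its execution loop under an assumed env_step callable; it illustrates the policy representation only, not the DetMCVI construction itself.

    from dataclasses import dataclass, field

    @dataclass
    class FSCNode:
        action: str
        transitions: dict = field(default_factory=dict)  # observation -> next node index

    @dataclass
    class FSC:
        nodes: list
        start: int = 0

        def run(self, env_step, max_steps=100):
            # env_step(action) -> (observation, done) is assumed deterministic.
            node = self.start
            for _ in range(max_steps):
                obs, done = env_step(self.nodes[node].action)
                if done:
                    return True
                node = self.nodes[node].transitions.get(obs, node)
            return False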

ParkDiffusion: Heterogeneous Multi-Agent Multi-Modal Trajectory Prediction for Automated Parking using Diffusion Models

Authors:Jiarong Wei, Niclas Vödisch, Anna Rehr, Christian Feist, Abhinav Valada
Date:2025-05-01 15:16:59

Automated parking is a critical feature of Advanced Driver Assistance Systems (ADAS), where accurate trajectory prediction is essential to bridge perception and planning modules. Despite its significance, research in this domain remains relatively limited, with most existing studies concentrating on single-modal trajectory prediction of vehicles. In this work, we propose ParkDiffusion, a novel approach that predicts the trajectories of both vehicles and pedestrians in automated parking scenarios. ParkDiffusion employs diffusion models to capture the inherent uncertainty and multi-modality of future trajectories, incorporating several key innovations. First, we propose a dual map encoder that processes soft semantic cues and hard geometric constraints using a two-step cross-attention mechanism. Second, we introduce an adaptive agent type embedding module, which dynamically conditions the prediction process on the distinct characteristics of vehicles and pedestrians. Third, to ensure kinematic feasibility, our model outputs control signals that are subsequently used within a kinematic framework to generate physically feasible trajectories. We evaluate ParkDiffusion on the Dragon Lake Parking (DLP) dataset and the Intersections Drone (inD) dataset. Our work establishes a new baseline for heterogeneous trajectory prediction in parking scenarios, outperforming existing methods by a considerable margin.
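
The kinematic step the abstract alludes to can be illustrated with a standard kinematic bicycle model: predicted control signals (acceleration, steering) are integrated into states, so every output trajectory is feasible by construction. A generic stand-in, not ParkDiffusion's exact kinematic layer:

    import numpy as np

    def rollout_bicycle(x, y, yaw, v, controls, wheelbase=2.7, dt=0.1):
        # Integrate (acceleration, steering angle) controls through a kinematic
        # bicycle model; the resulting trajectory is physically feasible.
        traj = []
        for acc, steer in controls:
            x += v * np.cos(yaw) * dt
            y += v * np.sin(yaw) * dt
            yaw += v / wheelbase * np.tan(steer) * dt
            v += acc * dt
            traj.append((x, y, yaw, v))
        return traj

    print(rollout_bicycle(0.0, 0.0, 0.0, 1.0, [(0.2, 0.05)] * 5)[-1])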

Multimodal Masked Autoencoder Pre-training for 3D MRI-Based Brain Tumor Analysis with Missing Modalities

Authors:Lucas Robinet, Ahmad Berjaoui, Elizabeth Cohen-Jonathan Moyal
Date:2025-05-01 14:51:30

Multimodal magnetic resonance imaging (MRI) constitutes the first line of investigation for clinicians in the care of brain tumors, providing crucial insights for surgery planning, treatment monitoring, and biomarker identification. Pre-training on large datasets has been shown to help models learn transferable representations and adapt with minimal labeled data. This behavior is especially valuable in medical imaging, where annotations are often scarce. However, applying this paradigm to multimodal medical data introduces a challenge: most existing approaches assume that all imaging modalities are available during both pre-training and fine-tuning. In practice, missing modalities often occur due to acquisition issues, specialist unavailability, or specific experimental designs on small in-house datasets. Consequently, a common approach involves training a separate model for each desired modality combination, making the process both resource-intensive and impractical for clinical use. Therefore, we introduce BM-MAE, a masked image modeling pre-training strategy tailored for multimodal MRI data. The same pre-trained model seamlessly adapts to any combination of available modalities, extracting rich representations that capture both intra- and inter-modal information. This allows fine-tuning on any subset of modalities without requiring architectural changes, while still benefiting from a model pre-trained on the full set of modalities. Extensive experiments show that the proposed pre-training strategy outperforms or remains competitive with baselines that require separate pre-training for each modality subset, while substantially surpassing training from scratch on several downstream tasks. Additionally, it can quickly and efficiently reconstruct missing modalities, highlighting its practical value. Code and trained models are available at: https://github.com/Lucas-rbnt/BM-MAE

TeLoGraF: Temporal Logic Planning via Graph-encoded Flow Matching

Authors:Yue Meng, Chuchu Fan
Date:2025-05-01 14:40:07

Learning to solve complex tasks with signal temporal logic (STL) specifications is crucial to many real-world applications. However, most previous works only consider fixed or parametrized STL specifications, due to the lack of a diverse STL dataset and of encoders that effectively extract temporal logic information for downstream tasks. In this paper, we propose TeLoGraF, Temporal Logic Graph-encoded Flow, which utilizes a Graph Neural Network (GNN) encoder and flow-matching to learn solutions for general STL specifications. We identify four commonly used STL templates and collect a total of 200K specifications with paired demonstrations. We conduct extensive experiments in five simulation environments, ranging from simple dynamical models in 2D space to the high-dimensional 7-DoF Franka Panda robot arm and Ant quadruped navigation. Results show that our method outperforms other baselines in STL satisfaction rate. Compared to classical STL planning algorithms, our approach is 10-100X faster in inference and can work on any system dynamics. Besides, we show our graph-encoding method's capability to solve complex STLs and its robustness to out-of-distribution STL specifications. Code is available at https://github.com/mengyuest/TeLoGraF
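
For readers new to STL, the quantitative (robustness) semantics that the satisfaction rate above is based on reduce, for a simple "eventually" template, to a max over a time window; positive robustness means the specification is satisfied. A standard-semantics sketch, unrelated to the TeLoGraF encoder itself:

    import numpy as np

    def robustness_eventually(signal, threshold, t0, t1):
        # Robustness of F_[t0, t1] (x > threshold): max margin over the window.
        return float(np.max(signal[t0:t1 + 1] - threshold))

    x = np.array([0.1, 0.3, 0.8, 1.4, 0.9])
    print(robustness_eventually(x, threshold=1.0, t0=0, t1=4))  # 0.4 > 0: satisfied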

InterLoc: LiDAR-based Intersection Localization using Road Segmentation with Automated Evaluation Method

Authors:Nguyen Hoang Khoi Tran, Julie Stephany Berrio, Mao Shan, Zhenxing Ming, Stewart Worrall
Date:2025-05-01 13:30:28

Online localization of road intersections is beneficial for autonomous vehicle localization, mapping, and motion planning. Intersections offer strong landmarks to correct vehicle pose estimation during GNSS dropouts and anchor new sensor data in up-to-date maps. They are also decisive routing nodes in road network graphs. Despite this importance, intersection localization has not been widely studied: existing methods either ignore the rich semantic information already computed onboard or depend on scarce, hand-labeled intersection datasets. To close that gap, this paper presents a LiDAR-based method for online vehicle-centric intersection localization. We fuse semantic road segmentation with the vehicle's local pose to detect intersection candidates in a bird's eye view (BEV) representation. We then refine those candidates by analyzing branch topology and correcting the intersection point in a least-squares formulation. To evaluate our method, we introduce an automated benchmarking pipeline that pairs localized intersection points with OpenStreetMap (OSM) intersection nodes using precise GNSS/INS ground-truth poses. Experiments on SemanticKITTI show that the method outperforms the latest learning-based baseline in accuracy and reliability. Moreover, sensitivity tests demonstrate that our method is robust to challenging segmentation error levels, highlighting its applicability in the real world.
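
One standard way to realize such a least-squares correction is to find the point minimizing the squared distance to the detected branch center-lines: with each branch given by a point p and unit direction d, solve sum_i (I - d_i d_i^T) x = sum_i (I - d_i d_i^T) p_i. A generic sketch under that assumption, not the paper's exact formulation:

    import numpy as np

    def least_squares_intersection(points, directions):
        # Point closest (in squared distance) to all branch center-lines.
        A, b = np.zeros((2, 2)), np.zeros(2)
        for p, d in zip(points, directions):
            d = d / np.linalg.norm(d)
            M = np.eye(2) - np.outer(d, d)   # projects onto the line's normal space
            A += M
            b += M @ p
        return np.linalg.solve(A, b)

    pts = [np.array([0.0, -5.0]), np.array([-5.0, 0.0])]
    dirs = [np.array([0.0, 1.0]), np.array([1.0, 0.0])]
    print(least_squares_intersection(pts, dirs))  # ~[0. 0.]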

Optimal Interactive Learning on the Job via Facility Location Planning

Authors:Shivam Vats, Michelle Zhao, Patrick Callaghan, Mingxi Jia, Maxim Likhachev, Oliver Kroemer, George Konidaris
Date:2025-05-01 12:45:09

Collaborative robots must continually adapt to novel tasks and user preferences without overburdening the user. While prior interactive robot learning methods aim to reduce human effort, they are typically limited to single-task scenarios and are not well-suited for sustained, multi-task collaboration. We propose COIL (Cost-Optimal Interactive Learning) -- a multi-task interaction planner that minimizes human effort across a sequence of tasks by strategically selecting among three query types (skill, preference, and help). When user preferences are known, we formulate COIL as an uncapacitated facility location (UFL) problem, which enables bounded-suboptimal planning in polynomial time using off-the-shelf approximation algorithms. We extend our formulation to handle uncertainty in user preferences by incorporating one-step belief space planning, which uses these approximation algorithms as subroutines to maintain polynomial-time performance. Simulated and physical experiments on manipulation tasks show that our framework significantly reduces the amount of work allocated to the human while maintaining successful task completion.
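
To make the facility-location view concrete: facilities are candidate queries with opening costs, clients are tasks, and assignment costs encode the human effort of serving a task with a query. The toy brute force below just exposes that objective; COIL itself relies on polynomial-time approximation algorithms rather than enumeration.

    import itertools

    def ufl_brute_force(open_cost, assign_cost):
        # Minimize sum of opening costs plus each client's cheapest open facility.
        best = (float("inf"), None)
        for r in range(1, len(open_cost) + 1):
            for S in itertools.combinations(range(len(open_cost)), r):
                cost = sum(open_cost[f] for f in S)
                cost += sum(min(row[f] for f in S) for row in assign_cost)
                best = min(best, (cost, S))
        return best

    open_cost = [3.0, 2.0, 4.0]           # e.g., cost of each query type
    assign_cost = [[1.0, 4.0, 2.0],       # task-by-query effort matrix (toy values)
                   [3.0, 1.0, 5.0],
                   [2.0, 2.0, 1.0]]
    print(ufl_brute_force(open_cost, assign_cost))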

ScaleTrack: Scaling and back-tracking Automated GUI Agents

Authors:Jing Huang, Zhixiong Zeng, Wenkang Han, Yufeng Zhong, Liming Zheng, Shuai Fu, Jingyuan Chen, Lin Ma
Date:2025-05-01 09:27:13

Automated GUI agents aim to facilitate user interaction by automatically performing complex tasks in digital environments, such as web, mobile, and desktop devices. They receive a textual task instruction and a GUI description, and generate executable actions (e.g., click) and operation boxes step by step. Training a GUI agent mainly involves grounding and planning stages: GUI grounding focuses on finding the execution coordinates according to the task, while the planning stage aims to predict the next action based on historical actions. However, previous work suffers from insufficient training data for GUI grounding, as well as from neglecting to backtrack historical behaviors for GUI planning. To handle the above challenges, we propose ScaleTrack, a training framework for scaling grounding and backtracking planning for automated GUI agents. We carefully collected GUI samples with different synthesis criteria from a wide range of sources and unified them into the same template for training GUI grounding models. Moreover, we design a novel training strategy that predicts the next action from the current GUI image, while also backtracking the historical actions that led to the GUI image. In this way, ScaleTrack explains the correspondence between GUI images and actions, which effectively describes the evolution rules of the GUI environment. Extensive experimental results demonstrate the effectiveness of ScaleTrack. Data and code will be available at url.

Vintage-Based Formulations in Multi-Year Investment Modelling for Energy Systems

Authors:Ni Wang, Germán Morales-España
Date:2025-05-01 08:14:19

This paper reviews two established formulations for modelling multi-year energy investments: the simple method, which aggregates all capacity regardless of commissioning year, and the vintage method, which explicitly tracks investments by year to capture differences in technical parameters over time. While the vintage method improves modelling fidelity, it significantly increases model size. To address this, we propose a novel compact formulation that maintains the ability to represent year-specific characteristics while reducing the dimensionality of the model. The proposed compact formulation is implemented in the open-source model TulipaEnergyModel.jl and offers a tractable alternative for detailed long-term energy system planning.
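
The difference between the two established formulations can be shown in a few lines: the simple method keeps one aggregated capacity pool, while the vintage method indexes capacity by commissioning year so each vintage can carry its own technical parameters. Toy numbers, purely illustrative:

    # Investments commissioned per milestone year (MW) and a vintage-specific
    # parameter (efficiency) that the simple method cannot distinguish.
    investments = {2025: 10.0, 2030: 5.0}
    efficiency = {2025: 0.40, 2030: 0.45}

    year = 2030
    simple_capacity = sum(c for y, c in investments.items() if y <= year)      # 15.0
    vintage_capacity = {y: c for y, c in investments.items() if y <= year}     # per vintage
    avg_eff = sum(efficiency[y] * c for y, c in vintage_capacity.items()) / simple_capacity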

Automated segmentation of pediatric neuroblastoma on multi-modal MRI: Results of the SPPIN challenge at MICCAI 2023

Authors:M. A. D. Buser, D. C. Simons, M. Fitski, M. H. W. A. Wijnen, A. S. Littooij, A. H. ter Brugge, I. N. Vos, M. H. A. Janse, M. de Boer, R. ter Maat, J. Sato, S. Kido, S. Kondo, S. Kasai, M. Wodzinski, H. Muller, J. Ye, J. He, Y. Kirchhoff, M. R. Rokkus, G. Haokai, S. Zitong, M. Fernández-Patón, D. Veiga-Canuto, D. G. Ellis, M. R. Aizenberg, B. H. M. van der Velden, H. Kuijf, A. De Luca, A. F. W. van der Steeg
Date:2025-05-01 07:46:03

Surgery plays an important role in the treatment of neuroblastoma, a common pediatric cancer. This requires careful planning, often via magnetic resonance imaging (MRI)-based anatomical 3D models. However, creating these models is often time-consuming and user-dependent. We organized the Surgical Planning in Pediatric Neuroblastoma (SPPIN) challenge to stimulate developments on this topic and to set a benchmark for fully automatic segmentation of neuroblastoma on multi-modal MRI. The challenge started with a training phase, where teams received 78 sets of MRI scans from 34 patients, consisting of both diagnostic and post-chemotherapy MRI scans. The final test phase, consisting of 18 MRI sets from 9 patients, determined the ranking of the teams. Ranking was based on the Dice similarity coefficient (Dice score), the 95th percentile of the Hausdorff distance (HD95), and the volumetric similarity (VS). The SPPIN challenge was hosted at MICCAI 2023. The final leaderboard consisted of 9 teams. The highest-ranking team achieved a median Dice score of 0.82, a median HD95 of 7.69 mm, and a VS of 0.91, utilizing a large, pretrained network called STU-Net. A significant difference in segmentation results between diagnostic and post-chemotherapy MRI scans was observed (Dice = 0.89 vs. Dice = 0.59, P = 0.01) for the highest-ranking team. SPPIN is the first medical segmentation challenge in extracranial pediatric oncology. The highest-ranking team used a large pre-trained network, suggesting that pretraining can be of use in small, heterogeneous datasets. Although the results of the highest-ranking team were high for most patients, segmentations, especially of small, pre-treated tumors, were insufficient. Therefore, more reliable segmentation methods are needed to create clinically applicable models to aid surgical planning in pediatric neuroblastoma.
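
For reference, two of the ranking metrics are one-liners on binary masks; HD95 additionally needs surface-distance computations (available in, e.g., SciPy or MedPy) and is omitted here. A minimal sketch:

    import numpy as np

    def dice(a, b):
        # Dice similarity coefficient: 2|A n B| / (|A| + |B|).
        a, b = a.astype(bool), b.astype(bool)
        denom = a.sum() + b.sum()
        return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

    def volumetric_similarity(a, b):
        # VS = 1 - ||A| - |B|| / (|A| + |B|): compares volumes, not overlap.
        va, vb = a.astype(bool).sum(), b.astype(bool).sum()
        return 1.0 - abs(int(va) - int(vb)) / (va + vb) if (va + vb) else 1.0

    a = np.zeros((4, 4)); a[1:3, 1:3] = 1
    b = np.zeros((4, 4)); b[1:3, 1:4] = 1
    print(dice(a, b), volumetric_similarity(a, b))  # 0.8 0.8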

Future-Oriented Navigation: Dynamic Obstacle Avoidance with One-Shot Energy-Based Multimodal Motion Prediction

Authors:Ze Zhang, Georg Hess, Junjie Hu, Emmanuel Dean, Lennart Svensson, Knut Åkesson
Date:2025-05-01 01:13:56

This paper proposes an integrated approach for the safe and efficient control of mobile robots in dynamic and uncertain environments. The approach consists of two key steps: one-shot multimodal motion prediction to anticipate motions of dynamic obstacles, and model predictive control to incorporate these predictions into the motion planning process. Motion prediction is driven by an energy-based neural network that generates high-resolution, multi-step predictions in a single operation. The prediction outcomes are further utilized to create geometric shapes formulated as mathematical constraints. Instead of treating each dynamic obstacle individually, predicted obstacles are grouped by proximity in an unsupervised way to improve performance and efficiency. The overall collision-free navigation is handled by model predictive control with a specific design for proactive dynamic obstacle avoidance. The proposed approach allows mobile robots to navigate effectively in dynamic environments. Its performance is assessed across various scenarios that represent typical warehouse settings. The results demonstrate that the proposed approach outperforms other existing dynamic obstacle avoidance methods.
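
The unsupervised proximity grouping can be illustrated with an off-the-shelf density-based clusterer; DBSCAN here is our stand-in, since the abstract does not name the specific algorithm. Nearby predicted obstacles end up sharing one constraint region:

    import numpy as np
    from sklearn.cluster import DBSCAN

    # Predicted obstacle positions at one future step (toy values, meters).
    pred = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.1], [5.2, 4.9], [9.0, 0.0]])
    labels = DBSCAN(eps=0.6, min_samples=1).fit_predict(pred)
    groups = {k: pred[labels == k] for k in set(labels)}
    print(labels)  # [0 0 1 1 2]: three groups instead of five obstacles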

PSN Game: Game-theoretic Planning via a Player Selection Network

Authors:Tianyu Qiu, Eric Ouano, Fernando Palafox, Christian Ellis, David Fridovich-Keil
Date:2025-04-30 23:14:32

While game-theoretic planning frameworks are effective at modeling multi-agent interactions, they require solving optimization problems with hundreds or thousands of variables, resulting in long computation times that limit their use in large-scale, real-time systems. To address this issue, we propose PSN Game: a novel game-theoretic planning framework that reduces runtime by learning a Player Selection Network (PSN). A PSN outputs a player selection mask that distinguishes influential players from less relevant ones, enabling the ego player to solve a smaller, masked game involving only selected players. By reducing the number of variables in the optimization problem, PSN directly lowers computation time. The PSN Game framework is more flexible than existing player selection methods as it i) relies solely on observations of players' past trajectories, without requiring full state, control, or other game-specific information; and ii) requires no online parameter tuning. We train PSNs in an unsupervised manner using a differentiable dynamic game solver, with reference trajectories from full-player games guiding the learning. Experiments in both simulated scenarios and human trajectory datasets demonstrate that i) PSNs outperform baseline selection methods in trajectory smoothness and length, while maintaining comparable safety and achieving a 10x speedup in runtime; and ii) PSNs generalize effectively to real-world scenarios without fine-tuning. By selecting only the most relevant players for decision-making, PSNs offer a general mechanism for reducing planning complexity that can be seamlessly integrated into existing multi-agent planning frameworks.

First Order Logic with Fuzzy Semantics for Describing and Recognizing Nerves in Medical Images

Authors:Isabelle Bloch, Enzo Bonnot, Pietro Gori, Giammarco La Barbera, Sabine Sarnacki
Date:2025-04-30 20:41:04

This article deals with the description and recognition of fiber bundles, in particular nerves, in medical images, based on the anatomical description of the fiber trajectories. To this end, we propose a logical formalization of this anatomical knowledge. The intrinsically imprecise description of nerves, as found in anatomical textbooks, leads us to propose fuzzy semantics combined with first-order logic. We define a language representing spatial entities, relations between these entities and quantifiers. A formula in this language is then a formalization of the natural language description. The semantics are given by fuzzy representations in a concrete domain and satisfaction degrees of relations. Based on this formalization, a spatial reasoning algorithm is proposed for segmentation and recognition of nerves from anatomical and diffusion magnetic resonance images, which is illustrated on pelvic nerves in pediatric imaging, enabling surgeons to plan surgery.
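
A minimal sketch of such fuzzy semantics, using the common Zadeh operators (min/max for conjunction/disjunction, infimum/supremum for quantifiers); the paper's exact choice of operators may differ. Satisfaction degrees live in [0, 1] rather than {true, false}:

    # Fuzzy connectives over satisfaction degrees in [0, 1].
    def f_and(a, b): return min(a, b)
    def f_or(a, b): return max(a, b)
    def f_not(a): return 1.0 - a
    def f_forall(degrees): return min(degrees)   # universal quantifier as infimum
    def f_exists(degrees): return max(degrees)   # existential quantifier as supremum

    # Degree to which "the bundle is anterior to the organ AND close to it".
    print(f_and(0.8, 0.6))            # 0.6
    print(f_exists([0.2, 0.7, 0.4]))  # 0.7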