planning - 2025-07-09

IceCat-2: Updated IceCube Event Catalog of Alert Tracks

Authors:Angela Zegarelli, Anna Franckowiak, Giacomo Sommani, Nora Valtonen-Mattila, Tianlu Yuan

Date:2025-07-08 16:55:02

We present preliminary results for IceCat-2, the second public catalog of IceCube Alert Tracks, which plans to build and improve upon the first release, IceCat-1. The initial catalog, last updated in October 2023, included all real-time alerts issued since 2016, as well as events observed by IceCube since the start of full-detector data collection in 2011 that would have triggered an alert if the program had been in place at that time. IceCat-2 plans to expand on this by incorporating all alerts issued since IceCat-1 and reprocessing all events with significantly improved reconstruction algorithms. A key advancement in IceCat-2 will come from an updated reconstruction technique introduced by the IceCube Collaboration in September 2024. This approach substantially enhances the angular resolution of muon track alerts, while also improving statistical coverage. With respect to IceCat-1, the 50% (90%) angular uncertainty on track alerts is expected to be reduced by a factor of approximately 5 (4). These refined reconstructions will allow us to revisit possible correlations between past alerts and sources in gamma-ray and X-ray catalogs. The enhanced precision may uncover new associations with known astrophysical sources, offering deeper insight into potential cosmic-ray accelerators.

OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion

Authors:Yunhan Yang, Yufan Zhou, Yuan-Chen Guo, Zi-Xin Zou, Yukun Huang, Ying-Tian Liu, Hao Xu, Ding Liang, Yan-Pei Cao, Xihui Liu

Date:2025-07-08 16:46:15

The creation of 3D assets with explicit, editable part structures is crucial for advancing interactive applications, yet most generative methods produce only monolithic shapes, limiting their utility. We introduce OmniPart, a novel framework for part-aware 3D object generation designed to achieve high semantic decoupling among components while maintaining robust structural cohesion. OmniPart uniquely decouples this complex task into two synergistic stages: (1) an autoregressive structure planning module generates a controllable, variable-length sequence of 3D part bounding boxes, critically guided by flexible 2D part masks that allow for intuitive control over part decomposition without requiring direct correspondences or semantic labels; and (2) a spatially-conditioned rectified flow model, efficiently adapted from a pre-trained holistic 3D generator, synthesizes all 3D parts simultaneously and consistently within the planned layout. Our approach supports user-defined part granularity, precise localization, and enables diverse downstream applications. Extensive experiments demonstrate that OmniPart achieves state-of-the-art performance, paving the way for more interpretable, editable, and versatile 3D content.

Learning-Augmented Model-Based Multi-Robot Planning for Time-Critical Search and Inspection Under Uncertainty

Authors:Abhish Khanal, Joseph Prince Mathew, Cameron Nowzari, Gregory J. Stein

Date:2025-07-08 16:15:22

In disaster response or surveillance operations, quickly identifying areas needing urgent attention is critical, but deploying response teams to every location is inefficient or often impossible. Effective performance in this domain requires coordinating a multi-robot inspection team to prioritize inspecting locations more likely to need immediate response, while also minimizing travel time. This is particularly challenging because robots must directly observe the locations to determine which ones require additional attention. This work introduces a multi-robot planning framework for coordinated time-critical multi-robot search under uncertainty. Our approach uses a graph neural network to estimate the likelihood of points of interest (PoIs) needing attention from noisy sensor data and then uses those predictions to guide a multi-robot model-based planner to determine a cost-effective plan. Simulated experiments demonstrate that our planner improves performance by at least 16.3%, 26.7%, and 26.2% for 1, 3, and 5 robots, respectively, compared to non-learned and learned baselines. We also validate our approach on real-world platforms using quadcopters.

Combining IceCube Muon Tracks and Cascades to measure the Galactic Diffuse Neutrino Flux

Authors:Jonas Hellrung, Julia Becker Tjus, Wolfgang Rhode

Date:2025-07-08 15:29:49

The diffuse Galactic neutrino flux is produced by cosmic rays interacting with the interstellar medium. The measurement of this flux can help us understand the distribution of cosmic rays in the Galaxy. The first observation of this neutrino flux was published in 2023 by the IceCube Collaboration. Here, plans for a new analysis combining different event topologies are presented. IceCube measures events in two main topologies. Tracks, originating in charged-current $\nu_\mu$ interactions, provide a better angular resolution. In contrast, cascades, from most other possible interactions, provide a better energy resolution and are able to observe the Southern sky (and therefore the Galactic Center) despite the huge background of atmospheric muons. Combining both event topologies in one analysis exploits all these advantages. Sensitivities and model discrimination power of a combined measurement using a forward-folding binned likelihood fit are discussed here.
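
Why combining topologies is natural in a binned likelihood: independent samples multiply their likelihoods, so their negative log-likelihoods add. The sketch below is a toy, assuming a single flux normalisation parameter, two reconstructed bins per sample, and made-up response matrices and backgrounds (all names and numbers are hypothetical, not IceCube inputs). It forward-folds a true-energy flux shape through each sample's response, adds background, and scans the summed Poisson NLL on Asimov pseudo-data.

```python
import math

# Toy forward-folded binned Poisson likelihood for two event samples.
# Response matrices, backgrounds and the single normalisation are hypothetical.

def forward_fold(true_flux, response):
    """Fold a binned true-energy flux through a response matrix.

    response[i][j] is the expected number of events in reconstructed bin i
    per unit flux in true bin j (acceptance and smearing combined).
    """
    return [sum(r * f for r, f in zip(row, true_flux)) for row in response]

def expected_counts(norm, sample):
    """Signal (flux shape scaled by norm) plus background, per reconstructed bin."""
    signal = forward_fold([norm * f for f in sample["flux_shape"]], sample["response"])
    return [s + b for s, b in zip(signal, sample["background"])]

def poisson_nll(observed, expected):
    """Poisson negative log-likelihood, dropping the constant log(k!) term."""
    return sum(mu - k * math.log(mu) for k, mu in zip(observed, expected))

def combined_nll(norm, samples):
    """Independent samples multiply likelihoods, so their NLLs simply add."""
    return sum(poisson_nll(s["observed"], expected_counts(norm, s)) for s in samples)

tracks = {"flux_shape": [1.0, 0.5], "response": [[2.0, 0.5], [0.3, 1.5]],
          "background": [5.0, 3.0]}
cascades = {"flux_shape": [1.0, 0.5], "response": [[1.0, 0.2], [0.1, 0.8]],
            "background": [2.0, 1.0]}
for sample in (tracks, cascades):
    sample["observed"] = expected_counts(1.0, sample)  # Asimov pseudo-data at norm = 1

grid = [i / 100 for i in range(1, 301)]
best_norm = min(grid, key=lambda n: combined_nll(n, [tracks, cascades]))
print(best_norm)
```

A real analysis would add nuisance parameters and several Galactic model templates; the sketch only shows how each topology enters with its own response and background while sharing the physics parameters.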

AURA-CVC: Autonomous Ultrasound-guided Robotic Assistance for Central Venous Catheterization

Authors:Deepak Raina, Lidia Al-Zogbi, Brian Teixeira, Vivek Singh, Ankur Kapoor, Thorsten Fleiter, Muyinatu A. Lediju Bell, Vinciya Pandian, Axel Krieger

Date:2025-07-08 13:36:56

Purpose: Central venous catheterization (CVC) is a critical medical procedure for vascular access, hemodynamic monitoring, and life-saving interventions. Its success remains challenging due to the need for continuous ultrasound-guided visualization of a target vessel and approaching needle, which is further complicated by anatomical variability and operator dependency. Errors in needle placement can lead to life-threatening complications. While robotic systems offer a potential solution, achieving full autonomy remains challenging. In this work, we propose an end-to-end robotic-ultrasound-guided CVC pipeline, from scan initialization to needle insertion. Methods: We introduce a deep-learning model to identify clinically relevant anatomical landmarks from a depth image of the patient's neck, obtained using an RGB-D camera, to autonomously define the scanning region and paths. Then, a robot motion planning framework is proposed to scan, segment, reconstruct, and localize vessels (veins and arteries), followed by the identification of the optimal insertion zone. Finally, a needle guidance module plans the insertion under ultrasound guidance with the operator's feedback. This pipeline was validated on a high-fidelity commercial phantom across 10 simulated clinical scenarios. Results: The proposed pipeline achieved 10 out of 10 successful needle placements on the first attempt. Vessels were reconstructed with a mean error of 2.15 mm, and autonomous needle insertion was performed with an error less than or close to 1 mm. Conclusion: To our knowledge, this is the first robotic CVC system demonstrated on a high-fidelity phantom with integrated planning, scanning, and insertion. Experimental results show its potential for clinical translation.

Low voltage user phase reconfiguration as a planning problem

Authors:Sari Kerckhove, Marta Vanin, Reinhilde D'hulst, Dirk Van Hertem

Date:2025-07-08 11:56:28

Considerable levels of phase imbalance in low voltage (LV) distribution networks imply that grid assets are suboptimally utilized and can cause additional losses, equipment failure and degradation. With the ongoing energy transition, the installation of additional single-phase distributed energy resources may further increase the phase imbalance if no countermeasures are taken. Phase reconfiguration is a cost-effective solution to reduce imbalance. However, dynamic reconfiguration, through real-time phase swapping of loads using remotely controlled switches, is often impractical because these switches are too costly for widespread installation at LV users. Approaching phase reconfiguration as a planning problem, i.e., static reconfiguration, is an under-addressed but promising alternative. Effective static approaches that allow appropriate imbalance objectives are currently lacking. This paper presents reliable and expressive static phase reconfiguration methods that grid operators can easily integrate into routine maintenance for effective phase balancing. We present and compare three static methods: an exact mixed-integer nonlinear formulation (MINLP), a mixed-integer quadratic approximation (MIQP), and a genetic algorithm (GA), each supporting different imbalance objectives. The MIQP approach, despite using proxy objectives, efficiently mitigates the different types of imbalance considered, and outperforms both MINLP and GA in scalability and consistency.
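
To make "static reconfiguration as a planning problem" concrete: the phase connection of each single-phase user is chosen once (for instance during maintenance) rather than switched in real time. The toy below assigns hypothetical users to phases by exhaustive search, minimising a quadratic imbalance proxy computed from one snapshot of assumed peak loads. It is not the paper's MINLP, MIQP or GA: those use network models and several imbalance definitions, whereas brute force grows as 3^n and only works for a handful of users.

```python
from itertools import product

PHASES = ("a", "b", "c")

def phase_loads(assignment, loads):
    """Total load per phase for a given phase connection of each single-phase user."""
    totals = {p: 0.0 for p in PHASES}
    for phase, kw in zip(assignment, loads):
        totals[phase] += kw
    return totals

def imbalance(totals):
    """Quadratic proxy: squared deviation of each phase load from the mean phase load."""
    mean = sum(totals.values()) / len(totals)
    return sum((v - mean) ** 2 for v in totals.values())

def best_static_assignment(loads):
    """Exhaustive search over all 3**n connections; only viable for very small n."""
    return min(product(PHASES, repeat=len(loads)),
               key=lambda a: imbalance(phase_loads(a, loads)))

# Hypothetical peak loads (kW) of eight single-phase users on one LV feeder.
loads = [4.0, 3.0, 3.0, 2.0, 2.0, 1.0, 1.0, 2.0]
current = ("a",) * len(loads)  # worst case: every user connected to phase a
best = best_static_assignment(loads)
print(imbalance(phase_loads(current, loads)), imbalance(phase_loads(best, loads)))
```

Here the 18 kW total can be split into three 6 kW groups, so the proxy drops from 216 to 0; with realistic time-varying profiles no single assignment balances every hour, which is why the choice of imbalance objective matters.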

Comparison of Path Planning Algorithms for Autonomous Vehicle Navigation Using Satellite and Airborne LiDAR Data

Authors:Chang Liu, Zhexiong Xue, Tamas Sziranyi

Date:2025-07-08 11:15:21

Autonomous vehicle navigation in unstructured environments, such as forests and mountainous regions, presents significant challenges due to irregular terrain and complex road conditions. This work provides a comparative evaluation of mainstream and well-established path planning algorithms applied to weighted pixel-level road networks derived from high-resolution satellite imagery and airborne LiDAR data. For 2D road-map navigation, where the weights reflect road conditions and terrain difficulty, A*, Dijkstra, RRT*, and a Novel Improved Ant Colony Optimization Algorithm (NIACO) are tested on the DeepGlobe satellite dataset. For 3D road-map path planning, 3D A*, 3D Dijkstra, RRT-Connect, and NIACO are evaluated using the Hamilton airborne LiDAR dataset, which provides detailed elevation information. All algorithms are assessed under identical start and end point conditions, focusing on path cost, computation time, and memory consumption. Results demonstrate that Dijkstra consistently offers the most stable and efficient performance in both 2D and 3D scenarios, particularly when operating on dense, pixel-level geospatial road-maps. These findings highlight the reliability of Dijkstra-based planning for static terrain navigation and establish a foundation for future research on dynamic path planning under complex environmental constraints.
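
The abstract reports Dijkstra as the most stable planner on dense pixel-level road maps. For reference, a minimal textbook Dijkstra over a 4-connected grid is sketched below, assuming that entering a pixel costs its weight and that `None` marks an impassable pixel; this weighting and connectivity are illustrative assumptions, not the paper's road-map encoding (its 3D variant also uses elevation).

```python
import heapq

def dijkstra_grid(cost, start, goal):
    """Shortest path on a 4-connected grid where entering cell (r, c) costs cost[r][c].

    Cells whose cost is None are impassable. Returns (total_cost, path),
    or (inf, []) if the goal cannot be reached.
    """
    rows, cols = len(cost), len(cost[0])
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break  # first pop of the goal is its final shortest distance
        if d > dist[cell]:
            continue  # stale heap entry superseded by a shorter path
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and cost[nr][nc] is not None:
                nd = d + cost[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (nd, (nr, nc)))
    if goal not in dist:
        return float("inf"), []
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]

# Hypothetical terrain weights: 1 = good road, larger = harder terrain, None = blocked.
grid = [
    [1, 1, 1, 1],
    [1, 9, 9, 1],
    [3, 1, None, 1],
    [9, 1, 1, 1],
]
total, path = dijkstra_grid(grid, (0, 0), (3, 3))
print(total, path)
```

On a dense pixel map every pixel is a node, so the heap-based implementation (with stale-entry skipping) is what keeps runtime and memory manageable.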

GTA1: GUI Test-time Scaling Agent

Authors:Yan Yang, Dongxu Li, Yutong Dai, Yuhao Yang, Ziyang Luo, Zirui Zhao, Zhiyuan Hu, Junzhe Huang, Amrita Saha, Zeyuan Chen, Ran Xu, Liyuan Pan, Caiming Xiong, Junnan Li

Date:2025-07-08 08:52:18

Graphical user interface (GUI) agents autonomously operate across platforms (e.g., Linux) to complete tasks by interacting with visual elements. Specifically, a user instruction is decomposed into a sequence of action proposals, each corresponding to an interaction with the GUI. After each action, the agent observes the updated GUI environment to plan the next step. However, two main challenges arise: i) resolving ambiguity in task planning (i.e., the action proposal sequence), where selecting an appropriate plan is non-trivial, as many valid ones may exist; ii) accurately grounding actions in complex and high-resolution interfaces, i.e., precisely interacting with visual targets. This paper investigates the two aforementioned challenges with our GUI Test-time Scaling Agent, namely GTA1. First, to select the most appropriate action proposal, we introduce a test-time scaling method. At each step, we sample multiple candidate action proposals and leverage a judge model to evaluate and select the most suitable one. It trades off computation for better decision quality by concurrent sampling, shortening task execution steps, and improving overall performance. Second, we propose a model that achieves improved accuracy when grounding the selected action proposal to its corresponding visual elements. Our key insight is that reinforcement learning (RL) facilitates visual grounding through inherent objective alignments, rewarding successful clicks on interface elements. Experimentally, our method establishes state-of-the-art performance across diverse benchmarks. For example, GTA1-7B achieves 50.1%, 92.4%, and 67.7% accuracies on Screenspot-Pro, Screenspot-V2, and OSWorld-G, respectively. When paired with a planner applying our test-time scaling strategy, it exhibits state-of-the-art agentic performance (e.g., 45.2% task success rate on OSWorld). We open-source our code and models.
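
The per-step selection described above is a best-of-n scheme: draw several candidate actions, score each with a judge, keep the top one. A minimal sketch, where `propose` and `judge` are hypothetical stand-ins for the planner and judge models (GTA1's actual models and prompts are not reproduced):

```python
from itertools import cycle
from typing import Callable, List, TypeVar

A = TypeVar("A")

def select_action(propose: Callable[[], A], judge: Callable[[A], float], n: int) -> A:
    """Best-of-n: draw n candidate actions and keep the one the judge scores highest.

    The n draws are independent, so in practice they could be issued concurrently.
    """
    candidates: List[A] = [propose() for _ in range(n)]
    return max(candidates, key=judge)

# Hypothetical stand-ins: a "planner" cycling through fixed proposals and a
# "judge" returning a fixed score per action. Real systems would query models here.
actions = ["scroll(down)", "click(Cancel)", "click(OK)", "type('hello')"]
scores = {"scroll(down)": 0.1, "click(Cancel)": 0.2, "click(OK)": 0.9, "type('hello')": 0.5}

def make_proposer() -> Callable[[], str]:
    it = cycle(actions)
    return lambda: next(it)

def judge(action: str) -> float:
    return scores[action]

print(select_action(make_proposer(), judge, n=1))  # single sample: nothing to choose from
print(select_action(make_proposer(), judge, n=4))  # more samples: the judge can pick a better one
```

Raising `n` spends more compute per step in exchange for a better chance that a good proposal is among the candidates, which is the trade-off the abstract describes.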

Towards an Ontological Framework for Competence Management: For Training, Recruitment, Occupational, or Related Research Purposes

Authors:Ngoc Luyen Le, Marie-Hélène Abel, Bertrand Laforge

Date:2025-07-08 08:13:30

The rapid transformation of the labor market, driven by technological advancements and the digital economy, requires continuous competence development and constant adaptation. In this context, traditional competence management systems lack interoperability, adaptability, and semantic understanding, making it difficult to align individual competencies with labor market needs and training programs. This paper proposes an ontology-based framework for competence management, enabling a structured representation of competencies, occupations, and training programs. By leveraging ontological models and semantic reasoning, this framework aims to enhance the automation of competence-to-job matching, the personalization of learning recommendations, and career planning. This study discusses the design, implementation, and potential applications of the framework, focusing on competence training programs, job searching, and finding competent individuals.

PSAT: Pediatric Segmentation Approaches via Adult Augmentations and Transfer Learning

Authors:Tristan Kirscher, Sylvain Faisan, Xavier Coubez, Loris Barrier, Philippe Meyer

Date:2025-07-08 08:07:36

Pediatric medical imaging presents unique challenges due to significant anatomical and developmental differences compared to adults. Direct application of segmentation models trained on adult data often yields suboptimal performance, particularly for small or rapidly evolving structures. To address these challenges, several strategies leveraging the nnU-Net framework have been proposed, differing along four key axes: (i) the fingerprint dataset (adult, pediatric, or a combination thereof) from which the Training Plan (including the network architecture) is derived; (ii) the Learning Set (adult, pediatric, or mixed); (iii) Data Augmentation parameters; and (iv) the Transfer learning method (finetuning versus continual learning). In this work, we introduce PSAT (Pediatric Segmentation Approaches via Adult Augmentations and Transfer learning), a systematic study that investigates the impact of these axes on segmentation performance. We benchmark the derived strategies on two pediatric CT datasets and compare them with state-of-the-art methods, including a commercial radiotherapy solution. PSAT highlights key pitfalls and provides actionable insights for improving pediatric segmentation. Our experiments reveal that a training plan based on an adult fingerprint dataset is misaligned with pediatric anatomy, resulting in significant performance degradation, especially when segmenting fine structures, and that continual learning strategies mitigate institutional shifts, thus enhancing generalization across diverse pediatric datasets. The code is available at https://github.com/ICANS-Strasbourg/PSAT.

A Learning-based Planning and Control Framework for Inertia Drift Vehicles

Authors:Bei Zhou, Zhouheng Li, Lei Xie, Hongye Su, Johannes Betz

Date:2025-07-08 07:49:00

Inertia drift is a transitional maneuver between two sustained drift stages in opposite directions, which provides valuable insights for navigating consecutive sharp corners in autonomous racing. However, this scenario is challenging for the drift controller, which must handle rapid transitions between opposing sideslip angles while maintaining accurate path tracking. Moreover, accurate drift control depends on a high-fidelity vehicle model to derive drift equilibrium points and predict vehicle states, but this is often compromised by the strongly coupled longitudinal-lateral drift dynamics and unpredictable environmental variations. To address these challenges, this paper proposes a learning-based planning and control framework utilizing Bayesian optimization (BO), which develops a planning logic to ensure a smooth transition and minimal velocity loss between inertia and sustained drift phases. BO is further employed to learn a performance-driven control policy that mitigates modeling errors for enhanced system performance. Simulation results on an 8-shape reference path demonstrate that the proposed framework can achieve smooth and stable inertia drift through sharp corners.

Hierarchical Task Offloading for UAV-Assisted Vehicular Edge Computing via Deep Reinforcement Learning

Authors:Hongbao Li, Ziye Jia, Sijie He, Kun Guo, Qihui Wu

Date:2025-07-08 07:10:52

With the rise of compute-intensive and delay-sensitive applications in vehicular networks, unmanned aerial vehicles (UAVs) have emerged as a promising complement to vehicular edge computing due to their high mobility and flexible deployment. However, existing UAV-assisted offloading strategies are insufficient for coordinating heterogeneous computing resources and adapting to dynamic network conditions. Hence, this paper proposes a dual-layer UAV-assisted edge computing architecture based on partial offloading, which combines the relay capability of high-altitude UAVs with the computing support of low-altitude UAVs. The proposed architecture enables efficient integration and coordination of heterogeneous resources. A joint optimization problem is formulated to minimize the system delay and energy consumption while ensuring the task completion rate. To solve the high-dimensional decision problem, we reformulate the problem as a Markov decision process and propose a hierarchical offloading scheme based on the soft actor-critic algorithm. The method decouples global and local decisions: the global decisions integrate offloading ratios and trajectory planning into continuous actions, while local scheduling is handled by a priority-based mechanism. Simulations demonstrate that the proposed approach outperforms several baselines in task completion rate, system efficiency, and convergence speed, showing strong robustness and applicability in dynamic vehicular environments.

Divergent Realities: A Comparative Analysis of Human Expert vs. Artificial Intelligence Based Generation and Evaluation of Treatment Plans in Dermatology

Authors:Dipayan Sengupta, Saumya Panda

Date:2025-07-08 06:59:58

Background: Evaluating AI-generated treatment plans is a key challenge as AI expands beyond diagnostics, especially with new reasoning models. This study compares plans from human experts and two AI models (a generalist and a reasoner), assessed by both human peers and a superior AI judge. Methods: Ten dermatologists, a generalist AI (GPT-4o), and a reasoning AI (o3) generated treatment plans for five complex dermatology cases. The anonymized, normalized plans were scored in two phases: 1) by the ten human experts, and 2) by a superior AI judge (Gemini 2.5 Pro) using an identical rubric. Results: A profound 'evaluator effect' was observed. Human experts scored peer-generated plans significantly higher than AI plans (mean 7.62 vs. 7.16; p=0.0313), ranking GPT-4o 6th (mean 7.38) and the reasoning model, o3, 11th (mean 6.97). Conversely, the AI judge produced a complete inversion, scoring AI plans significantly higher than human plans (mean 7.75 vs. 6.79; p=0.0313). It ranked o3 1st (mean 8.20) and GPT-4o 2nd, placing all human experts lower. Conclusions: The perceived quality of a clinical plan is fundamentally dependent on the evaluator's nature. An advanced reasoning AI, ranked poorly by human experts, was judged as superior by a sophisticated AI, revealing a deep gap between experience-based clinical heuristics and data-driven algorithmic logic. This paradox presents a critical challenge for AI integration, suggesting the future requires synergistic, explainable human-AI systems that bridge this reasoning gap to augment clinical care.

DRO-EDL-MPC: Evidential Deep Learning-Based Distributionally Robust Model Predictive Control for Safe Autonomous Driving

Authors:Hyeongchan Ham, Heejin Ahn

Date:2025-07-08 06:45:18

Safety is a critical concern in motion planning for autonomous vehicles. Modern autonomous vehicles rely on neural network-based perception, but making control decisions based on these inference results poses significant safety risks due to inherent uncertainties. To address this challenge, we present a distributionally robust optimization (DRO) framework that accounts for both aleatoric and epistemic perception uncertainties using evidential deep learning (EDL). Our approach introduces a novel ambiguity set formulation based on evidential distributions that dynamically adjusts the conservativeness according to perception confidence levels. We integrate this uncertainty-aware constraint into model predictive control (MPC), proposing the DRO-EDL-MPC algorithm with computational tractability for autonomous driving applications. Validation in the CARLA simulator demonstrates that our approach maintains efficiency under high perception confidence while enforcing conservative constraints under low confidence.

AnatomyCarve: A VR occlusion management technique for medical images based on segment-aware clipping

Authors:Andrey Titov, Tina N. H. Nantenaina, Marta Kersten-Oertel, Simon Drouin

Date:2025-07-08 01:20:07

Visualizing 3D medical images is challenging due to self-occlusion, where anatomical structures of interest can be obscured by surrounding tissues. Existing methods, such as slicing and interactive clipping, are limited in their ability to fully represent internal anatomy in context. In contrast, hand-drawn medical illustrations in anatomy books manage occlusion effectively by selectively removing portions based on tissue type, revealing 3D structures while preserving context. This paper introduces AnatomyCarve, a novel technique developed for a VR environment that creates high-quality illustrations similar to those in anatomy books, while remaining fast and interactive. AnatomyCarve allows users to clip selected segments from 3D medical volumes, preserving spatial relations and contextual information. This approach enhances visualization by combining advanced rendering techniques with natural user interactions in VR. Usability of AnatomyCarve was assessed through a study with non-experts, while surgical planning effectiveness was evaluated with practicing neurosurgeons and residents. The results show that AnatomyCarve enables customized anatomical visualizations, with high user satisfaction, suggesting its potential for educational and clinical applications.

Preemptive Solving of Future Problems: Multitask Preplay in Humans and Machines

Authors:Wilka Carvalho, Sam Hall-McMaster, Honglak Lee, Samuel J. Gershman

Date:2025-07-08 00:55:47

Humans can pursue a near-infinite variety of tasks, but typically can only pursue a small number at the same time. We hypothesize that humans leverage experience on one task to preemptively learn solutions to other tasks that were accessible but not pursued. We formalize this idea as Multitask Preplay, a novel algorithm that replays experience on one task as the starting point for "preplay" -- counterfactual simulation of an accessible but unpursued task. Preplay is used to learn a predictive representation that can support fast, adaptive task performance later on. We first show that, compared to traditional planning and predictive representation methods, multitask preplay better predicts how humans generalize to tasks that were accessible but not pursued in a small grid-world, even when people didn't know they would need to generalize to these tasks. We then show these predictions generalize to Craftax, a partially observable 2D Minecraft environment. Finally, we show that Multitask Preplay enables artificial agents to learn behaviors that transfer to novel Craftax worlds sharing task co-occurrence structure. These findings demonstrate that Multitask Preplay is a scalable theory of how humans counterfactually learn and generalize across multiple tasks; endowing artificial agents with the same capacity can significantly improve their performance in challenging multitask environments.

Gaussian Process-Based Active Exploration Strategies in Vision and Touch

Authors:Ho Jin Choi, Nadia Figueroa

Date:2025-07-07 22:37:59

Robots struggle to understand object properties like shape, material, and semantics due to limited prior knowledge, hindering manipulation in unstructured environments. In contrast, humans learn these properties through interactive multi-sensor exploration. This work proposes fusing visual and tactile observations into a unified Gaussian Process Distance Field (GPDF) representation for active perception of object properties. While primarily focusing on geometry, this approach also demonstrates potential for modeling surface properties beyond geometry. The GPDF encodes signed distance from a point cloud, together with its analytic gradient and Hessian and surface uncertainty estimates, attributes that common neural network shape representations lack. By utilizing a point cloud to construct a distance function, GPDF does not need extensive pretraining on large datasets and can incorporate observations by aggregation. Starting with an initial visual shape estimate, the framework iteratively refines the geometry by integrating dense vision measurements using differentiable rendering and tactile measurements at uncertain surface regions. By quantifying multi-sensor uncertainties, it plans exploratory motions to maximize information gain for recovering precise 3D structures. For the real-world robot experiment, we utilize the Franka Research 3 robot manipulator, which is fixed on a table and has a customized DIGIT tactile sensor and an Intel RealSense D435 RGB-D camera mounted on the end-effector. In these experiments, the robot explores the shape and properties of objects assumed to be static and placed on the table. To improve scalability, we investigate approximation methods such as the inducing-point method for Gaussian Processes. This probabilistic multi-modal fusion enables active exploration and mapping of complex object geometries, potentially extending beyond geometry.
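
The abstract describes touching where the surface estimate is most uncertain. A minimal sketch of that idea uses plain zero-mean GP regression (squared-exponential kernel, NumPy) on a few hypothetical signed-distance samples of a unit circle: posterior variance stays small near observed points and approaches the prior far from them, so the highest-variance candidate is a natural next touch. This is not the paper's GPDF construction, which also yields analytic gradients and Hessians and fuses vision with touch; the kernel length scale, noise level and sample points are assumptions.

```python
import numpy as np

def rbf(a, b, length_scale=0.5, variance=1.0):
    """Squared-exponential kernel between point sets a (n, d) and b (m, d)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_posterior(x_train, y_train, x_test, noise=1e-4):
    """Posterior mean and variance of a zero-mean GP via a Cholesky factorisation."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_s = rbf(x_train, x_test)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf(x_test, x_test)) - np.sum(v**2, axis=0)
    return mean, var

# Hypothetical signed-distance samples: surface points (distance 0) seen only on the
# upper half of a unit circle, plus one interior point (distance -1).
angles = np.arange(7) * np.pi / 6
surface = np.stack([np.cos(angles), np.sin(angles)], axis=1)
x_train = np.vstack([surface, [[0.0, 0.0]]])
y_train = np.concatenate([np.zeros(len(surface)), [-1.0]])

# Candidate touch locations around the whole circle; probe where the GP is least sure.
cand_angles = np.arange(12) * np.pi / 6
candidates = np.stack([np.cos(cand_angles), np.sin(cand_angles)], axis=1)
mean, var = gp_posterior(x_train, y_train, candidates)
probe = candidates[np.argmax(var)]
print(probe)
```

The chosen probe lies on the unobserved lower half; the exact inversion here costs O(n^3) in the number of samples, which is the scalability issue that inducing-point approximations address.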

2048: Reinforcement Learning in a Delayed Reward Environment

Authors:Prady Saligram, Tanvir Bhathal, Robby Manihani

Date:2025-07-07 20:33:12

Delayed and sparse rewards present a fundamental obstacle for reinforcement-learning (RL) agents, which struggle to assign credit for actions whose benefits emerge many steps later. The sliding-tile game 2048 epitomizes this challenge: although frequent small score changes yield immediate feedback, they often mislead agents into locally optimal but globally suboptimal strategies. In this work, we introduce a unified, distributional multi-step RL framework designed to directly optimize long-horizon performance. Using the open source Gym-2048 environment, we develop and compare four agent variants: standard DQN, PPO, QR-DQN (Quantile Regression DQN), and a novel Horizon-DQN (H-DQN) that integrates distributional learning, dueling architectures, noisy networks, prioritized replay, and more. Empirical evaluation reveals a clear hierarchy in effectiveness: max episode scores improve from 3.988K (DQN) to 5.756K (PPO), 8.66K (QR-DQN), and 18.21K (H-DQN), with H-DQN reaching the 2048 tile. Upon scaling, H-DQN reaches a max score of 41.828K and a 4096 tile. These results demonstrate that distributional, multi-step targets substantially enhance performance in sparse-reward domains, and they suggest promising avenues for further gains through model-based planning and curriculum learning.
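
The abstract credits multi-step targets for handling delayed rewards. The sketch below computes a generic scalar n-step discounted target; `gamma`, the horizon and the reward sequence are illustrative assumptions, and H-DQN's distributional version (which bootstraps a return distribution rather than a scalar) is not reproduced.

```python
from typing import Sequence

def n_step_target(rewards: Sequence[float], bootstrap: float, gamma: float) -> float:
    """Discounted n-step target: r_t + g*r_{t+1} + ... + g^(n-1)*r_{t+n-1} + g^n * bootstrap.

    `bootstrap` is the value estimate at step t+n (use 0.0 if the episode ended).
    """
    target = bootstrap
    for r in reversed(rewards):
        target = r + gamma * target
    return target

# Hypothetical 2048 transitions: two moves without merges, then a merge worth 4 points.
# Early in training the value estimates are still zero, so bootstrap = 0.0.
rewards = [0.0, 0.0, 4.0]
three_step = n_step_target(rewards, bootstrap=0.0, gamma=0.5)
one_step = n_step_target(rewards[:1], bootstrap=0.0, gamma=0.5)
print(three_step, one_step)
```

With zero-initialised values, the one-step target sees nothing of the delayed merge, while the three-step target already assigns it discounted credit; one-step learning would need several rounds of bootstrapping to propagate the same signal.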

Strategic Alignment Patterns in National AI Policies

Authors:Mohammad Hossein Azin, Hessam Zandhessami

Date:2025-07-07 18:36:30

This paper introduces a novel visual mapping methodology for assessing strategic alignment in national artificial intelligence policies. The proliferation of AI strategies across countries has created an urgent need for analytical frameworks that can evaluate policy coherence between strategic objectives, foresight methods, and implementation instruments. Drawing on data from the OECD AI Policy Observatory, we analyze 15-20 national AI strategies using a combination of matrix-based visualization and network analysis to identify patterns of alignment and misalignment. Our findings reveal distinct alignment archetypes across governance models, with notable variations in how countries integrate foresight methodologies with implementation planning. High-coherence strategies demonstrate strong interconnections between economic competitiveness objectives and robust innovation funding instruments, while common vulnerabilities include misalignment between ethical AI objectives and corresponding regulatory frameworks. The proposed visual mapping approach offers both methodological contributions to policy analysis and practical insights for enhancing strategic coherence in AI governance. This research addresses significant gaps in policy evaluation methodology and provides actionable guidance for policymakers seeking to strengthen alignment in technological governance frameworks.

Beyond Simple Edits: X-Planner for Complex Instruction-Based Image Editing

Authors:Chun-Hsiao Yeh, Yilin Wang, Nanxuan Zhao, Richard Zhang, Yuheng Li, Yi Ma, Krishna Kumar Singh

Date:2025-07-07 17:59:56

Recent diffusion-based image editing methods have significantly advanced text-guided tasks but often struggle to interpret complex, indirect instructions. Moreover, current models frequently suffer from poor identity preservation, unintended edits, or rely heavily on manual masks. To address these challenges, we introduce X-Planner, a Multimodal Large Language Model (MLLM)-based planning system that effectively bridges user intent with editing model capabilities. X-Planner employs chain-of-thought reasoning to systematically decompose complex instructions into simpler, clear sub-instructions. For each sub-instruction, X-Planner automatically generates precise edit types and segmentation masks, eliminating manual intervention and ensuring localized, identity-preserving edits. Additionally, we propose a novel automated pipeline for generating large-scale data to train X-Planner, which achieves state-of-the-art results on both existing benchmarks and our newly introduced complex editing benchmark.

From Marginal to Joint Predictions: Evaluating Scene-Consistent Trajectory Prediction Approaches for Automated Driving

Authors:Fabian Konstantinidis, Ariel Dallari Guerreiro, Raphael Trumpp, Moritz Sackmann, Ulrich Hofmann, Marco Caccamo, Christoph Stiller

Date:2025-07-07 17:58:53

Accurate motion prediction of surrounding traffic participants is crucial for the safe and efficient operation of automated vehicles in dynamic environments. Marginal prediction models commonly forecast each agent's future trajectories independently, often leading to sub-optimal planning decisions for an automated vehicle. In contrast, joint prediction models explicitly account for the interactions between agents, yielding socially and physically consistent predictions on a scene level. However, existing approaches differ not only in their problem formulation but also in the model architectures and implementation details used, making it difficult to compare them. In this work, we systematically investigate different approaches to joint motion prediction, including post-processing of the marginal predictions, explicitly training the model for joint predictions, and framing the problem as a generative task. We evaluate each approach in terms of prediction accuracy, multi-modality, and inference efficiency, offering a comprehensive analysis of the strengths and limitations of each approach. Several prediction examples are available at https://frommarginaltojointpred.github.io/.
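
One common way to see why marginal predictions can look better than they are is to compare a marginal and a joint best-of-K displacement error. The marginal version lets each agent pick its own best mode; the joint version requires a single mode index (one scene-level hypothesis) for all agents. The functions and toy trajectories below are hypothetical illustrations, not the paper's evaluation code.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Traj = List[Point]

def ade(pred: Traj, gt: Traj) -> float:
    """Average displacement error between a predicted and a ground-truth trajectory."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)

def marginal_min_ade(preds: List[List[Traj]], gts: List[Traj]) -> float:
    """preds[a][k] is mode k for agent a; each agent independently picks its best mode."""
    return sum(min(ade(m, gt) for m in modes) for modes, gt in zip(preds, gts)) / len(gts)

def joint_min_ade(preds: List[List[Traj]], gts: List[Traj]) -> float:
    """Mode k is one scene hypothesis shared by all agents; pick the best scene."""
    num_modes = len(preds[0])
    return min(sum(ade(preds[a][k], gts[a]) for a in range(len(gts))) / len(gts)
               for k in range(num_modes))

# Two agents, two modes each: agent 0 is right in mode 0, agent 1 only in mode 1.
gts = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (0.0, 2.0)]]
preds = [
    [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (1.0, 1.0)]],
    [[(0.0, 1.0), (1.0, 2.0)], [(0.0, 1.0), (0.0, 2.0)]],
]
print(marginal_min_ade(preds, gts), joint_min_ade(preds, gts))
```

The marginal score is perfect even though no single predicted scene matches reality, which is the kind of inconsistency that joint prediction approaches aim to remove.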

NavigScene: Bridging Local Perception and Global Navigation for Beyond-Visual-Range Autonomous Driving

Authors:Qucheng Peng, Chen Bai, Guoxiang Zhang, Bo Xu, Xiaotong Liu, Xiaoyin Zheng, Chen Chen, Cheng Lu

Date:2025-07-07 17:37:01

Autonomous driving systems have made significant advances in Q&A, perception, prediction, and planning based on local visual information, yet they struggle to incorporate broader navigational context that human drivers routinely utilize. We address this critical gap between local sensor data and global navigation information by proposing NavigScene, an auxiliary navigation-guided natural language dataset that simulates a human-like driving environment within autonomous driving systems. Moreover, we develop three complementary paradigms to leverage NavigScene: (1) Navigation-guided Reasoning, which enhances vision-language models by incorporating navigation context into the prompting approach; (2) Navigation-guided Preference Optimization, a reinforcement learning method that extends Direct Preference Optimization to improve vision-language model responses by establishing preferences for navigation-relevant summarized information; and (3) Navigation-guided Vision-Language-Action model, which integrates navigation guidance and vision-language models with conventional driving models through feature fusion. Extensive experiments demonstrate that our approaches significantly improve performance across perception, prediction, planning, and question-answering tasks by enabling reasoning capabilities beyond visual range and improving generalization to diverse driving scenarios. This work represents a significant step toward more comprehensive autonomous driving systems capable of navigating complex, unfamiliar environments with greater reliability and safety.
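Navigation-guided Preference Optimization is described as extending Direct Preference Optimization (DPO). For reference, the standard DPO objective trains a policy \(\pi_\theta\) against a frozen reference policy \(\pi_{\mathrm{ref}}\) on triples of a prompt \(x\), a preferred response \(y_w\) and a rejected response \(y_l\), with \(\sigma\) the logistic sigmoid and \(\beta\) controlling how far \(\pi_\theta\) may drift from \(\pi_{\mathrm{ref}}\). Reading \(y_w\) as the response that uses navigation-relevant summarized information is an interpretation of the abstract; the exact modification NavigScene makes to this loss is not specified here.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
    \left[\log\sigma\!\left(
      \beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)}
      - \beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}
    \right)\right]
```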

Clinical test cases for model-based dose calculation algorithm commissioning, QA and benchmarking, for ¹⁹²Ir HDR brachytherapy of gynecologic cancers

Authors:V. Peppa, M. Robitaille, F. Akbari, S. A. Enger, R. M. Thomson, F. Mourtada, G. P. Fonseca

Date:2025-07-07 15:55:54

Purpose: To develop clinically relevant test cases for commissioning Model-Based Dose Calculation Algorithms (MBDCAs) for ¹⁹²Ir High Dose Rate (HDR) gynecologic brachytherapy following the workflow proposed by the TG-186 report and the WGDCAB report 372. Acquisition and Validation Methods: Two cervical cancer intracavitary HDR brachytherapy patient models were created, using either uniformly structured regions or realistic segmentation. The computed tomography (CT) images of the models were converted to DICOM CT images via MATLAB and imported into two Treatment Planning Systems (TPSs) with MBDCA capability. The clinical segmentation was expanded to include additional organs at risk. The actual clinical treatment plan was generally maintained, with the source replaced by a generic ¹⁹²Ir HDR source. Dose-to-medium-in-medium calculations were performed using the MBDCA option of each TPS and three different Monte Carlo (MC) simulation codes. MC results agreed within statistical uncertainty, while comparisons between MBDCA and MC dose distributions highlighted both strengths and limitations of the studied MBDCAs, suggesting potential approaches to overcome the challenges. Data Format and Usage Notes: The datasets for the developed cases are available online at http://doi.org/10.5281/zenodo.15720996. The DICOM files include the treatment plan for each case and TPS, together with the corresponding reference MC dose data. The package also contains a TPS- and case-specific user guide for commissioning the MBDCAs, and files needed to replicate the MC simulations. Potential Applications: The provided datasets and proposed methodology offer a commissioning framework for TPSs using MBDCAs, and serve as a benchmark for brachytherapy researchers using MC methods. They also facilitate intercomparisons of MBDCA performance and provide a quality assurance resource for evaluating future TPS software updates.

LERa: Replanning with Visual Feedback in Instruction Following

Authors:Svyatoslav Pchelintsev, Maxim Patratskiy, Anatoly Onishchenko, Alexandr Korchemnyi, Aleksandr Medvedev, Uliana Vinogradova, Ilya Galuzinsky, Aleksey Postnikov, Alexey K. Kovalev, Aleksandr I. Panov

Date:2025-07-07 15:49:00

Large Language Models are increasingly used in robotics for task planning, but their reliance on textual inputs limits their adaptability to real-world changes and failures. To address these challenges, we propose LERa - Look, Explain, Replan - a Visual Language Model-based replanning approach that utilizes visual feedback. Unlike existing methods, LERa requires only a raw RGB image, a natural language instruction, an initial task plan, and failure detection - without additional information such as object detection or predefined conditions that may be unavailable in a given scenario. The replanning process consists of three steps: (i) Look, where LERa generates a scene description and identifies errors; (ii) Explain, where it provides corrective guidance; and (iii) Replan, where it modifies the plan accordingly. LERa is adaptable to various agent architectures and can handle errors from both dynamic scene changes and task execution failures. We evaluate LERa on the newly introduced ALFRED-ChaOS and VirtualHome-ChaOS datasets, achieving a 40% improvement over baselines in dynamic environments. In tabletop manipulation tasks with a predefined probability of task failure within the PyBullet simulator, LERa improves success rates by up to 67%. Further experiments, including real-world trials with a tabletop manipulator robot, confirm LERa's effectiveness in replanning. We demonstrate that LERa is a robust and adaptable solution for error-aware task execution in robotics. The code is available at https://lera-robo.github.io.
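The three-step replanning process can be sketched as three chained queries to a vision-language model, triggered once failure detection flags a plan step. This is a minimal sketch under stated assumptions: `VLM` is a hypothetical interface (any callable taking an image and a prompt and returning text), and the prompt wording, function and class names are illustrative, not LERa's actual prompts or code.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical VLM interface: (raw RGB image, text prompt) -> text answer.
VLM = Callable[[object, str], str]


@dataclass
class ReplanResult:
    description: str   # output of the Look step
    explanation: str   # output of the Explain step
    new_plan: List[str]  # output of the Replan step


def lera_replan(vlm: VLM, image: object, instruction: str,
                plan: List[str], failed_step: int) -> ReplanResult:
    """Look, Explain, Replan: three chained VLM queries after a detected failure."""
    plan_text = "\n".join(f"{i + 1}. {step}" for i, step in enumerate(plan))
    # (i) Look: describe the scene and identify the error.
    description = vlm(image, f"Instruction: {instruction}\nPlan:\n{plan_text}\n"
                             f"Step {failed_step + 1} failed. Describe the scene and the error.")
    # (ii) Explain: turn the observation into corrective guidance.
    explanation = vlm(image, f"Scene: {description}\nExplain how to fix the plan.")
    # (iii) Replan: request a corrected plan, one action per line.
    answer = vlm(image, f"Guidance: {explanation}\nOriginal plan:\n{plan_text}\n"
                        "Return the corrected plan, one action per line.")
    new_plan = [line.strip() for line in answer.splitlines() if line.strip()]
    return ReplanResult(description, explanation, new_plan)
```

Because the only inputs are an image, the instruction, the current plan and the index of the failed step, the same loop can wrap different agent architectures, which matches the abstract's point that no object detector or predefined conditions are required.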

A Corrective Frequency-Constrained Unit Commitment with Data-driven Estimation of Optimal UFLS in Island Power Systems

Authors:Miad Sarvarizadeh, Lukas Sigrist, Almudena Rouco, Mohammad Rajabdorri, Enrique Lobato

Date:2025-07-07 14:47:47

This paper presents a novel corrective frequency-constrained unit commitment (FCUC) formulation for island power systems that uses data-driven constraint learning to estimate the optimal under-frequency load shedding (UFLS). A Tobit model is used to estimate the optimal amount of UFLS from the initial rate of change of frequency. The proposed formulation enables operation costs and UFLS to be co-optimized. The aim is to account for optimal UFLS occurrences during operation planning without increasing them. This could reduce system operation costs by relaxing the reserve requirement constraint. The performance of the proposed formulation has been analyzed for a Spanish island power system through various simulations. Different daily demand profiles are analyzed to demonstrate the effectiveness of the proposed formulation. Additionally, a sensitivity analysis is conducted to show the effects of changing the cost associated with UFLS. The corrective FCUC is shown to be capable of reducing system operation costs without jeopardizing the quality of the frequency response in terms of UFLS occurrence.
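A Tobit model suits this estimation because the shed load is censored at zero: in many contingencies no load is shed at all. The sketch below assumes a single regressor (the initial rate of change of frequency, treated as a magnitude), a latent linear model `ufls* = beta0 + beta1 * rocof + eps` with Gaussian noise, and `ufls = max(0, ufls*)`. The variable names and the one-regressor form are assumptions, and the sketch does not show how the paper embeds the learned estimate as a constraint in the FCUC optimization.

```python
import math
from typing import Sequence


def _norm_logpdf(z: float) -> float:
    """Log of the standard normal density, computed directly to avoid underflow."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def _norm_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_log_likelihood(beta0: float, beta1: float, sigma: float,
                         rocof: Sequence[float], ufls: Sequence[float]) -> float:
    """Log-likelihood of a Tobit model left-censored at zero."""
    ll = 0.0
    for x, y in zip(rocof, ufls):
        mu = beta0 + beta1 * x
        if y <= 0.0:
            # Censored sample: probability that the latent shed amount is <= 0.
            ll += math.log(_norm_cdf(-mu / sigma))
        else:
            # Uncensored sample: Gaussian density of the observed shed amount.
            ll += _norm_logpdf((y - mu) / sigma) - math.log(sigma)
    return ll


def expected_ufls(beta0: float, beta1: float, sigma: float, rocof: float) -> float:
    """E[ufls | rocof] = Phi(mu / sigma) * mu + sigma * phi(mu / sigma)."""
    mu = beta0 + beta1 * rocof
    z = mu / sigma
    return _norm_cdf(z) * mu + sigma * math.exp(_norm_logpdf(z))
```

In practice `beta0`, `beta1` and `sigma` would be fitted by maximizing `tobit_log_likelihood` over historical or simulated contingencies. Unlike a plain linear regression, `expected_ufls` never goes negative and approaches the latent linear prediction only when shedding is almost certain.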

The Hitchhiker's Guide to Differential Dynamic Microscopy

Authors:Enrico Lattuada, Fabian Krautgasser, Maxime Lavaud, Fabio Giavazzi, Roberto Cerbino

Date:2025-07-07 14:43:56

Over nearly two decades, Differential Dynamic Microscopy (DDM) has become a standard technique for extracting dynamic correlation functions from time-lapse microscopy data, with applications spanning colloidal suspensions, polymer solutions, active fluids, and biological systems. In its most common implementation, DDM analyzes image sequences acquired with a conventional microscope equipped with a digital camera, yielding time- and wavevector-resolved information analogous to that obtained in multi-angle Dynamic Light Scattering (DLS). With a widening array of applications and a growing, heterogeneous user base, lowering the technical barrier to performing DDM has become a central objective. In this tutorial article, we provide a step-by-step guide to conducting DDM experiments -- from planning and acquisition to data analysis -- and introduce the open-source software package fastDDM, designed to efficiently process large image datasets. fastDDM employs optimized, parallel algorithms that reduce analysis times by up to four orders of magnitude on typical datasets (e.g., 10,000 frames), thereby enabling high-throughput workflows and making DDM more broadly accessible across disciplines.
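The core quantity computed in DDM is the image structure function \(D(q, \Delta t) = \langle |\mathrm{FFT}[I(t+\Delta t) - I(t)]|^2 \rangle_t\), averaged over rings of constant wavevector magnitude \(|q|\). The NumPy sketch below computes it directly for a stack of square frames. It is a plain reference implementation for illustration, not fastDDM's API or its optimized parallel algorithms; the function name and the integer ring binning are assumptions.

```python
import numpy as np


def image_structure_function(frames: np.ndarray, lags) -> np.ndarray:
    """Azimuthally averaged DDM image structure function D(q, dt).

    frames: image stack of shape (T, N, N); lags: integer frame lags with
    1 <= lag < T. Returns shape (len(lags), N // 2); column k is the average
    over the ring |q| ~ k (in units of 2*pi / (N * pixel size)).
    """
    _, ny, nx = frames.shape
    if ny != nx:
        raise ValueError("this sketch assumes square frames")
    frames = frames.astype(float)
    # Integer wavevector magnitude for every pixel of the unshifted 2D FFT.
    k = np.fft.fftfreq(nx) * nx
    kx, ky = np.meshgrid(k, k)
    q_index = np.rint(np.hypot(kx, ky)).astype(int).ravel()
    n_bins = nx // 2
    counts = np.bincount(q_index, minlength=n_bins)[:n_bins]
    lags = list(lags)
    result = np.empty((len(lags), n_bins))
    for i, dt in enumerate(lags):
        # Differences of all frame pairs separated by dt, then their power spectra.
        diff = frames[dt:] - frames[:-dt]
        power = np.abs(np.fft.fft2(diff)) ** 2
        mean_power = power.mean(axis=0).ravel()  # average over start times t
        sums = np.bincount(q_index, weights=mean_power, minlength=n_bins)[:n_bins]
        result[i] = sums / counts  # ring average
    return result
```

For freely diffusing (Brownian) particles, the result is usually fitted at each \(q\) with \(D(q,\Delta t) = A(q)\,[1 - e^{-\Delta t/\tau(q)}] + B(q)\), where \(\tau(q) = 1/(D_0 q^2)\) and \(D_0\) is the diffusion coefficient. This direct loop costs one 2D FFT per frame pair per lag, which is the kind of cost that optimized packages are designed to reduce on long sequences.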

Estimating Object Physical Properties from RGB-D Vision and Depth Robot Sensors Using Deep Learning

Authors:Ricardo Cardoso, Plinio Moreno

Date:2025-07-07 14:11:47

Inertial mass plays a crucial role in robotic applications such as object grasping, manipulation, and simulation, providing a strong prior for planning and control. Accurately estimating an object's mass before interaction can significantly enhance the performance of various robotic tasks. However, mass estimation using only vision sensors is a relatively underexplored area. This paper proposes a novel approach combining sparse point-cloud data from depth images with RGB images to estimate the mass of objects. We evaluate a range of point-cloud processing architectures, alongside RGB-only methods. To overcome the limited availability of training data, we create a synthetic dataset using ShapeNetSem 3D models, simulating the RGB-D images a Kinect camera would capture. This synthetic data is used to train an image generation model for estimating dense depth maps, which we then use to augment an existing dataset of images paired with mass values. Our approach significantly outperforms existing benchmarks across all evaluated metrics. The data generation (https://github.com/RavineWindteer/ShapenetSem-to-RGBD) as well as the training of the depth estimator (https://github.com/RavineWindteer/GLPDepth-Edited) and the mass estimator (https://github.com/RavineWindteer/Depth-mass-estimator) are available online.

When Imitation Learning Outperforms Reinforcement Learning in Surgical Action Planning

Authors:Maxence Boels, Harry Robertshaw, Alejandro Granados, Prokar Dasgupta, Sebastien Ourselin

Date:2025-07-07 13:49:57

Surgical action planning requires predicting future instrument-verb-target triplets for real-time assistance. While teleoperated robotic surgery provides natural expert demonstrations for imitation learning (IL), reinforcement learning (RL) could potentially discover superior strategies through exploration. We present the first comprehensive comparison of IL versus RL for surgical action planning on CholecT50. Our Dual-task Autoregressive Imitation Learning (DARIL) baseline achieves 34.6% action triplet recognition mAP and 33.6% next frame prediction mAP, degrading smoothly to 29.2% at a 10-second planning horizon. We evaluated three RL variants: world model-based RL, direct video RL, and inverse RL enhancement. Surprisingly, all RL approaches underperformed DARIL: world model RL dropped to 3.1% mAP at 10 s, while direct video RL achieved only 15.9%. Our analysis reveals that distribution matching on expert-annotated test sets systematically favors IL over potentially valid RL policies that differ from training demonstrations. This challenges assumptions about RL superiority in sequential decision making and provides crucial insights for surgical AI development.

Unifying Robot Optimization: Monte Carlo Tree Search with Tensor Factorization

Authors:Teng Xue, Amirreza Razmjoo, Yan Zhang, Sylvain Calinon

Date:2025-07-07 12:49:20

Many robotic tasks, such as inverse kinematics, motion planning, and optimal control, can be formulated as optimization problems. Solving these problems involves addressing nonlinear kinematics, complex contact dynamics, and long-horizon planning, each posing distinct challenges for state-of-the-art optimization methods. To solve a wide range of tasks across varying scenarios efficiently, researchers either develop task-specific algorithms or switch between different frameworks. Monte Carlo Tree Search (MCTS) is a general-purpose decision-making tool that enables strategic exploration across problem instances without relying on task-specific structures. However, MCTS suffers from combinatorial complexity, leading to slow convergence and high memory usage. To address this limitation, we propose \emph{Tensor Train Tree Search} (TTTS), which leverages tensor factorization to exploit the separable structure of decision trees. This yields a low-rank, linear-complexity representation that significantly reduces both computation time and storage requirements. We prove that TTTS can efficiently reach the bounded global optimum within a finite time. Experimental results across inverse kinematics, motion planning around obstacles, multi-stage motion planning, and bimanual whole-body manipulation demonstrate the efficiency of TTTS on a diverse set of robotic tasks.
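The storage savings come from representing a d-dimensional array by a chain of small three-way "cores". The sketch below is the generic TT-SVD decomposition (sequential truncated SVDs), included only to illustrate why a tensor-train representation scales linearly in the number of dimensions; it is not TTTS itself, and how TTTS builds and searches such a representation over decision trees is not described in the abstract. The function names and the fixed `max_rank` truncation are assumptions.

```python
from typing import List

import numpy as np


def tt_svd(tensor: np.ndarray, max_rank: int) -> List[np.ndarray]:
    """Decompose a d-way array into tensor-train cores of shape (r_prev, n_k, r)."""
    dims = tensor.shape
    cores = []
    r_prev = 1
    mat = tensor.reshape(dims[0], -1)
    for k in range(len(dims) - 1):
        # Unfold: rows index (previous rank, current mode), columns the rest.
        mat = mat.reshape(r_prev * dims[k], -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(s))  # truncate to the allowed TT rank
        cores.append(u[:, :r].reshape(r_prev, dims[k], r))
        mat = s[:r, None] * vt[:r, :]  # carry the remainder to the next mode
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))
    return cores


def tt_full(cores: List[np.ndarray]) -> np.ndarray:
    """Contract tensor-train cores back into the full array."""
    result = cores[0]
    for core in cores[1:]:
        # Contract the trailing rank index with the next core's leading rank index.
        result = np.tensordot(result, core, axes=([-1], [0]))
    return result.reshape([c.shape[1] for c in cores])


def tt_num_params(cores: List[np.ndarray]) -> int:
    """Number of stored entries across all cores."""
    return sum(c.size for c in cores)
```

For example, a 10-dimensional array with 10 options per dimension has 10^10 entries, while a tensor train with ranks of at most 5 stores at most 10 × (5 × 10 × 5) = 2,500 numbers; the cost grows linearly with the number of dimensions rather than exponentially.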

Advancement of Circular Economy Through Interdisciplinary Collaboration: A Bibliometric Approach

Authors:Keita Nishimoto, Koji Kimita, Shinsuke Murakami, Yin Long, Kimitaka Asatani, Ichiro Sakata

Date:2025-07-07 12:16:17

Since the European Union introduced its Circular Economy (CE) Action Plan in 2015, CE research has expanded rapidly. However, the structure of this emerging field - both in terms of its constituent disciplines and researcher dynamics - remains poorly understood. To address this gap, we analyze over 25,000 CE-related publications from Scopus by combining conventional bibliometric approaches with advanced machine learning techniques, including text embeddings and clustering. This hybrid method enables both a macro-level mapping of research domains and a micro-level investigation of individual researchers' disciplinary backgrounds and collaborations. We classify CE research into 16 distinct clusters, identifying the original disciplines of researchers and visualizing patterns of interdisciplinary collaboration. Building on this foundation, we ask: Which CE-related research domains receive the most attention in academic and policy contexts? And how are different types of interdisciplinary collaboration associated with research impact? Our findings show that research in business and management attracts substantial academic and policy attention, while engineering research - though less visible - tends to achieve higher funding success. This suggests a positive dynamic in which the former draws attention to CE issues and the latter secures the economic resources necessary to realize them. We further demonstrate that CE papers co-authored by researchers from different disciplines tend to show higher research impact than intradisciplinary work. Qualitative case analyses also highlight this tendency. Centered particularly on collaborations between business-oriented and engineering-oriented disciplines, our findings underscore the importance of interdisciplinary efforts in CE research and offer insights for guiding future cross-disciplinary engagement in the field.