planning - 2025-12-22

Planning as Descent: Goal-Conditioned Latent Trajectory Synthesis in Learned Energy Landscapes

Authors:Carlos Vélez García, Miguel Cazorla, Jorge Pomares
Date:2025-12-19 17:49:13

We present Planning as Descent (PaD), a framework for offline goal-conditioned reinforcement learning that grounds trajectory synthesis in verification. Instead of learning a policy or explicit planner, PaD learns a goal-conditioned energy function over entire latent trajectories, assigning low energy to feasible, goal-consistent futures. Planning is realized as gradient-based refinement in this energy landscape, using identical computation during training and inference to reduce train-test mismatch common in decoupled modeling pipelines. PaD is trained via self-supervised hindsight goal relabeling, shaping the energy landscape around the planning dynamics. At inference, multiple trajectory candidates are refined under different temporal hypotheses, and low-energy plans balancing feasibility and efficiency are selected. We evaluate PaD on OGBench cube manipulation tasks. When trained on narrow expert demonstrations, PaD achieves state-of-the-art 95% success, strongly outperforming prior methods that peak at 68%. Remarkably, training on noisy, suboptimal data further improves success and plan efficiency, highlighting the benefits of verification-driven planning. Our results suggest learning to evaluate and refine trajectories provides a robust alternative to direct policy learning for offline, reward-free planning.
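
The idea of planning as gradient-based refinement in an energy landscape can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the hand-written quadratic energy (a smoothness term plus a goal term) stands in for PaD's learned energy function, states are scalars rather than latent vectors, and the names `energy`, `energy_grad`, and `refine` are hypothetical.

```python
def energy(traj, start, goal, goal_weight=5.0):
    """Toy stand-in energy: squared step sizes plus a weighted goal error."""
    points = [start] + list(traj)
    smooth = sum((b - a) ** 2 for a, b in zip(points, points[1:]))
    return smooth + goal_weight * (traj[-1] - goal) ** 2


def energy_grad(traj, start, goal, goal_weight=5.0):
    """Analytic gradient of `energy` with respect to every trajectory state."""
    points = [start] + list(traj)
    grad = []
    for i in range(1, len(points)):
        g = 2.0 * (points[i] - points[i - 1])        # step into state i
        if i + 1 < len(points):
            g -= 2.0 * (points[i + 1] - points[i])   # step out of state i
        grad.append(g)
    grad[-1] += 2.0 * goal_weight * (traj[-1] - goal)
    return grad


def refine(traj, start, goal, steps=500, lr=0.1, goal_weight=5.0):
    """Refine a candidate trajectory by plain gradient descent on the energy."""
    traj = list(traj)
    for _ in range(steps):
        grad = energy_grad(traj, start, goal, goal_weight)
        traj = [x - lr * g for x, g in zip(traj, grad)]
    return traj


initial = [0.0] * 5                      # crude candidate: stay at the start
plan = refine(initial, start=0.0, goal=1.0)
```

In this toy setting the refined plan spreads the motion evenly over the five steps and ends close to the goal; a learned energy would replace the hand-written one.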

Quantum Wasserstein distance for Gaussian states

Authors:Anaelle Hertz, Mohammad Ahmadpoor, Oleksandr Dzhenzherov, Augusto Gerolin, Khabat Heshami
Date:2025-12-19 17:13:55

Optimal transport between classical probability distributions has proven useful in areas such as machine learning and random combinatorial optimization. Quantum optimal transport, and the quantum Wasserstein distance as the minimal cost associated with transforming one quantum state to another, is expected to have implications in quantum state discrimination and quantum metrology. In this work, following the formalism introduced in [De Palma, G. and Trevisan, D., Ann. Henri Poincaré 22 (2021), 3199-3234] to compute the optimal transport plan between two quantum states, we give a general formula for the Wasserstein distance of order 2 between any two one-mode Gaussian states. We discuss how the Wasserstein distance between classical Gaussian distributions and the quantum Wasserstein distance by De Palma and Trevisan for thermal states can be recovered from our general formula for Gaussian states. This opens the path to directly compare various known distance measures with the Wasserstein distance through their closed-form solutions.
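
For orientation, the classical limit mentioned above has a well-known closed form: for Gaussians with means m1, m2 and covariances S1, S2, W2^2 = |m1 - m2|^2 + Tr(S1 + S2 - 2 (S2^(1/2) S1 S2^(1/2))^(1/2)). The sketch below evaluates only this classical formula, not the quantum formula of the paper. It assumes 2x2 covariances (the dimension of a one-mode phase space), for which the trace of the matrix square root reduces to sqrt(Tr(S1 S2) + 2 sqrt(det S1 det S2)). The function name is hypothetical.

```python
import math


def w2_squared_gaussian_2d(m1, s1, m2, s2):
    """Squared 2-Wasserstein distance between two classical 2D Gaussians.

    m1, m2: means as (x, y); s1, s2: symmetric positive semi-definite
    covariances given as ((a, b), (b, c)).
    """
    mean_term = (m1[0] - m2[0]) ** 2 + (m1[1] - m2[1]) ** 2
    tr1 = s1[0][0] + s1[1][1]
    tr2 = s2[0][0] + s2[1][1]
    det1 = s1[0][0] * s1[1][1] - s1[0][1] * s1[1][0]
    det2 = s2[0][0] * s2[1][1] - s2[0][1] * s2[1][0]
    # Tr(S1 S2) = sum_ij S1[i][j] * S2[j][i]
    tr12 = sum(s1[i][j] * s2[j][i] for i in range(2) for j in range(2))
    # For a 2x2 PSD matrix M, Tr(sqrt(M)) = sqrt(Tr M + 2 sqrt(det M));
    # S2^(1/2) S1 S2^(1/2) shares trace and determinant with S1 S2.
    inner = tr12 + 2.0 * math.sqrt(max(det1 * det2, 0.0))
    cross = math.sqrt(max(inner, 0.0))
    return mean_term + tr1 + tr2 - 2.0 * cross


# Diagonal example: standard deviations (2, 1) versus (1, 3), means 5 apart.
example = w2_squared_gaussian_2d((0.0, 0.0), ((4.0, 0.0), (0.0, 1.0)),
                                 (3.0, 4.0), ((1.0, 0.0), (0.0, 9.0)))
```

For diagonal covariances the result reduces to the squared distance between the means plus the squared differences of the standard deviations, which gives a quick sanity check.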

A Parametric Framework for Anticipatory Flashflood Warning: Integrating Landscape Vulnerability with Precipitation Forecasts

Authors:Xiangpeng Li, Junwei Ma, Samuel D Brody, Ali Mostafavi
Date:2025-12-19 16:55:00

Flash flood warnings are largely reactive, providing limited advance notice for evacuation planning and resource prepositioning. This study presents and validates an anticipatory, parametric framework that converts landscape vulnerability and precipitation into transparent, zone-aware threat levels at neighborhood scales. We first derive an inherent hazard likelihood (IHL) surface using pluvial flood depth, height above nearest drainage, and distance to streams. Next, we compute a hazard severity index (HSI) by normalizing 24-hour rainfall against local Atlas-14 100-year, 24-hour depths. We then integrate IHL and HSI within a localized threat severity (LTS) matrix using 20 class-specific triggers, requiring lower exceedance in high-risk terrain and higher exceedance in uplands. Applied to two Texas flood events, the LTS exhibits statistically significant spatial association with independent crowdsourced impact proxies, capturing observed disruption hotspots. The framework is computationally lightweight, scalable, and extends actionable situational awareness into a 48-72 hour anticipatory window, supporting pre-event decision-making by emergency managers.
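
The normalization and trigger logic described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the three IHL class names and their threshold values are hypothetical placeholders (the abstract mentions 20 class-specific triggers without listing them), and the function names are invented for this example.

```python
# Hypothetical HSI trigger per inherent-hazard-likelihood (IHL) class.
# Lower thresholds in high-risk terrain, higher thresholds in uplands.
HYPOTHETICAL_TRIGGERS = {"high": 0.3, "moderate": 0.5, "low": 0.8}


def hazard_severity_index(rain_24h, atlas14_100yr_24h):
    """Forecast 24-hour rainfall normalized by the local 100-year, 24-hour depth.

    Both depths must use the same unit (for example millimetres).
    """
    if atlas14_100yr_24h <= 0:
        raise ValueError("reference depth must be positive")
    return rain_24h / atlas14_100yr_24h


def threat_triggered(ihl_class, hsi, triggers=HYPOTHETICAL_TRIGGERS):
    """Return True when the HSI meets the trigger for the given IHL class."""
    return hsi >= triggers[ihl_class]


hsi = hazard_severity_index(150.0, 300.0)  # half of the 100-year depth
```

With these placeholder thresholds, the same forecast raises a threat in high-risk terrain but not in low-risk uplands, which is the zone-aware behavior the framework describes.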

Gravity Prior and Temporal Horizon Shape Interceptive Behavior under Active Inference

Authors:Marta Russo, Antonella Maselli, Federico Maggiore, Giovanni Pezzulo
Date:2025-12-19 16:14:47

Accurate interception of moving objects, such as catching a ball, requires the nervous system to overcome sensory delays, noise, and environmental dynamics. One key challenge is predicting future object motion in the presence of sensory uncertainty and inherent neural processing latencies. Theoretical frameworks such as internal models and optimal control have emphasized the role of predictive mechanisms in motor behavior. Active Inference extends these ideas by positing that perception and action arise from minimizing variational free energy under a generative model of the world. In this study, we investigate how different predictive strategies and the inclusion of environmental dynamics, specifically an internal model of gravity, influence interceptive control within an Active Inference agent. We simulate a simplified ball-catching task in which the agent moves a cursor horizontally to intercept a parabolically falling object. Four strategies are compared: short temporal horizon prediction of the next position or long horizon estimation of the interception point, each with or without a gravity prior. Performance is evaluated across diverse initial conditions using spatial and temporal error, action magnitude, and movement corrections. All strategies produce successful interception behavior, but those that incorporate gravity and longer temporal horizons outperform others. Including a gravity prior significantly improves spatial and temporal accuracy. Predicting the future interception point yields lower action values and smoother trajectories compared to short-horizon prediction. These findings suggest that internal models of physical dynamics and extended predictive horizons can enhance interceptive control, providing a unified computational account of how the brain may integrate sensory uncertainty, physical expectations, and motor planning.
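
The difference between predicting the interception point with and without a gravity prior can be shown with plain kinematics. The sketch below is not an Active Inference agent; it only computes where a falling object crosses the catch height under two assumed motion models. Function names, the coordinate convention (y upward), and the default g = 9.81 m/s^2 are assumptions for illustration.

```python
import math


def interception_with_gravity(x, y, vx, vy, catch_height=0.0, g=9.81):
    """Predict (x_hit, time) assuming constant downward acceleration g."""
    # Solve y + vy*t - 0.5*g*t^2 = catch_height for the later root.
    disc = vy * vy + 2.0 * g * (y - catch_height)
    if disc < 0:
        raise ValueError("object never reaches the catch height")
    t = (vy + math.sqrt(disc)) / g
    return x + vx * t, t


def interception_constant_velocity(x, y, vx, vy, catch_height=0.0):
    """Predict (x_hit, time) assuming the current velocity stays constant."""
    if vy >= 0:
        raise ValueError("object is not moving toward the catch height")
    t = (catch_height - y) / vy
    return x + vx * t, t


# Object released from rest vertically at height 4.905 m, drifting at 2 m/s.
x_hit, t_hit = interception_with_gravity(0.0, 4.905, 2.0, 0.0)
```

Early in the fall the constant-velocity model overestimates the time to contact, so its predicted interception point drifts until the velocity estimate catches up; the gravity-aware estimate is stable from the start.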

A Dual Quaternion based RRT* Path Planning Approach for Satellite Rendezvous and Docking

Authors:Ana Stankovic, Mohamed Khalil Ben-Larbi, Wolfgang H. Müller
Date:2025-12-19 15:17:46

This paper proposes a sampling-based motion planner that employs a dual quaternion representation to generate smooth, collision-free six-degree-of-freedom pose trajectories for satellite rendezvous and docking under keep-out zone constraints. The proposed planner integrates the dual quaternion algebra directly into an RRT* framework, thereby enabling natural screw motion interpolation in SE(3). The dual quaternion-based RRT* has been implemented in Python and demonstrated on a representative multi-obstacle scenario. A comparison with a standard RRT* using separate translation and quaternion steering highlights the enhanced pose continuity and obstacle avoidance of the proposed method. The present approach is purely kinematic in nature and does not take into account relative orbital dynamics. Consequently, the resulting path provides a preliminary estimate for a subsequent optimisation-based trajectory planner, which will refine the motion with dynamic constraints for the purpose of practical satellite rendezvous and docking missions.

Optimized Scheduling and Positioning of Mobile Manipulators in Collaborative Applications

Authors:Christian Cella, Sole Ester Sonnino, Marco Faroni, Andrea Zanchettin, Paolo Rocco
Date:2025-12-19 13:50:07

The growing integration of mobile robots in shared workspaces requires efficient path planning and coordination between the agents, accounting for safety and productivity. In this work, we propose a digital model-based optimization framework for mobile manipulators in human-robot collaborative environments, in order to determine the sequence of robot base poses and the task scheduling for the robot. The complete problem is treated as a black box, and Particle Swarm Optimization (PSO) is employed to balance conflicting key performance indicators (KPIs). We demonstrate improvements in cycle time, task sequencing, and adaptation to human presence in a collaborative box-packing scenario.
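
A minimal sketch of black-box PSO over a weighted sum of two KPIs is given below. It is a generic textbook PSO, not the authors' framework: the two quadratic "KPIs", their weights, the bounds, and the hyperparameters (inertia 0.7, acceleration coefficients 1.5) are assumptions chosen only so the example runs.

```python
import random


def pso(objective, bounds, n_particles=20, iters=100,
        inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize a black-box objective with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # personal bests
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]   # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < gbest_val:
                    gbest_val, gbest_pos = val, pos[i][:]
    return gbest_pos, gbest_val


def weighted_kpis(p):
    """Hypothetical scalarization of two conflicting toy KPIs."""
    cycle_time = (p[0] - 2.0) ** 2 + (p[1] - 1.0) ** 2   # prefers (2, 1)
    human_proximity = p[0] ** 2 + p[1] ** 2              # prefers (0, 0)
    return 0.6 * cycle_time + 0.4 * human_proximity


best_pose, best_cost = pso(weighted_kpis, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```

Because the optimizer only calls `objective`, the toy function can be swapped for a digital-model simulation that returns the KPIs of a candidate base-pose sequence and schedule.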

A unified FLAIR hyperintensity segmentation model for various CNS tumor types and acquisition time points

Authors:Mathilde Gajda Faanes, David Bouget, Asgeir S. Jakola, Timothy R. Smith, Vasileios K. Kavouridis, Francesco Latini, Margret Jensdottir, Peter Milos, Henrietta Nittby Redebrandt, Rickard L. Sjöberg, Rupavathana Mahesparan, Lars Kjelsberg Pedersen, Ole Solheim, Ingerid Reinertsen
Date:2025-12-19 13:33:43

T2-weighted fluid-attenuated inversion recovery (FLAIR) magnetic resonance imaging (MRI) scans are important for diagnosis, treatment planning and monitoring of brain tumors. Depending on the brain tumor type, the FLAIR hyperintensity volume is an important measure to assess the tumor volume or surrounding edema, and an automatic segmentation of this would be useful in the clinic. In this study, around 5000 FLAIR images of various tumor types and acquisition time points from different centers were used to train a unified FLAIR hyperintensity segmentation model using an Attention U-Net architecture. The performance was compared against dataset-specific models, and was validated on different tumor types, acquisition time points and against BraTS. The unified model achieved an average Dice score of 88.65% for pre-operative meningiomas, 80.08% for pre-operative metastases, 90.92% for pre-operative and 84.60% for post-operative gliomas from BraTS, and 84.47% for pre-operative and 61.27% for post-operative lower grade gliomas. In addition, the results showed that the unified model achieved comparable segmentation performance to the dataset-specific models on their respective datasets, and enables generalization across tumor types and acquisition time points, which facilitates the deployment in a clinical setting. The model is integrated into Raidionics, an open-source software for CNS tumor analysis.
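
For readers unfamiliar with the metric, the Dice score is twice the overlap between predicted and reference masks divided by the sum of their sizes. The sketch below computes it on flattened binary masks; returning 1.0 when both masks are empty is a convention assumed here, and the function name is hypothetical.

```python
def dice_score(pred, target):
    """Dice coefficient between two flattened binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if size_sum == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / size_sum


# Two predicted voxels, one of which overlaps the single reference voxel.
score = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
```

A reported Dice of 88.65% corresponds to a value of 0.8865 from this function.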

Deep Learning-based Robust Autonomous Navigation of Aerial Robots in Dense Forests

Authors:Guglielmo Del Col, Väinö Karjalainen, Teemu Hakala, Yibo Zhang, Eija Honkavaara
Date:2025-12-19 13:19:33

Autonomous aerial navigation in dense natural environments remains challenging due to limited visibility, thin and irregular obstacles, GNSS-denied operation, and frequent perceptual degradation. This work presents an improved deep learning-based navigation framework that integrates semantically enhanced depth encoding with neural motion-primitive evaluation for robust flight in cluttered forests. Several modules are incorporated on top of the original sevae-ORACLE algorithm to address limitations observed during real-world deployment, including lateral control for sharper maneuvering, a temporal consistency mechanism to suppress oscillatory planning decisions, a stereo-based visual-inertial odometry solution for drift-resilient state estimation, and a supervisory safety layer that filters unsafe actions in real time. A depth refinement stage is included to improve the representation of thin branches and reduce stereo noise, while GPU optimization increases onboard inference throughput from 4 Hz to 10 Hz. The proposed approach is evaluated against several existing learning-based navigation methods under identical environmental conditions and hardware constraints. It demonstrates higher success rates, more stable trajectories, and improved collision avoidance, particularly in highly cluttered forest settings. The system is deployed on a custom quadrotor in three boreal forest environments, achieving fully autonomous completion in all flights in moderate and dense clutter, and 12 out of 15 flights in highly dense underbrush. These results demonstrate improved reliability and safety over existing navigation methods in complex natural environments.

TwinSegNet: A Digital Twin-Enabled Federated Learning Framework for Brain Tumor Analysis

Authors:Almustapha A. Wakili, Adamu Hussaini, Abubakar A. Musa, Woosub Jung, Wei Yu
Date:2025-12-19 11:59:41

Brain tumor segmentation is critical in diagnosis and treatment planning for the disease. Yet, current deep learning methods rely on centralized data collection, which raises privacy concerns and limits generalization across diverse institutions. In this paper, we propose TwinSegNet, a privacy-preserving federated learning framework that integrates a hybrid ViT-UNet model with personalized digital twins for accurate and real-time brain tumor segmentation. Our architecture combines convolutional encoders with Vision Transformer bottlenecks to capture local and global context. Each institution fine-tunes the global model on private data to form its digital twin. Evaluated on nine heterogeneous MRI datasets, including BraTS 2019-2021 and custom tumor collections, TwinSegNet achieves high Dice scores (up to 0.90) and sensitivity/specificity exceeding 90%, demonstrating robustness across non-independent and identically distributed (non-IID) client distributions. Comparative results against centralized models such as TumorVisNet highlight TwinSegNet's effectiveness in preserving privacy without sacrificing performance. Our approach enables scalable, personalized segmentation for multi-institutional clinical settings while adhering to strict data confidentiality requirements.

VAIR: Visual Analytics for Injury Risk Exploration in Sports

Authors:Chunggi Lee, Ut Gong, Tica Lin, Stefanie Zollmann, Scott A Epsley, Adam Petway, Hanspeter Pfister
Date:2025-12-19 10:57:50

Injury prevention in sports requires understanding how bio-mechanical risks emerge from movement patterns captured in real-world scenarios. However, identifying and interpreting injury-prone events from raw video remains difficult and time-consuming. We present VAIR, a visual analytics system that supports injury risk analysis using 3D human motion reconstructed from sports video. VAIR combines pose estimation, bio-mechanical simulation, and synchronized visualizations to help users explore how joint-level risk indicators evolve over time. Domain experts can inspect movement segments through temporally aligned joint angles, angular velocity, and internal forces to detect patterns associated with known injury mechanisms. Through case studies involving Achilles tendon and anterior cruciate ligament (ACL) injuries in basketball, we show that VAIR enables more efficient identification and interpretation of risky movements. Expert feedback confirms that VAIR improves diagnostic reasoning and supports both retrospective analysis and proactive intervention planning.

Semantic Model for the SKA Regional Centre Network

Authors:Edgar Ribeiro João, Manuel Parra-Royón, Julián Garrido
Date:2025-12-19 07:43:46

The unprecedented volume of data from the Square Kilometre Array (SKA) telescopes will require robust strategies for efficient data processing and management. In this context, the SKA Regional Centre Network (SRCNet) -- a collaborative global infrastructure comprising multiple regional centres distributed around the globe -- is poised to play a critical role. This network will be instrumental in facilitating the effective handling and analysis of extensive data streams generated by the telescopes, thereby enabling significant advancements in astronomical research and exploration. This paper introduces a semantic model implemented with JSON-LD designed specifically for the SRCNet, detailing its architecture, data distribution, and computing service. By explicitly defining nodes, resources, relationships, and workflows, this model lays a foundation for interoperability and efficient resource management within the distributed network. The model supports two possible configurations, centralized and decentralized, depending on where data reside, enabling a future service broker to efficiently plan workflows by querying nodes for real-time system availability. Consistency tests using SPARQL queries were run on the model to validate its integrity. This research contributes to the advancement of semantic modeling in astronomy by addressing the semantic model for the SRCNet, a topic that has not been previously explored. This semantic model serves as a precursor to the development of a precise mathematical representation of the network and establishes a foundational framework for a future service broker.

Privacy-Preserving Synthetic Dataset of Individual Daily Trajectories for City-Scale Mobility Analytics

Authors:Jun'ichi Ozaki, Ryosuke Susuta, Takuhiro Moriyama, Yohei Shida
Date:2025-12-19 04:59:41

Urban mobility data are indispensable for urban planning, transportation demand forecasting, pandemic modeling, and many other applications; however, individual mobile phone-derived Global Positioning System traces cannot generally be shared with third parties owing to severe re-identification risks. Aggregated records, such as origin-destination (OD) matrices, offer partial insights but fail to capture the key behavioral properties of daily human movement, limiting realistic city-scale analyses. This study presents a privacy-preserving synthetic mobility dataset that reconstructs daily trajectories from aggregated inputs. The proposed method integrates OD flows with two complementary behavioral constraints: (1) dwell-travel time quantiles that are available only as coarse summary statistics and (2) the universal law for the daily distribution of the number of visited locations. Embedding these elements in a multi-objective optimization framework enables the reproduction of realistic distributions of human mobility while ensuring that no personal identifiers are required. The proposed framework is validated in two contrasting regions of Japan: (1) the 23 special wards of Tokyo, representing a dense metropolitan environment; and (2) Fukuoka Prefecture, where urban and suburban mobility patterns coexist. The resulting synthetic mobility data reproduce dwell-travel time and visit frequency distributions with high fidelity, while deviations in OD consistency remain within the natural range of daily fluctuations. The results of this study establish a practical synthesis pathway under real-world constraints, providing governments, urban planners, and industries with scalable access to high-resolution mobility data for reliable analytics without the need for sensitive personal records, and supporting practical deployments in policy and commercial domains.

Reasoning Palette: Modulating Reasoning via Latent Contextualization for Controllable Exploration for (V)LMs

Authors:Rujiao Long, Yang Li, Xingyao Zhang, Weixun Wang, Tianqianjin Lin, Xi Zhao, Yuchi Xu, Wenbo Su, Junchi Yan, Bo Zheng
Date:2025-12-19 03:32:53

Exploration capacity shapes both inference-time performance and reinforcement learning (RL) training for large (vision-) language models, as stochastic sampling often yields redundant reasoning paths with little high-level diversity. This paper proposes Reasoning Palette, a novel latent-modulation framework that endows the model with a stochastic latent variable for strategic contextualization, guiding its internal planning prior to token generation. This latent context is inferred from the mean-pooled embedding of a question-answer pair via a variational autoencoder (VAE), where each sampled latent potentially encodes a distinct reasoning context. During inference, a sampled latent is decoded into learnable token prefixes and prepended to the input prompt, modulating the model's internal reasoning trajectory. In this way, the model performs internal sampling over reasoning strategies prior to output generation, which shapes the style and structure of the entire response sequence. A brief supervised fine-tuning (SFT) warm-up phase allows the model to adapt to this latent conditioning. Within RL optimization, Reasoning Palette facilitates structured exploration by enabling on-demand injection of diverse reasoning modes, significantly enhancing exploration efficiency and sustained learning capability. Experiments across multiple reasoning benchmarks demonstrate that our method enables interpretable control over the (vision-) language model's strategic behavior, thereby achieving consistent performance gains over standard RL methods.

It is not always greener on the other side: Greenery perception across demographics and personalities in multiple cities

Authors:Matias Quintana, Fangqi Liu, Jussi Torkko, Youlong Gu, Xiucheng Liang, Yujun Hou, Koichi Ito, Yihan Zhu, Mahmoud Abdelrahman, Tuuli Toivonen, Yi Lu, Filip Biljecki
Date:2025-12-19 03:01:40

Quantifying and assessing urban greenery is consequential for planning and development, reflecting the everlasting importance of green spaces for multiple climate and well-being dimensions of cities. Evaluation can be broadly grouped into objective (e.g., measuring the amount of greenery) and subjective (e.g., polling the perception of people) approaches, which may differ -- what people see and feel about how green a place is might not match the measurements of the actual amount of vegetation. In this work, we advance the state of the art by measuring such differences and explaining them through human, geographic, and spatial dimensions. The experiments rely on contextual information extracted from street view imagery and a comprehensive urban visual perception survey collected from 1,000 people across five countries with their extensive demographic and personality information. We analyze the discrepancies between objective measures (e.g., Green View Index (GVI)) and subjective scores (e.g., pairwise ratings), examining whether they can be explained by a variety of human and visual factors such as age group and spatial variation of greenery in the scene. The findings reveal that such discrepancies are comparable around the world and that demographics and personality do not play a significant role in perception. Further, while perceived and measured greenery correlate consistently across geographies (both where people and where imagery are from), where people live plays a significant role in explaining perceptual differences, with these two ranking at the top among the seven features that influence perceived greenery the most. This location influence suggests that cultural, environmental, and experiential factors substantially shape how individuals observe greenery in cities.
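
The Green View Index is commonly computed as the share of pixels in a street-level image that a semantic segmentation labels as vegetation. The sketch below assumes a per-pixel label mask is already available; the label names and the function name are hypothetical, and the paper's exact pipeline may differ.

```python
# Hypothetical vegetation class names from a segmentation model.
VEGETATION_LABELS = frozenset({"tree", "grass", "plant"})


def green_view_index(label_mask, vegetation_labels=VEGETATION_LABELS):
    """Fraction of pixels whose label belongs to the vegetation classes."""
    total = sum(len(row) for row in label_mask)
    if total == 0:
        raise ValueError("label mask is empty")
    green = sum(1 for row in label_mask for label in row
                if label in vegetation_labels)
    return green / total


# A 2x2 toy mask: two of four pixels are vegetation.
gvi = green_view_index([["tree", "sky"], ["road", "grass"]])
```

An objective score like this can then be compared against subjective ratings of the same image to quantify the perception gap.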

Enhancing Long Document Long Form Summarisation with Self-Planning

Authors:Xiaotang Du, Rohit Saxena, Laura Perez-Beltrachini, Pasquale Minervini, Ivan Titov
Date:2025-12-19 02:37:30

We introduce a novel approach for long context summarisation, highlight-guided generation, that leverages sentence-level information as a content plan to improve the traceability and faithfulness of generated summaries. Our framework applies self-planning methods to identify important content and then generates a summary conditioned on the plan. We explore both end-to-end and two-stage variants of the approach, finding that the two-stage pipeline performs better on long and information-dense documents. Experiments on long-form summarisation datasets demonstrate that our method consistently improves factual consistency while preserving relevance and overall quality. On GovReport, our best approach improves ROUGE-L by 4.1 points and achieves about 35% gains in SummaC scores. Qualitative analysis shows that highlight-guided summarisation helps preserve important details, leading to more accurate and insightful summaries across domains.
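
ROUGE-L, one of the metrics reported above, scores a summary by the longest common subsequence (LCS) it shares with a reference. The sketch below computes a sentence-level F1 variant with lowercase whitespace tokenization; both choices are simplifying assumptions, and standard toolkits add stemming and summary-level handling that are omitted here.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """Sentence-level ROUGE-L F1 with naive whitespace tokenization."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2.0 * precision * recall / (precision + recall)


score = rouge_l_f1("the cat sat on the mat", "the cat lay on the mat")
```

In the example, five of six tokens form the common subsequence, so precision, recall, and F1 all equal 5/6.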

Learning to Plan, Planning to Learn: Adaptive Hierarchical RL-MPC for Sample-Efficient Decision Making

Authors:Toshiaki Hori, Jonathan DeCastro, Deepak Gopinath, Avinash Balachandran, Guy Rosman
Date:2025-12-18 21:44:00

We propose a new approach for solving planning problems with a hierarchical structure, fusing reinforcement learning and MPC planning. Our formulation tightly couples the two planning paradigms. It leverages reinforcement learning actions to inform the MPPI sampler, and adaptively aggregates MPPI samples to inform the value estimation. The resulting adaptive process directs further MPPI exploration to where value estimates are uncertain, and improves training robustness and the overall resulting policies. This results in a robust planning approach that can handle complex planning problems and easily adapts to different applications, as demonstrated over several domains, including race driving, modified Acrobot, and Lunar Lander with added obstacles. Our results in these domains show better data efficiency and overall performance in terms of both rewards and task success, with up to a 72% increase in success rate compared to existing approaches, as well as accelerated convergence (2.1x) compared to non-adaptive sampling.
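
For context, the core MPPI step weights sampled control perturbations by the exponentiated negative cost of their rollouts and averages them into the nominal control sequence. The sketch below shows only this generic update; how the paper couples it with RL actions and value estimation is not reproduced, and the function names and temperature parameter are assumptions for illustration.

```python
import math


def mppi_weights(costs, temperature=1.0):
    """Softmax weights over rollouts; lower cost gives higher weight."""
    best = min(costs)  # subtract the minimum for numerical stability
    unnormalized = [math.exp(-(c - best) / temperature) for c in costs]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def mppi_update(nominal, noises, costs, temperature=1.0):
    """Add the cost-weighted average perturbation to the nominal controls.

    nominal: list of controls over the horizon (scalars here).
    noises:  one perturbation sequence per sampled rollout.
    costs:   total rollout cost for each perturbation sequence.
    """
    weights = mppi_weights(costs, temperature)
    return [u + sum(w * eps[t] for w, eps in zip(weights, noises))
            for t, u in enumerate(nominal)]


# Two rollouts with opposite perturbations; the first one is far cheaper.
updated = mppi_update([0.0, 0.0], [[1.0, 1.0], [-1.0, -1.0]], [0.0, 1000.0])
```

A learned policy could supply `nominal` so that sampling is centered on the RL action, which is one plausible reading of "informing the MPPI sampler".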

Virtual Reality in Service Design for Plastics Recycling: Two Application Cases

Authors:Ashley Colley, Kuisma Hurtig, Juri Etto, Emma Kirjavainen, Pavlo Ivanov, Jonna Häkkilä
Date:2025-12-18 21:26:11

Plastics recycling depends on everyday sorting practices and on how recycling services are communicated and experienced. Virtual reality (VR) can present these practices and services in situated, interactive form, yet its role in service design for plastics recycling is still emerging. This paper examines how VR tools can contribute to designing plastics recycling services through two application cases that address different stages of the recycling journey. The first case, Clean Cabin Escape, is a household-scale VR escape room where players collect and sort waste items into locally relevant categories, with immediate feedback that supports practice with plastics recycling decisions. The second case is a VR simulation of a plastics recycling center that represents a real planned site and is used in service design workshops where stakeholders explore layout, signage and customer paths for plastics fractions. Across the cases, we analyse how VR supported learning, engagement and shared sensemaking, and how it interacted with other service design methods such as workshops, customer path mapping and physical artefacts. The findings show that VR can make domestic sorting tasks and complex recycling centers more concrete for both citizens and professionals, but also highlight trade-offs related to hardware access, onboarding effort, visual fidelity and localisation of recycling rules. The paper concludes by outlining opportunities for integrating VR into broader service design toolsets for plastics recycling and circular economy services, and by pointing to directions for future research on long-term impact and inclusive design.

Archaeological investigation of galaxies' evolutionary history in the cosmic middle ages

Authors:Anna R. Gallazzi, Stefano Zibetti, Mark Sargent, Nicolas Bouché, Luke Davies, Marcella Longhetti, Annagrazia Puglisi, Laura Scholz-Diaz, Fabio Ditrani, Daniele Mattolini, Sabine Thater, Crescenzo Tortora, Bodo Ziegler, Mirko Curti, Lucia Pozzetti, Mojtaba Raouf, Umberto Rescigno
Date:2025-12-18 19:00:58

The cosmic Middle Ages, spanning the last 8-10 Gyr of the Universe, is a critical period in which massive early-formed systems coexist with global star formation quenching in less massive galaxies, yet galaxies experience further dynamical, morphological and chemical evolution. Understanding the relative role of internal drivers and of interaction with the evolving large-scale structures remains a highly complex and unsettled issue. To make transformative progress on these questions we must characterize the physical and kinematic properties (integrated and spatially resolved) of stellar populations in galaxies, the fossil record of their past star formation and assembly histories, together with gas properties, across a wide range of masses and environmental scales, over this critical cosmic epoch. Volume-representative samples of 10^6 galaxies down to 10^9 solar masses are essential to fully trace the complex interplay between physical processes and to physically connect progenitor and descendant galaxy populations. This demands a deep and extensive survey with high signal-to-noise, medium-resolution, rest-frame optical spectroscopy. Current and planned facilities in the 2020s-2030s cannot simultaneously achieve the required sample size, spectral quality, mass limit, and spatial coverage. A dedicated large-aperture spectroscopic facility with wide-area high-multiplex MOS and large field-of-view IFU is needed to provide transformative insights into the physical mechanisms regulating star formation and galaxy evolution.

MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning

Authors:Yuanchen Ju, Yongyuan Liang, Yen-Jen Wang, Nandiraju Gireesh, Yuanliang Ju, Seungjae Lee, Qiao Gu, Elvis Hsieh, Furong Huang, Koushil Sreenath
Date:2025-12-18 18:59:03

Mobile manipulators in households must both navigate and manipulate. This requires a compact, semantically rich scene representation that captures where objects are, how they function, and which parts are actionable. Scene graphs are a natural choice, yet prior work often separates spatial and functional relations, treats scenes as static snapshots without object states or temporal updates, and overlooks information most relevant for accomplishing the current task. To address these limitations, we introduce MomaGraph, a unified scene representation for embodied agents that integrates spatial-functional relationships and part-level interactive elements. However, advancing such a representation requires both suitable data and rigorous evaluation, which have been largely missing. We thus contribute MomaGraph-Scenes, the first large-scale dataset of richly annotated, task-driven scene graphs in household environments, along with MomaGraph-Bench, a systematic evaluation suite spanning six reasoning capabilities from high-level planning to fine-grained scene understanding. Built upon this foundation, we further develop MomaGraph-R1, a 7B vision-language model trained with reinforcement learning on MomaGraph-Scenes. MomaGraph-R1 predicts task-oriented scene graphs and serves as a zero-shot task planner under a Graph-then-Plan framework. Extensive experiments demonstrate that our model achieves state-of-the-art results among open-source models, reaching 71.6% accuracy on the benchmark (+11.4% over the best baseline), while generalizing across public benchmarks and transferring effectively to real-robot experiments.

An evacuation simulator for pedestrian dynamics based on the Social Force Model

Authors:Julián López, Virginia Mazzone, M. Leticia Rubio Puzzo, Juan Cruz Moreno
Date:2025-12-18 18:51:08

The evacuation of pedestrians from enclosed spaces represents a key problem in safety engineering and infrastructure design. Analyzing the collective dynamics that emerge during evacuation processes requires simulation tools capable of capturing individual interactions and spatial constraints realistically. In this work, we present SiCoBioNa, an open-source evacuation simulator based on the Social Force Model (SFM). The software provides an intuitive graphical interface that allows users to configure pedestrian properties, spatial geometries, and initial conditions without requiring prior expertise in numerical modeling techniques. The SFM framework enables the representation of goal-oriented motion, interpersonal interactions, and interactions with fixed obstacles. The simulator generates both quantitative data and visual outputs, facilitating the analysis of evacuation dynamics and the evaluation of different spatial configurations. Due to its modular and extensible design, SiCoBioNa serves as a reproducible research tool for studies on pedestrian dynamics, providing practical support for evacuation planning.
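
Two of the SFM ingredients named above, goal-oriented motion and interpersonal repulsion, can be sketched in a few lines. The force expressions follow the common textbook form of the model; the parameter values (mass 80 kg, relaxation time 0.5 s, A = 2000 N, B = 0.08 m, desired speed 1.3 m/s) are illustrative assumptions, and the function names are hypothetical rather than taken from SiCoBioNa.

```python
import math


def driving_force(pos, vel, goal, desired_speed=1.3, mass=80.0, tau=0.5):
    """Force relaxing the current velocity toward the desired one."""
    dx, dy = goal[0] - pos[0], goal[1] - pos[1]
    dist = math.hypot(dx, dy)
    ex, ey = (dx / dist, dy / dist) if dist > 0 else (0.0, 0.0)
    return (mass * (desired_speed * ex - vel[0]) / tau,
            mass * (desired_speed * ey - vel[1]) / tau)


def repulsive_force(pos_i, pos_j, radius_sum=0.6, strength=2000.0, decay=0.08):
    """Exponential repulsion on pedestrian i, pointing away from pedestrian j."""
    dx, dy = pos_i[0] - pos_j[0], pos_i[1] - pos_j[1]
    dist = math.hypot(dx, dy)
    if dist == 0:
        raise ValueError("pedestrians share the same position")
    magnitude = strength * math.exp((radius_sum - dist) / decay)
    return (magnitude * dx / dist, magnitude * dy / dist)


# Pedestrian at rest heading to an exit 10 m away along the x axis.
f_drive = driving_force((0.0, 0.0), (0.0, 0.0), (10.0, 0.0))
# Two pedestrians just touching (distance equals the sum of their radii).
f_rep = repulsive_force((0.6, 0.0), (0.0, 0.0))
```

Summing these forces (plus obstacle terms) and integrating the resulting acceleration over small time steps yields the pedestrian trajectories that such a simulator visualizes.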

RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing

Authors:Tianyuan Qu, Lei Ke, Xiaohang Zhan, Longxiang Tang, Yuqi Liu, Bohao Peng, Bei Yu, Dong Yu, Jiaya Jia
Date:2025-12-18 18:34:23

Instruction-based image editing enables natural-language control over visual modifications, yet existing models falter under Instruction-Visual Complexity (IV-Complexity), where intricate instructions meet cluttered or ambiguous scenes. We introduce RePlan (Region-aligned Planning), a plan-then-execute framework that couples a vision-language planner with a diffusion editor. The planner decomposes instructions via step-by-step reasoning and explicitly grounds them to target regions; the editor then applies changes using a training-free attention-region injection mechanism, enabling precise, parallel multi-region edits without iterative inpainting. To strengthen planning, we apply GRPO-based reinforcement learning using 1K instruction-only examples, yielding substantial gains in reasoning fidelity and format reliability. We further present IV-Edit, a benchmark focused on fine-grained grounding and knowledge-intensive edits. Across IV-Complex settings, RePlan consistently outperforms strong baselines trained on far larger datasets, improving regional precision and overall fidelity. Our project page: https://replan-iv-edit.github.io

ReinforceGen: Hybrid Skill Policies with Automated Data Generation and Reinforcement Learning

Authors:Zihan Zhou, Animesh Garg, Ajay Mandlekar, Caelan Garrett
Date:2025-12-18 18:32:39

Long-horizon manipulation has been a long-standing challenge in the robotics community. We propose ReinforceGen, a system that combines task decomposition, data generation, imitation learning, and motion planning to form an initial solution, and improves each component through reinforcement-learning-based fine-tuning. ReinforceGen first segments the task into multiple localized skills, which are connected through motion planning. The skills and motion planning targets are trained with imitation learning on a dataset generated from 10 human demonstrations, and then fine-tuned through online adaptation and reinforcement learning. When benchmarked on the Robosuite dataset, ReinforceGen reaches an 80% success rate on all tasks with visuomotor controls in the highest reset range setting. Additional ablation studies show that our fine-tuning approaches contribute to an 89% average performance increase. More results and videos are available at https://reinforcegen.github.io/

Clinical beam test of inter- and intra-fraction relative range monitoring in carbon ion radiotherapy

Authors:Devin Hymers, Sebastian Schroeder, Olga Bertini, Stephan Brons, Johann Heuser, Joerg Lehnert, Christian Joachim Schmidt, Dennis Mücher
Date:2025-12-18 17:29:28

Interaction Vertex Imaging (IVI) is used for range monitoring (RM) in carbon ion radiotherapy. The purpose of RM is to measure the Bragg peak (BP) position for each contributing beam, and to detect any changes. Currently, there is no consensus on a clinical RM method, the use of which would improve the safety and consistency of treatment. The prototype filtered IVI (fIVI) Range Monitoring System is the first system to apply large-area and high-rate-capable silicon detectors to IVI. Two layers of these detectors track prompt secondary fragments for use in RM. This device monitored 16 cm and 32 cm diameter cylindrical plastic phantoms irradiated by clinical carbon ion beams at the Heidelberg Ion Beam Therapy Center. Approximately 20 different BP depths were delivered to each phantom, with a minimum depth difference of 0.8 mm and a maximum depth difference of 51.9 mm and 82.5 mm, respectively. For large BP range differences, the relationship between the true depth difference and that measured by fIVI is quadratic, although for small differences, the deviation from a linear relationship with a slope of 1 is negligible. RM performance is strongly dependent on the number of tracked particles, particularly in the clinically relevant regime. Significant performance differences exist between the two phantoms, with millimetric precision at clinical doses being achieved only for the 16 cm phantom. The performance achieved by the prototype fIVI Range Monitoring System is consistent with previous investigations of IVI, despite measuring at more challenging shallow BP positions. Further significant improvements are possible through increasing the sensitive area of the tracking system beyond the prototype, which will both allow an improvement in precision for the most intense points of a scanned treatment plan and expand the number of points for which millimetric precision may be achieved.

PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence

Authors:Xiaopeng Lin, Shijie Lian, Bin Yu, Ruoqi Yang, Changti Wu, Yuzhuo Miao, Yurun Jin, Yukun Shi, Cong Huang, Bojun Cheng, Kai Chen
Date:2025-12-18 17:27:03

Robotic generalization relies on physical intelligence: the ability to reason about state changes, contact-rich interactions, and long-horizon planning under egocentric perception and action. However, most VLMs are trained primarily on third-person data, creating a fundamental viewpoint mismatch for humanoid robots. Scaling robot egocentric data collection remains impractical due to high cost and limited diversity, whereas large-scale human egocentric videos offer a scalable alternative that naturally captures rich interaction context and causal structure. The key challenge is to convert raw egocentric videos into structured and reliable embodiment training supervision. Accordingly, we propose an Egocentric2Embodiment translation pipeline that transforms first-person videos into multi-level, schema-driven VQA supervision with enforced evidence grounding and temporal consistency, enabling the construction of the Egocentric2Embodiment dataset (E2E-3M) at scale. An egocentric-aware embodied brain, termed PhysBrain, is obtained by training on the E2E-3M dataset. PhysBrain exhibits substantially improved egocentric understanding, particularly for planning on EgoThink. It provides an egocentric-aware initialization that enables more sample-efficient VLA fine-tuning and higher SimplerEnv success rates (53.9%), demonstrating effective transfer from human egocentric supervision to downstream robot control.

Delay-Aware Multi-Stage Edge Server Upgrade with Budget Constraint

Authors:Endar Suprih Wihidayat, Sieteng Soh, Kwan-Wu Chin, Duc-son Pham
Date:2025-12-18 17:25:55

In this paper, the Multi-stage Edge Server Upgrade (M-ESU) is proposed as a new network planning problem, involving the upgrading of an existing multi-access edge computing (MEC) system through multiple stages (e.g., over several years). More precisely, the problem considers two key decisions: (i) whether to deploy additional edge servers or upgrade those already installed, and (ii) how tasks should be offloaded so that the average number of tasks that meet their delay requirement is maximized. The framework specifically involves: (i) deployment of new servers combined with capacity upgrades for existing servers, and (ii) the optimal task offloading to maximize the average number of tasks with a delay requirement. It also considers the following constraints: (i) budget per stage, (ii) server deployment and upgrade cost (in $) and cost depreciation rate, (iii) computation resource of servers, (iv) number of tasks and their growth rate (in %), and (v) the increase in task sizes and stricter delay requirements over time. We present two solutions: a Mixed Integer Linear Programming (MILP) model and an efficient heuristic algorithm (M-ESU/H). MILP yields the optimal solution for small networks, whereas M-ESU/H is used in large-scale networks. For small networks, the simulation results show that the solution computed by M-ESU/H is within 1.25% of the optimal solution while running several orders of magnitude faster. For large networks, M-ESU/H is compared against three alternative heuristic solutions that either consider only server deployment or give priority to server deployment or to upgrade. Our experiments show that M-ESU/H yields up to 21.57% improvement in task satisfaction under identical budget and demand growth conditions, confirming its scalability and practical value for long-term MEC systems.
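
The deploy-versus-upgrade trade-off under a per-stage budget can be illustrated with a deliberately simplified toy model. The sketch below is neither the MILP model nor M-ESU/H. It assumes that each deployment or upgrade adds a fixed number of tasks that can be served within their delay requirement, that costs shrink by a fixed depreciation rate per stage, and that each stage is solved myopically by brute force. All names and numbers are hypothetical.

```python
from itertools import product


def average_satisfaction(stages, budget, demand, growth, capacity,
                         deploy=(100.0, 40), upgrade=(60.0, 20),
                         depreciation=0.1, max_actions=5):
    """Average fraction of tasks served on time over all stages (toy model).

    deploy / upgrade are (cost, added_capacity) pairs; capacity counts the
    tasks the current system can serve within their delay requirement.
    """
    deploy_cost, deploy_gain = deploy
    upgrade_cost, upgrade_gain = upgrade
    ratios = []
    for _ in range(stages):
        candidates = []
        # Enumerate every (deployments, upgrades) pair affordable this stage.
        for n_dep, n_upg in product(range(max_actions + 1), repeat=2):
            cost = n_dep * deploy_cost + n_upg * upgrade_cost
            if cost > budget:
                continue
            new_capacity = capacity + n_dep * deploy_gain + n_upg * upgrade_gain
            served = min(demand, new_capacity)
            # Prefer more served tasks, then lower cost.
            candidates.append((served, -cost, new_capacity))
        served, _, capacity = max(candidates)
        ratios.append(served / demand)
        # Next stage: demand grows and hardware gets cheaper.
        demand = round(demand * (1 + growth))
        deploy_cost *= 1 - depreciation
        upgrade_cost *= 1 - depreciation
    return sum(ratios) / len(ratios)
```

Because each stage is optimized in isolation, this sketch can miss plans that spend differently early on to benefit later stages, which is the kind of coupling a multi-stage formulation is meant to capture.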

Discovering and Learning Probabilistic Models of Black-Box AI Capabilities

Authors:Daniel Bramblett, Rushang Karia, Adrian Ciotinga, Ruthvick Suresh, Pulkit Verma, YooJung Choi, Siddharth Srivastava
Date:2025-12-18 16:32:06

Black-box AI (BBAI) systems such as foundational models are increasingly being used for sequential decision making. To ensure that such systems are safe to operate and deploy, it is imperative to develop efficient methods that can provide a sound and interpretable representation of the BBAI's capabilities. This paper shows that PDDL-style representations can be used to efficiently learn and model an input BBAI's planning capabilities. It uses the Monte-Carlo tree search paradigm to systematically create test tasks, acquire data, and prune the hypothesis space of possible symbolic models. Learned models describe a BBAI's capabilities, the conditions under which they can be executed, and the possible outcomes of executing them along with their associated probabilities. Theoretical results show soundness, completeness and convergence of the learned models. Empirical results with multiple BBAI systems illustrate the scope, efficiency, and accuracy of the presented methods.

VERM: Leveraging Foundation Models to Create a Virtual Eye for Efficient 3D Robotic Manipulation

Authors:Yixiang Chen, Yan Huang, Keji He, Peiyan Li, Liang Wang
Date:2025-12-18 16:26:17

When performing 3D manipulation tasks, robots have to execute action planning based on perceptions from multiple fixed cameras. The multi-camera setup introduces substantial redundancy and irrelevant information, which increases computational costs and forces the model to spend extra training time extracting crucial task-relevant details. To filter out redundant information and accurately extract task-relevant features, we propose the VERM (Virtual Eye for Robotic Manipulation) method, leveraging the knowledge in foundation models to imagine a virtual task-adaptive view from the constructed 3D point cloud, which efficiently captures necessary information and mitigates occlusion. To facilitate 3D action planning and fine-grained manipulation, we further design a depth-aware module and a dynamic coarse-to-fine procedure. Extensive experimental results on both the RLBench simulation benchmark and real-world evaluations demonstrate the effectiveness of our method, surpassing previous state-of-the-art methods while achieving a 1.89x speedup in training time and a 1.54x speedup in inference speed. More results can be found on our project website at https://verm-ral.github.io.

Hypervelocity Impact Debris Cloud Trajectory-Planning based on Additive Manufactured Lattice Structures

Authors:Bilin Zheng, Xiao Kang, Xiaoyu Zhang, Hao Zhou, Mengchuan Xu, Chang Liu
Date:2025-12-18 14:52:26

Space debris and micrometeoroid (MMOD) impacts pose a serious threat to the safe operation of spacecraft. However, traditional protective structures typically suffer from limitations such as excessive thickness and inadequate load-bearing capacity. Guided by the design concepts of debris-cloud deflection and hierarchical energy dissipation, this study proposes a trajectory-planning lattice protective structure. First, the lattice parameters and geometry were designed according to the functional relationship between the incident angle and the transmitted/ricochet trajectory angles. Subsequently, multi-angle hypervelocity impact experiments were carried out to evaluate the proposed lattice protection structure. In combination with post-impact CT three-dimensional reconstruction and smoothed particle hydrodynamics (SPH) numerical simulations, the protective mechanisms of the lattice structure were systematically characterized and clarified. The results demonstrate that, for three oblique incidence conditions, the lattice structure remained intact and significantly deflected the debris-cloud momentum direction while effectively dissipating its kinetic energy. The angled plates with gradient designs enabled continuous changes in the momentum direction and stepwise kinetic energy dissipation through multiple cycles of debrisation, dispersion, and trajectory deflection. This research presents a novel, engineering-ready approach for spacecraft MMOD protection and validates the potential of trajectory-planning lattice structures for hypervelocity impact defense.

CRONOS: Continuous Time Reconstruction for 4D Medical Longitudinal Series

Authors:Nico Albert Disch, Saikat Roy, Constantin Ulrich, Yannick Kirchhoff, Maximilian Rokuss, Robin Peretzke, David Zimmerer, Klaus Maier-Hein
Date:2025-12-18 14:16:46

Forecasting how 3D medical scans evolve over time is important for disease progression, treatment planning, and developmental assessment. Yet existing models rely on a single prior scan, assume fixed grid times, or target global labels, which limits voxel-level forecasting under irregular sampling. We present CRONOS, a unified framework for many-to-one prediction from multiple past scans that supports both discrete (grid-based) and continuous (real-valued) timestamps in one model; to the best of our knowledge, it is the first to achieve continuous sequence-to-image forecasting for 3D medical data. CRONOS learns a spatio-temporal velocity field that transports context volumes toward a target volume at an arbitrary time, while operating directly in 3D voxel space. Across three public datasets spanning Cine-MRI, perfusion CT, and longitudinal MRI, CRONOS outperforms other baselines, while remaining computationally competitive. We will release code and evaluation protocols to enable reproducible, multi-dataset benchmarking of multi-context, continuous-time forecasting.

Algorithmic Monetary Policies for Blockchain Participation Games

Authors:Diodato Ferraioli, Paolo Penna, Manvir Schneider, Carmine Ventre
Date:2025-12-18 13:28:00

A central challenge in blockchain tokenomics is aligning short-term performance incentives with long-term decentralization goals. We propose a framework for algorithmic monetary policies that navigates this tradeoff in repeated participation games. Agents, characterized by type (capability) and stake, choose to participate or abstain at each round; the policy (probabilistically) selects high-type agents for task execution (maximizing throughput) while distributing rewards to sustain decentralization. We analyze equilibria under two agent behaviors: myopic (short-term utility maximization) and foresighted (multi-round planning). For myopic agents, performance-centric policies risk centralization, but foresight enables stable decentralization with some volatility in the token value. We further discuss virtual stake, a hybrid of type and stake, as an alternative approach. We show that the initial virtual stake distribution critically impacts long-term outcomes, suggesting that policies must indirectly manage decentralization.