planning - 2025-12-23

Visual-Aware CoT: Achieving High-Fidelity Visual Consistency in Unified Models

Authors:Zixuan Ye, Quande Liu, Cong Wei, Yuanxing Zhang, Xintao Wang, Pengfei Wan, Kun Gai, Wenhan Luo
Date:2025-12-22 18:59:03

Recently, the introduction of Chain-of-Thought (CoT) has substantially improved the generation ability of unified models. However, the current thinking process during generation mainly focuses on textual consistency with the text prompt, ignoring visual context consistency with the visual reference images during multi-modal generation, e.g., multi-reference generation. The lack of such consistency results in failure to maintain key visual features (such as human ID, object attributes, and style). To this end, we integrate visual context consistency into the reasoning of unified models, explicitly motivating the model to sustain such consistency through 1) Adaptive Visual Planning: generating a structured visual checklist that identifies the visual elements whose consistency must be preserved, and 2) Iterative Visual Correction: performing self-reflection guided by the checklist and refining the generated result iteratively. To achieve this, we use supervised finetuning to teach the model how to plan the visual checking and conduct self-reflection and self-refinement, and use flow-GRPO to further enhance visual consistency through a customized visual checking reward. Experiments show that our method outperforms both zero-shot unified models and those with text CoTs in multi-modal generation, demonstrating higher visual context consistency.

LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry

Authors:Jiaqi Peng, Wenzhe Cai, Yuqiang Yang, Tai Wang, Yuan Shen, Jiangmiao Pang
Date:2025-12-22 18:03:08

Trajectory planning in unstructured environments is a fundamental and challenging capability for mobile robots. Traditional modular pipelines suffer from latency and cascading errors across perception, localization, mapping, and planning modules. Recent end-to-end learning methods map raw visual observations directly to control signals or trajectories, promising greater performance and efficiency in open-world settings. However, most prior end-to-end approaches still rely on separate localization modules that depend on accurate sensor extrinsic calibration for self-state estimation, thereby limiting generalization across embodiments and environments. We introduce LoGoPlanner, a localization-grounded, end-to-end navigation framework that addresses these limitations by: (1) finetuning a long-horizon visual-geometry backbone to ground predictions with absolute metric scale, thereby providing implicit state estimation for accurate localization; (2) reconstructing surrounding scene geometry from historical observations to supply dense, fine-grained environmental awareness for reliable obstacle avoidance; and (3) conditioning the policy on implicit geometry bootstrapped by the aforementioned auxiliary tasks, thereby reducing error propagation. We evaluate LoGoPlanner in both simulation and real-world settings, where its fully end-to-end design reduces cumulative error while metric-aware geometry memory enhances planning consistency and obstacle avoidance, leading to more than a 27.3% improvement over oracle-localization baselines and strong generalization across embodiments and environments. The code and models have been made publicly available on the project page: https://steinate.github.io/logoplanner.github.io/

Results of the 2024 CommonRoad Motion Planning Competition for Autonomous Vehicles

Authors:Yanliang Huang, Xia Yan, Peiran Yin, Zhenduo Zhang, Zeyan Shao, Youran Wang, Haoliang Huang, Matthias Althoff
Date:2025-12-22 16:46:40

Over the past decade, a wide range of motion planning approaches for autonomous vehicles has been developed to handle increasingly complex traffic scenarios. However, these approaches are rarely compared on standardized benchmarks, limiting the assessment of relative strengths and weaknesses. To address this gap, we present the setup and results of the 4th CommonRoad Motion Planning Competition held in 2024, conducted using the CommonRoad benchmark suite. This annual competition provides an open-source and reproducible framework for benchmarking motion planning algorithms. The benchmark scenarios span highway and urban environments with diverse traffic participants, including passenger cars, buses, and bicycles. Planner performance is evaluated along four dimensions: efficiency, safety, comfort, and compliance with selected traffic rules. This report introduces the competition format and provides a comparison of representative high-performing planners from the 2023 and 2024 editions.

SlicerOrbitSurgerySim: An Open-Source Platform for Virtual Registration and Quantitative Comparison of Preformed Orbital Plates

Authors:Chi Zhang, Braedon Gunn, Andrew M. Read-Fuller
Date:2025-12-22 16:21:29

Poor adaptation of orbital implants remains a major contributor to postoperative complications and revision surgery. Although preformed orbital plates are widely used to reduce cost and operative time compared with customized implants, surgeons currently lack publicly available tools and standardized metrics to quantitatively compare plate fit across vendors, sizes, and patient anatomy. We developed SlicerOrbitSurgerySim, an open-source extension for the 3D Slicer platform that enables interactive virtual registration, evaluation, and comparison of multiple preformed orbital plates in a patient-specific virtual planning environment. The software generates reproducible quantitative plate-to-orbit distance metrics and visualization tools that support both patient-specific planning and population-level statistical analysis of plate adaptability. By facilitating objective comparison of implant designs and placement strategies, this tool aims to improve preoperative decision-making, reduce intraoperative plate modification, and promote collaborative research and surgical education. Pilot studies, sample datasets, and detailed tutorials are provided to support testing, transparency, and reproducibility.

Residential structure survivability to large wildfires in the United States

Authors:Mukesh Kumar, John T. Abatzoglou, Crystal A. Kolden, Mojtaba Sadegh
Date:2025-12-22 16:06:56

Wildfire impacts on US communities have escalated in recent decades, highlighting the need to better understand factors that influence wildfire outcomes. We find that 567,000 homes were exposed to wildfires across the contiguous US during 2001-2020, two-thirds of them in the Western US, where exposure increased five-fold. While residential structure survivability - the percent of structures within a wildfire perimeter that survive the fire - remained stable in the Eastern US in the past two decades, it declined by 10% in the West. Survivability was explained by structural age, surrounding fuels, and fire weather. Survivability was 87% for homes built pre-1990 compared to 92% for post-1990 homes in the West. Survivability was lower in forests than in grasslands and shrublands. Finally, survivability was markedly lower for fires coincident with extreme fire weather. Our results suggest that modern building codes, fuel management, and proactive planning can strengthen wildfire resilience.

Spectral Shrinkage of Gaussian Entropic Optimal Transport

Authors:Ho Yun
Date:2025-12-22 15:03:08

We present a functional calculus treatment of Entropic Optimal Transport (EOT) between Gaussian measures on separable Hilbert spaces, providing a unified framework that handles infinite-dimensional degeneracy. By leveraging the notion of proper alignment and the Schur complement, we reveal that the Gaussian EOT solution operates as a precise spectral shrinkage: the optimal coupling is uniquely determined by contracting the spectrum of the correlation operator via a universal scalar function. This geometric insight facilitates an algorithmic shift from iterative fixed-point schemes (e.g., Sinkhorn) to direct algebraic computation, enabling efficient multi-scale analysis, where a single spectral decomposition allows for the exact evaluation of entropic costs across arbitrary regularization parameters $\varepsilon > 0$ at negligible additional cost. Furthermore, we investigate the asymptotic behavior as $\varepsilon \downarrow 0$ in settings where the unregularized Optimal Transport problem admits non-unique solutions. We establish a selection principle that the regularized limit converges to the most diffusive optimal coupling, characterized as the centroid of the convex set of optimal Kantorovich plans. This demonstrates that in degenerate regimes, the entropic limit systematically rejects deterministic Monge solutions (extremal points) in favor of the optimal solution with minimal Hilbert-Schmidt correlation, effectively filtering out spurious correlations in the null space. Finally, we derive stability bounds and convergence rates, recovering established parametric rates ($\varepsilon \log(1/\varepsilon)$) in finite dimensions while identifying distinct non-parametric rates dependent on spectral decay in infinite-dimensional settings.

Learning General Policies with Policy Gradient Methods

Authors:Simon Ståhlberg, Blai Bonet, Hector Geffner
Date:2025-12-22 13:08:58

While reinforcement learning methods have delivered remarkable results in a number of settings, generalization, i.e., the ability to produce policies that generalize in a reliable and systematic way, has remained a challenge. The problem of generalization has been addressed formally in classical planning, where provably correct policies that generalize over all instances of a given domain have been learned using combinatorial methods. The aim of this work is to bring these two research threads together to illuminate the conditions under which (deep) reinforcement learning approaches, and in particular, policy optimization methods, can be used to learn policies that generalize like combinatorial methods do. We draw on lessons learned from previous combinatorial and deep learning approaches, and extend them in a convenient way. From the former, we model policies as state transition classifiers, as (ground) actions are not general and change from instance to instance. From the latter, we use graph neural networks (GNNs) adapted to deal with relational structures for representing value functions over planning states, and in our case, policies. With these ingredients in place, we find that actor-critic methods can be used to learn policies that generalize almost as well as those obtained using combinatorial approaches while avoiding the scalability bottleneck and the use of feature pools. Moreover, the limitations of the DRL methods on the benchmarks considered have little to do with deep learning or reinforcement learning algorithms, and result from the well-understood expressive limitations of GNNs, and the tradeoff between optimality and generalization (general policies cannot be optimal in some domains). Both of these limitations are addressed without changing the basic DRL methods by adding derived predicates and an alternative cost structure to optimize.

First-Order Representation Languages for Goal-Conditioned RL

Authors:Simon Ståhlberg, Hector Geffner
Date:2025-12-22 12:54:32

First-order relational languages have been used in MDP planning and reinforcement learning (RL) for two main purposes: specifying MDPs in compact form, and representing and learning policies that are general and not tied to specific instances or state spaces. In this work, we instead consider the use of first-order languages in goal-conditioned RL and generalized planning. The question is how to learn goal-conditioned and general policies when the training instances are large and the goal cannot be reached by random exploration alone. The technique of Hindsight Experience Replay (HER) provides an answer to this question: it relabels unsuccessful trajectories as successful ones by replacing the original goal with one that was actually achieved. If the target policy must generalize across states and goals, trajectories that do not reach the original goal states can enable more data- and time-efficient learning. In this work, we show that further performance gains can be achieved when states and goals are represented by sets of atoms. We consider three versions: goals as full states, goals as subsets of the original goals, and goals as lifted versions of these subgoals. The result is that the latter two successfully learn general policies on large planning instances with sparse rewards by automatically creating a curriculum of easier goals of increasing complexity. The experiments illustrate the computational gains of these versions, their limitations, and opportunities for addressing them.
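
The relabeling idea described above, with goals represented as sets of atoms, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function name `relabel_with_subgoal`, the transition layout, the binary reward, and the choice to discard trajectories that achieve no goal atom are all assumptions made for the example. It corresponds loosely to the "goals as subsets of the original goals" version.

```python
from typing import FrozenSet, List, Tuple

Atom = str
State = FrozenSet[Atom]
# Hypothetical transition layout: (state, action, next_state).
Transition = Tuple[State, str, State]
Relabeled = Tuple[State, str, State, FrozenSet[Atom], float]


def relabel_with_subgoal(
    trajectory: List[Transition],
    original_goal: FrozenSet[Atom],
) -> List[Relabeled]:
    """Relabel a trajectory with the part of the goal it actually achieved.

    The hindsight goal is the subset of original goal atoms that hold in the
    final state. If no goal atom holds, the trajectory is dropped (an
    assumption of this sketch).
    """
    if not trajectory:
        return []
    final_state = trajectory[-1][2]
    achieved = original_goal & final_state
    if not achieved:
        return []
    relabeled = []
    for state, action, next_state in trajectory:
        # Assumed sparse reward: 1 once every hindsight-goal atom holds.
        reward = 1.0 if achieved <= next_state else 0.0
        relabeled.append((state, action, next_state, achieved, reward))
    return relabeled
```

For example, a Blocksworld-style trajectory that reaches `on(a,b)` but not `on(b,c)` would be stored as a successful trajectory for the easier goal `{on(a,b)}`.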

Quantum and classical algorithms for daily railway rolling stock circulation plans

Authors:Ewa Kędziera, Wojciech Gamon, Mátyás Koniorczyk, Zakaria Mzaouali, Andrea Galadíková, Krzysztof Domino
Date:2025-12-22 12:36:20

We study daily rolling stock circulation planning for electric multiple units (EMUs) on a regional passenger network, focusing on services where identical EMUs may be coupled in pairs on selected routes. Motivated by the operational needs of the regional operator Silesian Railways in Poland, we formulate an acyclic mixed-integer linear program on a one-day horizon that incorporates depot balance constraints, demand-driven seat and bicycle capacity limits (which is a new aspect requested by the regional operator and local society of passengers), and simple crew availability constraints. The model is designed to support both baseline planning and disruption management under increased passenger demand. Using a graph-hypergraph representation of trips and single or coupled EMU movements, we first solve the problem with a classical ILP solver. We then derive a Quadratic Unconstrained Binary Optimization (QUBO) reformulation - which is frequently used as the input for quantum optimization - and evaluate its solution by quantum annealing on D-Wave Advantage systems and by the classical quantum-inspired VeloxQ solver. Computational experiments on real-world instances from the Silesian network, with up to 404 train trips and 11 EMU types, show that the ILP approach can obtain high-quality daily circulation plans within at most about 40 minutes, whereas current quantum and quantum-inspired solvers are restricted to substantially smaller sub-instances (up to 51 and 78 train trips, respectively) due to embedding and QUBO size limitations. These results quantify the present frontier of QUBO-based methods for rolling stock circulation and point towards hybrid decision-support architectures in which quantum or quantum-inspired optimizers address only local subproblems within a broader classical planning framework.
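
To make the QUBO reformulation step concrete, the sketch below folds a single "exactly one option is chosen" constraint into a quadratic penalty and solves the resulting toy problem by brute force. It is only an illustration of the general technique: the function names, the cost vector, and the penalty weight are hypothetical, and the paper's actual model (depot balance, capacity, and crew constraints) is far larger.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

QUBO = Dict[Tuple[int, int], float]


def one_hot_qubo(costs: List[float], penalty: float) -> Tuple[QUBO, float]:
    """QUBO for: minimize sum(costs[i] * x[i]) subject to sum(x) == 1.

    The constraint becomes penalty * (sum(x) - 1)**2. Because x*x == x for
    binary x, this expands to -penalty on each diagonal entry, +2*penalty on
    each pair i < j, and a constant offset of +penalty.
    """
    n = len(costs)
    q: QUBO = {}
    for i in range(n):
        q[(i, i)] = costs[i] - penalty
        for j in range(i + 1, n):
            q[(i, j)] = 2.0 * penalty
    return q, penalty


def energy(q: QUBO, offset: float, x: Sequence[int]) -> float:
    """Evaluate the QUBO objective for a binary assignment x."""
    return offset + sum(c * x[i] * x[j] for (i, j), c in q.items())


def brute_force_minimum(q: QUBO, offset: float, n: int) -> Tuple[int, ...]:
    """Exhaustive search; only feasible for very small n."""
    return min(product((0, 1), repeat=n), key=lambda x: energy(q, offset, x))
```

The exhaustive search stands in for an annealer or a quantum-inspired solver. The penalty must be large enough relative to the costs that violating the constraint never pays off.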

GANeXt: A Fully ConvNeXt-Enhanced Generative Adversarial Network for MRI- and CBCT-to-CT Synthesis

Authors:Siyuan Mei, Yan Xia, Fuxin Fan
Date:2025-12-22 12:32:16

The synthesis of computed tomography (CT) from magnetic resonance imaging (MRI) and cone-beam CT (CBCT) plays a critical role in clinical treatment planning by enabling accurate anatomical representation in adaptive radiotherapy. In this work, we propose GANeXt, a 3D patch-based, fully ConvNeXt-powered generative adversarial network for unified CT synthesis across different modalities and anatomical regions. Specifically, GANeXt employs an efficient U-shaped generator constructed from stacked 3D ConvNeXt blocks with compact convolution kernels, while the discriminator adopts a conditional PatchGAN. To improve synthesis quality, we incorporate a combination of loss functions, including mean absolute error (MAE), perceptual loss, segmentation-based masked MAE, and adversarial loss, together with a combination of Dice loss and cross-entropy for the multi-head segmentation discriminator. For both tasks, training is performed with a batch size of 8 using two separate AdamW optimizers for the generator and discriminator, each equipped with a warmup and cosine decay scheduler, with learning rates of $5\times10^{-4}$ and $1\times10^{-3}$, respectively. Data preprocessing includes deformable registration, foreground cropping, percentile normalization for the input modality, and linear normalization of the CT to the range $[-1024, 1000]$. Data augmentation involves random zooming within $(0.8, 1.3)$ (for MRI-to-CT only), fixed-size cropping to $32\times160\times192$ for MRI-to-CT and $32\times128\times128$ for CBCT-to-CT, and random flipping. During inference, we apply a sliding-window approach with $0.8$ overlap and average folding to reconstruct the full-size sCT, followed by inversion of the CT normalization. After joint training on all regions without any fine-tuning, the final models are selected at the end of 3000 epochs for MRI-to-CT and 1000 epochs for CBCT-to-CT using the full training dataset.
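
A warmup followed by cosine decay, as mentioned for both optimizers, might look like the sketch below. The abstract does not give the warmup length, the floor learning rate, or whether warmup is linear, so `warmup_steps`, `min_lr`, and the linear ramp are assumptions of this example; only the peak rates ($5\times10^{-4}$ and $1\times10^{-3}$) come from the text.

```python
import math


def warmup_cosine_lr(
    step: int,
    total_steps: int,
    warmup_steps: int,
    peak_lr: float,
    min_lr: float = 0.0,
) -> float:
    """Learning rate at a given step (0-indexed).

    Linear ramp to peak_lr over warmup_steps (assumed), then cosine decay
    from peak_lr down to min_lr at total_steps.
    """
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min((step - warmup_steps) / decay_steps, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical usage with the generator's peak rate from the abstract.
generator_lrs = [warmup_cosine_lr(s, 100, 10, 5e-4) for s in range(100)]
```

The same function would be called with `peak_lr=1e-3` for the discriminator.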

Learning-Assisted Multi-Operator Variable Neighborhood Search for Urban Cable Routing

Authors:Wei Liu, Tao Zhang, Chenhui Lin, Kaiwen Li, Rui Wang
Date:2025-12-22 12:13:59

Urban underground cable construction is essential for enhancing the reliability of city power grids, yet its high construction costs make planning a worthwhile optimization task. In urban environments, road layouts tightly constrain cable routing. This, on the one hand, renders relation-only models (i.e., those without explicit routes) used in prior work overly simplistic, and on the other hand, dramatically enlarges the combinatorial search space, thereby imposing much higher demands on algorithm design. In this study, we formulate urban cable routing as a connectivity-path co-optimization problem and propose a learning-assisted multi-operator variable neighborhood search (L-MVNS) algorithm. The framework first introduces an auxiliary task to generate high-quality feasible initial solutions. A hybrid genetic search (HGS) and A* serve as the connectivity optimizer and the route-planning optimizer, respectively. Building on these, a multi-operator variable neighborhood search (MVNS) iteratively co-optimizes inter-substation connectivity and detailed routes via three complementary destruction operators, a modified A* repair operator, and an adaptive neighborhood-sizing mechanism. A multi-agent deep reinforcement learning module is further embedded to prioritize promising neighborhoods. We also construct a standardized and scalable benchmark suite for evaluation. Across these cases, comprehensive experiments demonstrate effectiveness and stability: relative to representative approaches, MVNS and L-MVNS reduce total construction cost by approximately 30-50%, with L-MVNS delivering additional gains on larger instances and consistently higher stability.

Optical Follow-Up Strategies for the Next Neutrino-Detected Galactic Core-Collapse Supernova

Authors:P. A. Duverne, W. K. Mouici, A. Coleiro, J. -G. Ducoin, M. W. Coughlin
Date:2025-12-22 11:44:56

Core-collapse supernovae (CCSNe) are expected to produce intense bursts of neutrinos preceding the emergence of their electromagnetic (EM) counterparts. The prompt detection of such neutrino signals offers a unique opportunity to trigger early follow-up observations in the EM domain. We aim to assess the feasibility and efficiency of an optical-NIR follow-up strategy for CCSNe discovered via neutrino bursts, by modelling the spatial distribution of events and simulating realistic observational campaigns taking into account the size of the localization error box generated by triangulating the neutrino burst. We modelled the Galactic distribution of CCSNe, including the effects of interstellar extinction, and considered three main progenitor types: Wolf-Rayet stars, red supergiants, and blue supergiants. We included the shock breakout in the EM signatures that could be detected following the neutrino burst. A population of CCSNe was generated and detected by different networks of neutrino observatories, including IceCube, KM3NeT, Super-Kamiokande, Hyper-Kamiokande, and JUNO. The resulting skymaps were used as input for GWEMOPT to produce optimized follow-up plans with two optical facilities: LSST and the TAROT robotic telescopes. Both LSST and TAROT exhibit comparable detection efficiencies for the simulated CCSN population. However, the TAROT network achieves similar success rates while requiring fewer pointings to cover the CCSN skymap. Our simulations demonstrate that neutrino follow-up campaigns can effectively detect CCSN optical counterparts using both large and small facilities. Depending on the neutrino network, the median number of pointings needed to find the EM emission with the two tested optical facilities is of the order of 20 to 100. The number of images is larger for LSST than for TAROT by a factor of 2 to 4.

Translating Flow to Policy via Hindsight Online Imitation

Authors:Yitian Zheng, Zhangchen Ye, Weijun Dong, Shengjie Wang, Yuyang Liu, Chongjie Zhang, Chuan Wen, Yang Gao
Date:2025-12-22 11:06:06

Recent advances in hierarchical robot systems leverage a high-level planner to propose task plans and a low-level policy to generate robot actions. This design allows training the planner on action-free or even non-robot data sources (e.g., videos), providing transferable high-level guidance. Nevertheless, grounding these high-level plans into executable actions remains challenging, especially with the limited availability of high-quality robot data. To this end, we propose to improve the low-level policy through online interactions. Specifically, our approach collects online rollouts, retrospectively annotates the corresponding high-level goals from achieved outcomes, and aggregates these hindsight-relabeled experiences to update a goal-conditioned imitation policy. Our method, Hindsight Flow-conditioned Online Imitation (HinFlow), instantiates this idea with 2D point flows as the high-level planner. Across diverse manipulation tasks in both simulation and physical world, our method achieves more than $2\times$ performance improvement over the base policy, significantly outperforming the existing methods. Moreover, our framework enables policy acquisition from planners trained on cross-embodiment video data, demonstrating its potential for scalable and transferable robot learning.

Vision-Language-Policy Model for Dynamic Robot Task Planning

Authors:Jin Wang, Kim Tien Ly, Jacques Cloete, Nikos Tsagarakis, Ioannis Havoutis
Date:2025-12-22 09:12:48

Bridging the gap between natural language commands and autonomous execution in unstructured environments remains an open challenge for robotics. This requires robots to perceive and reason over the current task scene through multiple modalities, and to plan their behaviors to achieve their intended goals. Traditional robotic task-planning approaches often struggle to bridge low-level execution with high-level task reasoning, and cannot dynamically update task strategies when instructions change during execution, which ultimately limits their versatility and adaptability to new tasks. In this work, we propose a novel language model-based framework for dynamic robot task planning. Our Vision-Language-Policy (VLP) model, based on a vision-language model fine-tuned on real-world data, can interpret semantic instructions and integrate reasoning over the current task scene to generate behavior policies that control the robot to accomplish the task. Moreover, it can dynamically adjust the task strategy in response to changes in the task, enabling flexible adaptation to evolving task requirements. Experiments conducted with different robots and a variety of real-world tasks show that the trained model can efficiently adapt to novel scenarios and dynamically update its policy, demonstrating strong planning autonomy and cross-embodiment generalization. Videos: https://robovlp.github.io/

Optical design and characterization of a multi-depth vision simulator

Authors:Parviz Zolfaghari, Ehsan Varasteh, Koray Kavakli, Arda Gulersoy, Afsun Sahin, Hakan Urey
Date:2025-12-22 09:03:25

We present a vision simulator device (Katsim), a compact near-eye optical display designed for assessing postoperative corrected vision, preoperative intraocular lens (IOL) assessment, and objective IOL characterization. The system forms a virtual image using an amplitude-modulated LCoS spatial light modulator (AM-SLM), RGB LED illumination, and a high-speed varifocal lens. In the proposed architecture, the LED illumination and varifocal lens diopter changes are triggered in synchrony with the SLM RGB subframes, rendering three depth planes perceptually simultaneously via high-frequency time-multiplexing. Operating at 60 frames per second (fps), the system achieves an effective 180 Hz depth-coded cycle, enabling sharp, multi-depth rendering within a dynamically adjustable depth range from 0.2 m to optical infinity. The system's eyebox is configurable from 1 to 5 mm, while maintaining a fixed spatial location and preserving angular magnification regardless of changes in focus or eyebox size. The designed system features a 9.15-degree field of view. An integrated infrared pupil-tracking module detects non-cataractous regions of the cataractous crystalline lens, and the projected imagery is mechanically steered through those clear zones in real time. The proposed vision simulator supports both subjective simulation of post-surgical vision for patient-specific counseling and objective optical evaluation of IOLs, including resolution and contrast fidelity (e.g., modulation transfer function, contrast transfer function, and defocus curves). By decoupling depth modulation from eyebox position and size, the system offers a modular, portable platform that supports enhanced preoperative planning, personalized IOL selection, objective IOL characterization, and use as a novel research vision tool.

A Characterization of Law-Invariant and Coherent Risk Measures through Optimal Transport

Authors:Riccardo Bonalli, Benoît Bonnet-Weill, Laurent Pfeiffer
Date:2025-12-22 08:53:17

In this article, we propose a novel characterization of law-invariant and coherent risk measures, based on a generalized optimal transport problem in which the second marginal of the admissible plans is not fixed, but required to lie within a target set of probability measures. One of the main contributions of this work is a general representation formula for such risk measures, which is closely related to Kusuoka's theorem. When the aforementioned target set is convex, our representation result allows for the systematic derivation of general duality formulas. To illustrate our findings, we explicitly compute the target sets associated with several classical law-invariant coherent risk measures, including the prototypical conditional value at risk and higher moment measures.
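
As a reminder of what the prototypical example computes, the sketch below evaluates the empirical conditional value at risk of a sample through the Rockafellar-Uryasev variational formula, $\mathrm{CVaR}_\alpha(X) = \min_t \{ t + \mathbb{E}[(X - t)^+]/(1-\alpha) \}$. It is a generic illustration, not the paper's optimal-transport representation; the function name and the convention that larger values mean larger losses are assumptions of the example.

```python
from typing import Sequence


def empirical_cvar(losses: Sequence[float], alpha: float) -> float:
    """Empirical CVaR at level alpha, treating larger values as worse.

    For an empirical distribution the objective t + E[(X - t)^+] / (1 - alpha)
    is convex and piecewise linear in t with kinks at the sample points, so it
    suffices to evaluate it at each sample.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    n = len(losses)

    def objective(t: float) -> float:
        excess = sum(max(x - t, 0.0) for x in losses)
        return t + excess / ((1.0 - alpha) * n)

    return min(objective(t) for t in losses)
```

The result depends on the sample only through its empirical distribution, which is the law-invariance property in miniature: reordering the losses leaves the value unchanged.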

AMap: Distilling Future Priors for Ahead-Aware Online HD Map Construction

Authors:Ruikai Li, Xinrun Li, Mengwei Xie, Hao Shan, Shoumeng Qiu, Xinyuan Chang, Yizhe Fan, Feng Xiong, Han Jiang, Yilong Ren, Haiyang Yu, Mu Xu, Yang Long, Varun Ojha, Zhiyong Cui
Date:2025-12-22 08:46:59

Online High-Definition (HD) map construction is pivotal for autonomous driving. While recent approaches leverage historical temporal fusion to improve performance, we identify a critical safety flaw in this paradigm: it is inherently "spatially backward-looking." These methods predominantly enhance map reconstruction in traversed areas, offering minimal improvement for the unseen road ahead. Crucially, our analysis of downstream planning tasks reveals a severe asymmetry: while rearward perception errors are often tolerable, inaccuracies in the forward region directly precipitate hazardous driving maneuvers. To bridge this safety gap, we propose AMap, a novel framework for Ahead-aware online HD Mapping. We pioneer a "distill-from-future" paradigm, where a teacher model with privileged access to future temporal contexts guides a lightweight student model restricted to the current frame. This process implicitly compresses prospective knowledge into the student model, endowing it with "look-ahead" capabilities at zero inference-time cost. Technically, we introduce a Multi-Level BEV Distillation strategy with spatial masking and an Asymmetric Query Adaptation module to effectively transfer future-aware representations to the student's static queries. Extensive experiments on the nuScenes and Argoverse 2 benchmarks demonstrate that AMap significantly enhances current-frame perception. Most notably, it outperforms state-of-the-art temporal models in critical forward regions while maintaining the efficiency of single current-frame inference.

Assessing the impact of the electron ion collider in China on Deeply Virtual Compton Scattering

Authors:Yuan-Yuan Huang, Xu Cao, Taifu Feng, Krešimir Kumerički, Yu Lu
Date:2025-12-22 08:44:09

We assess the impact of future measurements of deeply virtual Compton scattering (DVCS) off protons using the planned detector at the Electron-Ion Collider in China (EicC), proposed as an upgrade to the High Intensity heavy-ion Accelerator Facility (HIAF). We develop a neural-network architecture to flexibly parameterize the Compton Form Factors (CFFs), extrapolate reliably into unmeasured kinematic regions, and provide robust uncertainty estimates through the replica method. The framework is fitted to the available worldwide DVCS data using the Gepard software. We find a significant reduction in the uncertainties of all CFFs after incorporating pseudo-data from single and double polarization asymmetries at the EicC, with particularly strong improvements in the sea-quark region.

WorldRFT: Latent World Model Planning with Reinforcement Fine-Tuning for Autonomous Driving

Authors:Pengxuan Yang, Ben Lu, Zhongpu Xia, Chao Han, Yinfeng Gao, Teng Zhang, Kun Zhan, XianPeng Lang, Yupeng Zheng, Qichao Zhang
Date:2025-12-22 08:27:44

Latent World Models enhance scene representation through temporal self-supervised learning, presenting a perception annotation-free paradigm for end-to-end autonomous driving. However, the reconstruction-oriented representation learning tangles perception with planning tasks, leading to suboptimal optimization for planning. To address this challenge, we propose WorldRFT, a planning-oriented latent world model framework that aligns scene representation learning with planning via a hierarchical planning decomposition and local-aware interactive refinement mechanism, augmented by reinforcement learning fine-tuning (RFT) to enhance safety-critical policy performance. Specifically, WorldRFT integrates a vision-geometry foundation model to improve 3D spatial awareness, employs hierarchical planning task decomposition to guide representation optimization, and utilizes local-aware iterative refinement to derive a planning-oriented driving policy. Furthermore, we introduce Group Relative Policy Optimization (GRPO), which applies trajectory Gaussianization and collision-aware rewards to fine-tune the driving policy, yielding systematic improvements in safety. WorldRFT achieves state-of-the-art (SOTA) performance on both open-loop nuScenes and closed-loop NavSim benchmarks. On nuScenes, it reduces collision rates by 83% (0.30% -> 0.05%). On NavSim, using camera-only sensor input, it attains competitive performance with the LiDAR-based SOTA method DiffusionDrive (87.8 vs. 88.1 PDMS).
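
The core of GRPO is that each sampled trajectory is scored relative to the other samples drawn for the same scene, without a learned value baseline. The sketch below shows only that group-normalization step in its commonly used form. It does not reproduce the trajectory Gaussianization or the collision-aware reward from the abstract: the reward values in the usage lines are made up, and the use of the population standard deviation and the `eps` constant are assumptions.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(
    rewards: Sequence[float], eps: float = 1e-8
) -> List[float]:
    """Normalize the rewards of one sampled group to zero mean, unit scale.

    Each advantage is (r - group mean) / (group std + eps). When all rewards
    in the group are equal, every advantage is 0 and the group contributes
    no policy gradient.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four candidate trajectories in one scene, where
# the two colliding trajectories receive a lower reward.
example_rewards = [1.0, 0.0, 1.0, 0.0]
example_advantages = group_relative_advantages(example_rewards)
```

Trajectories with above-average reward in their group get positive advantages and are reinforced, while the others are pushed down.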

Nuclear Physics Mid Term Plan at LNGS

Authors:R. Buompane, F. Cavanna, C. Curceanu, A. D'Onofrio, A. Di Leva, A. Formicola, L. Gialanella, C. Gustavino, G. Imbriani, M. Junker, A. Marcianò, F. Marzaioli, R. Nania, F. Napolitano, K. Piscicchia, O. Straniero, C. Abia, M. Aliotta, D. Bemmerer, A. Best, A. Boeltzig, C. Bruno, A. Caciolli, A. Chieffi, G. Ciani, G. D'Agata, R. J. deBoer, M. De Cesare, D. Dell'Aquila, R. Depalo, I. Dominguez, F. Ferraro, J. Garcia Duarte, A. Guglielmetti, Gy. Gyürky, S. Hayakawa, M. La Cognata, L. Lamia, L. E. Marcucci, E. Masha, M. Mazzocco, E. L. Morales-Gallegos, S. Palmerini, I. Passariello, A. Petraglia, D. Piatti, M. Pignatari, R. G. Pizzone, G. Porzio, D. Rapagnani, G. G. Rapisarda, S. Romano, M. Rubino, C. Santonastaso, M. L. Sergi, J. Skowronski, R. Spartà, F. Terrasi, A. Tumino, S. Turkat, M. Wiescher, S. Zavatarelli
Date:2025-12-22 05:51:42

The Istituto Nazionale di Fisica Nucleare-Laboratori Nazionali del Gran Sasso (LNGS) is one of the largest underground physics laboratories, a very peculiar environment suited for experiments in Astroparticle Physics, Nuclear Physics and Fundamental Symmetries. The newly established Bellotti Ion Beam facility represents a major advance in the possibilities of studying nuclear processes in an underground environment. A workshop was organized at LNGS in the framework of the Nuclear Physics Mid Term Plan in Italy, an initiative of the Nuclear Physics Division of the Istituto Nazionale di Fisica Nucleare, to discuss the opportunities that could be studied in the near future by employing state-of-the-art detection systems. In this report, a detailed discussion of the outcome of the workshop is presented.

Decoupled Generative Modeling for Human-Object Interaction Synthesis

Authors:Hwanhee Jung, Seunggwan Lee, Jeongyoon Yoon, SeungHyeon Kim, Giljoo Nam, Qixing Huang, Sangpil Kim
Date:2025-12-22 05:33:59

Synthesizing realistic human-object interaction (HOI) is essential for 3D computer vision and robotics, underpinning animation and embodied control. Existing approaches often require manually specified intermediate waypoints and place all optimization objectives on a single network, which increases complexity, reduces flexibility, and leads to errors such as unsynchronized human and object motion or penetration. To address these issues, we propose Decoupled Generative Modeling for Human-Object Interaction Synthesis (DecHOI), which separates path planning and action synthesis. A trajectory generator first produces human and object trajectories without prescribed waypoints, and an action generator conditions on these paths to synthesize detailed motions. To further improve contact realism, we employ adversarial training with a discriminator that focuses on the dynamics of distal joints. The framework also models a moving counterpart and supports responsive, long-sequence planning in dynamic scenes, while preserving plan consistency. Across two benchmarks, FullBodyManipulation and 3D-FUTURE, DecHOI surpasses prior methods on most quantitative metrics and qualitative evaluations, and perceptual studies likewise prefer our results.

Backward Growth Accounting: An Economic Tool for Strategic Planning of Business Growth

Authors:Ali Zeytoon-Nejad
Date:2025-12-22 05:01:18

Business growth is a goal of great importance for both its private and social benefits. Many firms view business growth as an imperative for their survival, stability, and long-term success. Business growth can be socially beneficial, too, as it enables businesses to expand into new territories where they can stimulate economic growth and development, create more jobs, increase living standards, and better serve their communities by giving back more through Corporate Social Responsibility initiatives. Business growth must be planned reasonably and optimally so that it can effectively achieve its critical ambitions in business practice. The current common practices for planning the supply side of business growth are usually ad-hoc and lack well-established mathematical and economic foundations. The present paper argues that business growth planning can be pursued more structurally, reliably, and meaningfully within the framework of Growth Accounting (GA), which was first introduced by Economics Nobel Laureate Robert Solow to study economic growth. It is shown that, although GA was initially put forth as a procedure to explain "economic growth" ex-post, it can similarly be used to plan "business growth" ex-ante when a general backward approach is taken in its procedure, called Backward Growth Accounting (BGA) in this paper. Taking this well-established economic-mathematical approach to planning business growth will enhance the current practices conceptually and structurally, as it is built on the basis of economic logic and mathematical tools. BGA can help businesses identify and plan for key drivers of output growth and assess shortcomings in the growth process, such as poor productivity, inadequate labor utilization, or insufficient capital investment. The paper outlines an eight-step procedure for planning business growth using BGA and includes appendices with real-world examples.
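
To make the backward direction concrete, the sketch below inverts the textbook Solow decomposition, g_Y = g_A + alpha * g_K + (1 - alpha) * g_L, to ask how fast capital must grow to hit a target output growth rate. It assumes a Cobb-Douglas production function with constant returns to scale; the function name and all numbers are hypothetical, and this is not the paper's eight-step procedure.

```python
def required_capital_growth(target_output_growth, tfp_growth,
                            labor_growth, capital_share):
    """Solve g_Y = g_A + alpha*g_K + (1 - alpha)*g_L for g_K.

    Assumes a Cobb-Douglas technology with capital share alpha in (0, 1).
    """
    if not 0.0 < capital_share < 1.0:
        raise ValueError("capital_share must lie strictly between 0 and 1")
    # Growth that productivity and labor are expected to deliver on their own.
    explained = tfp_growth + (1.0 - capital_share) * labor_growth
    # The remainder must come from capital, scaled by its output elasticity.
    return (target_output_growth - explained) / capital_share


# Hypothetical plan: 10% output growth, 2% productivity growth,
# 3% labor growth, and a capital share of 0.4.
g_k = required_capital_growth(0.10, 0.02, 0.03, 0.4)
```

Forward growth accounting would take the three input growth rates as given and report the implied output growth; here the target is fixed first and one driver is solved for.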

IndoorUAV: Benchmarking Vision-Language UAV Navigation in Continuous Indoor Environments

Authors:Xu Liu, Yu Liu, Hanshuo Qiu, Yang Qirong, Zhouhui Lian
Date:2025-12-22 04:42:35

Vision-Language Navigation (VLN) enables agents to navigate in complex environments by following natural language instructions grounded in visual observations. Although most existing work has focused on ground-based robots or outdoor Unmanned Aerial Vehicles (UAVs), indoor UAV-based VLN remains underexplored, despite its relevance to real-world applications such as inspection, delivery, and search-and-rescue in confined spaces. To bridge this gap, we introduce \textbf{IndoorUAV}, a novel benchmark and method specifically tailored for VLN with indoor UAVs. We begin by curating over 1,000 diverse and structurally rich 3D indoor scenes from the Habitat simulator. Within these environments, we simulate realistic UAV flight dynamics to collect diverse 3D navigation trajectories manually, further enriched through data augmentation techniques. Furthermore, we design an automated annotation pipeline to generate natural language instructions of varying granularity for each trajectory. This process yields over 16,000 high-quality trajectories, comprising the \textbf{IndoorUAV-VLN} subset, which focuses on long-horizon VLN. To support short-horizon planning, we segment long trajectories into sub-trajectories by selecting semantically salient keyframes and regenerating concise instructions, forming the \textbf{IndoorUAV-VLA} subset. Finally, we introduce \textbf{IndoorUAV-Agent}, a novel navigation model designed for our benchmark, leveraging task decomposition and multimodal reasoning. We hope IndoorUAV serves as a valuable resource to advance research on vision-language embodied AI in the indoor aerial navigation domain.

Towards AI-Guided Open-World Ecological Taxonomic Classification

Authors:Cheng Yaw Low, Heejoon Koo, Jaewoo Park, Kaleb Mesfin Asfaw, Meeyoung Cha
Date:2025-12-22 03:20:05

AI-guided classification of ecological families, genera, and species underpins global sustainability efforts such as biodiversity monitoring, conservation planning, and policy-making. Progress toward this goal is hindered by long-tailed taxonomic distributions from class imbalance, along with fine-grained taxonomic variations, test-time spatiotemporal domain shifts, and closed-set assumptions that can only recognize previously seen taxa. We introduce the Open-World Ecological Taxonomy Classification, a unified framework that captures the co-occurrence of these challenges in realistic ecological settings. To address them, we propose TaxoNet, an embedding-based encoder with a dual-margin penalization loss that strengthens learning signals from rare underrepresented taxa while mitigating the dominance of overrepresented ones, directly confronting interrelated challenges. We evaluate our method on diverse ecological domains: Google Auto-Arborist (urban trees), iNat-Plantae (Plantae observations from various ecosystems in iNaturalist-2019), and NAFlora-Mini (a curated herbarium collection). Our model consistently outperforms baselines, particularly for rare taxa, establishing a strong foundation for open-world plant taxonomic monitoring. Our findings further show that general-purpose multimodal foundation models remain constrained in plant-domain applications.

DTCCL: Disengagement-Triggered Contrastive Continual Learning for Autonomous Bus Planners

Authors:Yanding Yang, Weitao Zhou, Jinhai Wang, Xiaomin Guo, Junze Wen, Xiaolong Liu, Lang Ding, Zheng Fu, Jinyu Miao, Kun Jiang, Diange Yang
Date:2025-12-22 02:59:37

Autonomous buses run on fixed routes but must operate in open, dynamic urban environments. Disengagement events on these routes are often geographically concentrated and typically arise from planner failures in highly interactive regions. Such policy-level failures are difficult to correct using conventional imitation learning, which easily overfits to sparse disengagement data. To address this issue, this paper presents a Disengagement-Triggered Contrastive Continual Learning (DTCCL) framework that enables autonomous buses to improve planning policies through real-world operation. Each disengagement triggers cloud-based data augmentation that generates positive and negative samples by perturbing surrounding agents while preserving route context. Contrastive learning refines policy representations to better distinguish safe and unsafe behaviors, and continual updates are applied in a cloud-edge loop without human supervision. Experiments on urban bus routes demonstrate that DTCCL improves overall planning performance by 48.6 percent compared with direct retraining, validating its effectiveness for scalable, closed-loop policy improvement in autonomous public transport.
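
The abstract does not state which contrastive objective DTCCL uses. As one common choice, the sketch below computes an InfoNCE-style loss that pulls an anchor embedding toward a positive (safe) sample and pushes it away from negative (unsafe) samples. The embeddings, the temperature value, and the function names are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: negative log-softmax of the positive pair's similarity."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


# Hypothetical 2-D policy embeddings.
anchor = [1.0, 0.0]
well_separated = info_nce(anchor, [0.9, 0.1], [[-1.0, 0.0], [0.0, 1.0]])
poorly_separated = info_nce(anchor, [-1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]])
```

The loss is small when the positive sample is the closest one to the anchor and grows when a negative sample sits closer than the positive.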

Optimizing Robotic Placement via Grasp-Dependent Feasibility Prediction

Authors:Tianyuan Liu, Richard Dazeley, Benjamin Champion, Akan Cosgun
Date:2025-12-21 23:47:09

In this paper, we study whether inexpensive, physics-free supervision can reliably prioritize grasp-place candidates for budget-aware pick-and-place. From an object's initial pose, target pose, and a candidate grasp, we generate two path-aware geometric labels: path-wise inverse kinematics (IK) feasibility across a fixed approach-grasp-lift waypoint template, and a transit collision flag from mesh sweeps along the same template. A compact dual-output MLP learns these signals from pose encodings, and at test time its scores rank precomputed candidates for a rank-then-plan policy under the same IK gate and planner as the baseline. Although learned from cheap labels only, the scores transfer to physics-enabled executed trajectories: at a fixed planning budget the policy finds successful paths sooner with fewer planner calls while keeping final success on par or better. This work targets a single rigid cuboid with side-face grasps and a fixed waypoint template, and we outline extensions to varied objects and richer waypoint schemes.
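
The rank-then-plan policy described above can be sketched as follows: candidates are sorted by a learned score, passed through the IK gate, and handed to the planner until one succeeds or the budget of planner calls runs out. This is a minimal sketch under stated assumptions; `rank_then_plan`, the toy scores, the gate, and the stand-in planner are hypothetical and do not come from the paper.

```python
def rank_then_plan(candidates, score_fn, ik_feasible, plan, budget):
    """Try candidates in descending score order within a planner-call budget.

    Returns (candidate, path, planner_calls); candidate and path are None
    if no plan is found within the budget.
    """
    ranked = sorted(candidates, key=score_fn, reverse=True)
    calls = 0
    for cand in ranked:
        if not ik_feasible(cand):
            continue  # the IK gate costs no planner call
        if calls >= budget:
            break
        calls += 1
        path = plan(cand)
        if path is not None:
            return cand, path, calls
    return None, None, calls


# Toy setup: four grasp-place candidates, only "c" is plannable.
scores = {"a": 0.1, "b": 0.4, "c": 0.9, "d": 0.7}
gate = lambda c: c != "d"                       # "d" fails the IK gate
planner = lambda c: ["approach", "grasp", "lift"] if c == "c" else None

ranked_result = rank_then_plan("abcd", scores.get, gate, planner, budget=3)
# Baseline: constant score keeps the original order (sorted is stable).
baseline_result = rank_then_plan("abcd", lambda c: 0.0, gate, planner, budget=3)
```

In this toy case both runs reach the same successful candidate, but the ranked run needs one planner call where the unranked baseline needs three.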

A physics-informed, plug-and-play dose engine for gradient-based radiotherapy treatment planning

Authors:Attila Simkó, Matthias Kronsteiner, Simon Glatzer, Minh Vu, Josef A. Lundman, Joakim Jonsson, Jörgen Olofsson, Kristina Sandgren, Wolfgang Lechner, Dietmar Georg, Tommy Löfstedt, Tufve Nyholm, Anders Garpebring, Gerd Heilemann
Date:2025-12-21 19:36:09

Radiotherapy treatment planning remains a time-intensive iterative process requiring expert intervention in a commercial treatment planning system (TPS). While machine learning approaches have demonstrated promise, most remain dependent on TPS-based dose calculation or surrogate dose models, preventing direct optimization of deliverable treatment plan parameters. We propose PyDoseRT (PDRT), a physics-informed, GPU-accelerated dose engine implemented in PyTorch that computes dose distributions directly from treatment delivery parameters (i.e., MLC leaf positions, jaw positions, gantry angles, and monitor units). The engine preserves gradient information throughout the dose computation pipeline, enabling gradient-based optimization of hardware-constrained treatment plans without reliance on a commercial TPS. PDRT was evaluated on 19 and 162 clinical VMAT prostate cancer plans from two hospitals (with different treatment machines). When recalculating clinical plans, PDRT achieved high 3D gamma pass rates (mean 96.8% for 2%/2 mm and 98.9% for 3%/3 mm, depending on cohort). All optimized plans converged to clinically acceptable solutions and passed deliverability verification when imported into a commercial TPS. This physics-informed framework eliminates TPS dependency for radiotherapy optimization research by enabling gradient-based planning while ensuring that delivery parameters remain in the machine-feasible range. The gradient-enabled dose engine allows exploration of novel optimization strategies and objective functions while maintaining clinical validity. The proposed approach provides a research platform for investigating real-time adaptive radiotherapy concepts, automated planning workflows, and TPS-independent optimization strategies, and helps democratize radiotherapy research by exposing gradient-enabled, hardware-aware, open-source dose computation.

Multimodal Classification Network Guided Trajectory Planning for Four-Wheel Independent Steering Autonomous Parking Considering Obstacle Attributes

Authors:Jingjia Teng, Yang Li, Jianqiang Wang, Yingbai Hu, Songyuan Tang, Manjiang Hu
Date:2025-12-21 17:45:57

Four-wheel Independent Steering (4WIS) vehicles have attracted increasing attention for their superior maneuverability. Human drivers typically choose to cross or drive over the low-profile obstacles (e.g., plastic bags) to efficiently navigate through narrow spaces, while existing planners neglect obstacle attributes, causing inefficiency or path-finding failures. To address this, we propose a trajectory planning framework integrating the 4WIS hybrid A* and Optimal Control Problem (OCP), in which the hybrid A* provides an initial path to enhance the OCP solution. Specifically, a multimodal classification network is introduced to assess scene complexity (hard/easy task) by fusing image and vehicle state data. For hard tasks, guided points are set to decompose complex tasks into local subtasks, improving the search efficiency of 4WIS hybrid A*. The multiple steering modes of 4WIS vehicles (Ackermann, diagonal, and zero-turn) are also incorporated into node expansion and heuristic designs. Moreover, a hierarchical obstacle handling strategy is designed to guide the node expansion considering obstacle attributes, i.e., 'non-traversable', 'crossable', and 'drive-over' obstacles. It allows crossing or driving over obstacles instead of the 'avoid-only' strategy, greatly enhancing success rates of pathfinding. We also design a logical constraint for the 'drive-over' obstacle by limiting its velocity to ensure safety. Furthermore, to address dynamic obstacles with motion uncertainty, we introduce a probabilistic risk field model, constructing risk-aware driving corridors that serve as linear collision constraints in OCP. Experimental results demonstrate the proposed framework's effectiveness in generating safe, efficient, and smooth trajectories for 4WIS vehicles, especially in constrained environments.

CauTraj: A Causal-Knowledge-Guided Framework for Lane-Changing Trajectory Planning of Autonomous Vehicles

Authors:Cailin Lei, Haiyang Wu, Yuxiong Ji, Xiaoyu Cai, Yuchuan Du
Date:2025-12-21 11:44:32

Enhancing the performance of trajectory planners for lane-changing vehicles is one of the key challenges in autonomous driving within human-machine mixed traffic. Most existing studies have not incorporated human drivers' prior knowledge when designing trajectory planning models. To address this issue, this study proposes a novel trajectory planning framework that integrates causal prior knowledge into the control process. Both longitudinal and lateral microscopic behaviors of vehicles are modeled to quantify interaction risk, and a staged causal graph is constructed to capture causal dependencies in lane-changing scenarios. Causal effects between the lane-changing vehicle and surrounding vehicles are then estimated using causal inference, including average treatment effects (ATE) and conditional average treatment effects (CATE). These causal priors are embedded into a model predictive control (MPC) framework to enhance trajectory planning. The proposed approach is validated on naturalistic vehicle trajectory datasets. Experimental results show that: (1) causal inference provides interpretable and stable quantification of vehicle interactions; (2) individual causal effects reveal driver heterogeneity; and (3) compared with the baseline MPC, the proposed method achieves a closer alignment with human driving behaviors, reducing maximum trajectory deviation from 1.2 m to 0.2 m, lateral velocity fluctuation by 60%, and yaw angle variability by 50%. These findings provide methodological support for human-like trajectory planning and practical value for improving safety, stability, and realism in autonomous vehicle testing and traffic simulation platforms.
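
For readers unfamiliar with the two estimands, the sketch below computes an ATE as a plain difference in group means and a CATE as the same difference restricted to a subgroup. It assumes the treatment is unconfounded, which the paper's staged causal graph is meant to handle more carefully; the record fields and numbers are hypothetical and not taken from the datasets used in the paper.

```python
from statistics import mean


def ate(records):
    """Average treatment effect as a difference in mean outcomes."""
    treated = [r["outcome"] for r in records if r["treated"]]
    control = [r["outcome"] for r in records if not r["treated"]]
    return mean(treated) - mean(control)


def cate(records, condition):
    """Conditional ATE: the same estimate on records matching a condition."""
    return ate([r for r in records if condition(r)])


# Hypothetical observations: "treated" marks a nearby lane change, "outcome"
# is the following vehicle's deceleration, "aggressive" is a driver-style flag.
records = [
    {"aggressive": True, "treated": True, "outcome": 1.0},
    {"aggressive": True, "treated": True, "outcome": 1.2},
    {"aggressive": True, "treated": False, "outcome": 0.8},
    {"aggressive": True, "treated": False, "outcome": 1.0},
    {"aggressive": False, "treated": True, "outcome": 2.0},
    {"aggressive": False, "treated": True, "outcome": 2.2},
    {"aggressive": False, "treated": False, "outcome": 1.0},
    {"aggressive": False, "treated": False, "outcome": 1.2},
]

overall = ate(records)
aggressive_only = cate(records, lambda r: r["aggressive"])
cautious_only = cate(records, lambda r: not r["aggressive"])
```

The gap between the two subgroup estimates is the kind of driver heterogeneity that a single average effect would hide.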

Assignment-Routing Optimization: Solvers for Problems Under Constraints

Authors:Yuan Qilong, Michal Pavelka
Date:2025-12-21 06:32:31

We study the Joint Routing-Assignment (JRA) problem in which items must be assigned one-to-one to placeholders while simultaneously determining a Hamiltonian cycle visiting all nodes exactly once. Extending previous exact MIP solvers with Gurobi and cutting-plane subtour elimination, we develop a solver tailored for practical packaging-planning scenarios with richer constraints. These include multiple placeholder options, time-frame restrictions, and multi-class item packaging. Experiments on 46 mobile manipulation datasets demonstrate that the proposed MIP approach achieves global optima with stable and low computation times, significantly outperforming the shaking-based exact solver by up to an order of magnitude. Compared to greedy baselines, the MIP solutions consistently achieve optimal distances, whereas simple heuristics deviate by 14% on average, confirming both efficiency and solution quality. The results highlight the practical applicability of MIP-based JRA optimization for robotic packaging, motion planning, and complex logistics.