planning - 2025-12-14

ImplicitRDP: An End-to-End Visual-Force Diffusion Policy with Structural Slow-Fast Learning

Authors:Wendi Chen, Han Xue, Yi Wang, Fangyuan Zhou, Jun Lv, Yang Jin, Shirun Tang, Chuan Wen, Cewu Lu
Date:2025-12-11 18:59:46

Human-level contact-rich manipulation relies on the distinct roles of two key modalities: vision provides spatially rich but temporally slow global context, while force sensing captures rapid, high-frequency local contact dynamics. Integrating these signals is challenging due to their fundamental frequency and informational disparities. In this work, we propose ImplicitRDP, a unified end-to-end visual-force diffusion policy that integrates visual planning and reactive force control within a single network. We introduce Structural Slow-Fast Learning, a mechanism utilizing causal attention to simultaneously process asynchronous visual and force tokens, allowing the policy to perform closed-loop adjustments at the force frequency while maintaining the temporal coherence of action chunks. Furthermore, to mitigate modality collapse where end-to-end models fail to adjust the weights across different modalities, we propose Virtual-target-based Representation Regularization. This auxiliary objective maps force feedback into the same space as the action, providing a stronger, physics-grounded learning signal than raw force prediction. Extensive experiments on contact-rich tasks demonstrate that ImplicitRDP significantly outperforms both vision-only and hierarchical baselines, achieving superior reactivity and success rates with a streamlined training pipeline. Code and videos will be publicly available at https://implicit-rdp.github.io.

A vision for ground-based astronomy beyond the 2030s: How to build ESO's next big telescope sustainably

Authors:Laurane Fréour, Mathilde Bouvier, Tony Mroczkowski, Callie Clontz, Fatemeh Zahra Majidi, Vasundhara Shaw, Olivier Absil, Anna Cabré, Olivier Lai, Dylan Magill, Jake D. Turner
Date:2025-12-11 18:31:04

Astronomy is the study of the Universe and all the objects that it comprises. Our attention is therefore usually focused beyond Earth, home to the only form of life known today. However, how can we continue to explore the secrets of the Universe, if we stand by and watch our only home burn? We know that there is no Planet B. It is therefore urgent that, as astronomers, we collectively work to protect the Earth, allowing future generations the opportunity to continue to uncover the secrets of the cosmos. As astronomical facilities account for the majority of our community's carbon footprint, we propose guidelines that we hold crucial for the European Southern Observatory (ESO) to consider in the context of the Expanding Horizons programme as it plans a next-generation, transformational facility.

MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence

Authors:Jingli Lin, Runsen Xu, Shaohao Zhu, Sihan Yang, Peizhou Cao, Yunlong Ran, Miao Hu, Chenming Zhu, Yiman Xie, Yilin Long, Wenbo Hu, Dahua Lin, Tai Wang, Jiangmiao Pang
Date:2025-12-11 17:57:24

Spatial understanding over continuous visual input is crucial for MLLMs to evolve into general-purpose assistants in physical environments. Yet there is still no comprehensive benchmark that holistically assesses the progress toward this goal. In this work, we introduce MMSI-Video-Bench, a fully human-annotated benchmark for video-based spatial intelligence in MLLMs. It operationalizes a four-level framework, Perception, Planning, Prediction, and Cross-Video Reasoning, through 1,106 questions grounded in 1,278 clips from 25 datasets and in-house videos. Each item is carefully designed and reviewed by 3DV experts with explanatory rationales to ensure precise, unambiguous grounding. Leveraging its diverse data sources and holistic task coverage, MMSI-Video-Bench also supports three domain-oriented sub-benchmarks (Indoor Scene Perception Bench, Robot Bench and Grounding Bench) for targeted capability assessment. We evaluate 25 strong open-source and proprietary MLLMs, revealing a striking human--AI gap: many models perform near chance, and the best reasoning model lags humans by nearly 60%. We further find that spatially fine-tuned models still fail to generalize effectively on our benchmark. Fine-grained error analysis exposes systematic failures in geometric reasoning, motion grounding, long-horizon prediction, and cross-video correspondence. We also show that typical frame-sampling strategies transfer poorly to our reasoning-intensive benchmark, and that neither 3D spatial cues nor chain-of-thought prompting yields meaningful gains. We expect our benchmark to establish a solid testbed for advancing video-based spatial intelligence.

SpaceDrive: Infusing Spatial Awareness into VLM-based Autonomous Driving

Authors:Peizheng Li, Zhenghao Zhang, David Holtz, Hang Yu, Yutong Yang, Yuzhi Lai, Rui Song, Andreas Geiger, Andreas Zell
Date:2025-12-11 14:59:07

End-to-end autonomous driving methods built on vision language models (VLMs) have undergone rapid development driven by their universal visual understanding and strong reasoning capabilities obtained from the large-scale pretraining. However, we find that current VLMs struggle to understand fine-grained 3D spatial relationships which is a fundamental requirement for systems interacting with the physical world. To address this issue, we propose SpaceDrive, a spatial-aware VLM-based driving framework that treats spatial information as explicit positional encodings (PEs) instead of textual digit tokens, enabling joint reasoning over semantic and spatial representations. SpaceDrive employs a universal positional encoder to all 3D coordinates derived from multi-view depth estimation, historical ego-states, and text prompts. These 3D PEs are first superimposed to augment the corresponding 2D visual tokens. Meanwhile, they serve as a task-agnostic coordinate representation, replacing the digit-wise numerical tokens as both inputs and outputs for the VLM. This mechanism enables the model to better index specific visual semantics in spatial reasoning and directly regress trajectory coordinates rather than generating digit-by-digit, thereby enhancing planning accuracy. Extensive experiments validate that SpaceDrive achieves state-of-the-art open-loop performance on the nuScenes dataset and the second-best Driving Score of 78.02 on the Bench2Drive closed-loop benchmark over existing VLM-based methods.

COMPARE: Clinical Optimization with Modular Planning and Assessment via RAG-Enhanced AI-OCT: Superior Decision Support for Percutaneous Coronary Intervention Compared to ChatGPT-5 and Junior Operators

Authors:Wei Fang, Chiyao Wang, Wenshuai Ma, Hui Liu, Jianqiang Hu, Xiaona Niu, Yi Chu, Mingming Zhang, Jingxiao Yang, Dongwei Zhang, Zelin Li, Pengyun Liu, Jiawei Zheng, Pengke Zhang, Chaoshi Qin, Wangang Guo, Bin Wang, Yugang Xue, Wei Zhang, Zikuan Wang, Rui Zhu, Yihui Cao, Quanmao Lu, Rui Meng, Yan Li
Date:2025-12-11 14:41:37

Background: While intravascular imaging, particularly optical coherence tomography (OCT), improves percutaneous coronary intervention (PCI) outcomes, its interpretation is operator-dependent. General-purpose artificial intelligence (AI) shows promise but lacks domain-specific reliability. We evaluated the performance of CA-GPT, a novel large model deployed on an AI-OCT system, against that of the general-purpose ChatGPT-5 and junior physicians for OCT-guided PCI planning and assessment. Methods: In this single-center analysis of 96 patients who underwent OCT-guided PCI, the procedural decisions generated by the CA-GPT, ChatGPT-5, and junior physicians were compared with an expert-derived procedural record. Agreement was assessed using ten pre-specified metrics across pre-PCI and post-PCI phases. Results: For pre-PCI planning, CA-GPT demonstrated significantly higher median agreement scores (5[IQR 3.75-5]) compared to both ChatGPT-5 (3[2-4], P<0.001) and junior physicians (4[3-4], P<0.001). CA-GPT significantly outperformed ChatGPT-5 across all individual pre-PCI metrics and showed superior performance to junior physicians in stent diameter (90.3% vs. 72.2%, P<0.05) and length selection (80.6% vs. 52.8%, P<0.01). In post-PCI assessment, CA-GPT maintained excellent overall agreement (5[4.75-5]), significantly higher than both ChatGPT-5 (4[4-5], P<0.001) and junior physicians (5[4-5], P<0.05). Subgroup analysis confirmed CA-GPT's robust performance advantage in complex scenarios. Conclusion: The CA-GPT-based AI-OCT system achieved superior decision-making agreement versus a general-purpose large language model and junior physicians across both PCI planning and assessment phases. This approach provides a standardized and reliable method for intravascular imaging interpretation, demonstrating significant potential to augment operator expertise and optimize OCT-guided PCI.

nDspec: a new Python library for modelling multi-dimensional datasets in X-ray astronomy

Authors:Matteo Lucchini, Benjamin Ricketts, Phil Uttley, Daniela Huppenkothen
Date:2025-12-11 13:12:18

The current fleet of X-ray telescopes produces a wealth of multi-dimensional data, allowing us to study sources in time, photon energy and polarization. At the same time, it has become increasingly clear that progress in our physical understanding will only come from studying these sources in multiple dimensions simultaneously. Enabling multi-dimensional studies of X-ray sources requires new theoretical models predicting these data sets, new methods to analyse them and a software framework to combine data, models and methods efficiently. In this paper, we introduce the alpha release of nDspec, a new python-based library designed to allow users to model one- and multi-dimensional datasets common to X-ray astronomy. In the alpha release, we focus on modelling time-averaged data as well as Fourier spectral-timing mode, but highlight how additional dimensions can be added. We discuss design philosophy and current features, and showcase an example use case by characterizing a NICER observation of a black hole X-ray binary. We also highlight current plans for extensions to other dimensions and new features.

Motion Planning for Safe Landing of a Human-Piloted Parafoil

Authors:Maximillian Fainkich, Kiril Solovey, Anna Clarke
Date:2025-12-11 12:39:48

Most skydiving accidents occur during the parafoil-piloting and landing stages and result from human lapses in judgment while piloting the parafoil. Training of novice pilots is protracted due to the lack of functional and easily accessible training simulators. Moreover, work on parafoil trajectory planning suitable for aiding human training remains limited. To bridge this gap, we study the problem of computing safe trajectories for human-piloted parafoil flight and examine how such trajectories fare against human-generated solutions. For the algorithmic part, we adapt the sampling-based motion planner Stable Sparse RRT (SST) by Li et al., to cope with the problem constraints while minimizing the bank angle (control effort) as a proxy for safety. We then compare the computer-generated solutions with data from human-generated parafoil flight, where the algorithm offers a relative cost improvement of 20\%-80\% over the performance of the human pilot. We observe that human pilots tend to, first, close the horizontal distance to the landing area, and then address the vertical gap by spiraling down to the suitable altitude for starting a landing maneuver. The algorithm considered here makes smoother and more gradual descents, arriving at the landing area at the precise altitude necessary for the final approach while maintaining safety constraints. Overall, the study demonstrates the potential of computer-generated guidelines, rather than traditional rules of thumb, which can be integrated into future simulators to train pilots for safer and more cost-effective flights.

Reconstructing Transportation Cost Planning Theory: A Multi-Layered Framework Integrating Stepwise Functions, AI-Driven Dynamic Pricing, and Sustainable Autonomy

Authors:Samuel Darwisman
Date:2025-12-11 10:15:00

The theoretical landscape of transportation cost planning is shifting from deterministic linear models to dynamic, data-driven optimization. As supply chains face volatility, static 20th-century cost assumptions prove increasingly inadequate. Despite rapid technological advancements, a unified framework linking economic production theory with the operational realities of autonomous, sustainable logistics remains absent. Existing models fail to address non-linear stepwise costs and real-time stochastic variables introduced by market dynamics. This study reconstructs transportation cost planning theory by synthesizing Grand, Middle-Range, and Applied theories. It aims to integrate stepwise cost functions, AI-driven decision-making, and environmental externalities into a cohesive planning model. A systematic theoretical synthesis was conducted using 28 high-impact papers published primarily between 2018 and 2025, employing multi-layered analysis to reconstruct cost drivers. The study identifies three critical shifts: the transition from linear to stepwise fixed costs, the necessity of AI-driven dynamic pricing for revenue optimization, and the role of Autonomous Electric Vehicles (AEVs) in minimizing long-term marginal costs. A "Dynamic-Sustainable Cost Planning Theory" is proposed, arguing that cost efficiency now depends on algorithmic prediction and autonomous fleet utilization rather than simple distance minimization.

T-SKM-Net: Trainable Neural Network Framework for Linear Constraint Satisfaction via Sampling Kaczmarz-Motzkin Method

Authors:Haoyu Zhu, Yao Zhang, Jiashen Ren, Qingchun Hou
Date:2025-12-11 09:35:13

Neural network constraint satisfaction is crucial for safety-critical applications such as power system optimization, robotic path planning, and autonomous driving. However, existing constraint satisfaction methods face efficiency-applicability trade-offs, with hard constraint methods suffering from either high computational complexity or restrictive assumptions on constraint structures. The Sampling Kaczmarz-Motzkin (SKM) method is a randomized iterative algorithm for solving large-scale linear inequality systems with favorable convergence properties, but its argmax operations introduce non-differentiability, posing challenges for neural network applications. This work proposes the Trainable Sampling Kaczmarz-Motzkin Network (T-SKM-Net) framework and, for the first time, systematically integrates SKM-type methods into neural network constraint satisfaction. The framework transforms mixed constraint problems into pure inequality problems through null space transformation, employs SKM for iterative solving, and maps solutions back to the original constraint space, efficiently handling both equality and inequality constraints. We provide theoretical proof of post-processing effectiveness in expectation and end-to-end trainability guarantees based on unbiased gradient estimators, demonstrating that despite non-differentiable operations, the framework supports standard backpropagation. On the DCOPF case118 benchmark, our method achieves 4.27ms/item GPU serial forward inference with 0.0025% max optimality gap with post-processing mode and 5.25ms/item with 0.0008% max optimality gap with joint training mode, delivering over 25$\times$ speedup compared to the pandapower solver while maintaining zero constraint violations under given tolerance.

Robust Crop Planning under Uncertainty: Aligning Economic Optimality with Agronomic Sustainability

Authors:Runhao Liu, Ziming Chen, You Li, Peng Zhang
Date:2025-12-11 08:04:57

Long-horizon agricultural planning requires optimizing crop allocation under complex spatial heterogeneity, temporal agronomic dependencies, and multi-source environmental uncertainty. Existing approaches often treat crop interactions, such as legume-cereal complementarity, which implicitly or rely on static deterministic formulations that fail to guarantee resilience against market and climate volatility. To address these challenges, we propose a Multi-Layer Robust Crop Planning Framework (MLRCPF) that integrates spatial reasoning, temporal dynamics, and robust optimization. Specifically, we formalize crop-to-crop relationships through a structured interaction matrix embedded within the state-transition logic, and employ a distributionally robust optimization layer to mitigate worst-case risks defined by a data-driven ambiguity set. Evaluations on a real-world high-mix farming dataset from North China demonstrate the effectiveness of the proposed approach. The framework autonomously generates sustainable checkerboard rotation patterns that restore soil fertility, significantly increasing the legume planting ratio compared to deterministic baselines. Economically, it successfully resolves the trade-off between optimality and stability. These results highlight the importance of explicitly encoding domain-specific structural priors into optimization models for resilient decision-making in complex agricultural systems.

Integrated Planning and Machine-Level Scheduling for High-Mix Discrete Manufacturing: A Profit-Driven Heuristic Framework

Authors:Runhao Liu, Ziming Chen, You Li, Zequn Xie, Peng Zhang
Date:2025-12-11 07:17:29

Modern manufacturing enterprises struggle to create efficient and reliable production schedules under multi-variety, small-batch, and rush-order conditions. High-mix discrete manufacturing systems require jointly optimizing mid-term production planning and machine-level scheduling under heterogeneous resources and stringent delivery commitments. We address this problem with a profit-driven integrated framework that couples a mixed-integer planning model with a machine-level scheduling heuristic. The planning layer allocates production, accessory co-production, and outsourcing under aggregate economic and capacity constraints, while the scheduling layer refines these allocations using a structure-aware procedure that enforces execution feasibility and stabilizes daily machine behavior. This hierarchical design preserves the tractability of aggregated optimization while capturing detailed operational restrictions. Evaluations are conducted on a real industrial scenario. A flexible machine-level execution scheme yields 73.3% on-time completion and significant outsourcing demand, revealing bottleneck congestion. In contrast, a stability-enforcing execution policy achieves 100% on-time completion, eliminates all outsourcing, and maintains balanced machine utilization with only 1.9 to 4.6% capacity loss from changeovers. These results show that aligning planning decisions with stability-oriented execution rules enables practical and interpretable profit-maximizing decisions in complex manufacturing environments.

CoSPlan: Corrective Sequential Planning via Scene Graph Incremental Updates

Authors:Shresth Grover, Priyank Pathak, Akash Kumar, Vibhav Vineet, Yogesh S Rawat
Date:2025-12-11 06:46:51

Large-scale Vision-Language Models (VLMs) exhibit impressive complex reasoning capabilities but remain largely unexplored in visual sequential planning, i.e., executing multi-step actions towards a goal. Additionally, practical sequential planning often involves non-optimal (erroneous) steps, challenging VLMs to detect and correct such steps. We propose Corrective Sequential Planning Benchmark (CoSPlan) to evaluate VLMs in error-prone, vision-based sequential planning tasks across 4 domains: maze navigation, block rearrangement, image reconstruction,and object reorganization. CoSPlan assesses two key abilities: Error Detection (identifying non-optimal action) and Step Completion (correcting and completing action sequences to reach the goal). Despite using state-of-the-art reasoning techniques such as Chain-of-Thought and Scene Graphs, VLMs (e.g. Intern-VLM and Qwen2) struggle on CoSPlan, failing to leverage contextual cues to reach goals. Addressing this, we propose a novel training-free method, Scene Graph Incremental updates (SGI), which introduces intermediate reasoning steps between the initial and goal states. SGI helps VLMs reason about sequences, yielding an average performance gain of 5.2%. In addition to enhancing reliability in corrective sequential planning, SGI generalizes to traditional planning tasks such as Plan-Bench and VQA.

Murmur2Vec: A Hashing Based Solution For Embedding Generation Of COVID-19 Spike Sequences

Authors:Sarwan Ali, Taslim Murad
Date:2025-12-10 23:03:10

Early detection and characterization of coronavirus disease (COVID-19), caused by SARS-CoV-2, remain critical for effective clinical response and public-health planning. The global availability of large-scale viral sequence data presents significant opportunities for computational analysis; however, existing approaches face notable limitations. Phylogenetic tree-based methods are computationally intensive and do not scale efficiently to today's multi-million-sequence datasets. Similarly, current embedding-based techniques often rely on aligned sequences or exhibit suboptimal predictive performance and high runtime costs, creating barriers to practical large-scale analysis. In this study, we focus on the most prevalent SARS-CoV-2 lineages associated with the spike protein region and introduce a scalable embedding method that leverages hashing to generate compact, low-dimensional representations of spike sequences. These embeddings are subsequently used to train a variety of machine learning models for supervised lineage classification. We conduct an extensive evaluation comparing our approach with multiple baseline and state-of-the-art biological sequence embedding methods across diverse metrics. Our results demonstrate that the proposed embeddings offer substantial improvements in efficiency, achieving up to 86.4\% classification accuracy while reducing embedding generation time by as much as 99.81\%. This highlights the method's potential as a fast, effective, and scalable solution for large-scale viral sequence analysis.

Fast Functionally Redundant Inverse Kinematics for Robotic Toolpath Optimisation in Manufacturing Tasks

Authors:Andrew Razjigaev, Hans Lohr, Alejandro Vargas-Uscategui, Peter King, Tirthankar Bandyopadhyay
Date:2025-12-10 22:07:07

Industrial automation with six-axis robotic arms is critical for many manufacturing tasks, including welding and additive manufacturing applications; however, many of these operations are functionally redundant due to the symmetrical tool axis, which effectively makes the operation a five-axis task. Exploiting this redundancy is crucial for achieving the desired workspace and dexterity required for the feasibility and optimisation of toolpath planning. Inverse kinematics algorithms can solve this in a fast, reactive framework, but these techniques are underutilised over the more computationally expensive offline planning methods. We propose a novel algorithm to solve functionally redundant inverse kinematics for robotic manipulation utilising a task space decomposition approach, the damped least-squares method and Halley's method to achieve fast and robust solutions with reduced joint motion. We evaluate our methodology in the case of toolpath optimisation in a cold spray coating application on a non-planar surface. The functionally redundant inverse kinematics algorithm can quickly solve motion plans that minimise joint motion, expanding the feasible operating space of the complex toolpath. We validate our approach on an industrial ABB manipulator and cold-spray gun executing the computed toolpath.

Push Smarter, Not Harder: Hierarchical RL-Diffusion Policy for Efficient Nonprehensile Manipulation

Authors:Steven Caro, Stephen L. Smith
Date:2025-12-10 21:40:22

Nonprehensile manipulation, such as pushing objects across cluttered environments, presents a challenging control problem due to complex contact dynamics and long-horizon planning requirements. In this work, we propose HeRD, a hierarchical reinforcement learning-diffusion policy that decomposes pushing tasks into two levels: high-level goal selection and low-level trajectory generation. We employ a high-level reinforcement learning (RL) agent to select intermediate spatial goals, and a low-level goal-conditioned diffusion model to generate feasible, efficient trajectories to reach them. This architecture combines the long-term reward maximizing behaviour of RL with the generative capabilities of diffusion models. We evaluate our method in a 2D simulation environment and show that it outperforms the state-of-the-art baseline in success rate, path efficiency, and generalization across multiple environment configurations. Our results suggest that hierarchical control with generative low-level planning is a promising direction for scalable, goal-directed nonprehensile manipulation. Code, documentation, and trained models are available: https://github.com/carosteven/HeRD.

SimWorld-Robotics: Synthesizing Photorealistic and Dynamic Urban Environments for Multimodal Robot Navigation and Collaboration

Authors:Yan Zhuang, Jiawei Ren, Xiaokang Ye, Jianzhi Shen, Ruixuan Zhang, Tianai Yue, Muhammad Faayez, Xuhong He, Ziqiao Ma, Lianhui Qin, Zhiting Hu, Tianmin Shu
Date:2025-12-10 20:04:08

Recent advances in foundation models have shown promising results in developing generalist robotics that can perform diverse tasks in open-ended scenarios given multimodal inputs. However, current work has been mainly focused on indoor, household scenarios. In this work, we present SimWorld-Robotics~(SWR), a simulation platform for embodied AI in large-scale, photorealistic urban environments. Built on Unreal Engine 5, SWR procedurally generates unlimited photorealistic urban scenes populated with dynamic elements such as pedestrians and traffic systems, surpassing prior urban simulations in realism, complexity, and scalability. It also supports multi-robot control and communication. With these key features, we build two challenging robot benchmarks: (1) a multimodal instruction-following task, where a robot must follow vision-language navigation instructions to reach a destination in the presence of pedestrians and traffic; and (2) a multi-agent search task, where two robots must communicate to cooperatively locate and meet each other. Unlike existing benchmarks, these two new benchmarks comprehensively evaluate a wide range of critical robot capacities in realistic scenarios, including (1) multimodal instructions grounding, (2) 3D spatial reasoning in large environments, (3) safe, long-range navigation with people and traffic, (4) multi-robot collaboration, and (5) grounded communication. Our experimental results demonstrate that state-of-the-art models, including vision-language models (VLMs), struggle with our tasks, lacking robust perception, reasoning, and planning abilities necessary for urban environments.

Closing the Train-Test Gap in World Models for Gradient-Based Planning

Authors:Arjun Parthasarathy, Nimit Kalra, Rohun Agrawal, Yann LeCun, Oumayma Bounou, Pavel Izmailov, Micah Goldblum
Date:2025-12-10 18:59:45

World models paired with model predictive control (MPC) can be trained offline on large-scale datasets of expert trajectories and enable generalization to a wide range of planning tasks at inference time. Compared to traditional MPC procedures, which rely on slow search algorithms or on iteratively solving optimization problems exactly, gradient-based planning offers a computationally efficient alternative. However, the performance of gradient-based planning has thus far lagged behind that of other approaches. In this paper, we propose improved methods for training world models that enable efficient gradient-based planning. We begin with the observation that although a world model is trained on a next-state prediction objective, it is used at test-time to instead estimate a sequence of actions. The goal of our work is to close this train-test gap. To that end, we propose train-time data synthesis techniques that enable significantly improved gradient-based planning with existing world models. At test time, our approach outperforms or matches the classical gradient-free cross-entropy method (CEM) across a variety of object manipulation and navigation tasks in 10% of the time budget.

NordFKB: a fine-grained benchmark dataset for geospatial AI in Norway

Authors:Sander Riisøen Jyhne, Aditya Gupta, Ben Worsley, Marianne Andersen, Ivar Oveland, Alexander Salveson Nossum
Date:2025-12-10 18:47:25

We present NordFKB, a fine-grained benchmark dataset for geospatial AI in Norway, derived from the authoritative, highly accurate, national Felles KartdataBase (FKB). The dataset contains high-resolution orthophotos paired with detailed annotations for 36 semantic classes, including both per-class binary segmentation masks in GeoTIFF format and COCO-style bounding box annotations. Data is collected from seven geographically diverse areas, ensuring variation in climate, topography, and urbanization. Only tiles containing at least one annotated object are included, and training/validation splits are created through random sampling across areas to ensure representative class and context distributions. Human expert review and quality control ensures high annotation accuracy. Alongside the dataset, we release a benchmarking repository with standardized evaluation protocols and tools for semantic segmentation and object detection, enabling reproducible and comparable research. NordFKB provides a robust foundation for advancing AI methods in mapping, land administration, and spatial planning, and paves the way for future expansions in coverage, temporal scope, and data modalities.

YOPO-Nav: Visual Navigation using 3DGS Graphs from One-Pass Videos

Authors:Ryan Meegan, Adam D'Souza, Bryan Bo Cao, Shubham Jain, Kristin Dana
Date:2025-12-10 18:32:38

Visual navigation has emerged as a practical alternative to traditional robotic navigation pipelines that rely on detailed mapping and path planning. However, constructing and maintaining 3D maps is often computationally expensive and memory-intensive. We address the problem of visual navigation when exploration videos of a large environment are available. The videos serve as a visual reference, allowing a robot to retrace the explored trajectories without relying on metric maps. Our proposed method, YOPO-Nav (You Only Pass Once), encodes an environment into a compact spatial representation composed of interconnected local 3D Gaussian Splatting (3DGS) models. During navigation, the framework aligns the robot's current visual observation with this representation and predicts actions that guide it back toward the demonstrated trajectory. YOPO-Nav employs a hierarchical design: a visual place recognition (VPR) module provides coarse localization, while the local 3DGS models refine the goal and intermediate poses to generate control actions. To evaluate our approach, we introduce the YOPO-Campus dataset, comprising 4 hours of egocentric video and robot controller inputs from over 6 km of human-teleoperated robot trajectories. We benchmark recent visual navigation methods on trajectories from YOPO-Campus using a Clearpath Jackal robot. Experimental results show YOPO-Nav provides excellent performance in image-goal navigation for real-world scenes on a physical robot. The dataset and code will be made publicly available for visual navigation and scene representation research.

UniUGP: Unifying Understanding, Generation, and Planing For End-to-end Autonomous Driving

Authors:Hao Lu, Ziyang Liu, Guangfeng Jiang, Yuanfei Luo, Sheng Chen, Yangang Zhang, Ying-Cong Chen
Date:2025-12-10 17:50:29

Autonomous driving (AD) systems struggle in long-tail scenarios due to limited world knowledge and weak visual dynamic modeling. Existing vision-language-action (VLA)-based methods cannot leverage unlabeled videos for visual causal learning, while world model-based methods lack reasoning capabilities from large language models. In this paper, we construct multiple specialized datasets providing reasoning and planning annotations for complex scenarios. Then, a unified Understanding-Generation-Planning framework, named UniUGP, is proposed to synergize scene reasoning, future video generation, and trajectory planning through a hybrid expert architecture. By integrating pre-trained VLMs and video generation models, UniUGP leverages visual dynamics and semantic reasoning to enhance planning performance. Taking multi-frame observations and language instructions as input, it produces interpretable chain-of-thought reasoning, physically consistent trajectories, and coherent future videos. We introduce a four-stage training strategy that progressively builds these capabilities across multiple existing AD datasets, along with the proposed specialized datasets. Experiments demonstrate state-of-the-art performance in perception, reasoning, and decision-making, with superior generalization to challenging long-tail situations.

Scintillation response of cryogenic CsI to few-keV and sub-keV nuclear recoils

Authors:J. I. Collar, C. M. Lewis, A. Simón, S. G. Yoon
Date:2025-12-10 16:49:01

Monochromatic neutron emissions from photonuclear sources $^{88}$Y/Be and $^{124}$Sb/Be are employed to obtain the response of pure (undoped) cesium iodide at 80 K. The use of a low-noise, high-quantum-efficiency avalanche photodiode in combination with a novel waveshifter results in a 70 eV analysis threshold. This reach allows to observe signals from sub-keV nuclear recoils originating in neutron scattering. The extracted quenching factor drops much faster towards low energy than the extrapolation of a model developed for room-temperature CsI[Na]. We comment on the impact of our measurement on planned use of cryogenic CsI in neutrino physics and dark matter experiments.

A roadmap of geospatial soil quality analysis systems

Authors:Habiba BEN ABDERRAHMANE, Slimane Oulad-Naoui, Benameur ZIANI
Date:2025-12-10 16:40:12

Soil quality (SQ) plays a crucial role in sustainable agriculture, environmental conservation, and land-use planning. Traditional SQ assessment techniques rely on costly, labor-intensive sampling and laboratory analysis, limiting their spatial and temporal coverage. Advances in Geographic Information Systems (GIS), remote sensing, and machine learning (ML) enabled efficient SQ evaluation. This paper presents a comprehensive roadmap distinguishing it from previous reviews by proposing a unified and modular pipeline that integrates multi-source soil data, GIS and remote sensing tools, and machine learning techniques to support transparent and scalable soil quality assessment. It also includes practical applications. Contrary to existing studies that predominantly target isolated soil parameters or specific modeling methodologies, this approach consolidates recent advancements in Geographic Information Systems (GIS), remote sensing technologies, and machine learning algorithms within the entire soil quality assessment pipeline. It also addresses existing challenges and limitations while exploring future developments and emerging trends in the field that can deliver the next generation of soil quality systems making them more transparent, adaptive, and aligned with sustainable land management.

Towards Practical and Usable In-network Classification

Authors:Di Zhu, Jianxi Chen, Hyojoon Kim
Date:2025-12-10 16:24:59

In-network machine learning enables real-time classification directly on network hardware, offering consistently low inference latency. However, current solutions are limited by strict hardware constraints, scarce on-device resources, and poor usability, making them impractical for ML developers and cloud operators. To this end, we propose ACORN, an end-to-end system that automates the distributed deployment of practical machine learning models across the network. ACORN provides a fully automated pipeline that loads and deploys Python ML models on network devices using an optimized deployment plan from an ILP planner. To support larger models under hardware constraints and allow runtime programmability, ACORN adopts a novel data plane representation for Decision Tree, Random Forest, and Support Vector Machine models. We implement ACORN prototype in P4 and run it on real programmable hardware. Our evaluation shows ACORN can deploy classification ML models with 2-4x more features than state-of-the-art solutions, while imposing negligible overhead on network performance and traffic. We will make our data plane program, model translator, optimizer, and all related scripts publicly available.

High-Resolution Water Sampling via a Solar-Powered Autonomous Surface Vehicle

Authors:Misael Mamani, Mariel Fernandez, Grace Luna, Steffani Limachi, Leonel Apaza, Carolina Montes-Dávalos, Marcelo Herrera, Edwin Salcedo
Date:2025-12-10 16:12:59

Accurate water quality assessment requires spatially resolved sampling, yet most unmanned surface vehicles (USVs) can collect only a limited number of samples or rely on single-point sensors with poor representativeness. This work presents a solar-powered, fully autonomous USV featuring a novel syringe-based sampling architecture capable of acquiring 72 discrete, contamination-minimized water samples per mission. The vehicle incorporates a ROS 2 autonomy stack with GPS-RTK navigation, LiDAR and stereo-vision obstacle detection, Nav2-based mission planning, and long-range LoRa supervision, enabling dependable execution of sampling routes in unstructured environments. The platform integrates a behavior-tree autonomy architecture adapted from Nav2, enabling mission-level reasoning and perception-aware navigation. A modular 6x12 sampling system, controlled by distributed micro-ROS nodes, provides deterministic actuation, fault isolation, and rapid module replacement, achieving spatial coverage beyond previously reported USV-based samplers. Field trials in Achocalla Lagoon (La Paz, Bolivia) demonstrated 87% waypoint accuracy, stable autonomous navigation, and accurate physicochemical measurements (temperature, pH, conductivity, total dissolved solids) comparable to manually collected references. These results demonstrate that the platform enables reliable high-resolution sampling and autonomous mission execution, providing a scalable solution for aquatic monitoring in remote environments.

Analyzing Planner Design Trade-offs for MAPF under Realistic Simulation

Authors:Jingtian Yan, Zhifei Li, William Kang, Stephen F. Smith, Jiaoyang Li
Date:2025-12-10 15:15:26

Multi-Agent Path Finding (MAPF) algorithms are increasingly deployed in industrial warehouses and automated manufacturing facilities, where robots must operate reliably under real-world physical constraints. However, existing MAPF evaluation frameworks typically rely on simplified robot models, leaving a substantial gap between algorithmic benchmarks and practical performance. Recent frameworks such as SMART, incorporate kinodynamic modeling and offer the MAPF community a platform for large-scale, realistic evaluation. Building on this capability, this work investigates how key planner design choices influence performance under realistic execution settings. We systematically study three fundamental factors: (1) the relationship between solution optimality and execution performance, (2) the sensitivity of system performance to inaccuracies in kinodynamic modeling, and (3) the interaction between model accuracy and plan optimality. Empirically, we examine these factors to understand how these design choices affect performance in realistic scenarios. We highlight open challenges and research directions to steer the community toward practical, real-world deployment.

Gaussian Process Aggregation for Root-Parallel Monte Carlo Tree Search with Continuous Actions

Authors:Junlin Xiao, Victor-Alexandru Darvariu, Bruno Lacerda, Nick Hawes
Date:2025-12-10 15:09:06

Monte Carlo Tree Search is a cornerstone algorithm for online planning, and its root-parallel variant is widely used when wall clock time is limited but best performance is desired. In environments with continuous action spaces, how to best aggregate statistics from different threads is an important yet underexplored question. In this work, we introduce a method that uses Gaussian Process Regression to obtain value estimates for promising actions that were not trialed in the environment. We perform a systematic evaluation across 6 different domains, demonstrating that our approach outperforms existing aggregation strategies while requiring a modest increase in inference time.

Exqutor: Extended Query Optimizer for Vector-augmented Analytical Queries

Authors:Hyunjoon Kim, Chaerim Lim, Hyeonjun An, Rathijit Sen, Kwanghyun Park
Date:2025-12-10 14:42:52

Vector similarity search is becoming increasingly important for data science pipelines, particularly in Retrieval-Augmented Generation (RAG), where it enhances large language model inference by enabling efficient retrieval of relevant external knowledge. As RAG expands with table-augmented generation to incorporate structured data, workloads integrating table and vector search are becoming more prevalent. However, efficiently executing such queries remains challenging due to inaccurate cardinality estimation for vector search components, leading to suboptimal query plans. In this paper, we propose Exqutor, an extended query optimizer for vector-augmented analytical queries. Exqutor is a pluggable cardinality estimation framework designed to address this issue, leveraging exact cardinality query optimization techniques to enhance estimation accuracy when vector indexes (e.g., HNSW, IVF) are available. In scenarios lacking these indexes, we employ a sampling-based approach with adaptive sampling size adjustment, dynamically tuning the sample size to balance estimation accuracy and sampling overhead. This allows Exqutor to efficiently approximate vector search cardinalities while minimizing computational costs. We integrate our framework into pgvector, VBASE, and DuckDB, demonstrating performance improvements of up to four orders of magnitude on vector-augmented analytical queries.

An Automated Tip-and-Cue Framework for Optimized Satellite Tasking and Visual Intelligence

Authors:Gil Weissman, Amir Ivry, Israel Cohen
Date:2025-12-10 14:14:08

The proliferation of satellite constellations, coupled with reduced tasking latency and diverse sensor capabilities, has expanded the opportunities for automated Earth observation. This paper introduces a fully automated Tip-and-Cue framework designed for satellite imaging tasking and scheduling. In this context, tips are generated from external data sources or analyses of prior satellite imagery, identifying spatiotemporal targets and prioritizing them for downstream planning. Corresponding cues are the imaging tasks formulated in response, which incorporate sensor constraints, timing requirements, and utility functions. The system autonomously generates candidate tasks, optimizes their scheduling across multiple satellites using continuous utility functions that reflect the expected value of each observation, and processes the resulting imagery using artificial-intelligence-based models, including object detectors and vision-language models. Structured visual reports are generated to support both interpretability and the identification of new insights for downstream tasking. The efficacy of the framework is demonstrated through a maritime vessel tracking scenario, utilizing Automatic Identification System (AIS) data for trajectory prediction, targeted observations, and the generation of actionable outputs. Maritime vessel tracking is a widely researched application, often used to benchmark novel approaches to satellite tasking, forecasting, and analysis. The system is extensible to broader applications such as smart-city monitoring and disaster response, where timely tasking and automated analysis are critical.

ReMoSPLAT: Reactive Mobile Manipulation Control on a Gaussian Splat

Authors:Nicolas Marticorena, Tobias Fischer, Niko Suenderhauf
Date:2025-12-10 13:52:08

Reactive control can gracefully coordinate the motion of the base and the arm of a mobile manipulator. However, incorporating an accurate representation of the environment to avoid obstacles without involving costly planning remains a challenge. In this work, we present ReMoSPLAT, a reactive controller based on a quadratic program formulation for mobile manipulation that leverages a Gaussian Splat representation for collision avoidance. By integrating additional constraints and costs into the optimisation formulation, a mobile manipulator platform can reach its intended end effector pose while avoiding obstacles, even in cluttered scenes. We investigate the trade-offs of two methods for efficiently calculating robot-obstacle distances, comparing a purely geometric approach with a rasterisation-based approach. Our experiments in simulation on both synthetic and real-world scans demonstrate the feasibility of our method, showing that the proposed approach achieves performance comparable to controllers that rely on perfect ground-truth information.

An Orbital House of Cards: Frequent Megaconstellation Close Conjunctions

Authors:Sarah Thiele, Skye R. Heiland, Aaron C. Boley, Samantha M. Lawler
Date:2025-12-10 13:37:34

The number of objects in orbit is rapidly increasing, primarily driven by the launch of megaconstellations, an approach to satellite constellation design that involves large numbers of satellites paired with their rapid launch and disposal. While satellites provide many benefits to society, their use comes with challenges, including the growth of space debris, collisions, ground casualty risks, optical and radio-spectrum pollution, and the alteration of Earth's upper atmosphere through rocket emissions and reentry ablation. There is substantial potential for current or planned actions in orbit to cause serious degradation of the orbital environment or lead to catastrophic outcomes, highlighting the urgent need to find better ways to quantify stress on the orbital environment. Here we propose a new metric, the CRASH Clock, that measures such stress in terms of the time it takes for a catastrophic collision to occur if there are no collision avoidance manoeuvres or there is a severe loss in situational awareness. Our calculations show the CRASH Clock is currently 2.8 days, which suggests there is now little time to recover from a wide-spread disruptive event, such as a solar storm. This is in stark contrast to the pre-megaconstellation era: in 2018, the CRASH Clock was 121 days.