planning - 2025-10-05

Hybrid Physics-ML Framework for Pan-Arctic Permafrost Infrastructure Risk at Record 2.9-Million Observation Scale

Authors:Boris Kriuk

Date:2025-10-02 16:38:36

Arctic warming threatens over 100 billion in permafrost-dependent infrastructure across Northern territories, yet existing risk assessment frameworks lack spatiotemporal validation, uncertainty quantification, and operational decision-support capabilities. We present a hybrid physics-machine learning framework integrating 2.9 million observations from 171,605 locations (2005-2021) combining permafrost fraction data with climate reanalysis. Our stacked ensemble model (Random Forest + Histogram Gradient Boosting + Elastic Net) achieves R²=0.980 (RMSE=5.01 pp) with rigorous spatiotemporal cross-validation preventing data leakage. To address machine learning limitations in extrapolative climate scenarios, we develop a hybrid approach combining learned climate-permafrost relationships (60%) with physical permafrost sensitivity models (40%, -10 pp/°C). Under RCP8.5 forcing (+5°C over 10 years), we project mean permafrost fraction decline of -20.3 pp (median: -20.0 pp), with 51.5% of Arctic Russia experiencing over 20 percentage point loss. Infrastructure risk classification identifies 15% high-risk zones (25% medium-risk) with spatially explicit uncertainty maps. Our framework represents the largest validated permafrost ML dataset globally, provides the first operational hybrid physics-ML forecasting system for Arctic infrastructure, and delivers open-source tools enabling probabilistic permafrost projections for engineering design codes and climate adaptation planning. The methodology is generalizable to other permafrost regions and demonstrates how hybrid approaches can overcome pure data-driven limitations in climate change applications.
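
The 60/40 split described in this abstract can be illustrated with a small sketch. The abstract states only the weights and the physical sensitivity (-10 pp per degree C); the linear weighted sum below, and the names `hybrid_delta` and `physics_delta`, are assumptions made for illustration and not the authors' published implementation.

```python
# Figures taken from the abstract; the linear blending form is an assumption.
ML_WEIGHT = 0.6                # share given to the learned climate-permafrost relationship
PHYSICS_WEIGHT = 0.4           # share given to the physical sensitivity model
SENSITIVITY_PP_PER_C = -10.0   # physical sensitivity, percentage points per degree C


def physics_delta(warming_c: float) -> float:
    """Linear physical response: change in permafrost fraction (pp) for a given warming."""
    return SENSITIVITY_PP_PER_C * warming_c


def hybrid_delta(ml_delta_pp: float, warming_c: float) -> float:
    """Weighted blend of an ML-predicted change (pp) and the physical response (pp)."""
    return ML_WEIGHT * ml_delta_pp + PHYSICS_WEIGHT * physics_delta(warming_c)


# Physics term alone under +5 C of warming: 0.4 * (-10 * 5) = -20 pp.
physics_only = hybrid_delta(ml_delta_pp=0.0, warming_c=5.0)
```

Under this assumed form, the physical term by itself contributes -20 pp for +5 degrees C of warming, which is the same order as the reported mean decline of -20.3 pp; how the authors actually combine the two components may differ.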

SIEVE: Towards Verifiable Certification for Code-datasets

Authors:Fatou Ndiaye Mbodji, El-hacen Diallo, Jordan Samhi, Kui Liu, Jacques Klein, Tegawendé F. Bissyande

Date:2025-10-02 16:14:23

Code agents and empirical software engineering rely on public code datasets, yet these datasets lack verifiable quality guarantees. Static 'dataset cards' inform, but they are neither auditable nor do they offer statistical guarantees, making it difficult to attest to dataset quality. Teams build isolated, ad-hoc cleaning pipelines. This fragments effort and raises cost. We present SIEVE, a community-driven framework. It turns per-property checks into Confidence Cards: machine-readable, verifiable certificates with anytime-valid statistical bounds. We outline a research plan to bring SIEVE to maturity, replacing narrative cards with anytime-verifiable certification. This shift is expected to lower quality-assurance costs and increase trust in code-datasets.
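
To make "anytime-valid statistical bounds" concrete, here is a minimal sketch of one textbook construction: a Hoeffding interval with a union bound over sample sizes. The abstract does not say which construction SIEVE uses, so this is an assumption, as are the function name and the premise that a per-property check yields independent pass/fail outcomes on sampled items.

```python
import math


def anytime_valid_interval(passes: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Interval for a pass rate that holds simultaneously for every n >= 1.

    Hoeffding's inequality is applied at sample size n with error budget
    alpha_n = alpha / (n * (n + 1)). These budgets sum to alpha over all n,
    so by a union bound the interval stays valid however long sampling runs.
    Assumes independent pass/fail outcomes with a fixed pass probability.
    """
    if n < 1 or not 0 <= passes <= n:
        raise ValueError("need n >= 1 and 0 <= passes <= n")
    alpha_n = alpha / (n * (n + 1))
    radius = math.sqrt(math.log(2.0 / alpha_n) / (2.0 * n))
    rate = passes / n
    return max(0.0, rate - radius), min(1.0, rate + radius)


# Hypothetical check: 950 of 1000 sampled files pass some property.
low, high = anytime_valid_interval(950, 1000)
```

The price of validity at every sample size is a wider interval than a fixed-sample bound; tighter confidence sequences exist, and this sketch trades tightness for simplicity.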

VGDM: Vision-Guided Diffusion Model for Brain Tumor Detection and Segmentation

Authors:Arman Behnam

Date:2025-10-02 14:52:08

Accurate detection and segmentation of brain tumors from magnetic resonance imaging (MRI) are essential for diagnosis, treatment planning, and clinical monitoring. While convolutional architectures such as U-Net have long been the backbone of medical image segmentation, their limited capacity to capture long-range dependencies constrains performance on complex tumor structures. Recent advances in diffusion models have demonstrated strong potential for generating high-fidelity medical images and refining segmentation boundaries. In this work, we propose VGDM (Vision-Guided Diffusion Model for Brain Tumor Detection and Segmentation), a transformer-driven diffusion framework. By embedding a vision transformer at the core of the diffusion process, the model leverages global contextual reasoning together with iterative denoising to enhance both volumetric accuracy and boundary precision. The transformer backbone enables more effective modeling of spatial relationships across entire MRI volumes, while diffusion refinement mitigates voxel-level errors and recovers fine-grained tumor details. This hybrid design provides a pathway toward improved robustness and scalability in neuro-oncology, moving beyond conventional U-Net baselines. Experimental validation on MRI brain tumor datasets demonstrates consistent gains in Dice similarity and Hausdorff distance, underscoring the potential of transformer-guided diffusion models to advance the state of the art in tumor segmentation.
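
For readers unfamiliar with the Dice similarity reported in this abstract, the following sketch computes it for two binary masks given as flattened lists of 0/1 voxels. It shows the standard definition only; the function name and the empty-mask convention are assumptions, and the abstract does not describe the authors' evaluation code.

```python
def dice_coefficient(pred: list[int], target: list[int]) -> float:
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for flattened binary masks of 0s and 1s."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention (an assumption): two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Predicted mask marks two voxels, ground truth marks one of them: 2*1 / (2+1).
score = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
```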

KAIROS: Unified Training for Universal Non-Autoregressive Time Series Forecasting

Authors:Kuiye Ding, Fanda Fan, Zheya Wang, Hongxiao Li, Yifan Wang, Lei Wang, Chunjie Luo, Jianfeng Zhan

Date:2025-10-02 14:50:50

On the World Wide Web, reliable time series forecasts provide the forward-looking signals that drive resource planning, cache placement, and anomaly response, enabling platforms to operate efficiently as user behavior and content distributions evolve. Compared with other domains, time series forecasting for Web applications requires much faster responsiveness to support real-time decision making. We present KAIROS, a non-autoregressive time series forecasting framework that directly models segment-level multi-peak distributions. Unlike autoregressive approaches, KAIROS avoids error accumulation and achieves just-in-time inference, while improving over existing non-autoregressive models that collapse to over-smoothed predictions. Trained on a large-scale corpus, KAIROS demonstrates strong zero-shot generalization on six widely used benchmarks, delivering forecasting performance comparable to state-of-the-art foundation models of similar scale, at a fraction of their inference cost. Beyond empirical results, KAIROS highlights the importance of non-autoregressive design as a scalable paradigm for foundation models in time series.

Coordinated Car-following Using Distributed MPC

Authors:Di Shen, Qi Dai, Suzhou Huang

Date:2025-10-02 13:30:44

Within the modeling framework of Markov games, we propose a series of algorithms for coordinated car-following using distributed model predictive control (DMPC). Instead of tracking prescribed feasible trajectories, driving policies are solved directly as outcomes of the DMPC optimization given the driver's perceivable states. The coordinated solutions are derived using the best response dynamics via iterated self-play, and are facilitated by direct negotiation using inter-agent or agent-infrastructure communication. These solutions closely approximate either Nash equilibrium or centralized optimization. By re-parameterizing the action sequence in DMPC as a curve along the planning horizon, we are able to systematically reduce the original DMPC to very efficient grid searches such that the optimal solution to the original DMPC can be executed in real time. Within our modeling framework, it is natural to cast traffic control problems as mechanism design problems, in which all agents are endogenized on an equal footing with full incentive compatibility. We show how traffic efficiency can be dramatically improved while keeping stop-and-go phantom waves tamed at high vehicle densities. Our approach can be viewed as an alternative way to formulate coordinated adaptive cruise control (CACC) without explicit platooning (or with all vehicles in the traffic system treated as a single extended platoon). We also address the issue of linear stability of the associated discrete-time traffic dynamics and demonstrate why it does not always tell the full story about traffic stability.

KTBox: A Modular LaTeX Framework for Semantic Color, Structured Highlighting, and Scholarly Communication

Authors:Bhaskar Mangal, Ashutosh Bhatia, Yashvardhan Sharma, Kamlesh Tiwari, Rashmi Verma

Date:2025-10-02 12:32:01

The communication of technical insight in scientific manuscripts often relies on ad-hoc formatting choices, resulting in inconsistent visual emphasis and limited portability across document classes. This paper introduces ktbox, a modular LaTeX framework that unifies semantic color palettes, structured highlight boxes, taxonomy trees, and author metadata utilities into a coherent system for scholarly writing. The framework is distributed as a set of lightweight, namespaced components: ktcolor.sty for semantic palettes, ktbox.sty for structured highlight and takeaway environments, ktlrtree.sty for taxonomy trees with fusion and auxiliary annotations, and ktorcid.sty for ORCID-linked author metadata. Each component is independently usable yet interoperable, ensuring compatibility with major templates such as IEEEtran, acmart, iclr conference, and beamer. Key features include auto-numbered takeaway boxes, wide-format highlights, flexible taxonomy tree visualizations, and multi-column layouts supporting embedded tables, enumerations, and code blocks. By adopting a clear separation of concerns and enforcing a consistent naming convention under the kt namespace, the framework transforms visual styling from cosmetic add-ons into reproducible, extensible building blocks of scientific communication, improving clarity, portability, and authoring efficiency across articles, posters, and presentations.

Constraints on WIMP-like dark matter scattering on electrons with COSINE-100

Authors:N. Carlin, J. Y. Cho, S. J. Cho, S. Choi, A. C. Ezeribe, L. E. Franca, O. Gileva, C. Ha, I. S. Hahn, S. J. Hollick, E. J. Jeon, H. W. Joo, W. G. Kang, M. Kauer, B. H. Kim, D. Y. Kim, H. J. Kim, J. Kim, K. W. Kim, S. H. Kim, S. K. Kim, W. K. Kim, Y. D. Kim, Y. H. Kim, B. R. Ko, Y. J. Ko, D. H. Lee, E. K. Lee, H. Lee, H. S. Lee, H. Y. Lee, I. S. Lee, J. Lee, J. Y. Lee, M. H. Lee, S. H. Lee, S. M. Lee, Y. J. Lee, D. S. Leonard, N. T. Luan, V. H. A. Machado, B. B. Manzato, R. H. Maruyama, S. L. Olsen, H. K. Park, H. S. Park, J. C. Park, J. S. Park, K. S. Park, K. Park, S. D. Park, R. L. C. Pitta, H. Prihtiadi, S. J. Ra, C. Rott, K. A. Shin, D. F. F. S. Cavalcante, M. K. Son, N. J. C. Spooner, L. T. Truc, L. Yang, G. H. Yu

Date:2025-10-02 11:46:32

We present results of the search for WIMP-like dark matter interaction with electrons in the NaI(Tl) crystals of the COSINE-100 experiment. The two benchmark scenarios of a heavy and a light vector boson as mediator of the interaction were studied. We found no excess events over the expected background in a data-set of 2.82 yr, with a total exposure of 172.9 kg·yr. The derived 90% confidence level upper limits exclude a WIMP-electron scattering cross section above 6.4 $\times$ 10$^{-33}$ cm$^2$ for a WIMP mass of 0.25 GeV, assuming a light mediator; and above 3.4 $\times$ 10$^{-37}$ cm$^2$ for a 0.4 GeV WIMP, assuming a heavy mediator, and represent the most stringent constraints for a NaI(Tl) target to date. We also briefly discuss a planned analysis using an annual modulation method below the current 0.7 keV threshold of COSINE-100, down to a yield of a few photoelectrons.

Deep Hedging Under Non-Convexity: Limitations and a Case for AlphaZero

Authors:Matteo Maggiolo, Giuseppe Nuti, Miroslav Štrupl, Oleg Szehr

Date:2025-10-02 10:28:59

This paper examines replication portfolio construction in incomplete markets - a key problem in financial engineering with applications in pricing, hedging, balance sheet management, and energy storage planning. We model this as a two-player game between an investor and the market, where the investor makes strategic bets on future states while the market reveals outcomes. Inspired by the success of Monte Carlo Tree Search in stochastic games, we introduce an AlphaZero-based system and compare its performance to deep hedging - a widely used industry method based on gradient descent. Through theoretical analysis and experiments, we show that deep hedging struggles in environments where the $Q$-function is not subject to convexity constraints - such as those involving non-convex transaction costs, capital constraints, or regulatory limitations - converging to local optima. We construct specific market environments to highlight these limitations and demonstrate that AlphaZero consistently finds near-optimal replication strategies. On the theoretical side, we establish a connection between deep hedging and convex optimization, suggesting that its effectiveness is contingent on convexity assumptions. Our experiments further suggest that AlphaZero is more sample-efficient - an important advantage in data-scarce, overfitting-prone derivative markets.

Computing Phylogenetic Diversity

Authors:Jannik Schestag

Date:2025-10-02 09:46:32

Phylogenetic Diversity (PD) is a well-regarded measure of the overall biodiversity of a set of present-day species (taxa) that indicates its ecological significance. In the Maximize Phylogenetic Diversity (Max-PD) problem one is asked to find a small set of taxa in a phylogenetic tree for which this measure is maximized. Max-PD is particularly relevant in conservation planning, where limited resources necessitate prioritizing certain taxa to minimize biodiversity loss. Although Max-PD can be solved in polynomial time [Steel, SB, 2005; Pardi & Goldman, PLoS, 2005], its generalizations, which aim to model biological processes and other aspects in conservation planning with greater accuracy, often exhibit NP-hardness, making them computationally challenging. This thesis explores a selection of these generalized problems within the framework of parameterized complexity. In Generalized Noah's Ark Problem (GNAP), each taxon only survives at a certain survival probability, which can be increased by investing more money in the taxon. We show that GNAP is W[1]-hard with respect to the number of taxa but is XP with respect to the number of different costs and different survival probabilities. Additionally, we show that unit-cost-NAP, a special case of GNAP, is NP-hard. In Time Sensitive Maximization of Phylogenetic Diversity (Time-PD), different extinction times of taxa are considered after which they can no longer be saved. For Time-PD, we present color-coding algorithms that prove that Time-PD is fixed-parameter tractable (FPT) with respect to the threshold of diversity and the acceptable loss of diversity. In Optimizing PD with Dependencies (PDD), each saved taxon must be a source in the ecological system or a predator of another saved species. These dependencies are given in a food-web. We show that PDD is FPT when parameterized with the size of the solution plus the height of the phylogenetic tree. Further, we consider pa...
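
The polynomial-time solvability of plain Max-PD mentioned in this abstract is commonly realized by a greedy strategy: repeatedly add the taxon that contributes the most new branch length. The sketch below assumes a rooted tree in which the PD of a set is the total length of all edges on root-to-taxon paths. The data layout (`parent` and `length` dictionaries keyed by child node) and the function name are illustrative assumptions, not taken from the thesis.

```python
def greedy_max_pd(
    parent: dict[str, str],
    length: dict[str, float],
    taxa: list[str],
    k: int,
) -> tuple[list[str], float]:
    """Pick up to k taxa greedily, each time taking the largest PD gain.

    parent[v] is the parent of node v (the root has no entry);
    length[v] is the length of the edge from v up to parent[v].
    """
    covered: set[str] = set()   # edges already counted, identified by child node
    chosen: list[str] = []
    total = 0.0

    def gain(taxon: str) -> float:
        # Walk upward until the root or an already-covered edge is reached.
        g, node = 0.0, taxon
        while node in parent and node not in covered:
            g += length[node]
            node = parent[node]
        return g

    for _ in range(min(k, len(taxa))):
        best = max((t for t in taxa if t not in chosen), key=gain)
        total += gain(best)
        node = best
        while node in parent and node not in covered:
            covered.add(node)
            node = parent[node]
        chosen.append(best)
    return chosen, total


# Toy tree: root r, internal node x (edge 1), leaves a (2) and b (3) under x, leaf c (4) under r.
toy_parent = {"x": "r", "a": "x", "b": "x", "c": "r"}
toy_length = {"x": 1.0, "a": 2.0, "b": 3.0, "c": 4.0}
selected, pd_value = greedy_max_pd(toy_parent, toy_length, ["a", "b", "c"], 2)
```

On the toy tree the two selected taxa are b and c with PD 8, which matches the best pair found by checking all three pairs by hand. This naive version recomputes gains on every round; it is meant to show the idea, not an efficient implementation.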

Like Playing a Video Game: Spatial-Temporal Optimization of Foot Trajectories for Controlled Football Kicking in Bipedal Robots

Authors:Wanyue Li, Ji Ma, Minghao Lu, Peng Lu

Date:2025-10-02 09:37:30

Humanoid robot soccer presents several challenges, particularly in maintaining system stability during aggressive kicking motions while achieving precise ball trajectory control. Current solutions, whether traditional position-based control methods or reinforcement learning (RL) approaches, exhibit significant limitations. Model predictive control (MPC) is a prevalent approach for ordinary quadruped and biped robots. While MPC has demonstrated advantages in legged robots, existing studies often oversimplify the leg swing process, relying merely on simple trajectory interpolation methods. This severely constrains the foot's environmental interaction capability, hindering tasks such as ball kicking. This study adapts the spatial-temporal trajectory planning method, which has been successful in drone applications, to bipedal robotic systems. The proposed approach autonomously generates foot trajectories that satisfy constraints on target kicking position, velocity, and acceleration while simultaneously optimizing swing phase duration. Experimental results demonstrate that the optimized trajectories closely mimic human kicking behavior, featuring a backswing motion. Simulation and hardware experiments confirm the algorithm's efficiency, with trajectory planning times under 1 ms, and its reliability, achieving nearly 100% task completion accuracy when the soccer goal is within the range of -90° to 90°.

What Matters in RL-Based Methods for Object-Goal Navigation? An Empirical Study and A Unified Framework

Authors:Hongze Wang, Boyang Sun, Jiaxu Xing, Fan Yang, Marco Hutter, Dhruv Shah, Davide Scaramuzza, Marc Pollefeys

Date:2025-10-02 09:24:32

Object-Goal Navigation (ObjectNav) is a critical component toward deploying mobile robots in everyday, uncontrolled environments such as homes, schools, and workplaces. In this context, a robot must locate target objects in previously unseen environments using only its onboard perception. Success requires the integration of semantic understanding, spatial reasoning, and long-horizon planning, a combination that remains extremely challenging. While reinforcement learning (RL) has become the dominant paradigm, progress has spanned a wide range of design choices, yet the field still lacks a unifying analysis to determine which components truly drive performance. In this work, we conduct a large-scale empirical study of modular RL-based ObjectNav systems, decomposing them into three key components: perception, policy, and test-time enhancement. Through extensive controlled experiments, we isolate the contribution of each and uncover clear trends: perception quality and test-time strategies are decisive drivers of performance, whereas policy improvements with current methods yield only marginal gains. Building on these insights, we propose practical design guidelines and demonstrate an enhanced modular system that surpasses State-of-the-Art (SotA) methods by 6.6% on SPL and by 2.7% on success rate. We also introduce a human baseline under identical conditions, where experts achieve an average 98% success, underscoring the gap between RL agents and human-level navigation. Our study not only sets the SotA performance but also provides principled guidance for future ObjectNav development and evaluation.
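
The two metrics quoted in this abstract, success rate and SPL (Success weighted by Path Length), can be computed as in the sketch below. It follows the commonly used definition of SPL, in which each successful episode is scored by the ratio of the shortest path length to the longer of the shortest and the actual path. The episode tuple layout and function names are assumptions for illustration.

```python
# Each episode: (success flag, shortest path length, length of the path the agent took).
Episode = tuple[bool, float, float]


def success_rate(episodes: list[Episode]) -> float:
    """Fraction of episodes in which the target object was reached."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(1 for ok, _, _ in episodes if ok) / len(episodes)


def spl(episodes: list[Episode]) -> float:
    """Mean over episodes of success * shortest / max(taken, shortest)."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for ok, shortest, taken in episodes:
        if shortest <= 0:
            raise ValueError("shortest path length must be positive")
        if ok:
            total += shortest / max(taken, shortest)
    return total / len(episodes)


# Hypothetical episodes: one optimal success, one success with a 2x detour, one failure.
runs = [(True, 10.0, 10.0), (True, 10.0, 20.0), (False, 10.0, 5.0)]
spl_value = spl(runs)          # (1.0 + 0.5 + 0.0) / 3
sr_value = success_rate(runs)  # 2 / 3
```

Because SPL penalizes detours, a system can improve its success rate while leaving SPL nearly unchanged, which is why the two figures are reported separately.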

The Perceived Influences of Environment on Health in Italy: a Penalized Ordinal Regression Approach

Authors:Mattia Stival, Angela Andreella, Gaia Bertarelli, Catarina Midões, Stefano Federico Tonellato, Enrica De Cian, Stefano Campostrini

Date:2025-10-02 08:44:39

Understanding how individuals perceive their living environment is a complex task, as it reflects both personal and contextual determinants. In this paper, we address this task by analyzing the environmental module of the Italian nationwide health surveillance system PASSI (Progressi delle Aziende Sanitarie per la Salute in Italia), integrating it with contextual information at the municipal level, including socio-economic indicators, pollution exposure, and other geographical characteristics. Methodologically, we adopt a penalized semi-parallel cumulative ordinal regression model to analyze how subjective perceptions are shaped by both personal and territorial determinants. The approach balances flexibility and interpretability by allowing both parallel and non-parallel effects while regularizing estimates to address multicollinearity and separation issues. We use the model as an analytical tool to uncover the determinants of positivity and neutrality in environmental perceptions, defined as factors that contribute the most to improving perception or increasing the sense of neutrality. The results are diverse. First, results reveal significant heterogeneity across Italian territories, indicating that local characteristics strongly shape environmental perception. Second, various individual factors interact with contextual influences to shape perceptions. Third, hazardous environmental factors, such as higher PM2.5 levels, appear to be associated with poorer environmental perception, suggesting a tendency among respondents to recognize specific environmental issues. Overall, the approach demonstrates strong potential for application and provides useful insights for environmental policy planning.

Scalable Asynchronous Federated Modeling for Spatial Data

Authors:Jianwei Shi, Sameh Abdulah, Ying Sun, Marc G. Genton

Date:2025-10-02 08:04:46

Spatial data are central to applications such as environmental monitoring and urban planning, but are often distributed across devices where privacy and communication constraints limit direct sharing. Federated modeling offers a practical solution that preserves data privacy while enabling global modeling across distributed data sources. For instance, environmental sensor networks are privacy- and bandwidth-constrained, motivating federated spatial modeling that shares only privacy-preserving summaries to produce timely, high-resolution pollution maps without centralizing raw data. However, existing federated modeling approaches either ignore spatial dependence or rely on synchronous updates that suffer from stragglers in heterogeneous environments. This work proposes an asynchronous federated modeling framework for spatial data based on low-rank Gaussian process approximations. The method employs block-wise optimization and introduces strategies for gradient correction, adaptive aggregation, and stabilized updates. We establish linear convergence with explicit dependence on staleness, a result of standalone theoretical significance. Moreover, numerical experiments demonstrate that the asynchronous algorithm achieves synchronous performance under balanced resource allocation and significantly outperforms it in heterogeneous settings, showcasing superior robustness and scalability.

Latency-aware Multimodal Federated Learning over UAV Networks

Authors:Shaba Shaon, Dinh C. Nguyen

Date:2025-10-02 06:57:44

This paper investigates federated multimodal learning (FML) assisted by unmanned aerial vehicles (UAVs) with a focus on minimizing system latency and providing convergence analysis. In this framework, UAVs are distributed throughout the network to collect data, participate in model training, and collaborate with a base station (BS) to build a global model. By utilizing multimodal sensing, the UAVs overcome the limitations of unimodal systems, enhancing model accuracy and generalization and offering a more comprehensive understanding of the environment. The primary objective is to optimize FML system latency in UAV networks by jointly addressing UAV sensing scheduling, power control, trajectory planning, resource allocation, and BS resource management. To address the computational complexity of our latency minimization problem, we propose an efficient iterative optimization algorithm combining block coordinate descent and successive convex approximation techniques, which provides high-quality approximate solutions. We also present a theoretical convergence analysis for the UAV-assisted FML framework under a non-convex loss function. Numerical experiments demonstrate that our FML framework outperforms existing approaches in terms of system latency and model training performance under different data settings.

Representational Alignment Across Model Layers and Brain Regions with Hierarchical Optimal Transport

Authors:Shaan Shah, Meenakshi Khosla

Date:2025-10-02 06:25:06

Standard representational similarity methods align each layer of a network to its best match in another independently, producing asymmetric results, lacking a global alignment score, and struggling with networks of different depths. These limitations arise from ignoring global activation structure and restricting mappings to rigid one-to-one layer correspondences. We propose Hierarchical Optimal Transport (HOT), a unified framework that jointly infers soft, globally consistent layer-to-layer couplings and neuron-level transport plans. HOT allows source neurons to distribute mass across multiple target layers while minimizing total transport cost under marginal constraints. This yields both a single alignment score for the entire network comparison and a soft transport plan that naturally handles depth mismatches through mass distribution. We evaluate HOT on vision models, large language models, and human visual cortex recordings. Across all domains, HOT matches or surpasses standard pairwise matching in alignment quality. Moreover, it reveals smooth, fine-grained hierarchical correspondences: early layers map to early layers, deeper layers maintain relative positions, and depth mismatches are resolved by distributing representations across multiple layers. These structured patterns emerge naturally from global optimization without being imposed, yet are absent in greedy layer-wise methods. HOT thus enables richer, more interpretable comparisons between representations, particularly when networks differ in architecture or depth.

SymSkill: Symbol and Skill Co-Invention for Data-Efficient and Real-Time Long-Horizon Manipulation

Authors:Yifei Simon Shao, Yuchen Zheng, Sunan Sun, Pratik Chaudhari, Vijay Kumar, Nadia Figueroa

Date:2025-10-02 04:41:01

Multi-step manipulation in dynamic environments remains challenging. Two major families of methods fail in distinct ways: (i) imitation learning (IL) is reactive but lacks compositional generalization, as monolithic policies do not decide which skill to reuse when scenes change; (ii) classical task-and-motion planning (TAMP) offers compositionality but has prohibitive planning latency, preventing real-time failure recovery. We introduce SymSkill, a unified learning framework that combines the benefits of IL and TAMP, allowing compositional generalization and failure recovery in real time. Offline, SymSkill jointly learns predicates, operators, and skills directly from unlabeled and unsegmented demonstrations. At execution time, upon specifying a conjunction of one or more learned predicates, SymSkill uses a symbolic planner to compose and reorder learned skills to achieve the symbolic goals, while performing recovery at both the motion and symbolic levels in real time. Coupled with a compliant controller, SymSkill enables safe and uninterrupted execution under human and environmental disturbances. In RoboCasa simulation, SymSkill can execute 12 single-step tasks with an 85% success rate. Without additional data, it composes these skills into multi-step plans requiring up to 6 skill recompositions, recovering robustly from execution failures. On a real Franka robot, we demonstrate that SymSkill, learning from 5 minutes of unsegmented and unlabeled play data, is capable of performing multiple tasks simply from goal specifications. The source code and additional analysis can be found at https://sites.google.com/view/symskill.

SoK: Measuring What Matters for Closed-Loop Security Agents

Authors:Mudita Khurana, Raunak Jain

Date:2025-10-02 04:20:35

Cybersecurity is a relentless arms race, with AI-driven offensive systems evolving faster than traditional defenses can adapt. Research and tooling remain fragmented across isolated defensive functions, creating blind spots that adversaries exploit. Autonomous agents capable of integrating exploit confirmation, remediation, and validation into a single closed loop offer promise, but the field lacks three essentials: a framework defining the agentic capabilities of security systems across the security lifecycle, a principled method for evaluating closed-loop agents, and a benchmark for measuring their performance in practice. We introduce CLASP, the Closed-Loop Autonomous Security Performance framework, which aligns the security lifecycle (reconnaissance, exploitation, root cause analysis, patch synthesis, validation) with core agentic capabilities (planning, tool use, memory, reasoning, reflection & perception), providing a common vocabulary and rubric for assessing agentic capabilities in security tasks. By applying CLASP to 21 representative works, we map where systems demonstrate strengths and where capability gaps persist. We then define the Closed-Loop Capability (CLC) Score, a composite metric quantifying both degree of loop closure and operational effectiveness, and outline the requirements for a closed-loop benchmark. Together, CLASP and the CLC Score provide the vocabulary, diagnostics, and measurements needed both to advance function-level performance and to measure closed-loop security agents.

FailSafe: Reasoning and Recovery from Failures in Vision-Language-Action Models

Authors:Zijun Lin, Jiafei Duan, Haoquan Fang, Dieter Fox, Ranjay Krishna, Cheston Tan, Bihan Wen

Date:2025-10-02 03:48:07

Recent advances in robotic manipulation have integrated low-level robotic control into Vision-Language Models (VLMs), extending them into Vision-Language-Action (VLA) models. Although state-of-the-art VLAs achieve strong performance in downstream robotic applications, supported by large-scale crowd-sourced robot training data, they still inevitably encounter failures during execution. Enabling robots to reason about and recover from unpredictable and abrupt failures remains a critical challenge. Existing robotic manipulation datasets, collected in either simulation or the real world, primarily provide only ground-truth trajectories, leaving robots unable to recover once failures occur. Moreover, the few datasets that address failure detection typically offer only textual explanations, which are difficult to utilize directly in VLA models. To address this gap, we introduce FailSafe, a novel failure generation and recovery system that automatically produces diverse failure cases paired with executable recovery actions. FailSafe can be seamlessly applied to any manipulation task in any simulator, enabling scalable creation of failure-action data. To demonstrate its effectiveness, we fine-tune LLaVa-OneVision-7B (LLaVa-OV-7B) to build FailSafe-VLM. Experimental results show that FailSafe-VLM successfully helps a robotic arm detect and recover from potential failures, improving the performance of three state-of-the-art VLA models (pi0-FAST, OpenVLA, OpenVLA-OFT) by up to 22.6% on average across several tasks in ManiSkill. Furthermore, FailSafe-VLM can generalize across different spatial configurations, camera viewpoints, and robotic embodiments. We plan to release the FailSafe code to the community.

VLA-R1: Enhancing Reasoning in Vision-Language-Action Models

Authors:Angen Ye, Zeyu Zhang, Boyuan Wang, Xiaofeng Wang, Dapeng Zhang, Zheng Zhu

Date:2025-10-02 02:54:03

Vision-Language-Action (VLA) models aim to unify perception, language understanding, and action generation, offering strong cross-task and cross-scene generalization with broad impact on embodied AI. However, current VLA models often lack explicit step-by-step reasoning, instead emitting final actions without considering affordance constraints or geometric relations. Their post-training pipelines also rarely reinforce reasoning quality, relying primarily on supervised fine-tuning with weak reward design. To address these challenges, we present VLA-R1, a reasoning-enhanced VLA that integrates Reinforcement Learning from Verifiable Rewards (RLVR) with Group Relative Policy Optimization (GRPO) to systematically optimize both reasoning and execution. Specifically, we design an RLVR-based post-training strategy with verifiable rewards for region alignment, trajectory consistency, and output formatting, thereby strengthening reasoning robustness and execution accuracy. Moreover, we develop VLA-CoT-13K, a high-quality dataset that provides chain-of-thought supervision explicitly aligned with affordance and trajectory annotations. Furthermore, extensive evaluations on in-domain, out-of-domain, simulation, and real-robot platforms demonstrate that VLA-R1 achieves superior generalization and real-world performance compared to prior VLA methods. We plan to release the model, code, and dataset following the publication of this work. Code: https://github.com/GigaAI-research/VLA-R1. Website: https://gigaai-research.github.io/VLA-R1.

Online Hierarchical Policy Learning using Physics Priors for Robot Navigation in Unknown Environments

Authors:Wei Han Chen, Yuchen Liu, Alexiy Buynitsky, Ahmed H. Qureshi

Date:2025-10-01 23:29:56

Robot navigation in large, complex, and unknown indoor environments is a challenging problem. The existing approaches, such as traditional sampling-based methods, struggle with resolution control and scalability, while imitation learning-based methods require a large amount of demonstration data. Active Neural Time Fields (ANTFields) have recently emerged as a promising solution by using local observations to learn cost-to-go functions without relying on demonstrations. Despite their potential, these methods are hampered by challenges such as spectral bias and catastrophic forgetting, which diminish their effectiveness in complex scenarios. To address these issues, our approach decomposes the planning problem into a hierarchical structure. At the high level, a sparse graph captures the environment's global connectivity, while at the low level, a planner based on neural fields navigates local obstacles by solving the Eikonal PDE. This physics-informed strategy overcomes common pitfalls like spectral bias and neural field fitting difficulties, resulting in a smooth and precise representation of the cost landscape. We validate our framework in large-scale environments, demonstrating its enhanced adaptability and precision compared to previous methods, and highlighting its potential for online exploration, mapping, and real-world navigation.
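
For readers unfamiliar with the Eikonal PDE, it relates the arrival time T(x) of a wavefront to a local speed S(x) through |∇T(x)| = 1/S(x), so T acts as a cost-to-go field. The sketch below is not the paper's neural-field solver; it swaps in a classical grid-based fast-sweeping scheme purely to show what such a field looks like. The function name, grid layout, and parameters are assumptions.

```python
import math


def fast_sweep_eikonal(speed, source, h=1.0, n_iters=4):
    """Approximate arrival times T with |grad T| = 1/speed on a 2D grid.

    speed:  list of rows; cells with speed <= 0 are treated as obstacles.
    source: (row, col) cell where T = 0.
    """
    source = tuple(source)
    rows, cols = len(speed), len(speed[0])
    T = [[math.inf] * cols for _ in range(rows)]
    T[source[0]][source[1]] = 0.0
    # Four alternating sweep directions propagate information both ways.
    orders = [
        (range(rows), range(cols)),
        (range(rows - 1, -1, -1), range(cols)),
        (range(rows), range(cols - 1, -1, -1)),
        (range(rows - 1, -1, -1), range(cols - 1, -1, -1)),
    ]
    for _ in range(n_iters):
        for row_order, col_order in orders:
            for i in row_order:
                for j in col_order:
                    if (i, j) == source or speed[i][j] <= 0.0:
                        continue
                    # Smallest neighbor value along each axis (upwind choice).
                    a = min(T[i - 1][j] if i > 0 else math.inf,
                            T[i + 1][j] if i < rows - 1 else math.inf)
                    b = min(T[i][j - 1] if j > 0 else math.inf,
                            T[i][j + 1] if j < cols - 1 else math.inf)
                    if math.isinf(a) and math.isinf(b):
                        continue
                    f = h / speed[i][j]
                    if abs(a - b) >= f:
                        t = min(a, b) + f  # one-sided update
                    else:
                        t = 0.5 * (a + b + math.sqrt(2.0 * f * f - (a - b) ** 2))
                    T[i][j] = min(T[i][j], t)
    return T


# Uniform unit speed on an 8x8 grid, with the source in one corner.
grid = [[1.0] * 8 for _ in range(8)]
T = fast_sweep_eikonal(grid, (0, 0))
```

Following the negative gradient of T from any cell leads back to the source, which is the sense in which a solved Eikonal field can serve as a local planner.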

VL-KnG: Visual Scene Understanding for Navigation Goal Identification using Spatiotemporal Knowledge Graphs

Authors:Mohamad Al Mdfaa, Svetlana Lukina, Timur Akhtyamov, Arthur Nigmatzyanov, Dmitrii Nalberskii, Sergey Zagoruyko, Gonzalo Ferrer

Date:2025-10-01 21:53:44

Vision-language models (VLMs) have shown potential for robot navigation but encounter fundamental limitations: they lack persistent scene memory, offer limited spatial reasoning, and do not scale effectively with video duration for real-time application. We present VL-KnG, a Visual Scene Understanding system that tackles these challenges using spatiotemporal knowledge graph construction and computationally efficient query processing for navigation goal identification. Our approach processes video sequences in chunks utilizing modern VLMs, creates persistent knowledge graphs that maintain object identity over time, and enables explainable spatial reasoning through queryable graph structures. We also introduce WalkieKnowledge, a new benchmark with about 200 manually annotated questions across 8 diverse trajectories spanning approximately 100 minutes of video data, enabling fair comparison between structured approaches and general-purpose VLMs. Real-world deployment on a differential drive robot demonstrates practical applicability, with our method achieving a 77.27% success rate and 76.92% answer accuracy, matching Gemini 2.5 Pro performance while providing explainable reasoning supported by the knowledge graph and the computational efficiency needed for real-time deployment across tasks such as localization, navigation, and planning. Code and dataset will be released after acceptance.

VENTURA: Adapting Image Diffusion Models for Unified Task Conditioned Navigation

Authors:Arthur Zhang, Xiangyun Meng, Luca Calliari, Dong-Ki Kim, Shayegan Omidshafiei, Joydeep Biswas, Ali Agha, Amirreza Shaban

Date:2025-10-01 19:21:28

Robots must adapt to diverse human instructions and operate safely in unstructured, open-world environments. Recent Vision-Language models (VLMs) offer strong priors for grounding language and perception, but remain difficult to steer for navigation due to differences in action spaces and pretraining objectives that hamper transferability to robotics tasks. Towards addressing this, we introduce VENTURA, a vision-language navigation system that finetunes internet-pretrained image diffusion models for path planning. Instead of directly predicting low-level actions, VENTURA generates a path mask (i.e. a visual plan) in image space that captures fine-grained, context-aware navigation behaviors. A lightweight behavior-cloning policy grounds these visual plans into executable trajectories, yielding an interface that follows natural language instructions to generate diverse robot behaviors. To scale training, we supervise on path masks derived from self-supervised tracking models paired with VLM-augmented captions, avoiding manual pixel-level annotation or highly engineered data collection setups. In extensive real-world evaluations, VENTURA outperforms state-of-the-art foundation model baselines on object reaching, obstacle avoidance, and terrain preference tasks, improving success rates by 33% and reducing collisions by 54% across both seen and unseen scenarios. Notably, we find that VENTURA generalizes to unseen combinations of distinct tasks, revealing emergent compositional capabilities. Videos, code, and additional materials: https://venturapath.github.io

Safe Motion Planning and Control Using Predictive and Adaptive Barrier Methods for Autonomous Surface Vessels

Authors:Alejandro Gonzalez-Garcia, Wei Xiao, Wei Wang, Alejandro Astudillo, Wilm Decré, Jan Swevers, Carlo Ratti, Daniela Rus

Date:2025-10-01 18:36:52

Safe motion planning is essential for autonomous vessel operations, especially in challenging spaces such as narrow inland waterways. However, conventional motion planning approaches are often computationally intensive or overly conservative. This paper proposes a safe motion planning strategy combining Model Predictive Control (MPC) and Control Barrier Functions (CBFs). We introduce a time-varying inflated ellipse obstacle representation, where the inflation radius is adjusted depending on the relative position and attitude between the vessel and the obstacle. The proposed adaptive inflation reduces the conservativeness of the controller compared to traditional fixed-ellipsoid obstacle formulations. The MPC solution provides an approximate motion plan, and high-order CBFs ensure the vessel's safety using the varying inflation radius. Simulation and real-world experiments demonstrate that the proposed strategy enables the fully-actuated autonomous robot vessel to navigate through narrow spaces in real time and resolve potential deadlocks, all while ensuring safety.
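
The inflated-ellipse idea can be sketched as a barrier function h that is positive outside the inflated obstacle and negative inside it. The inflation rule below (grow the margin when the bow points toward the obstacle) is a hypothetical stand-in for the paper's rule, which the abstract does not give, and the high-order CBF constraint inside the MPC is not shown. All names and default values are assumptions.

```python
import math


def inflation_radius(vessel_xy, vessel_heading, obstacle_xy,
                     r_min=0.5, r_max=2.0):
    """Hypothetical rule: inflate more when the vessel heads at the obstacle."""
    bearing = math.atan2(obstacle_xy[1] - vessel_xy[1],
                         obstacle_xy[0] - vessel_xy[0])
    # 1.0 when heading straight at the obstacle, 0.0 when heading away.
    facing = 0.5 * (1.0 + math.cos(bearing - vessel_heading))
    return r_min + (r_max - r_min) * facing


def ellipse_barrier(vessel_xy, obstacle_xy, obstacle_heading, semi_axes, r):
    """h >= 0 means the vessel position is outside the inflated ellipse."""
    dx = vessel_xy[0] - obstacle_xy[0]
    dy = vessel_xy[1] - obstacle_xy[1]
    c, s = math.cos(obstacle_heading), math.sin(obstacle_heading)
    # Express the offset in the obstacle's body frame.
    ex = c * dx + s * dy
    ey = -s * dx + c * dy
    a, b = semi_axes
    return (ex / (a + r)) ** 2 + (ey / (b + r)) ** 2 - 1.0


vessel, heading = (0.0, 0.0), 0.0
obstacle, obstacle_heading, axes = (10.0, 0.0), 0.0, (3.0, 1.0)
r = inflation_radius(vessel, heading, obstacle)
h = ellipse_barrier(vessel, obstacle, obstacle_heading, axes, r)
```

With a fixed inflation the margin would have to cover the worst case in every direction; letting `r` shrink when the geometry is benign is one way a controller can become less conservative in narrow waterways.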

Kilometer-Scale GNSS-Denied UAV Navigation via Heightmap Gradients: A Winning System from the SPRIN-D Challenge

Authors:Michal Werner, David Čapek, Tomáš Musil, Ondřej Franěk, Tomáš Báča, Martin Saska

Date:2025-10-01 18:23:42

Reliable long-range flight of unmanned aerial vehicles (UAVs) in GNSS-denied environments is challenging: integrating odometry leads to drift, loop closures are unavailable in previously unseen areas, and embedded platforms provide limited computational power. We present a fully onboard UAV system developed for the SPRIN-D Funke Fully Autonomous Flight Challenge, which required 9 km long-range waypoint navigation below 25 m AGL (Above Ground Level) without GNSS or prior dense mapping. The system integrates perception, mapping, planning, and control with a lightweight drift-correction method that matches LiDAR-derived local heightmaps to a prior geo-data heightmap via gradient-template matching and fuses the evidence with odometry in a clustered particle filter. Deployed during the competition, the system executed kilometer-scale flights across urban, forest, and open-field terrain and reduced drift substantially relative to raw odometry, while running in real time on CPU-only hardware. We describe the system architecture, the localization pipeline, and the competition evaluation, and we report practical insights from field deployment that inform the design of GNSS-denied UAV autonomy.
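
A minimal sketch of gradient-template matching between a local heightmap and a prior heightmap is given below, assuming both are regular grids at the same resolution and orientation. The brute-force search, the sum-of-squared-differences score, and the function names are assumptions; the paper's actual matcher and its fusion with the particle filter are not reproduced here.

```python
import numpy as np


def gradient_field(heightmap):
    """Stack the row-wise and column-wise finite-difference gradients."""
    g_row, g_col = np.gradient(np.asarray(heightmap, dtype=float))
    return np.stack([g_row, g_col])


def match_local_to_prior(local_map, prior_map):
    """Slide the local template over the prior map and score every offset.

    Returns the best (row, col) offset and the full score map, where a
    lower score means more similar terrain gradients.
    """
    local_map = np.asarray(local_map, dtype=float)
    prior_map = np.asarray(prior_map, dtype=float)
    g_local = gradient_field(local_map)
    lh, lw = local_map.shape
    ph, pw = prior_map.shape
    scores = np.empty((ph - lh + 1, pw - lw + 1))
    for r in range(scores.shape[0]):
        for c in range(scores.shape[1]):
            window = prior_map[r:r + lh, c:c + lw]
            scores[r, c] = np.sum((gradient_field(window) - g_local) ** 2)
    best = np.unravel_index(np.argmin(scores), scores.shape)
    return (int(best[0]), int(best[1])), scores


# Synthetic terrain: the local patch is a crop of the prior map with a
# constant altitude bias added.
rng = np.random.default_rng(0)
prior = rng.normal(size=(20, 20))
local = prior[5:13, 7:15] + 3.0
offset, scores = match_local_to_prior(local, prior)
```

Because gradients discard any constant height offset, the match in this toy example is unaffected by the added bias. The score map, rather than only its minimum, is the kind of evidence that could be passed to a filter alongside odometry.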

Optimizing Fairness in Production Planning: A Human-Centric Approach to Machine and Workforce Allocation

Authors:Alexander Nasuta, Alessandro Cisi, Sylwia Olbrych, Gustavo Vieira, Rui Fernandes, Lucas Paletta, Marlene Mayr, Rishyank Chevuri, Robert Woitsch, Hans Aoyang Zhou, Anas Abdelrazeq, Robert H. Schmitt

Date:2025-10-01 16:41:18

This work presents a two-layer, human-centric production planning framework designed to optimize both operational efficiency and workforce fairness in industrial manufacturing. The first layer formulates the Order-Line allocation as a Constraint Programming (CP) problem, generating high-utilization production schedules that respect machine capacities, processing times, and due dates. The second layer models Worker-Line allocation as a Markov Decision Process (MDP), integrating human factors such as worker preference, experience, resilience, and medical constraints into the assignment process. Three solution strategies, greedy allocation, Monte Carlo Tree Search (MCTS), and Reinforcement Learning (RL), are implemented and compared across multiple evaluation scenarios. The proposed system is validated through 16 test sessions with domain experts from the automotive industry, combining quantitative key performance indicators (KPIs) with expert ratings. Results indicate that the CP-based scheduling approach produces compact, feasible production plans with low tardiness, while the MDP-based worker allocation significantly improves fairness and preference alignment compared to baseline approaches. Domain experts rated both the Order-Line and Worker-Line components as effective and highlighted opportunities to further refine the objective function to penalize excessive earliness and improve continuity in worker assignments. Overall, the findings demonstrate that combining CP with learning-based decision-making provides a robust approach for human-centric production planning. The approach enables simultaneous optimization of throughput and workforce well-being, offering a practical foundation for fair and efficient manufacturing scheduling in industrial settings.
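
To make the greedy baseline concrete, the sketch below assigns workers to line slots by repeatedly taking the highest-scoring feasible worker-line pair. The scoring table, the medical-restriction set, and all names are invented for illustration; the abstract does not describe the paper's greedy rule or its MDP state and reward.

```python
def greedy_worker_allocation(line_slots, workers, score, allowed):
    """Assign each worker to at most one line, best-scoring pairs first.

    line_slots: dict mapping line name -> number of open positions.
    score:      callable (worker, line) -> float, higher is better.
    allowed:    callable (worker, line) -> bool, e.g. medical constraints.
    """
    remaining = dict(line_slots)
    assignment = {}
    candidates = sorted(
        ((score(w, line), w, line)
         for w in workers for line in line_slots if allowed(w, line)),
        key=lambda item: item[0],
        reverse=True,
    )
    for _, worker, line in candidates:
        if worker not in assignment and remaining[line] > 0:
            assignment[worker] = line
            remaining[line] -= 1
    return assignment


# Hypothetical preference scores and one medical restriction.
preference = {
    ("ana", "A"): 0.9, ("ana", "B"): 0.2,
    ("ben", "A"): 0.8, ("ben", "B"): 0.6,
    ("cy", "A"): 0.7, ("cy", "B"): 0.5,
}
restricted = {("cy", "B")}

plan = greedy_worker_allocation(
    line_slots={"A": 1, "B": 1},
    workers=["ana", "ben", "cy"],
    score=lambda w, line: preference[(w, line)],
    allowed=lambda w, line: (w, line) not in restricted,
)
```

In this toy case the greedy rule leaves one worker unassigned and never revisits earlier choices. Fairness and continuity across shifts are the kind of long-horizon effects that motivate comparing it against search-based and learned policies.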

The Silicon Strip Detector Subsystem for the Trans-Iron Galactic Element Recorder for the International Space Station (TIGERISS)

Authors:John F. Krizmanic, Scott Nutter

Date:2025-10-01 16:27:30

The Trans-Iron Galactic Element Recorder for the International Space Station (TIGERISS) is under construction, is planned for launch in 2027, and will be attached at the SOX location on the Columbus module of the ISS. TIGERISS will make the first definitive measurements of Ultra-Heavy Galactic Cosmic Rays (UHGCRs; Z > 29) on an individual element basis past barium ($^{56}$Ba), through the lanthanides, and to lead ($^{82}$Pb). TIGERISS has a geometry factor of 1.06 m$^2$ sr and is composed of four planes of single-sided silicon strip detectors (SSDs) arranged in orthogonal X-Y layers, with an X-Y pair above and an X-Y pair below two large-area Cherenkov detectors. The top Cherenkov detector is composed of a mosaic of aerogel radiators (n = 1.05), while the bottom Cherenkov detector has an acrylic radiator (n = 1.49). The combination of the Cherenkov velocity measurements with the precise measurements of the ionization and trajectory of the traversing cosmic rays leads to highly accurate charge measurements, with a resolution of $<$ 0.25 c.u. over the entire elemental range of $^5$B through $^{82}$Pb. These TIGERISS measurements are highly sensitive in determining the strength of the s-, r-, and rp-processes of Galactic nucleosynthesis while providing critical data needed for multi-messenger studies to determine the contributions of extreme phenomena, including supernovae (SN) and Neutron Star Mergers (NSMs), to the production of galactic matter. The science goals of TIGERISS, the mission status, and the design and performance of the TIGERISS SSD subsystem in relation to the measurements and science goals of TIGERISS are discussed in this paper.
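
The two refractive indices quoted above set different velocity thresholds, since a charged particle radiates Cherenkov light only when its speed β (in units of c) exceeds 1/n. The short calculation below works out the threshold β, Lorentz factor, and kinetic energy per nucleon for each radiator. The helper names are assumptions, and the energy uses the atomic mass unit (about 931.494 MeV) as an approximate nucleon rest energy.

```python
import math

AMU_MEV = 931.494  # approximate rest energy per nucleon, in MeV


def cherenkov_threshold_beta(n):
    """Minimum speed (as a fraction of c) for Cherenkov emission."""
    return 1.0 / n


def cherenkov_threshold_gamma(n):
    """Lorentz factor at the Cherenkov threshold for refractive index n."""
    beta = cherenkov_threshold_beta(n)
    return 1.0 / math.sqrt(1.0 - beta * beta)


def threshold_kinetic_energy_per_nucleon(n):
    """Threshold kinetic energy per nucleon in MeV (approximate)."""
    return (cherenkov_threshold_gamma(n) - 1.0) * AMU_MEV


for name, n in [("aerogel", 1.05), ("acrylic", 1.49)]:
    print(f"{name}: n={n}, beta_thr={cherenkov_threshold_beta(n):.4f}, "
          f"gamma_thr={cherenkov_threshold_gamma(n):.3f}, "
          f"E_thr={threshold_kinetic_energy_per_nucleon(n):.0f} MeV/nucleon")
```

The low-index aerogel only lights up for much faster particles than the acrylic does, so pairing the two radiators gives velocity information over a wider energy range than either one alone.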

ROSplane 2.0: A Fixed-Wing Autopilot for Research

Authors:Ian Reid, Joseph Ritchie, Jacob Moore, Brandon Sutherland, Gabe Snow, Phillip Tokumaru, Tim McLain

Date:2025-10-01 15:44:27

Unmanned aerial vehicle (UAV) research requires the integration of cutting-edge technology into existing autopilot frameworks. This process can be arduous, requiring extensive resources, time, and detailed knowledge of the existing system. ROSplane is a lean, open-source fixed-wing autonomy stack built by researchers for researchers. It is designed to accelerate research by providing clearly defined interfaces with an easily modifiable framework. Powered by ROS 2, ROSplane allows for rapid integration of low- or high-level control, path planning, or estimation algorithms. A focus on lean, easily understood code and extensive documentation lowers the barrier to entry for researchers. Recent developments to ROSplane improve its capacity to accelerate UAV research, including the transition from ROS 1 to ROS 2, enhanced estimation and control algorithms, increased modularity, and an improved aerodynamic modeling pipeline. This aerodynamic modeling pipeline significantly reduces the effort of transitioning from simulation to real-world testing without requiring expensive system identification or computational fluid dynamics tools. ROSplane's architecture reduces the effort required to integrate new research tools and methods, expediting hardware experimentation.

CL-UZH submission to the NIST SRE 2024 Speaker Recognition Evaluation

Authors:Aref Farhadipour, Shiran Liu, Masoumeh Chapariniya, Valeriia Perepelytsia, Srikanth Madikeri, Teodora Vukovic, Volker Dellwo

Date:2025-10-01 14:27:00

The CL-UZH team submitted one system each for the fixed and open conditions of the NIST SRE 2024 challenge. For the closed-set condition, results for the audio-only trials were obtained using the x-vector system developed with Kaldi. For the audio-visual results, we used only models developed for the visual modality. Two sets of results were submitted for the open-set and closed-set conditions, one based on a pretrained model using the VoxBlink2 and VoxCeleb2 datasets. An x-vector-based model was trained from scratch using the CTS superset dataset for the closed set. In addition to submitting the SRE24 evaluation results to the competition website, we discuss the performance of the proposed systems on the SRE24 evaluation in this report.

Target Population Synthesis using CT-GAN

Authors:Tanay Rastogi, Daniel Jonsson

Date:2025-10-01 13:20:18

Agent-based models used in scenario planning for transportation and urban planning usually require detailed population information from the base as well as target scenarios. These populations are usually provided by synthesizing fake agents through deterministic population synthesis methods. However, these deterministic population synthesis methods face several challenges, such as handling high-dimensional data, scalability, and zero-cell issues, particularly when generating populations for target scenarios. This research investigates how a deep generative model called Conditional Tabular Generative Adversarial Network (CT-GAN) can be used to create target populations either directly from a collection of marginal constraints or through a hybrid method that combines CT-GAN with Fitness-based Synthesis Combinatorial Optimization (FBS-CO). The research evaluates the proposed population synthesis models against travel survey and zonal-level aggregated population data. Results indicate that the stand-alone CT-GAN model performs best when compared with FBS-CO and the hybrid model. CT-GAN by itself can create realistic-looking groups that match single-variable distributions, but it struggles to maintain relationships between multiple variables. However, the hybrid model demonstrates improved performance compared to FBS-CO by leveraging CT-GAN's ability to generate a descriptive base population, which is then refined using FBS-CO to align with target-year marginals. This study demonstrates that CT-GAN represents an effective methodology for synthesizing target populations and highlights how deep generative models can be successfully integrated with conventional synthesis techniques to enhance their performance.

From 2D to 3D, Deep Learning-based Shape Reconstruction in Magnetic Resonance Imaging: A Review

Authors:Emma McMillian, Abhirup Banerjee, Alfonso Bueno-Orovio

Date:2025-10-01 09:57:29

Deep learning-based 3-dimensional (3D) shape reconstruction from 2-dimensional (2D) magnetic resonance imaging (MRI) has become increasingly important in disease diagnosis, treatment planning, and computational modeling. This review surveys the methodological landscape of 3D MRI reconstruction, focusing on 4 primary approaches: point cloud, mesh-based, shape-aware, and volumetric models. For each category, we analyze the current state-of-the-art techniques, their methodological foundation, limitations, and applications across anatomical structures. We provide an extensive overview ranging from cardiac to neurological to lung imaging. We also focus on the clinical applicability of models to diseased anatomy, and the influence of their training and testing data. We examine publicly available datasets, computational demands, and evaluation metrics. Finally, we highlight emerging research directions, including multimodal integration and cross-modality frameworks. This review aims to provide researchers with a structured overview of current 3D reconstruction methodologies to identify opportunities for advancing deep learning towards more robust, generalizable, and clinically impactful solutions.