planning - 2026-02-10

Next-Gen CAPTCHAs: Leveraging the Cognitive Gap for Scalable and Diverse GUI-Agent Defense

Authors:Jiacheng Liu, Yaxin Luo, Jiacheng Cui, Xinyi Shang, Xiaohan Zhao, Zhiqiang Shen

Date:2026-02-09 18:55:33

The rapid evolution of GUI-enabled agents has rendered traditional CAPTCHAs obsolete. While previous benchmarks like OpenCaptchaWorld established a baseline for evaluating multimodal agents, recent advancements in reasoning-heavy models, such as Gemini3-Pro-High and GPT-5.2-Xhigh have effectively collapsed this security barrier, achieving pass rates as high as 90% on complex logic puzzles like "Bingo". In response, we introduce Next-Gen CAPTCHAs, a scalable defense framework designed to secure the next-generation web against the advanced agents. Unlike static datasets, our benchmark is built upon a robust data generation pipeline, allowing for large-scale and easily scalable evaluations, notably, for backend-supported types, our system is capable of generating effectively unbounded CAPTCHA instances. We exploit the persistent human-agent "Cognitive Gap" in interactive perception, memory, decision-making, and action. By engineering dynamic tasks that require adaptive intuition rather than granular planning, we re-establish a robust distinction between biological users and artificial agents, offering a scalable and diverse defense mechanism for the agentic era.

From Obstacles to Etiquette: Robot Social Navigation with VLM-Informed Path Selection

Authors:Zilin Fang, Anxing Xiao, David Hsu, Gim Hee Lee

Date:2026-02-09 18:46:12

Navigating socially in human environments requires more than satisfying geometric constraints, as collision-free paths may still interfere with ongoing activities or conflict with social norms. Addressing this challenge calls for analyzing interactions between agents and incorporating common-sense reasoning into planning. This paper presents a social robot navigation framework that integrates geometric planning with contextual social reasoning. The system first extracts obstacles and human dynamics to generate geometrically feasible candidate paths, then leverages a fine-tuned vision-language model (VLM) to evaluate these paths, informed by contextually grounded social expectations, selecting a socially optimized path for the controller. This task-specific VLM distills social reasoning from large foundation models into a smaller and efficient model, allowing the framework to perform real-time adaptation in diverse human-robot interaction contexts. Experiments in four social navigation contexts demonstrate that our method achieves the best overall performance with the lowest personal space violation duration, the minimal pedestrian-facing time, and no social zone intrusions. Project page: https://path-etiquette.github.io

stable-worldmodel-v1: Reproducible World Modeling Research and Evaluation

Authors:Lucas Maes, Quentin Le Lidec, Dan Haramati, Nassim Massaudi, Damien Scieur, Yann LeCun, Randall Balestriero

Date:2026-02-09 18:04:22

World Models have emerged as a powerful paradigm for learning compact, predictive representations of environment dynamics, enabling agents to reason, plan, and generalize beyond direct experience. Despite recent interest in World Models, most available implementations remain publication-specific, severely limiting their reusability, increasing the risk of bugs, and reducing evaluation standardization. To mitigate these issues, we introduce stable-worldmodel (SWM), a modular, tested, and documented world-model research ecosystem that provides efficient data-collection tools, standardized environments, planning algorithms, and baseline implementations. In addition, each environment in SWM enables controllable factors of variation, including visual and physical properties, to support robustness and continual learning research. Finally, we demonstrate the utility of SWM by using it to study zero-shot robustness in DINO-WM.

Any-to-All MRI Synthesis: A Unified Foundation Model for Nasopharyngeal Carcinoma and Its Downstream Applications

Authors:Yao Pu, Yiming Shi, Zhenxi Zhang, Peixin Yu, Yitao Zhuang, Xiang Wang, Hongzhao Chen, Jing Cai, Ge Ren

Date:2026-02-09 15:56:53

Magnetic resonance imaging (MRI) is essential for nasopharyngeal carcinoma (NPC) radiotherapy (RT), but practical constraints, such as patient discomfort, long scan times, and high costs often lead to incomplete modalities in clinical practice, compromising RT planning accuracy. Traditional MRI synthesis methods are modality-specific, limited in anatomical adaptability, and lack clinical interpretability-failing to meet NPC's RT needs. Here, we developed a unified foundation model integrating contrastive visual representation learning and vision-language alignment (VLA) to enable any-to-all MRI synthesis. The model uses a contrastive encoder for modality-invariant representations and a CLIP-based text-informed decoder for semantically consistent synthesis, supporting any-to-all MRI synthesis via one unified foundation model. Trained on 40,825 images from 13 institutions, it achieves consistently high performance (average SSIM 0.90, PSNR 27) across 26 internal/external validation sites (15,748 images), with superior synthesis fidelity and robustness to noise and domain shifts. Meanwhile, its unified representation enhances downstream RT-relevant tasks (e.g., segmentation). This work advances digital medicine solutions for NPC care by leveraging foundation models to bridge technical synthesis and clinical utility.

A Generic Service-Oriented Function Offloading Framework for Connected Automated Vehicles

Authors:Robin Dehler, Michael Buchholz

Date:2026-02-09 15:39:28

Function offloading is a promising solution to address limitations concerning computational capacity and available energy of Connected Automated Vehicles~(CAVs) or other autonomous robots by distributing computational tasks between local and remote computing devices in form of distributed services. This paper presents a generic function offloading framework that can be used to offload an arbitrary set of computational tasks with a focus on autonomous driving. To provide flexibility, the function offloading framework is designed to incorporate different offloading decision making algorithms and quality of service~(QoS) requirements that can be adjusted to different scenarios or the objectives of the CAVs. With a focus on the applicability, we propose an efficient location-based approach, where the decision whether tasks are processed locally or remotely depends on the location of the CAV. We apply the proposed framework on the use case of service-oriented trajectory planning, where we offload the trajectory planning task of CAVs to a Multi-Access Edge Computing~(MEC) server. The evaluation is conducted in both simulation and real-world application. It demonstrates the potential of the function offloading framework to guarantee the QoS for trajectory planning while improving the computational efficiency of the CAVs. Moreover, the simulation results also show the adaptability of the framework to diverse scenarios involving simultaneous offloading requests from multiple CAVs.

Intermediate Results on the Complexity of STRIPS$_{1}^{1}$

Authors:Stefan Edelkamp, Jiří Fink, Petr Gregor, Anders Jonsson, Bernhard Nebel

Date:2026-02-09 14:21:10

This paper is based on Bylander's results on the computational complexity of propositional STRIPS planning. He showed that when only ground literals are permitted, determining plan existence is PSPACE-complete even if operators are limited to two preconditions and two postconditions. While NP-hardness is settled, it is unknown whether propositional STRIPS with operators that only have one precondition and one effect is NP-complete. We shed light on the question whether this small solution hypothesis for STRIPS$^1_1$ is true, calling a SAT solver for small instances, introducing the literal graph, and mapping it to Petri nets.

OSCAR: Optimization-Steered Agentic Planning for Composed Image Retrieval

Authors:Teng Wang, Rong Shan, Jianghao Lin, Junjie Wu, Tianyi Xu, Jianping Zhang, Wenteng Chen, Changwang Zhang, Zhaoxiang Wang, Weinan Zhang, Jun Wang

Date:2026-02-09 12:44:56

Composed image retrieval (CIR) requires complex reasoning over heterogeneous visual and textual constraints. Existing approaches largely fall into two paradigms: unified embedding retrieval, which suffers from single-model myopia, and heuristic agentic retrieval, which is limited by suboptimal, trial-and-error orchestration. To this end, we propose OSCAR, an optimization-steered agentic planning framework for composed image retrieval. We are the first to reformulate agentic CIR from a heuristic search process into a principled trajectory optimization problem. Instead of relying on heuristic trial-and-error exploration, OSCAR employs a novel offline-online paradigm. In the offline phase, we model CIR via atomic retrieval selection and composition as a two-stage mixed-integer programming problem, mathematically deriving optimal trajectories that maximize ground-truth coverage for training samples via rigorous boolean set operations. These trajectories are then stored in a golden library to serve as in-context demonstrations for online steering of VLM planner at online inference time. Extensive experiments on three public benchmarks and a private industrial benchmark show that OSCAR consistently outperforms SOTA baselines. Notably, it achieves superior performance using only 10% of training data, demonstrating strong generalization of planning logic rather than dataset-specific memorization.

Mimic Intent, Not Just Trajectories

Authors:Renming Huang, Chendong Zeng, Wenjing Tang, Jingtian Cai, Cewu Lu, Panpan Cai

Date:2026-02-09 12:44:35

While imitation learning (IL) has achieved impressive success in dexterous manipulation through generative modeling and pretraining, state-of-the-art approaches like Vision-Language-Action (VLA) models still struggle with adaptation to environmental changes and skill transfer. We argue this stems from mimicking raw trajectories without understanding the underlying intent. To address this, we propose explicitly disentangling behavior intent from execution details in end-2-end IL: \textit{``Mimic Intent, Not just Trajectories'' (MINT)}. We achieve this via \textit{multi-scale frequency-space tokenization}, which enforces a spectral decomposition of action chunk representation. We learn action tokens with a multi-scale coarse-to-fine structure, and force the coarsest token to capture low-frequency global structure and finer tokens to encode high-frequency details. This yields an abstract \textit{Intent token} that facilitates planning and transfer, and multi-scale \textit{Execution tokens} that enable precise adaptation to environmental dynamics. Building on this hierarchy, our policy generates trajectories through \textit{next-scale autoregression}, performing progressive \textit{intent-to-execution reasoning}, thus boosting learning efficiency and generalization. Crucially, this disentanglement enables \textit{one-shot transfer} of skills, by simply injecting the Intent token from a demonstration into the autoregressive generation process. Experiments on several manipulation benchmarks and on a real robot demonstrate state-of-the-art success rates, superior inference efficiency, robust generalization against disturbances, and effective one-shot transfer.

EvoCorps: An Evolutionary Multi-Agent Framework for Depolarizing Online Discourse

Authors:Ning Lin, Haolun Li, Mingshu Liu, Chengyun Ruan, Kaibo Huang, Yukun Wei, Zhongliang Yang, Linna Zhou

Date:2026-02-09 11:24:17

Polarization in online discourse erodes social trust and accelerates misinformation, yet technical responses remain largely diagnostic and post-hoc. Current governance approaches suffer from inherent latency and static policies, struggling to counter coordinated adversarial amplification that evolves in real-time. We present EvoCorps, an evolutionary multi-agent framework for proactive depolarization. EvoCorps frames discourse governance as a dynamic social game and coordinates roles for monitoring, planning, grounded generation, and multi-identity diffusion. A retrieval-augmented collective cognition core provides factual grounding and action--outcome memory, while closed-loop evolutionary learning adapts strategies as the environment and attackers change. We implement EvoCorps on the MOSAIC social-AI simulation platform for controlled evaluation in a multi-source news stream with adversarial injection and amplification. Across emotional polarization, viewpoint extremity, and argumentative rationality, EvoCorps improves discourse outcomes over an adversarial baseline, pointing to a practical path from detection and post-hoc mitigation to in-process, closed-loop intervention. The code is available at https://github.com/ln2146/EvoCorps.

Identifying Host Galaxies of Binary Black Hole Mergers with Next-Generation Gravitational Wave Detector Networks

Authors:Sumedha Biswas, Andrew Levan, Peter G. Jonker, Kendall Ackley, Gregory Ashton, Nikhil Sarin

Date:2026-02-09 10:10:34

Identifying the host galaxy of a binary black hole (BBH) merger detected via gravitational waves (GWs) remains a challenge due to the absence of electromagnetic counterparts and the large localization volumes produced by current-generation detectors. A confident host association would provide stellar population properties to constrain BBH formation channels and enable measurements of cosmological parameters such as the Hubble constant, H0. We simulate BBH mergers in nearby (z<0.25) host galaxies to evaluate the feasibility of host identification with future GW detector networks, including configurations with the planned LIGO-India detector and third-generation detectors such as the Einstein Telescope (ET) and Cosmic Explorer (CE). We construct two injection grids to explore variations in BBH mass, distance, and directional sensitivity, and infer localization volumes using the Fisher Information Matrix (FIM)-based parameter estimation implemented through BILBY. To assess the prospects for unique host identification, we introduce a set of diagnostics: theoretical comoving volume thresholds for galaxies of a given stellar mass, derived from galaxy stellar mass functions, a metallicity-based volume threshold motivated by progenitor environment models, stellar mass fractions to quantify candidate host prominence, and the probability of chance alignment (p_c). These metrics provide ways to evaluate host associations and constrain BBH formation channels. We find that future networks that include ET and CE localize BBH mergers to volumes smaller than those theoretical thresholds, implying potentially unique host identification, out to ~1000 Mpc at a rate of ~100 yr^{-1}. While associations for individual events may remain uncertain, our framework is well-suited to population-level analyses, enabling constraints on BBH formation scenarios in the era of next-generation GW detector networks.

Graph-Loc: Robust Graph-Based LiDAR Pose Tracking with Compact Structural Map Priors under Low Observability and Occlusion

Authors:Wentao Zhao, Yihe Niu, Zikun Chen, Rui Li, Yanbo Wang, Tianchen Deng, Jingchuan Wang

Date:2026-02-09 09:19:47

Map-based LiDAR pose tracking is essential for long-term autonomous operation, where onboard map priors need be compact for scalable storage and fast retrieval, while online observations are often partial, repetitive, and heavily occluded. We propose Graph-Loc, a graph-based localization framework that tracks the platform pose against compact structural map priors represented as a lightweight point-line graph. Such priors can be constructed from heterogeneous sources commonly available in practice, including polygon outlines vectorized from occupancy/grid maps and CAD/model/floor-plan layouts. For each incoming LiDAR scan, Graph-Loc extracts sparse point and line primitives to form an observation graph, retrieves a pose-conditioned visible subgraph via LiDAR ray simulation, and performs scan-to-map association through unbalanced optimal transport with a local graph-context regularizer. The unbalanced formulation relaxes mass conservation, improving robustness to missing, spurious, and fragmented structures under occlusion. To enhance stability in low-observability segments, we estimate information anisotropy from the refinement normal matrix and defer updates along weakly constrained directions until sufficient constraints reappear. Experiments on public benchmarks, controlled stress tests, and real-world deployments demonstrate accurate and stable tracking with KB-level priors from heterogeneous map sources, including under geometrically degenerate and sustained occlusion and in the presence of gradual scene changes.

BiManiBench: A Hierarchical Benchmark for Evaluating Bimanual Coordination of Multimodal Large Language Models

Authors:Xin Wu, Zhixuan Liang, Yue Ma, Mengkang Hu, Zhiyuan Qin, Xiu Li

Date:2026-02-09 08:47:14

Multimodal Large Language Models (MLLMs) have significantly advanced embodied AI, and using them to benchmark robotic intelligence has become a pivotal trend. However, existing frameworks remain predominantly confined to single-arm manipulation, failing to capture the spatio-temporal coordination required for bimanual tasks like lifting a heavy pot. To address this, we introduce BiManiBench, a hierarchical benchmark evaluating MLLMs across three tiers: fundamental spatial reasoning, high-level action planning, and low-level end-effector control. Our framework isolates unique bimanual challenges, such as arm reachability and kinematic constraints, thereby distinguishing perceptual hallucinations from planning failures. Analysis of over 30 state-of-the-art models reveals that despite high-level reasoning proficiency, MLLMs struggle with dual-arm spatial grounding and control, frequently resulting in mutual interference and sequencing errors. These findings suggest the current paradigm lacks a deep understanding of mutual kinematic constraints, highlighting the need for future research to focus on inter-arm collision-avoidance and fine-grained temporal sequencing.

T2VTree: User-Centered Visual Analytics for Agent-Assisted Thought-to-Video Authoring

Authors:Zhuoyun Zheng, Yu Dong, Gaorong Liang, Guan Li, Guihua Shan, Shiyu Cheng, Dong Tian, Jianlong Zhou, Jie Liang

Date:2026-02-09 08:06:24

Generative models have substantially expanded video generation capabilities, yet practical thought-to-video creation remains a multi-stage, multi-modal, and decision-intensive process. However, existing tools either hide intermediate decisions behind repeated reruns or expose operator-level workflows that make exploration traces difficult to manage, compare, and reuse. We present T2VTree, a user-centered visual analytics approach for agent-assisted thought-to-video authoring. T2VTree represents the authoring process as a tree visualization. Each node in the tree binds an editable specification (intent, referenced inputs, workflow choice, prompts, and parameters) with the resulting multimodal outputs, making refinement, branching, and provenance inspection directly operable. To reduce the burden of deciding what to do next, a set of collaborating agents translates step-level intent into an executable plan that remains visible and user-editable before execution. We further implement a visual analytics system that integrates branching authoring with in-place preview and stitching for convergent assembly, enabling end-to-end multi-scene creation without leaving the authoring context. We demonstrate T2VTreeVA through two multi-scene case studies and a comparative user study, showing how the T2VTree visualization and editable agent planning support reliable refinement, localized comparison, and practical reuse in real authoring workflows. T2VTree is available at: https://github.com/tezuka0210/T2VTree.

WorldTravel: A Realistic Multimodal Travel-Planning Benchmark with Tightly Coupled Constraints

Authors:Zexuan Wang, Chenghao Yang, Yingqi Que, Zhenzhu Yang, Huaqing Yuan, Yiwen Wang, Zhengxuan Jiang, Shengjie Fang, Zhenhe Wu, Zhaohui Wang, Zhixin Yao, Jiashuo Liu, Jincheng Ren, Yuzhen Li, Yang Yang, Jiaheng Liu, Jian Yang, Zaiyuan Wang, Ge Zhang, Zhoufutu Wen, Wenhao Huang

Date:2026-02-09 08:03:30

Real-world autonomous planning requires coordinating tightly coupled constraints where a single decision dictates the feasibility of all subsequent actions. However, existing benchmarks predominantly feature loosely coupled constraints solvable through local greedy decisions and rely on idealized data, failing to capture the complexity of extracting parameters from dynamic web environments. We introduce \textbf{WorldTravel}, a benchmark comprising 150 real-world travel scenarios across 5 cities that demand navigating an average of 15+ interdependent temporal and logical constraints. To evaluate agents in realistic deployments, we develop \textbf{WorldTravel-Webscape}, a multi-modal environment featuring over 2,000 rendered webpages where agents must perceive constraint parameters directly from visual layouts to inform their planning. Our evaluation of 10 frontier models reveals a significant performance collapse: even the state-of-the-art GPT-5.2 achieves only 32.67\% feasibility in text-only settings, which plummets to 19.33\% in multi-modal environments. We identify a critical Perception-Action Gap and a Planning Horizon threshold at approximately 10 constraints where model reasoning consistently fails, suggesting that perception and reasoning remain independent bottlenecks. These findings underscore the need for next-generation agents that unify high-fidelity visual perception with long-horizon reasoning to handle brittle real-world logistics.

OPE: Overcoming Information Saturation in Parallel Thinking via Outline-Guided Path Exploration

Authors:Qi Guo, Jianing Wang, Deyang Kong, Xiangyu Xi, Jianfei Zhang, Yi Lu, Jingang Wang, Wei Wang, Shikun Zhang, Wei Ye

Date:2026-02-09 07:29:13

Parallel thinking has emerged as a new paradigm for large reasoning models (LRMs) in tackling complex problems. Recent methods leverage Reinforcement Learning (RL) to enhance parallel thinking, aiming to address the limitations in computational resources and effectiveness encountered with supervised fine-tuning. However, most existing studies primarily focus on optimizing the aggregation phase, with limited attention to the path exploration stage. In this paper, we theoretically analyze the optimization of parallel thinking under the Reinforcement Learning with Verifiable Rewards (RLVR) setting, and identify that the mutual information bottleneck among exploration paths fundamentally restricts overall performance. To address this, we propose Outline-Guided Path Exploration (OPE), which explicitly partitions the solution space by generating diverse reasoning outlines prior to parallel path reasoning, thereby reducing information redundancy and improving the diversity of information captured across exploration paths. We implement OPE with an iterative RL strategy that optimizes outline planning and outline-guided reasoning independently. Extensive experiments across multiple challenging mathematical benchmarks demonstrate that OPE effectively improves reasoning performance in different aggregation strategies, enabling LRMs to more reliably discover correct solutions.

Vec-QMDP: Vectorized POMDP Planning on CPUs for Real-Time Autonomous Driving

Authors:Xuanjin Jin, Yanxin Dong, Bin Sun, Huan Xu, Zhihui Hao, XianPeng Lang, Panpan Cai

Date:2026-02-09 07:15:19

Planning under uncertainty for real-world robotics tasks, such as autonomous driving, requires reasoning in enormous high-dimensional belief spaces, rendering the problem computationally intensive. While parallelization offers scalability, existing hybrid CPU-GPU solvers face critical bottlenecks due to host-device synchronization latency and branch divergence on SIMT architectures, limiting their utility for real-time planning and hindering real-robot deployment. We present Vec-QMDP, a CPU-native parallel planner that aligns POMDP search with modern CPUs' SIMD architecture, achieving $227\times$--$1073\times$ speedup over state-of-the-art serial planners. Vec-QMDP adopts a Data-Oriented Design (DOD), refactoring scattered, pointer-based data structures into contiguous, cache-efficient memory layouts. We further introduce a hierarchical parallelism scheme: distributing sub-trees across independent CPU cores and SIMD lanes, enabling fully vectorized tree expansion and collision checking. Efficiency is maximized with the help of UCB load balancing across trees and a vectorized STR-tree for coarse-level collision checking. Evaluated on large-scale autonomous driving benchmarks, Vec-QMDP achieves state-of-the-art planning performance with millisecond-level latency, establishing CPUs as a high-performance computing platform for large-scale planning under uncertainty.

Controlled Flight of an Insect-Scale Flapping-Wing Robot via Integrated Onboard Sensing and Computation

Authors:Yi-Hsuan Hsiao, Quang Phuc Kieu, Zhongtao Guan, Suhan Kim, Jiaze Cai, Owen Matteson, Jonathan P. How, Elizabeth Farrell Helbling, YuFeng Chen

Date:2026-02-09 07:03:53

Aerial insects can effortlessly navigate dense vegetation, whereas similarly sized aerial robots typically depend on offboard sensors and computation to maintain stable flight. This disparity restricts insect-scale robots to operation within motion capture environments, substantially limiting their applicability to tasks such as search-and-rescue and precision agriculture. In this work, we present a 1.29-gram aerial robot capable of hovering and tracking trajectories with solely onboard sensing and computation. The combination of a sensor suite, estimators, and a low-level controller achieved centimeter-scale positional flight accuracy. Additionally, we developed a hierarchical controller in which a human operator provides high-level commands to direct the robot's motion. In a 30-second flight experiment conducted outside a motion capture system, the robot avoided obstacles and ultimately landed on a sunflower. This level of sensing and computational autonomy represents a significant advancement for the aerial microrobotics community, further opening opportunities to explore onboard planning and power autonomy.

Personalized Autonomous Driving via Optimal Control with Clearance Constraints from Questionnaires

Authors:Yongjae Lim, Dabin Kim, H. Jin Kim

Date:2026-02-09 07:00:36

Driving without considering the preferred separation distance from surrounding vehicles may cause discomfort for users. To address this limitation, we propose a planning framework that explicitly incorporates user preferences regarding the desired level of safe clearance from surrounding vehicles. We design a questionnaire purposefully tailored to capture user preferences relevant to our framework, while minimizing unnecessary questions. Specifically, the questionnaire considers various interaction-relevant factors, including the surrounding vehicle's size, speed, position, and maneuvers of surrounding vehicles, as well as the maneuvers of the ego vehicle. The response indicates the user-preferred clearance for the scenario defined by the question and is incorporated as constraints in the optimal control problem. However, it is impractical to account for all possible scenarios that may arise in a driving environment within a single optimal control problem, as the resulting computational complexity renders real-time implementation infeasible. To overcome this limitation, we approximate the original problem by decomposing it into multiple subproblems, each dealing with one fixed scenario. We then solve these subproblems in parallel and select one using the cost function from the original problem. To validate our work, we conduct simulations using different user responses to the questionnaire. We assess how effectively our planner reflects user preferences compared to preference-agnostic baseline planners by measuring preference alignment.

Comparing Mixture, Box, and Wasserstein Ambiguity Sets in Distributionally Robust Asset Liability Management

Authors:Alireza Ghahtarani, Ahmed Saif, Alireza Ghasemi

Date:2026-02-09 03:03:51

Asset Liability Management (ALM) represents a fundamental challenge for financial institutions, particularly pension funds, which must navigate the tension between generating competitive investment returns and ensuring the solvency of long-term obligations. To address the limitations of traditional frameworks under uncertainty, this paper implements Distributionally Robust Optimization (DRO), an emergent paradigm that accounts for a broad spectrum of potential probability distributions. We propose and evaluate three distinct DRO formulations: mixture ambiguity sets with discrete scenarios, box ambiguity sets of discrete distribution functions, and Wasserstein metric ambiguity sets. Utilizing empirical data from the Canada Pension Plan (CPP), we conduct a comparative analysis of these models against traditional stochastic programming approaches. Our results demonstrate that DRO formulations, specifically those utilizing Wasserstein and box ambiguity sets, consistently outperform both mixture-based DRO and stochastic programming in terms of funding ratios and overall fund returns. These findings suggest that incorporating distributional robustness significantly enhances the resilience and performance of pension fund management strategies.

ByteHouse: A Cloud-Native OLAP Engine with Incremental Computation and Multi-Modal Retrieval

Authors:Yuxing Han, Yu Lin, Yifeng Dong, Xuanhe Zhou, Xindong Peng, Xinhui Tian, Zhiyuan You, Yingzhong Guo, Xi Chen, Weiping Qu, Tao Meng, Dayue Gao, Haoyu Wang, Liuxi Wei, Huanchen Zhang, Fan Wu

Date:2026-02-09 03:01:00

With the rapid rise of intelligent data services, modern enterprises increasingly require efficient, multimodal, and cost-effective data analytics infrastructures. However, in ByteDance's production environments, existing systems fall short due to limitations such as I/O-inefficient multimodal storage, inflexible query optimization (e.g., failing to optimize multimodal access patterns), and performance degradation caused by resource disaggregation (e.g., loss of data locality in remote storage). To address these challenges, we introduce ByteHouse (https://bytehouse.cloud), a cloud-native data warehouse designed for real-time multimodal data analytics. The storage layer integrates a unified table engine that provides a two-tier logical abstraction and physically consistent layout, SSD-backed cluster-scale cache (CrossCache) that supports shared caching across compute nodes, and virtual file system (NexusFS) that enable efficient local access on compute nodes. The compute layer supports analytical, batch, and incremental execution modes, with tailored optimizations for hybrid queries (e.g., runtime filtering over tiered vector indexes). The control layer coordinates global metadata and transactions, and features an effective optimizer enhanced by historical execution traces and AI-assisted plan selection. Evaluations on internal and standard workloads show that ByteHouse achieves significant efficiency improvement over existing systems.

Self-Supervised Bootstrapping of Action-Predictive Embodied Reasoning

Authors:Milan Ganai, Katie Luo, Jonas Frey, Clark Barrett, Marco Pavone

Date:2026-02-09 00:10:17

Embodied Chain-of-Thought (CoT) reasoning has significantly enhanced Vision-Language-Action (VLA) models, yet current methods rely on rigid templates to specify reasoning primitives (e.g., objects in the scene, high-level plans, structural affordances). These templates can force policies to process irrelevant information that distracts from critical action-prediction signals. This creates a bottleneck: without successful policies, we cannot verify reasoning quality; without quality reasoning, we cannot build robust policies. We introduce R&B-EnCoRe, which enables models to bootstrap embodied reasoning from internet-scale knowledge through self-supervised refinement. By treating reasoning as a latent variable within importance-weighted variational inference, models can generate and distill a refined reasoning training dataset of embodiment-specific strategies without external rewards, verifiers, or human annotation. We validate R&B-EnCoRe across manipulation (Franka Panda in simulation, WidowX in hardware), legged navigation (bipedal, wheeled, bicycle, quadruped), and autonomous driving embodiments using various VLA architectures with 1B, 4B, 7B, and 30B parameters. Our approach achieves 28% gains in manipulation success, 101% improvement in navigation scores, and 21% reduction in collision-rate metric over models that indiscriminately reason about all available primitives. R&B-EnCoRe enables models to distill reasoning that is predictive of successful control, bypassing manual annotation engineering while grounding internet-scale knowledge in physical execution.

From Stochastic Shocks to Macroscopic Tails: The Moyal Distribution as a Unified Framework for Epidemic Dynamics

Authors:Jose de Jesus Bernal-Alvarado, David Delepine

Date:2026-02-08 19:47:22

Traditional epidemiological models often fail to characterize the extreme volatility and heavy-tailed "Dragon King" events observed in real-world outbreaks. We propose a unified framework that bridges microscopic agent-based simulations with macroscopic wave decomposition using the Moyal probability density function. By treating viral transmission as a stochastic collision process, we derive a Moyal-Poisson mixture that describes secondary case distributions. Our model successfully recovers the extreme ``superspreading'' events in SARS, MERS, and COVID-19 data that standard Negative Binomial models systematically miss. Furthermore, we apply spectral decomposition to pandemic waves in Germany, demonstrating that the macroscopic "Social Friction" ($β$) is a direct emergent property of microscopic "Collision Shocks". This framework provides a useful descriptive tool for public health planning, emphasizing the need to manage extreme volatility rather than deterministic averages.

The CAPSARII Approach to Cyber-Secure Wearable, Ultra-Low-Power Networked Sensors for Soldier Health Monitoring

Authors:Luciano Bozzi, Christian Celidonio, Umberto Nuzzi, Massimo Biagini, Stefano Cherubin, Asbjørn Djupdal, Tor Andre Haugdahl, Andrea Aliverti, Alessandra Angelucci, Giovanni Agosta, Gerardo Pelosi, Paolo Belluco, Samuele Polistina, Riccardo Volpi, Luigi Malagò, Michael Schneider, Florian Wieczorek, Xabier Eguiluz

Date:2026-02-08 18:51:32

The European Defence Agency's revised Capability Development Plan (CDP) identifies as a priority improving ground combat capabilities by enhancing soldiers' equipment for better protection. The CAPSARII project proposes in innovative wearable system and Internet of Battlefield Things (IoBT) framework to monitor soldiers' physiological and psychological status, aiding tactical decisions and medical support. The CAPSARII system will enhance situational awareness and operational effectiveness by monitoring physiological, movement and environmental parameters, providing real-time tactical decision support through AI models deployed on edge nodes and enable data analysis and comparative studies via cloud-based analytics. CAPSARII also aims at improving usability through smart textile integration, longer battery life, reducing energy consumption through software and hardware optimizations, and address security concerns with efficient encryption and strong authentication methods. This innovative approach aims to transform military operations by providing a robust, data-driven decision support tool.

Picasso: Holistic Scene Reconstruction with Physics-Constrained Sampling

Authors:Xihang Yu, Rajat Talak, Lorenzo Shaikewitz, Luca Carlone

Date:2026-02-08 17:04:54

In the presence of occlusions and measurement noise, geometrically accurate scene reconstructions -- which fit the sensor data -- can still be physically incorrect. For instance, when estimating the poses and shapes of objects in the scene and importing the resulting estimates into a simulator, small errors might translate to implausible configurations including object interpenetration or unstable equilibrium. This makes it difficult to predict the dynamic behavior of the scene using a digital twin, an important step in simulation-based planning and control of contact-rich behaviors. In this paper, we posit that object pose and shape estimation requires reasoning holistically over the scene (instead of reasoning about each object in isolation), accounting for object interactions and physical plausibility. Towards this goal, our first contribution is Picasso, a physics-constrained reconstruction pipeline that builds multi-object scene reconstructions by considering geometry, non-penetration, and physics. Picasso relies on a fast rejection sampling method that reasons over multi-object interactions, leveraging an inferred object contact graph to guide samples. Second, we propose the Picasso dataset, a collection of 10 contact-rich real-world scenes with ground truth annotations, as well as a metric to quantify physical plausibility, which we open-source as part of our benchmark. Finally, we provide an extensive evaluation of Picasso on our newly introduced dataset and on the YCB-V dataset, and show it largely outperforms the state of the art while providing reconstructions that are both physically plausible and more aligned with human intuition.

SOLO: wide-field asteroid light curve monitoring system for SPHEREx

Authors:Bumhoo Lim, Seungwon Choi, Yoonsoo P. Bach, Masateru Ishiguro, Sunho Jin, Carey M. Lisse, Max Mahlke, Jooyeon Geem, Jinguk Seo, Sihu Ahn, Hangbin Jo

Date:2026-02-08 16:13:45

We present the Solar system Objects Light curve Observatory (SOLO), a wide-field, high-cadence optical survey system designed to obtain absolutely calibrated asteroid light curves, converted to the Gaia G-band photometric system, in support of the SPHEREx Solar System Object Catalog (SSOC). SOLO was installed at the Sierra Remote Observatories (SRO) in California, USA, in July 2025 and is optimized for continuous, multi-night monitoring of asteroid brightness variations. We describe the system configuration, remote operation, and data reduction pipeline, and evaluate its optical and photometric performance using commissioning data. SOLO achieves stable photometric calibration across the 11.6 deg^2 field of view and reaches a 10-sigma limiting magnitude of G ~ 17.5 for a 180 sec exposure. Sample asteroid light curves obtained over multiple nights demonstrate consistent absolute photometry at the same rotational phase, validating the estimated performance. Finally, we outline the planned operational use of SOLO in connection with NASA's SPHEREx mission. Full science operations of SOLO are scheduled to begin in January 2026. Using these data, we aim to obtain on the order of 10^3 absolutely calibrated asteroid light curves per year in the Gaia G-band, which will be used to support the construction and scientific utilization of the SPHEREx SSOC.

Feasibility-Guided Planning over Multi-Specialized Locomotion Policies

Authors:Ying-Sheng Luo, Lu-Ching Wang, Hanjaya Mandala, Yu-Lun Chou, Guilherme Christmann, Yu-Chung Chen, Yung-Shun Chan, Chun-Yi Lee, Wei-Chao Chen

Date:2026-02-08 11:58:50

Planning over unstructured terrain presents a significant challenge in the field of legged robotics. Although recent works in reinforcement learning have yielded various locomotion strategies, planning over multiple experts remains a complex issue. Existing approaches encounter several constraints: traditional planners are unable to integrate skill-specific policies, whereas hierarchical learning frameworks often lose interpretability and require retraining whenever new policies are added. In this paper, we propose a feasibility-guided planning framework that successfully incorporates multiple terrain-specific policies. Each policy is paired with a Feasibility-Net, which learned to predict feasibility tensors based on the local elevation maps and task vectors. This integration allows classical planning algorithms to derive optimal paths. Through both simulated and real-world experiments, we demonstrate that our method efficiently generates reliable plans across diverse and challenging terrains, while consistently aligning with the capabilities of the underlying policies.

Optimized Human-Robot Co-Dispatch Planning for Petro-Site Surveillance under Varying Criticalities

Authors:Nur Ahmad Khatim, Mansur Arief

Date:2026-02-08 11:46:40

Securing petroleum infrastructure requires balancing autonomous system efficiency with human judgment for threat escalation, a challenge unaddressed by classical facility location models assuming homogeneous resources. This paper formulates the Human-Robot Co-Dispatch Facility Location Problem (HRCD-FLP), a capacitated facility location variant incorporating tiered infrastructure criticality, human-robot supervision ratio constraints, and minimum utilization requirements. We evaluate command center selection across three technology maturity scenarios. Results show transitioning from conservative (1:3 human-robot supervision) to future autonomous operations (1:10) yields significant cost reduction while maintaining complete critical infrastructure coverage. For small problems, exact methods dominate in both cost and computation time; for larger problems, the proposed heuristic achieves feasible solutions in under 3 minutes with approximately 14% optimality gap where comparison is possible. From systems perspective, our work demonstrate that optimized planning for human-robot teaming is key to achieve both cost-effective and mission-reliable deployments.

Multi-Agent Route Planning as a QUBO Problem

Authors:Renáta Rusnáková, Martin Chovanec, Juraj Gazda

Date:2026-02-08 11:18:45

Multi-Agent Route Planning considers selecting vehicles, each associated with a single predefined route, such that the spatial coverage of a road network is increased while redundant overlaps are limited. This paper gives a formal problem definition, proves NP-hardness by reduction from the Weighted Set Packing problem, and derives a Quadratic Unconstrained Binary Optimization formulation whose coefficients directly encode unique coverage rewards and pairwise overlap penalties. A single penalty parameter controls the coverage-overlap trade-off. We distinguish between a soft regime, which supports multi-objective exploration, and a hard regime, in which the penalty is strong enough to effectively enforce near-disjoint routes. We describe a practical pipeline for generating city instances, constructing candidate routes, building the QUBO matrix, and solving it with an exact mixed-integer solver (Gurobi), simulated annealing, and D-Wave hybrid quantum annealing. Experiments on Barcelona instances with up to 10 000 vehicles reveal a clear coverage-overlap knee and show that Pareto-optimal solutions are mainly obtained under the hard-penalty regime, while D-Wave hybrid solvers and Gurobi achieve essentially identical objective values with only minor differences in runtime as problem size grows.

Dynamic Load Model for Data Centers with Pattern-Consistent Calibration

Authors:Siyu Lu, Chenhan Xiao, Yang Weng

Date:2026-02-08 08:15:00

The rapid growth of data centers has made large electronic load (LEL) modeling increasingly important for power system analysis. Such loads are characterized by fast workload-driven variability and protection-driven disconnection and reconnection behavior that are not captured by conventional load models. Existing data center load modeling includes physics-based approaches, which provide interpretable structure for grid simulation, and data-driven approaches, which capture empirical workload variability from data. However, physics-based models are typically uncalibrated to facility-level operation, while trajectory alignment in data-driven methods often leads to overfitting and unrealistic dynamic behavior. To resolve these limitations, we design the framework to leverage both physics-based structure and data-driven adaptability. The physics-based structure is parameterized to enable data-driven pattern-consistent calibration from real operational data, supporting facility-level grid planning. We further show that trajectory-level alignment is limited for inherently stochastic data center loads. Therefore, we design the calibration to align temporal and statistical patterns using temporal contrastive learning (TCL). This calibration is performed locally at the facility, and only calibrated parameters are shared with utilities, preserving data privacy. The proposed load model is calibrated by real-world operational load data from the MIT Supercloud, ASU Sol, Blue Waters, and ASHRAE datasets. Then it is integrated into the ANDES platform and evaluated on the IEEE 39-bus, NPCC 140-bus, and WECC 179-bus systems. We find that interactions among LELs can fundamentally alter post-disturbance recovery behavior, producing compound disconnection-reconnection dynamics and delayed stabilization that are not captured by uncalibrated load models.

TodoEvolve: Learning to Architect Agent Planning Systems

Authors:Jiaxi Liu, Yanzuo Jiang, Guibin Zhang, Zihan Zhang, Heng Chang, Zhenfei Yin, Qibing Ren, Junchi Yan

Date:2026-02-08 06:37:01

Planning has become a central capability for contemporary agent systems in navigating complex, long-horizon tasks, yet existing approaches predominantly rely on fixed, hand-crafted planning structures that lack the flexibility to adapt to the structural diversity of open-ended problems. To address this limitation, we introduce TodoEvolve, a meta-planning paradigm that autonomously synthesizes and dynamically revises task-specific planning architectures. Specifically, we first construct PlanFactory, a modular design space that standardizes diverse planning paradigms within a unified codebase encompassing topology, initialization, adaptation, and navigation, thereby providing a common interface for heterogeneous planning patterns. Leveraging PlanFactory, we collect high-quality planning trajectories and train Todo-14B via \textit{Impedance-Guided Preference Optimization} (IGPO), a multi-objective reinforcement learning objective that encourages the generation of planning systems that are performant, stable, and token-efficient across arbitrary tasks and agent backbones. Empirical evaluations on five agentic benchmarks demonstrate that TodoEvolve consistently surpasses carefully engineered planning modules while maintaining economical API costs and runtime overhead.