planning - 2026-02-17

Kalman Filtering Based Flight Management System Modeling for AAM Aircraft

Authors: Balram Kandoria, Aryaman Singh Samyal
Date: 2026-02-16 17:29:05

Advanced Aerial Mobility (AAM) operations require strategic flight planning services that predict both spatial and temporal uncertainties to safely validate flight plans against hazards such as weather cells, restricted airspaces, and CNS disruption areas. Current uncertainty estimation methods for AAM vehicles rely on conservative linear models due to limited real-world performance data. This paper presents a novel Kalman Filter-based uncertainty propagation method that models AAM Flight Management System (FMS) architectures through sigmoid-blended measurement noise covariance. Unlike existing approaches with fixed uncertainty thresholds, our method continuously adapts the filter's measurement trust based on progress toward waypoints, enabling FMS correction behavior to emerge naturally. The approach scales proportionally with control inputs and is tunable to match specific aircraft characteristics or route conditions. We validate the method using real ADS-B data from general aviation aircraft divided into training and verification sets. Uncertainty propagation parameters were tuned on the training set, achieving 76% accuracy in predicting arrival times when compared against the verification dataset, demonstrating the method's effectiveness for strategic flight plan validation in AAM operations.
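The sigmoid-blended measurement noise covariance described above can be illustrated with a minimal sketch. All specifics here are illustrative assumptions, not the paper's actual FMS model: a 1-D constant-velocity Kalman filter, and made-up `R_far`/`R_near` values blended by progress toward the waypoint so that the filter trusts measurements more as the aircraft nears its target.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def blended_R(progress, R_far=25.0, R_near=1.0, center=0.5, steepness=10.0):
    """Blend measurement-noise variance by progress toward the waypoint (0..1).

    Hypothetical values: far from the waypoint measurements are trusted less
    (large R), close to it they are trusted more (small R).
    """
    w = sigmoid(steepness * (progress - center))
    return (1.0 - w) * R_far + w * R_near

def kf_step(x, P, z, progress, dt=1.0, q=0.01):
    """One predict/update step of a 1-D constant-velocity Kalman filter
    whose measurement noise R depends on waypoint progress."""
    F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition (pos, vel)
    H = np.array([[1.0, 0.0]])              # observe position only
    Q = q * np.eye(2)
    R = np.array([[blended_R(progress)]])
    # predict
    x = F @ x
    P = F @ P @ F.T + Q
    # update
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(2) - K @ H) @ P
    return x, P
```

With this blend, correction behavior emerges gradually: the Kalman gain grows as progress increases, pulling the state estimate toward observed positions near the waypoint.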

Gauge-independent gravitational waves from a minimal dark $U(1)$ sector with viable dark matter candidates

Authors: Wan-Zhe Feng, Zi-Hui Zhang
Date: 2026-02-16 16:00:00

Searches for stochastic gravitational wave backgrounds from first-order phase transitions offer a powerful probe of hidden sectors, but quantitative predictions in gauge theories are obstructed by the gauge dependence of the finite-temperature effective potential and the associated tunneling action. We study a minimal gauged $U(1)$ dark sector containing a dark Higgs and a dark photon, optionally supplemented by a vectorlike dark fermion, coupled to the Standard Model through Higgs portal or kinetic mixing. Using the Nielsen identity together with a controlled derivative expansion and power counting, we construct a gauge-independent effective action in the high- and low-temperature limits, enabling model-intrinsic nucleation dynamics and robust gravitational wave predictions. We perform dedicated Monte Carlo scans in both limits and map viable microscopic parameters to detector-facing peak frequencies and amplitudes, spanning bands relevant to pulsar timing arrays and planned space-based interferometers. In our scans, supercooled transitions typically produce the strongest signals, whereas parametrically high-temperature transitions are comparatively rare and tend to be weak. We further connect the phase transition phenomenology to viable dark matter candidates within the same minimal field content, providing benchmark targets for dark photon dark matter and dark fermion dark matter, and highlighting their complementarity with gravitational wave observables. Overall, our results provide an end-to-end, gauge-independent pipeline from a minimal hidden sector Lagrangian to gravitational wave spectra and cosmologically viable dark matter benchmarks, yielding the most reliable and concrete predictions to date for a minimal gauged $U(1)$ dark sector.

Evolutionarily Primitive Social Entities

Authors: Angelica Kaufmann
Date: 2026-02-16 15:40:03

Social entities exist only in virtue of collective acceptance, recognition, or acknowledgement by two or more individuals in the context of joint activities. Joint activities are made possible by the coordination of plans for action, and the coordination of plans for action is made possible by the capacity for collective intentionality. This paper investigates how primitive the capacity of nonhuman animals to create social entities is, by individuating how primitive the capacity for collective intentionality is. I present a novel argument for the evolutionary primitiveness of social entities, showing that the collective intentions upon which these social entities are created and shared are metaphysically reducible to the relevant individual intentions.

Excavation of a 69-m diameter and 94-m high cavern for the Hyper-Kamiokande detector

Authors: Y. Asaoka, H. Tanaka, S. Nakayama, K. Abe, K. Ishita, S. Moriyama, M. Shiozawa, K. Horinokuchi, C. Miura, Y. Suzuki, H. Morioka, D. Inagaki, H. Kurose, T. Suido, T. Kobuchi, M. Tobita, M. Utsuno
Date: 2026-02-16 15:02:11

The excavation of the Hyper-Kamiokande cavern, 600 m underground, is complete. Measuring 69 m in diameter and 94 m in height, it is among the world's largest rock caverns. A vertically oriented, dome-capped cylindrical design was chosen to optimize cost and performance. Combined with substantial overburden, the geometry posed major engineering challenges. This paper outlines the underground works, main cavern design, excavation plan, and the evolution of support design and construction methods during excavation, namely the information-based (observational) design and construction approach.

Scalable Multi-Robot Path Planning via Quadratic Unconstrained Binary Optimization

Authors: Javier González Villasmil
Date: 2026-02-16 14:50:04

Multi-Agent Path Finding (MAPF) remains a fundamental challenge in robotics, where classical centralized approaches exhibit exponential growth in joint-state complexity as the number of agents increases. This paper investigates Quadratic Unconstrained Binary Optimization (QUBO) as a structurally scalable alternative for simultaneous multi-robot path planning. The approach is a robotics-oriented QUBO formulation that incorporates BFS-based logical pre-processing (achieving over 95% variable reduction), adaptive penalty design for collision and constraint enforcement, and a time-windowed decomposition strategy that enables execution within current hardware limitations. An experimental evaluation in grid environments with up to four robots demonstrated near-optimal solutions in dense scenarios and favorable scaling behavior compared to sequential classical planning. These results establish a practical and reproducible baseline for future quantum and quantum-inspired multi-robot coordination.
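The penalty-based constraint encoding that QUBO formulations like this rely on can be sketched in a few lines. This is a generic illustration of turning a one-hot constraint (each robot occupies exactly one cell per time step) into a QUBO matrix, not the paper's actual formulation; the weight and the brute-force check are illustrative.

```python
import itertools
import numpy as np

def one_hot_penalty_qubo(n, weight=1.0):
    """QUBO matrix Q for weight * (sum_i x_i - 1)^2, dropping the constant.

    Expanding with x_i^2 = x_i for binaries gives
    -sum_i x_i + 2 * sum_{i<j} x_i x_j (+ constant 1),
    so the diagonal is -weight and each upper off-diagonal entry is 2*weight.
    """
    Q = np.full((n, n), 2.0 * weight)
    np.fill_diagonal(Q, -1.0 * weight)
    return np.triu(Q)  # upper-triangular QUBO convention

def qubo_energy(Q, x):
    x = np.asarray(x, dtype=float)
    return float(x @ Q @ x)

# Exhaustive check on 3 binaries: energy is minimized (at -weight) exactly
# on the one-hot assignments, so the constraint holds at the optimum.
Q = one_hot_penalty_qubo(3)
best = min(itertools.product([0, 1], repeat=3), key=lambda x: qubo_energy(Q, x))
```

Collision avoidance can be encoded the same way, e.g. adding a positive penalty `Q[i, j]` for every pair of variables that would place two robots in the same cell at the same time.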

Qute: Towards Quantum-Native Database

Authors: Muzhi Chen, Xuanhe Zhou, Wei Zhou, Bangrui Xu, Surui Tang, Guoliang Li, Bingsheng He, Yeye He, Yitong Song, Fan Wu
Date: 2026-02-16 12:39:46

This paper envisions a quantum database (Qute) that treats quantum computation as a first-class execution option. Unlike prior simulation-based methods that either run quantum algorithms on classical machines or adapt existing databases for quantum simulation, Qute instead (i) compiles an extended form of SQL into gate-efficient quantum circuits, (ii) employs a hybrid optimizer to dynamically select between quantum and classical execution plans, (iii) introduces selective quantum indexing, and (iv) designs fidelity-preserving storage to mitigate current qubit constraints. We also present a three-stage evolution roadmap toward a quantum-native database. Finally, by deploying Qute on a real quantum processor (origin_wukong), we show that it outperforms a classical baseline at scale, and we release an open-source prototype at https://github.com/weAIDB/Qute.

Removing Planner Bias in Goal Recognition Through Multi-Plan Dataset Generation

Authors: Mustafa F. Abdelwahed, Felipe Meneguzzi, Kin Max Piamolini Gusmao, Joan Espasa
Date: 2026-02-16 12:25:35

Autonomous agents require some form of goal and plan recognition to interact in multiagent settings. Unfortunately, all existing goal recognition datasets suffer from a systematic bias induced by the planning systems that generated them, namely heuristic-based forward search. This means that existing datasets lack enough challenge for more realistic scenarios (e.g., agents using different planners), which impacts the evaluation of goal recognisers with respect to using different planners for the same goal. In this paper, we propose a new method that uses top-k planning to generate multiple, distinct plans for the same goal hypothesis, yielding benchmarks that mitigate the bias found in current datasets. This allows us to introduce a new metric called Version Coverage Score (VCS) to measure the resilience of the goal recogniser when inferring a goal based on different sets of plans. Our results show that the resilience of the current state-of-the-art goal recogniser degrades substantially under low observability settings.

GREAT-EER: Graph Edge Attention Network for Emergency Evacuation Responses

Authors: Attila Lischka, Balázs Kulcsár
Date: 2026-02-16 12:04:14

Emergency situations that require the evacuation of urban areas can arise from man-made causes (e.g., terrorist attacks or industrial accidents) or natural disasters, the latter becoming more frequent due to climate change. As a result, effective and fast methods to develop evacuation plans are of great importance. In this work, we identify and propose the Bus Evacuation Orienteering Problem (BEOP), an NP-hard combinatorial optimization problem with the goal of evacuating as many people as possible from an affected area by bus within a short, predefined amount of time. The purpose of bus-based evacuation is to reduce the congestion and disorder that arise in purely car-focused evacuation scenarios. To solve the BEOP, we propose a deep reinforcement learning-based method utilizing graph learning, which, once trained, achieves fast inference speed and is able to create evacuation routes in fractions of a second. We can bound the gap of our evacuation plans using an MILP formulation. To validate our method, we create evacuation scenarios for San Francisco using real-world road networks and travel times. We show that we achieve near-optimal solution quality and are further able to investigate how many evacuation vehicles are necessary to achieve certain bus-based evacuation quotas given a predefined evacuation time, while keeping run time adequate.

Before the Vicious Cycle Starts: Preventing Burnout Across SOC Roles Through Flow-Aligned Design

Authors: Kashyap Thimmaraju, Duc Anh Hoang, Souradip Nath, Jaron Mink, Gail-Joon Ahn
Date: 2026-02-16 10:00:06

The sustainability of Security Operations Centers depends on their people, yet 71% of practitioners report burnout and 24% plan to exit cybersecurity entirely. Flow theory suggests that when job demands misalign with practitioner capabilities, work becomes overwhelming or tedious rather than engaging. Achieving challenge-skill balance begins at hiring: if job descriptions inaccurately portray requirements, organizations risk recruiting underskilled practitioners who face anxiety or overskilled ones who experience boredom. Yet we lack empirical understanding of what current SOC job descriptions actually specify. We analyzed 106 public SOC job postings from November to December 2024 across 35 organizations in 11 countries, covering Analysts (n=17), Incident Responders (n=38), Threat Hunters (n=39), and SOC Managers (n=12). Using Inductive Content Analysis, we coded certifications, technical skills, soft skills, tasks, and experience requirements. Three patterns emerged: (1) Communication skills dominate (50.9% of postings), exceeding SIEM tools (18.9%) or programming (30.2%), suggesting organizations prioritize collaboration over technical capabilities. (2) Certification expectations vary widely: CISSP leads (22.6%), but 43 distinct credentials appear with no universal standard. (3) Technical requirements show consensus: Python dominates programming (27.4%), Splunk leads SIEM platforms (14.2%), and ISO 27001 (13.2%) and NIST (10.4%) are the most cited standards. These findings enable organizations to audit job descriptions against empirical baselines, help practitioners identify valued certifications and skills, and allow researchers to validate whether stated requirements align with actual demands. This establishes the foundation for flow-aligned interview protocols and investigation of how AI reshapes requirements. Dataset and codebook: https://git.tu-berlin.de/wosoc-2026/soc-jd-analysis.

MATEO: A Multimodal Benchmark for Temporal Reasoning and Planning in LVLMs

Authors: Gabriel Roccabruna, Olha Khomyn, Giuseppe Riccardi
Date: 2026-02-16 09:41:50

AI agents need to plan to achieve complex goals that involve orchestrating perception, sub-goal decomposition, and execution. These plans consist of ordered steps structured according to a Temporal Execution Order (TEO), a directed acyclic graph that ensures each step executes only after its preconditions are satisfied. Existing research on foundation models' understanding of temporal execution is limited to automatically derived annotations, approximations of the TEO as a linear chain, or text-only inputs. To address this gap, we introduce MATEO (MultimodAl Temporal Execution Order), a benchmark designed to assess and improve the temporal reasoning abilities of Large Vision Language Models (LVLMs) required for real-world planning. We acquire a high-quality professional multimodal recipe corpus, authored through a standardized editorial process that decomposes instructions into discrete steps, each paired with corresponding images. We collect TEO annotations as graphs by designing and using a scalable crowdsourcing pipeline. Using MATEO, we evaluate six state-of-the-art LVLMs across model scales, varying language context, multimodal input structure, and fine-tuning strategies.
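The TEO notion above, where a step may execute only after its preconditions, is just a partial-order check over a DAG. A minimal sketch (the recipe steps and precondition sets are made-up examples, not from the MATEO corpus):

```python
def respects_teo(order, preconditions):
    """Check that a linear step ordering satisfies a TEO.

    `preconditions` maps each step to the set of steps that must run
    before it (the incoming edges of the DAG); steps with no entry
    have no preconditions.
    """
    seen = set()
    for step in order:
        if not preconditions.get(step, set()) <= seen:
            return False  # a precondition has not executed yet
        seen.add(step)
    return True

# toy recipe TEO: both "chop" and "boil" must precede "combine"
teo = {"combine": {"chop", "boil"}, "serve": {"combine"}}
```

A linear-chain approximation of the TEO, as criticized in the abstract, would over-constrain this example: "chop" and "boil" are actually unordered with respect to each other, and both orderings are valid.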

Replanning Human-Robot Collaborative Tasks with Vision-Language Models via Semantic and Physical Dual-Correction

Authors: Taichi Kato, Takuya Kiyokawa, Namiko Saito, Kensuke Harada
Date: 2026-02-16 08:24:19

Human-Robot Collaboration (HRC) plays an important role in assembly tasks by enabling robots to plan and adjust their motions based on interactive, real-time human instructions. However, such instructions are often linguistically ambiguous and underspecified, making it difficult to generate physically feasible and cooperative robot behaviors. To address this challenge, many studies have applied Vision-Language Models (VLMs) to interpret high-level instructions and generate corresponding actions. Nevertheless, VLM-based approaches still suffer from hallucinated reasoning and an inability to anticipate physical execution failures. To address these challenges, we propose an HRC framework that augments VLM-based reasoning with a dual-correction mechanism: an internal correction model that verifies logical consistency and task feasibility prior to action execution, and an external correction model that detects and rectifies physical failures through post-execution feedback. Simulation ablation studies demonstrate that the proposed method improves the success rate compared to baselines without correction models. Our real-world experiments in collaborative assembly tasks supported by object fixation or tool preparation by an upper body humanoid robot further confirm the framework's effectiveness in enabling interactive replanning across different collaborative tasks in response to human instructions, validating its practical feasibility.

Multimodal Covariance Steering in Belief Space with Active Probing and Influence for Autonomous Driving

Authors: Devodita Chakravarty, John Dolan, Yiwei Lyu
Date: 2026-02-16 08:04:16

Autonomous driving in complex traffic requires reasoning under uncertainty. Common approaches rely on prediction-based planning or risk-aware control, but these are typically treated in isolation, limiting their ability to capture the coupled nature of action and inference in interactive settings. This gap becomes especially critical in uncertain scenarios, where simply reacting to predictions can lead to unsafe maneuvers or overly conservative behavior. Our central insight is that safe interaction requires not only estimating human behavior but also shaping it when ambiguity poses risks. To this end, we introduce a hierarchical belief model that structures human behavior across coarse discrete intents and fine motion modes, updated via Bayesian inference for interpretable multi-resolution reasoning. On top of this, we develop an active probing strategy that identifies when multimodal ambiguity in human predictions may compromise safety and plans disambiguating actions that both reveal intent and gently steer human decisions toward safer outcomes. Finally, a runtime risk-evaluation layer based on Conditional Value-at-Risk (CVaR) ensures that all probing actions remain within human risk tolerance during influence. Our simulations in lane-merging and unsignaled intersection scenarios demonstrate that our approach achieves higher success rates and shorter completion times compared to existing methods. These results highlight the benefit of coupling belief inference, probing, and risk monitoring, yielding a principled and interpretable framework for planning under uncertainty.
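The CVaR risk layer mentioned above is a standard tail-risk measure: at level alpha, it is the expected cost over the worst (1 - alpha) fraction of outcomes. A minimal empirical sketch (how the paper computes it over its belief model is not specified here; this is the generic sample-based estimator):

```python
import numpy as np

def cvar(costs, alpha=0.95):
    """Empirical Conditional Value-at-Risk of a cost sample.

    Returns the mean of the costs at or above the alpha-quantile
    (the Value-at-Risk), i.e. the expected cost in the worst tail.
    """
    costs = np.sort(np.asarray(costs, dtype=float))
    var = np.quantile(costs, alpha)   # Value-at-Risk threshold
    tail = costs[costs >= var]
    return float(tail.mean())
```

A probing action could then be admitted only if `cvar(predicted_costs, alpha)` stays below a human risk-tolerance threshold, which is the kind of gate the runtime risk-evaluation layer describes.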

When Security Meets Usability: An Empirical Investigation of Post-Quantum Cryptography APIs

Authors: Marthin Toruan, R. D. N. Shakya, Samuel Tseitkin, Raymond K. Zhao, Nalin Arachchilage
Date: 2026-02-16 07:59:46

Advances in quantum computing increasingly threaten the security and privacy of data protected by current cryptosystems, particularly those relying on public-key cryptography. In response, the international cybersecurity community has prioritized the implementation of Post-Quantum Cryptography (PQC), a new cryptographic standard designed to resist quantum attacks while operating on classical computers. The National Institute of Standards and Technology (NIST) has already standardized several PQC algorithms and plans to deprecate classical asymmetric schemes, such as RSA and ECDSA, by 2035. Despite this urgency, PQC adoption remains slow, often due to limited developer expertise. Application Programming Interfaces (APIs) are intended to bridge this gap, yet prior research on classical security APIs demonstrates that poor usability of cryptographic APIs can lead developers to introduce vulnerabilities during application implementation, a risk amplified by the novelty and complexity of PQC. To date, the usability of PQC APIs has not been systematically studied. This research presents an empirical evaluation of the usability of PQC APIs, observing how developers interact with APIs and documentation during software development tasks. The study identifies cognitive factors that influence developers' performance when working with PQC primitives with minimal onboarding. The findings highlight opportunities across the PQC ecosystem to improve developer-facing guidance, terminology alignment, and workflow examples to better support non-specialists.

MoRL: Reinforced Reasoning for Unified Motion Understanding and Generation

Authors: Hongpeng Wang, Zeyu Zhang, Wenhao Li, Hao Tang
Date: 2026-02-16 07:42:45

Human motion understanding and generation are crucial for vision and robotics but remain limited in reasoning capability and test-time planning. We propose MoRL, a unified multimodal motion model trained with supervised fine-tuning and reinforcement learning with verifiable rewards. Our task-specific reward design combines semantic alignment and reasoning coherence for understanding with physical plausibility and text-motion consistency for generation, improving both logical reasoning and perceptual realism. To further enhance inference, we introduce Chain-of-Motion (CoM), a test-time reasoning method that enables step-by-step planning and reflection. We also construct two large-scale CoT datasets, MoUnd-CoT-140K and MoGen-CoT-140K, to align motion sequences with reasoning traces and action descriptions. Experiments on HumanML3D and KIT-ML show that MoRL achieves significant gains over state-of-the-art baselines. Code: https://github.com/AIGeeksGroup/MoRL. Website: https://aigeeksgroup.github.io/MoRL.

TWISTED-RL: Hierarchical Skilled Agents for Knot-Tying without Human Demonstrations

Authors: Guy Freund, Tom Jurgenson, Matan Sudry, Erez Karpas
Date: 2026-02-16 07:21:02

Robotic knot-tying represents a fundamental challenge in robotics due to the complex interactions between deformable objects and strict topological constraints. We present TWISTED-RL, a framework that improves upon the previous state-of-the-art in demonstration-free knot-tying (TWISTED), which smartly decomposed a single knot-tying problem into manageable subproblems, each addressed by a specialized agent. Our approach replaces TWISTED's single-step inverse model that was learned via supervised learning with a multi-step Reinforcement Learning policy conditioned on abstract topological actions rather than goal states. This change allows more delicate topological state transitions while avoiding costly and ineffective data collection protocols, thus enabling better generalization across diverse knot configurations. Experimental results demonstrate that TWISTED-RL manages to solve previously unattainable knots of higher complexity, including commonly used knots such as the Figure-8 and the Overhand. Furthermore, the increase in success rates and the drop in planning time establish TWISTED-RL as the new state-of-the-art in robotic knot-tying without human demonstrations.

Measuring and Mitigating Post-hoc Rationalization in Reverse Chain-of-Thought Generation

Authors: Guangyue Peng, Zongchao Chen, Wen Luo, Yuntao Wen, Wei Li, Ruixiang Feng, Ran Le, Chen Yang, Zhenwei An, Yang Song, Tao Zhang, Houfeng Wang
Date: 2026-02-16 05:13:06

Reverse Chain-of-Thought Generation (RCG) synthesizes reasoning traces from query-answer pairs, but runs the risk of producing post-hoc rationalizations: when models can see the answer during generation, the answer serves as a cognitive anchor that shapes the entire explanation. We formalize this phenomenon through a three-level measurement hierarchy: lexical, entropic, and probabilistic anchoring, which capture surface artifacts, entropy dynamics, and latent answer dependence, respectively. We analyze semantic suppression, the intuitive mitigation strategy that instructs models to ignore the answer, and find that it is counterproductive: while it reduces lexical overlap, it paradoxically increases entropic and probabilistic anchoring. Drawing on Ironic Process Theory from cognitive psychology, we attribute this failure to active monitoring of the forbidden answer, which inadvertently deepens dependence on it. To break this cycle, we propose Structural Skeleton-guided Reasoning (SSR), a two-phase approach that first generates an answer-invariant functional skeleton structure, then uses this skeleton to guide full trace generation. By redirecting the information flow to structural planning rather than answer monitoring, SSR consistently reduces anchoring across all three levels. We further introduce Distilled SSR (SSR-D), which fine-tunes models on teacher-generated SSR traces to ensure reliable structural adherence. Experiments across open-ended reasoning benchmarks demonstrate that SSR-D achieves up to 10% improvement over suppression baselines while preserving out-of-distribution (OOD) generalization.

Accelerating iterative linear equation solver using modified domain-wall fermion matrix in lattice QCD simulations

Authors: Wei-Lun Chen, Issaku Kanamori, Hideo Matsufuru, Hartmut Neff
Date: 2026-02-16 04:25:56

Lattice simulations of Quantum Chromodynamics (QCD) enable one to calculate the low-energy properties of the strong interaction among quarks and gluons from first principles. The most time-consuming part of the numerical simulations of lattice QCD is typically solving a linear equation for the quark matrix. In particular, a discretized quark formulation called the domain-wall fermion operator requires a high numerical cost, while retaining the lattice version of the chiral symmetry to good precision. The domain-wall operator is defined on a five-dimensional (5D) space extending the four-dimensional (4D) spacetime with an extra fifth coordinate. After solving the linear equation in 5D space, the result vector is projected onto the original 4D space. There is a variant of the domain-wall operator that improves the convergence of the 5D linear equation while leaving the 4D solution vector unchanged. In this paper, we examine how this variant of the domain-wall operator accelerates the iterative linear equation solver in practical setups. We also measure the eigenvalues of the operator and compare the condition number with the convergence of the solver. We use the general-purpose lattice QCD code set Bridge++, whose planned release includes the improved form of the domain-wall operator examined in this work, together with GPU code.
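The connection between condition number and solver convergence that the abstract measures can be demonstrated on a toy problem. This sketch has nothing to do with the domain-wall operator itself: it uses small random symmetric positive-definite matrices and a plain conjugate gradient solver to show that a larger condition number costs more iterations, which is the general mechanism an improved operator variant exploits.

```python
import numpy as np

def cg(A, b, tol=1e-10, max_iter=10000):
    """Plain conjugate gradient for SPD A; returns (solution, iterations)."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            return x, k
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x, max_iter

def spd_with_condition(n, kappa, seed=0):
    """Random SPD matrix with eigenvalues spread between 1 and kappa."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    eigs = np.linspace(1.0, kappa, n)
    return Q @ np.diag(eigs) @ Q.T

b = np.ones(50)
_, it_good = cg(spd_with_condition(50, 10.0), b)    # well conditioned
_, it_bad = cg(spd_with_condition(50, 1e4), b)      # badly conditioned
```

The well-conditioned system converges in far fewer iterations, mirroring why reducing the effective condition number of the 5D operator accelerates the solver.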

AXE: An Agentic eXploit Engine for Confirming Zero-Day Vulnerability Reports

Authors: Amirali Sajadi, Tu Nguyen, Kostadin Damevski, Preetha Chatterjee
Date: 2026-02-15 23:25:14

Vulnerability detection tools are widely adopted in software projects, yet they often overwhelm maintainers with false positives and non-actionable reports. Automated exploitation systems can help validate these reports; however, existing approaches typically operate in isolation from detection pipelines, failing to leverage readily available metadata such as vulnerability type and source-code location. In this paper, we investigate how reported security vulnerabilities can be assessed in a realistic grey-box exploitation setting that leverages minimal vulnerability metadata, specifically a CWE classification and a vulnerable code location. We introduce Agentic eXploit Engine (AXE), a multi-agent framework for Web application exploitation that maps lightweight detection metadata to concrete exploits through decoupled planning, code exploration, and dynamic execution feedback. Evaluated on the CVE-Bench dataset, AXE achieves a 30% exploitation success rate, a 3x improvement over state-of-the-art black-box baselines. Even in a single-agent configuration, grey-box metadata yields a 1.75x performance gain. Systematic error analysis shows that most failed attempts arise from specific reasoning gaps, including misinterpreted vulnerability semantics and unmet execution preconditions. For successful exploits, AXE produces actionable, reproducible proof-of-concept artifacts, demonstrating its utility in streamlining Web vulnerability triage and remediation. We further evaluate AXE's generalizability through a case study on a recent real-world vulnerability not included in CVE-Bench.

LongCLI-Bench: A Preliminary Benchmark and Study for Long-horizon Agentic Programming in Command-Line Interfaces

Authors: Yukang Feng, Jianwen Sun, Zelai Yang, Jiaxin Ai, Chuanhao Li, Zizhen Li, Fanrui Zhang, Kang He, Rui Ma, Jifan Lin, Jie Sun, Yang Xiao, Sizhuo Zhou, Wenxiao Wu, Yiming Liu, Pengfei Liu, Yu Qiao, Shenglin Zhang, Kaipeng Zhang
Date: 2026-02-15 23:12:57

Recent advances in AI-assisted programming have empowered agents to execute complex workflows via command-line interfaces. However, existing benchmarks are limited by short task horizons, data contamination from GitHub scraping, and a lack of fine-grained evaluation metrics, and thus fail to rigorously evaluate the long-horizon planning and execution capabilities essential for realistic software engineering. To address these gaps, we introduce LongCLI-Bench, a comprehensive benchmark designed to evaluate agentic capabilities across long-horizon, realistic tasks. We curated 20 high-quality, long-horizon tasks from over 1,000 computer science assignments and real-world workflows, covering four engineering categories: from scratch, feature addition, bug fixing, and refactoring. We propose a dual-set testing protocol for LongCLI-Bench, which measures requirement fulfillment (fail-to-pass) and regression avoidance (pass-to-pass), and incorporates step-level scoring to pinpoint execution failures. Extensive experiments reveal that even state-of-the-art agents achieve pass rates below 20% in LongCLI-Bench. Step-level analysis further indicates that the majority of tasks stall at less than 30% completion, highlighting that critical failures often occur in the early stages. Although self-correction offers marginal gains, human-agent collaboration through plan injection and interactive guidance yields significantly higher improvements. These results highlight that future research must emphasize the development of synergistic human-agent workflows alongside advances in agents' planning and execution capabilities to overcome key challenges in long-horizon task performance.

Autonomous Robotic Tissue Palpation and Abnormalities Characterisation via Ergodic Exploration

Authors: Luca Beber, Edoardo Lamon, Matteo Saveriano, Daniele Fontanelli, Luigi Palopoli
Date: 2026-02-15 19:37:21

We propose a novel autonomous robotic palpation framework for real-time elastic mapping during tissue exploration using a viscoelastic tissue model. The method combines force-based parameter estimation using a commercial force/torque sensor with an ergodic control strategy driven by a tailored Expected Information Density, which explicitly biases exploration toward diagnostically relevant regions by jointly considering model uncertainty, stiffness magnitude, and spatial gradients. An Extended Kalman Filter is employed to estimate viscoelastic model parameters online, while Gaussian Process Regression provides spatial modelling of the estimated elasticity, and a Heat Equation Driven Area Coverage controller enables adaptive, continuous trajectory planning. Simulations on synthetic stiffness maps demonstrate that the proposed approach achieves better reconstruction accuracy, enhanced segmentation capability, and improved robustness in detecting stiff inclusions compared to Bayesian Optimisation-based techniques. Experimental validation on a silicone phantom with embedded inclusions emulating pathological tissue regions further corroborates the potential of the method for autonomous tissue characterisation in diagnostic and screening applications.

Path Planning Optimisation for SParse, AwaRe and Cooperative Networked Aerial Robot Teams (SpArC-NARTs): Optimisation Tool and Ground Sensing Coverage Use Cases

Authors: Maria Conceição, António Grilo, Meysam Basiri
Date: 2026-02-15 17:40:45

A networked aerial robot team (NART) comprises a group of agents (e.g., unmanned aerial vehicles (UAVs), ground control stations, etc.) interconnected by wireless links. Inter-agent connectivity, even if intermittent (i.e. sparse), enables data exchanges between agents and supports cooperative behaviours in several NART missions. It can benefit online decentralised decision-making and group resilience, particularly when prior knowledge is inaccurate or incomplete. These requirements can be accounted for in the offline mission planning stages to incentivise cooperative behaviours and improve mission efficiency during the NART deployment. This paper proposes a novel path planning tool for a Sparse, Aware, and Cooperative Networked Aerial Robot Team (SpArC-NART) in exploration missions. It simultaneously considers different levels of prior information regarding the environment, limited agent energy, sensing, and communication, as well as distinct NART constitutions. The communication model takes into account the limitations of user-defined radio technology and physical phenomena. The proposed tool aims to maximise the mission goals (e.g., finding one or multiple targets, covering the full area of the environment, etc.), while cooperating with other agents to reduce agent reporting times, increase their global situational awareness (e.g., their knowledge of the environment), and facilitate mission replanning, if required. The developed cooperation mechanism leverages soft-motion constraints and dynamic rewards based on the Value of Movement and the expected communication availability between the agents at each time step. A ground sensing coverage use case was chosen to illustrate the current capabilities of this tool.

REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents

Authors:Zheng Chu, Xiao Wang, Jack Hong, Huiming Fan, Yuqi Huang, Yue Yang, Guohai Xu, Chenxiao Zhao, Cheng Xiang, Shengchao Hu, Dongdong Kuang, Ming Liu, Bing Qin, Xing Yu
Date:2026-02-15 17:04:46

Large language models are transitioning from general-purpose knowledge engines to real-world problem solvers, yet optimizing them for deep search tasks remains challenging. The central bottleneck lies in the extreme sparsity of high-quality search trajectories and reward signals, arising from the difficulty of scalable long-horizon task construction and the high cost of interaction-heavy rollouts involving external tool calls. To address these challenges, we propose REDSearcher, a unified framework that co-designs complex task synthesis, mid-training, and post-training for scalable search-agent optimization. Specifically, REDSearcher introduces the following improvements: (1) We frame task synthesis as a dual-constrained optimization, where task difficulty is precisely governed by graph topology and evidence dispersion, allowing scalable generation of complex, high-quality tasks. (2) We introduce tool-augmented queries to encourage proactive tool use rather than passive recall. (3) During mid-training, we strengthen core atomic capabilities (knowledge, planning, and function calling), substantially reducing the cost of collecting high-quality trajectories for downstream training. (4) We build a local simulated environment that enables rapid, low-cost algorithmic iteration for reinforcement learning experiments. Across both text-only and multimodal search-agent benchmarks, our approach achieves state-of-the-art performance. To facilitate future research on long-horizon search agents, we will release 10K high-quality complex text search trajectories, 5K multimodal trajectories, and a 1K text RL query set, together with code and model checkpoints.

CORPGEN: Simulating Corporate Environments with Autonomous Digital Employees in Multi-Horizon Task Environments

Authors:Abubakarr Jaye, Nigel Boachie Kumankumah, Chidera Biringa, Anjel Shaileshbhai Patel, Sulaiman Vesal, Dayquan Julienne, Charlotte Siska, Manuel Raúl Meléndez Luján, Anthony Twum-Barimah, Mauricio Velazco, Tianwei Chen
Date:2026-02-15 16:54:34

Long-horizon reasoning is a key challenge for autonomous agents, yet existing benchmarks evaluate agents on single tasks in isolation. Real organizational work requires managing many concurrent long-horizon tasks with interleaving, dependencies, and reprioritization. We introduce Multi-Horizon Task Environments (MHTEs): a distinct problem class requiring coherent execution across dozens of interleaved tasks (45+, 500-1500+ steps) within persistent execution contexts spanning hours. We identify four failure modes that cause baseline CUAs to degrade from 16.7% to 8.7% completion as load scales from 25% to 100%, a pattern consistent across three independent implementations. These failure modes are context saturation (O(N) vs O(1) growth), memory interference, dependency complexity (DAGs vs. chains), and reprioritization overhead. We present CorpGen, an architecture-agnostic framework addressing these failures via hierarchical planning for multi-horizon goal alignment, sub-agent isolation preventing cross-task contamination, tiered memory (working, structured, semantic), and adaptive summarization. CorpGen simulates corporate environments through digital employees with persistent identities and realistic schedules. Across three CUA backends (UFO2, OpenAI CUA, hierarchical) on OSWorld Office, CorpGen achieves up to 3.5x improvement over baselines (15.2% vs 4.3%) with stable performance under increasing load, confirming that gains stem from architectural mechanisms rather than specific CUA implementations. Ablation studies show experiential learning provides the largest gains.
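The contrast between O(N) context saturation and O(1) growth can be made concrete with a bounded working memory that folds overflow into a running summary. This is an illustrative sketch only; the tier names, sizes, and summarization mechanism are assumptions, not CorpGen's implementation:

```python
class TieredMemory:
    """Sketch of the O(1)-context idea: a bounded working memory that folds
    overflow into a running summary, plus a structured per-task store."""

    def __init__(self, working_budget=3):
        self.working_budget = working_budget
        self.working = []       # recent raw events (bounded window)
        self.summary = ""       # compressed older history
        self.structured = {}    # per-task state, isolated from other tasks

    def observe(self, task_id, event):
        self.working.append(f"[{task_id}] {event}")
        self.structured.setdefault(task_id, []).append(event)
        # Adaptive summarization: fold the oldest events out of the window.
        # A real system would use an LLM summarizer with bounded output;
        # here we just concatenate to keep the sketch dependency-free.
        while len(self.working) > self.working_budget:
            self.summary = (self.summary + " | " + self.working.pop(0)).strip(" |")

    def context(self):
        # Prompt context stays O(working_budget) even as events grow O(N)
        return {"summary": self.summary, "recent": list(self.working)}

mem = TieredMemory(working_budget=3)
for i in range(10):
    mem.observe(f"task{i % 2}", f"step{i}")
ctx = mem.context()   # recent window is bounded regardless of event count
```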

Pareto and Bowley Reinsurance Games in Peer-to-Peer Insurance

Authors:Tim J. Boonen, Kenneth Tsz Hin Ng, Tak Wa Ng, Thai Nguyen
Date:2026-02-15 16:37:35

We propose a peer-to-peer (P2P) insurance scheme comprising a risk-sharing pool and a reinsurer. A plan manager determines how risks are allocated among members and ceded to the reinsurer, while the reinsurer sets the reinsurance loading. Our work focuses on the strategic interaction between the plan manager and the reinsurer, and this focus leads to two game-theoretic contract designs: a Pareto design and a Bowley design, for which we derive closed-form optimal contracts. In the Pareto design, cooperation between the reinsurer and the plan manager leads to multiple Pareto-optimal contracts, which are further refined by introducing the notion of coalitional stability. In contrast, the Bowley design yields a unique optimal contract through a leader-follower framework, and we provide a rigorous verification of the individual rationality constraints via pointwise comparisons of payoff vectors. Comparing the two designs, we prove that the Bowley-optimal contract is never Pareto optimal and typically yields lower total welfare. In our numerical examples, the presence of reinsurance improves welfare, especially with Pareto designs and a less risk-averse reinsurer. We further analyze the impact of the single-loading restriction, which disproportionately favors members with riskier losses.
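The Bowley design's leader-follower structure can be sketched as a Stackelberg program. The notation below is illustrative only: the symbols $X$ (pooled loss), $I$ (ceded loss function), $\theta$ (reinsurance loading), and the expected-utility objectives $U_m$, $U_r$ are our assumptions, not the paper's exact formulation.

```latex
% Illustrative Stackelberg sketch of the Bowley design (our notation).
\begin{align*}
\text{follower:}\quad I^{*}(\theta) &\in \operatorname*{arg\,max}_{I}\;
  \mathbb{E}\,U_{m}\!\big[-(X - I(X)) - (1+\theta)\,\mathbb{E}[I(X)]\big],\\[2pt]
\text{leader:}\quad \theta^{*} &\in \operatorname*{arg\,max}_{\theta}\;
  \mathbb{E}\,U_{r}\!\big[(1+\theta)\,\mathbb{E}[I^{*}(\theta)(X)] - I^{*}(\theta)(X)\big].
\end{align*}
```

The reinsurer (leader) anticipates the plan manager's best response $I^{*}(\theta)$ when setting the loading, which is why the resulting contract is unique but, as the paper shows, never Pareto optimal.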

ARport: An Augmented Reality System for Markerless Image-Guided Port Placement in Robotic Surgery

Authors:Zheng Han, Zixin Yang, Yonghao Long, Lin Zhang, Peter Kazanzides, Qi Dou
Date:2026-02-15 13:57:47

Purpose: Precise port placement is a critical step in robot-assisted surgery, where port configuration influences both visual access to the operative field and instrument maneuverability. To bridge the gap between preoperative planning and intraoperative execution, we present ARport, an augmented reality (AR) system that automatically maps pre-planned trocar layouts onto the patient's body surface, providing intuitive spatial guidance during surgical preparation. Methods: ARport, implemented on an optical see-through head-mounted display (OST-HMD), operates without any external sensors or markers, simplifying setup and enhancing workflow integration. It reconstructs the operative scene from RGB, depth, and pose data captured by the OST-HMD, extracts the patient's body surface using a foundation model, and performs surface-based markerless registration to align preoperative anatomical models to the extracted patient's body surface, enabling in-situ visualization of planned trocar layouts. A demonstration video illustrating the overall workflow is available online. Results: In full-scale human-phantom experiments, ARport accurately overlaid pre-planned trocar sites onto the physical phantom, achieving consistent spatial correspondence between virtual plans and real anatomy. Conclusion: ARport provides a fully marker-free and hardware-minimal solution for visualizing preoperative trocar plans directly on the patient's body surface. The system facilitates efficient intraoperative setup and demonstrates potential for seamless integration into routine clinical workflows.
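Surface-based markerless registration of the kind ARport performs is commonly built on rigid point-set alignment. The toy sketch below (generic ICP with a Kabsch/SVD step, not ARport's actual algorithm) aligns a misaligned copy of a synthetic surface:

```python
import numpy as np

def best_rigid_transform(A, B):
    """Kabsch/SVD step: rigid (R, t) minimizing ||(A @ R.T + t) - B||
    for paired point sets A, B of shape (N, 3)."""
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:        # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cb - R @ ca

def icp(source, target, iters=50):
    """Brute-force ICP: align a preoperative surface (source) to the
    reconstructed patient surface (target)."""
    src = source.copy()
    for _ in range(iters):
        # nearest-neighbour correspondences, O(N*M) -- fine for a sketch
        d = np.linalg.norm(src[:, None, :] - target[None, :, :], axis=2)
        R, t = best_rigid_transform(src, target[d.argmin(axis=1)])
        src = src @ R.T + t
    return src

# Synthetic check: recover a small known rotation + translation
rng = np.random.default_rng(1)
target = rng.normal(size=(200, 3))
a = 0.15                                  # radians about z
Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
               [np.sin(a),  np.cos(a), 0.0],
               [0.0, 0.0, 1.0]])
source = target @ Rz.T + np.array([0.05, -0.05, 0.05])   # misaligned copy
aligned = icp(source, target)
err = np.linalg.norm(aligned - target, axis=1).mean()
```

In practice the target cloud would come from the OST-HMD's depth reconstruction and the source from the segmented preoperative model, with robust matching replacing the brute-force nearest neighbour.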

Rigidity-Based Multi-Finger Coordination for Precise In-Hand Manipulation of Force-Sensitive Objects

Authors:Xinan Rong, Changhuang Wan, Aochen He, Xiaolong Li, Gangshan Jing
Date:2026-02-15 11:40:39

Precise in-hand manipulation of force-sensitive objects typically requires judicious coordinated force planning as well as accurate contact force feedback and control. Unlike multi-arm platforms with gripper end effectors, multi-fingered hands rely solely on fingertip point contacts and cannot apply pulling forces, which poses a more challenging problem. Furthermore, calibrated torque sensors are lacking in most commercial dexterous hands, adding to the difficulty. To address these challenges, we propose a dual-layer framework for multi-finger coordination, enabling high-precision manipulation of force-sensitive objects through joint control without tactile feedback. This approach solves coordinated contact force planning by incorporating graph rigidity and force closure constraints. By employing a force-to-position mapping, the planned force trajectory is converted to a joint trajectory. We validate the framework on a custom dexterous hand, demonstrating the capability to manipulate fragile objects, including a soft yarn, a plastic cup, and a raw egg, with high precision and safety.
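The push-only constraint the abstract highlights can be made concrete as a nonnegative force-distribution problem: find fingertip normal-force magnitudes f ≥ 0 whose grasp-map wrench cancels the external load. The sketch below is a generic planar formulation, not the paper's rigidity-graph planner:

```python
import numpy as np

def nnls_pg(A, b, iters=5000):
    """Nonnegative least squares by projected gradient descent:
    minimize ||A f - b||^2 subject to f >= 0 (fingertips can only push)."""
    lr = 1.0 / np.linalg.norm(A.T @ A, 2)   # step size from largest eigenvalue
    f = np.zeros(A.shape[1])
    for _ in range(iters):
        f = np.maximum(0.0, f - lr * (A.T @ (A @ f - b)))
    return f

# Planar disk (mass 0.1 kg) grasped by three fingertips at 90/210/330 deg.
# Column i of the grasp map G is [n_i; p_i x n_i]: force plus torque about
# the centre of mass.
angles = np.deg2rad([90.0, 210.0, 330.0])
p = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # contact points, r = 1
n = -p                                                   # inward unit normals
torque = p[:, 0] * n[:, 1] - p[:, 1] * n[:, 0]           # 2D cross products
G = np.vstack([n.T, torque[None, :]])
w_ext = np.array([0.0, -0.981, 0.0])                     # gravity wrench
f = nnls_pg(G, -w_ext)                                   # solve G f + w_ext = 0
residual = np.linalg.norm(G @ f + w_ext)
```

The resulting nonnegative magnitudes could then be converted to joint targets through a force-to-position mapping, as in the paper's second layer.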

GUI-GENESIS: Automated Synthesis of Efficient Environments with Verifiable Rewards for GUI Agent Post-Training

Authors:Yuan Cao, Dezhi Ran, Mengzhou Wu, Yuzhe Guo, Xin Chen, Ang Li, Gang Cao, Gong Zhi, Hao Yu, Linyi Li, Wei Yang, Tao Xie
Date:2026-02-15 10:58:01

Post-training GUI agents in interactive environments is critical for developing generalization and long-horizon planning capabilities. However, training on real-world applications is hindered by high latency, poor reproducibility, and unverifiable rewards relying on noisy visual proxies. To address these limitations, we present GUI-GENESIS, the first framework to automatically synthesize efficient GUI training environments with verifiable rewards. GUI-GENESIS reconstructs real-world applications into lightweight web environments using multimodal code models and equips them with code-native rewards: executable assertions that provide deterministic reward signals and eliminate visual estimation noise. Extensive experiments show that GUI-GENESIS reduces environment latency tenfold and cuts costs by over $28,000 per epoch compared to training on real applications. Notably, agents trained with GUI-GENESIS outperform the base model by 14.54% and even real-world RL baselines by 3.27% on held-out real-world tasks. Finally, we observe that models can synthesize environments they cannot yet solve, highlighting a pathway for self-improving agents.
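The code-native reward idea, executable assertions over ground-truth environment state rather than visual estimates, can be sketched in a few lines. The state layout and function names here are illustrative, not GUI-GENESIS's API:

```python
def reward_from_assertions(env_state, assertions):
    """Code-native reward: each assertion is an executable check over the
    environment's ground-truth state, giving a deterministic, verifiable
    signal with no visual estimation noise."""
    passed = 0
    for check in assertions:
        try:
            if check(env_state):
                passed += 1
        except Exception:
            pass  # a crashing check counts as failed, not as reward noise
    return passed / len(assertions)

# Hypothetical task: "rename report_v1.txt to report_v2.txt and tag it 'done'"
state = {"files": {"report_v2.txt": {"tag": "done"}}}
checks = [
    lambda s: "report_v2.txt" in s["files"],
    lambda s: s["files"]["report_v2.txt"]["tag"] == "done",
    lambda s: "report_v1.txt" not in s["files"],
]
reward = reward_from_assertions(state, checks)
```

Because the checks run against state rather than pixels, two identical rollouts always receive the same reward, which is what makes the signal verifiable.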

S2SServiceBench: A Multimodal Benchmark for Last-Mile S2S Climate Services

Authors:Chenyue Li, Wen Deng, Zhuotao Sun, Mengxi Jin, Hanzhe Cui, Han Li, Shentong Li, Man Kit Yu, Ming Long Lai, Yuhao Yang, Mengqian Lu, Binhang Yuan
Date:2026-02-15 06:46:27

Subseasonal-to-seasonal (S2S) forecasts play an essential role in providing a decision-critical weeks-to-months planning window for climate resilience and sustainability, yet a growing bottleneck is the last-mile gap: translating scientific forecasts into trusted, actionable climate services, which requires reliable multimodal understanding and decision-facing reasoning under uncertainty. Meanwhile, multimodal large language models (MLLMs) and corresponding agentic paradigms have made rapid progress in supporting various workflows, but it remains unclear whether they can reliably generate decision-making deliverables from operational service products (e.g., actionable signal comprehension, decision-making handoff, and decision analysis & planning) under uncertainty. We introduce S2SServiceBench, a multimodal benchmark for last-mile S2S climate services curated from an operational climate-service system to evaluate this capability. S2SServiceBench covers 10 service products with 150+ expert-selected cases in total, spanning six application domains: Agriculture, Disasters, Energy, Finance, Health, and Shipping. Each case is instantiated at three service levels, yielding around 500 tasks and 1,000+ evaluation items across climate resilience and sustainability applications. Using S2SServiceBench, we benchmark state-of-the-art MLLMs and agents, and analyze performance across products and service levels, revealing persistent challenges in S2S service plot understanding and reasoning, namely actionable signal comprehension, operationalizing uncertainty into executable handoffs, and stable, evidence-grounded analysis and planning for dynamic hazards, while offering actionable guidance for building future climate-service agents.

Prompt-Driven Low-Altitude Edge Intelligence: Modular Agents and Generative Reasoning

Authors:Jiahao You, Ziye Jia, Chao Dong, Qihui Wu
Date:2026-02-15 06:09:04

Large artificial intelligence models (LAMs) show strong capabilities in perception, reasoning, and multi-modal understanding, and can enable advanced capabilities in low-altitude edge intelligence. However, the deployment of LAMs at the edge remains constrained by several fundamental limitations. First, tasks are rigidly tied to specific models, limiting flexibility. Besides, the computational and memory demands of full-scale LAMs exceed the capacity of most edge devices. Moreover, current inference pipelines are typically static, making it difficult to respond to real-time task changes. To address these challenges, we propose a prompt-to-agent edge cognition framework (P2AECF), enabling flexible, efficient, and adaptive edge intelligence. Specifically, P2AECF transforms high-level semantic prompts into executable reasoning workflows through three key mechanisms. First, prompt-defined cognition parses task intent into abstract and model-agnostic representations. Second, agent-based modular execution instantiates these tasks using lightweight and reusable cognitive agents dynamically selected based on current resource conditions. Third, diffusion-controlled inference planning adaptively constructs and refines execution strategies by incorporating runtime feedback and system context. In addition, we illustrate the framework through a representative low-altitude intelligent network use case, showing its ability to deliver adaptive, modular, and scalable edge intelligence for real-time low-altitude aerial collaborations.

An Adaptive Model Selection Framework for Demand Forecasting under Horizon-Induced Degradation to Support Business Strategy and Operations

Authors:Adolfo González, Víctor Parada
Date:2026-02-15 00:24:46

Business environments characterized by structural demand intermittency, high variability, and multi-step planning horizons require robust and reproducible model selection mechanisms. Empirical evidence shows that no forecasting model is universally dominant and that relative rankings vary across error metrics, demand regimes, and forecast horizons, generating ambiguity in multi-SKU decision contexts. This study proposes AHSIV (Adaptive Hybrid Selector for Intermittency and Variability), a horizon-aware and regime-conditioned model selection framework designed to address horizon-induced ranking instability. The proposed approach integrates scaled and absolute error metrics adjusted through a Metric Degradation by Forecast Horizon (MDFH) procedure, structural demand classification, multi-objective Pareto dominance, and hierarchical bias refinement within a unified decision architecture. The empirical evaluation is conducted on the Walmart, M3, M4, and M5 datasets under multiple train-test partition schemes and twelve-step forecasting horizons. Results indicate that AHSIV achieves statistical equivalence with the strongest monometric baseline in terms of aggregated performance while increasing the frequency of horizon-specific best-model selection. The findings demonstrate that model selection in heterogeneous demand environments cannot be treated as a static ranking problem, and that horizon-consistent, structurally adaptive mechanisms provide a principled, operationally coherent solution for multi-SKU forecasting.
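The multi-objective Pareto-dominance step can be sketched directly: a model survives selection unless some other model is at least as good on every error metric and strictly better on one. The model names and metric values below are invented for illustration; AHSIV's full pipeline additionally conditions on demand regime and applies MDFH horizon adjustment:

```python
def pareto_front(scores):
    """Multi-objective filter over candidate forecasting models.
    `scores` maps model name -> tuple of error metrics (lower is better);
    a model is kept unless another model dominates it."""
    front = []
    for name, s in scores.items():
        dominated = any(
            all(o <= e for o, e in zip(other, s))
            and any(o < e for o, e in zip(other, s))
            for other_name, other in scores.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical per-horizon errors (e.g., a scaled metric and an absolute one)
scores = {
    "croston": (0.92, 14.1),
    "ets":     (0.88, 15.0),
    "naive":   (1.10, 16.3),   # dominated by both other models
}
front = pareto_front(scores)
```

Because "croston" and "ets" each win on a different metric, both survive, which is exactly the ranking ambiguity that the framework's subsequent hierarchical refinement is designed to resolve.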