planning - 2026-04-23

DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation

Authors:Hyeonwoo Kim, Jeonghwan Kim, Kyungwon Cho, Hanbyul Joo

Date:2026-04-22 17:59:55

Recent advances in video generative models enable the synthesis of realistic human-object interaction videos across a wide range of scenarios and object categories, including complex dexterous manipulations that are difficult to capture with motion capture systems. While the rich interaction knowledge embedded in these synthetic videos holds strong potential for motion planning in dexterous robotic manipulation, their limited physical fidelity and purely 2D nature make them difficult to use directly as imitation targets in physics-based character control. We present DeVI (Dexterous Video Imitation), a novel framework that leverages text-conditioned synthetic videos to enable physically plausible dexterous agent control for interacting with unseen target objects. To overcome the imprecision of generative 2D cues, we introduce a hybrid tracking reward that integrates 3D human tracking with robust 2D object tracking. Unlike methods relying on high-quality 3D kinematic demonstrations, DeVI requires only the generated video, enabling zero-shot generalization across diverse objects and interaction types. Extensive experiments demonstrate that DeVI outperforms existing approaches that imitate 3D human-object interaction demonstrations, particularly in modeling dexterous hand-object interactions. We further validate the effectiveness of DeVI in multi-object scenes and text-driven action diversity, showcasing the advantage of using video as an HOI-aware motion planner.
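The abstract does not give the form of the hybrid tracking reward. The sketch below shows one common way to build such a reward, as an assumption rather than DeVI's actual definition: an exponentiated 3D joint-position error for the human, plus an exponentiated 2D reprojection error for the object, where simulated 3D object keypoints are projected with a pinhole camera and compared to 2D points tracked in the video. All names, intrinsics and weights (`f`, `cx`, `cy`, `k_h`, `k_o`, `w_h`, `w_o`) are hypothetical.

```python
import math

def project(point3d, f, cx, cy):
    """Pinhole projection of a camera-frame 3D point (x, y, z) to pixel coordinates."""
    x, y, z = point3d
    return (f * x / z + cx, f * y / z + cy)

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def hybrid_tracking_reward(sim_joints, ref_joints, sim_obj_pts, ref_obj_px,
                           f=500.0, cx=320.0, cy=240.0,
                           k_h=10.0, k_o=1e-3, w_h=0.5, w_o=0.5):
    """Hypothetical hybrid reward: 3D human tracking term + 2D object tracking term."""
    # 3D term: mean squared joint-position error against the reference human motion
    e_h = sum(sq_dist(p, q) for p, q in zip(sim_joints, ref_joints)) / len(ref_joints)
    # 2D term: project simulated object keypoints, compare with keypoints tracked in the video
    e_o = sum(sq_dist(project(p, f, cx, cy), u)
              for p, u in zip(sim_obj_pts, ref_obj_px)) / len(ref_obj_px)
    # Each term lies in (0, 1]; the weighted sum is 1 only for perfect tracking
    return w_h * math.exp(-k_h * e_h) + w_o * math.exp(-k_o * e_o)
```

Supervising the object only in image space avoids needing a 3D object trajectory, which a generated 2D video does not provide directly.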

Failure of ambient closed-set large-deviation upper bounds in entropic optimal transport

Authors:Maja Gwozdz

Date:2026-04-22 17:53:23

Large-deviation upper bounds on compact sets do not, in general, extend to arbitrary closed sets without additional tightness. We show that this obstruction already occurs in static entropic optimal transport. More precisely, we construct a fixed-cost model with continuous cost and nonatomic marginals for which the entropic minimisers converge in total variation to an optimal plan with noncompact support, the known compact-set upper bound remains valid, but the corresponding closed-set upper bound fails on a specific closed subset of the ambient space. For a fixed closed set, we identify the exact tail criterion for passing from compact to closed sets. We also show that no full large-deviation principle (LDP) holds on the ambient space at speed $1/\varepsilon$ with any lower semicontinuous rate function.

A Hough transform approach to safety-aware scalar field mapping using Gaussian Processes

Authors:Muzaffar Qureshi, Trivikram Satharasi, Tochukwu E. Ogri, Kyle Volle, Rushikesh Kamalapurkar

Date:2026-04-22 17:27:08

This paper presents a framework for mapping unknown scalar fields using a sensor-equipped autonomous robot operating in unsafe environments. The unsafe regions are defined as high-intensity regions, where the field value exceeds a predefined safety threshold. For safe and efficient mapping of the scalar field, the sensor-equipped robot must avoid high-intensity regions during the measurement process. In this paper, the scalar field is modeled as a sample from a Gaussian process (GP), which enables Bayesian inference and provides closed-form expressions for both the predictive mean and the uncertainty. Concurrently, the spatial structure of the high-intensity regions is estimated in real time using the Hough transform (HT), leveraging the evolving GP posterior. A safe sampling strategy is then employed to guide the robot towards safe measurement locations, using probabilistic safety guarantees on the evolving GP posterior. The estimated high-intensity regions also facilitate the design of safe motion plans for the robot. The effectiveness of the approach is verified through two numerical simulation studies and an indoor experiment for mapping a light-intensity field using a wheeled mobile robot.
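The closed-form GP posterior mentioned above can be sketched in a few lines for a 1D field. The safety rule used here (accept a candidate location only if the upper confidence bound `mean + beta * std` stays below the threshold) is a common choice and an assumption on our part, not necessarily the paper's exact criterion; the Hough-transform step is not shown. Kernel, noise level and `beta` are hypothetical.

```python
import math

def rbf(a, b, ell=1.0, sf2=1.0):
    """Squared-exponential kernel on scalar inputs (hypothetical hyperparameters)."""
    return sf2 * math.exp(-0.5 * (a - b) ** 2 / ell ** 2)

def cholesky(K):
    """Lower-triangular L with K = L L^T (K must be symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def gp_posterior(X, y, x_star, noise=1e-2, ell=1.0, sf2=1.0):
    """Zero-mean GP posterior mean and variance at x_star given observations (X, y)."""
    K = [[rbf(a, b, ell, sf2) + (noise if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    L = cholesky(K)
    v = forward_solve(L, [rbf(a, x_star, ell, sf2) for a in X])  # L^{-1} k_*
    w = forward_solve(L, y)                                       # L^{-1} y
    mean = sum(vi * wi for vi, wi in zip(v, w))                   # k_*^T K^{-1} y
    var = rbf(x_star, x_star, ell, sf2) - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)

def is_safe(X, y, x_star, threshold, beta=2.0, **kw):
    """Treat x_star as safe only if the upper confidence bound stays below the threshold."""
    mean, var = gp_posterior(X, y, x_star, **kw)
    return mean + beta * math.sqrt(var) < threshold
```

Far from any measurement the posterior variance returns to the prior, so unexplored locations are conservatively rejected until nearby samples shrink the uncertainty.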

ALAS: Adaptive Long-Horizon Action Synthesis via Async-pathway Stream Disentanglement

Authors:Yutong Shen, Hangxu Liu, Lei Zhang, Penghui Liu, Yinqi Liu, Liuxiang Yang, Tongtong Feng

Date:2026-04-22 16:07:24

Long-Horizon (LH) tasks in Human-Scene Interaction (HSI) are complex multi-step tasks that require continuous planning, sequential decision-making, and extended execution across domains to achieve the final goal. However, existing methods rely heavily on skill chaining, concatenating pre-trained subtasks while keeping environment observations and self-state tightly coupled; as a result, they cannot generalize to new combinations of environments and skills and fail on many LH tasks across domains. To solve this problem, this paper presents ALAS, a cross-domain learning framework for LH tasks via biologically inspired dual-stream disentanglement. Inspired by the brain's "where-what" dual pathway mechanism, ALAS comprises two core modules: i) an environment learning module for spatial understanding, which captures object functions, spatial relationships, and scene semantics, achieving cross-domain transfer through complete environment-self disentanglement; ii) a skill learning module for task execution, which processes self-state information including joint degrees of freedom and motor patterns, enabling cross-skill transfer through independent motor pattern encoding. We conducted extensive experiments on various LH tasks in HSI scenes. Compared with existing methods, ALAS achieves an average improvement of 23% in subtask success rate and 29% in execution efficiency.

Short-time, Wavelet-inspired Mouse Submovement Detection

Authors:Auejin Ham, Ben Boudaoud

Date:2026-04-22 15:22:09

Submovements are ballistic components of human motion that make up a large part of motor interaction and arise from the cyclical and overlapping cognitive processes of perception, motor planning, and motor execution. Extracting submovements is challenging because the motions tend to overlap, or start before the previous one ends. We propose and evaluate a wavelet-inspired technique to accurately locate and parameterize submovements from one-dimensional speed time series. Our method employs a self-weighted loss refinement step to identify and improve regions with poor fit quality, which are a challenge for simpler wavelet transforms. We demonstrate the accuracy of our method through an analysis of ~6,400 synthetic 1-2 s trials of egocentric camera (first-person shooter) aim data with known ground truth, modeled on a similarly sized real data set from 13 users. We compare our method to dual-threshold and persistence-based 1D segmentation techniques and note challenges and opportunities for future improvements.
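For context on the dual-threshold baseline, here is a minimal sketch of a generic hysteresis-style variant, assuming a segment is triggered when speed reaches `high` and then extended in both directions while speed stays above `low`. The exact thresholds and rules used in the paper are not given, so this is illustrative only.

```python
def dual_threshold_segments(speed, high, low):
    """Return (start, end) index pairs (end exclusive) of movement segments.

    A segment is triggered where speed >= high and extended backwards and
    forwards while speed > low (hysteresis). Requires low < high.
    """
    if low >= high:
        raise ValueError("low threshold must be below high threshold")
    segments, i, n = [], 0, len(speed)
    while i < n:
        if speed[i] >= high:
            start = i
            while start > 0 and speed[start - 1] > low:   # extend backwards
                start -= 1
            end = i
            while end < n and speed[end] > low:           # extend forwards
                end += 1
            segments.append((start, end))
            i = end   # speed[end] <= low, so the next segment cannot overlap
        else:
            i += 1
    return segments
```

Because segments only end when speed drops below `low`, two overlapping submovements whose combined speed never dips that far are merged into a single segment, which is the overlap problem the abstract highlights.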

Self-Guided Plan Extraction for Instruction-Following Tasks with Goal-Conditional Reinforcement Learning

Authors:Zoya Volovikova, Nikita Sorokin, Dmitriy Lukashevskiy, Aleksandr Panov, Alexey Skrynnik

Date:2026-04-22 14:19:23

We introduce SuperIgor, a framework for instruction-following tasks. Unlike prior methods that rely on predefined subtasks, SuperIgor enables a language model to generate and refine high-level plans through a self-learning mechanism, reducing the need for manual dataset annotation. Our approach involves iterative co-training: an RL agent is trained to follow the generated plans, while the language model adapts and modifies these plans based on RL feedback and preferences. This creates a feedback loop where both the agent and the planner improve jointly. We validate our framework in environments with rich dynamics and stochasticity. Results show that SuperIgor agents adhere to instructions more strictly than baseline methods, while also demonstrating strong generalization to previously unseen instructions.

Reliability as a Design Principle: A Systematic Review and Integrated Framework for Renewable-Based Microgrids

Authors:Mohammed Zeehan Saleheen, Markus Wagner, Reza Razzaghi, Hao Wang

Date:2026-04-22 13:10:39

Reliable operation is a central motivation for deploying renewable-based microgrids. This paper presents a systematic rapid review that positions reliability as the central organizing principle for microgrid design. Specifically, this review systematically synthesizes recent literature to examine how planning assumptions, optimization formulations, operational flexibility mechanisms, and reliability assessment frameworks jointly shape reliability outcomes. The synthesis shows that reliability in renewable-based microgrids is governed primarily by chronological, time-coupled energy adequacy rather than by installed capacity alone, with Dunkelflaute events (prolonged periods of low wind and solar output) emerging as a key determinant of adequacy failure. Reliability outcomes are shaped by the joint interaction of resource portfolios, storage operating policies and state trajectories, network features, and protection feasibility under inverter-dominated operation. The review further demonstrates that reliability indices inherited from conventional power systems are poorly suited for renewable-based microgrids, as they compress performance into single dimensions and obscure temporal, spatial, and service-critical risk concentrations. Across optimization practice, reliability is increasingly embedded through multi-objective and constrained formulations; however, persistent gaps remain in representing correlated renewable scarcity, mission-profile-dependent component reliability, and interruption valuation (e.g., value of lost load and customer damage functions) in a consistent and decision-relevant manner. Overall, this review consolidates planning factors, optimization approaches, reliability evaluation methods, and metric suitability into an integrated roadmap for reliability-centered microgrid planning, and outlines future directions toward state-aware, service-oriented planning and assessment frameworks.
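To make the point about chronological, time-coupled adequacy concrete, the toy sketch below dispatches a battery hour by hour. Two generation profiles with the same total energy and the same installed battery lead to different amounts of unserved energy depending only on when the energy arrives. The dispatch rule and all numbers are hypothetical illustrations, not data or methods from the review.

```python
def simulate_unserved_energy(load, gen, capacity_kwh, power_kw, soc0_frac=0.0, eff=1.0):
    """Greedy hourly battery dispatch; returns (unserved kWh, hours with a shortfall)."""
    soc = soc0_frac * capacity_kwh
    unserved, short_hours = 0.0, 0
    for demand, supply in zip(load, gen):
        net = supply - demand
        if net >= 0:
            # Charge with surplus, limited by power rating and remaining headroom
            charge = min(net, power_kw, (capacity_kwh - soc) / eff)
            soc += charge * eff
        else:
            # Discharge to cover the deficit, limited by power rating and stored energy
            discharge = min(-net, power_kw, soc * eff)
            soc -= discharge / eff
            deficit = -net - discharge
            if deficit > 1e-9:
                unserved += deficit
                short_hours += 1
    return unserved, short_hours

load = [1.0] * 8
spread = [2.0, 0.0] * 4                    # same 8 kWh total, evenly interleaved
clustered = [2.0] * 4 + [0.0] * 4          # same 8 kWh total, followed by a scarcity spell
```

Installed capacity and total energy are identical in both cases, yet the clustered profile, a crude stand-in for a Dunkelflaute spell, leaves load unserved once storage is exhausted.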

Lexicographic Minimum-Violation Motion Planning using Signal Temporal Logic

Authors:Patrick Halder, Lothar Kiltz, Hannes Homburger, Johannes Reuter, Matthias Althoff

Date:2026-04-22 10:49:09

Motion planning for autonomous vehicles often requires satisfying multiple conditionally conflicting specifications. In situations where not all specifications can be met simultaneously, minimum-violation motion planning maintains system operation by minimizing violations of specifications in accordance with their priorities. Signal temporal logic (STL) provides a formal language for rigorously defining these specifications and enables the quantitative evaluation of their violations. However, a total ordering of specifications yields a lexicographic optimization problem, which is typically computationally expensive to solve using standard methods. We address this problem by transforming the multi-objective lexicographic optimization problem into a single-objective scalar optimization problem using non-uniform quantization and bit-shifting. Specifically, we extend a deterministic model predictive path integral (MPPI) solver to efficiently solve optimization problems without quadratic input cost. Additionally, a novel predicate-robustness measure that combines spatial and temporal violations is introduced. Our results show that the proposed method offers an interpretable and scalable solution for lexicographic STL minimum-violation motion planning within a single-objective solver framework.
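The quantize-and-shift idea can be sketched as follows: quantize each violation to a fixed number of bits and pack the levels so that a higher-priority level always occupies more significant bits than every lower-priority one. Comparing the packed integers then reproduces the lexicographic order of the quantized violations. This sketch uses uniform quantization for brevity, whereas the paper uses a non-uniform scheme; `bits` and the per-specification `max_violations` are hypothetical parameters.

```python
import math

def quantize(violation, max_violation, bits):
    """Map a nonnegative violation to an integer level in [0, 2**bits - 1].

    Uniform and clipped; ceil ensures any positive violation maps to a level >= 1.
    """
    levels = (1 << bits) - 1
    v = min(max(violation, 0.0), max_violation)
    return math.ceil(v / max_violation * levels)

def lexicographic_cost(violations, max_violations, bits=8):
    """Pack quantized violations into one integer; violations[0] has the highest priority."""
    cost = 0
    for v, vmax in zip(violations, max_violations):
        # Shift earlier (higher-priority) levels up, then place this level in the low bits
        cost = (cost << bits) | quantize(v, vmax, bits)
    return cost
```

With this packing, any increase in a higher-priority violation outweighs any change in lower-priority ones, so a single-objective sampler such as MPPI can minimize the scalar directly. Python integers are unbounded; in fixed-width arithmetic the number of specifications times `bits` must fit in the integer type.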

OVPD: A Virtual-Physical Fusion Testing Dataset of OnSite Autonomous Driving Challenge

Authors:Yuhang Zhang, Jiarui Zhang, Bowen Jian, Xin Zhou, Zhichao Lv, Peng Hang, Rongjie Yu, Ye Tian, Jian Sun

Date:2026-04-22 10:44:19

The rapid iteration of autonomous driving algorithms has created a growing demand for high-fidelity, replayable, and diagnosable testing data. However, many public datasets lack real vehicle dynamics feedback and closed-loop interaction with surrounding traffic and road infrastructure, limiting their ability to reflect deployment readiness. To address this gap, we present OVPD (OnSite Virtual-Physical Dataset), a virtual-physical fusion testing dataset released from the 2025 OnSite Autonomous Driving Challenge. Centered on real-vehicle-in-the-loop testing, OVPD integrates virtual background traffic with vehicle-infrastructure perception to build controllable and interactive closed-loop test environments on a proving ground. The dataset contains 20 testing clips from 20 teams over a scenario chain of 15 atomic scenarios, totaling nearly 3 hours of multi-modal data, including vehicle trajectories and states, control commands, and digital-twin-rendered surround-view observations. OVPD supports long-tail planning and decision-making validation, open-loop or platform-enabled closed-loop evaluation, and comprehensive assessment across safety, efficiency, comfort, rule compliance, and traffic impact, providing actionable evidence for failure diagnosis and iterative improvement. The dataset is available via: https://huggingface.co/datasets/Yuhang253820/Onsite_OPVD

Cold-Start Forecasting of New Product Life-Cycles via Conditional Diffusion Models

Authors:Ruihan Zhou, Zishi Zhang, Jinhui Han, Yijie Peng, Xiaowei Zhang

Date:2026-04-22 09:07:55

Forecasting the life-cycle trajectory of a newly launched product is important for launch planning, resource allocation, and early risk assessment. This task is especially difficult in the pre-launch and early post-launch phases, when product-specific outcome history is limited or unavailable, creating a cold-start problem. In these phases, firms must make decisions before demand patterns become reliably observable, while early signals are often sparse, noisy, and unstable. We propose the Conditional Diffusion Life-cycle Forecaster (CDLF), a conditional generative framework for forecasting new-product life-cycle trajectories under cold start. CDLF combines three sources of information: static descriptors, reference trajectories from similar products, and newly arriving observations when available. Here, static descriptors refer to structured pre-launch characteristics of the product, such as category, price tier, brand or organization identity, scale, and access conditions. This structure allows the model to condition forecasts on relevant product context and to update them adaptively over time without retraining, yielding flexible multi-modal predictive distributions under extreme data scarcity. The method satisfies consistency with a horizon-uniform distributional error bound for recursive generation. Across studies on Intel microprocessor stock keeping unit (SKU) life cycles and the platform-mediated adoption of open large language model repositories, CDLF delivers more accurate point forecasts and higher-quality probabilistic forecasts than classical diffusion models, Bayesian updating approaches, and other state-of-the-art machine-learning baselines.

MambaLiteUNet: Cross-Gated Adaptive Feature Fusion for Robust Skin Lesion Segmentation

Authors:Md Maklachur Rahman, Soon Ki Jung, Tracy Hammond

Date:2026-04-22 07:33:14

Recent segmentation models have demonstrated promising efficiency by aggressively reducing parameter counts and computational complexity. However, these models often struggle to accurately delineate fine lesion boundaries and texture patterns essential for early skin cancer diagnosis and treatment planning. In this paper, we propose MambaLiteUNet, a compact yet robust segmentation framework that integrates Mamba state space modeling into a U-Net architecture, along with three key modules: Adaptive Multi-Branch Mamba Feature Fusion (AMF), Local-Global Feature Mixing (LGFM), and Cross-Gated Attention (CGA). These modules are designed to enhance local-global feature interaction, preserve spatial details, and improve the quality of skip connections. MambaLiteUNet achieves an average IoU of 87.12% and average Dice score of 93.09% across ISIC2017, ISIC2018, HAM10000, and PH2 benchmarks, outperforming state-of-the-art models. Compared to U-Net, our model improves average IoU and Dice by 7.72 and 4.61 points, respectively, while reducing parameters by 93.6% and GFLOPs by 97.6%. Additionally, in domain generalization with six unseen lesion categories, MambaLiteUNet achieves 77.61% IoU and 87.23% Dice, performing best among all evaluated models. Our extensive experiments demonstrate that MambaLiteUNet achieves a strong balance between accuracy and efficiency, making it a competitive and practical solution for dermatological image segmentation. Our code is publicly available at: https://github.com/maklachur/MambaLiteUNet.
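IoU and Dice, the two metrics reported above, can both be computed from pixel-level true positives, false positives and false negatives. The sketch below works on flattened binary masks; treating two empty masks as perfect agreement is a convention we chose, not one stated in the abstract.

```python
def iou_and_dice(pred, target):
    """IoU = TP / (TP + FP + FN); Dice = 2TP / (2TP + FP + FN). Inputs: flattened 0/1 masks."""
    tp = sum(1 for p, t in zip(pred, target) if p and t)
    fp = sum(1 for p, t in zip(pred, target) if p and not t)
    fn = sum(1 for p, t in zip(pred, target) if t and not p)
    if tp + fp + fn == 0:
        return 1.0, 1.0   # both masks empty: treated as perfect agreement (a convention)
    return tp / (tp + fp + fn), 2 * tp / (2 * tp + fp + fn)
```

For a single image the two metrics are linked by Dice = 2·IoU / (1 + IoU). Once per-image scores are averaged over a dataset, this identity no longer holds exactly, so reported mean IoU and mean Dice need not satisfy it.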

Cortex 2.0: Grounding World Models in Real-World Industrial Deployment

Authors:Adriana Aida, Walida Amer, Katarina Bankovic, Dhruv Behl, Fabian Busch, Annie Bhalla, Minh Duong, Florian Gienger, Rohan Godse, Denis Grachev, Ralf Gulde, Elisa Hagensieker, Junpeng Hu, Shivam Joshi, Tobias Knoblauch, Likith Kumar, Damien LaRocque, Keerthana Lokesh, Omar Moured, Khiem Nguyen, Christian Preyss, Ranjith Sriganesan, Vikram Singh, Carsten Sponner, Anh Tong, Dominik Tuscher, Marc Tuscher, Pavan Upputuri

Date:2026-04-22 06:49:12

Industrial robotic manipulation demands reliable long-horizon execution across embodiments, tasks, and changing object distributions. While Vision-Language-Action models have demonstrated strong generalization, they remain fundamentally reactive. By optimizing the next action given the current observation without evaluating potential futures, they are brittle to the compounding failure modes of long-horizon tasks. Cortex 2.0 shifts from reactive control to plan-and-act by generating candidate future trajectories in visual latent space, scoring them for expected success and efficiency, then committing only to the highest-scoring candidate. We evaluate Cortex 2.0 on a single-arm and dual-arm manipulation platform across four tasks of increasing complexity: pick and place, item and trash sorting, screw sorting, and shoebox unpacking. Cortex 2.0 consistently outperforms state-of-the-art Vision-Language-Action baselines, achieving the best results across all tasks. The system remains reliable in unstructured environments characterized by heavy clutter, frequent occlusions, and contact-rich manipulation, where reactive policies fail. These results demonstrate that world-model-based planning can operate reliably in complex industrial environments.
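The generate-score-commit loop described above can be illustrated with a random-shooting planner. In this sketch a toy 1D dynamics function stands in for Cortex 2.0's learned visual latent world model, and a hand-written score stands in for its success and efficiency estimates. Both are hypothetical placeholders, and the candidate generator is plain uniform sampling.

```python
import random

def rollout(model, state, actions):
    """Imagine a trajectory by rolling an action sequence through the (learned) model."""
    traj = [state]
    for a in actions:
        state = model(state, a)
        traj.append(state)
    return traj

def plan_and_act(model, score, state, n_candidates=64, horizon=5, seed=0):
    """Sample candidate action sequences, score their imagined rollouts, commit to the best."""
    rng = random.Random(seed)
    candidates = [[0.0] * horizon]  # always consider a "do nothing" baseline
    candidates += [[rng.uniform(-1.0, 1.0) for _ in range(horizon)]
                   for _ in range(n_candidates - 1)]
    best = max(candidates, key=lambda acts: score(rollout(model, state, acts), acts))
    return best, score(rollout(model, state, best), best)

# Toy usage: move a point from 0.0 towards a goal at 3.0 while penalizing effort
model = lambda x, a: x + a
score = lambda traj, acts: -abs(traj[-1] - 3.0) - 0.1 * sum(a * a for a in acts)
best_actions, best_score = plan_and_act(model, score, 0.0)
```

In a receding-horizon setup only the first action of `best_actions` would be executed before replanning from the new observation.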

Weighted Knowledge Distillation for Semi-Supervised Segmentation of Maxillary Sinus in Panoramic X-ray Images

Authors:Juha Park, Jiho Choi, Jong Pil Yun, Yong Chan Park, Han-Gyeol Yeom, Byung Do Lee, Sang Jun Lee

Date:2026-04-22 06:00:14

Accurate segmentation of the maxillary sinus in panoramic X-ray images is essential for dental diagnosis and surgical planning; however, this task remains relatively underexplored in dental imaging research. Structural overlap, ambiguous anatomical boundaries inherent to two-dimensional panoramic projections, and the limited availability of large-scale clinical datasets with reliable pixel-level annotations make the development and evaluation of segmentation models challenging. To address these challenges, we propose a semi-supervised segmentation framework that effectively leverages both labeled and unlabeled panoramic radiographs, where knowledge distillation is utilized to train a student model with reliable structural information distilled from a teacher model. Specifically, we introduce a weighted knowledge distillation loss to suppress unreliable distillation signals caused by structural discrepancies between teacher and student predictions. To further enhance the quality of pseudo labels generated by the teacher network, we introduce SinusCycle-GAN, a refinement network based on unpaired image-to-image translation. This refinement process improves the precision of boundaries and reduces noise propagation when learning from unlabeled data during semi-supervised training. To evaluate the proposed method, we collected clinical panoramic X-ray images from 2,511 patients, and experimental results demonstrate that the proposed method outperforms state-of-the-art segmentation models, achieving a Dice score of 96.35% while reducing boundary error. The results indicate that the proposed semi-supervised framework provides robust and anatomically consistent segmentation performance under limited labeled data conditions, highlighting its potential for broader dental image analysis applications.

WildFireVQA: A Large-Scale Radiometric Thermal VQA Benchmark for Aerial Wildfire Monitoring

Authors:Mobin Habibpour, Niloufar Alipour Talemi, John Spodnik, Camren J. Khoury, Fatemeh Afghah

Date:2026-04-22 05:11:02

Wildfire monitoring requires timely, actionable situational awareness from airborne platforms, yet existing aerial visual question answering (VQA) benchmarks do not evaluate wildfire-specific multimodal reasoning grounded in thermal measurements. We introduce WildFireVQA, a large-scale VQA benchmark for aerial wildfire monitoring that integrates RGB imagery with radiometric thermal data. WildFireVQA contains 6,097 RGB-thermal samples, where each sample includes an RGB image, a color-mapped thermal visualization, and a radiometric thermal TIFF, and is paired with 34 questions, yielding a total of 207,298 multiple-choice questions spanning presence and detection, classification, distribution and segmentation, localization and direction, cross-modal reasoning, and flight planning for operational wildfire intelligence. To improve annotation reliability, we combine multimodal large language model (MLLM)-based answer generation with sensor-driven deterministic labeling, manual verification, and intra-frame and inter-frame consistency checks. We further establish a comprehensive evaluation protocol for representative MLLMs under RGB, Thermal, and retrieval-augmented settings using radiometric thermal statistics. Experiments show that across task categories, RGB remains the strongest modality for current models, while retrieved thermal context yields gains for stronger MLLMs, highlighting both the value of temperature-grounded reasoning and the limitations of existing MLLMs in safety-critical wildfire scenarios. The dataset and benchmark code are open-source at https://github.com/mobiiin/WildFire_VQA.

Toward Safe Autonomous Robotic Endovascular Interventions using World Models

Authors:Harry Robertshaw, Nikola Fischer, Han-Ru Wu, Andrea Walker Perez, Weiyuan Deng, Benjamin Jackson, Christos Bergeles, Alejandro Granados, Thomas C Booth

Date:2026-04-22 03:32:18

Autonomous mechanical thrombectomy (MT) presents substantial challenges due to highly variable vascular geometries and the requirement for accurate, real-time control. While reinforcement learning (RL) has emerged as a promising paradigm for the automation of endovascular navigation, existing approaches often show limited robustness when faced with diverse patient anatomies or extended navigation horizons. In this work, we investigate a world-model-based framework for autonomous endovascular navigation built on TD-MPC2, a model-based RL method that integrates planning and learned dynamics. We evaluate a TD-MPC2 agent trained on multiple navigation tasks across held-out patient-specific vasculatures and benchmark its performance against an agent trained with the state-of-the-art Soft Actor-Critic (SAC) algorithm. Both approaches are further validated in vitro using patient-specific vascular phantoms under fluoroscopic guidance. In simulation, TD-MPC2 demonstrates a significantly higher mean success rate than SAC (58% vs. 36%, p < 0.001), and mean tip contact forces of 0.15 N, well below the proposed 1.5 N vessel rupture threshold. Mean success rates for TD-MPC2 (68%) were comparable to SAC (60%) in vitro, but TD-MPC2 achieved superior path ratios (p = 0.017) at the cost of longer procedure times (p < 0.001). Together, these results provide the first demonstration of autonomous MT navigation validated on both held-out in silico data and fluoroscopy-guided in vitro experiments, highlighting the promise of world models for safe and generalizable AI-assisted endovascular interventions.

AgentSOC: A Multi-Layer Agentic AI Framework for Security Operations Automation

Authors:Joyjit Roy, Samaresh Kumar Singh

Date:2026-04-22 03:01:03

Security Operations Centers (SOCs) increasingly encounter difficulties in correlating heterogeneous alerts, interpreting multi-stage attack progressions, and selecting safe and effective response actions. This study introduces AgentSOC, a multi-layered agentic AI framework that enhances SOC automation by integrating perception, anticipatory reasoning, and risk-based action planning. The proposed architecture consolidates several layers of abstraction into a single operational loop that supports alert normalization, context enrichment, hypothesis generation, structural feasibility validation, and policy-compliant response execution. Conceptually evaluated within a large enterprise environment, AgentSOC improves triage consistency, anticipates attackers' intentions, and recommends containment options that are both operationally feasible and well balanced between security efficacy and operational impact. The results suggest that hybrid agentic reasoning has the potential to serve as a foundation for developing adaptive, safer SOC automation in large enterprises. Additionally, a minimal proof-of-concept (PoC) using LANL authentication data demonstrated the feasibility of the proposed architecture.

Robust Uniform Recovery of Structured Signals from Nonlinear Observations

Authors:Pedro Abdalla, Radu Balan, Junren Chen

Date:2026-04-22 00:47:29

While it is well known that the restricted isometry property (RIP) guarantees uniform sparse recovery from noisy linear measurements, uniform recovery of structured signals from nonlinear observations remains much less understood. This paper shows that the restricted approximate invertibility condition (RAIC) provides a unified approach to this end. In particular, uniform recovery is achieved by projected gradient descent (PGD) whose gradients obey RAIC for all signals. As an application, under a large class of piecewise Lipschitz link functions (possibly discontinuous), we develop a uniform recovery theory for the Gaussian single-index model by establishing the uniform RAIC for the gradient of the (scaled) $\ell_2$ loss via a covering argument. The theory generalizes the nonuniform recovery guarantees due to Plan and Vershynin (2016); Oymak and Soltanolkotabi (2017) and exhibits additional error terms that can be interpreted as the cost of uniform recovery. Intriguingly, in the three canonical settings of (a) sparse recovery via PGD with $\ell_0$ projection (i.e., iterative hard thresholding (IHT)), (b) sparse recovery via PGD with $\ell_1$ projection, and (c) recovering approximately sparse signals via PGD with $\ell_1$ projection, the additional error terms are negligible, and in turn our uniform recovery error rates are of the same order as existing nonuniform ones, up to log factors. Our results hence improve on Genzel and Stollenwerk (2023). Under the specific nonlinearity of 1-bit quantization, we use a VC dimension argument to show that the uniform recovery error of IHT is of the same order as the nonuniform recovery error, with no loss of log factor. In addition, we show that the robustness of PGD to noise and corruption can be incorporated elegantly by bounding a single additional random process that captures the gradient mismatch.
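Setting (a) above, PGD with $\ell_0$ projection (IHT), takes a gradient step on the squared loss and keeps only the $s$ largest-magnitude entries. The sketch below is a plain-Python version for illustration. It assumes measurement rows scaled by $1/\sqrt{m}$ and a unit step size; these are our assumptions, not the paper's tuned constants. For a nonlinear link such as 1-bit quantization, the same iteration is typically read as estimating the signal direction up to scale.

```python
import random

def hard_threshold(x, s):
    """Projection onto s-sparse vectors: keep the s largest-magnitude entries, zero the rest."""
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:s])
    return [xi if i in keep else 0.0 for i, xi in enumerate(x)]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def rmatvec(A, r):
    """Compute A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]

def iht(A, y, s, step=1.0, iters=100, x0=None):
    """PGD on 0.5 * ||y - A x||^2 with l0 projection (iterative hard thresholding)."""
    x = list(x0) if x0 is not None else [0.0] * len(A[0])
    for _ in range(iters):
        residual = [yi - ai for yi, ai in zip(y, matvec(A, x))]
        g = rmatvec(A, residual)  # negative gradient of the loss
        x = hard_threshold([xi + step * gi for xi, gi in zip(x, g)], s)
    return x

# Toy linear instance with Gaussian rows scaled by 1/sqrt(m)
rng = random.Random(1)
m, n, s = 20, 10, 2
A = [[rng.gauss(0.0, 1.0) / m ** 0.5 for _ in range(n)] for _ in range(m)]
x_true = [0.0] * n
x_true[3], x_true[7] = 1.5, -2.0
y = matvec(A, x_true)
```

Every iterate is $s$-sparse by construction, and an exactly $s$-sparse signal that fits noiseless linear measurements is a fixed point of the iteration.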

On-chain Peak Shaving

Authors:Irene Aldridge, Gavhar Annaeva, Leyla Beriker, Zhiheng Cai, Samyak Choudhary, Camila Godoy, Kaicheng Gong, Zitao Huang, Jonah Ji, Hetvi Kharvasiya, Heng Li, Yuxuan Li, Tianchi Ma, Qingcheng Meng, Ruiyang Shi, Ananya Shrivastava, Jiaqi Wang, Yifan Wang, Zihua Wu, Jiayang Xu, Yuheng Yan, Zijun Zeng, Bowen Zhang, Francesco Zhang

Date:2026-04-21 20:04:50

Blockchain technology is widely expected to reduce transaction costs by automating contract enforcement and eliminating intermediaries; yet, the execution costs imposed by network congestion have received little attention in the operations management literature. We study on-chain peak shaving, the systematic scheduling of Ethereum transactions toward low-congestion windows to reduce gas fee exposure. We use transaction-level data from seven firms across seven industries (N = 62,142 transactions, January-March 2026). Gas fees vary significantly throughout the day: the peak-hour premium at 10 AM Eastern Time reaches USD 0.220 per transaction above the overnight baseline, driven primarily by speculative-arbitrage demand rather than operational activity. Firm-level scheduling responses are heterogeneous and not uniformly disciplined. Only three of seven firms transact disproportionately during off-peak hours; four transact counter-cyclically, concentrated in peak windows due to external deadlines or governance cycles. This heterogeneity is explained by two moderators: transaction deferrability and gas intensity. We formalize these into an On-Chain Scheduling Matrix that maps firms to four regimes: 1) full peak shaving, 2) selective peak shaving, 3) cost provisioning, and 4) accept-market-rate, with regime membership predicting both fee savings and residual cost floors (40-92 percent of actual expenditure). Theoretically, we extend Transaction Cost Economics to account for time-varying execution costs imposed by congestion externalities. In addition to extending Williamson's original cost taxonomy, we introduce a dual classification of gas fees as execution costs in timing but maladaptation costs in origin. The findings reposition on-chain gas-fee management alongside energy procurement and foreign exchange hedging as a domain requiring systematic operational planning.
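The role of deferrability can be illustrated with a toy scheduler. A deferrable transaction is placed in the cheapest hour between its release time and its deadline, and a non-deferrable one executes at release. The fee profile below is a made-up two-level curve, not the paper's measured data, and the scheduling rule is our simplification of "full peak shaving".

```python
def schedule(transactions, hourly_fee):
    """transactions: list of (release_hour, deadline_hour, deferrable), hours in 0..23.

    Deferrable transactions go to the cheapest hour in [release, deadline]
    (earliest hour on ties); non-deferrable ones run at their release hour.
    """
    plan = []
    for release, deadline, deferrable in transactions:
        if deferrable:
            hour = min(range(release, deadline + 1), key=lambda h: hourly_fee[h])
        else:
            hour = release
        plan.append(hour)
    return plan

# Hypothetical fee profile: expensive business hours, cheap otherwise
fee = [1.0 if 9 <= h <= 17 else 0.2 for h in range(24)]
txs = [(10, 23, True),    # deferrable with a loose deadline: shifted off-peak
       (10, 23, False),   # not deferrable: pays the peak rate
       (9, 12, True)]     # deferrable, but the window lies entirely in the peak
plan = schedule(txs, fee)
```

The third transaction shows why deferrability alone is not enough: a tight deadline can confine even a deferrable transaction to peak hours, leaving a residual cost floor.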

SpanVLA: Efficient Action Bridging and Learning from Negative-Recovery Samples for Vision-Language-Action Model

Authors:Zewei Zhou, Ruining Yang, Xuewei Qi, Yiluan Guo, Sherry X. Chen, Tao Feng, Kateryna Pistunova, Yishan Shen, Lili Su, Jiaqi Ma

Date:2026-04-21 17:34:19

Vision-Language-Action (VLA) models offer a promising autonomous driving paradigm for leveraging world knowledge and reasoning capabilities, especially in long-tail scenarios. However, existing VLA models often suffer from high action-generation latency due to their autoregressive generation framework and exhibit limited robustness. In this paper, we propose SpanVLA, a novel end-to-end autonomous driving framework that integrates autoregressive reasoning with a flow-matching action expert. First, SpanVLA introduces an efficient bridge that leverages the vision and reasoning guidance of the VLM to plan future trajectories with a flow-matching policy conditioned on historical trajectory initialization, which significantly reduces inference time. Second, to further improve the performance and robustness of SpanVLA, we propose a GRPO-based post-training method that enables the VLA model not only to learn from positive driving samples but also to learn to avoid typical negative behaviors and to acquire recovery behaviors. We further introduce mReasoning, a new real-world driving reasoning dataset focusing on complex, reasoning-demanding scenarios and negative-recovery samples. Extensive experiments on NAVSIM (v1 and v2) demonstrate the competitive performance of SpanVLA. Additionally, qualitative results across diverse scenarios highlight the planning performance and robustness of our model.
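GRPO's core quantity is a group-relative advantage. Several outputs are sampled for the same input, and each output's reward is standardized against the group's mean and standard deviation, so no learned value function is needed. The sketch below shows only this step. How SpanVLA defines rewards for positive, negative and recovery samples is not described in the abstract, and implementations differ on population versus sample standard deviation (population is used here).

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each sampled output's reward within its group (GRPO-style advantage)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]
```

Outputs scoring above their group average receive positive advantages and are reinforced, while below-average ones, such as an unsafe trajectory in a group of safe ones, are pushed down. A group with identical rewards yields zero advantage and therefore no gradient signal.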

Planning in entropy-regularized Markov decision processes and games

Authors:Jean-Bastien Grill, Omar Darwiche Domingues, Pierre Ménard, Rémi Munos, Michal Valko

Date:2026-04-21 17:17:33

We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve a problem-independent sample complexity of order $\tilde{O}(1/\varepsilon^4)$ for a desired accuracy $\varepsilon$, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.
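The smoothness mentioned above comes from replacing the hard max in the Bellman backup with a log-sum-exp, which is the optimal value of the entropy-regularized action choice: $V(s) = \lambda \log \sum_a \exp(Q(s,a)/\lambda)$. The sketch below is exact soft value iteration on a known toy model. It is not SmoothCruiser, which estimates values by sampling from a generative model; it only shows the regularized backup itself.

```python
import math

def soft_max(values, lam):
    """lam * log(sum(exp(v / lam))), computed stably; lies in [max, max + lam * log(n)]."""
    m = max(values)
    return m + lam * math.log(sum(math.exp((v - m) / lam) for v in values))

def soft_value_iteration(P, R, gamma, lam, iters=500):
    """P[s][a]: list of (prob, next_state); R[s][a]: reward. Returns the soft values V."""
    V = [0.0] * len(P)
    for _ in range(iters):
        V = [soft_max([R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                       for a in range(len(P[s]))], lam)
             for s in range(len(P))]
    return V

# One state, two self-looping actions with rewards 1 and 0
P = [[[(1.0, 0)], [(1.0, 0)]]]
R = [[1.0, 0.0]]
V = soft_value_iteration(P, R, gamma=0.5, lam=1.0)
```

For this toy model the fixed point solves $V = 0.5V + \lambda \log(e^{1/\lambda} + 1)$, i.e. $V = 2\lambda \log(e^{1/\lambda} + 1)$, and as $\lambda \to 0$ the backup reduces to the ordinary, non-smooth max.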

A-MAR: Agent-based Multimodal Art Retrieval for Fine-Grained Artwork Understanding

Authors:Shuai Wang, Hongyi Zhu, Jia-Hong Huang, Yixian Shen, Chengxi Zeng, Stevan Rudinac, Monika Kackovic, Nachoem Wijnberg, Marcel Worring

Date:2026-04-21 17:11:48

Understanding artworks requires multi-step reasoning over visual content and cultural, historical, and stylistic context. While recent multimodal large language models show promise in artwork explanation, they rely on implicit reasoning and internalized knowledge, limiting interpretability and explicit evidence grounding. We propose A-MAR, an Agent-based Multimodal Art Retrieval framework that explicitly conditions retrieval on structured reasoning plans. Given an artwork and a user query, A-MAR first decomposes the task into a structured reasoning plan that specifies the goals and evidence requirements for each step. Retrieval is then conditioned on this plan, enabling targeted evidence selection and supporting step-wise, grounded explanations. To evaluate agent-based multimodal reasoning within the art domain, we introduce ArtCoT-QA, a diagnostic benchmark featuring multi-step reasoning chains for diverse art-related queries, which enables a granular analysis that extends beyond final-answer accuracy alone. Experiments on SemArt and Artpedia show that A-MAR consistently outperforms static, non-planned retrieval and strong MLLM baselines in final explanation quality, while evaluations on ArtCoT-QA further demonstrate its advantages in evidence grounding and multi-step reasoning. These results highlight the importance of reasoning-conditioned retrieval for knowledge-intensive multimodal understanding and position A-MAR as a step toward interpretable, goal-driven AI systems, with particular relevance to cultural industries. The code and data are available at https://github.com/ShuaiWang97/A-MAR.

Multi-Cycle Spatio-Temporal Adaptation in Human-Robot Teaming

Authors:Alex Cuellar, Michael Hagenow, Julie Shah

Date:2026-04-21 16:49:59

Effective human-robot teaming is crucial for the practical deployment of robots in human workspaces. However, optimizing joint human-robot plans remains a challenge due to the difficulty of modeling individualized human capabilities and preferences. While prior research has leveraged the multi-cycle structure of domains like manufacturing to learn an individual's tendencies and adapt plans over repeated interactions, these techniques typically consider task-level and motion-level adaptation in isolation. Task-level methods optimize allocation and scheduling but often ignore spatial interference in close-proximity scenarios; conversely, motion-level methods focus on collision avoidance while ignoring the broader task context. This paper introduces RAPIDDS, a framework that unifies these approaches by modeling an individual's spatial behavior (motion paths) and temporal behavior (time required to complete tasks) over multiple cycles. RAPIDDS then jointly adapts task schedules and steers diffusion models of robot motion to maximize efficiency and minimize proximity, accounting for these individualized models. We demonstrate the importance of this dual adaptation through an ablation study in simulation and a physical robot scenario using a 7-DOF robot arm. Finally, we present a user study (n=32) showing significant plan improvements over non-adaptive systems in both objective metrics, such as efficiency and proximity, and subjective measures, including fluency and user preference. See this paper's companion video at: https://youtu.be/55Q3lq1fINs.

A Dual Perspective on Synthetic Trajectory Generators: Utility Framework and Privacy Vulnerabilities

Authors:Aya Cherigui, Florent Guépin, Arnaud Legendre, Jean-François Couchot

Date:2026-04-21 16:42:33

Human mobility data are used in numerous applications, ranging from public health to urban planning. Human mobility is inherently sensitive, as it can reveal information such as religious beliefs and political affiliations. Historically, techniques such as aggregation, obfuscation, and noise addition have been proposed to protect privacy and address these concerns. Because these methods come at a great cost in utility, new methods leveraging developments in generative models have been introduced. The extent to which such methods resolve the privacy-utility trade-off remains an open problem. In this paper, we take a first step toward solving it by introducing and applying a new framework for utility evaluation. Furthermore, we provide evidence that privacy evaluation remains a major challenge and that it should be tackled through adversarial evaluation, in accordance with current EU regulation. We propose a new membership inference attack against a subcategory of generative models, even though this subcategory was deemed private because of its resistance to trajectory-user linking.

Regulation Zero 2: A Flow-Centric Sequential Regulation Planning Framework to Counter Regulation Cascading in Pre-tactical Air Traffic Flow Management

Authors:Thinh Hoang, Zhengyi Wang, Leila Zerrouki, Daniel Delahaye

Date:2026-04-21 16:30:23

Air Traffic Flow Management (ATFM) traffic regulations are being used increasingly as rising demand meets persistent workforce shortages. This operational strain has amplified a critical phenomenon that we call \emph{regulation cascading}: the compounding, non-linear interactions that occur when multiple regulations influence one another in unpredictable ways. As the number and complexity of regulations grow, cascading effects become more pronounced, undermining the network operator's ability to protect sectors reliably. To address this challenge, we introduce Regulation Zero 2, an updated sequential planning framework that natively operates in the regulation space, optimizing over ordered sequences of flow-level regulations that remain as compatible as possible with existing slot-allocation systems such as CASA and RBS++. We equipped Regulation Zero 2 with new heuristics that make flow finding more efficient. At its core, the method employs a hierarchical Monte Carlo Tree Search (MCTS) that first samples congestion hotspots and then selects candidate regulations synthesized by a local proposal engine. Each proposal is evaluated by a fast First-Planned-First-Served (FPFS) allocator to estimate its reward, and this feedback guides the subsequent MCTS exploration. Experiments on many pan-European summer-peak traffic days show that Regulation Zero 2 delivers promising and consistent performance. Compared with flight-centric simulated-annealing and NSGA-II baselines, it achieves markedly higher objective improvements while maintaining a tighter scope of impact on the network. Ablation studies also indicate that regulation cascading can erode up to 50\% of potential effectiveness. Regulation Zero 2 also preserves FPFS fairness and supports expert knowledge injection, offering a pragmatic, low-disruption pathway toward automation in operations.
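
A toy sketch of first-planned-first-served allocation under a single rate-based regulation: flights are served in order of planned entry time, and each receives the later of its planned time and the previous slot plus a minimum spacing of 60/rate minutes. This is an assumed, simplified model (one regulated element, constant rate, hypothetical `Flight` and `fpfs_allocate` names), not the paper's allocator or CASA's actual logic. A candidate regulation's reward could then be derived from, for example, the resulting total delay; that choice is also an assumption.

```python
from dataclasses import dataclass

@dataclass
class Flight:
    callsign: str
    planned_min: float  # planned entry time at the regulated element, in minutes

def fpfs_allocate(flights, rate_per_hour):
    """Toy first-planned-first-served allocation for one rate-based regulation.

    Flights are served in order of planned entry time; each gets the later of its
    planned time and the previous flight's slot plus 60 / rate_per_hour minutes.
    Returns a dict mapping callsign -> assigned delay in minutes.
    """
    spacing = 60.0 / rate_per_hour
    last_slot = None
    delays = {}
    for f in sorted(flights, key=lambda fl: fl.planned_min):
        slot = f.planned_min if last_slot is None else max(f.planned_min, last_slot + spacing)
        delays[f.callsign] = slot - f.planned_min
        last_slot = slot
    return delays

# Hypothetical demand: four flights planned into a sector regulated at 10 flights/hour.
demand = [Flight("A", 0), Flight("B", 2), Flight("C", 3), Flight("D", 20)]
delays = fpfs_allocate(demand, rate_per_hour=10)
total_delay = sum(delays.values())
```

With 6-minute spacing, flights B and C are pushed to minutes 6 and 12 (delays of 4 and 9 minutes), while D keeps its planned time; serving strictly in planned order is what makes the allocation "fair" in the FPFS sense.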

SafetyALFRED: Evaluating Safety-Conscious Planning of Multimodal Large Language Models

Authors:Josue Torres-Fonseca, Naihao Deng, Yinpei Dai, Shane Storks, Yichi Zhang, Rada Mihalcea, Casey Kennington, Joyce Chai

Date:2026-04-21 16:27:20

Multimodal Large Language Models are increasingly adopted as autonomous agents in interactive environments, yet their ability to proactively address safety hazards remains insufficient. We introduce SafetyALFRED, built upon the embodied agent benchmark ALFRED and augmented with six categories of real-world kitchen hazards. While existing safety evaluations focus on hazard recognition in disembodied question answering (QA) settings, we evaluate eleven state-of-the-art models from the Qwen, Gemma, and Gemini families on both hazard recognition and active risk mitigation through embodied planning. Our experimental results reveal a significant alignment gap: while models accurately recognize hazards in QA settings, their average success rates at mitigating these hazards are comparatively low. Our findings demonstrate that static QA evaluations are insufficient for physical safety; we therefore advocate a paradigm shift toward benchmarks that prioritize corrective actions in embodied contexts. We open-source our code and dataset at https://github.com/sled-group/SafetyALFRED.git

Active Inference-Enabled Agentic Closed-Loop ISAC with Long-Horizon Planning

Authors:Guangjin Pan, Zhuojun Tian, Mehdi Bennis, Henk Wymeersch

Date:2026-04-21 15:51:56

Wireless agentic systems enable agents to autonomously perceive, reason, and act. However, existing works neglect the tight coupling between sensing and control in closed-loop integrated sensing and communication (ISAC) systems. In this paper, we propose an active inference (AIF)-driven wireless agentic system for closed-loop ISAC, which jointly optimizes control and sensing resource allocation via backward--forward message passing on a factor graph. The AIF agent maintains a generative model as a digital twin by integrating a localization model for uncertainty-aware state inference and a localization channel knowledge map (CKM) for approximating observation quality during planning. Simulation results demonstrate that the AIF-enabled agent adaptively allocates sensing resources based on spatially varying channel conditions, achieving a better balance among tracking accuracy, control effort, and sensing resource consumption than baseline strategies.

Paparazzo: Active Mapping of Moving 3D Objects

Authors:Davide Allegro, Shiyao Li, Stefano Ghidoni, Vincent Lepetit

Date:2026-04-21 15:09:10

Current 3D mapping pipelines generally assume static environments, which limits their ability to accurately capture and reconstruct moving objects. To address this limitation, we introduce the novel task of active mapping of moving objects, in which a mapping agent must plan its trajectory while compensating for the object's motion. Our approach, Paparazzo, provides a learning-free solution that robustly predicts the target's trajectory, identifies the most informative viewpoints from which to observe it, and plans its own path accordingly. We also contribute a comprehensive benchmark designed for this new task. Through extensive experiments, we show that Paparazzo significantly improves 3D reconstruction completeness and accuracy compared to several strong baselines, marking an important step toward dynamic scene understanding. Project page: https://davidea97.github.io/paparazzo-page/

FOCAL: Filtered On-device Continuous Activity Logging for Efficient Personal Desktop Summarization

Authors:Haoran Yin, Zhiyuan Wen, Jiannong Cao, Bo Yuan, Ruosong Yang

Date:2026-04-21 15:00:41

Desktop interaction streams provide a continuous, privacy-sensitive record of interleaved user tasks. Transforming these streams into task-organized personal logs on-device faces two main challenges: exhaustive Vision-Language Model (VLM) processing strains local resources, and processing the stream globally causes cross-task context pollution. We present FOCAL (Filtered On-device Continuous Activity Logging), a privacy-first multi-agent system built on a unified filter-plan-log architecture. It cascades a lightweight Filter Agent for noise suppression, a text-only Brain Agent for task attribution, a Record Agent for selective visual reasoning, and a task-isolated Memory Agent for context-coherent summarization. Experiments on DesktopBench (2,572 screenshots across 420 complex sessions) show that FOCAL reduces total token consumption by 60.4% and VLM call count by 72.3% relative to a baseline, while raising Key Information Recall (KIR) from 0.38 to 0.61. Crucially, under $A{\to}B{\to}A$ task interruptions, FOCAL maintains a task accuracy (Task Acc) of 0.81 and a KIR of 0.80, whereas the baseline's Task Acc collapses to 0.03. FOCAL pioneers efficient, on-device summarization of instruction-free desktop streams into multi-perspective personal logs.

An effective window framework for closed-loop regional SAR reconnaissance with hybrid direct-relay downlink scheduling

Authors:Linhong Li, Qi Feng, Kebo Li, Yangang Liang

Date:2026-04-21 14:58:38

For operational regional synthetic aperture radar (SAR) reconnaissance, mission success depends not only on geometric visibility but also on whether geometric feasibility, prescribed imaging quality, and timely data delivery can be met together within the planning horizon. This paper develops an effective window framework for regional SAR window generation, per-window signal-level quality screening, and hybrid direct-relay closed-loop scheduling. Through coarse angular bandpass screening, a planar characteristic-curve containment test, and one-dimensional boundary bisection, the framework forms geometrically feasible candidate observation windows whose entry and exit times are accurate to the millisecond level. Each candidate window is then assessed in stripmap mode with a companion point target under a unified echo-generation and Back Projection (BP) imaging workflow; only windows whose range and azimuth impulse response width (IRW), peak sidelobe ratio (PSLR), and integrated sidelobe ratio (ISLR) all satisfy the preset thresholds are retained. The retained observation, relay, and downlink windows feed a quality-constrained hybrid direct-relay closed-loop mixed-integer linear programming (MILP) formulation for joint scheduling of observation and ground return. Numerical experiments confirm millisecond-level agreement of the window boundaries with STK reference timing. Every candidate window is screened against the preset imaging quality thresholds. Hybrid closed-loop scheduling improves closure performance and ground-returned data volume relative to a relay-only baseline.
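
The coarse-screen-then-bisect idea can be sketched generically: sample a feasibility predicate on a coarse time grid, and whenever two consecutive samples disagree, bisect that bracket until it is narrower than a tolerance (1 ms here). The predicate in the example, a toy look angle rising linearly that must stay within a 25 to 45 degree band, is a hypothetical stand-in for the paper's angular bandpass and planar containment tests, and the function names are illustrative. A coarse grid can miss windows shorter than its step, so the step must be chosen below the shortest window of interest.

```python
def bisect_boundary(feasible, t_true, t_false, tol=1e-3):
    """Locate the switch time between a feasible and an infeasible instant.

    feasible(t_true) must be True and feasible(t_false) False; t_true may lie
    before or after t_false. Returns a feasible time within tol of the boundary.
    """
    assert feasible(t_true) and not feasible(t_false)
    while abs(t_false - t_true) > tol:
        mid = 0.5 * (t_true + t_false)
        if feasible(mid):
            t_true = mid
        else:
            t_false = mid
    return t_true

def find_windows(feasible, t0, t1, step, tol=1e-3):
    """Coarse scan of [t0, t1] with spacing `step`, then bisect each transition."""
    windows, entry = [], None
    prev_t, prev_ok = t0, feasible(t0)
    if prev_ok:
        entry = t0
    t = t0 + step
    while t <= t1:
        ok = feasible(t)
        if ok and not prev_ok:  # entering a window
            entry = bisect_boundary(feasible, t, prev_t, tol)
        elif prev_ok and not ok:  # leaving a window
            windows.append((entry, bisect_boundary(feasible, prev_t, t, tol)))
            entry = None
        prev_t, prev_ok = t, ok
        t += step
    if prev_ok:  # window still open at the end of the horizon
        windows.append((entry, prev_t))
    return windows

# Toy usage: look angle rises 0.1 deg/s from 20 deg; feasible band is [25, 45] deg.
def in_band(t):
    angle = 20.0 + 0.1 * t
    return 25.0 <= angle <= 45.0

windows = find_windows(in_band, 0.0, 400.0, step=10.0)
```

Each transition costs about $\log_2(\text{step}/\text{tol})$ extra predicate calls (roughly 14 for a 10 s step and 1 ms tolerance), which is why bisection after a cheap coarse screen is attractive when each feasibility check involves orbit propagation.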

From Experience to Skill: Multi-Agent Generative Engine Optimization via Reusable Strategy Learning

Authors:Beining Wu, Fuyou Mao, Jiong Lin, Cheng Yang, Jiaxuan Lu, Yifu Guo, Siyu Zhang, Yifan Wu, Ying Huang, Fu Li

Date:2026-04-21 14:39:24

Generative engines (GEs) are reshaping information access by replacing ranked links with citation-grounded answers, yet current Generative Engine Optimization (GEO) methods optimize each instance in isolation and cannot accumulate or transfer effective strategies across tasks and engines. We reframe GEO as a strategy learning problem and propose MAGEO, a multi-agent framework in which coordinated planning, editing, and fidelity-aware evaluation serve as the execution layer, while validated editing patterns are progressively distilled into reusable, engine-specific optimization skills. To enable controlled assessment, we introduce a Twin Branch Evaluation Protocol for causal attribution of content edits and DSV-CF, a dual-axis metric that unifies semantic visibility with attribution accuracy. We further release MSME-GEO-Bench, a multi-scenario, multi-engine benchmark grounded in real-world queries. Experiments on three mainstream engines show that MAGEO substantially outperforms heuristic baselines in both visibility and citation fidelity, with ablations confirming that engine-specific preference modeling and strategy reuse are central to these gains, suggesting a scalable, learning-driven paradigm for trustworthy GEO. Code is available at https://github.com/Wu-beining/MAGEO