planning - 2025-08-14

Reasoning About Knowledge on Regular Expressions is 2EXPTIME-complete

Authors:Avijeet Ghosh, Sujata Ghosh, François Schwarzentruber

Date:2025-08-13 13:10:16

Logics for reasoning about knowledge and actions have seen many applications in various domains of multi-agent systems, including epistemic planning. Change of knowledge based on observations about the surroundings forms a key aspect in such planning scenarios. Public Observation Logic (POL) is a variant of public announcement logic for reasoning about knowledge that gets updated based on public observations. Each state in an epistemic (Kripke) model is equipped with a set of expected observations. These states evolve as the expectations get matched with the actual observations. In this work, we prove that the satisfiability problem of $\POL$ is 2EXPTIME-complete.

TriForecaster: A Mixture of Experts Framework for Multi-Region Electric Load Forecasting with Tri-dimensional Specialization

Authors:Zhaoyang Zhu, Zhipeng Zeng, Qiming Chen, Linxiao Yang, Peiyuan Liu, Weiqi Chen, Liang Sun

Date:2025-08-13 12:34:15

Electric load forecasting is pivotal for power system operation, planning and decision-making. The rise of smart grids and meters has provided more detailed and high-quality load data at multiple levels of granularity, from home to bus and cities. Motivated by similar patterns of loads across different cities in a province in eastern China, in this paper we focus on the Multi-Region Electric Load Forecasting (MRELF) problem, targeting accurate short-term load forecasting for multiple sub-regions within a large region. We identify three challenges for MRELF, including regional variation, contextual variation, and temporal variation. To address them, we propose TriForecaster, a new framework leveraging the Mixture of Experts (MoE) approach within a Multi-Task Learning (MTL) paradigm to overcome these challenges. TriForecaster features RegionMixer and Context-Time Specializer (CTSpecializer) layers, enabling dynamic cooperation and specialization of expert models across regional, contextual, and temporal dimensions. Based on evaluation on four real-world MRELF datasets with varied granularity, TriForecaster outperforms state-of-the-art models by achieving an average forecast error reduction of 22.4\%, thereby demonstrating its flexibility and broad applicability. In particular, the deployment of TriForecaster on the eForecaster platform in eastern China exemplifies its practical utility, effectively providing city-level, short-term load forecasts for 17 cities, supporting a population exceeding 110 million and daily electricity usage over 100 gigawatt-hours.

Route Planning and Online Routing for Quantum Key Distribution Networks

Authors:Jorge López, Charalampos Chatzinakis, Marc Cartigny

Date:2025-08-13 12:00:55

Quantum Key Distribution (QKD) networks harness the principles of quantum physics in order to securely transmit cryptographic key material, providing physical guarantees. These networks require traditional management and operational components, such as routing information through the network elements. However, due to the limitations on capacity and the particularities of information handling in these networks, traditional shortest paths algorithms for routing perform poorly on both route planning and online routing, which is counterintuitive. Moreover, due to the scarce resources in such networks, often the expressed demand cannot be met by any assignment of routes. To address both the route planning problem and the need for fair automated suggestions in infeasible cases, we propose to model this problem as a Quadratic Programming (QP) problem. For the online routing problem, we showcase that the shortest (available) paths routing strategy performs poorly in the online setting. Furthermore, we prove that the widest shortest path routing strategy has a competitive ratio greater or equal than $\frac{1}{2}$, efficiently addressing both routing modes in QKD networks.

Preacher: Paper-to-Video Agentic System

Authors:Jingwei Liu, Ling Yang, Hao Luo, Fan Wang Hongyan Li, Mengdi Wang

Date:2025-08-13 09:08:51

The paper-to-video task converts a research paper into a structured video abstract, distilling key concepts, methods, and conclusions into an accessible, well-organized format. While state-of-the-art video generation models demonstrate potential, they are constrained by limited context windows, rigid video duration constraints, limited stylistic diversity, and an inability to represent domain-specific knowledge. To address these limitations, we introduce Preacher, the first paper-to-video agentic system. Preacher employs a top-down approach to decompose, summarize, and reformulate the paper, followed by bottom-up video generation, synthesizing diverse video segments into a coherent abstract. To align cross-modal representations, we define key scenes and introduce a Progressive Chain of Thought (P-CoT) for granular, iterative planning. Preacher successfully generates high-quality video abstracts across five research fields, demonstrating expertise beyond current video generation models. Code will be released at: https://github.com/GenVerse/Paper2Video

ESCoT: An Enhanced Step-based Coordinate Trajectory Planning Method for Multiple Car-like Robots

Authors:Junkai Jiang, Yihe Chen, Yibin Yang, Ruochen Li, Shaobing Xu, Jianqiang Wang

Date:2025-08-13 07:53:29

Multi-vehicle trajectory planning (MVTP) is one of the key challenges in multi-robot systems (MRSs) and has broad applications across various fields. This paper presents ESCoT, an enhanced step-based coordinate trajectory planning method for multiple car-like robots. ESCoT incorporates two key strategies: collaborative planning for local robot groups and replanning for duplicate configurations. These strategies effectively enhance the performance of step-based MVTP methods. Through extensive experiments, we show that ESCoT 1) in sparse scenarios, significantly improves solution quality compared to baseline step-based method, achieving up to 70% improvement in typical conflict scenarios and 34% in randomly generated scenarios, while maintaining high solving efficiency; and 2) in dense scenarios, outperforms all baseline methods, maintains a success rate of over 50% even in the most challenging configurations. The results demonstrate that ESCoT effectively solves MVTP, further extending the capabilities of step-based methods. Finally, practical robot tests validate the algorithm's applicability in real-world scenarios.

Edge General Intelligence Through World Models and Agentic AI: Fundamentals, Solutions, and Challenges

Authors:Changyuan Zhao, Guangyuan Liu, Ruichen Zhang, Yinqiu Liu, Jiacheng Wang, Jiawen Kang, Dusit Niyato, Zan Li, Xuemin, Shen, Zhu Han, Sumei Sun, Chau Yuen, Dong In Kim

Date:2025-08-13 07:29:40

Edge General Intelligence (EGI) represents a transformative evolution of edge computing, where distributed agents possess the capability to perceive, reason, and act autonomously across diverse, dynamic environments. Central to this vision are world models, which act as proactive internal simulators that not only predict but also actively imagine future trajectories, reason under uncertainty, and plan multi-step actions with foresight. This proactive nature allows agents to anticipate potential outcomes and optimize decisions ahead of real-world interactions. While prior works in robotics and gaming have showcased the potential of world models, their integration into the wireless edge for EGI remains underexplored. This survey bridges this gap by offering a comprehensive analysis of how world models can empower agentic artificial intelligence (AI) systems at the edge. We first examine the architectural foundations of world models, including latent representation learning, dynamics modeling, and imagination-based planning. Building on these core capabilities, we illustrate their proactive applications across EGI scenarios such as vehicular networks, unmanned aerial vehicle (UAV) networks, the Internet of Things (IoT) systems, and network functions virtualization, thereby highlighting how they can enhance optimization under latency, energy, and privacy constraints. We then explore their synergy with foundation models and digital twins, positioning world models as the cognitive backbone of EGI. Finally, we highlight open challenges, such as safety guarantees, efficient training, and constrained deployment, and outline future research directions. This survey provides both a conceptual foundation and a practical roadmap for realizing the next generation of intelligent, autonomous edge systems.

CaRoBio: 3D Cable Routing with a Bio-inspired Gripper Fingernail

Authors:Jiahui Zuo, Boyang Zhang, Fumin Zhang

Date:2025-08-13 07:25:40

The manipulation of deformable linear flexures has a wide range of applications in industry, such as cable routing in automotive manufacturing and textile production. Cable routing, as a complex multi-stage robot manipulation scenario, is a challenging task for robot automation. Common parallel two-finger grippers have the risk of over-squeezing and over-tension when grasping and guiding cables. In this paper, a novel eagle-inspired fingernail is designed and mounted on the gripper fingers, which helps with cable grasping on planar surfaces and in-hand cable guiding operations. Then we present a single-grasp end-to-end 3D cable routing framework utilizing the proposed fingernails, instead of the common pick-and-place strategy. Continuous control is achieved to efficiently manipulate cables through vision-based state estimation of task configurations and offline trajectory planning based on motion primitives. We evaluate the effectiveness of the proposed framework with a variety of cables and channel slots, significantly outperforming the pick-and-place manipulation process under equivalent perceptual conditions. Our reconfigurable task setting and the proposed framework provide a reference for future cable routing manipulations in 3D space.

A Bayesian factor analysis model for non-randomised staggered designs

Authors:Constantin Schmidt, Shaun R. Seaman, Beatrice Emmanouil, Leila Reid, Stuart Smith, Daniela De Angelis, Pantelis Samartsidis

Date:2025-08-13 07:19:36

The employment of peer supporter workers starting in 2018 was one of the interventions deployed by National Health Service England as part of its Hepatitis C virus (HCV) elimination plan. Peers are individuals with relevant lived experience who educate their communities about the virus and promote testing and treatment. In this paper, we assess the causal effect of the peers intervention on HCV patient case-finding, using data on 22 administrative regions from January 2016 to May 2021. To do this, we develop a Bayesian causal factor analysis model for count outcomes and ordinal interventions. Our method provides uncertainty quantification for all causal estimands of interest, gains efficiency by jointly modelling the intervention assignment process, pre- and post-intervention outcomes, and provides estimates of both conditional average and individual treatment effects (ITEs). For ITEs, we propose a copula-based approach that allows practitioners to perform sensitivity analysis to assumptions made regarding the joint distribution of potential outcomes, that are necessary to estimate these quantities. Our analysis suggests that the introduction of peers led to an increase in HCV patient case-finding. Further, we found that the effect of the intervention increased with intervention intensity, and was stronger during the national COVID-19 lockdown.

DAgger Diffusion Navigation: DAgger Boosted Diffusion Policy for Vision-Language Navigation

Authors:Haoxiang Shi, Xiang Deng, Zaijing Li, Gongwei Chen, Yaowei Wang, Liqiang Nie

Date:2025-08-13 02:51:43

Vision-Language Navigation in Continuous Environments (VLN-CE) requires agents to follow natural language instructions through free-form 3D spaces. Existing VLN-CE approaches typically use a two-stage waypoint planning framework, where a high-level waypoint predictor generates the navigable waypoints, and then a navigation planner suggests the intermediate goals in the high-level action space. However, this two-stage decomposition framework suffers from: (1) global sub-optimization due to the proxy objective in each stage, and (2) a performance bottleneck caused by the strong reliance on the quality of the first-stage predicted waypoints. To address these limitations, we propose DAgger Diffusion Navigation (DifNav), an end-to-end optimized VLN-CE policy that unifies the traditional two stages, i.e. waypoint generation and planning, into a single diffusion policy. Notably, DifNav employs a conditional diffusion policy to directly model multi-modal action distributions over future actions in continuous navigation space, eliminating the need for a waypoint predictor while enabling the agent to capture multiple possible instruction-following behaviors. To address the issues of compounding error in imitation learning and enhance spatial reasoning in long-horizon navigation tasks, we employ DAgger for online policy training and expert trajectory augmentation, and use the aggregated data to further fine-tune the policy. This approach significantly improves the policy's robustness and its ability to recover from error states. Extensive experiments on benchmark datasets demonstrate that, even without a waypoint predictor, the proposed method substantially outperforms previous state-of-the-art two-stage waypoint-based models in terms of navigation performance. Our code is available at: https://github.com/Tokishx/DifNav.

CLF-RL: Control Lyapunov Function Guided Reinforcement Learning

Authors:Kejun Li, Zachary Olkin, Yisong Yue, Aaron D. Ames

Date:2025-08-12 21:34:07

Reinforcement learning (RL) has shown promise in generating robust locomotion policies for bipedal robots, but often suffers from tedious reward design and sensitivity to poorly shaped objectives. In this work, we propose a structured reward shaping framework that leverages model-based trajectory generation and control Lyapunov functions (CLFs) to guide policy learning. We explore two model-based planners for generating reference trajectories: a reduced-order linear inverted pendulum (LIP) model for velocity-conditioned motion planning, and a precomputed gait library based on hybrid zero dynamics (HZD) using full-order dynamics. These planners define desired end-effector and joint trajectories, which are used to construct CLF-based rewards that penalize tracking error and encourage rapid convergence. This formulation provides meaningful intermediate rewards, and is straightforward to implement once a reference is available. Both the reference trajectories and CLF shaping are used only during training, resulting in a lightweight policy at deployment. We validate our method both in simulation and through extensive real-world experiments on a Unitree G1 robot. CLF-RL demonstrates significantly improved robustness relative to the baseline RL policy and better performance than a classic tracking reward RL formulation.

RicciFlowRec: A Geometric Root Cause Recommender Using Ricci Curvature on Financial Graphs

Authors:Zhongtian Sun, Anoushka Harit

Date:2025-08-12 20:45:02

We propose RicciFlowRec, a geometric recommendation framework that performs root cause attribution via Ricci curvature and flow on dynamic financial graphs. By modelling evolving interactions among stocks, macroeconomic indicators, and news, we quantify local stress using discrete Ricci curvature and trace shock propagation via Ricci flow. Curvature gradients reveal causal substructures, informing a structural risk-aware ranking function. Preliminary results on S\&P~500 data with FinBERT-based sentiment show improved robustness and interpretability under synthetic perturbations. This ongoing work supports curvature-based attribution and early-stage risk-aware ranking, with plans for portfolio optimization and return forecasting. To our knowledge, RicciFlowRec is the first recommender to apply geometric flow-based reasoning in financial decision support.

Decision-Making-Based Path Planning for Autonomous UAVs: A Survey

Authors:Kelen C. Teixeira Vivaldini, Robert Pěnička, Martin Saska

Date:2025-08-12 19:38:33

One of the most critical features for the successful operation of autonomous UAVs is the ability to make decisions based on the information acquired from their surroundings. Each UAV must be able to make decisions during the flight in order to deal with uncertainties in its system and the environment, and to further act upon the information being received. Such decisions influence the future behavior of the UAV, which is expressed as the path plan. Thus, decision-making in path planning is an enabling technique for deploying autonomous UAVs in real-world applications. This survey provides an overview of existing studies that use aspects of decision-making in path planning, presenting the research strands for Exploration Path Planning and Informative Path Planning, and focusing on characteristics of how data have been modeled and understood. Finally, we highlight the existing challenges for relevant topics in this field.

The Neutrino Kaleidoscope: Searches for Non-Standard Neutrino Oscillations at Neutrino Telescopes with a TeV Muon Accelerator Source

Authors:Nicholas W. Kamp, Gray Putnam

Date:2025-08-12 18:00:01

Muon accelerators, a potential technology for enabling $\mathcal{O}$(10 TeV) parton center of mass energy collisions, would also source an intense, collimated beam of neutrinos at TeV energies. The energy and size of this beam would be excellently matched as a source for existing and planned neutrino telescopes: gigaton-sized detectors of astrophysical neutrinos at and above TeV energies. In this paper, we introduce the technical considerations and scientific reach of pairing a muon accelerator source of neutrinos with a neutrino telescope detector, a combination we dub the ''Neutrino Kaleidoscope''. In particular, such a pairing would enable searches for non-standard oscillations of the beam neutrinos as they traverse the earth between source and detector. These non-standard neutrino oscillations could be sourced by Lorentz invariance violation, which a neutrino kaleidoscope could probe up to the quantum gravity-motivated Planck scale. Such a search would also have a reach on sterile neutrinos orders of magnitude beyond existing terrestrial limits. Finally, we touch on some of the non-oscillation potential of a neutrino kaleidoscope.

CVCM Track Circuits Pre-emptive Failure Diagnostics for Predictive Maintenance Using Deep Neural Networks

Authors:Debdeep Mukherjee, Eduardo Di Santi, Clément Lefebvre, Nenad Mijatovic, Victor Martin, Thierry Josse, Jonathan Brown, Kenza Saiah

Date:2025-08-12 16:13:51

Track circuits are critical for railway operations, acting as the main signalling sub-system to locate trains. Continuous Variable Current Modulation (CVCM) is one such technology. Like any field-deployed, safety-critical asset, it can fail, triggering cascading disruptions. Many failures originate as subtle anomalies that evolve over time, often not visually apparent in monitored signals. Conventional approaches, which rely on clear signal changes, struggle to detect them early. Early identification of failure types is essential to improve maintenance planning, minimising downtime and revenue loss. Leveraging deep neural networks, we propose a predictive maintenance framework that classifies anomalies well before they escalate into failures. Validated on 10 CVCM failure cases across different installations, the method is ISO-17359 compliant and outperforms conventional techniques, achieving 99.31% overall accuracy with detection within 1% of anomaly onset. Through conformal prediction, we provide uncertainty estimates, reaching 99% confidence with consistent coverage across classes. Given CVCMs global deployment, the approach is scalable and adaptable to other track circuits and railway systems, enhancing operational reliability.

A First Look at Predictability and Explainability of Pre-request Passenger Waiting Time in Ridesharing Systems

Authors:Jie Wang, Guang Wang

Date:2025-08-12 15:42:14

Passenger waiting time prediction plays a critical role in enhancing both ridesharing user experience and platform efficiency. While most existing research focuses on post-request waiting time prediction with knowing the matched driver information, pre-request waiting time prediction (i.e., before submitting a ride request and without matching a driver) is also important, as it enables passengers to plan their trips more effectively and enhance the experience of both passengers and drivers. However, it has not been fully studied by existing works. In this paper, we take the first step toward understanding the predictability and explainability of pre-request passenger waiting time in ridesharing systems. Particularly, we conduct an in-depth data-driven study to investigate the impact of demand&supply dynamics on passenger waiting time. Based on this analysis and feature engineering, we propose FiXGBoost, a novel feature interaction-based XGBoost model designed to predict waiting time without knowing the assigned driver information. We further perform an importance analysis to quantify the contribution of each factor. Experiments on a large-scale real-world ridesharing dataset including over 30 million trip records show that our FiXGBoost can achieve a good performance for pre-request passenger waiting time prediction with high explainability.

Large Scale Robotic Material Handling: Learning, Planning, and Control

Authors:Filippo A. Spinelli, Yifan Zhai, Fang Nan, Pascal Egli, Julian Nubert, Thilo Bleumer, Lukas Miller, Ferdinand Hofmann, Marco Hutter

Date:2025-08-12 15:13:52

Bulk material handling involves the efficient and precise moving of large quantities of materials, a core operation in many industries, including cargo ship unloading, waste sorting, construction, and demolition. These repetitive, labor-intensive, and safety-critical operations are typically performed using large hydraulic material handlers equipped with underactuated grippers. In this work, we present a comprehensive framework for the autonomous execution of large-scale material handling tasks. The system integrates specialized modules for environment perception, pile attack point selection, path planning, and motion control. The main contributions of this work are two reinforcement learning-based modules: an attack point planner that selects optimal grasping locations on the material pile to maximize removal efficiency and minimize the number of scoops, and a robust trajectory following controller that addresses the precision and safety challenges associated with underactuated grippers in movement, while utilizing their free-swinging nature to release material through dynamic throwing. We validate our framework through real-world experiments on a 40 t material handler in a representative worksite, focusing on two key tasks: high-throughput bulk pile management and high-precision truck loading. Comparative evaluations against human operators demonstrate the system's effectiveness in terms of precision, repeatability, and operational safety. To the best of our knowledge, this is the first complete automation of material handling tasks on a full scale.

TaoCache: Structure-Maintained Video Generation Acceleration

Authors:Zhentao Fan, Zongzuo Wang, Weiwei Zhang

Date:2025-08-12 14:40:36

Existing cache-based acceleration methods for video diffusion models primarily skip early or mid denoising steps, which often leads to structural discrepancies relative to full-timestep generation and can hinder instruction following and character consistency. We present TaoCache, a training-free, plug-and-play caching strategy that, instead of residual-based caching, adopts a fixed-point perspective to predict the model's noise output and is specifically effective in late denoising stages. By calibrating cosine similarities and norm ratios of consecutive noise deltas, TaoCache preserves high-resolution structure while enabling aggressive skipping. The approach is orthogonal to complementary accelerations such as Pyramid Attention Broadcast (PAB) and TeaCache, and it integrates seamlessly into DiT-based frameworks. Across Latte-1, OpenSora-Plan v110, and Wan2.1, TaoCache attains substantially higher visual quality (LPIPS, SSIM, PSNR) than prior caching methods under the same speedups.

Urban-STA4CLC: Urban Theory-Informed Spatio-Temporal Attention Model for Predicting Post-Disaster Commercial Land Use Change

Authors:Ziyi Guo, Yan Wang

Date:2025-08-12 14:39:42

Natural disasters such as hurricanes and wildfires increasingly introduce unusual disturbance on economic activities, which are especially likely to reshape commercial land use pattern given their sensitive to customer visitation. However, current modeling approaches are limited in capturing such complex interplay between human activities and commercial land use change under and following disturbances. Such interactions have been more effectively captured in current resilient urban planning theories. This study designs and calibrates a Urban Theory-Informed Spatio-Temporal Attention Model for Predicting Post-Disaster Commercial Land Use Change (Urban-STA4CLC) to predict both the yearly decline and expansion of commercial land use at census block level under cumulative impact of disasters on human activities over two years. Guided by urban theories, Urban-STA4CLC integrates both spatial and temporal attention mechanisms with three theory-informed modules. Resilience theory guides a disaster-aware temporal attention module that captures visitation dynamics. Spatial economic theory informs a multi-relational spatial attention module for inter-block representation. Diffusion theory contributes a regularization term that constrains land use transitions. The model performs significantly better than non-theoretical baselines in predicting commercial land use change under the scenario of recurrent hurricanes, with around 19% improvement in F1 score (0.8763). The effectiveness of the theory-guided modules was further validated through ablation studies. The research demonstrates that embedding urban theory into commercial land use modeling models may substantially enhance the capacity to capture its gains and losses. These advances in commercial land use modeling contribute to land use research that accounts for cumulative impacts of recurrent disasters and shifts in economic activity patterns.

Automatic and standardized surgical reporting for central nervous system tumors

Authors:David Bouget, Mathilde Gajda Faanes, Asgeir Store Jakola, Frederik Barkhof, Hilko Ardon, Lorenzo Bello, Mitchel S. Berger, Shawn L. Hervey-Jumper, Julia Furtner, Albert J. S. Idema, Barbara Kiesel, Georg Widhalm, Rishi Nandoe Tewarie, Emmanuel Mandonnet, Pierre A. Robe, Michiel Wagemakers, Timothy R. Smith, Philip C. De Witt Hamer, Ole solheim, Ingerid Reinertsen

Date:2025-08-12 13:08:49

Magnetic resonance (MR) imaging is essential for evaluating central nervous system (CNS) tumors, guiding surgical planning, treatment decisions, and assessing postoperative outcomes and complication risks. While recent work has advanced automated tumor segmentation and report generation, most efforts have focused on preoperative data, with limited attention to postoperative imaging analysis. This study introduces a comprehensive pipeline for standardized postsurtical reporting in CNS tumors. Using the Attention U-Net architecture, segmentation models were trained for the preoperative (non-enhancing) tumor core, postoperative contrast-enhancing residual tumor, and resection cavity. Additionally, MR sequence classification and tumor type identification for contrast-enhancing lesions were explored using the DenseNet architecture. The models were integrated into a reporting pipeline, following the RANO 2.0 guidelines. Training was conducted on multicentric datasets comprising 2000 to 7000 patients, using a 5-fold cross-validation. Evaluation included patient-, voxel-, and object-wise metrics, with benchmarking against the latest BraTS challenge results. The segmentation models achieved average voxel-wise Dice scores of 87%, 66%, 70%, and 77% for the tumor core, non-enhancing tumor core, contrast-enhancing residual tumor, and resection cavity, respectively. Classification models reached 99.5% balanced accuracy in MR sequence classification and 80% in tumor type classification. The pipeline presented in this study enables robust, automated segmentation, MR sequence classification, and standardized report generation aligned with RANO 2.0 guidelines, enhancing postoperative evaluation and clinical decision-making. The proposed models and methods were integrated into Raidionics, open-source software platform for CNS tumor analysis, now including a dedicated module for postsurgical analysis.

Efficient Agent: Optimizing Planning Capability for Multimodal Retrieval Augmented Generation

Authors:Yuechen Wang, Yuming Qiao, Dan Meng, Jun Yang, Haonan Lu, Zhenyu Yang, Xudong Zhang

Date:2025-08-12 10:17:12

Multimodal Retrieval-Augmented Generation (mRAG) has emerged as a promising solution to address the temporal limitations of Multimodal Large Language Models (MLLMs) in real-world scenarios like news analysis and trending topics. However, existing approaches often suffer from rigid retrieval strategies and under-utilization of visual information. To bridge this gap, we propose E-Agent, an agent framework featuring two key innovations: a mRAG planner trained to dynamically orchestrate multimodal tools based on contextual reasoning, and a task executor employing tool-aware execution sequencing to implement optimized mRAG workflows. E-Agent adopts a one-time mRAG planning strategy that enables efficient information retrieval while minimizing redundant tool invocations. To rigorously assess the planning capabilities of mRAG systems, we introduce the Real-World mRAG Planning (RemPlan) benchmark. This novel benchmark contains both retrieval-dependent and retrieval-independent question types, systematically annotated with essential retrieval tools required for each instance. The benchmark's explicit mRAG planning annotations and diverse question design enhance its practical relevance by simulating real-world scenarios requiring dynamic mRAG decisions. Experiments across RemPlan and three established benchmarks demonstrate E-Agent's superiority: 13% accuracy gain over state-of-the-art mRAG methods while reducing redundant searches by 37%.

Visual Prompting for Robotic Manipulation with Annotation-Guided Pick-and-Place Using ACT

Authors:Muhammad A. Muttaqien, Tomohiro Motoda, Ryo Hanai, Yukiyasu Domae

Date:2025-08-12 08:45:09

Robotic pick-and-place tasks in convenience stores pose challenges due to dense object arrangements, occlusions, and variations in object properties such as color, shape, size, and texture. These factors complicate trajectory planning and grasping. This paper introduces a perception-action pipeline leveraging annotation-guided visual prompting, where bounding box annotations identify both pickable objects and placement locations, providing structured spatial guidance. Instead of traditional step-by-step planning, we employ Action Chunking with Transformers (ACT) as an imitation learning algorithm, enabling the robotic arm to predict chunked action sequences from human demonstrations. This facilitates smooth, adaptive, and data-driven pick-and-place operations. We evaluate our system based on success rate and visual analysis of grasping behavior, demonstrating improved grasp accuracy and adaptability in retail environments.

Comprehensive Comparison Network: a framework for locality-aware, routes-comparable and interpretable route recommendation

Authors:Chao Chen, Longfei Xu, Hanyu Guo, Chengzhang Wang, Ying Wang, Kaikui Liu, Xiangxiang Chu

Date:2025-08-12 08:40:52

Route recommendation (RR) is a core task of route planning in the Amap app, with the goal of recommending the optimal route among candidate routes to users. Unlike traditional recommendation methods, insights into the local quality of routes and comparisons between candidate routes are crucial for enhancing recommendation performance but often overlooked in previous studies. To achieve these, we propose a novel model called Comprehensive Comparison Network (CCN). CCN not only uses query-level features (e.g. user features) and item-level features (e.g. route features, item embedding) that are common in traditional recommendations, but also introduces comparison-level features which describe the non-overlapping segments between different routes to capture the local quality of routes. The key component Comprehensive Comparison Block (CCB) in CCN is designed to enable comparisons between routes. CCB includes a Comprehensive Comparison Operator (CCO) and a multi-scenario MLP, which can update the representations of candidate routes based on a comprehensive comparison. By stacking multiple CCBs, CCN can determine the final scores of candidate routes and recommend the optimal one to the user. Additionally, since routes directly affect the costs and risks experienced by users, the RR model must be interpretable for online deployment. Therefore, we designed an interpretable pair scoring network to achieve interpretability. Both offline and online experiments demonstrate that CCN significantly improves RR performance and exhibits strong interpretability. CCN has been fully deployed in the Amap app for over a year, providing stable and optimal benefits for route recommendations.

PETLP: A Privacy-by-Design Pipeline for Social Media Data in AI Research

Authors:Nick Oh, Giorgos D. Vrakas, Siân J. M. Brooke, Sasha Morinière, Toju Duke

Date:2025-08-12 08:33:40

Social media data presents AI researchers with overlapping obligations under the GDPR, copyright law, and platform terms -- yet existing frameworks fail to integrate these regulatory domains, leaving researchers without unified guidance. We introduce PETLP (Privacy-by-design Extract, Transform, Load, and Present), a compliance framework that embeds legal safeguards directly into extended ETL pipelines. Central to PETLP is treating Data Protection Impact Assessments as living documents that evolve from pre-registration through dissemination. Through systematic Reddit analysis, we demonstrate how extraction rights fundamentally differ between qualifying research organisations (who can invoke DSM Article 3 to override platform restrictions) and commercial entities (bound by terms of service), whilst GDPR obligations apply universally. We reveal why true anonymisation remains unachievable for social media data and expose the legal gap between permitted dataset creation and uncertain model distribution. By structuring compliance decisions into practical workflows and simplifying institutional data management plans, PETLP enables researchers to navigate regulatory complexity with confidence, bridging the gap between legal requirements and research practice.

Expert-Guided Diffusion Planner for Auto-bidding

Authors:Yunshan Peng, Wenzheng Shu, Jiahao Sun, Yanxiang Zeng, Jinan Pang, Wentao Bai, Yunke Bai, Xialong Liu, Peng Jiang

Date:2025-08-12 07:23:51

Auto-bidding is extensively applied in advertising systems, serving a multitude of advertisers. Generative bidding is gradually gaining traction due to its robust planning capabilities and generalizability. In contrast to traditional reinforcement learning-based bidding, generative bidding does not rely on the Markov Decision Process (MDP) exhibiting superior planning capabilities in long-horizon scenarios. Conditional diffusion modeling approaches have demonstrated significant potential in the realm of auto-bidding. However, relying solely on return as the optimality condition is weak to guarantee the generation of genuinely optimal decision sequences, lacking personalized structural information. Moreover, diffusion models' t-step autoregressive generation mechanism inherently carries timeliness risks. To address these issues, we propose a novel conditional diffusion modeling method based on expert trajectory guidance combined with a skip-step sampling strategy to enhance generation efficiency. We have validated the effectiveness of this approach through extensive offline experiments and achieved statistically significant results in online A/B testing, achieving an increase of 11.29% in conversion and a 12.35% in revenue compared with the baseline.

Developing a Calibrated Physics-Based Digital Twin for Construction Vehicles

Authors:Deniz Karanfil, Daniel Lindmark, Martin Servin, David Torick, Bahram Ravani

Date:2025-08-12 02:23:50

This paper presents the development of a calibrated digital twin of a wheel loader. A calibrated digital twin integrates a construction vehicle with a high-fidelity digital model allowing for automated diagnostics and optimization of operations as well as pre-planning simulations enhancing automation capabilities. The high-fidelity digital model is a virtual twin of the physical wheel loader. It uses a physics-based multibody dynamic model of the wheel loader in the software AGX Dynamics. Interactions of the wheel loader's bucket while in use in construction can be simulated in the virtual model. Calibration makes this simulation of high-fidelity which can enhance realistic planning for automation of construction operations. In this work, a wheel loader was instrumented with several sensors used to calibrate the digital model. The calibrated digital twin was able to estimate the magnitude of the forces on the bucket base with high accuracy, providing a high-fidelity simulation.

DeepFleet: Multi-Agent Foundation Models for Mobile Robots

Authors:Ameya Agaskar, Sriram Siva, William Pickering, Kyle O'Brien, Charles Kekeh, Ang Li, Brianna Gallo Sarker, Alicia Chua, Mayur Nemade, Charun Thattai, Jiaming Di, Isaac Iyengar, Ramya Dharoor, Dino Kirouani, Jimmy Erskine, Tamir Hegazy, Scott Niekum, Usman A. Khan, Federico Pecora, Joseph W. Durham

Date:2025-08-12 02:19:15

We introduce DeepFleet, a suite of foundation models designed to support coordination and planning for large-scale mobile robot fleets. These models are trained on fleet movement data, including robot positions, goals, and interactions, from hundreds of thousands of robots in Amazon warehouses worldwide. DeepFleet consists of four architectures that each embody a distinct inductive bias and collectively explore key points in the design space for multi-agent foundation models: the robot-centric (RC) model is an autoregressive decision transformer operating on neighborhoods of individual robots; the robot-floor (RF) model uses a transformer with cross-attention between robots and the warehouse floor; the image-floor (IF) model applies convolutional encoding to a multi-channel image representation of the full fleet; and the graph-floor (GF) model combines temporal attention with graph neural networks for spatial relationships. In this paper, we describe these models and present our evaluation of the impact of these design choices on prediction task performance. We find that the robot-centric and graph-floor models, which both use asynchronous robot state updates and incorporate the localized structure of robot interactions, show the most promise. We also present experiments that show that these two models can make effective use of larger warehouses operation datasets as the models are scaled up.

UQGNN: Uncertainty Quantification of Graph Neural Networks for Multivariate Spatiotemporal Prediction

Authors:Dahai Yu, Dingyi Zhuang, Lin Jiang, Rongchao Xu, Xinyue Ye, Yuheng Bu, Shenhao Wang, Guang Wang

Date:2025-08-12 01:40:05

Spatiotemporal prediction plays a critical role in numerous real-world applications such as urban planning, transportation optimization, disaster response, and pandemic control. In recent years, researchers have made significant progress by developing advanced deep learning models for spatiotemporal prediction. However, most existing models are deterministic, i.e., predicting only the expected mean values without quantifying uncertainty, leading to potentially unreliable and inaccurate outcomes. While recent studies have introduced probabilistic models to quantify uncertainty, they typically focus on a single phenomenon (e.g., taxi, bike, crime, or traffic crashes), thereby neglecting the inherent correlations among heterogeneous urban phenomena. To address the research gap, we propose a novel Graph Neural Network with Uncertainty Quantification, termed UQGNN for multivariate spatiotemporal prediction. UQGNN introduces two key innovations: (i) an Interaction-aware Spatiotemporal Embedding Module that integrates a multivariate diffusion graph convolutional network and an interaction-aware temporal convolutional network to effectively capture complex spatial and temporal interaction patterns, and (ii) a multivariate probabilistic prediction module designed to estimate both expected mean values and associated uncertainties. Extensive experiments on four real-world multivariate spatiotemporal datasets from Shenzhen, New York City, and Chicago demonstrate that UQGNN consistently outperforms state-of-the-art baselines in both prediction accuracy and uncertainty quantification. For example, on the Shenzhen dataset, UQGNN achieves a 5% improvement in both prediction accuracy and uncertainty quantification.

StreetViewAI: Making Street View Accessible Using Context-Aware Multimodal AI

Authors:Jon E. Froehlich, Alexander Fiannaca, Nimer Jaber, Victor Tsara, Shaun Kane

Date:2025-08-11 23:30:39

Interactive streetscape mapping tools such as Google Street View (GSV) and Meta Mapillary enable users to virtually navigate and experience real-world environments via immersive 360{\deg} imagery but remain fundamentally inaccessible to blind users. We introduce StreetViewAI, the first-ever accessible street view tool, which combines context-aware, multimodal AI, accessible navigation controls, and conversational speech. With StreetViewAI, blind users can virtually examine destinations, engage in open-world exploration, or virtually tour any of the over 220 billion images and 100+ countries where GSV is deployed. We iteratively designed StreetViewAI with a mixed-visual ability team and performed an evaluation with eleven blind users. Our findings demonstrate the value of an accessible street view in supporting POI investigations and remote route planning. We close by enumerating key guidelines for future work.

Identification of pressure points in modern power systems using transfer entropy

Authors:Katerina Tang, M. Vivienne Liu, C. Lindsay Anderson, Vivek Srikrishnan

Date:2025-08-11 22:48:46

Integration of variable energy resources -- e.g., solar, wind, and hydro -- and end-use electrification increase modern energy systems' weather-dependence. Identifying critical infrastructure constraining the power grid's ability to meet electricity demand under weather-induced shocks and stressors is essential for understanding risks and guiding adaptation. We use transfer entropy to identify predictive pressure points: grid components whose utilization patterns provide early signals of downstream power shortages. We apply this method to simulations of New York State's proposed future grid under various meteorological and technological scenarios, showing that pressure points often arise from complex, system-wide interactions between generation, transmission, and demand. While transfer entropy does not support conclusions about causality, the identified pressure points align with known bottlenecks and offer insight into failure pathways. Furthermore, these pressure points are not easily predicted by high-level scenario features alone, underscoring the need for holistic and adaptive approaches to reliability planning in power systems with intermittent resources.

Sparse Partial Optimal Transport via Quadratic Regularization

Authors:Khang Tran, Khoa Nguyen, Anh Nguyen, Thong Huynh, Son Pham, Sy-Hoang Nguyen-Dang, Manh Pham, Bang Vo, Mai Ngoc Tran, Mai Ngoc Tran, Dung Luong

Date:2025-08-11 21:22:35

Partial Optimal Transport (POT) has recently emerged as a central tool in various Machine Learning (ML) applications. It lifts the stringent assumption of the conventional Optimal Transport (OT) that input measures are of equal masses, which is often not guaranteed in real-world datasets, and thus offers greater flexibility by permitting transport between unbalanced input measures. Nevertheless, existing major solvers for POT commonly rely on entropic regularization for acceleration and thus return dense transport plans, hindering the adoption of POT in various applications that favor sparsity. In this paper, as an alternative approach to the entropic POT formulation in the literature, we propose a novel formulation of POT with quadratic regularization, hence termed quadratic regularized POT (QPOT), which induces sparsity to the transport plan and consequently facilitates the adoption of POT in many applications with sparsity requirements. Extensive experiments on synthetic and CIFAR-10 datasets, as well as real-world applications such as color transfer and domain adaptations, consistently demonstrate the improved sparsity and favorable performance of our proposed QPOT formulation.