planning - 2026-01-21

Spatiotemporal Wildfire Prediction and Reinforcement Learning for Helitack Suppression

Authors:Shaurya Mathur, Shreyas Bellary Manjunath, Nitin Kulkarni, Alina Vereshchaka

Date:2026-01-20 18:50:12

Wildfires are growing in frequency and intensity, devastating ecosystems and communities while causing billions of dollars in suppression costs and economic damage annually in the U.S. Traditional wildfire management is mostly reactive, addressing fires only after they are detected. We introduce FireCastRL, a proactive artificial intelligence (AI) framework that combines wildfire forecasting with intelligent suppression strategies. Our framework first uses a deep spatiotemporal model to predict wildfire ignition. For high-risk predictions, we deploy a pre-trained reinforcement learning (RL) agent to execute real-time suppression tactics with helitack units inside a physics-informed 3D simulation. The framework generates a threat assessment report to help emergency responders optimize resource allocation and planning. In addition, we are publicly releasing a large-scale, spatiotemporal dataset containing 9.5 million samples of environmental variables for wildfire prediction. Our work demonstrates how deep learning and RL can be combined to support both forecasting and tactical wildfire response. More details can be found at https://sites.google.com/view/firecastrl.

Burst Aware Forecasting of User Traffic Demand in LEO Satellite Networks

Authors:Yekta Demirci, Guillaume Mantelet, Stephane Martel, Jean-Francois Frigon, Gunes Karabulut Kurt

Date:2026-01-20 18:45:10

In Low Earth Orbit (LEO) satellite networks, Beam Hopping (BH) technology enables the efficient utilization of limited radio resources by adapting to varying user demands and link conditions. Effective BH planning requires prior knowledge of upcoming traffic at the time of scheduling, making forecasting an important sub-task. Forecasting becomes particularly critical under heavy load conditions where an unexpected demand burst combined with link degradation may cause buffer overflows and packet loss. To address this challenge, we propose a burst-aware forecasting solution. This challenge may arise in a wide range of wireless networks; therefore, the proposed solution is broadly applicable to settings characterized by bursty traffic patterns where accurate demand forecasting is essential. Our approach introduces three key enhancements to a transformer architecture: (i) a distance-from-last-burst embedding to capture burst proximity, (ii) two additional linear layers in the decoder to forecast both upcoming bursts and their relative impact, and (iii) use of an asymmetric cost function during model training to better capture burst dynamics. Empirical evaluations in an Earth-fixed cell under a high-traffic demand scenario demonstrate that the proposed model reduces prediction error by up to 94% at a one-step horizon and maintains the ability to accurately capture bursts even near the end of longer prediction horizons, according to the Mean Square Error (MSE) metric.
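Two of the enhancements listed above, the distance-from-last-burst feature and the asymmetric cost, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the burst threshold, the cap used before the first burst, the choice to penalize under-prediction more heavily, and the function names.

```python
def distance_from_last_burst(series, threshold, cap=10):
    """Steps elapsed since the last sample at or above `threshold`.

    `cap` (hypothetical) is returned before any burst has been seen and
    bounds the distance so it can index a fixed-size embedding table.
    """
    distances = []
    last_burst_index = None
    for i, value in enumerate(series):
        if value >= threshold:
            last_burst_index = i
        if last_burst_index is None:
            distances.append(cap)
        else:
            distances.append(min(i - last_burst_index, cap))
    return distances


def asymmetric_squared_error(y_true, y_pred, under_weight=4.0, over_weight=1.0):
    """Mean squared error with separate weights for under- and over-prediction.

    Assumption: missing a burst (predicting too little) costs more than
    over-provisioning, so under_weight > over_weight.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for target, prediction in zip(y_true, y_pred):
        error = target - prediction
        weight = under_weight if error > 0 else over_weight
        total += weight * error * error
    return total / len(y_true)


if __name__ == "__main__":
    demand = [1, 9, 2, 3, 8, 1]
    print(distance_from_last_burst(demand, threshold=5))
    print(asymmetric_squared_error([10.0], [8.0]))   # under-prediction
    print(asymmetric_squared_error([10.0], [12.0]))  # over-prediction
```

With these weights, an under-prediction of two units costs four times as much as an over-prediction of the same size, which pushes a model trained on this loss toward anticipating bursts.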

Toward Efficient Agents: Memory, Tool learning, and Planning

Authors:Xiaofang Yang, Lijun Li, Heng Zhou, Tong Zhu, Xiaoye Qu, Yuchen Fan, Qianshan Wei, Rui Ye, Li Kang, Yiran Qin, Zhiqiang Kou, Daizong Liu, Qi Li, Ning Ding, Siheng Chen, Jing Shao

Date:2026-01-20 17:51:56

Recent years have witnessed increasing interest in extending large language models into agentic systems. While the effectiveness of agents has continued to improve, efficiency, which is crucial for real-world deployment, has often been overlooked. This paper therefore investigates efficiency across three core components of agents: memory, tool learning, and planning, considering costs such as latency, tokens, and steps. To address the efficiency of the agentic system itself, we review a broad range of recent approaches that differ in implementation yet frequently converge on shared high-level principles, including bounding context via compression and management, designing reinforcement learning rewards to minimize tool invocation, and employing controlled search mechanisms, each of which we discuss in detail. Accordingly, we characterize efficiency in two complementary ways: comparing effectiveness under a fixed cost budget, and comparing cost at a comparable level of effectiveness. This trade-off can also be viewed through the Pareto frontier between effectiveness and cost. From this perspective, we also examine efficiency-oriented benchmarks by summarizing evaluation protocols for these components and consolidating commonly reported efficiency metrics from both benchmark and methodological studies. Moreover, we discuss the key challenges and future directions, with the goal of providing promising insights.
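The two views of efficiency described above (effectiveness under a fixed cost budget, and the Pareto frontier between effectiveness and cost) can be made concrete with a small sketch. The agent names, costs, and scores below are invented purely for illustration and do not come from the survey.

```python
def pareto_frontier(agents):
    """Return names of agents not dominated in (lower cost, higher effectiveness).

    `agents` is a list of (name, cost, effectiveness) tuples.
    """
    frontier = []
    for name, cost, eff in agents:
        dominated = any(
            other_cost <= cost
            and other_eff >= eff
            and (other_cost < cost or other_eff > eff)
            for _, other_cost, other_eff in agents
        )
        if not dominated:
            frontier.append(name)
    return frontier


def best_under_budget(agents, budget):
    """Most effective agent whose cost does not exceed `budget`, or None."""
    feasible = [agent for agent in agents if agent[1] <= budget]
    if not feasible:
        return None
    return max(feasible, key=lambda agent: agent[2])[0]


if __name__ == "__main__":
    # Hypothetical agents: (name, tokens per task, success rate).
    agents = [
        ("A", 100, 0.60),
        ("B", 200, 0.80),
        ("C", 250, 0.70),
        ("D", 400, 0.85),
    ]
    print(pareto_frontier(agents))
    print(best_under_budget(agents, budget=300))
```

In this toy data, agent C is dominated by agent B, which is both cheaper and more effective, so C drops off the frontier.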

Paper2Rebuttal: A Multi-Agent Framework for Transparent Author Response Assistance

Authors:Qianli Ma, Chang Guo, Zhiheng Tian, Siyu Wang, Jipeng Xiao, Yuanhao Yue, Zhipeng Zhang

Date:2026-01-20 17:23:51

Writing effective rebuttals is a high-stakes task that demands more than linguistic fluency, as it requires precise alignment between reviewer intent and manuscript details. Current solutions typically treat this as a direct-to-text generation problem, suffering from hallucination, overlooked critiques, and a lack of verifiable grounding. To address these limitations, we introduce RebuttalAgent, the first multi-agent framework that reframes rebuttal generation as an evidence-centric planning task. Our system decomposes complex feedback into atomic concerns and dynamically constructs hybrid contexts by synthesizing compressed summaries with high-fidelity text while integrating an autonomous and on-demand external search module to resolve concerns requiring outside literature. By generating an inspectable response plan before drafting, RebuttalAgent ensures that every argument is explicitly anchored in internal or external evidence. We validate our approach on the proposed RebuttalBench and demonstrate that our pipeline outperforms strong baselines in coverage, faithfulness, and strategic coherence, offering a transparent and controllable assistant for the peer review process. Code will be released.

Optimizing Energy and Data Collection in UAV-aided IoT Networks using Attention-based Multi-Objective Reinforcement Learning

Authors:Babacar Toure, Dimitrios Tsilimantos, Omid Esrafilian, Marios Kountouris

Date:2026-01-20 15:55:11

Due to their adaptability and mobility, Unmanned Aerial Vehicles (UAVs) are becoming increasingly essential for wireless network services, particularly for data harvesting tasks. In this context, Artificial Intelligence (AI)-based approaches have gained significant attention for addressing UAV path planning tasks in large and complex environments, bridging the gap with real-world deployments. However, many existing algorithms suffer from limited training data, which hampers their performance in highly dynamic environments. Moreover, they often overlook the inherently multi-objective nature of the task, treating it in an overly simplistic manner. To address these limitations, we propose an attention-based Multi-Objective Reinforcement Learning (MORL) architecture that explicitly handles the trade-off between data collection and energy consumption in urban environments, even without prior knowledge of wireless channel conditions. Our method develops a single model capable of adapting to varying trade-off preferences and dynamic scenario parameters without the need for fine-tuning or retraining. Extensive simulations show that our approach achieves substantial improvements in performance, model compactness, sample efficiency, and most importantly, generalization to previously unseen scenarios, outperforming existing RL solutions.

VERIDAH: Solving Enumeration Anomaly Aware Vertebra Labeling across Imaging Sequences

Authors:Hendrik Möller, Hanna Schoen, Robert Graf, Matan Atad, Nathan Molinier, Anjany Sekuboyina, Bettina K. Budai, Fabian Bamberg, Steffen Ringhof, Christopher Schlett, Tobias Pischon, Thoralf Niendorf, Josua A. Decker, Marc-André Weber, Bjoern Menze, Daniel Rueckert, Jan S. Kirschke

Date:2026-01-20 15:22:59

The human spine commonly consists of seven cervical, twelve thoracic, and five lumbar vertebrae. However, enumeration anomalies may result in individuals having eleven or thirteen thoracic vertebrae and four or six lumbar vertebrae. Although the identification of enumeration anomalies has potential clinical implications for chronic back pain and operation planning, the thoracolumbar junction is often poorly assessed and rarely described in clinical reports. Additionally, even though multiple deep-learning-based vertebra labeling algorithms exist, there is a lack of methods to automatically label enumeration anomalies. Our work closes that gap by introducing "Vertebra Identification with Anomaly Handling" (VERIDAH), a novel vertebra labeling algorithm based on multiple classification heads combined with a weighted vertebra sequence prediction algorithm. We show that our approach surpasses existing models on T2w TSE sagittal (98.30% vs. 94.24% of subjects with all vertebrae correctly labeled, p < 0.001) and CT imaging (99.18% vs. 77.26% of subjects with all vertebrae correctly labeled, p < 0.001) and works in arbitrary field-of-view images. VERIDAH correctly labeled the presence of thoracic enumeration anomalies in 87.80% and 96.30% of T2w and CT images, respectively, and lumbar enumeration anomalies in 94.48% and 97.22% for T2w and CT, respectively. Our code and models are available at: https://github.com/Hendrik-code/spineps.

Evaluating state-of-the-art cloud quantum computers for quantum neural networks in gravitational waves data analysis

Authors:Maria-Catalina Isfan, Laurentiu-Ioan Caramete, Ana Caramete

Date:2026-01-20 14:56:22

In this work, we explore the possibility of using quantum computers offered in the cloud by major companies (such as IBM, IonQ, IQM Quantum Computers, etc.) to run our quantum neural network (QNN), developed with the Qiskit library in Python for data analysis in the context of the LISA Space Mission. Our previous work demonstrated that our QNN learns patterns in gravitational wave (GW) data much faster than a classical neural network, making it suitable for fast GW signal detection in future LISA data streams. Analyzing the fees from hardware providers like IBM Quantum, Amazon Braket and Microsoft Azure, we found that the fees for running the first segment of our QNN sum up to $2000, $60000, and $1000000, respectively. Using free plans, we succeeded in running the 3-qubit feature map of the QNN for one random data sample on the ibm_kyoto and IQM Quantum Computers_Garnet quantum computers, obtaining a fidelity of 99%; we could also run the first prediction segment of our QNN on ibm_kyoto, implemented for 4 qubits, and obtained a prediction accuracy of 20%. We queried providers such as IBM Quantum, Amazon Braket, Pasqal, and Munich Quantum Valley to obtain access to their plans, but, with the exception of Amazon Braket, our applications remain unanswered to this day. Other major setbacks in using the quantum computers we had access to included Qiskit library version issues (as in the cases of IBM Quantum and IQM Quantum Computers) and the frequent unavailability of the devices, as was the case with the Microsoft Azure provider. All the results presented in this paper were accumulated in 2024.

Intermittent time series forecasting: local vs global models

Authors:Stefano Damato, Nicolò Rubattu, Dario Azzimonti, Giorgio Corani

Date:2026-01-20 14:53:24

Intermittent time series, characterised by the presence of a significant amount of zeros, constitute a large percentage of inventory items in supply chains. Probabilistic forecasts are needed to plan the inventory levels; the predictive distribution should cover non-negative values, have a mass at zero and a long upper tail. Intermittent time series are commonly forecast using local models, which are trained individually on each time series. In recent years, global models, which are trained on a large collection of time series, have become popular for time series forecasting. Global models are often based on neural networks. However, they have not yet been exhaustively tested on intermittent time series. We carry out the first study comparing state-of-the-art local (iETS, TweedieGP) and global models (D-Linear, DeepAR, Transformers) on intermittent time series. For neural network models we consider three different distribution heads suitable for intermittent time series: negative binomial, hurdle-shifted negative binomial and Tweedie. We use, for the first time, the last two distribution heads with neural networks. We perform experiments on five large datasets comprising more than 40,000 real-world time series. Among neural networks, D-Linear provides the best accuracy; it also consistently outperforms the local models. Moreover, it also has low computational requirements. Transformer-based architectures are instead much more computationally demanding and less accurate. Among the distribution heads, the Tweedie provides the best estimates of the highest quantiles, while the negative binomial offers overall the best performance.
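To make the "mass at zero and a long upper tail" requirement concrete, here is a minimal sketch of a hurdle-shifted negative binomial probability mass function. The parameterization is an assumption and may differ from the paper's: a Bernoulli "hurdle" with probability p_nonzero decides whether demand is positive, and positive demand is 1 plus a negative binomial count with shape r and success probability q.

```python
import math


def neg_binomial_pmf(k, r, q):
    """P(K = k) for a negative binomial with real shape r > 0 and
    success probability 0 < q < 1 (k counts failures, k = 0, 1, 2, ...)."""
    log_coeff = math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_coeff + r * math.log(q) + k * math.log1p(-q))


def hurdle_shifted_nb_pmf(k, p_nonzero, r, q):
    """P(Y = k) under the assumed hurdle-shifted negative binomial.

    Y = 0 with probability 1 - p_nonzero; otherwise Y = 1 + K, K ~ NB(r, q).
    """
    if k < 0:
        return 0.0
    if k == 0:
        return 1.0 - p_nonzero
    return p_nonzero * neg_binomial_pmf(k - 1, r, q)


if __name__ == "__main__":
    params = dict(p_nonzero=0.3, r=2.0, q=0.5)
    for demand in range(5):
        print(demand, hurdle_shifted_nb_pmf(demand, **params))
    total = sum(hurdle_shifted_nb_pmf(k, **params) for k in range(200))
    print("total probability:", total)
```

Separating the zero probability from the positive part lets a distribution head fit the frequency of demand and its size independently, which a plain negative binomial cannot do.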

Performance enhancing of hybrid quantum-classical Benders approach for MILP optimization

Authors:Sergio López-Baños, Elisabeth Lobe, Ontje Lünsdorf, Oriol Raventós

Date:2026-01-20 14:47:50

Mixed-integer linear programming problems are extensively used in industry for a wide range of optimization tasks. However, as they get larger, they present computational challenges for classical solvers within practical time limits. Quantum annealers can, in principle, accelerate the solution of problems formulated as quadratic unconstrained binary optimization instances, but their limited scale currently prevents achieving practical speedups. Quantum-classical algorithms have been proposed to take advantage of both paradigms and to allow current quantum computers to be used in larger problems. In this work, a hardware-agnostic Benders' decomposition algorithm and a series of enhancements with the goal of taking the most advantage of quantum computing are presented. The decomposition consists of a master problem with integer variables, which is reformulated as a quadratic unconstrained binary optimization problem and solved with a quantum annealer, and a linear subproblem solved by a classical computer. The enhancements consist, among others, of different embedding processes that substantially reduce the pre-processing time of the embedding computation without compromising solution quality, a conservative handling of cut constraints, and a stopping criterion that accounts for the limited size of current quantum computers and their heuristic nature. The proposed algorithm is benchmarked against classical approaches using a D-Wave quantum annealer for a scalable family of transmission network expansion planning problems.

Capacity and Energy Trade-Offs in FR3 6G Networks Using Real Deployment Data

Authors:David López-Pérez, Nicola Piovesan, Matteo Bernabè

Date:2026-01-20 14:05:35

This article presents a data-driven system-level analysis of multi-layer 6G networks operating in the upper mid-band (FR3: 7-24 GHz). Unlike most prior studies based on 3rd Generation Partnership Project (3GPP) templates, we leverage real-world deployment and traffic data from a commercial 4G/5G network in China to evaluate practical 6G strategies. Using Giulia, a deployment-informed system-level heterogeneous network model, we show that 6G can boost median throughput by up to 9.5x over heterogeneous 4G+5G deployments, but also increases power usage by up to 59%. Critically, co-locating 6G with existing sites delivers limited gains while incurring high energy cost. In contrast, non-co-located, traffic-aware deployments achieve superior throughput-to-watt efficiency, highlighting the need for strategic, user equipment (UE) hotspot-focused 6G planning.

FantasyVLN: Unified Multimodal Chain-of-Thought Reasoning for Vision-Language Navigation

Authors:Jing Zuo, Lingzhou Mu, Fan Jiang, Chengcheng Ma, Mu Xu, Yonggang Qi

Date:2026-01-20 13:54:10

Achieving human-level performance in Vision-and-Language Navigation (VLN) requires an embodied agent to jointly understand multimodal instructions and visual-spatial context while reasoning over long action sequences. Recent works, such as NavCoT and NavGPT-2, demonstrate the potential of Chain-of-Thought (CoT) reasoning for improving interpretability and long-horizon planning. Moreover, multimodal extensions like OctoNav-R1 and CoT-VLA further validate CoT as a promising pathway toward human-like navigation reasoning. However, existing approaches face critical drawbacks: purely textual CoTs lack spatial grounding and easily overfit to sparse annotated reasoning steps, while multimodal CoTs incur severe token inflation by generating imagined visual observations, making real-time navigation impractical. In this work, we propose FantasyVLN, a unified implicit reasoning framework that preserves the benefits of CoT reasoning without explicit token overhead. Specifically, imagined visual tokens are encoded into a compact latent space using a pretrained Visual AutoRegressor (VAR) during CoT reasoning training, and the model jointly learns from textual, visual, and multimodal CoT modes under a unified multi-CoT strategy. At inference, our model performs direct instruction-to-action mapping while still enjoying reasoning-aware representations. Extensive experiments on LH-VLN show that our approach achieves reasoning-aware yet real-time navigation, improving success rates and efficiency while reducing inference latency by an order of magnitude compared to explicit CoT methods.

Glance-or-Gaze: Incentivizing LMMs to Adaptively Focus Search via Reinforcement Learning

Authors:Hongbo Bai, Yujin Zhou, Yile Wu, Chi-Min Chan, Pengcheng Wen, Kunhao Pan, Sirui Han, Yike Guo

Date:2026-01-20 13:18:18

Large Multimodal Models (LMMs) have achieved remarkable success in visual understanding, yet they struggle with knowledge-intensive queries involving long-tail entities or evolving information due to static parametric knowledge. Recent search-augmented approaches attempt to address this limitation, but existing methods rely on indiscriminate whole-image retrieval that introduces substantial visual redundancy and noise, and lack deep iterative reflection, limiting their effectiveness on complex visual queries. To overcome these challenges, we propose Glance-or-Gaze (GoG), a fully autonomous framework that shifts from passive perception to active visual planning. GoG introduces a Selective Gaze mechanism that dynamically chooses whether to glance at global context or gaze into high-value regions, filtering irrelevant information before retrieval. We design a dual-stage training strategy: Reflective GoG Behavior Alignment via supervised fine-tuning instills the fundamental GoG paradigm, while Complexity-Adaptive Reinforcement Learning further enhances the model's capability to handle complex queries through iterative reasoning. Experiments across six benchmarks demonstrate state-of-the-art performance. Ablation studies confirm that both Selective Gaze and complexity-adaptive RL are essential for effective visual search. We will release our data and models for further exploration soon.

AgentEHR: Advancing Autonomous Clinical Decision-Making via Retrospective Summarization

Authors:Yusheng Liao, Chuan Xuan, Yutong Cai, Lina Yang, Zhe Chen, Yanfeng Wang, Yu Wang

Date:2026-01-20 12:48:04

Large Language Models have demonstrated profound utility in the medical domain. However, their application to autonomous Electronic Health Records (EHRs) navigation remains constrained by a reliance on curated inputs and simplified retrieval tasks. To bridge the gap between idealized experimental settings and realistic clinical environments, we present AgentEHR. This benchmark challenges agents to execute complex decision-making tasks, such as diagnosis and treatment planning, requiring long-range interactive reasoning directly within raw and high-noise databases. In tackling these tasks, we identify that existing summarization methods inevitably suffer from critical information loss and fractured reasoning continuity. To address this, we propose RetroSum, a novel framework that unifies a retrospective summarization mechanism with an evolving experience strategy. By dynamically re-evaluating interaction history, the retrospective mechanism prevents long-context information loss and ensures unbroken logical coherence. Additionally, the evolving strategy bridges the domain gap by retrieving accumulated experience from a memory bank. Extensive empirical evaluations demonstrate that RetroSum achieves performance gains of up to 29.16% over competitive baselines, while significantly decreasing total interaction errors by up to 92.3%.

TractRLFusion: A GPT-Based Multi-Critic Policy Fusion Framework for Fiber Tractography

Authors:Ankita Joshi, Ashutosh Sharma, Anoushkrit Goel, Ranjeet Ranjan Jha, Chirag Ahuja, Arnav Bhavsar, Aditya Nigam

Date:2026-01-20 12:26:38

Tractography plays a pivotal role in the non-invasive reconstruction of white matter fiber pathways, providing vital information on brain connectivity and supporting precise neurosurgical planning. Although traditional methods relied mainly on classical deterministic and probabilistic approaches, recent progress has benefited from supervised deep learning (DL) and deep reinforcement learning (DRL) to improve tract reconstruction. A persistent challenge in tractography is accurately reconstructing white matter tracts while minimizing spurious connections. To address this, we propose TractRLFusion, a novel GPT-based policy fusion framework that integrates multiple RL policies through a data-driven fusion strategy. Our method employs a two-stage training data selection process for effective policy fusion, followed by a multi-critic fine-tuning phase to enhance robustness and generalization. Experiments on HCP, ISMRM, and TractoInferno datasets demonstrate that TractRLFusion outperforms individual RL policies as well as state-of-the-art classical and DRL methods in accuracy and anatomical reliability.

Block-Fitness Modeling of the Global Air Mobility Network

Authors:Giulia Fischetti, Anna Mancini, Giulio Cimini, Jessica T. Davis, Abby Leung, Alessandro Vespignani, Guido Caldarelli

Date:2026-01-20 11:31:56

Accurate representations of the World Air Transportation Network (WAN) are fundamental inputs to models of global mobility, epidemic risk, and infrastructure planning. However, high-resolution, real-time data on the WAN are largely commercial and proprietary, therefore often inaccessible to the research community. Here we introduce a generative model of the WAN that treats air travel as a stochastic process within a maximum-entropy framework. The model uses airport-level passenger flows to probabilistically generate connections while preserving traffic volumes across geographic regions. The resulting reconstructed networks reproduce key structural properties of the WAN and enable simulations of dynamic spreading that closely match those obtained using the real network. Our approach provides a scalable, interpretable, and computationally efficient framework for forecasting and policy design in global mobility systems.
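As a rough illustration of how a maximum-entropy fitness model turns node-level flows into link probabilities, the sketch below uses the standard fitness-induced form p_ij = z s_i s_j / (1 + z s_i s_j) and calibrates a single global parameter z so that the expected number of links matches a target. This is an assumption about the general model family, not the paper's exact formulation; a block variant would presumably fit separate parameters per pair of geographic regions. The airport strengths are invented.

```python
def link_probability(s_i, s_j, z):
    """Connection probability between two nodes with fitness s_i and s_j."""
    x = z * s_i * s_j
    return x / (1.0 + x)


def expected_links(strengths, z):
    """Expected number of undirected links for a given z."""
    n = len(strengths)
    return sum(
        link_probability(strengths[i], strengths[j], z)
        for i in range(n)
        for j in range(i + 1, n)
    )


def calibrate_z(strengths, target_links, iterations=100):
    """Find z by bisection so that expected_links(strengths, z) ~= target_links."""
    n = len(strengths)
    max_links = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if strengths[i] * strengths[j] > 0
    )
    if not 0 < target_links < max_links:
        raise ValueError("target_links must lie strictly between 0 and the pair count")
    low, high = 0.0, 1.0
    while expected_links(strengths, high) < target_links:
        high *= 2.0  # expected_links is increasing in z, so widen the bracket
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if expected_links(strengths, mid) < target_links:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


if __name__ == "__main__":
    passenger_flows = [1.0, 2.0, 3.0, 4.0]  # hypothetical airport strengths
    z = calibrate_z(passenger_flows, target_links=3.0)
    print(z, expected_links(passenger_flows, z))
```

Sampling each pair independently with these probabilities yields an ensemble of synthetic networks whose average density matches the target.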

DroneVLA: VLA based Aerial Manipulation

Authors:Fawad Mehboob, Monijesu James, Amir Habel, Jeffrin Sam, Miguel Altamirano Cabrera, Dzmitry Tsetserukou

Date:2026-01-20 10:08:00

As aerial platforms evolve from passive observers to active manipulators, the challenge shifts toward designing intuitive interfaces that allow non-expert users to command these systems naturally. This work introduces a novel concept for an autonomous aerial manipulation system capable of interpreting high-level natural language commands to retrieve objects and deliver them to a human user. The system is intended to integrate a MediaPipe based on Grounding DINO and a Vision-Language-Action (VLA) model with a custom-built drone equipped with a 1-DOF gripper and an Intel RealSense RGB-D camera. VLA performs semantic reasoning to interpret the intent of a user prompt and generates a prioritized task queue for grasping of relevant objects in the scene. Grounding DINO and a dynamic A* planning algorithm are used to navigate and safely relocate the object. To ensure safe and natural interaction during the handover phase, the system employs a human-centric controller driven by MediaPipe. This module provides real-time human pose estimation, allowing the drone to employ visual servoing to maintain a stable, distinct position directly in front of the user, facilitating a comfortable handover. We demonstrate the system's efficacy through real-world experiments for localization and navigation, which resulted in maximum, mean Euclidean, and root-mean-squared errors of 0.164 m, 0.070 m, and 0.084 m, respectively, highlighting the feasibility of VLA for aerial manipulation operations.
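The three error statistics reported above can be computed from paired position samples as in the following sketch. The trajectory points are made up, and the function name is hypothetical; only the metric definitions (maximum, mean, and root-mean-squared Euclidean error) are standard.

```python
import math


def position_error_stats(estimated, reference):
    """Return (max, mean, RMS) Euclidean error between paired 3D positions."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [math.dist(e, r) for e, r in zip(estimated, reference)]
    max_error = max(errors)
    mean_error = sum(errors) / len(errors)
    rms_error = math.sqrt(sum(err * err for err in errors) / len(errors))
    return max_error, mean_error, rms_error


if __name__ == "__main__":
    # Hypothetical positions in metres: (x, y, z).
    reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
    estimated = [(0.03, 0.04, 1.0), (1.0, 0.0, 1.0), (2.0, 0.06, 1.08)]
    print(position_error_stats(estimated, reference))
```

For any set of errors the mean is at most the RMS value, which is at most the maximum; the reported figures (0.070 m, 0.084 m, 0.164 m) follow this ordering.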

PAtt: A Pattern Attention Network for ETA Prediction Using Historical Speed Profiles

Authors:ByeoungDo Kim, JunYeop Na, Kyungwook Tak, JunTae Kim, DongHyeon Kim, Duckky Kim

Date:2026-01-20 09:51:35

In this paper, we propose an ETA model (Estimated Time of Arrival) that leverages an attention mechanism over historical road speed patterns. As autonomous driving and intelligent transportation systems become increasingly prevalent, the need for accurate and reliable ETA estimation has grown, playing a vital role in navigation, mobility planning, and traffic management. However, predicting ETA remains a challenging task due to the dynamic and complex nature of traffic flow. Traditional methods often combine real-time and historical traffic data in simplistic ways, or rely on complex rule-based computations. While recent deep learning models have shown potential, they often require high computational costs and do not effectively capture the spatio-temporal patterns crucial for ETA prediction. ETA prediction inherently involves spatio-temporal causality, and our proposed model addresses this by leveraging attention mechanisms to extract and utilize temporal features accumulated at each spatio-temporal point along a route. This architecture enables efficient and accurate ETA estimation while keeping the model lightweight and scalable. We validate our approach using real-world driving datasets and demonstrate that our approach outperforms existing baselines by effectively integrating road characteristics, real-time traffic conditions, and historical speed patterns in a task-aware manner.

Fit Matters: Format-Distance Alignment Improves Conversational Search

Authors:Yitian Yang, Yugin Tan, Jung-Tai King, Yang Chen Lin, Yi-Chieh Lee

Date:2026-01-20 09:34:34

Existing conversational search systems can synthesize information into responses, but they lack principled ways to adapt response formats to users' cognitive states. This paper investigates whether aligning format and distance, which involves matching information granularity and media to users' psychological distance, improves user experience. In a between-subjects experiment (N=464) on travel planning, we crossed two distance dimensions (temporal/spatial x near/far) with four formats varying in granularity (abstract/concrete) and media (text/image-and-text). The experiment established that format-distance alignment reduced users' risk perceptions while increasing decision confidence, perceptions of information usefulness, ease of use, enjoyment, and credibility, and adoption intentions. Concrete formats imposed higher cognitive load, but yielded productive effort when matched to near-distance tasks. Images enhanced concrete but not abstract text, suggesting multimedia benefits depend on complementarity. These findings establish format-distance alignment as a distinctive and important design dimension, enabling systems to tailor response formats to users' psychological distance.

ParkingTwin: Training-Free Streaming 3D Reconstruction for Parking-Lot Digital Twins

Authors:Xinhao Liu, Yu Wang, Xiansheng Guo, Gordon Owusu Boateng, Yu Cao, Haonan Si, Xingchen Guo, Nirwan Ansari

Date:2026-01-20 08:03:58

High-fidelity parking-lot digital twins provide essential priors for path planning, collision checking, and perception validation in Automated Valet Parking (AVP). Yet robot-oriented reconstruction faces a trilemma: sparse forward-facing views cause weak parallax and ill-posed geometry; dynamic occlusions and extreme lighting hinder stable texture fusion; and neural rendering typically needs expensive offline optimization, violating edge-side streaming constraints. We propose ParkingTwin, a training-free, lightweight system for online streaming 3D reconstruction. First, OSM-prior-driven geometric construction uses OpenStreetMap semantic topology to directly generate a metric-consistent TSDF, replacing blind geometric search with deterministic mapping and avoiding costly optimization. Second, geometry-aware dynamic filtering employs a quad-modal constraint field (normal/height/depth consistency) to reject moving vehicles and transient occlusions in real time. Third, illumination-robust fusion in CIELAB decouples luminance and chromaticity via adaptive L-channel weighting and depth-gradient suppression, reducing seams under abrupt lighting changes. ParkingTwin runs at 30+ FPS on an entry-level GTX 1660. On a 68,000 m^2 real-world dataset, it achieves SSIM 0.87 (+16.0%), delivers about 15x end-to-end speedup, and reduces GPU memory by 83.3% compared with state-of-the-art 3D Gaussian Splatting (3DGS) that typically requires high-end GPUs (RTX 4090D). The system outputs explicit triangle meshes compatible with Unity/Unreal digital-twin pipelines. Project page: https://mihoutao-liu.github.io/ParkingTwin/

Toward Agentic AI: Task-Oriented Communication for Hierarchical Planning of Long-Horizon Tasks

Authors:Sin-Yu Huang

Date:2026-01-20 07:37:10

Agentic artificial intelligence (AI) is an AI paradigm that can perceive the environment, reason over observations, and execute actions to achieve specific goals. Task-oriented communication supports agentic AI by transmitting only the task-related information instead of full raw data in order to reduce the bandwidth requirement. In real-world scenarios, AI agents often need to perform a sequence of actions to complete complex tasks. Completing these long-horizon tasks requires a hierarchical agentic AI architecture, where a high-level planner module decomposes a task into subtasks, and a low-level actor module executes each subtask sequentially. Since each subtask has a distinct goal, the existing task-oriented communication schemes are not designed to handle different goals for different subtasks. To address this challenge, in this paper, we develop a hierarchical task-oriented communication (HiTOC) framework. We consider a system with an edge server and a robot as an edge device. The high-level planner and low-level actor modules reside on the edge server. The robot transmits only the environment information that is relevant to the current subtask in order to complete a long-horizon task. We propose a conditional variational information bottleneck (cVIB) approach to train the HiTOC framework to adaptively transmit minimal information required for each subtask. Simulations conducted on the AI2-THOR platform demonstrate that the proposed HiTOC framework outperforms three state-of-the-art schemes in terms of the success rate on MAP-THOR benchmark.

The Orchestration of Multi-Agent Systems: Architectures, Protocols, and Enterprise Adoption

Authors:Apoorva Adimulam, Rajesh Gupta, Sumit Kumar

Date:2026-01-20 07:13:53

Orchestrated multi-agent systems represent the next stage in the evolution of artificial intelligence, where autonomous agents collaborate through structured coordination and communication to achieve complex, shared objectives. This paper consolidates and formalizes the technical composition of such systems, presenting a unified architectural framework that integrates planning, policy enforcement, state management, and quality operations into a coherent orchestration layer. Another primary contribution of this work is the in-depth technical delineation of two complementary communication protocols - the Model Context Protocol, which standardizes how agents access external tools and contextual data, and the Agent2Agent protocol, which governs peer coordination, negotiation, and delegation. Together, these protocols establish an interoperable communication substrate that enables scalable, auditable, and policy-compliant reasoning across distributed agent collectives. Beyond protocol design, the paper details how orchestration logic, governance frameworks, and observability mechanisms collectively sustain system coherence, transparency, and accountability. By synthesizing these elements into a cohesive technical blueprint, this paper provides comprehensive treatments of orchestrated multi-agent systems - bridging conceptual architectures with implementation-ready design principles for enterprise-scale AI ecosystems.

Resilient Routing: Risk-Aware Dynamic Routing in Smart Logistics via Spatiotemporal Graph Learning

Authors:Zhiming Xue, Sichen Zhao, Yalun Qi, Xianling Zeng, Zihan Yu

Date:2026-01-20 06:06:35

With the rapid development of the e-commerce industry, logistics networks are experiencing unprecedented pressure. Traditional static routing strategies often cannot cope with traffic congestion and fluctuating retail demand. In this paper, we propose a Risk-Aware Dynamic Routing (RADR) framework that integrates Spatiotemporal Graph Neural Networks (ST-GNN) with combinatorial optimization. We first construct a logistics topology graph from discrete GPS data using spatial clustering methods. Subsequently, a hybrid deep learning model combining a Graph Convolutional Network (GCN) and a Gated Recurrent Unit (GRU) is adopted to extract spatial correlations and temporal dependencies for predicting future congestion risks. These prediction results are then integrated into a dynamic edge weight mechanism to perform path planning. We evaluated the framework on the Smart Logistics Dataset 2024, which contains real-world Internet of Things (IoT) sensor data. The experimental results show that the RADR algorithm significantly enhances the resilience of the supply chain. In particular, in a case study of high-congestion scenarios, our method reduces the potential congestion risk exposure by 19.3% while increasing the transportation distance by only 2.1%. This empirical evidence confirms that the proposed data-driven approach can effectively balance delivery efficiency and operational safety.
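
The dynamic edge weight idea can be illustrated with a minimal sketch. The abstract does not give RADR's actual weighting formula, so the function `risk_aware_weight`, the parameter `alpha`, and the toy graph below are all assumptions made for illustration: each edge's distance is inflated by a predicted congestion risk in [0, 1], and a standard Dijkstra search runs over the resulting costs.

```python
import heapq


def risk_aware_weight(distance_km, risk, alpha=1.0):
    """Hypothetical edge cost: distance inflated by predicted risk in [0, 1]."""
    return distance_km * (1.0 + alpha * risk)


def shortest_path(graph, source, target, alpha=1.0):
    """Dijkstra over graph[u] = [(v, distance_km, risk), ...].

    Returns (total_cost, path); path is empty if target is unreachable.
    """
    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u == target:
            break
        if cost > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, dist, risk in graph.get(u, []):
            new_cost = cost + risk_aware_weight(dist, risk, alpha)
            if new_cost < best.get(v, float("inf")):
                best[v] = new_cost
                prev[v] = u
                heapq.heappush(heap, (new_cost, v))
    if target not in best:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return best[target], path[::-1]


# Toy network (made-up numbers): the route via A is shorter but congested.
graph = {
    "depot": [("A", 10.0, 0.9), ("B", 11.0, 0.1)],
    "A": [("customer", 5.0, 0.8)],
    "B": [("customer", 5.0, 0.0)],
}
static_cost, static_path = shortest_path(graph, "depot", "customer", alpha=0.0)
risk_cost, risk_path = shortest_path(graph, "depot", "customer", alpha=1.0)
```

With `alpha=0.0` the search reduces to plain distance-based routing and picks the route through A; with `alpha=1.0` it accepts a slightly longer route through B to avoid the high-risk edges, which mirrors the distance-versus-risk trade-off the abstract reports.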

Diffusion In Diffusion: Breaking the Autoregressive Bottleneck in Block Diffusion Models

Authors:Linrui Ma, Yufei Cui, Kai Han, Yunhe Wang

Date:2026-01-20 05:00:26

Block diffusion language models, operating as semi-autoregressive paradigms, combine the strengths of both autoregressive and diffusion paradigms. However, their strict unidirectional block dependencies introduce irreversibility and sacrifice the global planning capabilities for which diffusion models are renowned. In order to address these issues, we propose Diffusion in Diffusion, a draft-then-refine framework designed to overcome the irreversibility and myopia problems inherent in block diffusion models. Our approach first employs block diffusion to generate rapid drafts using small blocks, then refines these drafts through global bidirectional diffusion with a larger bidirectional receptive field. We utilise snapshot confidence remasking to identify the most critical tokens that require modification, and apply mix-scale training to expand the block diffusion model's global capabilities. Empirical results demonstrate that our approach sets a new benchmark for discrete diffusion models on the OpenWebText dataset. Using just 26% of the fine-tuning budget of baseline models, we reduce generative perplexity from 25.7 to 21.9, significantly narrowing the performance gap with autoregressive models.
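
As a rough illustration of confidence-based remasking in general, the sketch below replaces the least confident tokens of a draft with a mask symbol so that a later refinement pass can regenerate them. The abstract does not describe how snapshot confidence remasking computes or uses confidences, so the function name, the `ratio` parameter, and the `[MASK]` token are assumptions, not the authors' procedure.

```python
def remask_low_confidence(tokens, confidences, ratio=0.25, mask_token="[MASK]"):
    """Replace the lowest-confidence fraction of tokens with mask_token.

    Hypothetical helper: `confidences` holds one score per token, for example
    the probability the draft model assigned to the token it emitted.
    """
    if len(tokens) != len(confidences):
        raise ValueError("tokens and confidences must have the same length")
    k = int(len(tokens) * ratio)  # number of positions to remask
    lowest = sorted(range(len(tokens)), key=lambda i: confidences[i])[:k]
    masked = set(lowest)
    return [mask_token if i in masked else tok for i, tok in enumerate(tokens)]


draft = ["the", "cat", "sat", "on"]
scores = [0.9, 0.2, 0.8, 0.7]
refine_input = remask_low_confidence(draft, scores, ratio=0.25)
```

Here one of the four positions is remasked, and it is the token with the lowest score.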

Toward Ultra-fast Treatments: Large Energy Acceptance Beam Delivery Systems and Opportunities for Proton Beam Therapy

Authors:Jacinta Yap, Adam Steinberg, Hannah Norman, Konrad Nesteruk, Suzie Sheehy

Date:2026-01-20 04:03:14

Treatment delivery is largely determined by the capabilities of the beam delivery system (BDS), where faster delivery can have many potential benefits including improved dosimetric quality, utility, cost effectiveness, patient throughput and comfort. Despite significant developments in accelerators, delivery methodologies, dose optimisation and more, the energy layer switching time (ELST) remains a persistent limitation in existing BDS. The ELST can contribute significantly to beam delivery time (BDT) and extend treatment times, requiring compensation by optimisation planning approaches, motion mitigation strategies, or active beam modification. This fundamental constraint can be addressed by increasing the narrow energy acceptance range of conventional beamlines to minimise the ELST, enabling ultra-fast delivery. A large energy acceptance (LEA) BDS has the potential to revolutionise proton beam therapy (PBT) through immediate improvements to current treatment delivery and emerging delivery modalities: the complete exploitation of PBT - and unlocking its full potential - can only be made possible with advances in beam delivery technologies. We review the abundant opportunities offered by an ultra-fast BDS: shorter treatment times, reduced motion-induced dose degradation, improved effectiveness of motion management techniques, possibilities for volumetric rescanning, bidirectional delivery, further planning optimisation, and novel delivery strategies. We give an overview of the design concepts of several LEA proposals and their technology requirements, and discuss the remaining challenges and considerations in realising an LEA BDS in practice. There are multiple avenues requiring further development and study; however, the clinical potential and benefits of this enabling technology are clear: ultra-fast delivery offers both immediate and future improvements to PBT treatments.

The OncoReach Stylet for Brachytherapy: Design Evaluation and Pilot Study

Authors:Pejman Kheradmand, Kent K. Yamamoto, Emma Webster, Keith Sowards, Gianna Hatheway, Katharine L. Jackson, Sabino Zani, Julie A. Raffi, Diandra N. Ayala-Peacock, Scott R. Silva, Joanna Deaton Bertram, Yash Chitalia

Date:2026-01-20 02:25:47

Cervical cancer accounts for a significant portion of the global cancer burden among women. Interstitial brachytherapy (ISBT) is a standard procedure for treating cervical cancer; it involves placing a radioactive source through a straight hollow needle within or in close proximity to the tumor and surrounding tissue. However, the use of straight needles limits surgical planning to a linear needle path. We present the OncoReach stylet, a handheld, tendon-driven steerable stylet designed for compatibility with standard ISBT 15- and 13-gauge needles. Building upon our prior work, we evaluated design parameters like needle gauge, spherical joint count and spherical joint placement, including an asymmetric disk design to identify a configuration that maximizes bending compliance while retaining axial stiffness. Free space experiments quantified tip deflection across configurations, and a two-tube Cosserat rod model accurately predicted the centerline shape of the needle for most trials. The best performing configuration was integrated into a reusable handheld prototype that enables manual actuation. A patient-derived, multi-composite phantom model of the uterus and pelvis was developed to conduct a pilot study of the OncoReach steerable stylet with one expert user. Results showed the ability to steer from less-invasive, medial entry points to reach the lateral-most targets, underscoring the significance of steerable stylets.

Quantum Encryption Resilience Score (QERS) for MQTT, HTTP, and HTTPS under Post-Quantum Cryptography in Computer, IoT, and IIoT Systems

Authors:Jonatan Rassekhnia

Date:2026-01-19 22:09:21

Post-quantum cryptography (PQC) introduces significant computational and communication overhead, which poses challenges for resource-constrained computer systems, Internet of Things (IoT), and Industrial IoT (IIoT) devices. This paper presents an experimental evaluation of the Quantum Encryption Resilience Score (QERS) applied to MQTT, HTTP, and HTTPS communication protocols operating under PQC. Using an ESP32-C6 client and an ARM-based Raspberry Pi CM4 server, latency, CPU utilization, RSSI, energy consumption, key size, and TLS handshake overhead are measured under realistic operating conditions. QERS integrates these heterogeneous metrics into normalized Basic, Tuned, and Fusion scores, enabling systematic comparison of protocol efficiency and security resilience. Experimental results show that MQTT provides the highest efficiency under PQC constraints, while HTTPS achieves the highest security-weighted resilience at the cost of increased latency and resource consumption. The proposed framework supports informed protocol selection and migration planning for PQC-enabled IoT and IIoT deployments.
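
To make the idea of folding heterogeneous metrics into one normalized score concrete, here is a minimal sketch using min-max normalization and a weighted sum. This is not the QERS formula, which the abstract does not state: the function names, the weights, and every measurement value below are assumptions for illustration, and only "lower is better" cost metrics are covered.

```python
def min_max_normalize(values):
    """Scale a {protocol: raw_value} mapping to [0, 1] across protocols."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {name: 0.0 for name in values}
    return {name: (v - lo) / (hi - lo) for name, v in values.items()}


def composite_score(metrics, weights):
    """Combine cost metrics into a 0-100 score per protocol (higher is better).

    metrics: {metric_name: {protocol: raw_value}}, where lower raw values
    are better (latency, CPU load, energy, ...).
    weights: {metric_name: non-negative weight}.
    """
    protocols = list(next(iter(metrics.values())).keys())
    total_weight = sum(weights[m] for m in metrics)
    normalized = {m: min_max_normalize(vals) for m, vals in metrics.items()}
    scores = {}
    for p in protocols:
        penalty = sum(weights[m] * normalized[m][p] for m in metrics) / total_weight
        scores[p] = 100.0 * (1.0 - penalty)
    return scores


# Made-up measurements, NOT results from the paper.
metrics = {
    "latency_ms": {"MQTT": 20.0, "HTTP": 35.0, "HTTPS": 60.0},
    "cpu_pct": {"MQTT": 10.0, "HTTP": 15.0, "HTTPS": 30.0},
}
weights = {"latency_ms": 0.5, "cpu_pct": 0.5}
scores = composite_score(metrics, weights)
```

A security-weighted variant would add "higher is better" metrics and invert their normalized values before aggregation; the choice of weights is what distinguishes an efficiency-oriented score from a security-oriented one.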

TrustEnergy: A Unified Framework for Accurate and Reliable User-level Energy Usage Prediction

Authors:Dahai Yu, Rongchao Xu, Dingyi Zhuang, Yuheng Bu, Shenhao Wang, Guang Wang

Date:2026-01-19 22:09:08

Energy usage prediction is important for various real-world applications, including grid management, infrastructure planning, and disaster response. Although a plethora of deep learning approaches have been proposed to perform this task, most of them either overlook the essential spatial correlations across households or fail to scale to individualized prediction, making them less effective for accurate fine-grained user-level prediction. In addition, because energy usage is dynamic and uncertain owing to factors such as extreme weather events, quantifying uncertainty is also important for reliable prediction, but it has not been fully explored in existing work. In this paper, we propose a unified framework called TrustEnergy for accurate and reliable user-level energy usage prediction. TrustEnergy has two key technical components: (i) a Hierarchical Spatiotemporal Representation module to efficiently capture both macro and micro energy usage patterns with a novel memory-augmented spatiotemporal graph neural network, and (ii) an innovative Sequential Conformalized Quantile Regression module to dynamically adjust uncertainty bounds to ensure valid prediction intervals over time, without making strong assumptions about the underlying data distribution. We implement and evaluate our TrustEnergy framework by working with an electricity provider in Florida, and the results show that TrustEnergy achieves a 5.4% increase in prediction accuracy and a 5.7% improvement in uncertainty quantification compared to state-of-the-art baselines.
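
For readers unfamiliar with conformalized quantile regression, the sketch below shows the standard split (non-sequential) version: a held-out calibration set measures how far true values fall outside the predicted quantile band, and the band is widened by the corresponding empirical quantile. It is not the paper's Sequential module, whose update rule the abstract does not give; the function names and all numbers are assumptions for illustration.

```python
import math


def cqr_margin(lower_preds, upper_preds, y_true, alpha=0.1):
    """Split-conformal margin for quantile-regression intervals.

    The conformity score of a calibration point is how far the true value
    lies outside [lower, upper] (negative if it lies inside). The margin is
    the ceil((n + 1) * (1 - alpha))-th smallest score.
    """
    scores = sorted(
        max(lo - y, y - hi)
        for lo, hi, y in zip(lower_preds, upper_preds, y_true)
    )
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return scores[k - 1]


def conformalize(lower, upper, margin):
    """Widen (or shrink, if margin is negative) a predicted interval."""
    return lower - margin, upper + margin


# Toy calibration data (made-up): predicted band [0, 10] for three users.
cal_lower = [0.0, 0.0, 0.0]
cal_upper = [10.0, 10.0, 10.0]
cal_truth = [5.0, 12.0, 11.0]
margin = cqr_margin(cal_lower, cal_upper, cal_truth, alpha=0.25)
interval = conformalize(0.0, 10.0, margin)
```

In this toy case two of the three calibration values overshoot the band, so the margin is positive and the adjusted interval is wider than the raw quantile prediction.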

QERS: Quantum Encryption Resilience Score for Post-Quantum Cryptography in Computer, IoT, and IIoT Systems

Authors:Jonatan Rassekhnia

Date:2026-01-19 21:10:27

Post-quantum cryptography (PQC) is becoming essential for securing Internet of Things (IoT) and Industrial IoT (IIoT) systems against quantum-enabled adversaries. However, existing evaluation approaches primarily focus on isolated performance metrics, offering limited support for holistic security and deployment decisions. This paper introduces QERS (Quantum Encryption Resilience Score), a universal measurement framework that integrates cryptographic performance, system constraints, and multi-criteria decision analysis to assess PQC readiness in computer, IoT, and IIoT environments. QERS combines normalized metrics, weighted aggregation, and machine learning-assisted analysis to produce interpretable resilience scores across heterogeneous devices and communication protocols. Experimental results demonstrate how the framework enables comparative evaluation of post-quantum schemes under realistic resource constraints, supporting informed security design and migration planning. This work is presented as a preprint, with extended statistical validation planned as part of ongoing graduate research.

CLEAR: A Semantic-Geometric Terrain Abstraction for Large-Scale Unstructured Environments

Authors:Pranay Meshram, Charuvahan Adhivarahan, Ehsan Tarkesh Esfahani, Souma Chowdhury, Chen Wang, Karthik Dantu

Date:2026-01-19 19:56:06

Long-horizon navigation in unstructured environments demands terrain abstractions that scale to tens of km$^2$ while preserving semantic and geometric structure, a combination existing methods fail to achieve. Grids scale poorly; quadtrees misalign with terrain boundaries; neither encodes landcover semantics essential for traversability-aware planning. This yields infeasible or unreliable paths for autonomous ground vehicles operating over 10+ km$^2$ under real-time constraints. CLEAR (Connected Landcover Elevation Abstract Representation) couples boundary-aware spatial decomposition with recursive plane fitting to produce convex, semantically aligned regions encoded as a terrain-aware graph. Evaluated on maps spanning 9-100 km$^2$ using a physics-based simulator, CLEAR achieves up to 10x faster planning than raw grids with only 6.7% cost overhead and delivers 6-9% shorter, more reliable paths than other abstraction baselines. These results highlight CLEAR's scalability and utility for long-range navigation in applications such as disaster response, defense, and planetary exploration.

Beyond Mapping : Domain-Invariant Representations via Spectral Embedding of Optimal Transport Plans

Authors:Abdel Djalil Sad Saoud, Fred Maurice Ngolè Mboula, Hanane Slimani

Date:2026-01-19 19:38:59

Distributional shifts between training and inference-time data remain a central challenge in machine learning, often leading to poor performance. This has motivated the study of principled approaches for domain alignment, such as optimal-transport-based unsupervised domain adaptation, which relies on approximating the Monge map using transport plans. This approximation is sensitive to the regularization strategy and hyperparameters of the transport problem, and may yield biased domain alignment. In this work, we propose to interpret smoothed transport plans as adjacency matrices of bipartite graphs connecting the source and target domains, and derive domain-invariant sample representations through spectral embedding. We evaluate our approach on acoustic adaptation benchmarks for music genre recognition and music-speech discrimination, as well as on electrical cable defect detection and classification tasks using time domain reflection in different diagnosis settings, achieving strong overall performance.
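
One common way to embed the two sides of a bipartite graph jointly is the degree-normalized singular value decomposition used in spectral co-clustering. The sketch below applies it to a transport plan treated as a bipartite adjacency matrix. The abstract does not say which spectral embedding the authors use, so this construction, the function name `spectral_embed_plan`, and the toy plan are assumptions for illustration.

```python
import numpy as np


def spectral_embed_plan(plan, dim=1, eps=1e-12):
    """Embed source and target samples from a transport plan.

    plan[i, j] is the mass moved from source sample i to target sample j,
    read here as the weight of a bipartite edge. Returns (src, tgt) arrays
    of shape (n_source, dim) and (n_target, dim) in a shared space.
    """
    plan = np.asarray(plan, dtype=float)
    inv_sqrt_src = 1.0 / np.sqrt(plan.sum(axis=1) + eps)  # source degrees
    inv_sqrt_tgt = 1.0 / np.sqrt(plan.sum(axis=0) + eps)  # target degrees
    normalized = inv_sqrt_src[:, None] * plan * inv_sqrt_tgt[None, :]
    u, _, vt = np.linalg.svd(normalized, full_matrices=False)
    # Skip the leading singular pair, which only reflects the degrees.
    src = inv_sqrt_src[:, None] * u[:, 1:dim + 1]
    tgt = inv_sqrt_tgt[:, None] * vt.T[:, 1:dim + 1]
    return src, tgt


# Toy plan (made-up): two groups, with most mass staying inside each group.
plan = np.array([
    [0.120, 0.120, 0.005, 0.005],
    [0.120, 0.120, 0.005, 0.005],
    [0.005, 0.005, 0.120, 0.120],
    [0.005, 0.005, 0.120, 0.120],
])
src, tgt = spectral_embed_plan(plan, dim=1)
```

In this toy case, source and target samples that exchange most of their mass receive coordinates of the same sign, while the two groups are separated by sign, so a downstream classifier can operate on both domains in one space.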