planning - 2025-09-04

A Brenier Theorem on $(\mathcal{P}_2 (\mathcal{P}_2(\mathbb{R}^d )), W_2 )$ and Applications to Adapted Transport

Authors:Mathias Beiglböck, Gudmund Pammer, Stefan Schrott

Date:2025-09-03 17:41:32

Brenier's fundamental theorem characterizes optimal transport plans for measures $\mu, \nu$ on $\mathbb{R}^d$ and quadratic distance costs in terms of gradients of convex functions. In particular, it guarantees the existence of optimal transport maps for measures which are absolutely continuous with respect to Lebesgue measure. Our goal is to provide a version of this result for measures $P,Q$ on $\mathcal{P}_2(\mathbb{R}^d)$ and costs given by the squared Wasserstein distance $W_2^2(\mu, \nu)$. We characterize optimizers in terms of convexity of the Lions lift. This is based on an observation which seems to be of independent interest: the $c$-transform of a functional $\phi$, where $c(\mu, \nu)$ denotes the maximal covariance of $\mu, \nu$, corresponds precisely to the Legendre transform of the Lions lift of $\phi$. Moreover, we show that for typical $P \in\mathcal{P}_2(\mathbb{R}^d)$ the optimizer is unique and given by a transport map. In the absence of a canonical reference measure on $\mathcal{P}_2(\mathbb{R}^d)$ we use a topological notion to make `typical' precise. Specifically, we show that the transport regular measures are of second Baire category. A particular motivation for our article stems from the theory of adapted transport, where the adapted Wasserstein distance provides an adequate distance between stochastic processes. In contrast to other metrics, the adapted Wasserstein distance yields continuity of Doob-decomposition, optimal stopping and stochastic control problems. Based on our results for measures on $\mathcal{P}_2(\mathbb{R}^d)$ we obtain a first Brenier-type theorem for the adapted Wasserstein distance.
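
For orientation, the classical finite-dimensional facts that the abstract builds on can be written out as below. This is only the standard statement on $\mathbb{R}^d$, not the paper's result on measures of measures; the symbols $\mathrm{MC}$ (maximal covariance), $\Pi(\mu,\nu)$ (set of couplings) and $\varphi$ are notation chosen here for illustration.

```latex
% Quadratic cost versus maximal covariance, for \mu, \nu \in \mathcal{P}_2(\mathbb{R}^d):
\[
W_2^2(\mu,\nu)
  = \int |x|^2 \, d\mu(x) + \int |y|^2 \, d\nu(y) - 2\,\mathrm{MC}(\mu,\nu),
\qquad
\mathrm{MC}(\mu,\nu) := \sup_{\pi \in \Pi(\mu,\nu)} \int \langle x, y \rangle \, d\pi(x,y).
\]
% Classical Brenier theorem: if \mu is absolutely continuous with respect to
% Lebesgue measure, the optimal coupling is unique and induced by a map,
\[
\pi^{*} = (\mathrm{id}, \nabla\varphi)_{\#}\,\mu
\quad \text{for some convex } \varphi : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}.
\]
```

The first identity follows from expanding $|x-y|^2 = |x|^2 + |y|^2 - 2\langle x, y\rangle$ under any coupling, which is why minimizing the quadratic cost and maximizing the covariance are the same problem.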

Real-Time Instrument Planning and Perception for Novel Measurements of Dynamic Phenomena

Authors:Itai Zilberstein, Alberto Candela, Steve Chien

Date:2025-09-03 17:32:15

Advancements in onboard computing mean remote sensing agents can employ state-of-the-art computer vision and machine learning at the edge. These capabilities can be leveraged to unlock new rare, transient, and pinpoint measurements of dynamic science phenomena. In this paper, we present an automated workflow that synthesizes the detection of these dynamic events in look-ahead satellite imagery with autonomous trajectory planning for a follow-up high-resolution sensor to obtain pinpoint measurements. We apply this workflow to the use case of observing volcanic plumes. We analyze classification approaches including traditional machine learning algorithms and convolutional neural networks. We present several trajectory planning algorithms that track the morphological features of a plume and integrate these algorithms with the classifiers. We show through simulation an order of magnitude increase in the utility return of the high-resolution instrument compared to baselines while maintaining efficient runtimes.

Generative Auto-Bidding in Large-Scale Competitive Auctions via Diffusion Completer-Aligner

Authors:Yewen Li, Jingtong Gao, Nan Jiang, Shuai Mao, Ruyi An, Fei Pan, Xiangyu Zhao, Bo An, Qingpeng Cai, Peng Jiang

Date:2025-09-03 14:25:36

Auto-bidding is central to computational advertising, achieving notable commercial success by optimizing advertisers' bids within economic constraints. Recently, large generative models have shown potential to revolutionize auto-bidding by generating bids that could flexibly adapt to complex, competitive environments. Among them, diffusers stand out for their ability to address sparse-reward challenges by focusing on trajectory-level accumulated rewards, as well as their explainable capability, i.e., planning a future trajectory of states and executing bids accordingly. However, diffusers struggle with generation uncertainty, particularly regarding dynamic legitimacy between adjacent states, which can lead to poor bids and further cause significant loss of ad impression opportunities when competing with other advertisers in a highly competitive auction environment. To address this, we propose a Causal auto-Bidding method based on a Diffusion completer-aligner framework, termed CBD. First, we augment the diffusion training process with an extra random variable t, where the model observes t-length historical sequences with the goal of completing the remaining sequence, thereby enhancing the generated sequences' dynamic legitimacy. Then, we employ a trajectory-level return model to refine the generated trajectories, aligning more closely with advertisers' objectives. Experimental results across diverse settings demonstrate that our approach not only achieves superior performance on large-scale auto-bidding benchmarks, such as a 29.9% improvement in conversion value in the challenging sparse-reward auction setting, but also delivers significant improvements on the Kuaishou online advertising platform, including a 2.0% increase in target cost.

The Role of Embodiment in Intuitive Whole-Body Teleoperation for Mobile Manipulation

Authors:Sophia Bianchi Moyen, Rickmer Krohn, Sophie Lueth, Kay Pompetzki, Jan Peters, Vignesh Prasad, Georgia Chalvatzaki

Date:2025-09-03 11:25:36

Intuitive teleoperation interfaces are essential for mobile manipulation robots to ensure high-quality data collection while reducing operator workload. A strong sense of embodiment combined with minimal physical and cognitive demands not only enhances the user experience during large-scale data collection, but also helps maintain data quality over extended periods. This becomes especially crucial for challenging long-horizon mobile manipulation tasks that require whole-body coordination. We compare two distinct robot control paradigms: a coupled embodiment integrating arm manipulation and base navigation functions, and a decoupled embodiment treating these systems as separate control entities. Additionally, we evaluate two visual feedback mechanisms: immersive virtual reality and conventional screen-based visualization of the robot's field of view. These configurations were systematically assessed across a complex, multi-stage task sequence requiring integrated planning and execution. Our results show that the use of VR as a feedback modality increases task completion time, cognitive workload, and perceived effort of the teleoperator. Coupling manipulation and navigation leads to a comparable workload on the user as decoupling the embodiments, while preliminary experiments suggest that data acquired by coupled teleoperation leads to better imitation learning performance. Our holistic view on intuitive teleoperation interfaces provides valuable insight into collecting high-quality, high-dimensional mobile manipulation data at scale with the human operator in mind. Project website: https://sophiamoyen.github.io/role-embodiment-wbc-moma-teleop/

Plan More, Debug Less: Applying Metacognitive Theory to AI-Assisted Programming Education

Authors:Tung Phung, Heeryung Choi, Mengyan Wu, Adish Singla, Christopher Brooks

Date:2025-09-03 09:38:43

The growing adoption of generative AI in education highlights the need to integrate established pedagogical principles into AI-assisted learning environments. This study investigates the potential of metacognitive theory to inform AI-assisted programming education through a hint system designed around the metacognitive phases of planning, monitoring, and evaluation. Upon request, the system can provide three types of AI-generated hints--planning, debugging, and optimization--to guide students at different stages of problem-solving. Through a study with 102 students in an introductory data science programming course, we find that students perceive and engage with planning hints most highly, whereas optimization hints are rarely requested. We observe a consistent association between requesting planning hints and achieving higher grades across question difficulty and student competency. However, when facing harder tasks, students seek additional debugging but not more planning support. These insights contribute to the growing field of AI-assisted programming education by providing empirical evidence on the importance of pedagogical principles in AI-assisted learning.

A Hierarchical Deep Reinforcement Learning Framework for Traffic Signal Control with Predictable Cycle Planning

Authors:Hankang Gu, Yuli Zhang, Chengming Wang, Ruiyuan Jiang, Ziheng Qiao, Pengfei Fan, Dongyao Jia

Date:2025-09-03 08:20:06

Deep reinforcement learning (DRL) has become a popular approach in traffic signal control (TSC) due to its ability to learn adaptive policies from complex traffic environments. Within DRL-based TSC methods, two primary control paradigms are the ``choose phase'' and ``switch'' strategies. Although the agent in the choose phase paradigm selects the next active phase adaptively, this paradigm may result in unexpected phase sequences for drivers, disrupting their anticipation and potentially compromising safety at intersections. Meanwhile, the switch paradigm allows the agent to decide whether to switch to the next predefined phase or extend the current phase. While this structure maintains a more predictable order, it can lead to unfair and inefficient phase allocations, as certain movements may be extended disproportionately while others are neglected. In this paper, we propose a DRL model, named Deep Hierarchical Cycle Planner (DHCP), to allocate the traffic signal cycle duration hierarchically. A high-level agent first determines the split of the total cycle time between the North-South (NS) and East-West (EW) directions based on the overall traffic state. Then, a low-level agent further divides the allocated duration within each major direction between straight and left-turn movements, enabling more flexible durations for the two movements. We test our model on both real and synthetic road networks, along with multiple sets of real and synthetic traffic flows. Empirical results show that our model achieves the best performance across all datasets compared with the baselines.
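
The two-level split of the cycle described in this abstract can be illustrated with a minimal sketch. It only shows the arithmetic of the hierarchy, not the DRL agents: the three share parameters stand in for whatever the high-level and low-level agents would output, and the names `CyclePlan` and `allocate_cycle` are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyclePlan:
    """Green durations in seconds for the four movements (hypothetical layout)."""
    ns_straight: float
    ns_left: float
    ew_straight: float
    ew_left: float

    def total(self) -> float:
        return self.ns_straight + self.ns_left + self.ew_straight + self.ew_left


def allocate_cycle(cycle_s, ns_share, ns_straight_share, ew_straight_share):
    """Split one cycle hierarchically.

    ns_share plays the role of the high-level decision (NS versus EW);
    the two *_straight_share values play the role of the low-level decision
    (straight versus left-turn within each direction). All are assumed to
    lie in [0, 1].
    """
    for share in (ns_share, ns_straight_share, ew_straight_share):
        if not 0.0 <= share <= 1.0:
            raise ValueError("shares must lie in [0, 1]")
    # High level: divide the whole cycle between the two major directions.
    ns_total = cycle_s * ns_share
    ew_total = cycle_s - ns_total
    # Low level: divide each direction between straight and left-turn.
    ns_straight = ns_total * ns_straight_share
    ew_straight = ew_total * ew_straight_share
    return CyclePlan(
        ns_straight=ns_straight,
        ns_left=ns_total - ns_straight,
        ew_straight=ew_straight,
        ew_left=ew_total - ew_straight,
    )


plan = allocate_cycle(120.0, ns_share=0.5, ns_straight_share=0.75, ew_straight_share=0.5)
print(plan)
```

Because the phase order is fixed and only durations change, a plan of this shape keeps the sequence predictable for drivers while still letting the durations adapt.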

CARPO: Leveraging Listwise Learning-to-Rank for Context-Aware Query Plan Optimization

Authors:Wenrui Zhou, Qiyu Liu, Jingshu Peng, Aoqian Zhang, Lei Chen

Date:2025-09-03 07:59:16

Efficient data processing is increasingly vital, with query optimizers playing a fundamental role in translating SQL queries into optimal execution plans. Traditional cost-based optimizers, however, often generate suboptimal plans due to flawed heuristics and inaccurate cost models, leading to the emergence of Learned Query Optimizers (LQOs). To address challenges in existing LQOs, such as the inconsistency and suboptimality inherent in pairwise ranking methods, we introduce CARPO, a generic framework leveraging listwise learning-to-rank for context-aware query plan optimization. CARPO distinctively employs a Transformer-based model for holistic evaluation of candidate plan sets and integrates a robust hybrid decision mechanism, featuring Out-Of-Distribution (OOD) detection with a top-$k$ fallback strategy to ensure reliability. Furthermore, CARPO can be seamlessly integrated with existing plan embedding techniques, demonstrating strong adaptability. Comprehensive experiments on TPC-H and STATS benchmarks demonstrate that CARPO significantly outperforms both native PostgreSQL and Lero, achieving a Top-1 Rate of 74.54\% on the TPC-H benchmark compared to Lero's 3.63\%, and reducing the total execution time to 3719.16 ms compared to PostgreSQL's 22577.87 ms.
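
To make the contrast with pairwise ranking concrete, here is a minimal sketch of one well-known listwise objective, the ListNet top-one loss, which scores a whole candidate list at once. The abstract does not say which listwise loss CARPO uses, so this choice is an assumption, and the helper names (`softmax`, `listnet_loss`, `rank_plans`) and the relevance values are illustrative only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_loss(pred_scores, relevance):
    """ListNet top-one loss: cross-entropy between two softmax distributions.

    pred_scores: model scores for every candidate plan of one query.
    relevance:   ground-truth desirability of each plan (higher is better);
                 how this is derived from latencies is left open here.
    """
    if len(pred_scores) != len(relevance) or not pred_scores:
        raise ValueError("need two non-empty lists of equal length")
    target = softmax(relevance)
    predicted = softmax(pred_scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))


def rank_plans(pred_scores):
    """Return candidate indices ordered from best to worst predicted score."""
    return sorted(range(len(pred_scores)), key=lambda i: pred_scores[i], reverse=True)


relevance = [3.0, 1.0, 0.0]  # hypothetical: plan 0 is the fastest
print(listnet_loss([3.0, 1.0, 0.0], relevance))  # scores agree with relevance
print(listnet_loss([0.0, 1.0, 3.0], relevance))  # scores reversed, larger loss
print(rank_plans([0.2, 1.5, -0.3]))
```

Since the loss is computed over the full list, one gradient step sees every candidate plan for the query, which avoids the inconsistent orderings that independent pairwise comparisons can produce.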

Sustainable restoration of intermittent streams: Integrating ecological design and urban resilience

Authors:Parinaz Baradaran Anaraki, Shiva Manshour

Date:2025-09-03 06:31:05

The sustainable restoration of intermittent streams has become a critical priority in contemporary urban planning, particularly as cities confront the dual challenges of ecological degradation and climate change. In Tehran, decades of rapid urbanization and poor management practices have confined natural streams into rigid concrete channels, eroding their ecological value and disconnecting them from community life. This paper introduces an ecological and sustainability-oriented framework for the restoration of the Darband and Darabad river valleys, highlighting their potential to function as ecological corridors that support biodiversity, thermal regulation, cultural identity, and urban resilience. The study employs a systematic methodology that integrates ecological engineering, landscape design, hydrological modeling, and participatory planning. Findings suggest that restoring these river valleys through sustainable strategies, such as the creation of active green networks, multifunctional public spaces, and resilient hydrological systems, can transform them from degraded drainage corridors into life-giving urban landscapes. Moreover, the research emphasizes the necessity of linking restoration with sustainability goals to ensure long-term ecological balance, social well-being, and climate adaptation. This case study demonstrates that sustainable river restoration, when aligned with ecological design and community engagement, has the potential to reposition intermittent streams as essential infrastructures for sustainable urban development and resilience.

Introduction to the Sky Survey Schedule (SSS) framework

Authors:Kai-Tian Yuan, Hou-Yuan Lin

Date:2025-09-03 04:28:20

To fulfill the requirements of space object cataloging and enable automated intelligent responses to anomalous events, we designed a novel observation scheduling system named Sky Survey Schedule (SSS). This framework facilitates coordinated operations across multi-site observational networks comprising dozens of instruments, while simultaneously supporting asteroid monitoring and time-domain astronomy studies. The system implements two principal observation modes: fixed sky regions and target-centered tracking. The former is used for sidereal or static observation, while the latter provides dedicated follow-up capabilities for transient targets. The sky regions are divided into latitude bands, each of which is subdivided into sectors to ensure minimal overlap and comprehensive coverage. These sectors are mapped to high-level HEALPix sky grids, enabling rapid cross-referencing and correlation between instruments. At the core of SSS lies an adaptive weighting architecture that integrates multiple parameters. Initial target priorities are determined from orbital catalogs containing both known and uncorrelated objects according to cataloging requirements. The system implements dynamic weight adjustments through feedback mechanisms: confirmed stable objects receive decaying weights, long-unobserved targets experience weight recovery, while anomalies (e.g., newly detected or lost objects) trigger priority escalation. These target-specific weights combine with observational factors - including phase angle constraints, lunar interference, Earth shadowing, and elevation limits - to generate space-time priority matrices. This quantitative framework systematically incorporates operator-defined priorities for specific regions/targets through configurable weight modifiers. Observation plans are dynamically optimized considering:....
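
The feedback rules for target weights (decay for confirmed stable objects, recovery for long-unobserved targets, escalation for anomalies) can be sketched as a single update function. The abstract gives no formulas, so the functional forms, the status labels, and every constant below are assumptions chosen purely for illustration.

```python
def update_weight(weight, status, base_weight,
                  decay=0.5, recovery=0.25, escalation=2.0, max_weight=10.0):
    """Return the next priority weight of one target.

    status is one of (hypothetical labels):
      "stable"     - object confirmed where expected, so its weight decays
      "unobserved" - not seen for a long time, so its weight recovers
                     toward base_weight
      "anomaly"    - newly detected or lost object, so its priority escalates
    """
    if status == "stable":
        return weight * decay
    if status == "unobserved":
        # Move a fixed fraction of the way back toward the catalog priority.
        return weight + recovery * (base_weight - weight)
    if status == "anomaly":
        # Escalate from at least the base priority, capped at max_weight.
        return min(max_weight, max(weight, base_weight) * escalation)
    raise ValueError(f"unknown status: {status!r}")


w = 4.0
for status in ("stable", "unobserved", "anomaly"):
    w = update_weight(w, status, base_weight=4.0)
    print(status, w)
```

In a full scheduler these per-target weights would then be combined with the observational factors listed in the abstract (phase angle, lunar interference, Earth shadow, elevation) to fill the space-time priority matrix; that step is not sketched here.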

SOPSeg: Prompt-based Small Object Instance Segmentation in Remote Sensing Imagery

Authors:Chenhao Wang, Yingrui Ji, Yu Meng, Yunjian Zhang, Yao Zhu

Date:2025-09-03 04:25:03

Extracting small objects from remote sensing imagery plays a vital role in various applications, including urban planning, environmental monitoring, and disaster management. While current research primarily focuses on small object detection, instance segmentation for small objects remains underexplored, with no dedicated datasets available. This gap stems from the technical challenges and high costs of pixel-level annotation for small objects. While the Segment Anything Model (SAM) demonstrates impressive zero-shot generalization, its performance on small-object segmentation deteriorates significantly, largely due to the coarse 1/16 feature resolution that causes severe loss of fine spatial details. To this end, we propose SOPSeg, a prompt-based framework specifically designed for small object segmentation in remote sensing imagery. It incorporates a region-adaptive magnification strategy to preserve fine-grained details, and employs a customized decoder that integrates edge prediction and progressive refinement for accurate boundary delineation. Moreover, we introduce a novel prompting mechanism tailored to the oriented bounding boxes widely adopted in remote sensing applications. SOPSeg outperforms existing methods in small object segmentation and facilitates efficient dataset construction for remote sensing tasks. We further construct a comprehensive small object instance segmentation dataset based on SODA-A, and will release both the model and dataset to support future research.

Optical design and polarimetric performance of a SmallSat UV polarimeter to study interstellar dust: PUFFINS

Authors:Ramya M Anche, Hyukmo Kang, Kyle Van Gorkom, Dan Vargas, Haeun Chung, Ellie Spitzer, Meredith Kupinski, B-G Andersson, Geoff Clayton, Ewan S. Douglas, Luca Fossati, Victor Gasho, Sreejith Aickara Gopinathan, Erika Hamden, Thiem Hoang, Marcus Klupar, Ryan Lau, Alexandre Lazarian, Tram N Le, Joanna Rosenbluth, Ambily Suresh, Carlos J. Vargas

Date:2025-09-03 04:00:35

The Polarimetry in the Ultraviolet to Find Features in INterStellar dust (PUFFINS) is a SmallSat mission concept designed to obtain ultraviolet (UV) spectropolarimetric observations to probe the interstellar dust grain properties and to understand wavelength-dependent extinction and star formation. PUFFINS plans to observe 70 UV bright target stars at varying distances within a 180-320 nm wavelength range with 0.02% polarimetric accuracy. PUFFINS uses a simple telescope design with all reflective optics coated with protected aluminum to enhance reflectivity in the UV. The telescope and the spectropolarimeter, which consists of a Wollaston prism and a half-wave retarder, have been carefully selected to be greater than Technology Readiness Level 6 (TRL6). The telescope is designed to exhibit negligible instrumental polarization and crosstalk, significantly reducing the time needed for polarimetric calibration in orbit. The optimum and careful selection of the target stars will enable PUFFINS to observe an expanded and well-defined sample to test the predictions by interstellar grain alignment theory in the observation phase of 9 months. This paper outlines the details of the optical and optomechanical design and evaluates the polarimetric performance of PUFFINS.

KEPT: Knowledge-Enhanced Prediction of Trajectories from Consecutive Driving Frames with Vision-Language Models

Authors:Yujin Wang, Tianyi Wang, Quanfeng Liu, Wenxian Fan, Junfeng Jiao, Christian Claudel, Yunbing Yan, Bingzhao Gao, Jianqiang Wang, Hong Chen

Date:2025-09-03 03:10:42

Accurate short-horizon trajectory prediction is pivotal for safe and reliable autonomous driving, yet existing vision-language models (VLMs) often fail to effectively ground their reasoning in scene dynamics and domain knowledge. To address this challenge, this paper introduces KEPT, a knowledge-enhanced VLM framework that predicts ego trajectories directly from consecutive front-view driving frames. KEPT couples a temporal frequency-spatial fusion (TFSF) video encoder, trained via self-supervised learning with hard-negative mining, with a scalable k-means + HNSW retrieval stack that supplies scene-aligned exemplars. Retrieved priors are embedded into chain-of-thought (CoT) prompts with explicit planning constraints, while a triple-stage fine-tuning schedule incrementally aligns the language head to metric spatial cues, physically feasible motion, and temporally conditioned front-view planning. Evaluated on the nuScenes dataset, KEPT achieves state-of-the-art performance across open-loop protocols: under NoAvg, it achieves 0.70m average L2 with a 0.21\% collision rate; under TemAvg with lightweight ego status, it attains 0.31m average L2 and a 0.07\% collision rate. Ablation studies show that all three fine-tuning stages contribute complementary benefits, and that using Top-2 retrieved exemplars yields the best accuracy-safety trade-off. The k-means-clustered HNSW index delivers sub-millisecond retrieval latency, supporting practical deployment. These results indicate that retrieval-augmented, CoT-guided VLMs offer a promising, data-efficient pathway toward interpretable and trustworthy autonomous driving.

Mechanistic Insights Into How Rewiring and Bifurcation Angle Affect DK-Crush Stent Deployment

Authors:Andrea Colombo, Dario Carbonaro, Mingzi Zhang, Claudio Chiastra, Mark Webster, Nigel Jepson, Susann Beier

Date:2025-09-03 02:26:45

Background: Double Kissing Crush (DKC) is a preferred two-stent technique for complex coronary bifurcation lesions. Proximal cell rewiring is routinely recommended to reduce technical failure, and DKC is considered effective across various bifurcation angles. However, it remains unclear whether this standard approach is optimal for all patients. This study investigates the interaction between bifurcation angle and rewiring configuration to identify anatomy-specific strategies. Methods: Computational modeling of the DKC procedure was used to simulate 12 DKC procedures across three left main bifurcation angles (45°, 70°, and 100°) and four rewiring configurations: proximal-proximal (P-P), proximal-distal (P-D), distal-proximal (D-P), and distal-distal (D-D). Evaluation metrics included stent malapposition, side branch ostium clearance, arterial wall stress, low time-averaged endothelial shear stress, and high shear rates. Results: DKC performed in wide bifurcations (100°) resulted in worse outcomes, with malapposition reaching 18%, side branch clearance down to 23%, and up to twice the exposure to adverse high shear rates compared to narrower angles. In contrast, intermediate (70°) and narrow (45°) angles generally resulted in more favorable outcomes, though optimal rewiring varied by angle. Proximal strategies, i.e., P-P and P-D, were most effective at 70°, while D-D performed best at 45°. No single strategy was consistently superior across all bifurcation angles. Conclusions: DKC outcomes depend on bifurcation angle and can be optimized by tailoring rewiring strategies, challenging the current clinical understanding. These findings support anatomy-specific procedural planning and intravascular imaging to guide rewiring. This study provides a mechanistic rationale to improve clinical decision-making and tailor bifurcation interventions.

Learning General Policies From Examples

Authors:Blai Bonet, Hector Geffner

Date:2025-09-02 19:56:08

Combinatorial methods for learning general policies that solve large collections of planning problems have been recently developed. One of their strengths, in relation to deep learning approaches, is that the resulting policies can be understood and shown to be correct. A weakness is that the methods do not scale up and learn only from small training instances and feature pools that contain a few hundred states and features at most. In this work, we propose a new symbolic method for learning policies based on the generalization of sampled plans that ensures structural termination and hence acyclicity. The proposed learning approach is not based on SAT/ASP, as previous symbolic methods, but on a hitting set algorithm that can effectively handle problems with millions of states, and pools with hundreds of thousands of features. The formal properties of the approach are analyzed, and its scalability is tested on a number of benchmarks.
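
For readers unfamiliar with the hitting set problem mentioned here: given a collection of sets, the task is to pick a small set of elements that intersects every one of them. The sketch below uses the textbook greedy heuristic, which is a stand-in chosen for brevity and not the algorithm of the paper; the feature names in the example are invented.

```python
def greedy_hitting_set(sets):
    """Greedy heuristic: repeatedly pick the element that hits the most
    not-yet-hit sets. Returns the chosen elements in pick order.

    The result hits every input set but is not guaranteed to be minimum.
    Elements must be hashable and mutually comparable (used for tie-breaking).
    """
    remaining = [set(s) for s in sets]
    if any(not s for s in remaining):
        raise ValueError("an empty set cannot be hit")
    chosen = []
    while remaining:
        # Count in how many remaining sets each element occurs.
        counts = {}
        for s in remaining:
            for element in s:
                counts[element] = counts.get(element, 0) + 1
        # Most frequent element; ties go to the smallest element.
        best = max(sorted(counts), key=counts.get)
        chosen.append(best)
        remaining = [s for s in remaining if best not in s]
    return chosen


# Hypothetical example: each set lists features that could separate one
# pair of states; a hitting set is a feature subset separating all pairs.
constraints = [{"f1", "f2"}, {"f2", "f3"}, {"f3", "f4"}]
print(greedy_hitting_set(constraints))
```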

A Digital Twin for Robotic Post Mortem Tissue Sampling using Virtual Reality

Authors:Maximilian Neidhardt, Ludwig Bosse, Vidas Raudonis, Kristina Allgoewer, Axel Heinemann, Benjamin Ondruschka, Alexander Schlaefer

Date:2025-09-02 19:06:25

Studying tissue samples obtained during autopsies is the gold standard when diagnosing the cause of death and for understanding disease pathophysiology. Recently, interest in post mortem minimally invasive biopsies has grown, as they are less destructive than an open autopsy and reduce the risk of infection. While manual biopsies under ultrasound guidance are more widely performed, robotic post mortem biopsies have been recently proposed. This approach can further reduce the risk of infection for physicians. However, planning of the procedure and control of the robot need to be efficient and usable. We explore a virtual reality setup with a digital twin to realize fully remote planning and control of robotic post mortem biopsies. The setup is evaluated with forensic pathologists in a usability study for three interaction methods. Furthermore, we assess clinical feasibility and evaluate the system on three human cadavers. Overall, 132 needle insertions were performed with an off-axis needle placement error of 5.30 ± 3.25 mm. Tissue samples were successfully biopsied and histopathologically verified. Users reported a very intuitive needle placement approach, indicating that the system is a promising, precise, and low-risk alternative to conventional approaches.

Energy-Efficient Split Learning for Resource-Constrained Environments: A Smart Farming Solution

Authors:Keiwan Soltani, Vishesh Kumar Tanwar, Ashish Gupta, Sajal K. Das

Date:2025-09-02 17:48:35

Smart farming systems encounter significant challenges, including limited resources, the need for data privacy, and poor connectivity in rural areas. To address these issues, we present eEnergy-Split, an energy-efficient framework that utilizes split learning (SL) to enable collaborative model training without direct data sharing or heavy computation on edge devices. By distributing the model between edge devices and a central server, eEnergy-Split reduces on-device energy usage by up to 86 percent compared to federated learning (FL) while safeguarding data privacy. Moreover, SL improves classification accuracy by up to 6.2 percent over FL on ResNet-18 and by more modest amounts on GoogleNet and MobileNetV2. We propose an optimal edge deployment algorithm and a UAV trajectory planning strategy that solves the Traveling Salesman Problem (TSP) exactly to minimize flight cost and maximize communication rounds. Comprehensive evaluations on agricultural pest datasets reveal that eEnergy-Split lowers UAV energy consumption compared to baseline methods and boosts overall accuracy by up to 17 percent. Notably, the energy efficiency of SL is shown to be model-dependent, yielding substantial savings in lightweight models like MobileNet, while communication and memory overheads may reduce efficiency gains in deeper networks. These results highlight the potential of combining SL with energy-aware design to deliver a scalable, privacy-preserving solution for resource-constrained smart farming environments.
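
The abstract states that the UAV route is obtained by solving the TSP exactly but does not name a solver. One standard exact method is the Held-Karp dynamic program, sketched below under that assumption; the function name `held_karp` and the distance matrix are illustrative, and the approach is only practical for a small number of waypoints because it needs on the order of n^2 * 2^n steps.

```python
from itertools import combinations


def held_karp(dist):
    """Exact TSP by dynamic programming over subsets (Held-Karp).

    dist[i][j] is the flight cost from waypoint i to waypoint j.
    Returns (cost, tour), where tour starts at waypoint 0 and the cost
    includes the return leg to 0.
    """
    n = len(dist)
    if n == 1:
        return 0, [0]
    # best[(mask, j)] = (cost, predecessor) of the cheapest path that starts
    # at 0, visits exactly the waypoints in bitmask `mask`, and ends at j.
    best = {(1 << j, j): (dist[0][j], 0) for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev_mask = mask & ~(1 << j)
                best[(mask, j)] = min(
                    (best[(prev_mask, k)][0] + dist[k][j], k)
                    for k in subset if k != j
                )
    full = (1 << n) - 2  # every waypoint except the depot 0
    cost, last = min((best[(full, j)][0] + dist[j][0], j) for j in range(1, n))
    # Walk the predecessor links backwards to recover the visiting order.
    tour, mask, node = [], full, last
    while node != 0:
        tour.append(node)
        previous = best[(mask, node)][1]
        mask &= ~(1 << node)
        node = previous
    tour.append(0)
    tour.reverse()
    return cost, tour


# Hypothetical symmetric cost matrix for a depot (0) and three field nodes.
dist = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]
cost, tour = held_karp(dist)
print(cost, tour)
```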

Cooperative Multi-Agent Path Planning for Heterogeneous UAVs in Contested Environments

Authors:Grant Stagg, Cameron K. Peterson

Date:2025-09-02 16:35:17

This paper addresses the challenge of navigating unmanned aerial vehicles in contested environments by introducing a cooperative multi-agent framework that increases the likelihood of safe UAV traversal. The approach involves two types of UAVs: low-priority agents that explore and localize threats, and a high-priority agent that navigates safely to its target destination while minimizing the risk of detection by enemy radar systems. The low-priority agents employ a decentralized optimization algorithm to balance exploration, radar localization, and safe path identification for the high-priority agent. For the high-priority agent, two path-planning methods are proposed: one for deterministic scenarios using weighted Voronoi diagrams, and another for uncertain scenarios that leverages generalized Voronoi diagrams (incorporating a non-Euclidean criterion derived from uncertainty in the radar's probability of detection) alongside probabilistic constraints. Both methods employ optimization techniques to refine the trajectories while accounting for kinematic constraints and radar detection probabilities. Numerical simulations demonstrate the effectiveness of our framework. This research advances UAV path planning methodologies by combining heterogeneous multi-agent cooperation, probabilistic modeling, and optimization to enhance mission success in adversarial environments.

AppCopilot: Toward General, Accurate, Long-Horizon, and Efficient Mobile Agent

Authors:Jingru Fan, Yufan Dang, Jingyao Wu, Huatao Li, Runde Yang, Xiyuan Yang, Yuheng Wang, Zhong Zhang, Yaxi Lu, Yankai Lin, Zhiyuan Liu, Dahai Li, Chen Qian

Date:2025-09-02 15:48:21

With the rapid evolution of large language models and multimodal foundation models, the mobile-agent landscape has proliferated without converging on the fundamental challenges. This paper identifies four core problems that must be solved for mobile agents to deliver practical, scalable impact: (1) generalization across tasks, modalities, apps, and devices; (2) accuracy, specifically precise on-screen interaction and click targeting; (3) long-horizon capability for sustained, multi-step goals; and (4) efficiency, specifically high-performance runtime on resource-constrained devices. We present AppCopilot, a multimodal, multi-agent, general-purpose on-device assistant that operates across applications and constitutes a full-stack, closed-loop system from data to deployment. AppCopilot operationalizes this position through an end-to-end autonomous pipeline spanning data collection, training, deployment, high-quality and efficient inference, and mobile application development. At the model layer, it integrates multimodal foundation models with robust Chinese-English support. At the reasoning and control layer, it combines chain-of-thought reasoning, hierarchical task planning and decomposition, and multi-agent collaboration. At the execution layer, it enables user personalization and experiential adaptation, voice interaction, function calling, cross-app and cross-device orchestration, and comprehensive mobile app support. The system design incorporates profiling-driven optimization for latency, memory, and energy across heterogeneous hardware. Empirically, AppCopilot achieves significant improvements along all four dimensions: stronger generalization, higher-precision on-screen actions, more reliable long-horizon task completion, and faster, more resource-efficient runtime.

MedDINOv3: How to adapt vision foundation models for medical image segmentation?

Authors:Yuheng Li, Yizhou Wu, Yuxiang Lai, Mingzhe Hu, Xiaofeng Yang

Date:2025-09-02 14:44:43

Accurate segmentation of organs and tumors in CT and MRI scans is essential for diagnosis, treatment planning, and disease monitoring. While deep learning has advanced automated segmentation, most models remain task-specific, lacking generalizability across modalities and institutions. Vision foundation models (FMs) pretrained on billion-scale natural images offer powerful and transferable representations. However, adapting them to medical imaging faces two key challenges: (1) the ViT backbones of most foundation models still underperform specialized CNNs on medical image segmentation, and (2) the large domain gap between natural and medical images limits transferability. We introduce MedDINOv3, a simple and effective framework for adapting DINOv3 to medical segmentation. We first revisit plain ViTs and design a simple and effective architecture with multi-scale token aggregation. Then, we perform domain-adaptive pretraining on CT-3M, a curated collection of 3.87M axial CT slices, using a multi-stage DINOv3 recipe to learn robust dense features. MedDINOv3 matches or exceeds state-of-the-art performance across four segmentation benchmarks, demonstrating the potential of vision foundation models as unified backbones for medical image segmentation. The code is available at https://github.com/ricklisz/MedDINOv3.

Quantifying the Social Costs of Power Outages and Restoration Disparities Across Four U.S. Hurricanes

Authors:Xiangpeng Li, Junwei Ma, Bo Li, Ali Mostafavi

Date:2025-09-02 14:32:08

The multifaceted nature of disaster impact shows that densely populated areas contribute more to aggregate burden, while sparsely populated but heavily affected regions suffer disproportionately at the individual level. This study introduces a framework for quantifying the societal impacts of power outages by translating customer-weighted outage exposure into deprivation measures. It integrates welfare metrics with three recovery indicators (average outage days per customer, restoration duration, and relative restoration rate), computed from sequential EAGLE-I observations and linked to Zip Code Tabulation Area demographics. Applied to four United States hurricanes, Beryl (2024, Texas), Helene (2024, Florida), Milton (2024, Florida), and Ida (2021, Louisiana), this standardized pipeline provides the first cross-event, fine-scale evaluation of outage impacts and their drivers. Results demonstrate regressive patterns with greater burdens in lower-income areas; mechanistic analysis shows that deprivation increases with longer restoration durations and decreases with faster restoration rates; explainable modeling identifies restoration duration as the dominant driver; and clustering reveals distinct recovery typologies not captured by conventional reliability metrics. This framework delivers a transferable method for assessing outage impacts and equity, comparative cross-event evidence linking restoration dynamics to social outcomes, and actionable spatial analyses that support equity-informed restoration planning and resilience investment.

Why Do MLLMs Struggle with Spatial Understanding? A Systematic Analysis from Data to Architecture

Authors:Wanyue Zhang, Yibin Huang, Yangbin Xu, JingJing Huang, Helu Zhi, Shuo Ren, Wang Xu, Jiajun Zhang

Date:2025-09-02 14:22:43

Spatial understanding is essential for Multimodal Large Language Models (MLLMs) to support perception, reasoning, and planning in embodied environments. Despite recent progress, existing studies reveal that MLLMs still struggle with spatial understanding. However, existing research lacks a comprehensive and systematic evaluation of these limitations, often restricted to isolated scenarios, such as single-view or video. In this work, we present a systematic analysis of spatial understanding from both data and architectural perspectives across three representative scenarios: single-view, multi-view, and video. We propose a benchmark named MulSeT (Multi-view Spatial Understanding Tasks), and design a series of experiments to analyze the spatial reasoning capabilities of MLLMs. From the data perspective, the performance of spatial understanding converges quickly as the training data increases, and the upper bound is relatively low, especially for tasks that require spatial imagination. This indicates that merely expanding training data is insufficient to achieve satisfactory performance. From the architectural perspective, we find that spatial understanding relies more heavily on the positional encoding within the visual encoder than within the language model, in both cascaded and native MLLMs. Moreover, we explore reasoning injection and envision future improvements through architectural design to optimize spatial understanding. These insights shed light on the limitations of current MLLMs and suggest new directions for improving spatial reasoning capabilities through data scaling and architectural tuning.

Stability-Aware Joint Communication and Control for Nonlinear Control-Non-Affine Wireless Networked Control Systems

Authors:Rasika Vijithasena, Rafaela Scaciota, Mehdi Bennis, Sumudu Samarakoon

Date:2025-09-02 12:16:42

Ensuring the stability of wireless networked control systems (WNCS) with nonlinear and control-non-affine dynamics, where system behavior is nonlinear with respect to both states and control decisions, poses a significant challenge, particularly under limited resources. However, it is essential in the context of 6G, which is expected to support reliable communication to enable real-time autonomous systems. This paper proposes a joint communication and control solution consisting of: i) a deep Koopman model capable of learning and mapping complex nonlinear dynamics into linear representations in an embedding space, predicting missing states, and planning control actions over a future time horizon; and ii) a scheduling algorithm that schedules sensor-controller communication based on Lyapunov optimization, which dynamically allocates communication resources based on system stability and available resources. Control actions are computed within this embedding space using a linear quadratic regulator (LQR) to ensure system stability. The proposed model is evaluated under varying conditions and its performance is compared against two baseline models: one that assumes systems are control-affine, and another that assumes identical control actions in the embedding and original spaces. The evaluation results demonstrate that the proposed model outperforms both baselines by achieving stability while requiring fewer transmissions.
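
As a rough illustration of how an LQR gain is obtained once dynamics are linear in an embedding space, the sketch below iterates the discrete-time Riccati equation for a scalar system z[k+1] = a*z[k] + b*u[k]. The scalar model, the values of a, b, q, r, and the function name are assumptions chosen for illustration; they are not taken from the paper, whose embedding is learned by a deep Koopman model.

```python
def scalar_lqr_gain(a, b, q, r, iters=500):
    """Return the LQR feedback gain k for z[k+1] = a*z[k] + b*u[k].

    The stage cost is q*z^2 + r*u^2 and the control law is u = -k*z.
    The gain comes from fixed-point iteration of the scalar Riccati equation.
    """
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


# Hypothetical unstable open-loop dynamics (|a| > 1) in the embedding space.
a, b = 1.2, 1.0
k = scalar_lqr_gain(a, b, q=1.0, r=1.0)

# Simulate the closed loop z[k+1] = (a - b*k) * z[k] from z = 1.
z = 1.0
for _ in range(50):
    z = (a - b * k) * z
```

With these illustrative values the closed-loop factor a - b*k has magnitude below one, so the embedded state decays toward zero.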

Towards Multi-Aspect Diversification of News Recommendations Using Neuro-Symbolic AI for Individual and Societal Benefit

Authors:Markus Reiter-Haas, Elisabeth Lex

Date:2025-09-02 11:40:52

News recommendations are complex, with diversity playing a vital role. So far, existing literature predominantly focuses on specific aspects of news diversity, such as viewpoints. In this paper, we introduce multi-aspect diversification in four distinct recommendation modes and outline the nuanced challenges in diversifying lists, sequences, summaries, and interactions. Our proposed research direction combines symbolic and subsymbolic artificial intelligence, leveraging both knowledge graphs and rule learning. We plan to evaluate our models using user studies to capture not only users' behavior but also their perceived experience. Our vision to balance news consumption points to other positive effects for users (e.g., increased serendipity) and society (e.g., decreased polarization).

Task and Motion Planning of Dynamic Systems using Hyperproperties for Signal Temporal Logics

Authors:Jianing Zhao, Bowen Ye, Xinyi Yu, Rupak Majumdar, Xiang Yin

Date:2025-09-02 10:47:14

We investigate the task and motion planning problem for dynamical systems under signal temporal logic (STL) specifications. Existing works on STL control synthesis mainly focus on generating plans that satisfy properties over a single executed trajectory. In this work, we consider the planning problem for hyperproperties evaluated over a set of possible trajectories, which naturally arise in information-flow control problems. Specifically, we study discrete-time dynamical systems and employ the recently developed temporal logic HyperSTL as the new objective for planning. To solve this problem, we propose a novel recursive counterexample-guided synthesis approach capable of effectively handling HyperSTL specifications with multiple alternating quantifiers. The proposed method is not only applicable to planning but also extends to HyperSTL model checking for discrete-time dynamical systems. Finally, we present case studies on security-preserving planning and ambiguity-free planning to demonstrate the effectiveness of the proposed HyperSTL planning framework.

Systematic Evaluation of Trade-Offs in Motion Planning Algorithms for Optimal Industrial Robotic Work Cell Design

Authors:G. de Mathelin, C. Hartl-Nesic, A. Kugi

Date:2025-09-02 09:49:21

The performance of industrial robotic work cells depends on optimizing various hyperparameters referring to the cell layout, such as robot base placement, tool placement, and kinematic design. Achieving this requires a bilevel optimization approach, where the high-level optimization adjusts these hyperparameters, and the low-level optimization computes robot motions. However, computing the optimal robot motion is computationally infeasible, so trade-offs must be introduced in motion planning to make the problem tractable. These trade-offs significantly impact the overall performance of the bilevel optimization, but their effects still need to be systematically evaluated. In this paper, we introduce metrics to assess these trade-offs regarding optimality, time gain, robustness, and consistency. Through extensive simulation studies, we investigate how simplifications in motion-level optimization affect the high-level optimization outcomes, balancing computational complexity with solution quality. The proposed algorithms are applied to find the time-optimal kinematic design for a modular robot in two palletization scenarios.

Learning Social Heuristics for Human-Aware Path Planning

Authors:Andrea Eirale, Matteo Leonetti, Marcello Chiaberge

Date:2025-09-02 09:36:11

Social robotic navigation has been at the center of numerous studies in recent years. Most of the research has focused on driving the robotic agent along obstacle-free trajectories, respecting social distances from humans, and predicting their movements to optimize navigation. However, in order to really be socially accepted, the robots must be able to attain certain social norms that cannot arise from conventional navigation, but require a dedicated learning process. We propose Heuristic Planning with Learned Social Value (HPLSV), a method to learn a value function encapsulating the cost of social navigation, and use it as an additional heuristic in heuristic-search path planning. In this preliminary work, we apply the methodology to the common social scenario of joining a queue of people, with the intention of generalizing to further human activities.

Forecasting Future DDoS Attacks Using Long Short Term Memory (LSTM) Model

Authors:Kong Mun Yeen, Rafidah Md Noor, Wahidah Md Shah, Aslinda Hassan, Muhammad Umair Munir

Date:2025-09-02 08:26:51

This paper forecasts future Distributed Denial of Service (DDoS) attacks using deep learning models. Although several studies address forecasting DDoS attacks, such studies remain relatively limited compared to detection-focused research. By studying current trends and forecasting based on newer and updated datasets, mitigation plans against the attacks can be formulated. The methodology used in this research work conforms to the Cross Industry Standard Process for Data Mining (CRISP-DM) model.

Multi-period line planning for varying railway passenger demand with asymmetric lines

Authors:Renate J. H. van der Knaap, Niels van Oort, Menno de Bruyn, Rob M. P. Goverde

Date:2025-09-02 07:46:00

A line plan is an important aspect of the quality of the service provided to railway passengers. Although it is well known that railway demand varies throughout the day in volume and structure, the line plan is often still fixed throughout the day. To better match this varying railway demand, we propose a mixed-integer linear programming model for multi-period line planning. This model for railway networks incorporates selection of routes, stopping patterns, frequencies, transfers, and the possibility of asymmetric lines to deal with spatially unbalanced demand. The epsilon-constraint method is used to determine Pareto optimal solutions. The proposed model and solution method are tested on a case study of part of the Dutch railway network. The results show that allowing for changes to the line plan during the day can reduce the total generalised journey time by up to 4.26%, especially when asymmetric lines are used.
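
The epsilon-constraint idea can be sketched on a tiny enumerated example: minimize one objective while bounding the other by epsilon, then sweep epsilon. The candidate plans, their objective values, and the choice of the two objectives (journey time and operating cost) are hypothetical stand-ins; the paper solves a mixed-integer linear program rather than enumerating candidates.

```python
def epsilon_constraint(candidates, epsilons):
    """Sweep epsilon bounds on the second objective, minimizing the first.

    candidates: list of (name, f1, f2) tuples, both objectives minimized.
    Ties in f1 are broken by f2, so every returned point is Pareto optimal
    among the candidates, not merely weakly Pareto optimal.
    """
    front = []
    for eps in epsilons:
        feasible = [c for c in candidates if c[2] <= eps]
        if not feasible:
            continue  # no candidate satisfies this bound
        best = min(feasible, key=lambda c: (c[1], c[2]))
        if best not in front:
            front.append(best)
    return front


# Hypothetical line plans: (name, journey time, operating cost).
plans = [
    ("A", 100.0, 9.0),
    ("B", 96.0, 12.0),
    ("C", 97.0, 12.0),
    ("D", 94.0, 15.0),
    ("E", 99.0, 14.0),
]
front = epsilon_constraint(plans, epsilons=[9.0, 12.0, 15.0])
print([name for name, _, _ in front])  # prints ['A', 'B', 'D']
```

Plans C and E never appear because plan B is at least as good in both objectives and strictly better in one.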

AutoDrive-R$^2$: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving

Authors:Zhenlong Yuan, Jing Tang, Jinguo Luo, Rui Chen, Chengxuan Qian, Lei Sun, Xiangxiang Chu, Yujun Cai, Dapeng Zhang, Shuo Li

Date:2025-09-02 04:32:24

Vision-Language-Action (VLA) models in autonomous driving systems have recently demonstrated transformative potential by integrating multimodal perception with decision-making capabilities. However, the interpretability and coherence of the decision process and the plausibility of action sequences remain largely underexplored. To address these issues, we propose AutoDrive-R$^2$, a novel VLA framework that enhances both reasoning and self-reflection capabilities of autonomous driving systems through chain-of-thought (CoT) processing and reinforcement learning (RL). Specifically, we first propose an innovative CoT dataset named nuScenesR$^2$-6K for supervised fine-tuning, which effectively builds cognitive bridges between input information and output trajectories through a four-step logical chain with self-reflection for validation. Moreover, to maximize both reasoning and self-reflection during the RL stage, we further employ the Group Relative Policy Optimization (GRPO) algorithm within a physics-grounded reward framework that incorporates spatial alignment, vehicle dynamics, and temporal smoothness criteria to ensure reliable and realistic trajectory planning. Extensive evaluation results across both nuScenes and Waymo datasets demonstrate the state-of-the-art performance and robust generalization capacity of our proposed method.
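
The group-relative step that gives GRPO its name can be sketched as follows: each sampled output is scored against the mean and standard deviation of the rewards in its own group, so no learned value function is needed. The reward values and the function name below are hypothetical; the paper's actual reward combines spatial alignment, vehicle dynamics, and temporal smoothness terms that are not reproduced here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    Returns (r - group mean) / (group std + eps) for every reward.
    eps guards against division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four trajectories sampled for the same scene.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
```

Outputs that score above the group mean receive positive advantages and are reinforced, while those below the mean are discouraged.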

Dynamic Speculative Agent Planning

Authors:Yilin Guan, Wenyue Hua, Qingfeng Lan, Sun Fei, Dujian Ding, Devang Acharya, Chi Wang, William Yang Wang

Date:2025-09-02 03:34:36

Despite their remarkable success in complex tasks propelling widespread adoption, large language-model-based agents still face critical deployment challenges due to prohibitive latency and inference costs. While recent work has explored various methods to accelerate inference, existing approaches suffer from significant limitations: they either fail to preserve performance fidelity, require extensive offline training of router modules, or incur excessive operational costs. Moreover, they provide minimal user control over the tradeoff between acceleration and other performance metrics. To address these gaps, we introduce Dynamic Speculative Planning (DSP), an asynchronous online reinforcement learning framework that provides lossless acceleration with substantially reduced costs without requiring additional pre-deployment preparation. DSP explicitly optimizes a joint objective balancing end-to-end latency against dollar cost, allowing practitioners to adjust a single parameter that steers the system toward faster responses, cheaper operation, or any point along this continuum. Experiments on two standard agent benchmarks demonstrate that DSP achieves comparable efficiency to the fastest lossless acceleration method while reducing total cost by 30% and unnecessary cost by up to 60%. Our code and data are available through https://github.com/guanyilin428/Dynamic-Speculative-Planning.