planning - 2025-07-29

Adapting Vehicle Detectors for Aerial Imagery to Unseen Domains with Weak Supervision

Authors:Xiao Fang, Minhyek Jeon, Zheyang Qin, Stanislav Panev, Celso de Melo, Shuowen Hu, Shayok Chakraborty, Fernando De la Torre
Date:2025-07-28 16:38:06

Detecting vehicles in aerial imagery is a critical task with applications in traffic monitoring, urban planning, and defense intelligence. Deep learning methods have provided state-of-the-art (SOTA) results for this application. However, a significant challenge arises when models trained on data from one geographic region fail to generalize effectively to other areas. Variability in factors such as environmental conditions, urban layouts, road networks, vehicle types, and image acquisition parameters (e.g., resolution, lighting, and angle) leads to domain shifts that degrade model performance. This paper proposes a novel method that uses generative AI to synthesize high-quality aerial images and their labels, improving detector training through data augmentation. Our key contribution is the development of a multi-stage, multi-modal knowledge transfer framework utilizing fine-tuned latent diffusion models (LDMs) to mitigate the distribution gap between the source and target environments. Extensive experiments across diverse aerial imagery domains show consistent performance improvements in AP50 over supervised learning on source domain data, weakly supervised adaptation methods, unsupervised domain adaptation methods, and open-set object detectors by 4-23%, 6-10%, 7-40%, and more than 50%, respectively. Furthermore, we introduce two newly annotated aerial datasets from New Zealand and Utah to support further research in this field. Project page is available at: https://humansensinglab.github.io/AGenDA

On the Limits of Hierarchically Embedded Logic in Classical Neural Networks

Authors:Bill Cochran
Date:2025-07-28 16:13:41

We propose a formal model of reasoning limitations in large neural net models for language, grounded in the depth of their neural architecture. By treating neural networks as linear operators over logic predicate space, we show that each layer can encode at most one additional level of logical reasoning. We prove that a neural network of a particular depth cannot faithfully represent predicates in a logic one order higher, such as simple counting over complex predicates, implying a strict upper bound on logical expressiveness. This structure induces a nontrivial null space during tokenization and embedding, excluding higher-order predicates from representability. Our framework offers a natural explanation for phenomena such as hallucination, repetition, and limited planning, while also providing a foundation for understanding how approximations to higher-order logic may emerge. These results motivate architectural extensions and interpretability strategies in future development of language models.

Benamou-Brenier and Kantorovich are equivalent on sub-Riemannian manifolds with no abnormal geodesics

Authors:Giovanna Citti, Mattia Galeotti, Andrea Pinamonti
Date:2025-07-28 16:10:33

We prove that the Benamou-Brenier formulation of the Optimal Transport problem and the Kantorovich formulation are equivalent on a sub-Riemannian connected and complete manifold $M$ without boundary and with no abnormal geodesics, when the problems are considered between two measures of compact supports. Furthermore, we prove the existence of a minimizer for the Benamou-Brenier formulation and link it to the optimal transport plan.

Partially Observable Monte-Carlo Graph Search

Authors:Yang You, Vincent Thomas, Alex Schutz, Robert Skilton, Nick Hawes, Olivier Buffet
Date:2025-07-28 16:02:36

Currently, large partially observable Markov decision processes (POMDPs) are often solved by sampling-based online methods which interleave planning and execution phases. However, a pre-computed offline policy is more desirable in POMDP applications with time or energy constraints, and previous offline algorithms are not able to scale up to large POMDPs. In this article, we propose a new sampling-based algorithm, the partially observable Monte-Carlo graph search (POMCGS), to solve large POMDPs offline. Unlike many online POMDP methods, which progressively develop a tree while performing (Monte-Carlo) simulations, POMCGS folds this search tree on the fly to construct a policy graph, so that computations can be drastically reduced, and users can analyze and validate the policy prior to embedding and executing it. Moreover, POMCGS, together with the action progressive widening and observation clustering methods provided in this article, is able to address certain continuous POMDPs. Through experiments, we demonstrate that POMCGS can generate policies on the most challenging POMDPs, which cannot be computed by previous offline algorithms, and that these policies' values are competitive with those of state-of-the-art online POMDP algorithms.
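
To make the idea of a policy graph concrete, the following is a minimal sketch of how such a graph could be stored and executed: each node carries an action, and each observation selects the next node. The class name `PolicyNode`, the function `run_policy`, and the toy "listen/open" problem are assumptions for illustration only; they are not taken from the article and do not show how POMCGS builds the graph.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyNode:
    """One node of a policy graph: an action plus observation-labelled edges."""
    action: str
    successors: dict = field(default_factory=dict)  # observation -> node id


def run_policy(graph, start, observe, max_steps):
    """Execute a policy graph: act, observe, then jump to the successor node.

    graph: dict node id -> PolicyNode
    observe: callable(action) -> observation (stands in for the environment)
    """
    node_id, actions = start, []
    for _ in range(max_steps):
        node = graph[node_id]
        actions.append(node.action)
        obs = observe(node.action)
        if obs not in node.successors:
            break  # terminal node or unmodelled observation
        node_id = node.successors[obs]
    return actions


# Hypothetical three-node graph: listen once, then open the opposite door.
graph = {
    0: PolicyNode("listen", {"left": 1, "right": 2}),
    1: PolicyNode("open_right"),
    2: PolicyNode("open_left"),
}
actions = run_policy(graph, 0, lambda action: "left", max_steps=5)
```

Because the graph is an explicit, finite object, it can be inspected node by node before deployment, which is the property the abstract highlights.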

SCORPION: Addressing Scanner-Induced Variability in Histopathology

Authors:Jeongun Ryu, Heon Song, Seungeun Lee, Soo Ick Cho, Jiwon Shin, Kyunghyun Paeng, Sérgio Pereira
Date:2025-07-28 15:00:49

Ensuring reliable model performance across diverse domains is a critical challenge in computational pathology. A particular source of variability in Whole-Slide Images is introduced by differences in digital scanners, thus calling for better scanner generalization. This is critical for the real-world adoption of computational pathology, where the scanning devices may differ per institution or hospital, and the model should not be dependent on scanner-induced details, which can ultimately affect the patient's diagnosis and treatment planning. However, past efforts have primarily focused on standard domain generalization settings, evaluating on scanners unseen during training, without directly evaluating consistency across scanners for the same tissue. To overcome this limitation, we introduce SCORPION, a new dataset explicitly designed to evaluate model reliability under scanner variability. SCORPION includes 480 tissue samples, each scanned with 5 scanners, yielding 2,400 spatially aligned patches. This scanner-paired design allows for the isolation of scanner-induced variability, enabling a rigorous evaluation of model consistency while controlling for differences in tissue composition. Furthermore, we propose SimCons, a flexible framework that combines augmentation-based domain generalization techniques with a consistency loss to explicitly address scanner generalization. We empirically show that SimCons improves model consistency on varying scanners without compromising task-specific performance. By releasing the SCORPION dataset and proposing SimCons, we provide the research community with a crucial resource for evaluating and improving model consistency across diverse scanners, setting a new standard for reliability testing.

PixelNav: Towards Model-based Vision-Only Navigation with Topological Graphs

Authors:Sergey Bakulin, Timur Akhtyamov, Denis Fatykhov, German Devchich, Gonzalo Ferrer
Date:2025-07-28 14:44:36

This work proposes a novel hybrid approach for vision-only navigation of mobile robots, which combines advances of both deep learning approaches and classical model-based planning algorithms. Today, purely data-driven end-to-end models are dominant solutions to this problem. Despite advantages such as flexibility and adaptability, the requirement of a large amount of training data and limited interpretability are the main bottlenecks for their practical applications. To address these limitations, we propose a hierarchical system that utilizes recent advances in model predictive control, traversability estimation, visual place recognition, and pose estimation, employing topological graphs as a representation of the target environment. Using such a combination, we provide a scalable system with a higher level of interpretability compared to end-to-end approaches. Extensive real-world experiments show the efficiency of the proposed method.

Uncertainty-aware Planning with Inaccurate Models for Robotized Liquid Handling

Authors:Marco Faroni, Carlo Odesco, Andrea Zanchettin, Paolo Rocco
Date:2025-07-28 14:15:10

Physics-based simulations and learning-based models are vital for complex robotics tasks like deformable object manipulation and liquid handling. However, these models often struggle with accuracy due to epistemic uncertainty or the sim-to-real gap. For instance, accurately pouring liquid from one container to another poses challenges, particularly when models are trained on limited demonstrations and may perform poorly in novel situations. This paper proposes an uncertainty-aware Monte Carlo Tree Search (MCTS) algorithm designed to mitigate these inaccuracies. By incorporating estimates of model uncertainty, the proposed MCTS strategy biases the search towards actions with lower predicted uncertainty. This approach enhances the reliability of planning under uncertain conditions. Applied to a liquid pouring task, our method demonstrates improved success rates even with models trained on minimal data, outperforming traditional methods and showcasing its potential for robust decision-making in robotics.
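
One simple way to bias tree search towards low-uncertainty actions is to subtract an uncertainty penalty from a standard UCB1 score. The sketch below assumes that form; the penalty term, the weight `beta`, and the action names are hypothetical and are not claimed to be the paper's formulation.

```python
import math


def select_action(stats, uncertainty, c=1.4, beta=1.0):
    """Pick the action maximizing UCB1 minus an uncertainty penalty.

    stats: dict action -> (visit_count, total_return)
    uncertainty: dict action -> predicted model uncertainty, >= 0 (assumed given)
    c: exploration constant; beta: weight of the uncertainty penalty (assumption)
    """
    total_visits = sum(n for n, _ in stats.values())
    best_action, best_score = None, -math.inf
    for action, (n, total_return) in stats.items():
        if n == 0:
            return action  # try unvisited actions first
        mean_value = total_return / n
        exploration = c * math.sqrt(math.log(total_visits) / n)
        score = mean_value + exploration - beta * uncertainty[action]
        if score > best_score:
            best_action, best_score = action, score
    return best_action


# Hypothetical pouring actions: the fast tilt has a slightly higher mean
# return, but the learned model is much less certain about it.
stats = {"tilt_fast": (10, 6.0), "tilt_slow": (10, 5.5)}
uncertainty = {"tilt_fast": 0.5, "tilt_slow": 0.05}
choice = select_action(stats, uncertainty)
```

With `beta=0` the rule reduces to plain UCB1, so the penalty weight controls how strongly the search avoids regions where the model is unreliable.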

Free Energy-Inspired Cognitive Risk Integration for AV Navigation in Pedestrian-Rich Environments

Authors:Meiting Dang, Yanping Wu, Yafei Wang, Dezong Zhao, David Flynn, Chongfeng Wei
Date:2025-07-28 14:02:00

Recent advances in autonomous vehicle (AV) behavior planning have shown impressive social interaction capabilities when interacting with other road users. However, achieving human-like prediction and decision-making in interactions with vulnerable road users remains a key challenge in complex multi-agent interactive environments. Existing research focuses primarily on crowd navigation for small mobile robots, which cannot be directly applied to AVs due to inherent differences in their decision-making strategies and dynamic boundaries. Moreover, pedestrians in these multi-agent simulations follow fixed behavior patterns that cannot dynamically respond to AV actions. To overcome these limitations, this paper proposes a novel framework for modeling interactions between the AV and multiple pedestrians. In this framework, a cognitive process modeling approach inspired by the Free Energy Principle is integrated into both the AV and pedestrian models to simulate more realistic interaction dynamics. Specifically, the proposed pedestrian Cognitive-Risk Social Force Model adjusts goal-directed and repulsive forces using a fused measure of cognitive uncertainty and physical risk to produce human-like trajectories. Meanwhile, the AV leverages this fused risk to construct a dynamic, risk-aware adjacency matrix for a Graph Convolutional Network within a Soft Actor-Critic architecture, allowing it to make more reasonable and informed decisions. Simulation results indicate that our proposed framework effectively improves safety, efficiency, and smoothness of AV navigation compared to the state-of-the-art method.

Route Optimization Over Scheduled Services For Large-Scale Package Delivery Networks

Authors:Mohammed Faisal Ahmed, Pascal Van Hentenryck, Ahmed El Nashar
Date:2025-07-28 13:53:17

This paper introduces the Trailer Path Optimization with Schedule Services Problem (TPOSSP) and proposes a column-generation heuristic (CG-heuristic) to find high-quality solutions to large-scale instances. The TPOSSP aims at determining trailer routes over a time-dependent network using existing scheduled services, while considering tractor capacity constraints and time windows for trailer pickups and deliveries. The objective is to minimize both the number of schedules used and the total miles traveled. To address the large scale of industrial instances, the paper proposes a network reduction technique that identifies the set of feasible schedule-legs for each request. Moreover, to address the resulting MIP models, which still contain hundreds of millions of variables, the paper proposes a stabilized column-generation procedure whose pricing problem is a time-dependent shortest path. The approach is evaluated on industrial instances both for tactical planning, where requests for the entire network are re-optimized, and for real-time operations, where new requests are inserted. In the tactical planning setting, the column-generation heuristic returns solutions with a 3.7%-5.7% optimality gap (based on a MIP relaxation) in 1.7-10.3 hours, and improves the current practice by 2.3-3.2%, which translates into savings of tens of millions of dollars a year. In the real-time setting, the column-generation heuristic returns solutions within 3% of optimality in under 1 minute, which makes it adequate for real-time deployment. The results also show that the network reduction decreases run times by 85% for the column-generation heuristic.
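
A time-dependent shortest path over scheduled legs can be sketched with a Dijkstra-style search whose labels are arrival times. The version below minimizes earliest arrival as a simplified stand-in; a real pricing problem would minimize reduced cost, and the leg tuple layout and function name are assumptions made for illustration.

```python
import heapq


def earliest_arrival(legs, source, target, start_time):
    """Earliest arrival time at target using scheduled legs, or None.

    legs: list of (origin, destination, depart_time, arrive_time), with
          arrive_time >= depart_time. A leg is usable only if the trailer
          is at its origin no later than depart_time (waiting is allowed).
    """
    outgoing = {}
    for origin, dest, depart, arrive in legs:
        outgoing.setdefault(origin, []).append((dest, depart, arrive))

    best = {source: start_time}
    queue = [(start_time, source)]
    while queue:
        time, node = heapq.heappop(queue)
        if node == target:
            return time
        if time > best.get(node, float("inf")):
            continue  # stale queue entry
        for dest, depart, arrive in outgoing.get(node, []):
            if depart >= time and arrive < best.get(dest, float("inf")):
                best[dest] = arrive
                heapq.heappush(queue, (arrive, dest))
    return None  # target unreachable with the given schedule


# Hypothetical schedule: the 09:00 B->C leg departs before the trailer
# reaches B, so the route must wait for the 12:00 departure.
legs = [("A", "B", 8, 10), ("B", "C", 9, 11), ("B", "C", 12, 14), ("A", "C", 9, 16)]
arrival = earliest_arrival(legs, "A", "C", start_time=7)
```

The same label-setting structure carries over when the label is a cost rather than a time, provided the feasibility check on departure times is kept.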

Hanging Around: Cognitive Inspired Reasoning for Reactive Robotics

Authors:Mihai Pomarlan, Stefano De Giorgis, Rachel Ringe, Maria M. Hedblom, Nikolaos Tsiogkas
Date:2025-07-28 13:39:03

Situationally-aware artificial agents operating with competence in natural environments face several challenges: spatial awareness, object affordance detection, dynamic changes and unpredictability. A critical challenge is the agent's ability to identify and monitor environmental elements pertinent to its objectives. Our research introduces a neurosymbolic modular architecture for reactive robotics. Our system combines a neural component performing object recognition over the environment and image processing techniques such as optical flow, with symbolic representation and reasoning. The reasoning system is grounded in the embodied cognition paradigm, via integrating image schematic knowledge in an ontological structure. The ontology is operatively used to create queries for the perception system, decide on actions, and infer entities' capabilities derived from perceptual data. The combination of reasoning and image processing allows the agent to focus its perception for normal operation as well as discover new concepts for parts of objects involved in particular interactions. The discovered concepts allow the robot to autonomously acquire training data and adjust its subsymbolic perception to recognize the parts, as well as making planning for more complex tasks feasible by focusing search on those relevant object parts. We demonstrate our approach in a simulated world, in which an agent learns to recognize parts of objects involved in support relations. While the agent has no concept of handle initially, by observing examples of supported objects hanging from a hook it learns to recognize the parts involved in establishing support and becomes able to plan the establishment/destruction of the support relation. This underscores the agent's capability to expand its knowledge through observation in a systematic way, and illustrates the potential of combining deep reasoning [...].

SCANet: Split Coordinate Attention Network for Building Footprint Extraction

Authors:Chunshi Wang, Bin Zhao, Shuxue Ding
Date:2025-07-28 13:21:08

Building footprint extraction holds immense significance in remote sensing image analysis and has great value in urban planning, land use, environmental protection and disaster assessment. Despite the progress made by conventional and deep learning approaches in this field, they continue to encounter significant challenges. This paper introduces a novel plug-and-play attention module, Split Coordinate Attention (SCA), which captures spatially remote interactions by employing pooling kernels with two spatial ranges, encoding each channel along the x and y planes, and separately performing a series of split operations for each feature group, thus enabling more efficient semantic feature extraction. Inserting SCA into a 2D CNN yields SCANet, which outperforms recent SOTA methods on the public Wuhan University (WHU) Building Dataset and Massachusetts Building Dataset in terms of various metrics. In particular, SCANet achieves the best IoU, 91.61% and 75.49%, on the two datasets, respectively. Our code is available at https://github.com/AiEson/SCANet

A General Framework for Dynamic MAPF using Multi-Shot ASP and Tunnels

Authors:Aysu Bogatarkan, Esra Erdem
Date:2025-07-28 10:55:31

The multi-agent path finding (MAPF) problem aims to find plans for multiple agents in an environment within a given time, such that the agents do not collide with each other or with obstacles. Motivated by the execution and monitoring of these plans, we study the Dynamic MAPF (D-MAPF) problem, which allows changes such as agents entering/leaving the environment or obstacles being removed/moved. Considering the requirements of real-world applications in warehouses with the presence of humans, we introduce 1) a general definition for D-MAPF (applicable to variations of D-MAPF), 2) a new framework to solve D-MAPF (utilizing multi-shot computation, and allowing different methods to solve D-MAPF), and 3) a new ASP-based method to solve D-MAPF (combining advantages of replanning and repairing methods, with a novel concept of tunnels to specify where agents can move). We have illustrated the strengths and weaknesses of this method by experimental evaluations, from the perspectives of computational performance and quality of solutions.
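
The collision-free requirement on MAPF plans can be checked mechanically. The sketch below validates a set of timed grid paths against three commonly used conflict types: occupying an obstacle cell, two agents in the same cell at the same step, and two agents swapping cells in one step. The path representation and the function name are assumptions for illustration; this is plain Python, not the ASP encoding the paper proposes.

```python
def find_conflicts(paths, obstacles=frozenset()):
    """Return the list of conflicts found in a set of timed paths.

    paths: dict agent -> list of cells, one per time step; an agent is
           assumed to wait at its last cell once its path ends.
    obstacles: set of blocked cells.
    """
    horizon = max(len(p) for p in paths.values())

    def at(agent, t):
        path = paths[agent]
        return path[min(t, len(path) - 1)]

    conflicts = []
    agents = sorted(paths)
    for t in range(horizon):
        for a in agents:
            if at(a, t) in obstacles:
                conflicts.append(("obstacle", t, a))
        for i, a in enumerate(agents):
            for b in agents[i + 1:]:
                if at(a, t) == at(b, t):
                    conflicts.append(("vertex", t, a, b))
                elif t > 0 and at(a, t) == at(b, t - 1) and at(b, t) == at(a, t - 1):
                    conflicts.append(("swap", t, a, b))
    return conflicts


# Two hypothetical agents crossing a corridor in opposite directions.
crossing = {"a1": [(0, 0), (0, 1), (0, 2)], "a2": [(0, 2), (0, 1), (0, 0)]}
swapping = {"a1": [(0, 0), (0, 1)], "a2": [(0, 1), (0, 0)]}
crossing_conflicts = find_conflicts(crossing)
swapping_conflicts = find_conflicts(swapping)
```

In a dynamic setting, a check of this kind could be re-run whenever agents or obstacles change, to decide which parts of an existing plan need repair.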

Energy recovery from Ginkgo biloba urban pruning wastes: pyrolysis optimization and fuel property enhancement for high grade charcoal productions

Authors:Padam Prasad Paudel, Sunyong Park, Kwang Cheol Oh, Seok Jun Kim, Seon Yeop Kim, Kyeong Sik Kang, Dae Hyun Kim
Date:2025-07-28 10:07:06

Ginkgo biloba trees are widely planted in urban areas of developed countries for their resilience, longevity and aesthetic appeal. Annual pruning to control tree size, shape and interference with traffic and pedestrians generates large volumes of unutilized Ginkgo biomass. This study aimed to valorize these pruning residues into charcoal by optimizing pyrolysis conditions and evaluating its fuel properties. The pyrolysis experiment was conducted at 400 to 600 degrees Celsius, after oven drying pretreatment. The mass yield of charcoal was found to vary from 27.33 to 32.05 percent and the approximate volume shrinkage was found to be 41.19 to 49.97 percent. The fuel properties of the charcoals were evaluated using the moisture absorption test, proximate and ultimate analysis, thermogravimetry, calorimetry and inductively coupled plasma optical emission spectrometry. The calorific value improved from 20.76 to 34.26 MJ per kg with energy yield up to 46.75 percent. Charcoal exhibited superior thermal stability and better combustion performance. The results revealed satisfactory properties compared with other biomass, coal and biochar standards. The product complied with first grade standards at 550 and 600 degrees Celsius and second grade wood charcoal standards at other temperatures. However, higher concentrations of some heavy metals like Zn indicate the need for pretreatment and further research on copyrolysis for resource optimization. This study highlights the dual benefits of waste management and renewable energy, providing insights for urban planning and policymaking.
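
Energy yield is conventionally obtained by scaling the mass yield by the ratio of the charcoal's heating value to that of the raw biomass. The sketch below assumes that standard definition; the abstract does not state the formula the study used, and the input numbers are made-up round values, not the study's data.

```python
def energy_yield(mass_yield_pct, hhv_char, hhv_raw):
    """Energy yield in percent.

    mass_yield_pct: charcoal mass as a percentage of dry feedstock mass
    hhv_char, hhv_raw: heating values of charcoal and raw biomass (MJ/kg)
    """
    if hhv_raw <= 0:
        raise ValueError("raw heating value must be positive")
    return mass_yield_pct * hhv_char / hhv_raw


# Illustrative inputs only: 30% mass yield, 30 MJ/kg char, 20 MJ/kg feedstock.
example = energy_yield(30.0, 30.0, 20.0)
```

The formula makes the trade-off explicit: higher pyrolysis temperatures tend to raise the heating value while lowering the mass yield, so the energy yield depends on which effect dominates.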

FMimic: Foundation Models are Fine-grained Action Learners from Human Videos

Authors:Guangyan Chen, Meiling Wang, Te Cui, Yao Mu, Haoyang Lu, Zicai Peng, Mengxiao Hu, Tianxing Zhou, Mengyin Fu, Yi Yang, Yufeng Yue
Date:2025-07-28 08:36:01

Visual imitation learning (VIL) provides an efficient and intuitive strategy for robotic systems to acquire novel skills. Recent advancements in foundation models, particularly Vision Language Models (VLMs), have demonstrated remarkable capabilities in visual and linguistic reasoning for VIL tasks. Despite this progress, existing approaches primarily utilize these models for learning high-level plans from human demonstrations, relying on pre-defined motion primitives for executing physical interactions, which remains a major bottleneck for robotic systems. In this work, we present FMimic, a novel paradigm that harnesses foundation models to directly learn generalizable skills at even fine-grained action levels, using only a limited number of human videos. Extensive experiments demonstrate that our FMimic delivers strong performance with a single human video, and significantly outperforms all other methods with five videos. Furthermore, our method exhibits significant improvements of over 39% and 29% in RLBench multi-task experiments and real-world manipulation tasks, respectively, and exceeds baselines by more than 34% in high-precision tasks and 47% in long-horizon tasks.

Intention-Driven Generation of Project-Specific Test Cases

Authors:Binhang Qi, Yun Lin, Xinyi Weng, Yuhuan Huang, Chenyan Liu, Hailong Sun, Jin Song Dong
Date:2025-07-28 08:35:04

Test cases are valuable assets for maintaining software quality. While numerous automated techniques have been proposed for generating tests (either by maximizing code coverage or by translating focal code into test code), practical tests are seldom driven by coverage alone. In real projects, each test reflects a developer's validation intention for a specific behaviour and embodies rich, project-specific knowledge: which specific APIs to call and what assertions truly matter. Without considering such knowledge, tests can hardly pass code review and be integrated into the software product. In this work, we propose IntentionTest, which generates project-specific tests with validation intention as a structured description. Our design is motivated by two insights: (1) a description of validation intention, compared to coverage and focal code, carries more crucial information about what to test; and (2) practical tests exhibit high code duplication, indicating that domain knowledge is highly reusable for writing new tests. Given a focal code and a description of validation intention (in the form of either an informal comment or a formal test plan), IntentionTest retrieves a referable test in the project to guide test generation. Moreover, IntentionTest reduces the test generation problem into an editing problem on the test code regarding the validation intention. It generates a test including both test prefix and oracle, which aims to be executable and semantically correct. We evaluate IntentionTest against state-of-the-art baselines on 4,146 test cases from 13 open-source projects. Specifically, compared to ChatTester, IntentionTest can (1) generate significantly more semantically correct tests, improving common mutation scores by 39.03% and coverage overlap with ground-truth tests by 40.14%; (2) generate 21.30% more successful passing tests.

Model-Structured Neural Networks to Control the Steering Dynamics of Autonomous Race Cars

Authors:Mattia Piccinini, Aniello Mungiello, Georg Jank, Gastone Pietro Rosati Papini, Francesco Biral, Johannes Betz
Date:2025-07-27 21:56:01

Autonomous racing has gained increasing attention in recent years, as a safe environment to accelerate the development of motion planning and control methods for autonomous driving. Deep learning models, predominantly based on neural networks (NNs), have demonstrated significant potential in modeling the vehicle dynamics and in performing various tasks in autonomous driving. However, their black-box nature is critical in the context of autonomous racing, where safety and robustness demand a thorough understanding of the decision-making algorithms. To address this challenge, this paper proposes MS-NN-steer, a new Model-Structured Neural Network for vehicle steering control, integrating the prior knowledge of the nonlinear vehicle dynamics into the neural architecture. The proposed controller is validated using real-world data from the Abu Dhabi Autonomous Racing League (A2RL) competition, with full-scale autonomous race cars. In comparison with general-purpose NNs, MS-NN-steer is shown to achieve better accuracy and generalization with small training datasets, while being less sensitive to the weights' initialization. Also, MS-NN-steer outperforms the steering controller used by the A2RL winning team. Our implementation is available open-source in a GitHub repository.

Joint Fiber and Free Space Optical Infrastructure Planning for Hybrid Integrated Access and Backhaul Networks

Authors:Charitha Madapatha, Piotr Lechowicz, Carlos Natalino, Paolo Monti, Tommy Svensson
Date:2025-07-27 17:51:25

Integrated access and backhaul (IAB) is one of the promising techniques for 5G networks and beyond (6G), in which the same node/hardware is used to provide both backhaul and cellular services in a multi-hop architecture. Due to the sensitivity of the backhaul links with high rate/reliability demands, proper network planning is needed to ensure the IAB network performs at the desired levels. In this paper, we study the effect of infrastructure planning and optimization on the coverage of IAB networks. We concentrate on the cases where the fiber connectivity to the nodes is constrained due to cost. Thereby, we study the performance gains and energy efficiency in the presence of free-space optical (FSO) communication links. Our results indicate that hybrid fiber/FSO deployments offer substantial cost savings compared to fully fibered networks, suggesting a beneficial trade-off for strategic link deployment while improving the service coverage probability. As we show, with proper network planning, the service coverage, energy efficiency, and cost efficiency can be improved.

Strategic Motivators for Ethical AI System Development: An Empirical and Holistic Model

Authors:Muhammad Azeem Akbar, Arif Ali Khan, Saima Rafi, Damian Kedziora, Sami Hyrynsalmi
Date:2025-07-27 10:49:05

Artificial Intelligence (AI) presents transformative opportunities for industries and society, but its responsible development is essential to prevent unintended consequences. Ethically sound AI systems demand strategic planning, strong governance, and an understanding of the key drivers that promote responsible practices. This study aims to identify and prioritize the motivators that drive the ethical development of AI systems. A Multivocal Literature Review (MLR) and a questionnaire-based survey were conducted to capture current practices in ethical AI. We applied Interpretive Structure Modeling (ISM) to explore the relationships between motivator categories, followed by MICMAC analysis to classify them by their driving and dependence power. Fuzzy TOPSIS was used to rank these motivators by importance. Twenty key motivators were identified and grouped into eight categories: Human Resource, Knowledge Integration, Coordination, Project Administration, Standards, Technology Factor, Stakeholders, and Strategy & Matrices. ISM results showed that 'Human Resource' and 'Coordination' heavily influence other factors. MICMAC analysis placed categories like Human Resource (CA1), Coordination (CA3), Stakeholders (CA7), and Strategy & Matrices (CA8) in the independent cluster, indicating high driving but low dependence power. Fuzzy TOPSIS ranked motivators such as promoting team diversity, establishing AI governance bodies, appointing oversight leaders, and ensuring data privacy as most critical. To support ethical AI adoption, organizations should align their strategies with these motivators and integrate them into their policies, governance models, and development frameworks.

Humanoid Occupancy: Enabling A Generalized Multimodal Occupancy Perception System on Humanoid Robots

Authors:Wei Cui, Haoyu Wang, Wenkang Qin, Yijie Guo, Gang Han, Wen Zhao, Jiahang Cao, Zhang Zhang, Jiaru Zhong, Jingkai Sun, Pihai Sun, Shuai Shi, Botuo Jiang, Jiahao Ma, Jiaxu Wang, Hao Cheng, Zhichao Liu, Yang Wang, Zheng Zhu, Guan Huang, Jian Tang, Qiang Zhang
Date:2025-07-27 10:47:00

Humanoid robot technology is advancing rapidly, with manufacturers introducing diverse heterogeneous visual perception modules tailored to specific scenarios. Among various perception paradigms, occupancy-based representation has become widely recognized as particularly suitable for humanoid robots, as it provides both rich semantic and 3D geometric information essential for comprehensive environmental understanding. In this work, we present Humanoid Occupancy, a generalized multimodal occupancy perception system that integrates hardware and software components, data acquisition devices, and a dedicated annotation pipeline. Our framework employs advanced multi-modal fusion techniques to generate grid-based occupancy outputs encoding both occupancy status and semantic labels, thereby enabling holistic environmental understanding for downstream tasks such as task planning and navigation. To address the unique challenges of humanoid robots, we overcome issues such as kinematic interference and occlusion, and establish an effective sensor layout strategy. Furthermore, we have developed the first panoramic occupancy dataset specifically for humanoid robots, offering a valuable benchmark and resource for future research and development in this domain. The network architecture incorporates multi-modal feature fusion and temporal information integration to ensure robust perception. Overall, Humanoid Occupancy delivers effective environmental perception for humanoid robots and establishes a technical foundation for standardizing universal visual modules, paving the way for the widespread deployment of humanoid robots in complex real-world scenarios.

Dual-Stream Global-Local Feature Collaborative Representation Network for Scene Classification of Mining Area

Authors:Shuqi Fan, Haoyi Wang, Xianju Li
Date:2025-07-27 10:45:58

Scene classification of mining areas provides accurate foundational data for geological environment monitoring and resource development planning. This study fuses multi-source data to construct a multi-modal mine land cover scene classification dataset. A significant challenge in mining area classification lies in the complex spatial layout and multi-scale characteristics. By extracting global and local features, it becomes possible to comprehensively reflect the spatial distribution, thereby enabling a more accurate capture of the holistic characteristics of mining scenes. We propose a dual-branch fusion model utilizing collaborative representation to decompose global features into a set of key semantic vectors. This model comprises three key components: (1) Multi-scale Global Transformer Branch: It leverages adjacent large-scale features to generate global channel attention features for small-scale features, effectively capturing the multi-scale feature relationships. (2) Local Enhancement Collaborative Representation Branch: It refines the attention weights by leveraging local features and reconstructed key semantic sets, ensuring that the local context and detailed characteristics of the mining area are effectively integrated. This enhances the model's sensitivity to fine-grained spatial variations. (3) Dual-Branch Deep Feature Fusion Module: It fuses the complementary features of the two branches to incorporate more scene information. This fusion strengthens the model's ability to distinguish and classify complex mining landscapes. Finally, this study employs multi-loss computation to ensure a balanced integration of the modules. The overall accuracy of this model is 83.63%, which outperforms other comparative models. Additionally, it achieves the best performance across all other evaluation metrics.

RESCUE: Crowd Evacuation Simulation via Controlling SDM-United Characters

Authors:Xiaolin Liu, Tianyi Zhou, Hongbo Kang, Jian Ma, Ziwen Wang, Jing Huang, Wenguo Weng, Yu-Kun Lai, Kun Li
Date:2025-07-27 03:50:18

Crowd evacuation simulation is critical for enhancing public safety and is in demand for realistic virtual environments. Current mainstream evacuation models overlook the complex human behaviors that occur during evacuation, such as pedestrian collisions, interpersonal interactions, and variations in behavior influenced by terrain types or individual body shapes. This results in a failure to accurately simulate how people escape in the real world. In this paper, aligned with the sensory-decision-motor (SDM) flow of the human brain, we propose a real-time 3D crowd evacuation simulation framework that integrates a 3D-adaptive SFM (Social Force Model) Decision Mechanism and a Personalized Gait Control Motor. This framework allows multiple agents to move in parallel and is suitable for various scenarios, with dynamic crowd awareness. Additionally, we introduce Part-level Force Visualization to assist in evacuation analysis. Experimental results demonstrate that our framework supports dynamic trajectory planning and personalized behavior for each agent throughout the evacuation process, and is compatible with uneven terrain. Visually, our method generates evacuation results that are more realistic and plausible, providing enhanced insights for crowd simulation. The code is available at http://cic.tju.edu.cn/faculty/likun/projects/RESCUE.

Robot Excavation and Manipulation of Geometrically Cohesive Granular Media

Authors:Laura Treers, Daniel Soto, Joonha Hwang, Michael A. D. Goodisman, Daniel I. Goldman
Date:2025-07-26 16:34:30

Construction throughout history typically assumes that its blueprints and building blocks are pre-determined. However, recent work suggests that alternative approaches can enable new paradigms for structure formation. Aleatory architectures, or those which rely on the properties of their granular building blocks rather than pre-planned design or computation, have thus far relied on human intervention for their creation. We imagine that robotic swarms could be valuable to create such aleatory structures by manipulating and forming structures from entangled granular materials. To discover principles by which robotic systems can effectively manipulate soft matter, we develop a robophysical model for interaction with geometrically cohesive granular media composed of u-shape particles. This robotic platform uses environmental signals to autonomously coordinate excavation, transport, and deposition of material. We test the effect of substrate initial conditions by characterizing robot performance in two different material compaction states and observe as much as a 75% change in transported mass depending on initial substrate compressive loading. These discrepancies suggest the functional role that material properties such as packing and cohesion/entanglement play in excavation and construction. To better understand these material properties, we develop an apparatus for tensile testing of the geometrically cohesive substrates, which reveals how entangled material strength responds strongly to initial compressive loading. These results explain the variation observed in robotic performance and point to future directions for better understanding robotic interaction mechanics with entangled materials.

CLASP: General-Purpose Clothes Manipulation with Semantic Keypoints

Authors:Yuhong Deng, Chao Tang, Cunjun Yu, Linfeng Li, David Hsu
Date:2025-07-26 15:43:25

Clothes manipulation, such as folding or hanging, is a critical capability for home service robots. Despite recent advances, most existing methods remain limited to specific tasks and clothes types, due to the complex, high-dimensional geometry of clothes. This paper presents CLothes mAnipulation with Semantic keyPoints (CLASP), which aims at general-purpose clothes manipulation over different clothes types (T-shirts, shorts, skirts, long dresses, ...) as well as different tasks (folding, flattening, hanging, ...). The core idea of CLASP is semantic keypoints -- e.g., "left sleeve", "right shoulder" -- a sparse spatial-semantic representation that is salient for both perception and action. Semantic keypoints of clothes can be reliably extracted from RGB-D images and provide an effective intermediate representation of clothes manipulation policies. CLASP uses semantic keypoints to bridge high-level task planning and low-level action execution. At the high level, it exploits vision language models (VLMs) to predict task plans over the semantic keypoints. At the low level, it executes the plans with the help of a simple pre-built manipulation skill library. Extensive simulation experiments show that CLASP outperforms state-of-the-art baseline methods on multiple tasks across diverse clothes types, demonstrating strong performance and generalization. Further experiments with a Franka dual-arm system on four distinct tasks -- folding, flattening, hanging, and placing -- confirm CLASP's performance on a real robot.

Predicting Locations of Cell Towers for Network Capacity Expansion

Authors:Sowmiyan Morri, Joy Bose, L Raghunatha Reddy, Sai Hareesh Anamandra
Date:2025-07-26 12:04:54

Network capacity expansion is a critical challenge for telecom operators, requiring strategic placement of new cell sites to ensure optimal coverage and performance. Traditional approaches, such as manual drive tests and static optimization, often fail to consider key real-world factors including user density, terrain features, and financial constraints. In this paper, we propose a machine learning-based framework that combines deep neural networks for signal coverage prediction with spatial clustering to recommend new tower locations in underserved areas. The system integrates geospatial, demographic, and infrastructural data, and incorporates budget-aware constraints to prioritize deployments. Operating within an iterative planning loop, the framework refines coverage estimates after each proposed installation, enabling adaptive and cost-effective expansion. While full-scale simulation was limited by data availability, the architecture is modular, robust to missing inputs, and generalizable across diverse deployment scenarios. This approach advances radio network planning by offering a scalable, data-driven alternative to manual methods.

From South to North: Leveraging Ground-Based LATs for Full-Sky CMB Delensing and Constraints on $r$

Authors:Wen-Zheng Chen, Yang Liu, Yi-Ming Wang, Hong Li
Date:2025-07-26 09:52:56

Delensing--the process of mitigating the lensing-induced B-mode contamination in cosmic microwave background (CMB) observations--will be a pivotal challenge for next-generation CMB experiments seeking to detect primordial gravitational waves (PGWs) through B-mode polarization. This process requires an accurate lensing tracer, which can be obtained either through internal reconstruction from high-resolution CMB observations or from external large-scale structure (LSS) surveys. Ground-based large-aperture telescopes (LATs) are crucial for internal reconstruction, yet existing and planned facilities are confined to the southern hemisphere, limiting effective delensing to that region. In this work, we assess the impact of introducing a northern hemisphere LAT, assumed to be situated near AliCPT (hence termed Ali-like LAT, or LATN), on delensing performance and PGW detection, using simulations. Our baseline setup includes a space-based small-aperture mission (LiteBIRD-like, SAT) and a southern LAT (SO-like, LATS). External LSS tracers, which have been shown to play an important role in delensing before the availability of ultra-sensitive polarization data, are also considered. We find that southern-hemisphere internal delensing reduces the uncertainty in r by approximately 21% compared to the no-delensing case. Adding LATN enables full-sky internal delensing, achieving a further ~19% reduction--comparable to that from including LSS tracers (~17%). Once LATN is included, the marginal benefit of LSS tracers drops to ~10%. These results highlight the significant role of LATN in advancing delensing capabilities and improving PGW constraints.

Homotopy-aware Multi-agent Navigation via Distributed Model Predictive Control

Authors:Haoze Dong, Meng Guo, Chengyi He, Zhongkui Li
Date:2025-07-26 08:26:36

Multi-agent trajectory planning requires ensuring both safety and efficiency, yet deadlocks remain a significant challenge, especially in obstacle-dense environments. Such deadlocks frequently occur when multiple agents attempt to traverse the same long and narrow corridor simultaneously. To address this, we propose a novel distributed trajectory planning framework that bridges the gap between global path and local trajectory cooperation. At the global level, a homotopy-aware optimal path planning algorithm is proposed, which fully leverages the topological structure of the environment. A reference path is chosen from distinct homotopy classes by considering both its spatial and temporal properties, leading to improved coordination among agents globally. At the local level, a model predictive control-based trajectory optimization method is used to generate dynamically feasible and collision-free trajectories. Additionally, an online replanning strategy ensures its adaptability to dynamic environments. Simulations and experiments validate the effectiveness of our approach in mitigating deadlocks. Ablation studies demonstrate that by incorporating time-aware homotopic properties into the underlying global paths, our method can significantly reduce deadlocks and improve the average success rate from 4%-13% to over 90% in randomly generated dense scenarios.

FM-LC: A Hierarchical Framework for Urban Flood Mapping by Land Cover Identification Models

Authors:Xin Hong, Longchao Da, Hua Wei
Date:2025-07-26 06:25:53

Urban flooding in arid regions poses severe risks to infrastructure and communities. Accurate, fine-scale mapping of flood extents and recovery trajectories is therefore essential for improving emergency response and resilience planning. However, arid environments often exhibit limited spectral contrast between water and adjacent surfaces, rapid hydrological dynamics, and highly heterogeneous urban land covers, which challenge traditional flood-mapping approaches. High-resolution, daily PlanetScope imagery provides the temporal and spatial detail needed. In this work, we introduce FM-LC, a hierarchical framework for Flood Mapping by Land Cover identification, for this challenging task. It follows a three-stage process. First, an initial multi-class U-Net segments imagery into water, vegetation, built area, and bare ground classes. We observe that this initial model confuses spectrally similar categories (e.g., water vs. vegetation). Second, an early check flags the class with the largest misclassified area, and a lightweight binary expert segmentation model is trained to distinguish the flagged class from the rest. Third, a Bayesian smoothing step refines boundaries and removes spurious noise by leveraging nearby pixel information. We validate the framework on the April 2024 Dubai storm event, using pre- and post-rainfall PlanetScope composites. Experimental results demonstrate average F1-score improvements of up to 29% across all land-cover classes and notably sharper flood delineations, significantly outperforming conventional single-stage U-Net baselines.

Computing optimal policies for managing inventories with noisy observations

Authors:Eugene Feinberg, Jefferson Huang, Pavlo Kasyanov, Thomas O'Neill
Date:2025-07-26 03:39:17

This paper implements the Deep Deterministic Policy Gradient (DDPG) algorithm for computing optimal policies for partially observable single-product periodic review inventory control problems with setup costs and backorders. The decision maker does not know the exact inventory level, but can obtain noise-corrupted observations of it. The goal is to minimize the expected total discounted costs incurred over a finite planning horizon. We also investigate the Gaussian version of this problem with normally distributed initial inventories, demands, and observation noise. We show that expected posterior observations of inventory levels, also called mean beliefs, provide sufficient statistics for the Gaussian problem. Moreover, they can be represented in the form of a Markov Decision Process for an inventory control system with time-dependent holding costs and demands. Thus, for the Gaussian problem, there exist (s_t,S_t)-optimal policies based on mean beliefs, and this fact explains the structure of the approximately optimal policies computed by DDPG. For the Gaussian case, we also numerically compare the performance of policies derived from DDPG to optimal policies for discretized versions of the original continuous problem.
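
To make the idea of a policy based on mean beliefs concrete, here is a minimal sketch under stated assumptions: the inventory evolves as x_{t+1} = x_t + order - demand, the observation is the inventory plus independent Gaussian noise, and the belief is a Gaussian with a mean and variance updated by the standard scalar conjugate (Kalman-style) rule. The function names, the thresholds, and the strict `<` reorder convention are illustrative and are not taken from the paper.

```python
def belief_update(mean, var, obs, obs_var):
    """Condition a Gaussian belief N(mean, var) on a noisy observation.

    Assumes obs = inventory + noise with noise ~ N(0, obs_var).
    Returns the posterior (mean, variance).
    """
    gain = var / (var + obs_var)
    return mean + gain * (obs - mean), (1.0 - gain) * var

def belief_predict(mean, var, order, demand_mean, demand_var):
    """Propagate the belief through one period: x' = x + order - demand.

    Assumes demand ~ N(demand_mean, demand_var), independent of the belief.
    """
    return mean + order - demand_mean, var + demand_var

def s_S_order(mean_belief, s, S):
    """(s, S) rule applied to the mean belief: order up to S when below s."""
    return S - mean_belief if mean_belief < s else 0.0

# One period with hypothetical numbers.
m, v = belief_update(mean=10.0, var=4.0, obs=12.0, obs_var=4.0)
q = s_S_order(m, s=15.0, S=30.0)
m_next, v_next = belief_predict(m, v, q, demand_mean=6.0, demand_var=1.0)
```

With equal prior and observation variances the posterior mean sits halfway between the prior mean and the observation, so the order quantity depends only on that single number, which is the sense in which the mean belief acts as the state.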

PhysVarMix: Physics-Informed Variational Mixture Model for Multi-Modal Trajectory Prediction

Authors:Haichuan Li, Tomi Westerlund
Date:2025-07-25 22:45:42

Accurate prediction of future agent trajectories is a critical challenge for ensuring safe and efficient autonomous navigation, particularly in complex urban environments characterized by multiple plausible future scenarios. In this paper, we present a novel hybrid approach that integrates learning-based prediction with physics-based constraints to address the multi-modality inherent in trajectory prediction. Our method employs a variational Bayesian mixture model to effectively capture the diverse range of potential future behaviors, moving beyond traditional unimodal assumptions. Unlike prior approaches that predominantly treat trajectory prediction as a data-driven regression task, our framework incorporates physical realism through sector-specific boundary conditions and Model Predictive Control (MPC)-based smoothing. These constraints ensure that predicted trajectories are not only data-consistent but also physically plausible, adhering to kinematic and dynamic principles. Furthermore, our method produces interpretable and diverse trajectory predictions, enabling enhanced downstream decision-making and planning in autonomous driving systems. We evaluate our approach on two benchmark datasets, demonstrating superior performance compared to existing methods. Comprehensive ablation studies validate the contributions of each component and highlight their synergistic impact on prediction accuracy and reliability. By balancing data-driven insights with physics-informed constraints, our approach offers a robust and scalable solution for navigating the uncertainties of real-world urban environments.

NAICS-Aware Graph Neural Networks for Large-Scale POI Co-visitation Prediction: A Multi-Modal Dataset and Methodology

Authors:Yazeed Alrubyli, Omar Alomeir, Abrar Wafa, Diána Hidvégi, Hend Alrasheed, Mohsen Bahrami
Date:2025-07-25 22:31:45

Understanding where people go after visiting one business is crucial for urban planning, retail analytics, and location-based services. However, predicting these co-visitation patterns across millions of venues remains challenging due to extreme data sparsity and the complex interplay between spatial proximity and business relationships. Traditional approaches using only geographic distance fail to capture why coffee shops attract different customer flows than fine dining restaurants, even when co-located. We introduce NAICS-aware GraphSAGE, a novel graph neural network that integrates business taxonomy knowledge through learnable embeddings to predict population-scale co-visitation patterns. Our key insight is that business semantics, captured through detailed industry codes, provide crucial signals that pure spatial models cannot explain. The approach scales to massive datasets (4.2 billion potential venue pairs) through efficient state-wise decomposition while combining spatial, temporal, and socioeconomic features in an end-to-end framework. Evaluated on our POI-Graph dataset comprising 94.9 million co-visitation records across 92,486 brands and 48 US states, our method achieves significant improvements over state-of-the-art baselines: the R-squared value increases from 0.243 to 0.625 (a 157 percent improvement), with strong gains in ranking quality (32 percent improvement in NDCG at 10).
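
The reported 157 percent figure is a relative improvement in R-squared, which can be checked with a short calculation. The helper name `relative_improvement` is an illustrative assumption; the two R-squared values are the ones quoted in the abstract.

```python
def relative_improvement(baseline, new):
    """Relative change from baseline to new, expressed in percent."""
    return (new - baseline) / baseline * 100.0

# R-squared rises from 0.243 to 0.625 according to the abstract.
gain = relative_improvement(0.243, 0.625)
print(round(gain))  # 157
```

Note that this is a relative gain; the absolute increase in R-squared is 0.382.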