planning - 2025-03-18

MoManipVLA: Transferring Vision-language-action Models for General Mobile Manipulation

Authors:Zhenyu Wu, Yuheng Zhou, Xiuwei Xu, Ziwei Wang, Haibin Yan

Date:2025-03-17 17:59:52

Mobile manipulation is the fundamental challenge for robotics to assist humans with diverse tasks and environments in everyday life. However, conventional mobile manipulation approaches often struggle to generalize across different tasks and environments because of the lack of large-scale training. In contrast, recent advances in vision-language-action (VLA) models have shown impressive generalization capabilities, but these foundation models are developed for fixed-base manipulation tasks. Therefore, we propose an efficient policy adaptation framework named MoManipVLA to transfer pre-trained VLA models of fix-base manipulation to mobile manipulation, so that high generalization ability across tasks and environments can be achieved in mobile manipulation policy. Specifically, we utilize pre-trained VLA models to generate waypoints of the end-effector with high generalization ability. We design motion planning objectives for the mobile base and the robot arm, which aim at maximizing the physical feasibility of the trajectory. Finally, we present an efficient bi-level objective optimization framework for trajectory generation, where the upper-level optimization predicts waypoints for base movement to enhance the manipulator policy space, and the lower-level optimization selects the optimal end-effector trajectory to complete the manipulation task. In this way, MoManipVLA can adjust the position of the robot base in a zero-shot manner, thus making the waypoints predicted from the fixed-base VLA models feasible. Extensive experimental results on OVMM and the real world demonstrate that MoManipVLA achieves a 4.2% higher success rate than the state-of-the-art mobile manipulation, and only requires 50 training cost for real world deployment due to the strong generalization ability in the pre-trained VLA models.

TriDF: Triplane-Accelerated Density Fields for Few-Shot Remote Sensing Novel View Synthesis

Authors:Jiaming Kang, Keyan Chen, Zhengxia Zou, Zhenwei Shi

Date:2025-03-17 16:25:39

Remote sensing novel view synthesis (NVS) offers significant potential for 3D interpretation of remote sensing scenes, with important applications in urban planning and environmental monitoring. However, remote sensing scenes frequently lack sufficient multi-view images due to acquisition constraints. While existing NVS methods tend to overfit when processing limited input views, advanced few-shot NVS methods are computationally intensive and perform sub-optimally in remote sensing scenes. This paper presents TriDF, an efficient hybrid 3D representation for fast remote sensing NVS from as few as 3 input views. Our approach decouples color and volume density information, modeling them independently to reduce the computational burden on implicit radiance fields and accelerate reconstruction. We explore the potential of the triplane representation in few-shot NVS tasks by mapping high-frequency color information onto this compact structure, and the direct optimization of feature planes significantly speeds up convergence. Volume density is modeled as continuous density fields, incorporating reference features from neighboring views through image-based rendering to compensate for limited input data. Additionally, we introduce depth-guided optimization based on point clouds, which effectively mitigates the overfitting problem in few-shot NVS. Comprehensive experiments across multiple remote sensing scenes demonstrate that our hybrid representation achieves a 30x speed increase compared to NeRF-based methods, while simultaneously improving rendering quality metrics over advanced few-shot methods (7.4% increase in PSNR, 12.2% in SSIM, and 18.7% in LPIPS). The code is publicly available at https://github.com/kanehub/TriDF

Biodiversity conservation and strategies of public awareness, case study: The natural landscape of central Tunisia

Authors:Islem Saadaoui, Christopher Robin Bryant, Hichem Rejeb, Alexandru-Ionuţ Petrişor

Date:2025-03-17 14:49:23

This research examines global issues concerning the development of mountain areas considered as territories difficult to manage. The case study area is part of the sub-region of High Alpine Steppes belonging to the Tunisian Ridge and reaching Tebessa Mountains in Algeria. The central question of this article is based on the analysis of the links between the representations produced by mountain landscapes and the construction of a border line that must meet the requirements of sustainable development. Eco-landscape determinants and the role of public authorities and population must be better defined so that the products of this space provide a better quality of life endowed with the alternatives of local and sustainable development. Our hypothesis is that the mountain areas of West Central Tunisia still have a real ecological potential little disturbed by a chimerical development, and can constitute assets for the territorial development of the area. The approach adopted by this work is a scoping audit based on the floristic richness and the monitoring of its spatiotemporal dynamics. The results of this research allowed us to draw rich conclusions; the phyto-ecology approach has shown a relative floristic richness that remains highly dependent on the climatic cycles and intervention of human action; this area must be considered as a priority of the public planning policies aimed at improving the quality of lives in these fragile zones in the context of sustainable development.

Prioritized Planning for Continuous-time Lifelong Multi-agent Pathfinding

Authors:Alvin Combrink, Sabino Francesco Roselli, Martin Fabian

Date:2025-03-17 13:52:03

Multi-agent Path Finding (MAPF) is the problem of planning collision-free movements of agents such that they get from where they are to where they need to be. Commonly, agents are located on a graph and can traverse edges. This problem has many variations and has been studied for decades. Two such variations are the continuous-time and the lifelong MAPF problems. In the continuous-time MAPF problem, edges can have non-unit lengths and agents can traverse them at any real-valued time. Additionally, agent volumes are often included. In the lifelong MAPF problem, agents must attend to a continuous stream of incoming tasks. Much work has been devoted to designing solution methods within these two areas. However, to our knowledge, the combined problem of continuous-time lifelong MAPF has yet to be addressed. This work addresses continuous-time lifelong MAPF with agent volumes by presenting the fast and sub-optimal Continuous-time Prioritized Lifelong Planner (CPLP). CPLP continuously re-prioritizes tasks, assigns agents to them, and computes agent plans using a combination of two path planners; one based on CCBS and the other on SIPP. Experimental results with up to $400$ agents on graphs with $4000$ vertices demonstrate average computation times below $20$ ms per call. In online settings where available time to compute plans is limited, CPLP ensures collision-free movement even when failing to meet these time limits. Therefore, the robustness of CPLP highlights its potential for real-world applications.

HybridGen: VLM-Guided Hybrid Planning for Scalable Data Generation of Imitation Learning

Authors:Wensheng Wang, Ning Tan

Date:2025-03-17 13:49:43

The acquisition of large-scale and diverse demonstration data are essential for improving robotic imitation learning generalization. However, generating such data for complex manipulations is challenging in real-world settings. We introduce HybridGen, an automated framework that integrates Vision-Language Model (VLM) and hybrid planning. HybridGen uses a two-stage pipeline: first, VLM to parse expert demonstrations, decomposing tasks into expert-dependent (object-centric pose transformations for precise control) and plannable segments (synthesizing diverse trajectories via path planning); second, pose transformations substantially expand the first-stage data. Crucially, HybridGen generates a large volume of training data without requiring specific data formats, making it broadly applicable to a wide range of imitation learning algorithms, a characteristic which we also demonstrate empirically across multiple algorithms. Evaluations across seven tasks and their variants demonstrate that agents trained with HybridGen achieve substantial performance and generalization gains, averaging a 5% improvement over state-of-the-art methods. Notably, in the most challenging task variants, HybridGen achieves significant improvement, reaching a 59.7% average success rate, significantly outperforming Mimicgen's 49.5%. These results demonstrating its effectiveness and practicality.

MIXPINN: Mixed-Material Simulations by Physics-Informed Neural Network

Authors:Xintian Yuan, Yunke Ao, Boqi Chen, Philipp Fuernstahl

Date:2025-03-17 12:48:29

Simulating the complex interactions between soft tissues and rigid anatomy is critical for applications in surgical training, planning, and robotic-assisted interventions. Traditional Finite Element Method (FEM)-based simulations, while accurate, are computationally expensive and impractical for real-time scenarios. Learning-based approaches have shown promise in accelerating predictions but have fallen short in modeling soft-rigid interactions effectively. We introduce MIXPINN, a physics-informed Graph Neural Network (GNN) framework for mixed-material simulations, explicitly capturing soft-rigid interactions using graph-based augmentations. Our approach integrates Virtual Nodes (VNs) and Virtual Edges (VEs) to enhance rigid body constraint satisfaction while preserving computational efficiency. By leveraging a graph-based representation of biomechanical structures, MIXPINN learns high-fidelity deformations from FEM-generated data and achieves real-time inference with sub-millimeter accuracy. We validate our method in a realistic clinical scenario, demonstrating superior performance compared to baseline GNN models and traditional FEM methods. Our results show that MIXPINN reduces computational cost by an order of magnitude while maintaining high physical accuracy, making it a viable solution for real-time surgical simulation and robotic-assisted procedures.

Optimal mixed fleet and charging infrastructure planning to electrify demand responsive feeder services with target CO2 emission constraints

Authors:Haruko Nakao, Tai-Yu Ma, Richard D. Connors, Francesco Viti

Date:2025-03-17 11:45:54

Electrifying demand-responsive transport systems need to plan the charging infrastructure carefully, considering the trade-offs of charging efficiency and charging infrastructure costs. Earlier studies assume a fully electrified fleet and overlook the planning issue in the transition period. This study addresses the joint fleet size and charging infrastructure planning for a demand-responsive feeder service under stochastic demand, given a user-defined targeted CO2 emission reduction policy. We propose a bi-level optimization model where the upper-level determines charging station configuration given stochastic demand patterns, whereas the lower-level solves a mixed fleet dial-a-ride routing problem under the CO2 emission and capacitated charging station constraints. An efficient deterministic annealing metaheuristic is proposed to solve the CO2-constrained mixed fleet routing problem. The performance of the algorithm is validated by a series of numerical test instances with up to 500 requests. We apply the model for a real-world case study in Bettembourg, Luxembourg, with different demand and customised CO2 reduction targets. The results show that the proposed method provides a flexible tool for joint charging infrastructure and fleet size planning under different levels of demand and CO2 emission reduction targets.

Vision-based automatic fruit counting with UAV

Authors:Hubert Szolc, Mateusz Wasala, Remigiusz Mietla, Kacper Iwicki, Tomasz Kryjak

Date:2025-03-17 11:36:58

The use of unmanned aerial vehicles (UAVs) for smart agriculture is becoming increasingly popular. This is evidenced by recent scientific works, as well as the various competitions organised on this topic. Therefore, in this work we present a system for automatic fruit counting using UAVs. To detect them, our solution uses a vision algorithm that processes streams from an RGB camera and a depth sensor using classical image operations. Our system also allows the planning and execution of flight trajectories, taking into account the minimisation of flight time and distance covered. We tested the proposed solution in simulation and obtained an average score of 87.27/100 points from a total of 500 missions. We also submitted it to the UAV Competition organised as part of the ICUAS 2024 conference, where we achieved an average score of 84.83/100 points, placing 6th in a field of 23 teams and advancing to the finals.

WOW: Workflow-Aware Data Movement and Task Scheduling for Dynamic Scientific Workflows

Authors:Fabian Lehmann, Jonathan Bader, Friedrich Tschirpke, Ninon De Mecquenem, Ansgar Lößer, Soeren Becker, Katarzyna Ewa Lewińska, Lauritz Thamsen, Ulf Leser

Date:2025-03-17 11:24:16

Scientific workflows process extensive data sets over clusters of independent nodes, which requires a complex stack of infrastructure components, especially a resource manager (RM) for task-to-node assignment, a distributed file system (DFS) for data exchange between tasks, and a workflow engine to control task dependencies. To enable a decoupled development and installation of these components, current architectures place intermediate data files during workflow execution independently of the future workload. In data-intensive applications, this separation results in suboptimal schedules, as tasks are often assigned to nodes lacking input data, causing network traffic and bottlenecks. This paper presents WOW, a new scheduling approach for dynamic scientific workflow systems that steers both data movement and task scheduling to reduce network congestion and overall runtime. For this, WOW creates speculative copies of intermediate files to prepare the execution of subsequently scheduled tasks. WOW supports modern workflow systems that gain flexibility through the dynamic construction of execution plans. We prototypically implemented WOW for the popular workflow engine Nextflow using Kubernetes as a resource manager. In experiments with 16 synthetic and real workflows, WOW reduced makespan in all cases, with improvement of up to 94.5% for workflow patterns and up to 53.2% for real workflows, at a moderate increase of temporary storage space. It also has favorable effects on CPU allocation and scales well with increasing cluster size.

MaskSDM with Shapley values to improve flexibility, robustness, and explainability in species distribution modeling

Authors:Robin Zbinden, Nina van Tiel, Gencer Sumbul, Chiara Vanalli, Benjamin Kellenberger, Devis Tuia

Date:2025-03-17 11:02:28

Species Distribution Models (SDMs) play a vital role in biodiversity research, conservation planning, and ecological niche modeling by predicting species distributions based on environmental conditions. The selection of predictors is crucial, strongly impacting both model accuracy and how well the predictions reflect ecological patterns. To ensure meaningful insights, input variables must be carefully chosen to match the study objectives and the ecological requirements of the target species. However, existing SDMs, including both traditional and deep learning-based approaches, often lack key capabilities for variable selection: (i) flexibility to choose relevant predictors at inference without retraining; (ii) robustness to handle missing predictor values without compromising accuracy; and (iii) explainability to interpret and accurately quantify each predictor's contribution. To overcome these limitations, we introduce MaskSDM, a novel deep learning-based SDM that enables flexible predictor selection by employing a masked training strategy. This approach allows the model to make predictions with arbitrary subsets of input variables while remaining robust to missing data. It also provides a clearer understanding of how adding or removing a given predictor affects model performance and predictions. Additionally, MaskSDM leverages Shapley values for precise predictor contribution assessments, improving upon traditional approximations. We evaluate MaskSDM on the global sPlotOpen dataset, modeling the distributions of 12,738 plant species. Our results show that MaskSDM outperforms imputation-based methods and approximates models trained on specific subsets of variables. These findings underscore MaskSDM's potential to increase the applicability and adoption of SDMs, laying the groundwork for developing foundation models in SDMs that can be readily applied to diverse ecological applications.

Mitigating Cross-Modal Distraction and Ensuring Geometric Feasibility via Affordance-Guided, Self-Consistent MLLMs for Food Preparation Task Planning

Authors:Yu-Hong Shen, Chuan-Yu Wu, Yi-Ru Yang, Yen-Ling Tai, Yi-Ting Chen

Date:2025-03-17 11:01:02

We study Multimodal Large Language Models (MLLMs) with in-context learning for food preparation task planning. In this context, we identify two key challenges: cross-modal distraction and geometric feasibility. Cross-modal distraction occurs when the inclusion of visual input degrades the reasoning performance of a MLLM. Geometric feasibility refers to the ability of MLLMs to ensure that the selected skills are physically executable in the environment. To address these issues, we adapt Chain of Thought (CoT) with Self-Consistency to mitigate reasoning loss from cross-modal distractions and use affordance predictor as skill preconditions to guide MLLM on geometric feasibility. We construct a dataset to evaluate the ability of MLLMs on quantity estimation, reachability analysis, relative positioning and collision avoidance. We conducted a detailed evaluation to identify issues among different baselines and analyze the reasons for improvement, providing insights into each approach. Our method reaches a success rate of 76.7% on the entire dataset, showing a substantial improvement over the CoT baseline at 36.7%.

InsightDrive: Insight Scene Representation for End-to-End Autonomous Driving

Authors:Ruiqi Song, Xianda Guo, Hangbin Wu, Qinggong Wei, Long Chen

Date:2025-03-17 10:52:32

Directly generating planning results from raw sensors has become increasingly prevalent due to its adaptability and robustness in complex scenarios. Scene representation, as a key module in the pipeline, has traditionally relied on conventional perception, which focus on the global scene. However, in driving scenarios, human drivers typically focus only on regions that directly impact driving, which often coincide with those required for end-to-end autonomous driving. In this paper, a novel end-to-end autonomous driving method called InsightDrive is proposed, which organizes perception by language-guided scene representation. We introduce an instance-centric scene tokenizer that transforms the surrounding environment into map- and object-aware instance tokens. Scene attention language descriptions, which highlight key regions and obstacles affecting the ego vehicle's movement, are generated by a vision-language model that leverages the cognitive reasoning capabilities of foundation models. We then align scene descriptions with visual features using the vision-language model, guiding visual attention through these descriptions to give effectively scene representation. Furthermore, we employ self-attention and cross-attention mechanisms to model the ego-agents and ego-map relationships to comprehensively build the topological relationships of the scene. Finally, based on scene understanding, we jointly perform motion prediction and planning. Extensive experiments on the widely used nuScenes benchmark demonstrate that the proposed InsightDrive achieves state-of-the-art performance in end-to-end autonomous driving. The code is available at https://github.com/songruiqi/InsightDrive

Exploring 3D Activity Reasoning and Planning: From Implicit Human Intentions to Route-Aware Planning

Authors:Xueying Jiang, Wenhao Li, Xiaoqin Zhang, Ling Shao, Shijian Lu

Date:2025-03-17 09:33:58

3D activity reasoning and planning has attracted increasing attention in human-robot interaction and embodied AI thanks to the recent advance in multimodal learning. However, most existing works share two constraints: 1) heavy reliance on explicit instructions with little reasoning on implicit user intention; 2) negligence of inter-step route planning on robot moves. To bridge the gaps, we propose 3D activity reasoning and planning, a novel 3D task that reasons the intended activities from implicit instructions and decomposes them into steps with inter-step routes and planning under the guidance of fine-grained 3D object shapes and locations from scene segmentation. We tackle the new 3D task from two perspectives. First, we construct ReasonPlan3D, a large-scale benchmark that covers diverse 3D scenes with rich implicit instructions and detailed annotations for multi-step task planning, inter-step route planning, and fine-grained segmentation. Second, we design a novel framework that introduces progressive plan generation with contextual consistency across multiple steps, as well as a scene graph that is updated dynamically for capturing critical objects and their spatial relations. Extensive experiments demonstrate the effectiveness of our benchmark and framework in reasoning activities from implicit human instructions, producing accurate stepwise task plans, and seamlessly integrating route planning for multi-step moves. The dataset and code will be released.

OptiPMB: Enhancing 3D Multi-Object Tracking with Optimized Poisson Multi-Bernoulli Filtering

Authors:Guanhua Ding, Yuxuan Xia, Runwei Guan, Qinchen Wu, Tao Huang, Weiping Ding, Jinping Sun, Guoqiang Mao

Date:2025-03-17 09:24:26

Accurate 3D multi-object tracking (MOT) is crucial for autonomous driving, as it enables robust perception, navigation, and planning in complex environments. While deep learning-based solutions have demonstrated impressive 3D MOT performance, model-based approaches remain appealing for their simplicity, interpretability, and data efficiency. Conventional model-based trackers typically rely on random vector-based Bayesian filters within the tracking-by-detection (TBD) framework but face limitations due to heuristic data association and track management schemes. In contrast, random finite set (RFS)-based Bayesian filtering handles object birth, survival, and death in a theoretically sound manner, facilitating interpretability and parameter tuning. In this paper, we present OptiPMB, a novel RFS-based 3D MOT method that employs an optimized Poisson multi-Bernoulli (PMB) filter while incorporating several key innovative designs within the TBD framework. Specifically, we propose a measurement-driven hybrid adaptive birth model for improved track initialization, employ adaptive detection probability parameters to effectively maintain tracks for occluded objects, and optimize density pruning and track extraction modules to further enhance overall tracking performance. Extensive evaluations on nuScenes and KITTI datasets show that OptiPMB achieves superior tracking accuracy compared with state-of-the-art methods, thereby establishing a new benchmark for model-based 3D MOT and offering valuable insights for future research on RFS-based trackers in autonomous driving.

Initial acquisition requirements for optical cavities in the space gravitational wave antennae DECIGO and B-DECIGO

Authors:Yuta Michimura, Koji Nagano, Kentaro Komori, Kiwamu Izumi, Takahiro Ito, Satoshi Ikari, Tomotada Akutsu, Masaki Ando, Isao Kawano, Mitsuru Musha, Shuichi Sato

Date:2025-03-17 09:15:20

DECIGO (DECi-hertz Interferometer Gravitational Wave Observatory) is a space-based gravitational wave antenna concept targeting the 0.1-10 Hz band. It consists of three spacecraft arranged in an equilateral triangle with 1,000 km sides, forming Fabry-P\'erot cavities between them. A precursor mission, B-DECIGO, is also planned, featuring a smaller 100 km triangle. Operating these cavities requires ultra-precise formation flying, where inter-mirror distance and alignment must be precisely controlled. Achieving this necessitates a sequential improvement in precision using various sensors and actuators, from the deployment of the spacecraft to laser link acquisition and ultimately to the control of the Fabry-P\'erot cavities to maintain resonance. In this paper, we derive the precision requirements at each stage and discuss the feasibility of achieving them. We show that the relative speed between cavity mirrors must be controlled at the sub-micrometer-per-second level and that relative alignment must be maintained at the sub-microradian level to obtain control signals from the Fabry-P\'erot cavities of DECIGO and B-DECIGO.

HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding

Authors:Jiahe Zhao, Ruibing Hou, Zejie Tian, Hong Chang, Shiguang Shan

Date:2025-03-17 09:10:50

We propose a new task to benchmark human-in-scene understanding for embodied agents: Human-In-Scene Question Answering (HIS-QA). Given a human motion within a 3D scene, HIS-QA requires the agent to comprehend human states and behaviors, reason about its surrounding environment, and answer human-related questions within the scene. To support this new task, we present HIS-Bench, a multimodal benchmark that systematically evaluates HIS understanding across a broad spectrum, from basic perception to commonsense reasoning and planning. Our evaluation of various vision-language models on HIS-Bench reveals significant limitations in their ability to handle HIS-QA tasks. To this end, we propose HIS-GPT, the first foundation model for HIS understanding. HIS-GPT integrates 3D scene context and human motion dynamics into large language models while incorporating specialized mechanisms to capture human-scene interactions. Extensive experiments demonstrate that HIS-GPT sets a new state-of-the-art on HIS-QA tasks. We hope this work inspires future research on human behavior analysis in 3D scenes, advancing embodied AI and world models.

Shoot-through layers in upright proton arcs unlock advantages in plan quality and range verification

Authors:Erik Engwall, Victor Mikhalev, Johan Sundström, Otte Marthin, Viktor Wase

Date:2025-03-17 08:10:32

Background and purpose: Upright proton therapy with compact delivery systems has the potential to reduce costs for treatments but could also lead to broadening of the beam penumbra. This study aims at combining upright static proton arcs with additional layers of shoot-through (ST) protons to sharpen the beam penumbra and improve plan quality for such systems. An additional advantage of the method is that it provides a straightforward approach for range verification. Methods: We examined various treatment plans for a virtual phantom: 3-beam IMPT, static arc (Arc) with/without ST (Arc+ST), and with/without collimation (+Coll). In the virtual phantom three different targets were utilized to study the effect on conformity index (CI), homogeneity index (HI), robustness and mean dose to the phantom volume. The phantom study was complemented with a head-and-neck (H&N) patient case with a similar set of plans. A range verification concept that determines residual ranges of the ST protons was studied in simulated scenarios for the H&N case. Results: The Arc+ST plans show superior CI, HI and target robustness compared to the Arc+Coll plans. For the Arc plans without ST, the collimated plans perform better than the uncollimated plans. For Arc+ST, on the other hand, collimation has little impact on CI, HI and robustness. However, a small increase in the mean dose to the phantom volume is seen without collimation. For the H&N case, similar improvements for Arc+ST can be seen with only a marginal increase of the mean dose to the patient volume. The range verification simulation shows that the method is suitable to detect range errors. Conclusions: Combining proton arcs and ST layers can enhance compact upright proton solutions by improving plan quality. It is also tailored for the inclusion of a fast and straightforward residual range verification method.

A Hierarchical Region-Based Approach for Efficient Multi-Robot Exploration

Authors:Di Meng, Tianhao Zhao, Chaoyu Xue, Jun Wu, Qiuguo Zhu

Date:2025-03-17 07:13:55

Multi-robot autonomous exploration in an unknown environment is an important application in robotics.Traditional exploration methods only use information around frontier points or viewpoints, ignoring spatial information of unknown areas. Moreover, finding the exact optimal solution for multi-robot task allocation is NP-hard, resulting in significant computational time consumption. To address these issues, we present a hierarchical multi-robot exploration framework using a new modeling method called RegionGraph. The proposed approach makes two main contributions: 1) A new modeling method for unexplored areas that preserves their spatial information across the entire space in a weighted graph called RegionGraph. 2) A hierarchical multi-robot exploration framework that decomposes the global exploration task into smaller subtasks, reducing the frequency of global planning and enabling asynchronous exploration. The proposed method is validated through both simulation and real-world experiments, demonstrating a 20% improvement in efficiency compared to existing methods.

Robust Co-Optimization of Distribution Network Hardening and Mobile Resource Scheduling with Decision-Dependent Uncertainty

Authors:Donglai Ma, Xiaoyu Cao, Bo Zeng, Chen Chen, Qiaozhu Zhai, Qing-Shan Jia, Xiaohong Guan

Date:2025-03-17 06:48:19

This paper studies the robust co-planning of proactive network hardening and mobile hydrogen energy resources (MHERs) scheduling, which is to enhance the resilience of power distribution network (PDN) against the disastrous events. A decision-dependent robust optimization model is formulated with min-max resilience constraint and discrete recourse structure, which helps achieve the load survivability target considering endogenous uncertainties. Different from the traditional model with a fixed uncertainty set, we adopt a dynamic representation that explicitly captures the endogenous uncertainties of network contingency as well as the available hydrogen storage levels of MHERs, which induces a decision-dependent uncertainty (DDU) set. Also, the multi-period adaptive routing and energy scheduling of MHERs are modeled as a mixed-integer recourse problem for further decreasing the resilience cost. Then, a nested parametric column-and-constraint generation (N-PC&CG) algorithm is customized and developed to solve this challenging formulation. By leveraging the structural property of the DDU set as well as the combination of discrete recourse decisions and the corresponding extreme points, we derive a strengthened solution scheme with nontrivial enhancement strategies to realize efficient and exact computation. Numerical results on 14-bus test system and 56-bus real-world distribution network demonstrate the resilience benefits and economical feasibility of the proposed method under different damage severity levels. Moreover, the enhanced N-PC&CG shows a superior solution capability to support prompt decisions for resilient planning with DDU models.

Epidemic Forecasting with a Hybrid Deep Learning Method Using CNN LSTM With WOA GWO Optimization: Global COVID-19 Case Study

Authors:Mousa Alizadeh, Mohammad Hossein Samaei, Azam Seilsepour, Mohammad TH Beheshti

Date:2025-03-17 04:41:26

Effective epidemic modeling is essential for managing public health crises, requiring robust methods to predict disease spread and optimize resource allocation. This study introduces a novel deep learning framework that advances time series forecasting for infectious diseases, with its application to COVID 19 data as a critical case study. Our hybrid approach integrates Convolutional Neural Networks (CNNs) and Long Short Term Memory (LSTM) models to capture spatial and temporal dynamics of disease transmission across diverse regions. The CNN extracts spatial features from raw epidemiological data, while the LSTM models temporal patterns, yielding precise and adaptable predictions. To maximize performance, we employ a hybrid optimization strategy combining the Whale Optimization Algorithm (WOA) and Gray Wolf Optimization (GWO) to fine tune hyperparameters, such as learning rates, batch sizes, and training epochs enhancing model efficiency and accuracy. Applied to COVID 19 case data from 24 countries across six continents, our method outperforms established benchmarks, including ARIMA and standalone LSTM models, with statistically significant gains in predictive accuracy (e.g., reduced RMSE). This framework demonstrates its potential as a versatile method for forecasting epidemic trends, offering insights for resource planning and decision making in both historical contexts, like the COVID 19 pandemic, and future outbreaks.

DART: Dual-level Autonomous Robotic Topology for Efficient Exploration in Unknown Environments

Authors:Qiming Wang, Yulong Gao, Yang Wang, Xiongwei Zhao, Yijiao Sun, Xiangyan Kong

Date:2025-03-17 03:34:33

Conventional algorithms in autonomous exploration face challenges due to their inability to accurately and efficiently identify the spatial distribution of convex regions in the real-time map. These methods often prioritize navigation toward the nearest or information-rich frontiers -- the boundaries between known and unknown areas -- resulting in incomplete convex region exploration and requiring excessive backtracking to revisit these missed areas. To address these limitations, this paper introduces an innovative dual-level topological analysis approach. First, we introduce a Low-level Topological Graph (LTG), generated through uniform sampling of the original map data, which captures essential geometric and connectivity details. Next, the LTG is transformed into a High-level Topological Graph (HTG), representing the spatial layout and exploration completeness of convex regions, prioritizing the exploration of convex regions that are not fully explored and minimizing unnecessary backtracking. Finally, an novel Local Artificial Potential Field (LAPF) method is employed for motion control, replacing conventional path planning and boosting overall efficiency. Experimental results highlight the effectiveness of our approach. Simulation tests reveal that our framework significantly reduces exploration time and travel distance, outperforming existing methods in both speed and efficiency. Ablation studies confirm the critical role of each framework component. Real-world tests demonstrate the robustness of our method in environments with poor mapping quality, surpassing other approaches in adaptability to mapping inaccuracies and inaccessible areas.

Navigating Heat Exposure: Simulation of Route Planning Based on Visual Language Model Agents

Authors:Haoran Ma, Kaihan Zhang, Jiannan Cai

Date:2025-03-17 01:49:46

Heat exposure significantly influences pedestrian routing behaviors. Existing methods such as agent-based modeling (ABM) and empirical measurements fail to account for individual physiological variations and environmental perception mechanisms under thermal stress. This results in a lack of human-centred, heat-adaptive routing suggestions. To address these limitations, we propose a novel Vision Language Model (VLM)-driven Persona-Perception-Planning-Memory (PPPM) framework that integrating street view imagery and urban network topology to simulate heat-adaptive pedestrian routing. Through structured prompt engineering on Gemini-2.0 model, eight distinct heat-sensitive personas were created to model mobility behaviors during heat exposure, with empirical validation through questionnaire survey. Results demonstrate that simulation outputs effectively capture inter-persona variations, achieving high significant congruence with observed route preferences and highlighting differences in the factors driving agents decisions. Our framework is highly cost-effective, with simulations costing 0.006USD and taking 47.81s per route. This Artificial Intelligence-Generated Content (AIGC) methodology advances urban climate adaptation research by enabling high-resolution simulation of thermal-responsive mobility patterns, providing actionable insights for climate-resilient urban planning.

AI Agents: Evolution, Architecture, and Real-World Applications

Authors:Naveen Krishnan

Date:2025-03-16 23:07:48

This paper examines the evolution, architecture, and practical applications of AI agents from their early, rule-based incarnations to modern sophisticated systems that integrate large language models with dedicated modules for perception, planning, and tool use. Emphasizing both theoretical foundations and real-world deployments, the paper reviews key agent paradigms, discusses limitations of current evaluation benchmarks, and proposes a holistic evaluation framework that balances task effectiveness, efficiency, robustness, and safety. Applications across enterprise, personal assistance, and specialized domains are analyzed, with insights into future research directions for more resilient and adaptive AI agent systems.

Automated Planning for Optimal Data Pipeline Instantiation

Authors:Leonardo Rosa Amado, Adriano Vogel, Dalvan Griebler, Gabriel Paludo Licks, Eric Simon, Felipe Meneguzzi

Date:2025-03-16 19:43:12

Data pipeline frameworks provide abstractions for implementing sequences of data-intensive transformation operators, automating the deployment and execution of such transformations in a cluster. Deploying a data pipeline, however, requires computing resources to be allocated in a data center, ideally minimizing the overhead for communicating data and executing operators in the pipeline while considering each operator's execution requirements. In this paper, we model the problem of optimal data pipeline deployment as planning with action costs, where we propose heuristics aiming to minimize total execution time. Experimental results indicate that the heuristics can outperform the baseline deployment and that a heuristic based on connections outperforms other strategies.

VISO-Grasp: Vision-Language Informed Spatial Object-centric 6-DoF Active View Planning and Grasping in Clutter and Invisibility

Authors:Yitian Shi, Di Wen, Guanqi Chen, Edgar Welte, Sheng Liu, Kunyu Peng, Rainer Stiefelhagen, Rania Rayyes

Date:2025-03-16 18:46:54

We propose VISO-Grasp, a novel vision-language-informed system designed to systematically address visibility constraints for grasping in severely occluded environments. By leveraging Foundation Models (FMs) for spatial reasoning and active view planning, our framework constructs and updates an instance-centric representation of spatial relationships, enhancing grasp success under challenging occlusions. Furthermore, this representation facilitates active Next-Best-View (NBV) planning and optimizes sequential grasping strategies when direct grasping is infeasible. Additionally, we introduce a multi-view uncertainty-driven grasp fusion mechanism that refines grasp confidence and directional uncertainty in real-time, ensuring robust and stable grasp execution. Extensive real-world experiments demonstrate that VISO-Grasp achieves a success rate of $87.5\%$ in target-oriented grasping with the fewest grasp attempts outperforming baselines. To the best of our knowledge, VISO-Grasp is the first unified framework integrating FMs into target-aware active view planning and 6-DoF grasping in environments with severe occlusions and entire invisibility constraints.

EmoBipedNav: Emotion-aware Social Navigation for Bipedal Robots with Deep Reinforcement Learning

Authors:Wei Zhu, Abirath Raju, Abdulaziz Shamsah, Anqi Wu, Seth Hutchinson, Ye Zhao

Date:2025-03-16 15:11:57

This study presents an emotion-aware navigation framework -- EmoBipedNav -- using deep reinforcement learning (DRL) for bipedal robots walking in socially interactive environments. The inherent locomotion constraints of bipedal robots challenge their safe maneuvering capabilities in dynamic environments. When combined with the intricacies of social environments, including pedestrian interactions and social cues, such as emotions, these challenges become even more pronounced. To address these coupled problems, we propose a two-stage pipeline that considers both bipedal locomotion constraints and complex social environments. Specifically, social navigation scenarios are represented using sequential LiDAR grid maps (LGMs), from which we extract latent features, including collision regions, emotion-related discomfort zones, social interactions, and the spatio-temporal dynamics of evolving environments. The extracted features are directly mapped to the actions of reduced-order models (ROMs) through a DRL architecture. Furthermore, the proposed framework incorporates full-order dynamics and locomotion constraints during training, effectively accounting for tracking errors and restrictions of the locomotion controller while planning the trajectory with ROMs. Comprehensive experiments demonstrate that our approach exceeds both model-based planners and DRL-based baselines. The hardware videos and open-source code are available at https://gatech-lidar.github.io/emobipednav.github.io/.

Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills

Authors:Haoqi Yuan, Yu Bai, Yuhui Fu, Bohan Zhou, Yicheng Feng, Xinrun Xu, Yi Zhan, Börje F. Karlsson, Zongqing Lu

Date:2025-03-16 14:53:53

Building autonomous robotic agents capable of achieving human-level performance in real-world embodied tasks is an ultimate goal in humanoid robot research. Recent advances have made significant progress in high-level cognition with Foundation Models (FMs) and low-level skill development for humanoid robots. However, directly combining these components often results in poor robustness and efficiency due to compounding errors in long-horizon tasks and the varied latency of different modules. We introduce Being-0, a hierarchical agent framework that integrates an FM with a modular skill library. The FM handles high-level cognitive tasks such as instruction understanding, task planning, and reasoning, while the skill library provides stable locomotion and dexterous manipulation for low-level control. To bridge the gap between these levels, we propose a novel Connector module, powered by a lightweight vision-language model (VLM). The Connector enhances the FM's embodied capabilities by translating language-based plans into actionable skill commands and dynamically coordinating locomotion and manipulation to improve task success. With all components, except the FM, deployable on low-cost onboard computation devices, Being-0 achieves efficient, real-time performance on a full-sized humanoid robot equipped with dexterous hands and active vision. Extensive experiments in large indoor environments demonstrate Being-0's effectiveness in solving complex, long-horizon tasks that require challenging navigation and manipulation subtasks. For further details and videos, visit https://beingbeyond.github.io/being-0.

Enhanced Approximation Algorithms for the Capacitated Location Routing Problem

Authors:Jingyang Zhao, Mingyu Xiao, Shunwang Wang

Date:2025-03-16 13:37:58

The Capacitated Location Routing Problem is an important planning and routing problem in logistics, which generalizes the capacitated vehicle routing problem and the uncapacitated facility location problem. In this problem, we are given a set of depots and a set of customers where each depot has an opening cost and each customer has a demand. The goal is to open some depots and route capacitated vehicles from the opened depots to satisfy all customers' demand, while minimizing the total cost. In this paper, we propose a $4.169$-approximation algorithm for this problem, improving the best-known $4.38$-approximation ratio. Moreover, if the demand of each customer is allowed to be delivered by multiple tours, we propose a more refined $4.091$-approximation algorithm. Experimental study on benchmark instances shows that the quality of our computed solutions is better than that of the previous algorithm and is also much closer to optimality than the provable approximation factor.

Iterative Motion Planning in Multi-agent Systems with Opportunistic Communication under Disturbance

Authors:Neelanga Thelasingha, Agung Julius, James Humann, James Dotterweich

Date:2025-03-16 11:20:47

In complex multi-agent systems involving heterogeneous teams, uncertainty arises from numerous sources like environmental disturbances, model inaccuracies, and changing tasks. This causes planned trajectories to become infeasible, requiring replanning. Further, different communication architectures used in multi-agent systems give rise to asymmetric knowledge of planned trajectories across the agents. In such systems, replanning must be done in a communication-aware fashion. This paper establishes the conditions for synchronization and feasibility in epistemic planning scenarios introduced by opportunistic communication architectures. We also establish conditions on task satisfaction based on quantified recoverability of disturbances in an iterative planning scheme. We further validate these theoretical results experimentally in a UAV--UGV task assignment problem.

Investigation of the semileptonic decays $Ξ^{(')}_{b}\rightarrow Ξ^{(')}_{c}{\ell}\barν_{\ell}$

Authors:Z. Neishabouri, K. Azizi

Date:2025-03-16 07:33:40

We study the semileptonic decays of $\Xi^{(')}_{b}\rightarrow\Xi^{(')}_{c}{\ell}\bar\nu_{\ell}$ in all lepton channels. To do this, we first obtain the form factors defining these decay modes within the framework of QCD sum rules. Then, using the transferred momentum squared-dependent form factors, we compute the decay widths and branching fractions for all lepton channels and compare the results of our calculations with those obtained from other theoretical methods. We also estimate the branching ratios and the ratio of branching fractions at different leptonic channels to provide useful information for future experiments may be planned at different Colliders. Such comparison will provide valuable information about the consistency/inconsistency of the SM theory predictions with experimental data in weak semileptonic single heavy baryon decays.