planning - 2025-09-09

Deep Reactive Policy: Learning Reactive Manipulator Motion Planning for Dynamic Environments

Authors:Jiahui Yang, Jason Jingzhou Liu, Yulong Li, Youssef Khaky, Kenneth Shaw, Deepak Pathak

Date:2025-09-08 17:59:35

Generating collision-free motion in dynamic, partially observable environments is a fundamental challenge for robotic manipulators. Classical motion planners can compute globally optimal trajectories but require full environment knowledge and are typically too slow for dynamic scenes. Neural motion policies offer a promising alternative by operating in closed-loop directly on raw sensory inputs but often struggle to generalize in complex or dynamic settings. We propose Deep Reactive Policy (DRP), a visuo-motor neural motion policy designed for reactive motion generation in diverse dynamic environments, operating directly on point cloud sensory input. At its core is IMPACT, a transformer-based neural motion policy pretrained on 10 million generated expert trajectories across diverse simulation scenarios. We further improve IMPACT's static obstacle avoidance through iterative student-teacher finetuning. We additionally enhance the policy's dynamic obstacle avoidance at inference time using DCP-RMP, a locally reactive goal-proposal module. We evaluate DRP on challenging tasks featuring cluttered scenes, dynamic moving obstacles, and goal obstructions. DRP achieves strong generalization, outperforming prior classical and neural methods in success rate across both simulated and real-world settings. Video results and code available at https://deep-reactive-policy.com

F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions

Authors:Qi Lv, Weijie Kong, Hao Li, Jia Zeng, Zherui Qiu, Delin Qu, Haoming Song, Qizhi Chen, Xiang Deng, Jiangmiao Pang

Date:2025-09-08 17:58:30

Executing language-conditioned tasks in dynamic visual environments remains a central challenge in embodied AI. Existing Vision-Language-Action (VLA) models predominantly adopt reactive state-to-action mappings, often leading to short-sighted behaviors and poor robustness in dynamic scenes. In this paper, we introduce F1, a pretrained VLA framework that integrates visual foresight generation into the decision-making pipeline. F1 adopts a Mixture-of-Transformer architecture with dedicated modules for perception, foresight generation, and control, thereby bridging understanding, generation, and actions. At its core, F1 employs a next-scale prediction mechanism to synthesize goal-conditioned visual foresight as explicit planning targets. By forecasting plausible future visual states, F1 reformulates action generation as a foresight-guided inverse dynamics problem, enabling actions that implicitly achieve visual goals. To endow F1 with robust and generalizable capabilities, we propose a three-stage training recipe on an extensive dataset comprising over 330k trajectories across 136 diverse tasks. This training scheme enhances modular reasoning and equips the model with transferable visual foresight, which is critical for complex and dynamic environments. Extensive evaluations on real-world tasks and simulation benchmarks demonstrate that F1 consistently outperforms existing approaches, achieving substantial gains in both task success rate and generalization ability.

Safe Robust Predictive Control-based Motion Planning of Automated Surface Vessels in Inland Waterways

Authors:Sajad Ahmadi, Hossein Nejatbakhsh Esfahani, Javad Mohammadpour Velni

Date:2025-09-08 13:43:09

Deploying self-navigating surface vessels in inland waterways offers a sustainable alternative to reduce road traffic congestion and emissions. However, navigating confined waterways presents unique challenges, including narrow channels, higher traffic density, and hydrodynamic disturbances. Existing methods for autonomous vessel navigation often lack the robustness or precision required for such environments. This paper presents a new motion planning approach for Automated Surface Vessels (ASVs) using Robust Model Predictive Control (RMPC) combined with Control Barrier Functions (CBFs). By incorporating channel borders and obstacles as safety constraints within the control design framework, the proposed method ensures both collision avoidance and robust navigation on complex waterways. Simulation results demonstrate the efficacy of the proposed method in safely guiding ASVs under realistic conditions, highlighting its improved safety and adaptability compared to the state-of-the-art.

An Adaptive Coverage Control Approach for Multiple Autonomous Off-road Vehicles in Dynamic Agricultural Fields

Authors:Sajad Ahmadi, Mohammadreza Davoodi, Javad Mohammadpour Velni

Date:2025-09-08 13:38:39

This paper presents an adaptive coverage control method for a fleet of off-road and Unmanned Ground Vehicles (UGVs) operating in dynamic (time-varying) agricultural environments. Traditional coverage control approaches often assume static conditions, making them unsuitable for real-world farming scenarios where obstacles, such as moving machinery and uneven terrains, create continuous challenges. To address this, we propose a real-time path planning framework that integrates Unmanned Aerial Vehicles (UAVs) for obstacle detection and terrain assessment, allowing UGVs to dynamically adjust their coverage paths. The environment is modeled as a weighted directed graph, where the edge weights are continuously updated based on the UAV observations to reflect obstacle motion and terrain variations. The proposed approach incorporates Voronoi-based partitioning, adaptive edge weight assignment, and cost-based path optimization to enhance navigation efficiency. Simulation results demonstrate the effectiveness of the proposed method in improving path planning, reducing traversal costs, and maintaining robust coverage in the presence of dynamic obstacles and muddy terrains.
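
The weighted-directed-graph idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the four-node graph, the node names, the weights, and the "mud" update are all hypothetical, and plain Dijkstra stands in for whatever cost-based path optimization the paper actually uses.

```python
import heapq


def shortest_path_cost(graph, source, target):
    """Dijkstra over a dict-of-dicts directed graph: graph[u][v] = edge weight."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, a shorter route to u was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")  # target not reachable


# Hypothetical field graph; weights are illustrative traversal costs.
field = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"C": 1.0, "D": 5.0},
    "C": {"D": 1.0},
    "D": {},
}
cost_before = shortest_path_cost(field, "A", "D")  # route A -> B -> C -> D

# Assumed UAV observation: edge B -> C has become muddy, so its weight is raised.
field["B"]["C"] = 10.0
cost_after = shortest_path_cost(field, "A", "D")  # route A -> C -> D

print(cost_before, cost_after)
```

Re-running the search after each weight update is the simplest way to react to new observations; an incremental planner would avoid recomputing from scratch, but that is beyond this sketch.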

Online Clustering of Seafloor Imagery for Interpretation during Long-Term AUV Operations

Authors:Cailei Liang, Adrian Bodenmann, Sam Fenton, Blair Thornton

Date:2025-09-08 13:36:27

As long-endurance and seafloor-resident AUVs become more capable, there is an increasing need for extended, real-time interpretation of seafloor imagery to enable adaptive missions and optimise communication efficiency. Although offline image analysis methods are well established, they rely on access to complete datasets and human-labelled examples to manage the strong influence of environmental and operational conditions on seafloor image appearance, requirements that cannot be met in real-time settings. To address this, we introduce an online clustering framework (OCF) capable of interpreting seafloor imagery without supervision, which is designed to operate in real-time on continuous data streams in a scalable, adaptive, and self-consistent manner. The method enables the efficient review and consolidation of common patterns across the entire data history in constant time by identifying and maintaining a set of representative samples that capture the evolving feature distribution, supporting dynamic cluster merging and splitting without reprocessing the full image history. We evaluate the framework on three diverse seafloor image datasets, analysing the impact of different representative sampling strategies on both clustering accuracy and computational cost. The OCF achieves the highest average F1 score of 0.68 across the three datasets among all comparative online clustering approaches, with a standard deviation of 3% across three distinct survey trajectories, demonstrating its superior clustering capability and robustness to trajectory variation. In addition, it maintains consistently lower and bounded computational time as the data volume increases. These properties are beneficial for generating survey data summaries and supporting informative path planning in long-term, persistent autonomous marine exploration.

CogGuide: Human-Like Guidance for Zero-Shot Omni-Modal Reasoning

Authors:Zhou-Peng Shou, Zhi-Qiang You, Fang Wang, Hai-Bo Liu

Date:2025-09-08 12:57:02

Targeting the issues of "shortcuts" and insufficient contextual understanding in complex cross-modal reasoning of multimodal large models, this paper proposes a zero-shot multimodal reasoning component guided by human-like cognitive strategies centered on an "intent sketch". The component comprises a plug-and-play three-module pipeline (Intent Perceiver, Strategy Generator, and Strategy Selector) that explicitly constructs an "understand-plan-select" cognitive process. By generating and filtering "intent sketch" strategies to guide the final reasoning, it requires no parameter fine-tuning and achieves cross-model transfer solely through in-context engineering. Information-theoretic analysis shows that this process can reduce conditional entropy and improve information utilization efficiency, thereby suppressing unintended shortcut reasoning. Experiments on IntentBench, WorldSense, and Daily-Omni validate the method's generality and robust gains; compared with their respective baselines, the complete "three-module" scheme yields consistent improvements across different reasoning engines and pipeline combinations, with gains up to approximately 9.51 percentage points, demonstrating the practical value and portability of the "intent sketch" reasoning component in zero-shot scenarios.

A Robust Approach for LiDAR-Inertial Odometry Without Sensor-Specific Modeling

Authors:Meher V. R. Malladi, Tiziano Guadagnino, Luca Lobefaro, Cyrill Stachniss

Date:2025-09-08 12:04:12

Accurate odometry is a critical component in a robotic navigation stack, and subsequent modules such as planning and control often rely on an estimate of the robot's motion. Sensor-based odometry approaches should be robust across sensor types and deployable in different target domains, from solid-state LiDARs mounted on cars in urban-driving scenarios to spinning LiDARs on handheld packages used in unstructured natural environments. In this paper, we propose a robust LiDAR-inertial odometry system that does not rely on sensor-specific modeling. Sensor fusion techniques for LiDAR and inertial measurement unit (IMU) data typically integrate IMU data iteratively in a Kalman filter or use pre-integration in a factor graph framework, combined with LiDAR scan matching often exploiting some form of feature extraction. We propose an alternative strategy that only requires a simplified motion model for IMU integration and directly registers LiDAR scans in a scan-to-map approach. Our approach allows us to impose a novel regularization on the LiDAR registration, improving the overall odometry performance. We detail extensive experiments on a number of datasets covering a wide array of commonly used robotic sensors and platforms. We show that our approach works with the exact same configuration in all these scenarios, demonstrating its robustness. We have open-sourced our implementation so that the community can build further on our work and use it in their navigation stacks.

Five Blind Men and the Internet: Towards an Understanding of Internet Traffic

Authors:Ege Cem Kirci, Ayush Mishra, Laurent Vanbever

Date:2025-09-08 10:20:42

The Internet, the world's largest and most pervasive network, lacks a transparent, granular view of its traffic patterns, volumes, and growth trends, hindering the networking community's understanding of its dynamics. This paper leverages publicly available Internet Exchange Point traffic statistics to address this gap, presenting a comprehensive two-year study (2023-2024) from 472 IXPs worldwide, capturing approximately 300 Tbps of peak daily aggregate traffic by late 2024. Our analysis reveals a 49.2% global traffic increase (24.5% annualized), uncovers regionally distinct diurnal patterns and event-driven anomalies, and demonstrates stable utilization rates, reflecting predictable infrastructure scaling. By analyzing biases and confirming high self-similarity, we establish IXP traffic as a robust proxy for overall Internet growth and usage behavior. With transparent, replicable data covering 87% of the worldwide IXP port capacity, and plans to release our dataset, this study offers a verifiable foundation for long-term Internet traffic monitoring. In particular, our findings shed light on the interplay between network design and function, providing an accessible framework for researchers and operators to explore the Internet's evolving ecosystem.

A Statistical 3D Stomach Shape Model for Anatomical Analysis

Authors:Erez Posner, Ore Shtalrid, Oded Erell, Daniel Noy, Moshe Bouhnik

Date:2025-09-08 09:23:11

Realistic and parameterized 3D models of human anatomy have become invaluable in research, diagnostics, and surgical planning. However, the development of detailed models for internal organs, such as the stomach, has been limited by data availability and methodological challenges. In this paper, we propose a novel pipeline for the generation of synthetic 3D stomach models, enabling the creation of anatomically diverse morphologies informed by established studies on stomach shape variability. Using this pipeline, we construct a dataset of synthetic stomachs. Building on this dataset, we develop a 3D statistical shape model of the stomach, trained to capture natural anatomical variability in a low-dimensional shape space. The model is further refined using CT meshes derived from publicly available datasets through a semi-supervised alignment process, enhancing its ability to generalize to unseen anatomical variations. We evaluated the model on a held-out test set of real stomach CT scans, demonstrating robust generalization and fit accuracy. We make the statistical shape model along with the synthetic dataset publicly available on GitLab: https://gitlab.com/Erez.Posner/stomach_pytorch to facilitate further research. This work introduces the first statistical 3D shape model of the stomach, with applications ranging from surgical simulation and pre-operative planning to medical education and computational modeling. By combining synthetic data generation, parametric modeling, and real-world validation, our approach represents a significant advancement in organ modeling and opens new possibilities for personalized healthcare solutions.

Network-Aware Control of AGVs in an Industrial Scenario: A Simulation Study Based on ROS 2 and Gazebo

Authors:Filippo Bragato, Tullia Fontana, Marco Giordani, Malte Schellmann, Josef Eichinger, Michele Zorzi

Date:2025-09-08 08:55:44

A Networked Control System (NCS) is a paradigm where sensors, controllers, and actuators communicate over a shared network. One promising application of NCS is the control of Automated Guided Vehicles (AGVs) in the industrial environment, for example to transport goods efficiently and to autonomously follow predefined paths or routes. In this context, communication and control are tightly correlated, a paradigm referred to as Joint Communication and Control (JCC), since network issues such as delays or errors can lead to significant deviations of the AGVs from the planned trajectory. In this paper, we present a simulation framework based on Gazebo and Robot Operating System 2 (ROS 2) to simulate and visualize, respectively, the complex interaction between the control of AGVs and the underlying communication network. This framework explicitly incorporates communication metrics, such as delay and packet loss, and control metrics, especially the Mean Squared Error (MSE) between the optimal/desired and actual path of the AGV in response to driving commands. Our results shed light on the correlation between network performance, particularly Packet Reception Ratio (PRR), and the accuracy of control.
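
As a rough illustration of the two metrics named above, the sketch below computes a path MSE and a PRR. The abstract does not spell out how the MSE is defined, so the per-waypoint squared Euclidean distance used here is an assumption, as are the function names and the sample data.

```python
def path_mse(desired, actual):
    """Mean of squared Euclidean distances between matching 2D waypoints."""
    if not desired or len(desired) != len(actual):
        raise ValueError("paths must be non-empty and of equal length")
    squared = [
        (dx - ax) ** 2 + (dy - ay) ** 2
        for (dx, dy), (ax, ay) in zip(desired, actual)
    ]
    return sum(squared) / len(squared)


def packet_reception_ratio(received, sent):
    """Fraction of sent packets that were received."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return received / sent


# Hypothetical straight-line reference path and a drifting AGV track.
desired_path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
actual_path = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0), (3.0, 0.0)]

mse = path_mse(desired_path, actual_path)
prr = packet_reception_ratio(received=90, sent=100)
print(mse, prr)
```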

MAPF-HD: Multi-Agent Path Finding in High-Density Environments

Authors:Hiroya Makino, Seigo Ito

Date:2025-09-08 06:59:46

Multi-agent path finding (MAPF) involves planning efficient paths for multiple agents to move simultaneously while avoiding collisions. In typical warehouse environments, agents are often sparsely distributed along aisles. However, increasing the agent density can improve space efficiency. When the agent density is high, we must optimize the paths not only for goal-assigned agents but also for those obstructing them. This study proposes a novel MAPF framework for high-density environments (MAPF-HD). Several studies have explored MAPF in similar settings using integer linear programming (ILP). However, ILP-based methods require substantial computation time to optimize all agent paths simultaneously. Even in small grid-based environments with fewer than 100 cells, these computations can take tens to hundreds of seconds. These high computational costs render these methods impractical for large-scale applications such as automated warehouses and valet parking. To address these limitations, we introduce the phased null-agent swapping (PHANS) method. PHANS employs a heuristic approach to incrementally swap positions between agents and empty vertices. This method solves the MAPF-HD problem within seconds to tens of seconds, even in large environments containing more than 700 cells. The proposed method can potentially improve efficiency in various real-world applications such as warehouse logistics, traffic management, or crowd control. Code is available at https://github.com/ToyotaCRDL/MAPF-in-High-Density-Envs.

A data-driven discretized CS:GO simulation environment to facilitate strategic multi-agent planning research

Authors:Yunzhe Wang, Volkan Ustun, Chris McGroarty

Date:2025-09-08 06:02:59

Modern simulation environments for complex multi-agent interactions must balance high-fidelity detail with computational efficiency. We present DECOY, a novel multi-agent simulator that abstracts strategic, long-horizon planning in 3D terrains into high-level discretized simulation while preserving low-level environmental fidelity. Using Counter-Strike: Global Offensive (CS:GO) as a testbed, our framework accurately simulates gameplay using only movement decisions as tactical positioning -- without explicitly modeling low-level mechanics such as aiming and shooting. Central to our approach is a waypoint system that simplifies and discretizes continuous states and actions, paired with neural predictive and generative models trained on real CS:GO tournament data to reconstruct event outcomes. Extensive evaluations show that replays generated from human data in DECOY closely match those observed in the original game. Our publicly available simulation environment provides a valuable tool for advancing research in strategic multi-agent planning and behavior generation.

A Deep SETI Search for Technosignatures in the TRAPPIST-1 System with FAST

Authors:Guang-Yuan Song, Zhen-Zhao Tao, Bo-Lun Huang, Yan Cui, Bo Yu, Tong-Jie Zhang

Date:2025-09-08 03:23:39

The Five-hundred-meter Aperture Spherical Telescope (FAST) is the world's largest single-dish radio telescope, and the search for extraterrestrial intelligence (SETI) is one of its five key science objectives. We conducted a targeted narrowband search toward the TRAPPIST-1 system using FAST. The observations consisted of five independent L-band pointings, each with a 20-minute integration, for a total on-source time of 1.67 h. The frequency coverage spanned 1.05-1.45 GHz with a spectral resolution of ~7.5 Hz. We searched for narrowband drifting signals with Doppler drift rates within ±4 Hz/s and a signal-to-noise ratio threshold of S/N > 10 in two orthogonal linear polarizations separately. Based on the system parameters adopted in this work, we estimate a minimum detectable equivalent isotropic radiated power of 2.04 × 10^10 W, placing one of the most stringent constraints to date on persistent or high-duty-cycle narrowband transmitters in this system. No credible technosignature candidates were identified within the searched parameter space. Nevertheless, TRAPPIST-1 remains a compelling target for future SETI efforts. We plan to extend our search to other signal types, such as periodic or transient transmitters, and to carry out broader surveys of nearby exoplanetary systems with FAST.
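
The observing figures quoted above can be cross-checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the channel count treats the quoted resolution as exactly 7.5 Hz, although the abstract gives it only approximately, and the variable names are illustrative.

```python
# Values taken from the abstract above.
pointings = 5
minutes_per_pointing = 20
band_low_mhz, band_high_mhz = 1050, 1450
resolution_hz = 7.5           # quoted only approximately in the abstract
max_drift_rate_hz_per_s = 4

# Total on-source time in hours: 5 x 20 min = 100 min.
total_hours = pointings * minutes_per_pointing / 60

# Approximate number of spectral channels across the 400 MHz band.
channels = (band_high_mhz - band_low_mhz) * 1e6 / resolution_hz

# Largest frequency drift a signal at the rate limit accumulates in one pointing.
max_drift_hz = max_drift_rate_hz_per_s * minutes_per_pointing * 60

print(round(total_hours, 2), round(channels), max_drift_hz)
```

Under these assumptions a signal at the drift-rate limit moves several hundred channels during a single pointing, which is why such searches track drifting rather than fixed-frequency signals.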

A Spatio-Temporal Graph Neural Networks Approach for Predicting Silent Data Corruption inducing Circuit-Level Faults

Authors:Shaoqi Wei, Senling Wang, Hiroshi Kai, Yoshinobu Higami, Ruijun Ma, Tianming Ni, Xiaoqing Wen, Hiroshi Takahashi

Date:2025-09-08 02:23:51

Silent Data Errors (SDEs) from time-zero defects and aging degrade safety-critical systems. Functional testing detects SDE-related faults but is expensive to simulate. We present a unified spatio-temporal graph convolutional network (ST-GCN) for fast, accurate prediction of long-cycle fault impact probabilities (FIPs) in large sequential circuits, supporting quantitative risk assessment. Gate-level netlists are modeled as spatio-temporal graphs to capture topology and signal timing; dedicated spatial and temporal encoders predict multi-cycle FIPs efficiently. On ISCAS-89 benchmarks, the method reduces simulation time by more than 10x while maintaining high accuracy (mean absolute error 0.024 for 5-cycle predictions). The framework accepts features from testability metrics or fault simulation, allowing efficiency-accuracy trade-offs. A test-point selection study shows that choosing observation points by predicted FIPs improves detection of long-cycle, hard-to-detect faults. The approach scales to SoC-level test strategy optimization and fits downstream electronic design automation flows.

Minimum-Cost Synthetic Genome Planning: An Algorithmic Framework

Authors:Michail Patsakis, Ioannis Mouratidis, Ilias Georgakopoulos-Soares

Date:2025-09-07 22:46:57

As synthetic genomics scales toward the construction of increasingly larger genomes, computational strategies are needed to address technical feasibility. We introduce an algorithmic framework for the Minimum-Cost Synthetic Genome Planning problem, aiming to identify the most cost-effective strategy to assemble a target genome from a source genome through a combination of reuse, synthesis, and join operations. By comparing dynamic programming and greedy heuristic strategies under diverse cost regimes, we demonstrate how algorithmic choices influence the cost-efficiency of large-scale genome construction. In parallel, solving the Minimum-Cost Synthetic Genome Planning problem can help us better understand genome architecture and evolution. We applied our framework in case studies on viral genomes, including SARS-CoV-2, to examine how source-target genome similarity shapes construction costs. Our analyses revealed that conserved regions such as ORF1ab can be reconstructed cost-effectively from related templates, while highly variable regions such as the S (spike) gene are more reliant on DNA synthesis, highlighting the biological and economic trade-offs of genome design.
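
To make the dynamic-programming idea above concrete, here is a toy sketch over the three named operations. The cost model is hypothetical and not taken from the paper: any target block that occurs verbatim in the source can be reused for a flat `reuse_cost`, any block can be synthesized at `synth_cost_per_base`, and every junction between consecutive blocks adds `join_cost`.

```python
def min_plan_cost(target, source, reuse_cost=1.0,
                  synth_cost_per_base=1.0, join_cost=0.5):
    """Minimum cost to assemble `target` from reused or synthesized blocks."""
    n = len(target)
    best = [float("inf")] * (n + 1)  # best[i]: cheapest way to build target[:i]
    best[0] = 0.0
    for i in range(n):
        # Every block after the first must be joined to what is already built.
        join = join_cost if i > 0 else 0.0
        for j in range(i + 1, n + 1):
            block = target[i:j]
            block_cost = synth_cost_per_base * (j - i)
            if block in source:
                # Reuse is an option only when the block exists in the source.
                block_cost = min(block_cost, reuse_cost)
            candidate = best[i] + join + block_cost
            if candidate < best[j]:
                best[j] = candidate
    return best[n]


# Toy sequences: the first three bases can be reused, the rest cannot.
cost_mixed = min_plan_cost("ACGTTT", "ACGAAA")
# An identical source lets the whole target be reused as a single block.
cost_identical = min_plan_cost("ACG", "ACG")
print(cost_mixed, cost_identical)
```

In the first example the cheapest plan reuses "ACG", pays one join, and synthesizes "TTT". The substring scan makes this sketch roughly cubic in the target length, so it is suited only to short illustrative sequences.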

AI Governance in Higher Education: A course design exploring regulatory, ethical and practical considerations

Authors:Zsolt Almási, Hannah Bleher, Johannes Bleher, Rozanne Tuesday Flores, Guo Xuanyang, Paweł Pujszo, Raphaël Weuts

Date:2025-09-07 19:09:12

As artificial intelligence (AI) systems permeate critical sectors, the need for professionals who can address ethical, legal and governance challenges has become urgent. Current AI ethics education remains fragmented, often siloed by discipline and disconnected from practice. This paper synthesizes literature and regulatory developments to propose a modular, interdisciplinary curriculum that integrates technical foundations with ethics, law and policy. We highlight recurring operational failures in AI - bias, misspecified objectives, generalization errors, misuse and governance breakdowns - and link them to pedagogical strategies for teaching AI governance. Drawing on perspectives from the EU, China and international frameworks, we outline a semester plan that emphasizes integrated ethics, stakeholder engagement and experiential learning. The curriculum aims to prepare students to diagnose risks, navigate regulation and engage diverse stakeholders, fostering adaptive and ethically grounded professionals for responsible AI governance.

Box Allocation Optimization in Meal Kit Delivery

Authors:Thi Minh Thu Nguyen, Loic Genest, Alain Zemkoho

Date:2025-09-07 17:57:17

This study introduces the Box Allocation Problem (BAP), a novel optimization challenge in the $1.4 billion UK meal kit delivery market. BAP involves assigning orders across multiple production facilities to minimize daily recipe variations while adhering to capacity and eligibility constraints over a 15-day planning horizon. We formulate BAP as a mixed-integer linear programming (MILP) problem and systematically compare the performance of the COIN-OR Branch and Cut (CBC) solver with heuristic methods, including Tabu Search and Iterative Targeted Pairwise Swap. Scalability experiments on instances with up to 100,000 orders show that CBC consistently achieves optimal solutions in under two minutes, maintaining optimality even under dynamic conditions with fluctuating factory capacities and changing customer orders. By reducing day-to-day recipe discrepancies, this approach supports more accurate ingredient forecasting, decreases food waste, and improves operational efficiency across a multi-factory network. These results provide the first comprehensive solution framework for temporal allocation problems in meal kit delivery operations.

The Optical and Mechanical Design of POEMMA Balloon with Radio

Authors:Eric Mayotte, Austin Cummings, Paul Degarate, Neville DeWitt Pierrat, Johannes Eser, William Finch, Julia Burton-Heibges, Tobias Heibges, Eric Mentzell, Stephan Meyer, Conrad Shay, Benjamin Stillwell, Yoshiyuki Takizawa, Luke Wanner, Lawrence Wiencke

Date:2025-09-07 17:15:22

POEMMA Balloon with Radio (PBR) is a NASA super-pressure balloon mission building toward the proposed Probe Of Extreme Multi-Messenger Astrophysics (POEMMA) dual satellite mission. In its planned 2027 launch, PBR will study Ultra-High-Energy Cosmic Rays, Neutrinos, and High-Altitude Horizontal Airshowers from 33 km above the Earth. By operating at balloon altitudes, PBR will provide a novel vantage point to study air-shower physics while offering competitive instantaneous exposure to neutrinos from transient astrophysical phenomena. The payload's optical instrument is a 0.95 m² aperture hybrid Schmidt telescope with a 3.81 m² segmented mirror focusing light onto a Fluorescence Camera and a bi-focalized Cherenkov Camera. The payload will also feature a Radio Instrument consisting of two sinuous antennas based on the Payload for Ultrahigh Energy Observations (PUEO) low-frequency instrument. A combined gamma ray/x-ray detector and IR cloud camera round out the instrumentation package, meaning PBR will be the first multi-hybrid balloon-borne multi-messenger observatory flown. This extensive instrumentation must be combined into a radio quiet payload that satisfies the scientific needs and can operate in near vacuum at extreme temperatures, all while meeting NASA safety requirements and weighing no more than 3000 lbs (1361 kg). Accomplishing these tasks together will mark a significant step toward establishing technological readiness for the POEMMA satellite mission. We present an overview of PBR's mechanical and optical systems, additionally detailing our strategies to mitigate electromagnetic interference for the radio instrument and prepare for the harsh near-space environment.

The three-dimensional structure of population density in world cities

Authors:Gaëtan Laziou, Rémi Lemoy

Date:2025-09-07 17:13:21

A good understanding of cities is crucial to implement urban planning policies leading to social and economic sustainability and an efficient use of resources. While urban concentration has been associated with both positive and negative effects, echoing debates on compact cities, few studies have documented how density evolves with city size. We fill this gap by investigating how the population density radial structure changes across the urban hierarchy. Our results uncover strong regularities in urban settlements. In terms of density, cities can be seen as exponential cones which evolve homothetically with city population. This rather simple but universal geometric structure of cities provides a new spatial scaling law, which is an important step forward in understanding how cities work and grow. Some deviations can be observed, which mainly oppose dense cities in the developing world and sprawled cities in high-income countries, associated with high energy use per capita. This suggests that urban lifestyle in wealthiest countries has come at the price of negative impacts on environmental outcomes. This research has a broad range of applications as it provides a powerful tool to compare cities of different sizes.

SpecSwin3D: Generating Hyperspectral Imagery from Multispectral Data via Transformer Networks

Authors:Tang Sui, Songxi Yang, Qunying Huang

Date:2025-09-07 16:18:31

Multispectral and hyperspectral imagery are widely used in agriculture, environmental monitoring, and urban planning due to their complementary spatial and spectral characteristics. A fundamental trade-off persists: multispectral imagery offers high spatial but limited spectral resolution, while hyperspectral imagery provides rich spectra at lower spatial resolution. Prior hyperspectral generation approaches (e.g., pan-sharpening variants, matrix factorization, CNNs) often struggle to jointly preserve spatial detail and spectral fidelity. In response, we propose SpecSwin3D, a transformer-based model that generates hyperspectral imagery from multispectral inputs while preserving both spatial and spectral quality. Specifically, SpecSwin3D takes five multispectral bands as input and reconstructs 224 hyperspectral bands at the same spatial resolution. In addition, we observe that reconstruction errors grow for hyperspectral bands spectrally distant from the input bands. To address this, we introduce a cascade training strategy that progressively expands the spectral range to stabilize learning and improve fidelity. Moreover, we design an optimized band sequence that strategically repeats and orders the five selected multispectral bands to better capture pairwise relations within a 3D shifted-window transformer framework. Quantitatively, our model achieves a PSNR of 35.82 dB, SAM of 2.40°, and SSIM of 0.96, outperforming the baseline MHF-Net by +5.6 dB in PSNR and reducing ERGAS by more than half. Beyond reconstruction, we further demonstrate the practical value of SpecSwin3D on two downstream tasks, including land use classification and burnt area segmentation.
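
For readers unfamiliar with two of the metrics quoted above, the sketch below shows the standard definitions of the spectral angle (SAM, reported in degrees) and PSNR on plain Python lists. It is not the authors' evaluation code; the sample spectra are made up, and a peak value of 1.0 is assumed for PSNR.

```python
import math


def spectral_angle_deg(reference, estimate):
    """Angle in degrees between two spectra treated as vectors."""
    if len(reference) != len(estimate):
        raise ValueError("spectra must have the same number of bands")
    dot = sum(r * e for r, e in zip(reference, estimate))
    norm = (math.sqrt(sum(r * r for r in reference))
            * math.sqrt(sum(e * e for e in estimate)))
    if norm == 0:
        raise ValueError("spectra must be non-zero")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


def psnr_db(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB for values scaled to [0, peak]."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)


# Made-up three-band spectra: the estimate is a scaled copy of the reference.
angle = spectral_angle_deg([0.1, 0.2, 0.3], [0.2, 0.4, 0.6])
psnr = psnr_db([0.5, 0.5], [0.4, 0.6])
print(angle, psnr)
```

The spectral angle ignores overall brightness, so a uniformly scaled spectrum scores close to zero degrees, while PSNR penalizes any per-value difference.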

Hybrid A* Path Planning with Multi-Modal Motion Extension for Four-Wheel Steering Mobile Robots

Authors:Runjiao Bao, Lin Zhang, Tianwei Niu, Haoyu Yuan, Shoukun Wang

Date:2025-09-07 15:55:12

Four-wheel independent steering (4WIS) systems provide mobile robots with a rich set of motion modes, such as Ackermann steering, lateral steering, and parallel movement, offering superior maneuverability in constrained environments. However, existing path planning methods generally assume a single kinematic model and thus fail to fully exploit the multi-modal capabilities of 4WIS platforms. To address this limitation, we propose an extended Hybrid A* framework that operates in a four-dimensional state space incorporating both spatial states and motion modes. Within this framework, we design multi-modal Reeds-Shepp curves tailored to the distinct kinematic constraints of each motion mode, develop an enhanced heuristic function that accounts for mode-switching costs, and introduce a terminal connection strategy with intelligent mode selection to ensure smooth transitions between different steering patterns. The proposed planner enables seamless integration of multiple motion modalities within a single path, significantly improving flexibility and adaptability in complex environments. Results demonstrate significantly improved planning performance for 4WIS robots in complex environments.

Advancing Resource Extraction Systems in Martian Volcanic Terrain: Rover Design, Power Consumption and Hazard Analysis

Authors:Divij Gupta, Arkajit Aich

Date:2025-09-07 15:34:19

This study proposes a schematic plan for in-situ resource utilization (ISRU) in Martian volcanic terrains. The work investigates the complexity of volcanic terrains and Martian environmental hazards and suggests comprehensive engineering strategies for establishing a successful mining program in Martian volcanic regions. Slope stabilization methods, such as terracing and anchored drilling rigs, combined with terrain-adaptive rovers capable of autonomous operation on steep, unstable slopes, are suggested as feasible solutions for navigating the complex geological terrain of Martian volcanoes. The mid-range rover design proposed here for mining operations, with a mass of approximately 2.1 t, incorporates a six-wheel rocker-bogie suspension, an anchoring-enabled drilling arm, dust-mitigation solar arrays, and advanced sensing systems for hazard detection and navigation. A comparative analysis of roads versus rails for building transport infrastructure is also performed. We also examine the energy requirements of the rover under the extreme environmental conditions of Mars and suggest a combination of solar and nuclear power to meet the high energy demands of sustained operations. The results demonstrate that mission success in these environments depends on integrating mechanical resilience, environmental adaptability, and operational autonomy, enabling sustainable access to resources in one of Mars' most geologically challenging settings.

Asymmetry Vulnerability and Physical Attacks on Online Map Construction for Autonomous Driving

Authors:Yang Lou, Haibo Hu, Qun Song, Qian Xu, Yi Zhu, Rui Tan, Wei-Bin Lee, Jianping Wang

Date:2025-09-07 14:26:21

High-definition maps provide precise environmental information essential for prediction and planning in autonomous driving systems. Due to the high cost of labeling and maintenance, recent research has turned to online HD map construction using onboard sensor data, offering wider coverage and more timely updates for autonomous vehicles. However, the robustness of online map construction under adversarial conditions remains underexplored. In this paper, we present a systematic vulnerability analysis of online map construction models, which reveals that these models exhibit an inherent bias toward predicting symmetric road structures. In asymmetric scenes like forks or merges, this bias often causes the model to mistakenly predict a straight boundary that mirrors the opposite side. We demonstrate that this vulnerability persists in the real world and can be reliably triggered by obstruction or targeted interference. Leveraging this vulnerability, we propose a novel two-stage attack framework capable of manipulating online constructed maps. First, our method identifies vulnerable asymmetric scenes along the victim AV's potential route. Then, we optimize the location and pattern of camera-blinding attacks and adversarial patch attacks. Evaluations on a public AD dataset demonstrate that our attacks can degrade mapping accuracy by up to 9.9%, render up to 44% of targeted routes unreachable, and increase the rate of unsafe planned trajectories that collide with real-world road boundaries by up to 27%. These attacks are also validated on a real-world testbed vehicle. We further analyze root causes of the symmetry bias, attributing them to training data imbalance, model architecture, and map element representation. To the best of our knowledge, this study presents the first vulnerability assessment of online map construction models and introduces the first digital and physical attack against them.

Energy-Efficient Path Planning with Multi-Location Object Pickup for Mobile Robots on Uneven Terrain

Authors:Faiza Babakano, Ahmed Fahmin, Bojie Shen, Muhammad Aamir Cheema, Isma Farah Siddiqui

Date:2025-09-07 13:57:43

Autonomous Mobile Robots (AMRs) operate on battery power, making energy efficiency a critical consideration, particularly in outdoor environments where terrain variations affect energy consumption. While prior research has primarily focused on computing energy-efficient paths from a source to a destination, these approaches often overlook practical scenarios where a robot needs to pick up an object en route, an action that can significantly impact energy consumption due to changes in payload. This paper introduces the Object-Pickup Minimum Energy Path Problem (OMEPP), which addresses energy-efficient route planning for AMRs required to pick up an object from one of many possible locations and deliver it to a destination. To address OMEPP, we first introduce a baseline algorithm that employs the Z* algorithm, a variant of A* tailored for energy-efficient routing, to iteratively visit each pickup point. While this approach guarantees optimality, it suffers from high computational cost due to repeated searches at each pickup location. To mitigate this inefficiency, we propose a concurrent PCPD search that manages multiple Z* searches simultaneously across all pickup points. Central to our solution is the Payload-Constrained Path Database (PCPD), an extension of the Compressed Path Database (CPD) that incorporates payload constraints. We demonstrate that PCPD significantly reduces branching factors during search, improving overall performance. Although the concurrent PCPD search may produce slightly suboptimal solutions, extensive experiments on real-world datasets show it achieves near-optimal performance while being one to two orders of magnitude faster than the baseline algorithm.
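
The baseline idea, trying every pickup point and keeping the cheapest total, can be sketched as follows. This is an illustrative stand-in, not the paper's algorithm: Dijkstra replaces the energy-aware search, the graph is a hypothetical four-node example, and the energy model (edge length multiplied by total mass) is an assumption chosen only to make payload matter.

```python
import heapq

def min_energy(graph, source, target, mass):
    """Dijkstra over edge energies; energy = length * mass is an assumed model."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, length in graph.get(node, []):
            nd = d + length * mass
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return float("inf")

def best_pickup(graph, source, destination, pickups, robot_mass, payload_mass):
    """Evaluate each pickup: unloaded leg to the pickup, loaded leg to the destination."""
    best = None
    for p in pickups:
        total = (min_energy(graph, source, p, robot_mass)
                 + min_energy(graph, p, destination, robot_mass + payload_mass))
        if best is None or total < best[0]:
            best = (total, p)
    return best

# Hypothetical undirected graph: node -> list of (neighbor, length).
GRAPH = {
    "S": [("A", 1.0), ("B", 3.0)],
    "A": [("S", 1.0), ("D", 5.0)],
    "B": [("S", 3.0), ("D", 1.0)],
    "D": [("A", 5.0), ("B", 1.0)],
}
```

With a robot mass of 1 and a payload mass of 1, `best_pickup(GRAPH, "S", "D", ["A", "B"], 1.0, 1.0)` returns `(5.0, "B")`. Pickup A is closer to the source, but the long loaded leg from A makes it more expensive overall, which is the effect the problem formulation is meant to capture. Each candidate costs two full searches, which is the repeated-search overhead the abstract attributes to the baseline.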

Khana: A Comprehensive Indian Cuisine Dataset

Authors:Omkar Prabhu

Date:2025-09-07 10:43:29

As global interest in diverse culinary experiences grows, food image models are essential for improving food-related applications by enabling accurate food recognition, recipe suggestions, dietary tracking, and automated meal planning. Despite the abundance of food datasets, a noticeable gap remains in capturing the nuances of Indian cuisine due to its vast regional diversity, complex preparations, and the lack of comprehensive labeled datasets that cover its full breadth. Through this exploration, we uncover Khana, a new benchmark dataset for food image classification, segmentation, and retrieval of dishes from Indian cuisine. Khana fills the gap by establishing a taxonomy of Indian cuisine and offering around 131K images in the dataset spread across 80 labels, each with a resolution of 500x500 pixels. This paper describes the dataset creation process and evaluates state-of-the-art models on classification, segmentation, and retrieval as baselines. Khana bridges the gap between research and development by providing a comprehensive and challenging benchmark for researchers while also serving as a valuable resource for developers creating real-world applications that leverage the rich tapestry of Indian cuisine. Webpage: https://khana.omkar.xyz

Integrated charging scheduling for electric buses with time-of-use tariffs, peak power, V2G, battery ageing, and renewables

Authors:Louise Caustur, Penelope Hertoghe, Tai-Yu Ma, Martina Vandebroek

Date:2025-09-07 06:08:13

The rapid electrification of city bus fleets offers significant environmental benefits, including reduced greenhouse gas emissions and air pollution. However, it also introduces complex challenges in energy management and infrastructure planning for public transport operators (PTOs). This study develops a novel mixed-integer linear programming (MILP) approach to minimize daily operational costs for electric bus (EB) networks. The model integrates on-site photovoltaic (PV) generation, energy storage systems (ESS), and Vehicle-to-Grid (V2G) capabilities, while explicitly accounting for dynamic electricity tariffs, peak demand charges, and battery degradation costs. A discrete-event optimization (DEO) scheme is employed to balance computational efficiency with operational accuracy. The framework is applied to a real-world case in Brussels involving 28 articulated electric buses (EBs) and 232 trips over a 24-hour horizon. A scenario-based analysis is conducted to evaluate the impacts of the extended components. Key findings show that incorporating demand charges into the optimization reduces daily costs by 5% and decreases the share of peak power costs by 9%, underlining the importance of load management. Integrating PV and ESS leads to a total net cost reduction of up to 56%, with ESS primarily used for energy arbitrage rather than direct bus charging. V2G participation is highly sensitive to battery degradation costs and policy incentives: it can become economically viable under high tariff margins and decreased replacement costs. When all extensions are combined, the model achieves a 58% reduction in total operational expenses compared to the baseline, highlighting the substantial value of smart (dis)charging optimization tools for PTOs.
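
To see why a peak demand charge changes the preferred charging schedule, consider a toy cost function. This is only an arithmetic illustration, not the paper's MILP: the tariff values, the demand charge, the hourly time step, and both schedules are hypothetical.

```python
def daily_cost(power_kw, price_per_kwh, demand_charge_per_kw, step_hours=1.0):
    """Energy cost under time-of-use prices plus a charge on the peak power drawn."""
    energy_cost = sum(p * step_hours * c for p, c in zip(power_kw, price_per_kwh))
    peak_cost = demand_charge_per_kw * max(power_kw)
    return energy_cost + peak_cost

# Hypothetical four-hour horizon: two cheap hours followed by two expensive ones.
PRICES = [0.10, 0.10, 0.30, 0.30]
PEAKY = [0.0, 200.0, 0.0, 0.0]    # all 200 kWh delivered in one hour
FLAT = [100.0, 100.0, 0.0, 0.0]   # same 200 kWh spread over two hours
```

Both schedules deliver the same energy at the same tariff, so their energy cost is identical (about 20). With an assumed demand charge of 0.5 per kW, the peaky schedule costs about 120 in total and the flat one about 70, so an optimizer that sees the demand charge has a reason to spread the load.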

What is the Best Way to Do Something? A Discreet Tour of Discrete Optimization

Authors:Thiago Serra

Date:2025-09-07 05:41:56

In mathematical optimization, we want to find the best possible solution for a decision-making problem. Curiously, these problems are harder to solve if they have discrete decisions. Imagine that you would like to buy chocolate: you can buy no chocolate or one chocolate bar, but typically you cannot buy just half of a bar. Now imagine that you could also buy many other items, and that you need to meet nutritional needs while minimizing the grocery bill. With more options and more demands, finding the best solution becomes trickier. But since many real-world settings benefit from mathematical optimization, such as scheduling trains and flights, planning truck deliveries, and making better investment decisions, these problems are widely studied in a branch of mathematics called Operations Research (OR). Sometimes we can simply write the mathematical model and find an optimal solution with OR software, but for larger problems we may need to develop new mathematical models and even write our own algorithms. We explore both cases with a simple and well-known problem (the traveling salesperson problem), some computer programming (in Python), and software that is free for academic use (Gurobi).
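
As a small companion to the traveling salesperson problem mentioned above, here is a brute-force solver in Python. It is a sketch for tiny instances only and is not taken from the paper: it enumerates every tour rather than building a mathematical model for a solver, and the function names and coordinate-list input are assumptions.

```python
from itertools import permutations
import math

def tour_length(order, coords):
    """Length of the closed tour visiting coords in the given order."""
    n = len(order)
    return sum(math.dist(coords[order[i]], coords[order[(i + 1) % n]]) for i in range(n))

def brute_force_tsp(coords):
    """Try every tour starting at city 0 and keep the shortest."""
    best_order, best_len = None, math.inf
    for perm in permutations(range(1, len(coords))):
        order = (0,) + perm
        length = tour_length(order, coords)
        if length < best_len:
            best_order, best_len = order, length
    return best_order, best_len
```

For the four corners of a unit square, `brute_force_tsp([(0, 0), (1, 0), (1, 1), (0, 1)])` finds the perimeter tour of length 4. The number of tours grows factorially with the number of cities, so this approach stops being practical after roughly a dozen cities, which is where modeling and dedicated solvers come in.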

Beyond ATE: Multi-Criteria Design for A/B Testing

Authors:Jiachun Li, Kaining Shi, David Simchi-Levi

Date:2025-09-06 23:42:22

A/B testing is a widely adopted methodology for estimating conditional average treatment effects (CATEs) in both clinical trials and online platforms. While most existing research has focused primarily on maximizing estimation accuracy, practical applications must also account for additional objectives, most notably welfare or revenue loss. In many settings, it is critical to administer treatments that improve patient outcomes or to implement plans that generate greater revenue from customers. Within a machine learning framework, such objectives are naturally captured through the notion of cumulative regret. In this paper, we investigate the fundamental trade-off between social welfare loss and statistical accuracy in (adaptive) experiments with heterogeneous treatment effects. We establish matching upper and lower bounds for the resulting multi-objective optimization problem and employ the concept of Pareto optimality to characterize the necessary and sufficient conditions for optimal experimental designs. Beyond estimating CATEs, practitioners often aim to deploy treatment policies that maximize welfare across the entire population. We demonstrate that our Pareto-optimal adaptive design achieves optimal post-experiment welfare, irrespective of the in-experiment trade-off between accuracy and welfare. Furthermore, since clinical and commercial data are often highly sensitive, it is essential to incorporate robust privacy guarantees into any treatment-allocation mechanism. To this end, we develop differentially private algorithms that continue to achieve our established lower bounds, showing that privacy can be attained at negligible cost.

A*-PRM: A Dynamic Weight-Based Probabilistic Roadmap Algorithm

Authors:Siyuan Wang, Shuyi Zhang, Zhen Tian, Yuheng Yao, Gongsen Wang, Yu Zhao

Date:2025-09-06 12:33:10

Robot path planning is a fundamental challenge in enhancing the environmental adaptability of autonomous navigation systems. This paper presents a hybrid path planning algorithm, A*-PRM, which incorporates dynamic weights. By embedding the Manhattan distance heuristic of the A* algorithm into the random sampling process of PRM, the algorithm achieves a balanced optimization of path quality and computational efficiency. The approach uses a hierarchical sampling strategy and a dynamic connection mechanism, greatly improving adaptability to complex obstacle distributions. Experiments show that under a baseline configuration with 1,000 sampled vertices, the path length of A*-PRM is 1073.23 ± 14.8 m, 42.3% shorter than that of PRM (p < 0.01). With high-density sampling using 3,000 vertices, the path length is reduced by 0.94% (1036.61 m compared with 1046.42 m), while the increase in computational time is cut to about one tenth of the PRM increase (71% compared with 785%). These results confirm the comprehensive advantages of A*-PRM in path quality, stability, and computational efficiency. Compared with existing hybrid algorithms, the proposed method shows clear benefits, especially in narrow channels and scenarios with dynamic obstacles.
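
The high-density figures quoted above can be checked with a few lines of arithmetic. The only assumption is the reading of the abstract: 1036.61 m is taken as the A*-PRM path length and 1046.42 m as the PRM path length, and the variable names are illustrative.

```python
# Path lengths in meters at 3,000 sampled vertices, as read from the abstract.
prm_length_m = 1046.42
astar_prm_length_m = 1036.61

# Relative path-length reduction of A*-PRM versus PRM, in percent.
reduction_pct = (prm_length_m - astar_prm_length_m) / prm_length_m * 100

# Ratio of the reported increases in computation time (71% versus 785%).
time_increase_ratio = 71 / 785

print(f"path length reduction: {reduction_pct:.2f}%")
print(f"time increase ratio: {time_increase_ratio:.3f}")
```

The reduction evaluates to about 0.94% and the ratio to about 0.09, consistent with the stated "0.94 percent" and "about one tenth".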

OccVLA: Vision-Language-Action Model with Implicit 3D Occupancy Supervision

Authors:Ruixun Liu, Lingyu Kong, Derun Li, Hang Zhao

Date:2025-09-06 03:47:21

Multimodal large language models (MLLMs) have shown strong vision-language reasoning abilities but still lack robust 3D spatial understanding, which is critical for autonomous driving. This limitation stems from two key challenges: (1) the difficulty of constructing accessible yet effective 3D representations without expensive manual annotations, and (2) the loss of fine-grained spatial details in VLMs due to the absence of large-scale 3D vision-language pretraining. To address these challenges, we propose OccVLA, a novel framework that integrates 3D occupancy representations into a unified multimodal reasoning process. Unlike prior approaches that rely on explicit 3D inputs, OccVLA treats dense 3D occupancy as both a predictive output and a supervisory signal, enabling the model to learn fine-grained spatial structures directly from 2D visual inputs. The occupancy predictions are regarded as implicit reasoning processes and can be skipped during inference without performance degradation, thereby adding no extra computational overhead. OccVLA achieves state-of-the-art results on the nuScenes benchmark for trajectory planning and demonstrates superior performance on 3D visual question-answering tasks, offering a scalable, interpretable, and fully vision-based solution for autonomous driving.