Research communities have developed benchmark datasets across domains to compare the performance of algorithms and techniques. However, tracking progress in these research areas is not easy, as publications appear in different venues at the same time, and many of them claim to represent the state of the art. To address this, research communities often organise periodic competitions to evaluate the performance of various algorithms and techniques, thereby tracking advancements in the field. However, these competitions pose a significant operational burden: the organisers must manage and evaluate a large volume of submissions, and participants typically develop their solutions in diverse environments, leading to compatibility issues during the evaluation of their submissions. This paper presents an online competition system that automates the submission and evaluation process. The competition system allows organisers to manage large numbers of submissions efficiently, utilising isolated environments to evaluate them. This system has already been used successfully for several competitions, including the Grid-Based Pathfinding Competition and the League of Robot Runners competition.
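A minimal sketch of how a submission could be evaluated in an isolated environment, assuming a Docker-based setup; the image name, entry point, paths, and resource limits are illustrative placeholders, not the competition system's actual configuration.

```python
# Hypothetical sandboxed evaluation of one submission (illustrative only).
import subprocess

def evaluate_submission(submission_dir: str,
                        image: str = "competition/evaluator:latest",
                        time_limit_s: int = 600) -> dict:
    """Run one submission in an isolated container and collect its output."""
    cmd = [
        "docker", "run", "--rm",
        "--network=none",            # no internet access during evaluation
        "--cpus=2", "--memory=4g",   # uniform resource limits for fairness
        "-v", f"{submission_dir}:/workspace:ro",
        image, "/workspace/run.sh",
    ]
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=time_limit_s)
        return {"status": "ok" if proc.returncode == 0 else "error",
                "stdout": proc.stdout, "stderr": proc.stderr}
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "stdout": "", "stderr": ""}
```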
Beam dump experiments represent an effective way to probe new physics in a parameter space where new particles have feeble couplings to the Standard Model sector and masses below the GeV scale. The LUXE experiment, designed primarily to study strong-field quantum electrodynamics, can also be used as a photon beam dump experiment with a unique reach for new spin-0 particles with masses in the $10-350~\mathrm{MeV}$ range and couplings to photons in the $10^{-6}-10^{-3}~\mathrm{GeV}^{-1}$ range. This is achieved via the ``New Physics search with Optical Dump'' (NPOD) concept. While prior estimations were obtained with a simplified model of the experimental setup, in this work we present a systematic study of the new physics reach in the full, realistic experimental apparatus, including an existing detector to be used in the LUXE NPOD context. We furthermore investigate updated scenarios of LUXE's experimental plan and confirm that our results are in agreement with the original estimations of background-free operation.
Autonomous drones have gained considerable attention for applications in real-world scenarios, such as search and rescue, inspection, and delivery. As their use becomes ever more pervasive in civilian applications, failure to ensure safe operation can lead to physical damage to the system, environmental pollution, and even loss of human life. Recent work has demonstrated that motion planning techniques can effectively generate collision-free trajectories during navigation. However, while creating the motion plans, these methods do not inherently consider the safe operational region of the system, leading to potential safety constraint violations during deployment. In this paper, we propose a method that leverages run-time safety assurance in a kinodynamic motion planning scheme to satisfy the system's operational constraints. First, we use a sampling-based geometric planner to determine a high-level collision-free path within a user-defined space. Second, we design a low-level safety assurance filter that provides safety guarantees on the control input of a Linear Quadratic Regulator (LQR) designed for trajectory tracking. We demonstrate our proposed approach in a restricted 3D simulation environment using a model of the Crazyflie 2.0 drone.
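A minimal sketch of the two-layer idea described above, assuming a 1D double-integrator model per axis rather than the paper's actual Crazyflie dynamics; the velocity-limit override is a stand-in for the paper's safety assurance filter, and all gains and limits are example values.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

dt = 0.02
# 1D double integrator: state [position, velocity], input acceleration.
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q = np.diag([10.0, 1.0])
R = np.array([[0.1]])

# LQR tracking gain from the discrete algebraic Riccati equation.
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.inv(R + B.T @ P @ B) @ (B.T @ P @ A)

def safety_filter(x, u, u_max=2.0, v_max=1.0):
    """Run-time assurance: clip the input and override it if the one-step-ahead
    velocity would leave the allowed operating region."""
    u = np.clip(u, -u_max, u_max)
    v_next = (A @ x + B @ u.reshape(-1))[1]
    if abs(v_next) > v_max:               # fall back to a braking input
        u = np.array([-np.sign(x[1]) * u_max])
    return u

x, x_ref = np.array([1.0, 0.0]), np.array([0.0, 0.0])
for _ in range(200):
    u_lqr = -K @ (x - x_ref)              # nominal tracking input
    u = safety_filter(x, u_lqr)           # filtered input actually applied
    x = A @ x + B @ u
print("final state:", x)
```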
Medium- and heavy-duty (MHD) commercial vehicles contribute significantly to carbon emissions, accounting for 21\% of the total emissions in the transportation sector. To curb this, the U.S. government is increasingly focusing on achieving 100\% fleet electrification over the next decade. However, the integration of megawatt-scale charging stations designed for MHD vehicles poses challenges to the stability of secondary distribution systems. This study investigates the impact of megawatt-scale charging station loads on a benchmark IEEE 33-bus distribution system using real data from the HEVI-LOAD software for MHD electrification planning developed by Lawrence Berkeley National Laboratory (LBNL). The results reveal significant violations of per-unit (p.u.) voltage limits at various nodes of the distribution system, indicating that substantial upgrades to the distribution infrastructure will be necessary to accommodate the projected MHDEV charging loads and meet electrification targets.
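An illustrative sketch of the kind of voltage-violation screening described above, using the open-source pandapower implementation of the 33-bus test feeder; the added megawatt-scale charging load, its bus location, and the voltage band are assumptions for illustration, not the HEVI-LOAD data or the study's exact procedure.

```python
import pandapower as pp
import pandapower.networks as pn

net = pn.case33bw()                        # benchmark IEEE 33-bus feeder
pp.create_load(net, bus=17, p_mw=2.0, q_mvar=0.5,
               name="MHD charging station (illustrative)")
pp.runpp(net)                              # AC power flow

band = (net.res_bus.vm_pu < 0.95) | (net.res_bus.vm_pu > 1.05)
violations = net.res_bus[band]
print(f"{len(violations)} buses outside the 0.95-1.05 p.u. band")
print(violations[["vm_pu"]])
```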
While end-to-end autonomous driving models show promising results, their practical deployment is often hindered by large model sizes, a reliance on expensive LiDAR sensors, and computationally intensive BEV feature representations. This limits their scalability, especially for mass-market vehicles equipped only with cameras. To address these challenges, we propose PRIX (Plan from Raw Pixels), a novel and efficient end-to-end driving architecture that operates using only camera data, without an explicit BEV representation and without the need for LiDAR. PRIX leverages a visual feature extractor coupled with a generative planning head to predict safe trajectories directly from raw pixel inputs. A core component of our architecture is the Context-aware Recalibration Transformer (CaRT), a novel module designed to effectively enhance multi-level visual features for more robust planning. We demonstrate through comprehensive experiments that PRIX achieves state-of-the-art performance on the NavSim and nuScenes benchmarks, matching the capabilities of larger, multimodal diffusion planners while being significantly more efficient in terms of inference speed and model size, making it a practical solution for real-world deployment. Our work is open-source and the code will be available at https://maxiuw.github.io/prix.
The widespread adoption of electric vehicles (EVs) has significantly increased demand on both transportation and power systems, posing challenges to their stable operation. To support the growing need for EV charging, both fixed charging stations (FCSs) and mobile charging stations (MCSs) have been introduced, serving as key interfaces between the power grid and traffic network. Recognizing the importance of collaborative planning across these sectors, this paper presents a two-stage joint planning model for FCSs and MCSs, utilizing an improved alternating direction method of multipliers (ADMM) algorithm. The primary goal of the proposed model is to transform the potential negative impacts of large-scale EV integration into positive outcomes, thereby enhancing social welfare through collaboration among multiple stakeholders. In the first stage, we develop a framework for evaluating FCS locations, incorporating assessments of EV hosting capacity and voltage stability. The second stage introduces a joint planning model for FCSs and MCSs, aiming to minimize the overall social costs of the EV charging system while maintaining a reliable power supply. To solve the planning problem, we employ a combination of mixed-integer linear programming, queueing theory, and sequential quadratic programming. The improved ADMM algorithm couples the siting and sizing decisions consistently by introducing coupling constraints, and supports a distributed optimization framework that coordinates the interests of EV users, MCS operators, and distribution system operators. Additionally, a flexible capacity planning strategy that accounts for the multi-period development potential of EVCS is proposed to reduce both the complexity and the investment required for FCS construction. Finally, a case study with comparative experiments demonstrates the effectiveness of the proposed models and solution methods.
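A minimal consensus-ADMM sketch of the kind of distributed coordination described above, where two stakeholders agree on a shared siting/sizing vector under a total-capacity target; the quadratic stakeholder costs and the capacity target are stand-ins, not the paper's actual DSO, MCS-operator, or EV-user models or its improved ADMM variant.

```python
import numpy as np

n, rho, iters = 5, 1.0, 200                         # candidate sites, penalty, iterations
a = np.array([3.0, 1.0, 2.0, 4.0, 2.0])             # stakeholder 1 cost weights
b = np.array([1.0, 2.0, 1.0, 1.0, 3.0])             # stakeholder 2 cost weights
demand = 6.0                                         # total charging capacity to be sited

x = y = z = np.zeros(n)                              # local copies and consensus variable
u = v = np.zeros(n)                                  # scaled dual variables
for _ in range(iters):
    # each stakeholder minimises its own quadratic cost plus the consensus penalty
    x = (rho * (z - u)) / (a + rho)                  # argmin 0.5*a*x^2 + (rho/2)||x - z + u||^2
    y = (rho * (z - v)) / (b + rho)
    # consensus update: average, then project onto the total-capacity constraint
    z = 0.5 * (x + u + y + v)
    z += (demand - z.sum()) / n
    u += x - z
    v += y - z
print("sited capacity per candidate location:", np.round(z, 3))
```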
Multi-UAV Coverage Path Planning (mCPP) algorithms in popular commercial software typically treat a Region of Interest (RoI) only as a 2D plane, ignoring important 3D structure characteristics. This leads to incomplete 3D reconstructions, especially around occluded or vertical surfaces. In this paper, we propose a modular algorithm that can extend commercial two-dimensional path planners to facilitate terrain-aware planning by adjusting altitude and camera orientations. To demonstrate it, we extend the well-known DARP (Divide Areas for Optimal Multi-Robot Coverage Path Planning) algorithm and produce DARP-3D. We present simulation results in multiple 3D environments and a real-world flight test using DJI hardware. Compared to the baseline, our approach consistently captures improved 3D reconstructions, particularly in areas with significant vertical features. An open-source implementation of the algorithm is available at: https://github.com/konskara/TerraPlan
This correspondence presents a power-aware cognitive radar framework for the joint detection and tracking of multiple targets in a massive multiple-input multiple-output (MIMO) radar environment. Building on a previous single-target algorithm based on Partially Observable Monte Carlo Planning (POMCP), we extend it to the multi-target case by assigning each target an independent POMCP tree, enabling scalable and efficient planning. Departing from uniform power allocation, which is often suboptimal when signal-to-noise ratios (SNRs) vary across targets, our approach predicts each target's future angular position and expected received power based on its estimated range and radar cross-section (RCS). These predictions guide adaptive waveform design via a constrained optimization problem that allocates transmit energy to enhance the detectability of weaker or more distant targets while ensuring sufficient power for high-SNR targets. The reward function of the underlying partially observable Markov decision process (POMDP) is also modified to prioritize accurate spatial and power estimation. Simulations involving multiple targets with different SNRs confirm the effectiveness of our method: the proposed cognitive radar framework improves the detection probability of low-SNR targets and achieves more accurate tracking compared to approaches using uniform or orthogonal waveforms. These results demonstrate the potential of the POMCP-based framework for adaptive, efficient multi-target radar systems.
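An illustrative sketch of the adaptive power-allocation idea described above: transmit power is weighted towards targets with weaker predicted returns under a total-power budget, with a minimum share retained for strong targets. The gains, budget, and weighting rule are invented example values, not the paper's constrained waveform-design optimization.

```python
import numpy as np

def allocate_power(predicted_gain, p_total, p_min_frac=0.05):
    """predicted_gain[i] ~ expected received power per unit transmit power
    for target i (from its estimated range and RCS)."""
    g = np.asarray(predicted_gain, dtype=float)
    w = (1.0 / g) / np.sum(1.0 / g)            # inverse-gain weighting favours weak targets
    p = w * p_total
    floor = p_min_frac * p_total / len(g)      # keep a minimum share for strong targets
    p = np.maximum(p, floor)
    return p * (p_total / p.sum())             # renormalise to the power budget

gains = np.array([1.0, 0.2, 0.05])             # strong, medium, weak target
print(allocate_power(gains, p_total=1.0))      # most power goes to the weak target
```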
Detecting diverse objects within complex indoor 3D point clouds presents significant challenges for robotic perception, particularly with varied object shapes, clutter, and the co-existence of static and dynamic elements, where traditional bounding box methods falter. To address these limitations, we propose IndoorBEV, a novel mask-based Bird's-Eye View (BEV) method for indoor mobile robots. In a BEV method, the 3D scene is projected into a 2D BEV grid, which naturally handles occlusions and provides a consistent top-down view that helps distinguish static obstacles from dynamic agents. The resulting 2D BEV output is directly usable by downstream robotic tasks such as navigation, motion prediction, and planning. Our architecture utilizes an axis-compact encoder and a window-based backbone to extract rich spatial features from this BEV map. A query-based decoder head then employs learned object queries to concurrently predict object classes and instance masks in BEV space. This mask-centric formulation effectively captures the footprint of both static and dynamic objects regardless of their shape, offering a robust alternative to bounding box regression. We demonstrate the effectiveness of IndoorBEV on a custom indoor dataset featuring diverse object classes, including static objects and dynamic elements such as robots and miscellaneous items, showcasing its potential for robust indoor scene understanding.
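A minimal sketch of projecting a 3D point cloud into a 2D bird's-eye-view occupancy grid, the representation the method above builds on; the grid extent and resolution are arbitrary example values, and the paper's encoder operates on richer BEV features than plain occupancy.

```python
import numpy as np

def points_to_bev(points, x_range=(-5.0, 5.0), y_range=(-5.0, 5.0), res=0.05):
    """points: (N, 3) array of x, y, z in the robot frame -> (H, W) occupancy grid."""
    h = int((y_range[1] - y_range[0]) / res)
    w = int((x_range[1] - x_range[0]) / res)
    grid = np.zeros((h, w), dtype=np.float32)
    ix = ((points[:, 0] - x_range[0]) / res).astype(int)
    iy = ((points[:, 1] - y_range[0]) / res).astype(int)
    keep = (ix >= 0) & (ix < w) & (iy >= 0) & (iy < h)
    grid[iy[keep], ix[keep]] = 1.0           # mark cells containing any point
    return grid

cloud = np.random.uniform(-4, 4, size=(10000, 3))
bev = points_to_bev(cloud)
print(bev.shape, int(bev.sum()))
```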
The mixed-model assembly line (MMAL) is a production system used in the automobile industry to manufacture different car models on the same conveyor, offering a high degree of product customization and flexibility. However, the MMAL also poses challenges, such as finding optimal sequences of models that satisfy multiple constraints and objectives related to production performance, quality, and delivery -- including minimizing the number of color changeovers in the Paint Shop, balancing the workload and setup times on the assembly line, and meeting customer demand and delivery deadlines. We propose a multi-objective algorithm that solves the MMAL resequencing problem while considering all of these aspects simultaneously. We also present empirical results obtained from recorded event data of the production process over $4$ weeks following the deployment of our algorithm in the Saarlouis plant of Ford-Werke GmbH. We achieved an improvement in the average batch size of about $30\%$ over the old control software, translating into a $23\%$ reduction in color changeovers. Moreover, we reduced the spread of cars planned for a specific date by $10\%$, reducing the risk of delivery delays. We discuss the effectiveness and robustness of our algorithm in improving production performance and quality, as well as its trade-offs and limitations.
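A small sketch of the paint-shop quantities the resequencing algorithm above trades off: counting color changeovers and the resulting average batch size for a given sequence. The colors and sequence are toy data, and the actual multi-objective algorithm is not reproduced here.

```python
from itertools import groupby

def changeovers(sequence):
    """Number of color switches along the sequence."""
    return sum(1 for i in range(1, len(sequence)) if sequence[i] != sequence[i - 1])

def average_batch_size(sequence):
    """Mean length of runs of consecutive cars sharing the same color."""
    runs = [len(list(g)) for _, g in groupby(sequence)]
    return sum(runs) / len(runs)

seq = ["red", "red", "blue", "blue", "blue", "red", "white"]
print(changeovers(seq), average_batch_size(seq))   # 3 changeovers, average batch size 1.75
```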
Motion forecasting and planning are tasked with estimating the trajectories of traffic agents and the ego vehicle, respectively, to ensure the safety and efficiency of autonomous driving systems in dynamically changing environments. State-of-the-art methods typically adopt a one-query-one-trajectory paradigm, where each query corresponds to a unique trajectory for predicting multi-mode trajectories. While this paradigm can produce diverse motion intentions, it often falls short in modeling the intricate spatiotemporal evolution of trajectories, which can lead to collisions or suboptimal outcomes. To overcome this limitation, we propose DeMo++, a framework that decouples motion estimation into two distinct components: holistic motion intentions to capture the diverse potential directions of movement, and fine spatiotemporal states to track the agent's dynamic progress within the scene and enable a self-refinement capability. Further, we introduce a cross-scene trajectory interaction mechanism to explore the relationships between motions in adjacent scenes. This allows DeMo++ to comprehensively model both the diversity of motion intentions and the spatiotemporal evolution of each trajectory. To effectively implement this framework, we developed a hybrid model combining Attention and Mamba. This architecture leverages the strengths of both mechanisms for efficient scene information aggregation and precise trajectory state sequence modeling. Extensive experiments demonstrate that DeMo++ achieves state-of-the-art performance across various benchmarks, including motion forecasting (Argoverse 2 and nuScenes), motion planning (nuPlan), and end-to-end planning (NAVSIM).
Modern Earth science is at an inflection point. The vast, fragmented, and complex nature of Earth system data, coupled with increasingly sophisticated analytical demands, creates a significant bottleneck for rapid scientific discovery. Here we introduce EarthLink, the first AI agent designed as an interactive copilot for Earth scientists. It automates the end-to-end research workflow, from planning and code generation to multi-scenario analysis. Unlike static diagnostic tools, EarthLink can learn from user interaction, continuously refining its capabilities through a dynamic feedback loop. We validated its performance on a number of core scientific tasks in climate change research, ranging from model-observation comparisons to the diagnosis of complex phenomena. In a multi-expert evaluation, EarthLink produced scientifically sound analyses and demonstrated an analytical competency that was rated as comparable to specific aspects of a junior human researcher's workflow. Additionally, its transparent, auditable workflows and natural language interface empower scientists to shift from laborious manual execution to strategic oversight and hypothesis generation. EarthLink marks a pivotal step towards an efficient, trustworthy, and collaborative paradigm for Earth system research in an era of accelerating global change.
Tactile feedback is generally recognized to be crucial for effective interaction with the physical world. However, state-of-the-art Vision-Language-Action (VLA) models lack the ability to interpret and use tactile signals, limiting their effectiveness in contact-rich tasks. Incorporating tactile feedback into these systems is challenging due to the absence of large multi-modal datasets. We present VLA-Touch, an approach that enhances generalist robot policies with tactile sensing \emph{without fine-tuning} the base VLA. Our method introduces two key innovations: (1) a pipeline that leverages a pretrained tactile-language model that provides semantic tactile feedback for high-level task planning, and (2) a diffusion-based controller that refines VLA-generated actions with tactile signals for contact-rich manipulation. Through real-world experiments, we demonstrate that our dual-level integration of tactile feedback improves task planning efficiency while enhancing execution precision. Code is open-sourced at \href{https://github.com/jxbi1010/VLA-Touch}{this URL}.
Electric vehicles (EVs) have emerged as a pivotal solution for reducing greenhouse gas emissions, paving a pathway to net zero. As the adoption of EVs continues to grow, countries are proactively formulating systematic plans for nationwide electric vehicle charging infrastructure (EVCI) to keep pace with the accelerating shift towards EVs. This comprehensive review aims to thoroughly examine current global practices in EVCI planning and explore state-of-the-art methodologies for designing EVCI planning strategies. Despite remarkable efforts by influential players in the global EV market, such as China, the United States, and the European Union, progress in EVCI rollout has been notably slower than anticipated in the rest of the world. This delay can be attributed to three major impediments: inadequate EVCI charging services, low utilization rates of public EVCI facilities, and the non-trivial integration of EVCI into the electric grid. This review dissects the interests of the involved stakeholders, clarifying their respective roles and expectations in the context of EVCI planning. It also provides insights into level 1, 2, and 3 chargers, exploring their applications in different geographical locations for diverse EV charging patterns. Finally, a thorough review of node-based and flow-based approaches to EVCI planning is presented. The modeling of charging station placement is broadly categorized into set coverage, maximum coverage, flow-capturing, and flow-refueling location models. In conclusion, this review identifies several research gaps, including the dynamic modeling of EV charging demand and the coordination of vehicle electrification with grid decarbonization, and calls for further contributions to bridge these gaps and drive the advancement of EVCI planning.
This experience report analyses a one-year project focused on building a distributed real-time analytics system using edge computing and machine learning. The project faced critical setbacks due to a big-bang integration approach, in which all components developed by multiple geographically dispersed partners were merged at the final stage. The integration effort resulted in only six minutes of system functionality, far below the expected 40 minutes. Through root cause analysis, the study identifies technical and organisational barriers, including poor communication, lack of early integration testing, and resistance to top-down planning. It also considers psychological factors, such as a bias toward fully developed components over mockups. The paper advocates for early mock-based deployment, robust communication infrastructures, and the adoption of top-down thinking to manage complexity and reduce risk in reactive, distributed projects. These findings underscore the limitations of traditional Agile methods in such contexts and propose simulation-driven engineering and structured integration cycles as key enablers for future success.
Generative AI is disrupting computing education. Most interventions focus on teaching GenAI use rather than helping students understand how AI changes their programming process. We designed and deployed a novel comparative video reflection assignment adapting the Describe, Examine, then Articulate Learning (DEAL) framework. In an introductory software engineering course, students recorded themselves programming during their team project twice: first without, and then with, generative AI. Students then analyzed their own videos using a scaffolded set of reflection questions, including questions on their programming process and on human, internet, and AI help-seeking. We conducted a qualitative thematic analysis of the reflections, finding that students developed insights about planning, debugging, and help-seeking behaviors that transcended AI use. Students reported learning to slow down and understand before writing or generating code, recognized patterns in their problem-solving approaches, and articulated specific process improvements. Students also learned and reflected on the limits and downsides of AI, as well as strategies for using AI more critically, including better prompting and using AI to benefit their learning rather than just to complete tasks. Unexpectedly, the comparative reflection also scaffolded reflection on programming that did not involve AI, and even led students to spontaneously set future goals to adopt video and other regular reflection. This work demonstrates that structured reflection on programming session videos can develop metacognitive skills essential for programming with and without generative AI, as well as for lifelong learning in our evolving field.
In digital advertising, online platforms allocate ad impressions through real-time auctions, where advertisers typically rely on autobidding agents to optimize bids on their behalf. Unlike traditional auctions for physical goods, the value of an ad impression is uncertain and depends on the unknown click-through rate (CTR). While platforms can estimate CTRs more accurately using proprietary machine learning algorithms, these estimates and algorithms remain opaque to advertisers. This information asymmetry naturally raises the following question: how can platforms disclose information in a way that is both credible and revenue-optimal? We address this question through calibrated signaling, where each prior-free bidder receives a private signal that truthfully reflects the conditional expected CTR of the ad impression. Such signals are trustworthy and allow bidders to form unbiased value estimates, even without access to the platform's internal algorithms. We study the design of platform-optimal calibrated signaling in the context of second-price auctions. Our first main result fully characterizes the structure of the optimal calibrated signaling, which can also be computed efficiently. We show that this signaling can extract the full surplus -- or even exceed it -- depending on a specific market condition. Our second main result is an FPTAS for computing an approximately optimal calibrated signaling that satisfies an IR condition. Our main technical contributions are: a reformulation of the platform's problem as a two-stage optimization problem that involves optimal transport subject to calibration feasibility constraints on the bidders' marginal bid distributions; and a novel correlation plan that constructs the optimal distribution over second-highest bids.
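A toy simulation of the setting described above: the platform sends each bidder a calibrated signal equal to the conditional expected CTR, bidders bid their resulting expected value (truthful bidding being weakly dominant in a second-price auction), and the impression is sold to the highest bidder at the second-highest bid. The CTR distribution and values per click are invented for illustration; this is not the paper's optimal signaling scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bidders, n_auctions = 3, 10_000
value_per_click = np.array([2.0, 1.5, 1.0])     # bidders' values per click

revenue = 0.0
for _ in range(n_auctions):
    true_ctr = rng.beta(2, 8, size=n_bidders)   # platform's private CTR estimates
    signal = true_ctr                           # calibrated: signal equals conditional expected CTR
    bids = value_per_click * signal             # truthful bids given the signals
    order = np.argsort(bids)[::-1]
    revenue += bids[order[1]]                   # winner pays the second-highest bid

print("average revenue per impression:", revenue / n_auctions)
```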
With the rising volume of railroad transportation, traditional track inspection relies mainly on manual inspection and large-scale inspection equipment, which not only suffers from low inspection frequency and lagging response but also carries high risk, high cost, and a tendency to miss defects. To this end, this study designs and realizes a maintenance-free wireless railroad track monitoring system based on the LoRa module LM401. Each monitoring node consists of an STM32 microcontroller, an LM401 LoRa transceiver, a low-power ADXL362 triaxial acceleration sensor, a digital temperature sensor (LMT85), and a digital barometric pressure sensor (RSCM17100KP101). Each node collects vibration data through the SPI1 interface, periodically reads temperature and barometric pressure, and packages and sends the data to a centralized gateway within a range of 500 m using a LoRa star topology; the gateway then uploads the data in real time to a cloud server through a 4G module using the MQTT protocol. Laboratory tests and field deployments show that the system achieves an acceleration resolution of 0.01 g, reduces maintenance cost by about 70%, and improves monitoring efficiency by more than 5 times. The system provides a reliable means for intelligent rail health management; in the future, we plan to introduce RF energy harvesting to enable battery-free automatic wake-up and to extend the system to urban bridges, tunnels, environmental monitoring, and other multi-scenario applications.
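An illustrative gateway-side sketch of the data path described above: a binary node packet (acceleration, temperature, pressure) is unpacked and forwarded to the cloud broker over MQTT using the paho-mqtt client (1.x-style constructor). The packet layout, topic, and broker address are assumptions for illustration, not the system's actual protocol.

```python
import json
import struct
import paho.mqtt.client as mqtt

# Hypothetical packet layout: node id, ax/ay/az [g], temperature [degC], pressure [kPa]
PACKET_FMT = "<H3fff"

def decode_packet(payload: bytes) -> dict:
    node_id, ax, ay, az, temp_c, press_kpa = struct.unpack(PACKET_FMT, payload)
    return {"node": node_id, "accel_g": [ax, ay, az],
            "temperature_c": temp_c, "pressure_kpa": press_kpa}

client = mqtt.Client()                               # paho-mqtt 1.x constructor
client.connect("cloud.example.com", 1883)            # placeholder broker address
raw = struct.pack(PACKET_FMT, 7, 0.01, -0.02, 0.98, 24.5, 101.3)
client.publish("rail/track/vibration", json.dumps(decode_packet(raw)), qos=1)
```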
Flapping-wing drones have attracted significant attention due to their biomimetic flight. They are considered more human-friendly owing to characteristics such as low noise and flexible wings, making them suitable for human-drone interaction. However, few studies have explored practical interaction between humans and flapping-wing drones. In establishing a physical interaction system with flapping-wing drones, we can draw inspiration from falconers, who guide birds of prey to land on their arms. This interaction treats the human body as a dynamic landing platform, which can be utilized in various scenarios such as crowded or spatially constrained environments. Thus, in this study, we propose a falconry-like interaction system in which a flapping-wing drone performs a palm landing motion on a human hand. To achieve a safe approach toward humans, we design a trajectory planning method that considers both physical and psychological factors of human safety, such as the drone's velocity and its distance from the user. We use a commercial flapping platform with our implemented motion planning and conduct experiments to evaluate palm landing performance and safety. The results demonstrate that our approach enables safe and smooth hand landing interactions. To the best of our knowledge, this is the first contact-based interaction achieved between flapping-wing drones and humans.
In the context of labor shortages and rising costs, construction robots are regarded as key to revolutionizing traditional construction methods and improving efficiency and quality in the construction industry. However, traditional single-objective trajectory optimization methods struggle to meet the complex requirements of changing construction environments, in which construction robots must perform tasks efficiently and accurately. Therefore, we propose a multi-objective trajectory optimization for the robotic arm used in curtain wall installation. First, we design a robotic arm for curtain wall installation, integrating serial, parallel, and folding arm elements, while considering its physical properties and motion characteristics. In addition, this paper proposes NSGA-III-FO (NSGA-III with Focused Operator), which incorporates a focus-operator screening mechanism to accelerate convergence towards the Pareto front, thereby effectively balancing the multi-objective constraints of construction robots. The proposed algorithm is tested against NSGA-III, MOEA/D, and MSOPS-II in ten consecutive trials on the DTLZ3 and WFG3 test functions, showing significantly better convergence efficiency than the other algorithms. Finally, we conduct two sets of experiments on the designed robotic arm platform, which confirm the efficiency and practicality of the NSGA-III-FO algorithm in solving multi-objective trajectory planning problems for curtain wall installation tasks.
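A sketch of running plain NSGA-III on the DTLZ3 benchmark, the same test function used in the comparison above, assuming a recent version of the open-source pymoo library; this is the standard algorithm baseline, not the paper's NSGA-III-FO variant with the focus-operator screening mechanism.

```python
from pymoo.algorithms.moo.nsga3 import NSGA3
from pymoo.optimize import minimize
from pymoo.problems import get_problem
from pymoo.util.ref_dirs import get_reference_directions

problem = get_problem("dtlz3")                                    # 3-objective benchmark
ref_dirs = get_reference_directions("das-dennis", 3, n_partitions=12)
algorithm = NSGA3(pop_size=92, ref_dirs=ref_dirs)

res = minimize(problem, algorithm, ("n_gen", 400), seed=1, verbose=False)
print(res.F.shape)   # objective values of the Pareto-front approximation
```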
This paper proposes a transient stability-driven planning framework for the optimal sizing of resilient AC/DC hybrid microgrids (HMGs) under different types of contingencies, capturing frequency and voltage stability requirements as well as the frequency-voltage coupling dynamics of AC/DC interlinking converters (ICs). The planning model is formulated in a defender-attacker-defender (DAD) architecture, which can be further merged into two levels, i.e., upper- and lower-level problems, and then iteratively solved by an enhanced genetic algorithm with sparsity calculation and local search. For the operation stage, a novel transient stability-constrained optimal power flow (TSC-OPF) algorithm is proposed for static and transient operation of HMGs, capturing the governor dynamics and automatic voltage regulators of conventional generators as well as the droop control dynamics of inverter-based resources (IBRs) for frequency and voltage control, respectively. Furthermore, a Lyapunov optimisation approach is developed to capture the time-coupling property of energy storage (ES) and allow the TSC-OPF to be solved on an hourly basis with a second-scale resolution, achieving the co-optimisation of static and transient stability requirements. Case studies verify the effectiveness of the proposed planning framework in obtaining cost-effective investment decisions for various resources while respecting transient stability requirements under different contingencies.
Visual Language Action (VLA) models are a multi-modal class of Artificial Intelligence (AI) systems that integrate visual perception, natural language understanding, and action planning to enable agents to interpret their environment, comprehend instructions, and perform embodied tasks autonomously. Recently, significant progress has been made in advancing this field. These models are typically evaluated through task success rates, which fail to capture the quality of task execution and the model's confidence in its decisions. In this paper, we propose eight uncertainty metrics and five quality metrics specifically designed for VLA models for robotic manipulation tasks. We assess their effectiveness through a large-scale empirical study involving 908 successful task executions from three state-of-the-art VLA models across four representative robotic manipulation tasks. Human domain experts manually labeled task quality, allowing us to analyze the correlation between our proposed metrics and expert judgments. The results reveal that several metrics show moderate to strong correlation with human assessments, highlighting their utility for evaluating task quality and model confidence. Furthermore, we found that some of the metrics can discriminate high-, medium-, and low-quality executions from unsuccessful tasks, which is particularly useful when test oracles are not available. Our findings challenge the adequacy of current evaluation practices that rely solely on binary success rates and pave the way for improved real-time monitoring and adaptive enhancement of VLA-enabled robotic systems.
In recent years, the number of remote sensing satellites orbiting the Earth has grown significantly, streaming vast amounts of high-resolution visual data to support diverse applications across civil, public, and military domains. Among these applications, the generation and updating of spatial maps of the built environment have become critical due to the extensive coverage and detailed imagery provided by satellites. However, reconstructing spatial maps from satellite imagery is a complex computer vision task, requiring the creation of high-level object representations, such as primitives, to accurately capture the built environment. While the past decade has witnessed remarkable advancements in object detection and representation using visual data, primitive-based object representation remains a persistent challenge in computer vision. Consequently, high-quality spatial maps often rely on labor-intensive manual processes. This paper introduces a novel deep learning methodology leveraging Graph Convolutional Networks (GCNs) to address these challenges in building footprint reconstruction. The proposed approach enhances performance by incorporating geometric regularity into building boundaries, integrating multi-scale and multi-resolution features, and embedding Attraction Field Maps into the network. These innovations provide a scalable and precise solution for automated building footprint extraction from a single satellite image, paving the way for impactful applications in urban planning, disaster management, and large-scale spatial analysis. Our model, Decoupled-PolyGCN, outperforms existing methods by 6% in AP and 10% in AR, demonstrating its ability to deliver accurate and regularized building footprints across diverse and challenging scenarios.
Accurate characterization of modern on-chip antennas remains challenging, as current probe-station techniques offer limited angular coverage, rely on bespoke hardware, and require frequent manual alignment. This research introduces RAPTAR (Radiation Pattern Acquisition through Robotic Automation), a portable, state-of-the-art, autonomous system based on collaborative robotics. RAPTAR enables 3D radiation-pattern measurement of integrated radar modules without dedicated anechoic facilities. The system is designed to address the challenges of testing radar modules mounted in diverse real-world configurations, including vehicles, UAVs, AR/VR headsets, and biomedical devices, where traditional measurement setups are impractical. A 7-degree-of-freedom Franka cobot holds the receiver probe and performs collision-free manipulation across a hemispherical spatial domain, guided by real-time motion planning and calibrated to an RMS error below 0.9 mm. The system achieves an angular resolution of up to 2.5 degrees and integrates seamlessly with RF instrumentation for near- and far-field power measurements. Experimental scans of a 60 GHz radar module show a mean absolute error of less than 2 dB compared to full-wave electromagnetic simulation ground truth. Benchmarking against a baseline method demonstrates 36.5% lower mean absolute error, highlighting RAPTAR's accuracy and repeatability.
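A minimal sketch of generating probe positions on a hemisphere around the antenna under test at a fixed angular step, i.e., the scan geometry described above; the radius and step size are example values, and this ignores probe orientation and collision constraints handled by the actual motion planner.

```python
import numpy as np

def hemisphere_scan(radius_m=0.3, step_deg=2.5):
    """Return (N, 3) Cartesian probe positions covering the upper hemisphere."""
    poses = []
    for el in np.arange(0.0, 90.0 + step_deg, step_deg):     # elevation sweep
        for az in np.arange(0.0, 360.0, step_deg):           # azimuth sweep
            el_r, az_r = np.radians(el), np.radians(az)
            poses.append([radius_m * np.cos(el_r) * np.cos(az_r),
                          radius_m * np.cos(el_r) * np.sin(az_r),
                          radius_m * np.sin(el_r)])
    return np.array(poses)

points = hemisphere_scan()
print(points.shape)   # number of probe positions at 2.5-degree resolution
```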
Muons decay in vacuum mainly via the leptonic channel to an electron, an electron antineutrino, and a muon neutrino. Previous investigations have concluded that muon decay can only be significantly altered in a strong electromagnetic field when the muonic strong-field parameter is of order unity, which is far beyond the reach of lab-based experiments at current and planned facilities. In this letter, an alternative mechanism is presented in which a laser pulse affects the vacuum decay rate of a muon outside the pulse. Quantum interference between the muon decaying with or without interacting with the pulse generates fringes in the electron momentum spectra and can increase the muon lifetime by up to a factor of 2. The required parameters to observe this effect are available in experiments today.
Humans often use visual aids, for example diagrams or sketches, when solving complex problems. Training multimodal models to do the same, known as Visual Chain of Thought (Visual CoT), is challenging due to: (1) poor off-the-shelf visual CoT performance, which hinders reinforcement learning, and (2) the lack of high-quality visual CoT training data. We introduce $\textbf{Zebra-CoT}$, a diverse large-scale dataset with 182,384 samples, containing logically coherent interleaved text-image reasoning traces. We focus on four categories of tasks where sketching or visual reasoning is especially natural, spanning scientific questions such as geometry, physics, and algorithms; 2D visual reasoning tasks like visual search and jigsaw puzzles; 3D reasoning tasks including 3D multi-hop inference, embodied and robot planning; visual logic problems and strategic games like chess. Fine-tuning the Anole-7B model on the Zebra-CoT training corpus results in an improvement of +12% in our test-set accuracy and yields up to +13% performance gain on standard VLM benchmark evaluations. Fine-tuning Bagel-7B yields a model that generates high-quality interleaved visual reasoning chains, underscoring Zebra-CoT's effectiveness for developing multimodal reasoning abilities. We open-source our dataset and models to support development and evaluation of visual CoT.
Vision-language models (VLMs) have been widely adopted in robotics to enable autonomous planning. However, grounding VLMs, originally trained on internet data, to diverse real-world robots remains a challenge. This paper presents ExpTeach, a framework that grounds VLMs to physical robots by building a self-generated memory of real-world experiences. In ExpTeach, the VLM autonomously plans actions, verifies outcomes, reflects on failures, and adapts robot behaviors in a closed loop. The self-generated experiences during this process are then summarized into a long-term memory, enabling retrieval of learned knowledge to guide future tasks via retrieval-augmented generation (RAG). Additionally, ExpTeach enhances the spatial understanding of VLMs with an on-demand image annotation module. In experiments, we show that reflection improves success rates from 36% to 84% on four challenging robotic tasks and observe the emergence of intelligent object interactions, including creative tool use. Across extensive tests on 12 real-world scenarios (including eight unseen ones), we find that grounding with long-term memory boosts single-trial success rates from 22% to 80%, demonstrating the effectiveness and generalizability of ExpTeach.
Efficient planning of activities is essential for modern industrial assembly lines to uphold manufacturing standards, prevent project constraint violations, and achieve cost-effective operations. While exact solutions to such challenges can be obtained through Integer Programming (IP), the dependence of the search space on input parameters often makes IP computationally infeasible for large-scale scenarios. Heuristic methods, such as Genetic Algorithms, can also be applied, but they frequently produce suboptimal solutions in extensive cases. This paper introduces a novel mathematical model of a generic industrial assembly line formulated as a Markov Decision Process (MDP), without imposing assumptions on the type of assembly line -- a notable distinction from most existing models. The proposed model is employed to create a virtual environment for training Deep Reinforcement Learning (DRL) agents to optimize task and resource scheduling. To enhance the efficiency of agent training, the paper proposes two innovative tools. The first is an action-masking technique, which ensures the agent selects only feasible actions, thereby reducing training time. The second is a multi-agent approach, in which each workstation is managed by an individual agent, thereby reducing the state and action spaces. A centralized training framework with decentralized execution is adopted, offering a scalable learning architecture for optimizing industrial assembly lines. This framework allows the agents to learn offline and subsequently provide real-time solutions during operations by leveraging a neural network that maps the current factory state to the optimal action. The effectiveness of the proposed scheme is validated through numerical simulations, demonstrating significantly faster convergence to the optimal solution compared to a comparable model-based approach.
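A minimal sketch of the action-masking idea described above: infeasible actions receive negative-infinity logits so the policy can only sample feasible ones. The policy logits and feasibility mask here are placeholders, not the paper's scheduling agent or environment.

```python
import torch

def masked_action_distribution(logits: torch.Tensor, feasible: torch.Tensor):
    """logits: (batch, n_actions); feasible: boolean mask of the same shape."""
    masked_logits = logits.masked_fill(~feasible, float("-inf"))
    return torch.distributions.Categorical(logits=masked_logits)

logits = torch.randn(1, 6)                                 # e.g. 6 candidate tasks
feasible = torch.tensor([[True, False, True, True, False, True]])
dist = masked_action_distribution(logits, feasible)
action = dist.sample()                                     # always a feasible task
print(action.item(), dist.probs)                           # infeasible tasks get zero probability
```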
This work presents a novel computer architecture that extends the Von Neumann model with a dedicated Reasoning Unit (RU) to enable native artificial general intelligence capabilities. The RU functions as a specialized co-processor that executes symbolic inference, multi-agent coordination, and hybrid symbolic-neural computation as fundamental architectural primitives. This hardware-embedded approach allows autonomous agents to perform goal-directed planning, dynamic knowledge manipulation, and introspective reasoning directly within the computational substrate at system scale. The architecture incorporates a reasoning-specific instruction set architecture, parallel symbolic processing pipelines, agent-aware kernel abstractions, and a unified memory hierarchy that seamlessly integrates cognitive and numerical workloads. Through systematic co-design across hardware, operating system, and agent runtime layers, this architecture establishes a computational foundation where reasoning, learning, and adaptation emerge as intrinsic execution properties rather than software abstractions, potentially enabling the development of general-purpose intelligent machines.
This paper presents a terrestrial localization system based on 5G infrastructure as a viable alternative to GNSS, particularly in scenarios where GNSS signals are obstructed or unavailable. It discusses network planning aimed at enabling positioning as a primary service, in contrast to the traditional focus on communication services in terrestrial networks. Building on a network infrastructure optimized for positioning, the paper proposes a system that leverages carrier phase (CP) ranging in combination with trilateration to localize the user within the network when at least three base stations (BSs) provide line-of-sight (LOS) conditions. Achieving accurate CP-based positioning requires addressing three key challenges: integer ambiguity resolution, LOS/NLOS link identification, and localization under obstructed LOS conditions. To this end, the system employs a multi-carrier CP approach, which eliminates the need for explicit integer ambiguity estimation. Additionally, a deep learning model is developed to identify NLOS links and exclude them from the trilateration process. In cases where LOS is obstructed and CP ranging becomes unreliable, the system incorporates an error-state extended Kalman filter to fuse complementary data from other sensors, such as inertial measurement units (IMUs) and cameras. This hybrid approach enables robust tracking of moving users across diverse channel conditions. The performance of the proposed terrestrial positioning system is evaluated using the real-world KITTI dataset, featuring a moving vehicle in an urban environment. Simulation results show that the system can achieve a positioning error of less than 5 meters in the KITTI urban scenario--comparable to that of public commercial GNSS services--highlighting its potential as a resilient and accurate solution for GNSS-denied environments.
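A small sketch of the trilateration step described above: given ranges to three base stations under line-of-sight conditions, the 2D user position is recovered by nonlinear least squares. The coordinates, ranges, and noise level are synthetic example values, not the paper's carrier-phase measurement model.

```python
import numpy as np
from scipy.optimize import least_squares

bs = np.array([[0.0, 0.0], [200.0, 0.0], [100.0, 180.0]])   # base-station positions [m]
true_pos = np.array([120.0, 60.0])
ranges = np.linalg.norm(bs - true_pos, axis=1) + np.random.normal(0.0, 0.05, 3)

def residuals(p):
    # difference between predicted and measured ranges to each LOS base station
    return np.linalg.norm(bs - p, axis=1) - ranges

sol = least_squares(residuals, x0=np.array([50.0, 50.0]))
print("estimated position:", sol.x)   # close to (120, 60) for small range noise
```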