High-definition (HD) maps are crucial to autonomous driving, providing structured representations of road elements to support navigation and planning. However, existing query-based methods often employ random query initialization and depend on implicit temporal modeling, which lead to temporal inconsistencies and instabilities during the construction of a global map. To overcome these challenges, we introduce a novel end-to-end framework for consistent online HD vectorized map construction, which jointly performs map instance tracking and short-term prediction. First, we propose a Semantic-Aware Query Generator that initializes queries with spatially aligned semantic masks to capture scene-level context globally. Next, we design a History Rasterized Map Memory to store fine-grained instance-level maps for each tracked instance, enabling explicit historical priors. A History-Map Guidance Module then integrates rasterized map information into track queries, improving temporal continuity. Finally, we propose a Short-Term Future Guidance module to forecast the immediate motion of map instances based on the stored history trajectories. These predicted future locations serve as hints for tracked instances, further suppressing implausible predictions and preserving temporal consistency. Extensive experiments on the nuScenes and Argoverse2 datasets demonstrate that our proposed method outperforms state-of-the-art (SOTA) methods while maintaining good efficiency.
Measurements of the Pontecorvo-Maki-Nakagawa-Sakata (PMNS) neutrino mixing parameters have entered a precision era, enabling increasingly stringent tests of neutrino oscillations. Within the framework of quantum estimation theory, we investigate whether flavor measurements, the only observables currently accessible experimentally, are optimal for extracting the oscillation parameters. We compute the Quantum Fisher Information (QFI) and the classical Fisher Information (FI) associated with ideal flavor projections for all oscillation parameters, considering accelerator muon (anti)neutrino and reactor electron antineutrino beams propagating in vacuum. Two main results emerge. First, flavor measurements saturate the QFI at the first oscillation maximum for $\theta_{13}$, $\theta_{23}$, and $\theta_{12}$, demonstrating their information-theoretic optimality for these parameters. In contrast, they are far from optimal for $\delta_{CP}$. In particular, only a small fraction of the available information on $\delta_{CP}$ is extracted at the first maximum; the sensitivity improves at the second maximum, in line with the strategy of ESS$\nu$SB, a planned facility. Second, the QFI associated with $\delta_{CP}$ is approximately one order of magnitude smaller than that of the mixing angles, indicating that the neutrino state intrinsically encodes less information about CP violation. Nevertheless, this quantum bound lies well below current experimental uncertainties, implying that the present precision on $\delta_{CP}$ is not fundamentally limited. Our results provide a quantitative framework to disentangle fundamental from practical limitations and establish a benchmark for optimizing future neutrino facilities.
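The FI-versus-QFI comparison above can be illustrated in a two-flavour toy model (an assumption of this sketch; the paper works in the full three-flavour PMNS framework). For a pure state, the QFI is $4(\langle\partial\psi|\partial\psi\rangle - |\langle\psi|\partial\psi\rangle|^2)$, while a binary flavour measurement with outcome probability $P$ carries classical FI $(\partial P)^2/[P(1-P)]$. The sketch below estimates the oscillation phase $\Delta = \Delta m^2 L/4E$ for an initial $\nu_\mu$ and checks numerically that the flavour FI never exceeds the QFI:

```python
import numpy as np

def qfi_and_fi(theta, delta):
    """Two-flavour toy model: initial nu_mu evolved through phase delta.

    State in the mass basis (global phase removed):
        |psi> = -sin(theta)|nu1> + cos(theta) e^{-2i delta}|nu2>
    The parameter estimated here is the oscillation phase delta.
    """
    psi  = np.array([-np.sin(theta), np.cos(theta) * np.exp(-2j * delta)])
    dpsi = np.array([0.0, -2j * np.cos(theta) * np.exp(-2j * delta)])
    # QFI of a pure state: 4 (<dpsi|dpsi> - |<psi|dpsi>|^2)
    qfi = 4 * (np.vdot(dpsi, dpsi).real - abs(np.vdot(psi, dpsi)) ** 2)
    # Classical FI of the binary flavour measurement with
    # P(nu_e) = sin^2(2 theta) sin^2(delta)
    p  = np.sin(2 * theta) ** 2 * np.sin(delta) ** 2
    dp = np.sin(2 * theta) ** 2 * np.sin(2 * delta)
    fi = dp ** 2 / (p * (1 - p))
    return qfi, fi

theta = 0.15                       # mixing angle (radians), theta_13-like scale
for delta in np.linspace(0.1, 3.0, 30):
    qfi, fi = qfi_and_fi(theta, delta)
    assert fi <= qfi + 1e-9        # classical FI never exceeds the QFI
```

In this toy model the ratio works out to $\mathrm{FI}/\mathrm{QFI} = \cos^2\Delta/(1-P)$, so the flavour measurement saturates the quantum bound only at maximal mixing; the paper's full three-flavour analysis is needed for its statements about $\theta_{ij}$ and $\delta_{CP}$.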
Reactive motion generation in dynamic and unstructured scenarios is typically limited by essentially static perception and system dynamics. Reliably modeling dynamic obstacles and optimizing collision-free trajectories under perception and control uncertainty is challenging. This article focuses on revealing the tight connection between reactive planning and dynamic mapping for manipulators from a model-based perspective. To enable efficient particle-based perception with expressive dynamic properties, we present a tensorized particle weight update scheme that explicitly maintains obstacle velocities and their covariances. Building upon this dynamic representation, we propose an obstacle-aware MPPI-based planning formulation that jointly propagates robot-obstacle dynamics, allowing future system motion to be predicted and evaluated under uncertainty. The model predictive method is shown to significantly improve safety and reactivity in dynamic surroundings. By applying our complete framework in simulated and noisy real-world environments, we demonstrate that explicit modeling of robot-obstacle dynamics consistently enhances performance over state-of-the-art MPPI-based perception-planning baselines when avoiding multiple static and dynamic obstacles.
ATUS, the Astronomical Telescope of the University of Stuttgart, is a fully remote-controlled 0.6 m f/8.17 Ritchey-Chrétien telescope optimized for high-cadence, high-fidelity photometry of transient sources. Observations are time-referenced with very high accuracy and precision, making it an ideal platform for time-domain astronomy and space situational awareness. Initially conceived to support instrument developments and operations of SOFIA, the Stratospheric Observatory for Infrared Astronomy, it evolved into a scientific instrument for various use cases in instrument development, astronomical research, and teaching. This paper presents an overview of its development and optimization to achieve diffraction-limited images and highly accurate pointing and tracking, even at high speeds. The findings and lessons learned are universally applicable to other telescopes that are currently at the planning stage, or where similar issues might be encountered.
Multi-objective optimisation using BRIGHT has proven insightful and effective in prostate cancer brachytherapy treatment planning. BRachytherapy via artificially Intelligent GOMEA-Heuristic based Treatment planning (BRIGHT) generates multiple treatment plans, each with a different trade-off between tumour coverage and organs-at-risk sparing. BRIGHT was recently extended to cervical cancer brachytherapy. In this study, we present a novel, custom-developed graphical user interface (GUI) that enables plan navigation, pairwise comparisons, dose distribution visualisation, and the possibility to make adjustments - essential for efficient clinical use of BRIGHT. End-user validation of BRIGHT with the dedicated GUI was conducted for cervical cancer brachytherapy by emulating clinical practice in ten previously treated patients. A multidisciplinary brachytherapy team used BRIGHT to create new treatment plans. GUI usability was assessed using the System Usability Scale (SUS). BRIGHT plan quality was compared to clinical practice via blinded one-on-one comparisons. The GUI offered helpful features for plan navigation and evaluation, giving users quick insight into whether planning aims are achievable and what treatment options are available. The overall SUS score was 83.3, indicating an 'excellent' system. BRIGHT outperformed clinical practice in five out of ten patients regarding the coverage-sparing trade-off and performed equally well in the remaining five. The BRIGHT plan was preferred over the clinical plan in eight out of ten patients, four of which showed clinically relevant differences. The clinical plan was preferred in two patients, neither with clinically relevant differences. In conclusion, BRIGHT, with its dedicated GUI, is a clinically viable and user-friendly tool for treatment planning in cervical cancer brachytherapy.
Existing evaluations of agents with memory typically assess memorization and action in isolation. One class of benchmarks evaluates memorization by testing recall of past conversations or text but fails to capture how memory is used to guide future decisions. Another class focuses on agents acting in single-session tasks without the need for long-term memory. However, in realistic settings, memorization and action are tightly coupled: agents acquire memory while interacting with the environment, and subsequently rely on that memory to solve future tasks. To capture this setting, we introduce MemoryArena, a unified evaluation gym for benchmarking agent memory in multi-session Memory-Agent-Environment loops. The benchmark consists of human-crafted agentic tasks with explicitly interdependent subtasks, where agents must learn from earlier actions and feedback by distilling experiences into memory, and subsequently use that memory to guide later actions to solve the overall task. MemoryArena supports evaluation across web navigation, preference-constrained planning, progressive information search, and sequential formal reasoning, and reveals that agents with near-saturated performance on existing long-context memory benchmarks like LoCoMo perform poorly in our agentic setting, exposing a gap in current evaluations for agents with memory.
This note gives a self-contained overview of some important properties of the Gromov-Wasserstein (GW) distance, compared with the standard linear optimal transport (OT) framework. More specifically, I explore the following questions: are GW optimal transport plans sparse? Under what conditions are they supported on a permutation? Do they satisfy a form of cyclical monotonicity? In particular, I present the conditionally negative semi-definite property and show that, when it holds, there are GW optimal plans that are sparse and supported on a permutation.
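For reference, the quadratic GW problem between metric measure spaces $(X, d_X, \mu)$ and $(Y, d_Y, \nu)$ reads (standard definition, following Mémoli's formulation):

```latex
\mathrm{GW}_2^2(\mu,\nu)
  \;=\; \min_{\pi \in \Pi(\mu,\nu)}
  \iint \bigl| d_X(x,x') - d_Y(y,y') \bigr|^2
  \, d\pi(x,y) \, d\pi(x',y'),
```

whereas linear OT minimizes $\int c(x,y)\, d\pi(x,y)$ over the same set $\Pi(\mu,\nu)$ of couplings. Because the plan $\pi$ enters quadratically, the GW objective is non-convex, which is why sparsity, support on a permutation, and cyclical monotonicity of optimal plans are subtler questions than in the linear case.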
We propose EasyControlEdge, which adapts an image-generation foundation model to edge detection. In real-world edge detection (e.g., floor-plan walls, satellite roads/buildings, and medical organ boundaries), crispness and data efficiency are crucial, yet producing crisp raw edge maps with limited training samples remains challenging. Although image-generation foundation models perform well on many downstream tasks, their pretrained priors for data-efficient transfer and iterative refinement for high-frequency detail preservation remain underexploited for edge detection. To exploit these capabilities, we introduce an edge-specialized adaptation of such models: we incorporate an edge-oriented objective with an efficient pixel-space loss, and at inference we introduce guidance based on unconditional dynamics, enabling a single model to control edge density through a guidance scale. Experiments on BSDS500, NYUDv2, BIPED, and CubiCasa show consistent gains over state-of-the-art methods, particularly under no-post-processing crispness evaluation and with limited training data.
Studies of the electromagnetic production of strange quarks began in the 1950s as something of a curiosity that puzzled experimentalists and theorists alike. As the datasets increased, concomitant advances in theoretical models were realized. A paradigm shift occurred in the 1990s with the development of second-generation facilities at ELSA, MAMI, SPring-8, and JLab, which brought nuclear physics experiments forward by orders of magnitude in counting statistics compared to the first-generation efforts. This was a tremendous boon to strangeness physics investigations, and to date, more than 50 dedicated experiments in kaon photo- and electroproduction have been completed at facilities around the world, leading to a host of experimental observables that have enabled significant advances in the exploration of strongly interacting systems that decay via $s\bar{s}$ quark pair creation. This review was designed to provide the first-ever in-depth overview of both the experimental and theoretical progress in the field of the electromagnetic production of strangeness. This work looks back over 70 years of past developments, discusses ongoing work and near-term plans, and details future possibilities being considered for third-generation facilities. Throughout this work, the primary impacts of these explorations are highlighted, along with connections to a wide range of related phenomenological applications. An important goal of this review is to provide a complete, self-contained guide into this field prepared at a level that is relevant for both new and seasoned scientists, whether experimentalists, phenomenologists, or theorists, to better understand what has been accomplished by so many dedicated scientists, each building on what has come before, and to appreciate the exciting future potential for continued studies in this area. A more complete abstract is provided in the paper.
Money-back guarantees (MBGs) are features of pooled retirement income products that address bequest concerns by ensuring the initial premium is returned through lifetime payments or, upon early death, as a death benefit to the estate. This paper studies optimal retirement decumulation in an individual tontine account with an MBG overlay under international diversification and systematic longevity risk. The retiree chooses withdrawals and asset allocation dynamically to trade off expected total withdrawals (EW) against the Conditional Value-at-Risk (CVaR) of terminal wealth, subject to realistic investment constraints. The optimization is solved under a plan-to-live convention, while stochastic mortality affects outcomes through its impact on mortality credits at the pool level. We develop a neural-network-based computational approach for the resulting high-dimensional, constrained control problem. The MBG is priced ex post under the induced EW--CVaR optimal policy via a simulation-based actuarial rule that combines expected guarantee costs with a prudential tail buffer. Using long-horizon historical return data expressed in real domestic-currency terms, we find that international diversification and longevity pooling jointly deliver the largest improvements in the EW--CVaR trade-off, while stochastic mortality shifts the frontier modestly in the expected direction. The optimal controls use foreign equity primarily as a state-dependent catch-up instrument, and implied MBG loads are driven mainly by tail outcomes (and the chosen prudential buffer) rather than by mean payouts.
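As a minimal illustration of the risk measure in the EW--CVaR trade-off above, the following sketch (with hypothetical lognormal terminal-wealth samples, not the paper's model or policy) estimates CVaR at level $\alpha$ as the average of the worst $\alpha$-fraction of Monte Carlo outcomes:

```python
import numpy as np

def cvar(wealth, alpha=0.05):
    """Empirical CVaR at level alpha for terminal wealth: the mean of the
    worst alpha-fraction of outcomes (the left tail, since more is better)."""
    w = np.sort(np.asarray(wealth))
    k = max(1, int(np.ceil(alpha * len(w))))
    return w[:k].mean()

rng = np.random.default_rng(0)
wealth = rng.lognormal(mean=0.3, sigma=0.5, size=100_000)  # toy terminal wealth
ew  = wealth.mean()        # stands in for the expected-reward side of the trade-off
c05 = cvar(wealth, 0.05)   # left-tail risk at the 5% level
assert c05 < ew            # the tail average always sits below the mean
```

A dynamic policy then shifts the whole distribution: the frontier traced by varying the weight on `cvar` versus `ew` is what the paper's neural-network controller optimizes under constraints.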
The fate of cities under natural hazards depends not only on hazard intensity but also on the coupling of structural damage, a collective process that remains poorly understood. Here we show that urban structural damage exhibits phase-transition phenomena. As hazard intensity increases, the system can shift abruptly from a largely safe to a largely damaged state, analogous to a first-order phase transition in statistical physics. Higher diversity in the building portfolio smooths this transition, but multiscale damage clustering traps the system in an extended critical-like regime (analogous to a Griffiths phase), suppressing the emergence of a more predictable disordered (Gaussian) phase. These phenomenological patterns are characterized by a random-field Ising model, with the external field, disorder strength, and temperature interpreted as the effective hazard demand, structural diversity, and modeling uncertainty, respectively. Applying this framework to real urban inventories reveals that widely used engineering modeling practices can shift urban damage patterns between synchronized and volatile regimes, systematically biasing exceedance-based risk metrics by up to 50% under moderate earthquakes ($M_w \approx 5.5$--$6.0$), equivalent to a several-fold gap in repair costs. This phase-aware description turns the collective behavior of civil infrastructure damage into actionable diagnostics for urban risk assessment and planning.
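A minimal sketch of the threshold-cascade (zero-temperature RFIM-style) picture described above, under illustrative parameters of my own choosing rather than the paper's calibration: each building fails when the hazard demand plus coupling from failed neighbours exceeds its random capacity, and wider capacity disorder (structural diversity) smooths the damage transition:

```python
import numpy as np

def damage_fraction(h, thresholds, J=0.8):
    """Cascade to a fixed point on a ring: a site fails once the hazard
    demand h plus coupling from failed neighbours exceeds its random
    capacity. Failure is irreversible, so the update always terminates."""
    s = np.zeros(len(thresholds), dtype=bool)
    while True:
        failed_nb = np.roll(s, 1).astype(float) + np.roll(s, -1).astype(float)
        s_new = s | (h + 0.5 * J * failed_nb > thresholds)
        if np.array_equal(s_new, s):
            return s.mean()
        s = s_new

def transition_width(sigma, seed=0, n=2000):
    """Sweep hazard intensity and measure the 10%-90% transition width."""
    thr = np.random.default_rng(seed).normal(1.0, sigma, n)  # capacity disorder
    hs = np.arange(0.0, 2.01, 0.05)
    d = np.array([damage_fraction(h, thr) for h in hs])
    assert np.all(np.diff(d) >= 0)        # damage grows monotonically with h
    return hs[d >= 0.9][0] - hs[d >= 0.1][0]

w_narrow = transition_width(sigma=0.05)   # homogeneous portfolio: abrupt jump
w_wide   = transition_width(sigma=0.40)   # diverse portfolio: smoothed crossover
assert w_narrow < w_wide
```

The mapping follows the abstract's interpretation: `h` plays the role of the external field (hazard demand), `sigma` the disorder strength (structural diversity), and the coupling `J` the damage clustering between neighbouring buildings.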
In this work, we present an unmanned aerial vehicle (UAV) wireless dataset collected as part of the AERPAW Autonomous Aerial Data Mule (AADM) challenge, organized by the NSF Aerial Experimentation and Research Platform for Advanced Wireless (AERPAW) project. The AADM challenge was the second competition in which an autonomous UAV acted as a data mule, where the UAV downloaded data from multiple base stations (BSs) in a dynamic wireless environment. Participating teams designed flight control and decision-making algorithms for choosing which BSs to communicate with and how to plan flight trajectories to maximize data download within a mission completion time. The competition was conducted in two stages: Stage 1 involved development and experimentation using a digital twin (DT) environment, and in Stage 2, the final test run was conducted on the outdoor testbed. The total score for each team was compiled from both stages. The resulting dataset includes link quality and data download measurements, both in DT and physical environments. Along with the USRP measurements used in the contest, the dataset also includes UAV telemetry, Keysight RF sensor position estimates, link quality measurements from LoRa receivers, and Fortem radar measurements. It supports reproducible research on autonomous UAV networking, multi-cell association and scheduling, air-to-ground propagation modeling, DT-to-real-world transfer learning, and integrated sensing and communication, and serves as a benchmark for future autonomous wireless experimentation.
Autonomous vehicles (AVs) are being increasingly deployed in urban environments. In order to operate safely and reliably, AVs need to account for the inherent uncertainty associated with perceiving the world through sensor data and incorporate that into their decision-making process. Uncertainty-aware planners have recently been developed to account for upstream perception and prediction uncertainty. However, such planners may be sensitive to prediction uncertainty miscalibration, the magnitude of which has not yet been characterized. Towards this end, we perform a detailed analysis on the impact that perceptual uncertainty propagation and calibration has on perception-based motion planning. We do so by comparing two novel prediction-planning pipelines with varying levels of uncertainty propagation on the recently-released nuPlan planning benchmark. We study the impact of upstream uncertainty calibration using closed-loop evaluation on the nuPlan challenge scenarios. We find that the method incorporating upstream uncertainty propagation demonstrates superior generalization to complex closed-loop scenarios.
Endoscopy is essential in medical imaging, used for diagnosis, prognosis, and treatment. Developing a robust dynamic 3D reconstruction pipeline for endoscopic videos could enhance visualization, improve diagnostic accuracy, aid in treatment planning, and guide surgical procedures. However, challenges arise due to the deformable nature of the tissues, the use of monocular cameras, illumination changes, occlusions, and unknown camera trajectories. Inspired by neural rendering, we introduce NeRFscopy, a self-supervised pipeline for novel view synthesis and 3D reconstruction of deformable endoscopic tissues from a monocular video. NeRFscopy includes a deformable model with a canonical radiance field and a time-dependent deformation field parameterized by SE(3) transformations. In addition, the color images are efficiently exploited by introducing dedicated loss terms to learn a 3D implicit model without assuming any template or pre-trained model, solely from data. NeRFscopy achieves accurate results in terms of novel view synthesis, outperforming competing methods across various challenging endoscopy scenes.
This paper presents the first demonstration of a viable, ultra-fast, radiation-hard machine learning (ML) application on FPGAs, which could be used in future high-energy physics experiments. We present a three-fold contribution, with the PicoCal calorimeter, planned for the LHCb Upgrade II experiment, used as a test case. First, we develop a lightweight autoencoder to compress a 32-sample timing readout, representative of that of the PicoCal, into a two-dimensional latent space. Second, we introduce a systematic, hardware-aware quantization strategy and show that the model can be reduced to 10-bit weights with minimal performance loss. Third, as a barrier to the adoption of on-detector ML is the lack of support for radiation-hard FPGAs in the High-Energy Physics community's standard ML synthesis tool, hls4ml, we develop a new backend for this library. This new backend enables the automatic translation of ML models into High-Level Synthesis (HLS) projects for the Microchip PolarFire family of FPGAs, one of the few commercially available, radiation-hard FPGA families. We present the synthesis of the autoencoder on a target PolarFire FPGA, which indicates that a latency of 25 ns can be achieved. We show that the resources utilized are low enough that the model can be placed within the inherently protected logic of the FPGA. Our extension to hls4ml is a significant contribution, paving the way for broader adoption of ML on FPGAs in high-radiation environments.
We present Lifelong Scalable Multi-Agent Realistic Testbed (LSMART), an open-source simulator to evaluate any Multi-Agent Path Finding (MAPF) algorithm in a Fleet Management System (FMS) with Automated Guided Vehicles (AGVs). MAPF aims to move a group of agents from their corresponding starting locations to their goals. Lifelong MAPF (LMAPF) is a variant of MAPF that continuously assigns new goals for agents to reach. LMAPF applications, such as autonomous warehouses, often require a centralized, lifelong system to coordinate the movement of a fleet of robots, typically AGVs. However, existing works on MAPF and LMAPF often assume simplified kinodynamic models, such as pebble motion, as well as perfect execution and communication for AGVs. Prior work has presented SMART, a software tool capable of evaluating any MAPF algorithm while considering agent kinodynamics, communication delays, and execution uncertainties. However, SMART is designed for MAPF, not LMAPF. Generalizing SMART to an FMS requires many more design choices. First, an FMS parallelizes planning and execution, raising the question of when to plan. Second, given planners with varying optimality and differing agent-model assumptions, one must decide how to plan. Third, when the planner fails to return valid solutions, the system must determine how to recover. In this paper, we first present LSMART, an open-source simulator that incorporates all these considerations to evaluate any MAPF algorithm in an FMS. We then provide experiment results based on state-of-the-art methods for each design choice, offering guidance on how to effectively design centralized lifelong AGV Fleet Management Systems. LSMART is available at https://smart-mapf.github.io/lifelong-smart.
Minimizing volatility and adjustment costs is of central importance in many economic environments, yet it is often complicated by evolving feasibility constraints. We study a decision maker who repeatedly selects an action from a stochastically evolving interval of feasible actions in order to minimize either average adjustment costs or variance. We show that for strictly convex adjustment costs (such as quadratic variation), the optimal decision rule is a reference rule in which the decision maker minimizes the distance to a target action. In general, the optimal target depends both on the previous action and the expectation of future constraints; but for the special case where the constraints follow a random walk, the optimal mechanism is to simply target the previous action. If the decision maker minimizes variance, the optimal policy is also a reference rule, but the target is a constant, which is not necessarily equal to the long-term average action. Compared to mid-point heuristics, these optimal rules may substantially reduce quadratic variation and variance, in natural environments by $50\%$ or more. Applied to stock market auctions, our results provide an explanation for the widespread use of reference price rules. We also apply our results to bilateral trade in over-the-counter markets, capacity planning in supply chains, and positioning in political agenda setting.
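The random-walk special case above admits a compact illustration. In this hypothetical simulation (the parameter choices are mine), the reference rule that targets the previous action is compared with the mid-point heuristic on quadratic variation:

```python
import numpy as np

rng = np.random.default_rng(1)
T, w = 10_000, 2.0                      # horizon; half-width of the feasible interval
c = np.cumsum(rng.normal(0.0, 1.0, T))  # interval centre follows a random walk
lo, hi = c - w, c + w

# Reference rule for random-walk constraints: target the previous action,
# moving only as far as feasibility requires (projection onto [lo, hi]).
a_ref, prev = np.empty(T), 0.0
for t in range(T):
    prev = min(max(prev, lo[t]), hi[t])
    a_ref[t] = prev

a_mid = c                               # mid-point heuristic: centre of the interval

qv = lambda a: np.sum(np.diff(a) ** 2)  # quadratic variation (adjustment cost)
assert qv(a_ref) < 0.5 * qv(a_mid)      # reference rule cuts costs by over half here
```

The mid-point tracks every shock to the interval, while the reference rule moves only when the constraint forces it, which is why the gap in quadratic variation is so large in this toy setting.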
Carbon-ion radiotherapy provides high dose conformity for lung cancer, but its benefit is limited by two sources of uncertainties: interplay between scanned beam delivery and tumor motion, and dose modulation from heterogeneous lung tissue. This study quantifies the separate and combined dosimetric impact of these effects using the GSI TRiP4D treatment planning system. Eighteen lung cancer 4DCT datasets from TCIA were analyzed. A modulation power ($P_{\mathrm{mod}}$) was assigned to lung voxels. Three values were sampled from a Gaussian distribution ($(200 \pm 67)\,\mu\mathrm{m}$), and an extreme value of $750\,\mu\mathrm{m}$ was tested. Interplay doses were computed by combining scanned-beam delivery with patient-specific respiratory motion. Four scenarios were studied: static, static with modulation, interplay, and interplay with modulation. Metrics included $D95\%$, $V95\%$, homogeneity index (HI), lung $V16\mathrm{Gy}$, and heart $V20\mathrm{Gy}$. Interplay reduced target coverage by $5.2 \pm 1.5$ percentage points (pp) in $D95\%$, $12.1 \pm 5.9$ pp in $V95\%$, and $8.3 \pm 2.4$ pp in HI. Extreme $P_{\mathrm{mod}}$ alone caused small degradations. When combined with interplay, it partially compensated the loss. This effect decreased with 4D optimization. Fractionation mitigated interplay, leaving lung modulation as the main residual effect.
Autonomous landing of Uncrewed Aerial Vehicles (UAVs) on oscillating marine platforms is severely constrained by wave-induced multi-frequency oscillations, wind disturbances, and phase lags in motion prediction. Existing methods either treat platform motion as a general random process or lack explicit modeling of wave spectral characteristics, leading to suboptimal performance under dynamic sea conditions. To address these limitations, we propose SpecFuse: a novel spectral-temporal fusion predictive control framework that integrates frequency-domain wave decomposition with time-domain recursive state estimation for high-precision 6-DoF motion forecasting of Uncrewed Surface Vehicles (USVs). The framework explicitly models dominant wave harmonics to mitigate phase lags, refining predictions in real time via IMU data without relying on complex calibration. Additionally, we design a hierarchical control architecture featuring a sampling-based HPO-RRT* algorithm for dynamic trajectory planning under non-convex constraints and a learning-augmented predictive controller that fuses data-driven disturbance compensation with optimization-based execution. Extensive validations (2,000 simulations + 8 lake experiments) show our approach achieves a 3.2 cm prediction error, 4.46 cm landing deviation, 98.7% / 87.5% success rates (simulation / real-world), and 82 ms latency on embedded hardware, outperforming state-of-the-art methods by 44%-48% in accuracy. Its robustness to wave-wind coupling disturbances supports critical maritime missions such as search and rescue and environmental monitoring. All code, experimental configurations, and datasets will be released as open-source to facilitate reproducibility.
In June 2026, the UK government will set its carbon budget for the period 2038 to 2042, the seventh such carbon budget (CB7) since the Climate Change Act became law in 2008. For the first time, this carbon budget will be accompanied by a macroeconomic assessment of its impact on growth, employment, inflation and inequality. Researchers from the Institute of New Economic Thinking (INET) Oxford are working in partnership with the Department for Energy Security and Net Zero to deliver this assessment using our data-driven macroeconomic agent-based model (ABM). This extended abstract presents our work in progress towards this pioneering piece of policymaking. We are conducting our work in three work packages. By the time of the workshop, we hope to be able to present preliminary findings from the first two work packages. In WP1, we adapt an existing macro-ABM prototype and build a UK macroeconomic baseline. The main task for this is initialising the model with suitable UK household microdata. We present the options considered and the approach settled upon. In WP2, we conduct preliminary modelling that represents UK decarbonisation as an external shock to financial flows and technical coefficients. In order to present results in time to influence the June 2026 policy decision, this second work package exogenously forces the ABM to follow the CB7 green investment and associated technological change projections provided by the Climate Change Committee. Finally, we will implement more sophisticated social and technological learning packages in WP3, building our own projections of likely decarbonisation pathways that may diverge from UK government plans. For the workshop, we will present the progress of WP1 and WP2.
Vision-language models (VLMs) show promise for high-level planning in smart manufacturing, yet their deployment in dynamic workcells faces two critical challenges: (1) stateless operation: they cannot persistently track out-of-view states, causing world-state drift; and (2) opaque reasoning: failures are difficult to diagnose, leading to costly blind retries. This paper presents VLM-DEWM, a cognitive architecture that decouples VLM reasoning from world-state management through a persistent, queryable Dynamic External World Model (DEWM). Each VLM decision is structured into an Externalizable Reasoning Trace (ERT), comprising action proposal, world belief, and causal assumption, which is validated against DEWM before execution. When failures occur, discrepancy analysis between predicted and observed states enables targeted recovery instead of global replanning. We evaluate VLM-DEWM on multi-station assembly, large-scale facility exploration, and real-robot recovery under induced failures. Compared to baseline memory-augmented VLM systems, VLM-DEWM improves state-tracking accuracy from 56% to 93%, increases recovery success rate from below 5% to 95%, and significantly reduces computational overhead through structured memory. These results establish VLM-DEWM as a verifiable and resilient solution for long-horizon robotic operations in dynamic manufacturing environments.
This work presents a fully coupled, multiphysics computational framework for predicting the thermo-chemical material response of thermal protection systems in inductively coupled plasma (ICP) wind tunnels. The framework integrates a high-fidelity Navier-Stokes plasma solver, an electromagnetic field solver, and a discontinuous-Galerkin material response solver using a partitioned coupling strategy. This enables an ab initio, end-to-end simulation of the 350 kW Plasmatron X facility at the University of Illinois Urbana-Champaign (UIUC), including plasma generation, electromagnetic heating, near-wall thermochemistry, and time-accurate material ablation. The model captures key ICP physics such as vortex-mode recirculation, Joule-heating-driven plasma formation, and Lorentz-force-induced flow confinement, and accurately predicts the transition from subsonic to supersonic jet behavior at low pressures. Validation against cold-wall calorimetry and graphite ablation experiments shows that predicted stagnation-point heat fluxes fall well within experimental uncertainty, while fully coupled simulations accurately reproduce measured stagnation temperature histories and recession rates with errors below 12% and 10%, respectively. Remaining discrepancies during early transient heating are attributed to uncertainties in power-coupling efficiency, equilibrium ablation modeling, and material property datasets. Overall, the framework demonstrates strong predictive capability for ICP wind tunnel environments and provides a foundation for improved design, interpretation, and planning of hypersonic material testing campaigns.
A navigation agent must both understand high-level semantic instructions and maintain precise spatial perception. Building navigation agents centered on Multimodal Large Language Models (MLLMs) is a promising approach due to their powerful generalization ability. However, the current tightly coupled design dramatically limits system performance. In this work, we propose a decoupled design that separates low-level spatial state estimation from high-level semantic planning. Unlike previous methods that rely on predefined, oversimplified textual maps, we introduce an interactive metric world representation that maintains rich and consistent information, allowing MLLMs to interact with and reason on it for decision-making. Furthermore, counterfactual reasoning is introduced to further elicit MLLMs' capacity, while the metric world representation ensures the physical validity of the produced actions. We conduct comprehensive experiments in both simulated and real-world environments. Our method establishes a new zero-shot state-of-the-art, achieving 48.8\% Success Rate (SR) in R2R-CE and 42.2\% in RxR-CE benchmarks. Furthermore, to validate the versatility of our metric representation, we demonstrate zero-shot sim-to-real transfer across diverse embodiments, including a wheeled TurtleBot 4 and a custom-built aerial drone. These real-world deployments verify that our decoupled framework serves as a robust, domain-invariant interface for embodied Vision-and-Language navigation.
We explore how different types and uses of memory can aid spatial navigation in changing, uncertain environments. In the simple foraging task we study, our agent must find its way each day from its home, through barriers, to food. Moreover, the world is non-stationary: from day to day, the locations of the barriers and food may change, and the agent's sensing, such as its location information, is uncertain and very limited. Any model construction, such as building a map, and any model use, such as planning, must be robust to these challenges, and if learning is to be useful, it must be adequately fast. We examine a range of strategies, from simple to sophisticated, with various uses of memory and learning. We find that an architecture incorporating multiple strategies is required to handle subtasks of a different nature: in particular, exploration and search when the food location is not known, and planning a good path to a remembered (likely) food location. An agent that uses non-stationary probability learning techniques to keep updating its (episodic) memories, and that uses those memories to build imperfect maps (noisy and limited to the agent's experience) and plan on the fly, can be increasingly and substantially more efficient than simpler (minimal-memory) agents as task difficulties such as distance to goal increase, as long as the uncertainty from localization and change is not too large.
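The kind of non-stationary probability learning described above can be sketched with an exponentially weighted belief update, in which a fixed learning rate discounts stale evidence so the estimate tracks a changing world. The scenario and learning rate below are illustrative assumptions, not details taken from the paper:

```python
def update_belief(p, observed, alpha=0.2):
    """Exponentially weighted update of the belief that food is at a site.

    A fixed learning rate `alpha` discounts old evidence, so the estimate
    follows a non-stationary environment instead of averaging all history.
    """
    return (1.0 - alpha) * p + alpha * (1.0 if observed else 0.0)

# Food is at the site for 30 days, then moves away for 30 days.
p = 0.5
history = []
for day in range(60):
    p = update_belief(p, observed=(day < 30))
    history.append(p)

print(round(history[29], 3))  # belief after 30 days with food present
print(round(history[59], 3))  # belief after 30 days with food absent
```

The same update with a decaying `1/n` learning rate would converge to the long-run average and react sluggishly to change; the constant rate trades asymptotic accuracy for responsiveness, which is the relevant trade-off when barrier and food locations drift from day to day.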
Generating realistic synthetic populations is essential for agent-based models (ABMs) in transportation and urban planning. Current methods face two major limitations. First, many rely on a single dataset or follow a sequential data-fusion-and-generation process, and thus fail to capture the complex interplay between features. Second, these approaches struggle with sampling zeros (valid but unobserved attribute combinations) and structural zeros (infeasible combinations due to logical constraints), which reduce the diversity and feasibility of the generated data. This study proposes a novel method to simultaneously integrate and synthesize multi-source datasets using a Wasserstein Generative Adversarial Network (WGAN) with gradient penalty. This joint learning method improves both the diversity and the feasibility of the synthetic data by adding a regularization term (an inverse gradient penalty) to the generator loss function. For evaluation, we implement a unified similarity metric and place special emphasis on measuring diversity and feasibility through recall, precision, and the F1 score. Results show that the proposed joint approach outperforms the sequential baseline, with recall increasing by 7\% and precision by 15\%. The regularization term further improves diversity and feasibility, reflected in a 10\% increase in recall and a 1\% increase in precision. We also assess similarity distributions using a five-metric score, on which the joint approach performs better overall, reaching 88.1 compared to 84.6 for the sequential method. Since synthetic populations are a key input to ABMs, this multi-source generative approach has the potential to significantly enhance their accuracy and reliability.
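The recall/precision reading used above can be made concrete: if recall is the share of real attribute combinations the generator reproduces (diversity) and precision is the share of generated combinations that are valid (feasibility), both reduce to set operations over discrete combinations. The toy attribute data below is invented for illustration and is not from the paper:

```python
def combo_metrics(real, synthetic):
    """Precision/recall/F1 over discrete attribute combinations.

    recall:    share of real combinations reproduced (diversity),
    precision: share of synthetic combinations that are valid (feasibility).
    """
    real, synthetic = set(real), set(synthetic)
    tp = len(real & synthetic)
    precision = tp / len(synthetic)
    recall = tp / len(real)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy combinations of (age band, household size, cars owned).
real = {("18-35", 1, 0), ("18-35", 2, 1), ("36-65", 4, 2), ("65+", 1, 0)}
synth = {("18-35", 1, 0), ("36-65", 4, 2), ("65+", 2, 3)}  # last is infeasible

p, r, f1 = combo_metrics(real, synth)
print(p, r, round(f1, 3))
```

Under this reading, sampling zeros depress recall (real combinations the generator never produces) while structural zeros depress precision (generated combinations that violate logical constraints), which is why the abstract tracks both.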
In this paper, we study pooling downstream beds across specialties in a stochastic operating room planning problem. The main sources of uncertainty are stochastic surgical durations and patients' lengths of stay. We develop a two-stage stochastic programming model where, in the first stage, we decide on 1) the number of non-shared ICU and ward beds to be allocated to each specialty, and 2) the allocation of surgeries to operating rooms during the planning horizon. In the second stage, we decide on 1) the number of shared beds in the ICU and wards to be allocated to different specialties on each day of the planning horizon, 2) the surge capacity required to provide downstream service to patients, and 3) the overtime incurred in operating rooms. The proposed model minimizes the total cost, including the patients' waiting cost, postponement cost, overtime and fixed cost of operating rooms, and the cost of downstream surge capacity. We implement the proposed stochastic programming model in a sample average approximation framework. To enhance the efficiency of sample average approximation, we develop a specialized algorithm that quickly solves the second-stage model for any given first-stage solution over a large number of scenarios. We carry out extensive computational experiments to evaluate the effectiveness of several pooling policies for downstream beds as well as the efficiency of the proposed sample average approximation algorithm, and we perform an extensive sensitivity analysis of cost and stochastic parameters. Our results demonstrate that a full-sharing policy among different specialties in the downstream units enhances the functionality of the system by up to 19.53%. Moreover, the results indicate that the solutions obtained from the proposed stochastic model outperform those from the corresponding deterministic problem by 17.43% on average.
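The sample-average-approximation idea can be illustrated on a deliberately tiny one-dimensional analogue of the bed-allocation problem: fix a first-stage bed count, evaluate the second-stage surge/idle cost on each sampled demand scenario, and minimize the scenario average. The costs and demand distribution below are invented for illustration and are not the paper's model:

```python
import random

def second_stage_cost(beds, demand, surge_cost=5.0, idle_cost=1.0):
    """Recourse cost once demand is revealed: surge capacity covers any
    shortage, and idle beds incur a smaller penalty."""
    shortage = max(demand - beds, 0)
    idle = max(beds - demand, 0)
    return surge_cost * shortage + idle_cost * idle

def saa_allocate(scenarios, max_beds=40, fixed_cost=2.0):
    """Sample average approximation: choose the first-stage bed count that
    minimizes fixed cost plus the scenario-averaged recourse cost."""
    def avg_cost(beds):
        recourse = sum(second_stage_cost(beds, d) for d in scenarios)
        return fixed_cost * beds + recourse / len(scenarios)
    return min(range(max_beds + 1), key=avg_cost)

random.seed(0)
scenarios = [random.randint(10, 30) for _ in range(1000)]  # sampled demand
best = saa_allocate(scenarios)
print(best)
```

The real model replaces the one-line recourse with a second-stage optimization over shared-bed allocations, surge capacity, and overtime, which is why a specialized fast second-stage solver matters: it is called once per scenario for every candidate first-stage solution.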
Mobile robots operating in agroindustrial environments, such as Mediterranean greenhouses, are subject to challenging conditions, including uneven terrain, variable friction, payload changes, and terrain slopes, all of which significantly affect control performance and stability. Despite the increasing adoption of robotic platforms in agriculture, the lack of standardized, reproducible benchmarks impedes fair comparisons and systematic evaluations of control strategies under realistic operating conditions. This paper presents a comprehensive benchmarking framework for evaluating mobile robot controllers in greenhouse environments. The proposed framework integrates an accurate three-dimensional model of the environment, a physics-based simulator, and a hierarchical control architecture comprising low-, mid-, and high-level control layers. Three benchmark categories are defined to enable modular assessment, ranging from actuator-level control to full autonomous navigation. Additionally, three disturbance scenarios (payload variation, terrain type, and slope) are explicitly modeled to replicate real-world agricultural conditions. To ensure objective and reproducible evaluation, standardized performance metrics are introduced, including the Squared Absolute Error (SAE), the Squared Control Input (SCI), and composite performance indices. Statistical analysis based on repeated trials is employed to mitigate the influence of sensor noise and environmental variability. The framework is further enhanced by a plugin-based architecture that facilitates seamless integration of user-defined controllers and planners. The proposed benchmark provides a robust and extensible tool for the quantitative comparison of classical, predictive, and planning-based control strategies in realistic conditions, bridging the gap between simulation-based analysis and real-world agroindustrial applications.
The brachistochrone, the curve of fastest descent under gravity, is a cycloid when friction is absent. Underwater, however, buoyancy, viscous drag, and the added mass of entrained fluid fundamentally alter the problem. We formulate and solve the brachistochrone for a body moving through a dense fluid, incorporating all three effects together with a Reynolds-number-dependent drag coefficient. The classical cycloid becomes increasingly suboptimal as the body density approaches the fluid density, and below a critical density ratio it fails to reach the endpoint altogether. Near the critical Reynolds number for the drag crisis, the optimal trajectory is acutely sensitive to the density ratio and object size; constant-drag approximations can yield qualitatively incorrect paths. A decomposition of physical effects shows that neglecting drag and added mass together yields a predicted transit time roughly half the realised minimum, and that omitting added mass alone underestimates the transit time by approximately 20%. We extend the formulation to a three-point brachistochrone in which the trajectory must pass through an intermediate waypoint, revealing a finite reachable domain that is absent in the classical problem. The underwater brachistochrone as presented here provides a simple planning tool for short-range trajectories of buoyancy-driven underwater vehicles.
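The force balance underlying such a formulation can be sketched as follows; the notation here is ours and illustrative, not necessarily the paper's. For a body of density $\rho_b$, volume $V$, and frontal area $A$ moving with speed $v$ through a fluid of density $\rho_f$ along a path inclined at angle $\theta$ below the horizontal, Newton's second law with buoyancy, quadratic drag, and an added-mass coefficient $C_a$ reads

$$(\rho_b + C_a\,\rho_f)\,V\,\frac{dv}{dt} \;=\; (\rho_b - \rho_f)\,V g \sin\theta \;-\; \tfrac{1}{2}\,C_d(\mathrm{Re})\,\rho_f\,A\,v^2,$$

with $\mathrm{Re} = \rho_f v L/\mu$ for characteristic length $L$ and dynamic viscosity $\mu$. The driving term vanishes as $\rho_b \to \rho_f$, consistent with a critical density ratio below which the endpoint becomes unreachable, and the $C_d(\mathrm{Re})$ dependence is what makes the optimal trajectory sensitive near the drag crisis.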
Supernumerary robotic limbs (SLs) have the potential to transform a wide range of human activities, yet their usability remains limited by key technical challenges, particularly in ensuring safety and achieving versatile control. Here, we address the critical problem of maintaining balance in the human-SL system, a prerequisite for safe and comfortable augmentation tasks. Unlike previous approaches that developed SLs specifically for stability support, we propose a general framework for preserving balance with SLs designed for generic use. Our hierarchical three-layer architecture consists of: (i) a prediction layer that estimates human trunk and center of mass (CoM) dynamics, (ii) a planning layer that generates optimal CoM trajectories to counteract trunk movements and computes the corresponding SL control inputs, and (iii) a control layer that executes these inputs on the SL hardware. We evaluated the framework with ten participants performing forward and lateral bending tasks. The results show a clear reduction in stance instability, demonstrating the framework's effectiveness in enhancing balance. This work paves the way towards safe and versatile human-SL interaction. [This paper has been submitted for publication to IEEE.]
Advanced Aerial Mobility (AAM) operations require strategic flight planning services that predict both spatial and temporal uncertainties to safely validate flight plans against hazards such as weather cells, restricted airspaces, and CNS disruption areas. Current uncertainty estimation methods for AAM vehicles rely on conservative linear models due to limited real-world performance data. This paper presents a novel Kalman Filter-based uncertainty propagation method that models AAM Flight Management System (FMS) architectures through sigmoid-blended measurement noise covariance. Unlike existing approaches with fixed uncertainty thresholds, our method continuously adapts the filter's measurement trust based on progress toward waypoints, enabling FMS correction behavior to emerge naturally. The approach scales proportionally with control inputs and is tunable to match specific aircraft characteristics or route conditions. We validate the method using real ADS-B data from general aviation aircraft divided into training and verification sets. Uncertainty propagation parameters were tuned on the training set, achieving 76% accuracy in predicting arrival times when compared against the verification dataset, demonstrating the method's effectiveness for strategic flight plan validation in AAM operations.
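A scalar sketch of the sigmoid-blended measurement noise covariance may help make the mechanism concrete; all gains, noise levels, and blending parameters below are illustrative assumptions, not the paper's values. As progress toward the waypoint increases, the measurement covariance R is blended from a loose value to a tight one, so the filter's trust in measurements, and hence its corrective behavior, grows smoothly rather than switching at a fixed threshold:

```python
import math

def blended_R(progress, R_far=25.0, R_near=1.0, midpoint=0.7, steepness=12.0):
    """Sigmoid blend of measurement noise covariance: far from the waypoint
    the filter trusts measurements little (large R); near it, much more."""
    w = 1.0 / (1.0 + math.exp(-steepness * (progress - midpoint)))
    return (1.0 - w) * R_far + w * R_near

def kf_step(x, P, z, u, Q=0.5, R=1.0):
    """One predict/update cycle of a scalar Kalman filter with a control
    input u (e.g. planned along-track velocity times the time step)."""
    x_pred = x + u
    P_pred = P + Q
    K = P_pred / (P_pred + R)          # gain shrinks as R grows
    x_new = x_pred + K * (z - x_pred)
    P_new = (1.0 - K) * P_pred
    return x_new, P_new

# Trust in the measurement rises sharply as the leg nears completion.
print(round(blended_R(0.1), 2))   # early in the leg: close to R_far
print(round(blended_R(0.95), 2))  # near the waypoint: far below R_far
```

Because R enters only through the gain K, the FMS-like correction toward the waypoint emerges from the blending itself, and retuning `midpoint`/`steepness` is one way to match the behavior to a specific aircraft or route, in the spirit of the tunability the abstract describes.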