Spatial Planning is a crucial part in the field of spatial intelligence, which requires the understanding and planning about object arrangements in space perspective. AI agents with the spatial planning ability can better adapt to various real-world applications, including robotic manipulation, automatic assembly, urban planning etc. Recent works have attempted to construct benchmarks for evaluating the spatial intelligence of Multimodal Large Language Models (MLLMs). Nevertheless, these benchmarks primarily focus on spatial reasoning based on typical Visual Question-Answering (VQA) forms, which suffers from the gap between abstract spatial understanding and concrete task execution. In this work, we take a step further to build a comprehensive benchmark called MineAnyBuild, aiming to evaluate the spatial planning ability of open-world AI agents in the Minecraft game. Specifically, MineAnyBuild requires an agent to generate executable architecture building plans based on the given multi-modal human instructions. It involves 4,000 curated spatial planning tasks and also provides a paradigm for infinitely expandable data collection by utilizing rich player-generated content. MineAnyBuild evaluates spatial planning through four core supporting dimensions: spatial understanding, spatial reasoning, creativity, and spatial commonsense. Based on MineAnyBuild, we perform a comprehensive evaluation for existing MLLM-based agents, revealing the severe limitations but enormous potential in their spatial planning abilities. We believe our MineAnyBuild will open new avenues for the evaluation of spatial intelligence and help promote further development for open-world AI agents capable of spatial planning.
Despite recent advances in multimodal content generation enabled by vision-language models (VLMs), their ability to reason about and generate structured 3D scenes remains largely underexplored. This limitation constrains their utility in spatially grounded tasks such as embodied AI, immersive simulations, and interactive 3D applications. We introduce a new paradigm that enables VLMs to generate, understand, and edit complex 3D environments by injecting a continually evolving spatial context. Constructed from multimodal input, this context consists of three components: a scene portrait that provides a high-level semantic blueprint, a semantically labeled point cloud capturing object-level geometry, and a scene hypergraph that encodes rich spatial relationships, including unary, binary, and higher-order constraints. Together, these components provide the VLM with a structured, geometry-aware working memory that integrates its inherent multimodal reasoning capabilities with structured 3D understanding for effective spatial reasoning. Building on this foundation, we develop an agentic 3D scene generation pipeline in which the VLM iteratively reads from and updates the spatial context. The pipeline features high-quality asset generation with geometric restoration, environment setup with automatic verification, and ergonomic adjustment guided by the scene hypergraph. Experiments show that our framework can handle diverse and challenging inputs, achieving a level of generalization not observed in prior work. Further results demonstrate that injecting spatial context enables VLMs to perform downstream tasks such as interactive scene editing and path planning, suggesting strong potential for spatially intelligent systems in computer graphics, 3D vision, and embodied applications.
This paper employs a game-theoretic approach to analyze investment decisions in Ireland's electricity market. It compares optimal electricity investment strategies among energy generators under a perfect competition framework with an imperfect Nash-Cournot competition. The model incorporates market price based on competition among generators while accounting for the supply capacity of each firm and each technology, along with the System Non-Synchronous Penetration (SNSP) constraint to reflect operational limitations in renewable energy contribution to the power system. Both models are formulated as single-objective function optimization problems. Furthermore, unit commitment constraints are introduced to the perfect competition model, allowing the model to incorporate binary decision variables to capture energy unit scheduling decisions of online status, startup, and shutdown costs. The proposed models are evaluated under three different demand test cases, using Ireland's electricity generation projections for 2023 to 2033. The results highlight key differences in investment decisions, carbon emissions, and the contribution of renewable technologies in perfect and imperfect competition structures. The findings provide managerial insights for policymakers and stakeholders, supporting optimal investment decisions and generation capacity planning to achieve Ireland's long-term energy objectives.
High Speed multi-vehicle Autonomous Racing will increase the safety and performance of road-going Autonomous Vehicles. Precise vehicle detection and dynamics estimation from a moving platform is a key requirement for planning and executing complex autonomous overtaking maneuvers. To address this requirement, we have developed a Latency-Aware EKF-based Multi Target Tracking algorithm fusing LiDAR and RADAR measurements. The algorithm explots the different sensor characteristics by explicitly integrating the Range Rate in the EKF Measurement Function, as well as a-priori knowledge of the racetrack during state prediction. It can handle Out-Of-Sequence Measurements via Reprocessing using a double State and Measurement Buffer, ensuring sensor delay compensation with no information loss. This algorithm has been implemented on Team PoliMOVE's autonomous racecar, and was proved experimentally by completing a number of fully autonomous overtaking maneuvers at speeds up to 275 km/h.
Due to the powerful vision-language reasoning and generalization abilities, multimodal large language models (MLLMs) have garnered significant attention in the field of end-to-end (E2E) autonomous driving. However, their application to closed-loop systems remains underexplored, and current MLLM-based methods have not shown clear superiority to mainstream E2E imitation learning approaches. In this work, we propose ReasonPlan, a novel MLLM fine-tuning framework designed for closed-loop driving through holistic reasoning with a self-supervised Next Scene Prediction task and supervised Decision Chain-of-Thought process. This dual mechanism encourages the model to align visual representations with actionable driving context, while promoting interpretable and causally grounded decision making. We curate a planning-oriented decision reasoning dataset, namely PDR, comprising 210k diverse and high-quality samples. Our method outperforms the mainstream E2E imitation learning method by a large margin of 19% L2 and 16.1 driving score on Bench2Drive benchmark. Furthermore, ReasonPlan demonstrates strong zero-shot generalization on unseen DOS benchmark, highlighting its adaptability in handling zero-shot corner cases. Code and dataset will be found in https://github.com/Liuxueyi/ReasonPlan.
Urban-induced changes in local microclimate, such as the urban heat island effect and air pollution, are known to vary with city size, leading to distinctive relations between average climate variables and city-scale quantities (e.g., total population or area). However, these approaches suffer from biases related to the choice of city boundaries and they neglect intra-urban variations of urban characteristics. Here we use high-resolution data of urban temperatures, air quality, population counts, and street intersections from 142 cities worldwide and show that their marginal and joint probability distributions follow universal scaling functions. By using a logarithmic relation between urban spatial features and climate variables, we show that average street network properties are sufficient to characterize the entire variability of the temperature and air pollution fields observed within and across cities. We further demonstrate that traditional models linking climate variables to the distance from the city center fail to reproduce the observed distributions unless the stochasticity of urban structure is fully considered. These findings provide a unified statistical framework for characterizing intra-urban climate variability, with important implications for climate modelling and urban planning.
Recent progress in strengthening the capabilities of large language models has stemmed from applying reinforcement learning to domains with automatically verifiable outcomes. A key question is whether we can similarly use RL to optimize for outcomes in domains where evaluating outcomes inherently requires human feedback; for example, in tasks like deep research and trip planning, outcome evaluation is qualitative and there are many possible degrees of success. One attractive and scalable modality for collecting human feedback is preference data: ordinal rankings (pairwise or $k$-wise) that indicate, for $k$ given outcomes, which one is preferred. In this work, we study a critical roadblock: preference data fundamentally and significantly limits outcome-based optimization. Even with idealized preference data (infinite, noiseless, and online), the use of ordinal feedback can prevent obtaining even approximately optimal solutions. We formalize this impossibility using voting theory, drawing an analogy between how a model chooses to answer a query with how voters choose a candidate to elect. This indicates that grounded human scoring and algorithmic innovations are necessary for extending the success of RL post-training to domains demanding human feedback. We also explore why these limitations have disproportionately impacted RLHF when it comes to eliciting reasoning behaviors (e.g., backtracking) versus situations where RLHF has been historically successful (e.g., instruction-tuning and safety training), finding that the limitations of preference data primarily suppress RLHF's ability to elicit robust strategies -- a class that encompasses most reasoning behaviors.
With the recent success of world-model agents, which extend the core idea of model-based reinforcement learning by learning a differentiable model for sample-efficient control across diverse tasks, active inference (AIF) offers a complementary, neuroscience-grounded paradigm that unifies perception, learning, and action within a single probabilistic framework powered by a generative model. Despite this promise, practical AIF agents still rely on accurate immediate predictions and exhaustive planning, a limitation that is exacerbated in delayed environments requiring plans over long horizons, tens to hundreds of steps. Moreover, most existing agents are evaluated on robotic or vision benchmarks which, while natural for biological agents, fall short of real-world industrial complexity. We address these limitations with a generative-policy architecture featuring (i) a multi-step latent transition that lets the generative model predict an entire horizon in a single look-ahead, (ii) an integrated policy network that enables the transition and receives gradients of the expected free energy, (iii) an alternating optimization scheme that updates model and policy from a replay buffer, and (iv) a single gradient step that plans over long horizons, eliminating exhaustive planning from the control loop. We evaluate our agent in an environment that mimics a realistic industrial scenario with delayed and long-horizon settings. The empirical results confirm the effectiveness of the proposed approach, demonstrating the coupled world-model with the AIF formalism yields an end-to-end probabilistic controller capable of effective decision making in delayed, long-horizon settings without handcrafted rewards or expensive planning.
Understanding pain-related facial behaviors is essential for digital healthcare in terms of effective monitoring, assisted diagnostics, and treatment planning, particularly for patients unable to communicate verbally. Existing data-driven methods of detecting pain from facial expressions are limited due to interpretability and severity quantification. To this end, we propose GraphAU-Pain, leveraging a graph-based framework to model facial Action Units (AUs) and their interrelationships for pain intensity estimation. AUs are represented as graph nodes, with co-occurrence relationships as edges, enabling a more expressive depiction of pain-related facial behaviors. By utilizing a relational graph neural network, our framework offers improved interpretability and significant performance gains. Experiments conducted on the publicly available UNBC dataset demonstrate the effectiveness of the GraphAU-Pain, achieving an F1-score of 66.21% and accuracy of 87.61% in pain intensity estimation.
Electrical grid congestion has emerged as an immense challenge in Europe, making the forecasting of load and its associated metrics increasingly crucial. Among these metrics, peak load is fundamental. Non-time-resolved models of peak load have their advantages of being simple and compact, and among them Velander's formula (VF) is widely used in distribution network planning. However, several aspects of VF remain inadequately addressed, including year-ahead prediction, scaling of customers, aggregation, and, most importantly, the lack of probabilistic elements. The present paper proposes a quantile interpretation of VF that enables VF to learn truncated cumulative distribution functions of peak loads with multiple quantile regression under non-crossing constraints. The evaluations on non-residential customer data confirmed its ability to predict peak load year ahead, to fit customers with a wide range of electricity consumptions, and to model aggregations of customers. A noteworthy finding is that for a given electricity consumption, aggregations of customers have statistically larger peak loads than a single customer.
Reactive intelligence remains one of the cornerstones of versatile robotics operating in cluttered, dynamic, and human-centred environments. Among reactive approaches, potential fields (PF) continue to be widely adopted due to their simplicity and real-time applicability. However, existing PF methods typically oversimplify environmental representations by relying on isotropic, point- or sphere-based obstacle approximations. In human-centred settings, this simplification results in overly conservative paths, cumbersome tuning, and computational overhead -- even breaking real-time requirements. In response, we propose the Geometric Potential Field (GeoPF), a reactive motion-planning framework that explicitly infuses geometric primitives - points, lines, planes, cubes, and cylinders - into real-time planning. By leveraging precise closed-form distance functions, GeoPF significantly reduces computational complexity and parameter tuning effort. Extensive quantitative analyses consistently show GeoPF's higher success rates, reduced tuning complexity (a single parameter set across experiments), and substantially lower computational costs (up to 2 orders of magnitude) compared to traditional PF methods. Real-world experiments further validate GeoPF's robustness and practical ease of deployment. GeoPF provides a fresh perspective on reactive planning problems driving geometric-aware temporal motion generation, enabling flexible and low-latency motion planning suitable for modern robotic applications.
Ratio-based biomarkers -- such as the proportion of necrotic tissue within a tumor -- are widely used in clinical practice to support diagnosis, prognosis and treatment planning. These biomarkers are typically estimated from soft segmentation outputs by computing region-wise ratios. Despite the high-stakes nature of clinical decision making, existing methods provide only point estimates, offering no measure of uncertainty. In this work, we propose a unified \textit{confidence-aware} framework for estimating ratio-based biomarkers. We conduct a systematic analysis of error propagation in the segmentation-to-biomarker pipeline and identify model miscalibration as the dominant source of uncertainty. To mitigate this, we incorporate a lightweight, post-hoc calibration module that can be applied using internal hospital data without retraining. We leverage a tunable parameter $Q$ to control the confidence level of the derived bounds, allowing adaptation towards clinical practice. Extensive experiments show that our method produces statistically sound confidence intervals, with tunable confidence levels, enabling more trustworthy application of predictive biomarkers in clinical workflows.
We present Look-Back and Look-Ahead Adaptive Model Predictive Control (LLA-MPC), a real-time adaptive control framework for autonomous racing that addresses the challenge of rapidly changing tire-surface interactions. Unlike existing approaches requiring substantial data collection or offline training, LLA-MPC employs a model bank for immediate adaptation without a learning period. It integrates two key mechanisms: a look-back window that evaluates recent vehicle behavior to select the most accurate model and a look-ahead horizon that optimizes trajectory planning based on the identified dynamics. The selected model and estimated friction coefficient are then incorporated into a trajectory planner to optimize reference paths in real-time. Experiments across diverse racing scenarios demonstrate that LLA-MPC outperforms state-of-the-art methods in adaptation speed and handling, even during sudden friction transitions. Its learning-free, computationally efficient design enables rapid adaptation, making it ideal for high-speed autonomous racing in multi-surface environments.
This paper presents an integrated approach for efficient path planning and energy management in hybrid unmanned aerial vehicles (HUAVs) equipped with dual fuel-electric propulsion systems. These HUAVs operate in environments that include noise-restricted zones, referred to as quiet zones, where only electric mode is permitted. We address the problem by parameterizing the position of a point along the side of the quiet zone using its endpoints and a scalar parameter, transforming the problem into a variant of finding the shortest path over a graph of convex sets. We formulate this problem as a mixed-integer convex program (MICP), which can be efficiently solved using commercial solvers. Additionally, a tight lower bound can be obtained by relaxing the path-selection variable. Through extensive computations across 200 instances over four maps, we show a substantial improvement in computational efficiency over a state-of-the-art method, achieving up to a 100-fold and 10-fold decrease in computation time for calculating the lower bound and the exact solution, respectively. Moreover, the average gap between the exact cost and the lower bound was approximately 0.24%, and the exact cost was 1.05% lower than the feasible solution from the state-of-the-art approach on average, highlighting the effectiveness of our method. We also extend our approach to plan the HUAV route to visit a set of targets and return to its starting location in environments with quiet zones, yielding a Traveling Salesman Problem (TSP). We employ two methodologies to solve the TSP: one where the SOC at each target is discretized, and another where it is assumed to be the minimum allowable level upon departure. A comparative analysis reveals the second method achieves a cost within 1.02% of the first on average while requiring significantly less computational time.
Research interest in end-to-end autonomous driving has surged owing to its fully differentiable design integrating modular tasks, i.e. perception, prediction and planing, which enables optimization in pursuit of the ultimate goal. Despite the great potential of the end-to-end paradigm, existing methods suffer from several aspects including expensive BEV (bird's eye view) computation, action diversity, and sub-optimal decision in complex real-world scenarios. To address these challenges, we propose a novel hybrid sparse-dense diffusion policy, empowered by a Vision-Language Model (VLM), called Diff-VLA. We explore the sparse diffusion representation for efficient multi-modal driving behavior. Moreover, we rethink the effectiveness of VLM driving decision and improve the trajectory generation guidance through deep interaction across agent, map instances and VLM output. Our method shows superior performance in Autonomous Grand Challenge 2025 which contains challenging real and reactive synthetic scenarios. Our methods achieves 45.0 PDMS.
Natural disasters are increasing in frequency and severity, causing hundreds of billions of dollars in damage annually and posing growing threats to infrastructure and human livelihoods. Accurate data on roofing materials is critical for modeling building vulnerability to natural hazards such as earthquakes, floods, wildfires, and hurricanes, yet such data remain unavailable. To address this gap, we introduce RoofNet, the largest and most geographically diverse novel multimodal dataset to date, comprising over 51,500 samples from 184 geographically diverse sites pairing high-resolution Earth Observation (EO) imagery with curated text annotations for global roof material classification. RoofNet includes geographically diverse satellite imagery labeled with 14 key roofing types -- such as asphalt shingles, clay tiles, and metal sheets -- and is designed to enhance the fidelity of global exposure datasets through vision-language modeling (VLM). We sample EO tiles from climatically and architecturally distinct regions to construct a representative dataset. A subset of 6,000 images was annotated in collaboration with domain experts to fine-tune a VLM. We used geographic- and material-aware prompt tuning to enhance class separability. The fine-tuned model was then applied to the remaining EO tiles, with predictions refined through rule-based and human-in-the-loop verification. In addition to material labels, RoofNet provides rich metadata including roof shape, footprint area, solar panel presence, and indicators of mixed roofing materials (e.g., HVAC systems). RoofNet supports scalable, AI-driven risk assessment and serves as a downstream benchmark for evaluating model generalization across regions -- offering actionable insights for insurance underwriting, disaster preparedness, and infrastructure policy planning.
To address the gaps between the static pre-set "thinking-planning-action" of humanoid robots in unfamiliar scenarios and the highly programmed "call tool-return result" due to the lack of autonomous coding capabilities, this work designs a dynamic architecture connecting continuous thought machines (CTM) and model context protocol (MCP). It proposes a theoretical parallel solution through tick-slab and uses rank compression to achieve parameter suppression to provide a solution for achieving autonomous actions due to autonomous coding. The researcher used a simulation-based experiment using OpenAI's o4-mini-high as a tool to build the experimental environment, and introduced the extended SayCan dataset to conduct nine epochs of experiments. The experimental results show that the CTM-MCP architecture is feasible and effective through the data results of seven metrics: task success rate (TSR), execution success rate (ESR), average episode length (AEL), ROSCOE, REVEAL, proficiency self-assessment (PSA), task effectiveness (TE). In practice, it provides a reference experience for exploring the autonomous dynamic coding of humanoid robots based on continuous thinking to achieve human-like autonomous actions.
In the evolving digital landscape, it is crucial to study the dynamics of cyberattacks and defences. This study uses an Evolutionary Game Theory (EGT) framework to investigate the evolutionary dynamics of attacks and defences in cyberspace. We develop a two-population asymmetric game between attacker and defender to capture the essential factors of costs, potential benefits, and the probability of successful defences. Through mathematical analysis and numerical simulations, we find that systems with high defence intensities show stability with minimal attack frequencies, whereas low-defence environments show instability, and are vulnerable to attacks. Furthermore, we find five equilibria, where the strategy pair always defend and attack emerged as the most likely stable state as cyber domain is characterised by a continuous battle between defenders and attackers. Our theoretical findings align with real-world data from past cyber incidents, demonstrating the interdisciplinary impact, such as fraud detection, risk management and cybersecurity decision-making. Overall, our analysis suggests that adaptive cybersecurity strategies based on EGT can improve resource allocation, enhance system resilience, and reduce the overall risk of cyberattacks. By incorporating real-world data, this study demonstrates the applicability of EGT in addressing the evolving nature of cyber threats and the need for secure digital ecosystems through strategic planning and proactive defence measures.
Multi-Agent Path Finding (MAPF) is a fundamental problem in artificial intelligence and robotics, requiring the computation of collision-free paths for multiple agents navigating from their start locations to designated goals. As autonomous systems become increasingly prevalent in warehouses, urban transportation, and other complex environments, MAPF has evolved from a theoretical challenge to a critical enabler of real-world multi-robot coordination. This comprehensive survey bridges the long-standing divide between classical algorithmic approaches and emerging learning-based methods in MAPF research. We present a unified framework that encompasses search-based methods (including Conflict-Based Search, Priority-Based Search, and Large Neighborhood Search), compilation-based approaches (SAT, SMT, CSP, ASP, and MIP formulations), and data-driven techniques (reinforcement learning, supervised learning, and hybrid strategies). Through systematic analysis of experimental practices across 200+ papers, we uncover significant disparities in evaluation methodologies, with classical methods typically tested on larger-scale instances (up to 200 by 200 grids with 1000+ agents) compared to learning-based approaches (predominantly 10-100 agents). We provide a comprehensive taxonomy of evaluation metrics, environment types, and baseline selections, highlighting the need for standardized benchmarking protocols. Finally, we outline promising future directions including mixed-motive MAPF with game-theoretic considerations, language-grounded planning with large language models, and neural solver architectures that combine the rigor of classical methods with the flexibility of deep learning. This survey serves as both a comprehensive reference for researchers and a practical guide for deploying MAPF solutions in increasingly complex real-world applications.
Optimal Transport (OT) is a resource allocation problem with applications in biology, data science, economics and statistics, among others. In some of the applications, practitioners have access to samples which approximate the continuous measure. Hence the quantities of interest derived from OT -- plans, maps and costs -- are only available in their empirical versions. Statistical inference on OT aims at finding confidence intervals of the population plans, maps and costs. In recent years this topic gained an increasing interest in the statistical community. In this paper we provide a comprehensive review of the most influential results on this research field, underlying the some of the applications. Finally, we provide a list of open problems.
In this work, we introduce SPADE, a path planning framework designed for autonomous navigation in dynamic environments using 3D scene graphs. SPADE combines hierarchical path planning with local geometric awareness to enable collision-free movement in dynamic scenes. The framework bifurcates the planning problem into two: (a) solving the sparse abstract global layer plan and (b) iterative path refinement across denser lower local layers in step with local geometric scene navigation. To ensure efficient extraction of a feasible route in a dense multi-task domain scene graphs, the framework enforces informed sampling of traversable edges prior to path-planning. This removes extraneous information not relevant to path-planning and reduces the overall planning complexity over a graph. Existing approaches address the problem of path planning over scene graphs by decoupling hierarchical and geometric path evaluation processes. Specifically, this results in an inefficient replanning over the entire scene graph when encountering path obstructions blocking the original route. In contrast, SPADE prioritizes local layer planning coupled with local geometric scene navigation, enabling navigation through dynamic scenes while maintaining efficiency in computing a traversable route. We validate SPADE through extensive simulation experiments and real-world deployment on a quadrupedal robot, demonstrating its efficacy in handling complex and dynamic scenarios.
Reinforcement learning (RL) is increasingly applied to real-world problems involving complex and structured decisions, such as routing, scheduling, and assortment planning. These settings challenge standard RL algorithms, which struggle to scale, generalize, and exploit structure in the presence of combinatorial action spaces. We propose Structured Reinforcement Learning (SRL), a novel actor-critic framework that embeds combinatorial optimization layers into the actor neural network. We enable end-to-end learning of the actor via Fenchel-Young losses and provide a geometric interpretation of SRL as a primal-dual algorithm in the dual of the moment polytope. Across six environments with exogenous and endogenous uncertainty, SRL matches or surpasses the performance of unstructured RL and imitation learning on static tasks and improves over these baselines by up to 92% on dynamic problems, with improved stability and convergence speed.
Accurate tumour segmentation is vital for various targeted diagnostic and therapeutic procedures for cancer, e.g., planning biopsies or tumour ablations. Manual delineation is extremely labour-intensive, requiring substantial expert time. Fully-supervised machine learning models aim to automate such localisation tasks, but require a large number of costly and often subjective 3D voxel-level labels for training. The high-variance and subjectivity in such labels impacts model generalisability, even when large datasets are available. Histopathology labels may offer more objective labels but the infeasibility of acquiring pixel-level annotations to develop tumour localisation methods based on histology remains challenging in-vivo. In this work, we propose a novel weakly-supervised semantic segmentation framework called SPARS (Self-Play Adversarial Reinforcement Learning for Segmentation), which utilises an object presence classifier, trained on a small number of image-level binary cancer presence labels, to localise cancerous regions on CT scans. Such binary labels of patient-level cancer presence can be sourced more feasibly from biopsies and histopathology reports, enabling a more objective cancer localisation on medical images. Evaluating with real patient data, we observed that SPARS yielded a mean dice score of $77.3 \pm 9.4$, which outperformed other weakly-supervised methods by large margins. This performance was comparable with recent fully-supervised methods that require voxel-level annotations. Our results demonstrate the potential of using SPARS to reduce the need for extensive human-annotated labels to detect cancer in real-world healthcare settings.
Modern end-to-end autonomous driving systems suffer from a critical limitation: their planners lack mechanisms to enforce temporal consistency between predicted trajectories and evolving scene dynamics. This absence of self-supervision allows early prediction errors to compound catastrophically over time. We introduce Echo Planning, a novel self-correcting framework that establishes a closed-loop Current - Future - Current (CFC) cycle to harmonize trajectory prediction with scene coherence. Our key insight is that plausible future trajectories must be bi-directionally consistent, ie, not only generated from current observations but also capable of reconstructing them. The CFC mechanism first predicts future trajectories from the Bird's-Eye-View (BEV) scene representation, then inversely maps these trajectories back to estimate the current BEV state. By enforcing consistency between the original and reconstructed BEV representations through a cycle loss, the framework intrinsically penalizes physically implausible or misaligned trajectories. Experiments on nuScenes demonstrate state-of-the-art performance, reducing L2 error by 0.04 m and collision rate by 0.12% compared to one-shot planners. Crucially, our method requires no additional supervision, leveraging the CFC cycle as an inductive bias for robust planning. This work offers a deployable solution for safety-critical autonomous systems.
This report describes an extension of the distributed job scheduling and SAT solving platform Mallob by incremental SAT solving, embedded in a case study on SAT-based hierarchical planning. We introduce a low-latency interface for incremental jobs and specifically for IPASIR-style incremental SAT solving to Mallob. This also allows to process many independent planning instances in parallel via Mallob's scheduling capabilities. In an experiment where 587 planning inputs are resolved in parallel on 2348 cores, we observe significant speedups for several planning domains where SAT solving constitutes a major part of the planner's running time. These findings indicate that our approach to distributed incremental SAT solving may be useful for a wide range of SAT applications.
Wind-wave and ocean current interactions affect critical coastal and oceanic processes, yet modeling these interactions presents significant challenges. The western North Atlantic Ocean provides an ideal test environment for coupled hydrodynamics and wind wave models, thanks to its energetic surface currents such as the Gulf Stream. This study evaluates a high-resolution coupled SCHISM WWM III model, utilizing NOAA's 'STOFS-3D-Atlantic' computational mesh, while incorporating three-dimensional baroclinic dynamics to account for density stratification effects. We evaluate the model's calculated water level and tidal predictions against NOAA tide gauge measurements during December 2016. The coupled model demonstrates robust skills in reproducing tidal constituents, non-tidal components, and total water level predictions along the U.S. East and Gulf of Mexico Coasts. In addition, we systematically evaluate three wave physics parameterizations (Ardhuin, Makin and Stam, and Cycle Three) in the spectral wave model to quantify their effects on the modeled wave characteristics. This validated modeling framework enhances our ability to understand and predict complex coastal and oceanic processes, offering significant applications for coastal management, maritime operations, and climate adaptation planning throughout the western North Atlantic region.
The fourth observing run (O4) of Advanced LIGO, Virgo, and KAGRA has started in May 2023 and is planned to continue until October 2025. On behalf of the LVK Collaboration, I will cover two topics: Status of the O4 run and latest non-CBC results. Status of the O4 run. The focus will be on detectors' performance and online searches/alerts, drawing on publicly available sources provided by the collaboration. Additionally, I will give an overview of removing noise techniques, including AI approaches that help gain sensitivity at a small cost. Latest non-CBC results. Compact Binary Coalescence (CBC) is just one of the potential GW sources: Continuous Waves, Bursts, and Stochastic are still being hunted down. Here, O4 public results of searches will be presented, or the latest O3 will be discussed when the former are not yet available. So far, no GW detections have been associated with these non-CBC sources in any of the searches conducted.
Efficient tabletop rearrangement planning seeks to find high-quality solutions while minimizing total cost. However, the task is challenging due to object dependencies and limited buffer space for temporary placements. The complexity increases for mobile robots, which must navigate around the table with restricted access. A*-based methods yield high-quality solutions, but struggle to scale as the number of objects increases. Monte Carlo Tree Search (MCTS) has been introduced as an anytime algorithm, but its convergence speed to high-quality solutions remains slow. Previous work~\cite{strap2024} accelerated convergence but required the robot to move to the closest position to the object for each pick and place operation, leading to inefficiencies. To address these limitations, we extend the planner by introducing a more efficient strategy for mobile robots. Instead of selecting the nearest available location for each action, our approach allows multiple operations (e.g., pick-and-place) from a single standing position, reducing unnecessary movement. Additionally, we incorporate state re-exploration to further improve plan quality. Experimental results show that our planner outperforms existing planners both in terms of solution quality and planning time.
Off-road navigation remains challenging for autonomous robots due to the harsh terrain and clustered obstacles. In this letter, we extend the YOPO (You Only Plan Once) end-to-end navigation framework to off-road environments, explicitly focusing on forest terrains, consisting of a high-performance, multi-sensor supported off-road simulator YOPO-Sim, a zero-shot transfer sim-to-real planner YOPO-Rally, and an MPC controller. Built on the Unity engine, the simulator can generate randomized forest environments and export depth images and point cloud maps for expert demonstrations, providing competitive performance with mainstream simulators. Terrain Traversability Analysis (TTA) processes cost maps, generating expert trajectories represented as non-uniform cubic Hermite curves. The planner integrates TTA and the pathfinding into a single neural network that inputs the depth image, current velocity, and the goal vector, and outputs multiple trajectory candidates with costs. The planner is trained by behavior cloning in the simulator and deployed directly into the real-world without fine-tuning. Finally, a series of simulated and real-world experiments is conducted to validate the performance of the proposed framework.
Modeling human mobility across diverse cities is essential for applications such as urban planning, transportation optimization, and personalized services. However, generalization remains challenging due to heterogeneous spatial representations and mobility patterns across cities. Existing methods typically rely on numerical coordinates or require training city-specific models, limiting their scalability and transferability. We propose TrajMoE, a unified and scalable model for cross-city human mobility modeling. TrajMoE addresses two key challenges: (1) inconsistent spatial semantics across cities, and (2) diverse urban mobility patterns. To tackle these, we begin by designing a spatial semantic encoder that learns transferable location representations from POI-based functional semantics and visit patterns. Furthermore, we design a Spatially-Aware Mixture-of-Experts (SAMoE) Transformer that injects structured priors into experts specialized in distinct mobility semantics, along with a shared expert to capture city-invariant patterns and enable adaptive cross-city generalization. Extensive experiments demonstrate that TrajMoE achieves up to 27% relative improvement over competitive mobility foundation models after only one epoch of fine-tuning, and consistently outperforms full-data baselines using merely 5% of target city data. These results establish TrajMoE as a significant step toward realizing a truly generalizable, transferable, and pretrainable foundation model for human mobility.