Achieving robust cognitive autonomy in robots navigating complex, unpredictable environments remains a fundamental challenge in robotics. This paper presents Underwater Robot Self-Organizing Autonomy (UROSA), a groundbreaking architecture leveraging distributed Large Language Model AI agents integrated within the Robot Operating System 2 (ROS 2) framework to enable advanced cognitive capabilities in Autonomous Underwater Vehicles. UROSA decentralises cognition into specialised AI agents responsible for multimodal perception, adaptive reasoning, dynamic mission planning, and real-time decision-making. Central innovations include flexible agents dynamically adapting their roles, retrieval-augmented generation utilising vector databases for efficient knowledge management, reinforcement learning-driven behavioural optimisation, and autonomous on-the-fly ROS 2 node generation for runtime functional extensibility. Extensive empirical validation demonstrates UROSA's promising adaptability and reliability through realistic underwater missions in simulation and real-world deployments, showing significant advantages over traditional rule-based architectures in handling unforeseen scenarios, environmental uncertainties, and novel mission objectives. This work not only advances underwater autonomy but also establishes a scalable, safe, and versatile cognitive robotics framework capable of generalising to a diverse array of real-world applications.
Clinical trials are a systematic endeavor to assess the safety and efficacy of new drugs or treatments. Conducting such trials typically demands significant financial investment and meticulous planning, highlighting the need for accurate predictions of trial outcomes. Accurately predicting patient enrollment, a key factor in trial success, is one of the primary challenges during the planning phase. In this work, we propose a novel deep learning-based method to address this critical challenge. Our method, implemented as a neural network model, leverages pre-trained language models (PLMs) to capture the complexities and nuances of clinical documents, transforming them into expressive representations. These representations are then combined with encoded tabular features via an attention mechanism. To account for uncertainties in enrollment prediction, we enhance the model with a probabilistic layer based on the Gamma distribution, which enables range estimation. We apply the proposed model to predict clinical trial duration, assuming site-level enrollment follows a Poisson-Gamma process. We carry out extensive experiments on real-world clinical trial data and show that the proposed method effectively predicts the number of patients enrolled at individual sites for a given clinical trial, outperforming established baseline models.
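The Gamma output layer described above lends itself to a compact sketch. Below is a minimal, hypothetical PyTorch version (the module name, dimensions, and dummy data are illustrative assumptions, not the paper's implementation): the head parameterises a Gamma distribution over site-level enrollment rates, training maximises the likelihood of observed enrollment, and prediction intervals come from empirical quantiles of samples.

```python
import torch
import torch.nn as nn
from torch.distributions import Gamma

class GammaEnrollmentHead(nn.Module):
    """Maps fused text + tabular features to a Gamma distribution over
    site-level enrollment rates, enabling point and range estimates."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.fc = nn.Linear(hidden_dim, 2)  # raw concentration and rate

    def forward(self, h: torch.Tensor) -> Gamma:
        params = nn.functional.softplus(self.fc(h)) + 1e-6  # enforce positivity
        concentration, rate = params.unbind(dim=-1)
        return Gamma(concentration, rate)

# Usage sketch: maximize log-likelihood of observed rates, then report ranges.
head = GammaEnrollmentHead(hidden_dim=256)
features = torch.randn(8, 256)                    # dummy fused representations
dist = head(features)
loss = -dist.log_prob(torch.rand(8) + 0.1).mean()  # negative log-likelihood
samples = dist.sample((2000,))                     # Monte Carlo range estimate
lower = samples.quantile(0.05, dim=0)
upper = samples.quantile(0.95, dim=0)
```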
Decentralized Multi-Agent Reinforcement Learning (MARL) methods allow for learning scalable multi-agent policies, but suffer from partial observability and induced non-stationarity. These challenges can be addressed by introducing mechanisms that facilitate coordination and high-level planning. Specifically, coordination and temporal abstraction can be achieved through communication (e.g., message passing) and Hierarchical Reinforcement Learning (HRL) approaches to decision-making. However, optimization issues limit the applicability of hierarchical policies to multi-agent systems. As such, the combination of these approaches has not been fully explored. To fill this void, we propose a novel and effective methodology for learning multi-agent hierarchies of message-passing policies. We adopt the feudal HRL framework and rely on a hierarchical graph structure for planning and coordination among agents. Agents at lower levels in the hierarchy receive goals from the upper levels and exchange messages with neighboring agents at the same level. To learn hierarchical multi-agent policies, we design a novel reward-assignment method based on training the lower-level policies to maximize the advantage function associated with the upper levels. Results on relevant benchmarks show that our method performs favorably compared to the state of the art.
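As a rough illustration of the reward-assignment idea (notation assumed here, not taken from the paper), the lower-level policy at level $\ell$ pursuing goal $g_t$ issued from above can be rewarded with the upper level's advantage:

```latex
r^{(\ell)}_t \;=\; A^{(\ell+1)}(s_t, g_t) \;=\; Q^{(\ell+1)}(s_t, g_t) - V^{(\ell+1)}(s_t),
```

so lower-level behaviour is reinforced exactly when it makes the goal chosen by the level above pay off more than expected.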
Estimating electricity consumption accurately is essential for the planning and operation of energy systems, as well as for billing processes. Standard Load Profiles (SLP) are widely used to estimate consumption patterns of different user groups. However, in Germany these SLP were formulated using historical data from over 20 years ago and have not been adjusted since. Changing electricity consumption behaviour leads to increasing deviations between actual load patterns and the SLP, creating a need for a revision based on new data. The growing number of smart meters provides a large measurement database, which enables more accurate load modelling. This paper creates updated SLP using recent data. In addition, the assumptions of the SLP method are validated and improvements are proposed, taking into account ease of applicability. Furthermore, a Fourier series-based model is proposed as an alternative SLP model. The different models are compared and evaluated.
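As a sketch of the Fourier-series alternative (the harmonic count and quarter-hourly resolution below are assumptions, not the paper's settings), a truncated Fourier series can be fitted to an averaged daily profile by ordinary least squares:

```python
import numpy as np

def fit_fourier_slp(load: np.ndarray, n_harmonics: int = 6) -> np.ndarray:
    """Fit a truncated Fourier series to a periodic load profile
    (e.g. 96 quarter-hour values per day) via ordinary least squares."""
    n = len(load)
    t = np.arange(n) / n                         # normalized time in [0, 1)
    cols = [np.ones(n)]
    for k in range(1, n_harmonics + 1):
        cols += [np.cos(2 * np.pi * k * t), np.sin(2 * np.pi * k * t)]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, load, rcond=None)
    return X @ coef                              # smoothed SLP estimate

# Usage with synthetic data: a daily profile with a broad daytime peak.
hours = np.arange(96) / 4.0
profile = 1 + 0.4 * np.sin(2 * np.pi * (hours - 7) / 24) + 0.05 * np.random.randn(96)
slp_hat = fit_fourier_slp(profile)
```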
Distribution grid operation faces new challenges caused by a rising share of renewable energy sources and the introduction of additional types of loads to the grid. With the increasing adoption of distributed generation and emerging prosumer households, Energy Management Systems, which manage and apply the flexibility of connected devices, are gaining popularity. While potentially beneficial to grid capacity, strategic energy management also adds to the complexity of distribution grid operation and planning processes. Novel time-series-based planning approaches likewise face increasingly complex simulation scenarios and rising computational cost. Discrete event modelling facilitates simulations of such scenarios by restricting computation to the most relevant points in simulation time. We provide an enhancement of a discrete event distribution grid simulation software that offers fast implementation and testing of energy management algorithms, embedded in a feature-rich simulation environment. Physical models are specified using the Discrete Event System Specification. Furthermore, we contribute a communication protocol that exploits the discrete event paradigm by computing flexibility potential only when necessary.
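For reference, physical models specified with the Discrete Event System Specification follow the standard DEVS atomic-model tuple (textbook form, independent of this particular software):

```latex
M = \langle X,\, Y,\, S,\, \delta_{\mathrm{ext}},\, \delta_{\mathrm{int}},\, \lambda,\, ta \rangle,
```

with input set $X$, output set $Y$, state set $S$, external transition $\delta_{\mathrm{ext}}: Q \times X \to S$ (where $Q = \{(s,e) : s \in S,\ 0 \le e \le ta(s)\}$), internal transition $\delta_{\mathrm{int}}: S \to S$, output function $\lambda: S \to Y$, and time-advance function $ta: S \to \mathbb{R}^{+}_{0} \cup \{\infty\}$. Between events no state updates are required, which is what keeps the simulation cost proportional to the number of relevant events rather than the number of time steps.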
Multimodal planning capabilities refer to the ability to predict, reason, and design steps for task execution with multimodal context, which is essential for complex reasoning and decision-making across multiple steps. However, current benchmarks face two key challenges: (1) they cannot directly assess multimodal real-world planning capabilities, and (2) they either lack constraints across modalities or leave them only implicit. To address these issues, we introduce Multimodal Planning with Complex Constraints (MPCC), the first benchmark to systematically evaluate MLLMs' ability to handle multimodal constraints in planning. To address the first challenge, MPCC focuses on three real-world tasks: Flight Planning, Calendar Planning, and Meeting Planning. To solve the second challenge, we introduce complex constraints (e.g. budget, temporal, and spatial) in these tasks, with graded difficulty levels (EASY, MEDIUM, HARD) to separate constraint complexity from search space expansion. Experiments on 13 advanced MLLMs reveal significant challenges: closed-source models achieve only 21.3% feasible plans, while open-source models average below 11%. Additionally, we observe that MLLMs are highly sensitive to constraint complexity and that traditional multimodal prompting strategies fail in multi-constraint scenarios. Our work formalizes multimodal constraints in planning, provides a rigorous evaluation framework, and highlights the need for advancements in constraint-aware reasoning for real-world MLLM applications.
There is a growing demand for autonomous mobile robots capable of navigating unstructured agricultural environments. Tasks such as weed control in meadows require efficient path planning through an unordered set of coordinates while minimizing travel distance and adhering to curvature constraints to prevent soil damage and protect vegetation. This paper presents an integrated navigation framework combining a global path planner based on the Dubins Traveling Salesman Problem (DTSP) with a Nonlinear Model Predictive Control (NMPC) strategy for local path planning and control. The DTSP generates a minimum-length, curvature-constrained path that efficiently visits all targets, while the NMPC leverages this path to compute control signals to accurately reach each waypoint. The system's performance was validated through comparative simulation analysis on real-world field datasets, demonstrating that the coupled DTSP-based planner produced smoother and shorter paths, with a reduction of about 16% in the provided scenario, compared to decoupled methods. Based thereon, the NMPC controller effectively steered the robot to the desired waypoints, while locally optimizing the trajectory and ensuring adherence to constraints. These findings demonstrate the potential of the proposed framework for efficient autonomous navigation in agricultural environments.
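A generic form of the local NMPC problem is shown below (the weights, horizon, and curvature bound are illustrative placeholders, not the paper's exact formulation): it tracks the DTSP reference while respecting the same curvature limit the global planner enforces.

```latex
\min_{u_0,\dots,u_{N-1}} \; \sum_{k=0}^{N-1} \Big( \lVert x_k - x_k^{\mathrm{ref}} \rVert_Q^2 + \lVert u_k \rVert_R^2 \Big) + \lVert x_N - x_N^{\mathrm{ref}} \rVert_P^2
\quad \text{s.t.} \quad x_{k+1} = f(x_k, u_k), \qquad \lvert \kappa_k \rvert \le \kappa_{\max}, \qquad u_k \in \mathcal{U},
```

where $x_k^{\mathrm{ref}}$ are points sampled along the curvature-constrained DTSP path and $\kappa_{\max}$ matches the Dubins turning radius, so the local optimizer cannot undo the global path's curvature guarantees.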
We present 115 compact radio point sources in three galaxies, NGC 5474, NGC 4631 and M51, observed in the most extended (A-)configuration of the Karl G. Jansky Very Large Array at 10 GHz. Several of these compact radio point sources have diffuse counterparts identified in previous multi-band studies of resolved radio continuum emission. We find compact counterparts to eight star-forming regions, four anomalous microwave emission candidates, and one supernova remnant (SN 2011dh). Nine of the compact radio sources match X-ray counterparts, the majority of which are background galaxies. These AGN all lie within the D25 (isophotal diameter) of the host galaxy and might act as contaminants for X-ray binary population studies, highlighting the need for high-resolution multi-band imaging. This study showcases the broad range of science cases that require sensitive radio facilities, like the upcoming Square Kilometre Array and the planned next-generation Very Large Array.
Lane segment topology reasoning provides comprehensive bird's-eye view (BEV) road scene understanding, which can serve as a key perception module in planning-oriented end-to-end autonomous driving systems. Existing lane topology reasoning methods often fall short in effectively leveraging temporal information to enhance detection and reasoning performance. Recently, stream-based temporal propagation has demonstrated promising results by incorporating temporal cues at both the query and BEV levels. However, it remains limited by over-reliance on historical queries, vulnerability to pose estimation failures, and insufficient temporal propagation. To overcome these limitations, we propose FASTopoWM, a novel fast-slow lane segment topology reasoning framework augmented with latent world models. To reduce the impact of pose estimation failures, this unified framework enables parallel supervision of both historical and newly initialized queries, facilitating mutual reinforcement between the fast and slow systems. Furthermore, we introduce latent query and BEV world models conditioned on the action latent to propagate the state representations from past observations to the current timestep. This design substantially improves the performance of temporal perception within the slow pipeline. Extensive experiments on the OpenLane-V2 benchmark demonstrate that FASTopoWM outperforms state-of-the-art methods in both lane segment detection (37.4% vs. 33.6% mAP) and centerline perception (46.3% vs. 41.5% OLS).
A key challenge in deploying automated vehicles (AVs) is ensuring they make appropriate decisions in ethically challenging everyday driving situations. While much attention has been paid to rare, high-stakes dilemmas such as trolley problems, similar tensions also arise in routine scenarios, such as navigating empty intersections, where multiple human considerations, including legality and comfort, often conflict. Current AV planning systems typically rely on rigid rules, which struggle to balance these competing considerations and can lead to behaviour that misaligns with human expectations. This paper proposes a novel reasons-based trajectory evaluation framework that operationalises the tracking condition of Meaningful Human Control (MHC). The framework models the reasons of human agents, such as regulatory compliance, as quantifiable functions and evaluates how well candidate AV trajectories align with these reasons. By assigning adjustable weights to agent priorities and integrating a balance function to discourage the exclusion of any agent, the framework supports interpretable decision evaluation. Through a real-world-inspired overtaking scenario, we show how this approach reveals tensions, for instance between regulatory compliance, efficiency, and comfort. The framework functions as a modular evaluation layer over existing planning algorithms. It offers a transparent tool for assessing ethical alignment in everyday scenarios and provides a practical step toward implementing MHC in real-world AV deployment.
Vision-Language-Action (VLA) models have demonstrated significant potential in complex scene understanding and action reasoning, leading to their increasing adoption in end-to-end autonomous driving systems. However, the long visual token sequences of VLA models greatly increase computational costs. Current visual token pruning methods in Vision-Language Models (VLMs) rely on either visual token similarity or visual-text attention, but both have shown poor performance in autonomous driving scenarios. Given that human drivers concentrate on relevant foreground areas while driving, we assert that retaining visual tokens containing this foreground information is essential for effective decision-making. Inspired by this, we propose FastDriveVLA, a novel reconstruction-based vision token pruning framework designed specifically for autonomous driving. FastDriveVLA includes a plug-and-play visual token pruner called ReconPruner, which prioritizes foreground information through MAE-style pixel reconstruction. A novel adversarial foreground-background reconstruction strategy is designed to train ReconPruner for the visual encoder of VLA models. Once trained, ReconPruner can be seamlessly applied to different VLA models with the same visual encoder without retraining. To train ReconPruner, we also introduce a large-scale dataset called nuScenes-FG, consisting of 241K image-mask pairs with annotated foreground regions. Our approach achieves state-of-the-art results on the nuScenes closed-loop planning benchmark across different pruning ratios.
Ethical dilemmas are a common challenge in everyday driving, requiring human drivers to balance competing priorities such as safety, efficiency, and rule compliance. However, much of the existing research in automated vehicles (AVs) has focused on high-stakes "trolley problems," which involve extreme and rare situations. Such scenarios, though rich in ethical implications, are rarely applicable in real-world AV decision-making. In practice, when AVs confront everyday ethical dilemmas, they often appear to prioritise strict adherence to traffic rules. By contrast, human drivers may bend the rules in context-specific situations, using judgement informed by practical concerns such as safety and efficiency. According to the concept of meaningful human control, AVs should respond to human reasons, including those of drivers, vulnerable road users, and policymakers. This work introduces a novel human reasons-based supervision framework that detects when AV behaviour misaligns with expected human reasons to trigger trajectory reconsideration. The framework integrates with motion planning and control systems to support real-time adaptation, enabling decisions that better reflect safety, efficiency, and regulatory considerations. Simulation results demonstrate that this approach could help AVs respond more effectively to ethical challenges in dynamic driving environments by prompting replanning when the current trajectory fails to align with human reasons. These findings suggest that our approach offers a path toward more adaptable, human-centered decision-making in AVs.
In recent times, products have become increasingly complex and highly reliable, so failures typically occur after long periods of operation under normal conditions and may arise from multiple causes. This paper employs simple step-stress partial accelerated life testing (SSSPALT) within the competing risks framework to determine the Bayesian reliability acceptance sampling plan (BRASP) under type-II censoring. Elevating the stress during the life test incurs an additional cost, increasing the overall cost of the test. An adaptive scenario is therefore also incorporated into the sampling plan: the stress is increased after a certain time if the number of failures observed up to that point is less than a pre-specified number. The Bayes decision function and Bayes risk are derived for a general loss function. An optimal BRASP under the adaptive SSSPALT is obtained for the quadratic loss function by minimizing the Bayes risk. An algorithm is provided to determine the optimal proposed BRASP. Further, comparative studies are conducted between the proposed BRASP, the conventional non-accelerated BRASP, and the conventional accelerated BRASP under type-II censoring to evaluate the effectiveness of the proposed approach. Finally, the methodology is illustrated using real data.
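In assumed notation (the paper's loss folds test- and stress-related cost components into $L$), the generic Bayes decision function and Bayes risk underlying such a plan are:

```latex
\delta^{*}(\boldsymbol{x}) = \arg\min_{d} \int L(\theta, d)\, \pi(\theta \mid \boldsymbol{x})\, d\theta,
\qquad
r(\delta) = \int\!\!\int L\bigl(\theta, \delta(\boldsymbol{x})\bigr)\, f(\boldsymbol{x} \mid \theta)\, \pi(\theta)\, d\boldsymbol{x}\, d\theta,
```

and the proposed BRASP is obtained by minimizing $r(\delta)$ over the design parameters under the quadratic loss.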
Breast MRI provides high-resolution volumetric imaging critical for tumor assessment and treatment planning, yet manual interpretation of 3D scans remains labor-intensive and subjective. While AI-powered tools hold promise for accelerating medical image analysis, adoption of commercial medical AI products remains limited in low- and middle-income countries due to high license costs, proprietary software, and infrastructure demands. In this work, we investigate whether the Segment Anything Model 2 (SAM2) can be adapted for low-cost, minimal-input 3D tumor segmentation in breast MRI. Using a single bounding box annotation on one slice, we propagate segmentation predictions across the 3D volume using three different slice-wise tracking strategies: top-to-bottom, bottom-to-top, and center-outward. We evaluate these strategies across a large cohort of patients and find that center-outward propagation yields the most consistent and accurate segmentations. Despite being a zero-shot model not trained for volumetric medical data, SAM2 achieves strong segmentation performance under minimal supervision. We further analyze how segmentation performance relates to tumor size, location, and shape, identifying key failure modes. Our results suggest that general-purpose foundation models such as SAM2 can support 3D medical image analysis with minimal supervision, offering an accessible and affordable alternative for resource-constrained settings.
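The center-outward strategy can be sketched as follows; `predict_slice` stands in for a promptable 2D segmenter (a hypothetical wrapper, not the actual SAM2 API), and the bounding box of each predicted mask seeds the next slice in both directions away from the annotated center slice.

```python
import numpy as np

def propagate_center_outward(volume: np.ndarray, center_idx: int, center_box, predict_slice):
    """Center-outward 3D propagation: segment the annotated slice first,
    then walk toward both ends of the volume, prompting each slice with the
    bounding box of the previous slice's mask. `predict_slice(image, box) -> mask`
    is a hypothetical wrapper around a promptable 2D segmenter such as SAM2."""
    n = volume.shape[0]
    masks = [None] * n
    masks[center_idx] = predict_slice(volume[center_idx], center_box)

    def box_from(mask):
        ys, xs = np.nonzero(mask)
        if len(xs) == 0:
            return None                       # tumor no longer visible
        return (xs.min(), ys.min(), xs.max(), ys.max())

    for direction in (+1, -1):                # toward top and toward bottom
        box = box_from(masks[center_idx])
        i = center_idx + direction
        while 0 <= i < n and box is not None:
            masks[i] = predict_slice(volume[i], box)
            box = box_from(masks[i])
            i += direction
    return masks
```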
Reconfigurable multi-robot cells offer a promising approach to meet fluctuating assembly demands. However, the recurrent planning of their configurations introduces new challenges, particularly in generating optimized, coordinated multi-robot motion sequences that minimize the assembly duration. This work presents a simulation-based method for generating such optimized sequences. The approach separates assembly steps into task-related core operations and connecting traverse operations. While core operations are constrained and predetermined, traverse operations offer substantial optimization potential. Scheduling the core operations is formulated as an optimization problem, requiring feasible traverse operations to be integrated using a decomposition-based motion planning strategy. Several solution techniques are explored, including a sampling heuristic, tree-based search and gradient-free optimization. For motion planning, a decomposition method is proposed that identifies specific areas in the schedule, which can be solved independently with modified centralized path planning algorithms. The proposed method generates efficient and collision-free multi-robot assembly procedures that outperform a baseline relying on decentralized, robot-individual motion planning. Its effectiveness is demonstrated through simulation experiments.
Tidal disruption events (TDEs) involving supermassive black holes (SMBHs) often exhibit radio emission, yet its physical origin remains uncertain, especially in non-jetted cases. In this Letter, we formulate a general dynamical framework for a radio-emitting shell driven by disk winds and expanding through a power-law ambient medium under the influence of SMBH gravity. We derive and classify power-law-in-time solutions to the governing equations in the adiabatic regime. In particular, a universal $t^{2/3}$ scaling emerges naturally when gravitational energy dominates or is comparable to thermal energy, irrespective of the ambient density profile, whereas the classical Sedov-Taylor solution is recovered when gravity is negligible. Our analysis reveals that, in regimes where SMBH gravity governs the shell expansion, the SMBH mass can be inferred from radio observations of the shell. This approach is independent of and complementary to conventional mass estimators, with direct implications for interpreting radio-emitting TDEs and probing SMBH demographics. Our formalism further predicts that 10-100 GHz monitoring with existing and planned facilities can yield SMBH masses within months of disruption, providing a time-domain analogue to reverberation mapping.
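The $t^{2/3}$ scaling can be seen from a heuristic balance (schematic, in assumed notation): when the shell's specific kinetic energy is set by the SMBH's gravity,

```latex
\left(\frac{dR}{dt}\right)^{2} \sim \frac{G M_{\bullet}}{R}
\;\;\Longrightarrow\;\;
R^{1/2}\, dR \propto dt
\;\;\Longrightarrow\;\;
R(t) \propto t^{2/3},
```

independent of the ambient density slope, whereas dropping the gravitational term and conserving the injected energy in the swept-up gas returns the familiar Sedov-Taylor scaling. Matching an observed expansion to the gravity-dominated branch is what ties the radio evolution to $M_{\bullet}$.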
Understanding causal relationships across modalities is a core challenge for multimodal models operating in real-world environments. We introduce ISO-Bench, a benchmark for evaluating whether models can infer causal dependencies between visual observations and procedural text. Each example presents an image of a task step and a text snippet from a plan, with the goal of deciding whether the visual step occurs before or after the referenced text step. Evaluation results on ten frontier vision-language models show underwhelming performance: the best zero-shot F1 is only 0.57, and chain-of-thought reasoning yields only modest gains (up to 0.62 F1), far behind human performance (0.98 F1). Our analysis further highlights concrete directions for improving causal understanding in multimodal models.
The Extract, Transform, Load (ETL) workflow is fundamental for populating and maintaining data warehouses and other data stores accessed by analysts for downstream tasks. A major shortcoming of modern ETL solutions is the extensive need for a human in the loop, required to design and implement context-specific and often non-generalisable transformations. While related work in the field of ETL automation shows promising progress, there is a lack of solutions capable of automatically designing and applying these transformations. We present FlowETL, a novel example-based autonomous ETL pipeline architecture designed to automatically standardise and prepare input datasets according to a concise, user-defined target dataset. FlowETL is an ecosystem of components which interact to achieve the desired outcome. A Planning Engine uses a paired sample of the input and target datasets to construct a transformation plan, which is then applied by an ETL worker to the source dataset. Monitoring and logging provide observability throughout the entire pipeline. The results show promising generalisation capabilities across 14 datasets of various domains, file structures, and file sizes.
We present SMART-Editor, a framework for compositional layout and content editing across structured (posters, websites) and unstructured (natural images) domains. Unlike prior models that perform local edits, SMART-Editor preserves global coherence through two strategies: Reward-Refine, an inference-time reward-guided refinement method, and RewardDPO, a training-time preference optimization approach using reward-aligned layout pairs. To evaluate model performance, we introduce SMARTEdit-Bench, a benchmark covering multi-domain, cascading edit scenarios. SMART-Editor outperforms strong baselines like InstructPix2Pix and HIVE, with RewardDPO achieving up to 15% gains in structured settings and Reward-Refine showing advantages on natural images. Automatic and human evaluations confirm the value of reward-guided planning in producing semantically consistent and visually aligned edits.
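RewardDPO presumably plugs reward-ranked layout pairs into the standard direct preference optimization objective; the formula below is the generic DPO loss, with $(y_w, y_l)$ the reward-preferred and reward-dispreferred edits and $\beta$, $\pi_{\mathrm{ref}}$ the usual DPO temperature and reference policy (all notation assumed, not taken from the paper):

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\!\left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right].
```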
Surgical resection is the primary treatment option for brain tumor patients, but it carries the risk of postoperative cognitive dysfunction. This study investigates how tumor-induced alterations in presurgical neural dynamics relate to postoperative working memory decline. We analyzed functional magnetic resonance imaging (fMRI) of brain tumor patients before surgery and extracted energy landscapes of high-order brain interactions. We then examined the relationship between these energy features and postoperative working memory performance using statistical and machine learning (random forest) models. Patients with lower postoperative working memory scores exhibited fewer but more extreme transitions between local energy minima and maxima, whereas patients with higher scores showed more frequent but less extreme shifts. Furthermore, the presurgical high-order energy features accurately predicted postoperative working memory decline with a mean accuracy of 90%, an F1 score of 87.5%, and an AUC of 0.95. Our study suggests that brain tumor-induced disruptions in high-order neural dynamics before surgery are predictive of postoperative working memory decline. Our findings pave the way for personalized surgical planning and targeted interventions to mitigate cognitive risks associated with brain tumor resection.
As extreme heat events intensify due to climate change and urbanization, cities face increasing challenges in mitigating outdoor heat stress. While traditional physical models such as SOLWEIG and ENVI-met provide detailed assessments of human-perceived heat exposure, their computational demands limit scalability for city-wide planning. In this study, we propose GSM-UTCI, a multimodal deep learning framework designed to predict daytime average Universal Thermal Climate Index (UTCI) at 1-meter hyperlocal resolution. The model fuses surface morphology (nDSM), high-resolution land cover data, and hourly meteorological conditions using a feature-wise linear modulation (FiLM) architecture that dynamically conditions spatial features on atmospheric context. Trained on SOLWEIG-derived UTCI maps, GSM-UTCI achieves near-physical accuracy, with an R² of 0.9151 and a mean absolute error (MAE) of 0.41 °C, while reducing inference time from hours to under five minutes for an entire city. To demonstrate its planning relevance, we apply GSM-UTCI to simulate systematic landscape transformation scenarios in Philadelphia, replacing bare earth, grass, and impervious surfaces with tree canopy. Results show spatially heterogeneous but consistently strong cooling effects, with impervious-to-tree conversion producing the highest aggregated benefit (-4.18 °C average change in UTCI across 270.7 km²). Tract-level bivariate analysis further reveals strong alignment between thermal reduction potential and land cover proportions. These findings underscore the utility of GSM-UTCI as a scalable, fine-grained decision support tool for urban climate adaptation, enabling scenario-based evaluation of greening strategies across diverse urban environments.
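The FiLM conditioning is simple enough to sketch; below is a minimal, hypothetical PyTorch module (layer names and dimensions are illustrative assumptions) in which the hourly meteorological vector predicts a per-channel scale and shift applied to the spatial feature maps:

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Feature-wise linear modulation: a context vector (e.g. hourly weather)
    predicts a per-channel scale (gamma) and shift (beta) applied to spatial
    feature maps (e.g. encoded nDSM and land cover)."""
    def __init__(self, context_dim: int, n_channels: int):
        super().__init__()
        self.to_gamma_beta = nn.Linear(context_dim, 2 * n_channels)

    def forward(self, feats: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        gamma, beta = self.to_gamma_beta(context).chunk(2, dim=-1)
        gamma = gamma[:, :, None, None]          # broadcast over H x W
        beta = beta[:, :, None, None]
        return gamma * feats + beta

# Usage: modulate a (B, C, H, W) feature map with a (B, D) weather vector.
film = FiLM(context_dim=8, n_channels=64)
out = film(torch.randn(2, 64, 32, 32), torch.randn(2, 8))
```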
The development of autonomous agents for complex, long-horizon tasks is a central goal in AI. However, dominant training paradigms face a critical limitation: reinforcement learning (RL) methods that optimize solely for final task success often reinforce flawed or inefficient reasoning paths, a problem we term inefficient exploration. This leads to agents that are brittle and fail to generalize, as they learn to find solutions without learning how to reason coherently. To address this, we introduce RLVMR, a novel framework that integrates dense, process-level supervision into end-to-end RL by rewarding verifiable, meta-reasoning behaviors. RLVMR equips an agent to explicitly tag its cognitive steps, such as planning, exploration, and reflection, and provides programmatic, rule-based rewards for actions that contribute to effective problem-solving. These process-centric rewards are combined with the final outcome signal and optimized using a critic-free policy gradient method. On the challenging ALFWorld and ScienceWorld benchmarks, RLVMR achieves new state-of-the-art results, with our 7B model reaching an 83.6% success rate on the most difficult unseen task split. Our analysis confirms these gains stem from improved reasoning quality, including significant reductions in redundant actions and enhanced error recovery, leading to more robust, efficient, and interpretable agents.
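In assumed notation, the training signal combines the rule-checked meta-reasoning rewards with the final outcome and is optimized with a critic-free (REINFORCE-style) policy gradient:

```latex
R_t = r^{\mathrm{outcome}}_t + \lambda\, r^{\mathrm{meta}}_t,
\qquad
\nabla_\theta J(\theta) = \mathbb{E}\!\left[ \sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, \bigl(G_t - b\bigr) \right],
```

where $r^{\mathrm{meta}}_t$ is the programmatic reward for a verifiably useful planning, exploration, or reflection step, $\lambda$ a weighting coefficient, $G_t$ the return, and $b$ a simple baseline (e.g. a group or batch mean) standing in for a learned critic.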
Query-focused table summarization requires complex reasoning, often approached through step-by-step natural language (NL) plans. However, NL plans are inherently ambiguous and lack structure, limiting their conversion into executable programs like SQL and hindering scalability, especially for multi-table tasks. To address this, we propose a paradigm shift to structured representations. We introduce a new structured plan, TaSoF, inspired by formalism in traditional multi-agent systems, and a framework, SPaGe, that formalizes the reasoning process in three phases: 1) Structured Planning to generate TaSoF from a query, 2) Graph-based Execution to convert plan steps into SQL and model dependencies via a directed acyclic graph, enabling parallel execution, and 3) Summary Generation to produce query-focused summaries. Our method explicitly captures complex dependencies and improves reliability. Experiments on three public benchmarks show that SPaGe consistently outperforms prior models in both single- and multi-table settings, demonstrating the advantages of structured representations for robust and scalable summarization.
Land-air bimodal robots (LABR) are gaining attention for autonomous navigation, combining the high mobility of aerial vehicles with the long endurance of ground vehicles. However, existing LABR navigation methods are limited by suboptimal trajectories from mapping-based approaches and the excessive computational demands of learning-based methods. To address this, we propose a two-stage lightweight framework that integrates global key points prediction with local trajectory refinement to generate efficient and reachable trajectories. In the first stage, a Global Key points Prediction Network (GKPN) is used to generate a hybrid land-air keypoint path. The GKPN includes a Sobel Perception Network (SPN) for improved obstacle detection and a Lightweight Attention Planning Network (LAPN) that improves predictive ability by capturing contextual information. In the second stage, the global path is segmented based on the predicted key points and refined using a mapping-based planner to create smooth, collision-free trajectories. Experiments conducted on our LABR platform show that our framework reduces network parameters by 14% and energy consumption during land-air transitions by 35% compared to existing approaches. The framework achieves real-time navigation without GPU acceleration and enables zero-shot transfer from simulation to reality.
Assigning passenger trips to specific network paths using automatic fare collection (AFC) data is a fundamental application in urban transit analysis. The task is a difficult inverse problem: the only available information consists of each passenger's total travel time and their origin and destination, while individual passenger path choices and dynamic network costs are unobservable, and behavior varies significantly across space and time. We propose a novel Bayesian hierarchical model to resolve this problem by jointly estimating dynamic network costs and passenger path choices while quantifying their uncertainty. Our model decomposes trip travel time into four components -- access, in-vehicle, transfer, and egress -- each modeled as a time-varying random walk. To capture heterogeneous passenger behavior, we introduce a multinomial logit model with spatiotemporally varying coefficients. We manage the high dimensionality of these coefficients using kernelized tensor factorization with Gaussian process priors to effectively model complex spatiotemporal correlations. We develop a tailored and efficient Markov chain Monte Carlo (MCMC) algorithm for model inference. A simulation study demonstrates the method's effectiveness in recovering the underlying model parameters. On a large-scale dataset from the Hong Kong Mass Transit Railway, our framework demonstrates superior estimation accuracy over established benchmarks. The results reveal significant spatiotemporal variations in passenger preferences and provide robust uncertainty quantification, offering transit operators a powerful tool for enhancing service planning and operational management.
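In assumed notation, the model's two key ingredients are the additive travel-time decomposition and the spatiotemporally varying path-choice logit:

```latex
T_{i} = t^{\mathrm{acc}} + \sum_{\text{legs}} t^{\mathrm{veh}} + \sum_{\text{transfers}} t^{\mathrm{tra}} + t^{\mathrm{egr}} + \varepsilon_{i},
\qquad
P(r \mid i) = \frac{\exp\!\bigl(\boldsymbol{\beta}_{s,t}^{\top} \boldsymbol{x}_{r}\bigr)}{\sum_{r' \in \mathcal{R}_i} \exp\!\bigl(\boldsymbol{\beta}_{s,t}^{\top} \boldsymbol{x}_{r'}\bigr)},
```

where each time component follows a time-varying random walk, $\mathcal{R}_i$ is the feasible path set for trip $i$, and the coefficient field $\boldsymbol{\beta}_{s,t}$ over space-time cells is kept tractable through the kernelized tensor factorization with Gaussian process priors.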
The advent of end-to-end autonomy stacks - often lacking interpretable intermediate modules - has placed an increased burden on ensuring that the final output, i.e., the motion plan, is safe in order to validate the safety of the entire stack. This requires a safety monitor that is both complete (able to detect all unsafe plans) and sound (does not flag safe plans). In this work, we propose a principled safety monitor that leverages modern multi-modal trajectory predictors to approximate forward reachable sets (FRS) of surrounding agents. By formulating a convex program, we efficiently extract these data-driven FRSs directly from the predicted state distributions, conditioned on scene context such as lane topology and agent history. To ensure completeness, we leverage conformal prediction to calibrate the FRS and guarantee coverage of ground-truth trajectories with high probability. To preserve soundness in out-of-distribution (OOD) scenarios or under predictor failure, we introduce a Bayesian filter that dynamically adjusts the FRS conservativeness based on the predictor's observed performance. We then assess the safety of the ego vehicle's motion plan by checking for intersections with these calibrated FRSs, ensuring the plan remains collision-free under plausible future behaviors of others. Extensive experiments on the nuScenes dataset show our approach significantly improves soundness while maintaining completeness, offering a practical and reliable safety monitor for learned autonomy stacks.
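The conformal step admits a compact sketch (a split-conformal version under assumed score and function names, not the paper's exact procedure): calibrate an inflation radius from held-out nonconformity scores so that the inflated reachable sets cover ground-truth futures with probability at least $1-\alpha$.

```python
import numpy as np

def conformal_radius(scores: np.ndarray, alpha: float = 0.1) -> float:
    """Split conformal calibration: given nonconformity scores on a held-out
    set (e.g. distance from the ground-truth future position to the predicted
    reachable set), return the inflation radius that guarantees >= 1 - alpha
    coverage for a new exchangeable sample."""
    n = len(scores)
    q = np.ceil((n + 1) * (1 - alpha)) / n    # finite-sample corrected level
    return float(np.quantile(scores, min(q, 1.0), method="higher"))

# Usage: inflate each predicted reachable set by the calibrated radius,
# then check the ego plan for intersections with the inflated sets.
cal_scores = np.abs(np.random.randn(500))     # placeholder nonconformity scores
radius = conformal_radius(cal_scores, alpha=0.05)
```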
AI agents powered by large language models are increasingly capable of autonomously completing complex, multi-step tasks using external tools. Yet, they still fall short of human-level performance in most domains, including computer use, software development, and research. Their growing autonomy and ability to interact with the outside world also introduce safety and security risks, including potentially misaligned actions and adversarial manipulation. We argue that human-in-the-loop agentic systems offer a promising path forward, combining human oversight and control with AI efficiency to unlock productivity from imperfect systems. We introduce Magentic-UI, an open-source web interface for developing and studying human-agent interaction. Built on a flexible multi-agent architecture, Magentic-UI supports web browsing, code execution, and file manipulation, and can be extended with diverse tools via the Model Context Protocol (MCP). Moreover, Magentic-UI presents six interaction mechanisms for enabling effective, low-cost human involvement: co-planning, co-tasking, multi-tasking, action guards, and long-term memory. We evaluate Magentic-UI across four dimensions: autonomous task completion on agentic benchmarks, simulated user testing of its interaction capabilities, qualitative studies with real users, and targeted safety assessments. Our findings highlight Magentic-UI's potential to advance safe and efficient human-agent collaboration.
Existing earthmoving autonomy is largely confined to highly controlled and well-characterized environments due to the complexity of vehicle-terrain interaction dynamics and the partial observability of the terrain resulting from unknown and spatially varying soil conditions. In this chapter, a soil-property mapping system is proposed to extend the environmental state, in order to overcome these restrictions and facilitate the development of more robust autonomous earthmoving. A GPU-accelerated elevation mapping system is extended to incorporate a blind mapping component which traces the movement of the blade through the terrain to displace and erode intersected soil, enabling undisturbed and disturbed soil to be tracked separately. Each interaction is approximated as a flat blade moving through a locally homogeneous soil, enabling modeling of cutting forces using the fundamental equation of earthmoving (FEE). Building upon our prior work on in situ soil-property estimation, a method is devised to extract approximate geometric parameters of the model given the uneven terrain, and an improved physics-infused neural network (PINN) model is developed to predict soil properties and the uncertainties of these estimates. A simulation of a compact track loader (CTL) with a blade attachment is used to collect data to train the PINN model. Post-training, the model is leveraged online by the mapping system to track soil property estimates spatially as separate layers in the map, with updates performed in a Bayesian manner. Initial experiments show that the system accurately highlights regions requiring higher relative interaction forces, indicating the promise of this approach in enabling soil-aware planning for autonomous terrain shaping.
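For context, one common statement of the fundamental equation of earthmoving (after Reece) relating blade force to soil parameters is:

```latex
F = w \left( \gamma g d^{2} N_{\gamma} + c\, d\, N_{c} + c_{a}\, d\, N_{a} + q\, d\, N_{q} \right),
```

with blade width $w$, cutting depth $d$, soil density $\gamma$, cohesion $c$, soil-blade adhesion $c_a$, surcharge pressure $q$, and dimensionless $N$-factors that depend on the internal and soil-blade friction angles and the blade geometry; the soil properties estimated by the PINN correspond to terms of this kind, tracked spatially as map layers.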
Understanding reflectance-related quantities for worlds enables effective comparative planetology and strengthens mission planning and execution. Measurements of these properties for Earth, especially its geometric albedo and phase function, have been difficult to achieve due to our terrestrial situation -- it is challenging to obtain planetary-scale brightness measurements for the world we stand on. Using a curated dataset of visual phase-dependent, disk-averaged observations of Earth taken from the ground and spacecraft, alongside a physical-statistical model, this work arrives at a definitive value for the visual geometric albedo of our planet: 0.242 (+0.005/-0.004). This albedo constraint is 30--40% smaller than earlier, widely quoted values. The physical-statistical model enables retrieval-like inferences to be performed on phase curves, and includes contributions from optically thick clouds, optically thin aerosols, Rayleigh scattering, ocean glint, gas absorption, and Lambertian surface reflectance. Detailed application of this inverse model to Earth's phase curve quantifies the contributions of these different processes to the phase-dependent brightness of the Pale Blue Dot. Model selection identifies a scenario where aerosol forward scattering results in a false negative for surface habitability detection. Observations of phase curves for Earth at redder optical or near-infrared wavelengths could disentangle ocean glint effects from aerosol forward scattering and would help with understanding the utility of phase curve observations for the under-development Habitable Worlds Observatory.
This study addresses the challenge of efficiently assigning locomotives in large freight rail networks, where operational complexity and power imbalances make cost-effective planning difficult. It presents a strategic optimization framework for the Locomotive Assignment Problem (LAP), developed in collaboration with a major North American Class I Freight Railroad. The problem is formulated as a network-based integer program over a cyclic space-time network, producing a repeatable weekly locomotive assignment plan. The model captures a comprehensive set of real-world operational constraints and jointly optimizes the placement of pick-up and set-out locomotive work events, improving the effectiveness of downstream planning. To solve large-scale instances exactly for the first time, novel reduction rules are introduced to dramatically reduce the number of light travel arcs in the space-time network. Extensive computational experiments demonstrate the performance and trade-offs on real instances under a variety of practical constraints. Beyond delivering scalable, high-quality solutions, the proposed framework serves as a practical decision-support tool grounded in the operational realities of modern freight railroads.