UAV vision-language navigation (VLN) requires an agent to navigate complex 3D environments from an egocentric perspective while following ambiguous multi-step instructions over long horizons. Existing zero-shot methods remain limited, as they often rely on large base models, generic prompts, and loosely coordinated modules. In this work, we propose FineCog-Nav, a top-down framework inspired by human cognition that organizes navigation into fine-grained modules for language processing, perception, attention, memory, imagination, reasoning, and decision-making. Each module is driven by a moderate-sized foundation model with role-specific prompts and structured input-output protocols, enabling effective collaboration and improved interpretability. To support fine-grained evaluation, we construct AerialVLN-Fine, a curated benchmark of 300 trajectories derived from AerialVLN, with sentence-level instruction-trajectory alignment and refined instructions containing explicit visual endpoints and landmark references. Experiments show that FineCog-Nav consistently outperforms zero-shot baselines in instruction adherence, long-horizon planning, and generalization to unseen environments. These results suggest that fine-grained cognitive modularization is effective for zero-shot aerial navigation. Project page: https://smartdianlab.github.io/projects-FineCogNav.
Decision-makers rely on weather forecasts to plant crops, manage wildfires, allocate water and energy, and prepare for weather extremes. Today, such forecasts enjoy unprecedented accuracy out to two weeks thanks to steady advances in physics-based dynamical models and data-driven artificial intelligence (AI) models. However, model skill drops precipitously at subseasonal timescales (2-6 weeks ahead) because of compounding errors and persistent biases. To counter this degradation, we introduce probabilistic bias correction (PBC), a machine learning framework that substantially reduces systematic error by learning to correct historical probabilistic forecasts. When applied to the leading dynamical and AI models from the European Centre for Medium-Range Weather Forecasts (ECMWF), PBC doubles the subseasonal skill of the AI Forecasting System and improves the skill of the operationally debiased dynamical model for 91% of pressure, 92% of temperature, and 98% of precipitation targets. We designed PBC for operational deployment, and, in ECMWF's 2025 real-time forecasting competition, its global forecasts placed first for all weather variables and lead times, outperforming the dynamical models from six operational forecasting centers, an international dynamical multi-model ensemble, ECMWF's AI Forecasting System, and the forecasting systems of 34 teams worldwide. These probabilistic skill gains translate into more accurate prediction of extreme events and have the potential to improve agricultural planning, energy management, and disaster preparedness in vulnerable communities.
We propose a descriptive, realization-centred framework for detecting and characterising explosive and co-explosive behaviour in economic time series, which we term path-explosive behaviour. Departing from the data-generating-process (DGP) perspective that underlies recursive unit root testing, the approach operates directly on observable path properties of the realised series. Four diagnostic layers (level geometry, growth rate dynamics, normalised curvature, and log-space behaviour) yield statistics that discriminate between genuine self-reinforcing multiplicative growth and I(2) dynamics without distributional assumptions or asymptotic critical values. Two theoretically motivated absolute gate thresholds screen detected episodes before a composite intensity score is assigned. Co-explosive behaviour between pairs of series is assessed at the episode level through a Jaccard co-occurrence index and non-parametric intensity concordance measures. The theoretical motivation draws on the path dependence and planning irreversibility literatures to argue that, in settings where discrete institutional decisions shape growth trajectories, a realization-centred characterisation is epistemically more appropriate than a DGP-based test. A simulation study across four DGP regimes validates the framework's discriminating power and conservatism. An empirical application to real house prices, commodity prices, public debt, and Spanish tourism destinations illustrates the empirical content of the path-explosive concept and distinguishes it from speculative bubble detection.
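The episode-level Jaccard co-occurrence index can be illustrated with a minimal sketch. It assumes that each detected episode is an inclusive range of period indices and that the index compares the sets of periods covered by each series; the paper's exact episode-level definition may differ. The helper names and the example episodes are hypothetical.

```python
def episode_periods(episodes):
    """Expand inclusive (start, end) episodes into the set of periods they cover."""
    periods = set()
    for start, end in episodes:
        periods.update(range(start, end + 1))
    return periods


def jaccard_cooccurrence(episodes_a, episodes_b):
    """Size of the intersection over size of the union of flagged periods."""
    a = episode_periods(episodes_a)
    b = episode_periods(episodes_b)
    union = a | b
    if not union:
        return 0.0  # neither series has an episode: define the index as 0
    return len(a & b) / len(union)


# Hypothetical episodes (inclusive period indices) for two series.
house_prices = [(10, 19), (40, 44)]
public_debt = [(15, 24)]
print(jaccard_cooccurrence(house_prices, public_debt))  # 5 shared / 20 total = 0.25
```

A value of 1 means the two series are flagged in exactly the same periods, and 0 means their episodes never overlap.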
State-of-the-art multimodal journey-planning algorithms, such as ULTRA, have recently been adapted to account for delays. In this work, we make this approach more memory-efficient, faster, and more accurate. We also adapt the framework to other state-of-the-art algorithms, such as CSA and RAPTOR. We demonstrate a speedup of 1.9-4.2x over existing algorithms in the single-criterion search. In the multicriteria setting, we achieve competitive speedups with greater accuracy. We also find that our method scales much better as delays increase.
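For readers unfamiliar with CSA, a minimal sketch of the basic Connection Scan earliest-arrival query shows why delays matter. Connections are scanned in order of actual departure time, so a delay can change both the order and which transfers are still reachable. This is the textbook algorithm without footpaths, transfer times, or trip bookkeeping, not the delay-aware method from this work. The `delay` field, which shifts departure and arrival equally, is a simplifying assumption, and the timetable is invented.

```python
from typing import NamedTuple


class Connection(NamedTuple):
    dep_stop: str
    arr_stop: str
    dep_time: int   # scheduled departure, minutes after midnight
    arr_time: int   # scheduled arrival, minutes after midnight
    delay: int = 0  # hypothetical: one delay shifts both departure and arrival


def earliest_arrival(connections, source, target, start_time):
    """Plain Connection Scan: no footpaths, transfer times, or trip bookkeeping."""
    # Apply the delays first, then scan in order of actual departure time.
    actual = sorted(
        (
            (c.dep_stop, c.arr_stop, c.dep_time + c.delay, c.arr_time + c.delay)
            for c in connections
        ),
        key=lambda conn: conn[2],
    )
    arrival = {source: start_time}
    for dep_stop, arr_stop, dep, arr in actual:
        # Usable only if we can be at the departure stop before it leaves.
        if arrival.get(dep_stop, float("inf")) <= dep:
            if arr < arrival.get(arr_stop, float("inf")):
                arrival[arr_stop] = arr
    return arrival.get(target, float("inf"))


timetable = [
    Connection("A", "B", 480, 490),
    Connection("B", "C", 495, 510),
    Connection("A", "C", 485, 520),
]
print(earliest_arrival(timetable, "A", "C", 475))  # 510 via B
delayed = [timetable[0]._replace(delay=10)] + timetable[1:]
print(earliest_arrival(delayed, "A", "C", 475))  # 520: the B-C transfer is missed
```

This sketch simply re-sorts the whole timetable for every query. That is the kind of cost a delay-aware implementation would likely want to avoid.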
The evolution of sixth-generation (6G) networks increasingly relies on satellite-based Non-Terrestrial Networks (NTNs) to extend broadband connectivity to remote and unserved regions and to support public safety. In this paper, we compare two representative and conceptually different satellite constellation architectures: Starlink and IRIS 2. Starlink is a commercial, privately operated Internet constellation by SpaceX based on dense Low Earth Orbit (LEO) satellites. It is primarily designed to deliver high-capacity broadband services for civil applications, with performance targets comparable to those of terrestrial networks. In contrast, IRIS 2 is a planned public initiative to be deployed by the European Union, based on a multi-layer combination of LEO, Medium Earth Orbit (MEO), and Geostationary Earth Orbit (GEO) satellites. It is primarily designed to provide a secure, resilient, and sovereign infrastructure for government and critical communications. After describing the main technical characteristics of Starlink and IRIS 2, we run a comprehensive simulation campaign to evaluate the design tradeoffs between them. Specifically, we evaluate the per-cell and per-user achievable capacity and the impact of satellite mobility and handover, and we assess the ability of each architecture to support global and reliable connectivity. We also provide design suggestions for possible future extensions of the IRIS 2 deployment.
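To give a feel for the LEO/GEO tradeoff behind such capacity comparisons, here is back-of-the-envelope arithmetic using two textbook formulas: free-space path loss and the Shannon bound. The numbers are assumptions, not the paper's simulation settings. We assume nadir distances of about 550 km for a LEO shell and 35,786 km for GEO, a 12 GHz carrier, and a 250 MHz / 10 dB cell shared equally by 50 users. Slant range, atmospheric loss, and antenna gains are ignored.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss in dB: 20 * log10(4 * pi * d * f / c)."""
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT_M_S)


def shannon_capacity_bps(bandwidth_hz: float, snr_db: float) -> float:
    """Shannon bound C = B * log2(1 + SNR) for an AWGN link."""
    return bandwidth_hz * math.log2(1 + 10 ** (snr_db / 10))


# Assumed nadir distances: a ~550 km LEO shell vs. the 35,786 km GEO altitude.
FREQ_HZ = 12e9  # assumed Ku-band downlink frequency
extra_loss_db = fspl_db(35_786e3, FREQ_HZ) - fspl_db(550e3, FREQ_HZ)
print(round(extra_loss_db, 1))  # 36.3 dB more free-space loss for GEO

# Equal sharing of one cell's capacity among active users (an assumption).
cell_capacity = shannon_capacity_bps(250e6, 10.0)
per_user = cell_capacity / 50
```

The frequency cancels in the difference, so the roughly 36 dB gap depends only on the distance ratio. A GEO link must recover it through antenna gain or power before it can match LEO capacity.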
We discuss the status and progress of recent efforts to modernize the International Lattice Data Grid (ILDG). This includes activities of the metadata and middleware working groups concerning the deployment and operation of crucial services (user management, metadata catalogues, file catalogues) and extensions of the metadata format tailored to the needs of the large collaborations. We also report on developments and extensions planned for the foreseeable future.
Gas infrastructure datasets are essential inputs for energy system planning to support strategic decision-making toward decarbonization. However, relevant data are typically scattered across heterogeneous sources, including geospatial datasets, image-based infrastructure plans, and tabular data, making it complex, time-consuming, and error-prone to create topology-consistent network representations with existing tools. This paper presents QGas, an interactive toolkit for visualizing, creating, and collaboratively extending georeferenced gas infrastructure datasets. QGas integrates GIS-based geometry editing with topology-preserving graph operations in a unified web-based environment, enabling users to digitize infrastructure plans, edit network elements, manage attributes, and perform topology-consistent modifications while maintaining a georeferenced representation of the system. The toolkit is implemented using a modular architecture based on Python, JavaScript, and the Leaflet mapping library. An illustrative example demonstrates its application in extending a natural gas dataset to include hydrogen and CO2 infrastructure, highlighting QGas's capability to support the preparation of consistent multi-carrier gas infrastructure datasets for energy system planning.
Growing privacy regulations and internal governance mandates are driving demand for fine-grained, context-sensitive access control in data management systems. Among competing approaches, content-based access control, in which access decisions depend on the data values referenced by a query, is becoming particularly prominent and is supported directly in modern database engines. While simple content-based predicates often incur negligible overhead, increasingly rich policies can interact in subtle ways with query optimization, leading to significant and poorly understood performance variability. This paper investigates this gap by introducing a structural framework and expressive policy grammar for modelling content-based compliance policies and analysing their impact on query planning and execution in database systems. Building on this framework, we augment an analytical benchmark with structured policy workloads, enabling controlled evaluation of enforcement mechanisms and optimization strategies under combined query-policy workloads. Our experimental results show that policy structure has a decisive impact on optimizer behaviour and end-to-end performance, underscoring the need for policy-aware database and optimizer design.
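The sketch below shows what "access decisions depend on the data values" means in practice. It uses Python's built-in `sqlite3` and enforces a content-based predicate by rewriting the query so that it reads the table through a policy-filtered subquery. The table, the EU-only policy, and the `enforce` helper are hypothetical, and the textual rewrite is for illustration only. Production engines attach such predicates during planning (for example, PostgreSQL row-level security policies), and that is where the optimizer interactions studied here arise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 120.0), (2, "US", 80.0), (3, "EU", 45.5), (4, "APAC", 60.0)],
)

# Hypothetical content-based policy: this analyst may only see EU rows.
POLICY_PREDICATE = "region = 'EU'"


def enforce(query_on_orders: str) -> str:
    """Rewrite a query so that it reads `orders` through a policy-filtered subquery."""
    guarded = f"(SELECT * FROM orders WHERE {POLICY_PREDICATE}) AS orders"
    return query_on_orders.replace("FROM orders", f"FROM {guarded}")


query = "SELECT COUNT(*), SUM(amount) FROM orders"
print(conn.execute(query).fetchone())           # unrestricted: (4, 305.5)
print(conn.execute(enforce(query)).fetchone())  # policy applied: (2, 165.5)
```

A richer policy, such as one that joins against another table to decide row visibility, can change the plan the optimizer chooses in ways a simple filter does not.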
The strategic placement of bike-sharing infrastructure shapes urban accessibility and mobility outcomes. However, station-allocation approaches vary in their assumptions and decision logic. This study examines how alternative modelling paradigms prioritise urban space when applied to the same planning problem in Trondheim, Norway. We developed a unified analytical framework to compare three location-allocation approaches: weighted linear combination (WLC), maximal covering location problem (MCLP), and a data-driven suitability score based on exogenous spatial features (SSE). Each model designs a 68-station bike-sharing network from scratch using the same 24 spatial features and hierarchical weighting scheme. The resulting configurations are compared with the existing network, and consensus-based synthesis identifies 12 priority locations for expansion. The findings reveal systematic differences in spatial prioritisation across modelling approaches. WLC achieves the strongest coverage of population and transit demand, MCLP produces the widest spatial distribution prioritising geographic reach, and SSE balances demand intensity with accessibility. All model-derived configurations diverge from the existing network, highlighting the influence of historical and institutional factors on real-world deployment. Consensus synthesis identifies 12 expansion sites characterised by multimodal integration potential, underserved residential clusters, and high latent demand. This analysis demonstrates that methodological choices fundamentally shape spatial decision-support outcomes. By systematically evaluating classical optimisation and data-driven approaches under controlled conditions, the study provides evidence-based recommendations for bike-sharing network expansion and clarifies the strengths and limitations of alternative analytical frameworks for location-allocation planning.
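The maximal covering location problem (MCLP) chooses p sites to maximise the demand covered within a service radius. The abstract does not say how the MCLP is solved. Exact integer programming is common, so the sketch below substitutes the classic greedy heuristic, with made-up coordinates and weights. `greedy_mclp` is a hypothetical helper.

```python
import math


def greedy_mclp(demand, candidates, p, radius):
    """Greedy heuristic for the maximal covering location problem (MCLP).

    demand: list of (x, y, weight) demand points.
    candidates: list of (x, y) candidate station sites.
    Returns the indices of up to p chosen sites and the covered demand weight.
    """
    covers = [
        {i for i, (dx, dy, _) in enumerate(demand)
         if math.dist((cx, cy), (dx, dy)) <= radius}
        for cx, cy in candidates
    ]
    covered, chosen = set(), []
    for _ in range(p):
        # Pick the site that adds the most not-yet-covered demand weight.
        best, best_gain = None, 0.0
        for j, cov in enumerate(covers):
            if j in chosen:
                continue
            gain = sum(demand[i][2] for i in cov - covered)
            if gain > best_gain:
                best, best_gain = j, gain
        if best is None:  # no remaining site adds any coverage
            break
        chosen.append(best)
        covered |= covers[best]
    return chosen, sum(demand[i][2] for i in covered)


demand = [(0, 0, 5), (1, 0, 3), (10, 0, 4), (10, 1, 2), (20, 0, 1)]
candidates = [(0.5, 0), (10, 0.5), (20, 0), (5, 0)]
print(greedy_mclp(demand, candidates, p=2, radius=1.0))  # ([0, 1], 14)
```

Because MCLP only counts whether demand is covered, not how intensely, it tends to spread sites out. That is consistent with the abstract's observation that MCLP yields the widest spatial distribution.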
Urban bike-sharing systems require strategic station expansion to meet growing demand. Traditional allocation approaches rely on explicit demand modelling that may not capture the urban characteristics distinguishing successful stations. This study addresses the need to exploit patterns from existing stations to inform expansion decisions, particularly in data-constrained environments. We present a data-driven framework leveraging existing stations deemed desirable by operational metrics. A hybrid denoising autoencoder (HDAE) learns compressed latent representations from multi-source grid-level features (socio-demographic, built environment, and transport network), with a supervised classification head regularising the embedding space structure. Expansion candidates are selected via greedy allocation with spatial constraints based on latent-space similarity to existing stations. Evaluation on Trondheim's bike-sharing network demonstrates that HDAE embeddings yield more spatially coherent clusters and allocation patterns than raw features. Sensitivity analyses across similarity methods and distance metrics confirm robustness. A consensus-based procedure across multiple parametrisations distils 32 high-confidence extension zones where all parametrisations agree. The results demonstrate how representation learning captures complex patterns that raw features miss, enabling evidence-based expansion planning without explicit demand modelling. The consensus procedure strengthens recommendations by requiring agreement across parametrisations, while framework configurability allows planners to incorporate operational knowledge. The methodology generalises to any location-allocation problem where existing desirable instances inform the selection of new candidates.
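The similarity-based greedy allocation step can be sketched as follows, starting from embeddings that are already computed. Everything specific here is an assumption. Candidates are scored by cosine similarity to the mean embedding of the existing stations, and the spatial constraint is a minimum pairwise distance between chosen cells. `allocate_by_similarity` is a hypothetical helper, and the HDAE that would produce the embeddings is not shown.

```python
import numpy as np


def allocate_by_similarity(cand_emb, cand_xy, station_emb, k, min_dist):
    """Greedily pick k candidate cells whose embeddings resemble existing stations.

    Score: cosine similarity to the mean embedding of existing stations.
    Constraint: chosen cells must be at least min_dist apart.
    """
    def unit(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    prototype = unit(station_emb.mean(axis=0))
    scores = unit(cand_emb) @ prototype
    chosen = []
    for idx in np.argsort(-scores):  # most similar candidates first
        if all(np.linalg.norm(cand_xy[idx] - cand_xy[j]) >= min_dist for j in chosen):
            chosen.append(int(idx))
        if len(chosen) == k:
            break
    return chosen


station_emb = np.array([[1.0, 0.0], [0.9, 0.1]])
cand_emb = np.array([[1.0, 0.0], [0.98, 0.02], [0.0, 1.0], [0.7, 0.7]])
cand_xy = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [3.0, 0.0]])
print(allocate_by_similarity(cand_emb, cand_xy, station_emb, k=2, min_dist=1.0))
# [1, 3]: candidate 0 is very similar but too close to candidate 1
```

Swapping the mean prototype for, say, maximum similarity to any single station is the kind of choice the sensitivity analysis would vary.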
Traffic simulations, essential for planning urban transit infrastructure interventions, require vehicle-category-specific origin-destination (OD) data. Existing data sources are imperfect: sparse tollbooth sensors provide accurate vehicle counts by category, while extensive mobility data derived from cellular network activity capture aggregated crowd movement but lack modal disaggregation and exhibit systematic biases. This study develops a machine learning framework that corrects and disaggregates cellular network data using sparse tollbooth counts as ground truth. The model uses temporal and spatial features to learn the complex relationship between aggregated mobility data and vehicular data. The framework infers destinations from transit routes and implements routing logic to distribute corrected flows between OD pairs. The approach is applied to a bus depot expansion in Trondheim, Norway, generating hourly OD matrices by vehicle length category. The results show how limited but accurate sensor measurements can correct extensive but aggregated mobility data to produce grounded estimates of background vehicular traffic flows. These macro-scale estimates can be refined for micro-scale analysis at desired locations. The framework provides a generalisable approach for generating origin-destination data from cellular network data. This enables downstream tasks, such as detailed traffic simulations for infrastructure planning in data-scarce contexts, supporting urban planners in making informed decisions.
Recent end-to-end spoken dialogue models enable natural interaction. However, as user demands become increasingly complex, models that rely solely on conversational abilities often struggle to cope. Incorporating agentic capabilities is therefore essential: by enabling tool use, these models can extend their knowledge boundaries and better solve real-world tasks. Yet, existing research has largely concentrated on core perception and generation, with comparatively limited exploration of such tool-augmented extensions. To bridge this gap, we present VoxMind, an integrated framework designed to equip end-to-end spoken dialogue models with comprehensive agentic abilities. Leveraging our curated 470-hour AgentChat dataset, we incorporate a "Think-before-Speak" mechanism, enabling the model to internalize structured reasoning as a critical prerequisite for planning and response generation. Furthermore, to mitigate latency bottlenecks caused by large-scale tool integration, we propose a Multi-Agent Dynamic Tool Management architecture. By asynchronously delegating retrieval tasks to an auxiliary agent aligned with the main model's reasoning trajectory, this system effectively decouples inference latency from toolset size. Experimental results confirm that VoxMind achieves significant improvements in agent performance: compared with strong baselines, the task completion rate increases from 34.88% to 74.57%, outperforming Gemini-2.5-Pro on spoken agent tasks while preserving general conversational quality. The source code and associated data are publicly available at https://github.com/MM-Speech/VoxMind.
Active inference, a neurally inspired model for inferring actions based on the free energy principle (FEP), has been proposed as a unifying framework for understanding perception, action, and learning in the brain. Active inference has previously been used to model ecologically important tasks such as navigation and planning, but scaling it to complex, large-scale problems in real-world environments has remained a challenge. Inspired by the existence of multi-scale hierarchical representations in the brain, we propose a model for action planning based on hierarchical active inference. Our approach combines a hierarchical model of the environment with successor representations for efficient planning. We present results demonstrating (1) how lower-level successor representations can be used to learn higher-level abstract states, (2) how planning based on active inference at the lower level can be used to bootstrap and learn higher-level abstract actions, and (3) how these learned higher-level abstract states and actions can facilitate efficient planning. We illustrate the performance of the approach on several planning and reinforcement learning (RL) problems, including a variant of the well-known four rooms task, a key-based navigation task, a partially observable planning problem, the Mountain Car problem, and PointMaze, a family of navigation tasks with continuous state and action spaces. Our results represent, to our knowledge, the first application of learned hierarchical state and action abstractions to active inference in FEP-based theories of brain function.
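Successor representations are what make the planning step cheap. For a fixed policy with state-transition matrix T and discount gamma, the SR is M = (I - gamma T)^(-1), and the values for any reward vector r follow as V = M r. Below is a minimal sketch for a random walk on a small ring. The environment and rewards are invented, and the hierarchical abstraction step from this work is not shown.

```python
import numpy as np


def successor_representation(T: np.ndarray, gamma: float) -> np.ndarray:
    """M = sum_t gamma^t T^t = (I - gamma T)^(-1) for a row-stochastic T."""
    n = T.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * T)


# Random walk on a 4-state ring: step left or right with probability 0.5.
n = 4
T = np.zeros((n, n))
for s in range(n):
    T[s, (s - 1) % n] = 0.5
    T[s, (s + 1) % n] = 0.5

M = successor_representation(T, gamma=0.9)
r = np.array([0.0, 0.0, 1.0, 0.0])  # reward only in state 2
V = M @ r                            # values follow from a single matrix product
```

Each row of M sums to 1 / (1 - gamma), here 10, because it counts discounted future visits. When the reward changes, only the product M r has to be recomputed.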
Vision-Language-Action (VLA) models have demonstrated significant potential for embodied decision-making; however, their application in complex chemical laboratory automation remains restricted by limited long-horizon reasoning and the absence of persistent experience accumulation. Existing frameworks typically treat planning and execution as decoupled processes, often failing to consolidate successful strategies, which results in inefficient trial-and-error in multi-stage protocols. In this paper, we propose ChemBot, a dual-layer, closed-loop framework that integrates an autonomous AI agent with a progress-aware VLA model (Skill-VLA) for hierarchical task decomposition and execution. ChemBot utilizes a dual-layer memory architecture to consolidate successful trajectories into retrievable assets, while a Model Context Protocol (MCP) server facilitates efficient sub-agent and tool orchestration. To address the inherent limitations of VLA models, we further implement a future-state-based asynchronous inference mechanism to mitigate trajectory discontinuities. Extensive experiments on collaborative robots demonstrate that ChemBot achieves superior operational safety, precision, and task success rates compared to existing VLA baselines in complex, long-horizon chemical experimentation.
Continuum robots are well suited for navigating confined and fragile environments, such as vascular or endoluminal anatomy, where contact with surrounding structures is often unavoidable. While controlled contact can assist motion, unfavorable contact can degrade controllability, induce kinematic singularities, or introduce safety risks. We present a contact-aware planning approach that evaluates contact quality, penalizing hazardous interactions while permitting benign contact. The planner produces kinematically feasible trajectories and contact-aware Jacobians that can be used for closed-loop control in hardware experiments. We validate the approach by testing the integrated system (planning, control, and mechanical design) on anatomical models from patient scans. The planner generates effective plans for three common anatomical environments, and, in all hardware trials, the continuum robot reached the target while avoiding dangerous tip contact (100% success). Mean tracking errors were 1.9 ± 0.5 mm, 1.2 ± 0.1 mm, and 1.7 ± 0.2 mm across the three environments. Ablation studies showed that penalizing end-of-continuum-segment (ECS) contact improved manipulability and prevented hardware failures. Overall, this work enables reliable, contact-aware navigation in highly constrained environments.
Rules files (e.g., AGENTS.md, CLAUDE.md) are the primary mechanism for human-agent alignment when developers vibe code. However, they remain passive: it is not immediately apparent when rules are being used or followed, or how to improve them. To transform rules from passive text into active controls, we introduce ZORO, an interactive interface that integrates directly with a coding agent and anchors rules to every step of the coding process. After an agent generates an initial plan, ZORO enriches the plan with rules, enforces the rules during implementation by requiring the agent to prove that each rule was followed, and lets users give in-situ feedback when they are unsatisfied with how a rule was applied, so that the ruleset can evolve. A technical evaluation shows that coding agents follow rules more consistently with ZORO than without it. A user study shows that people's behavior and cognitive strategies change when rules are at the forefront of vibe coding. We discuss how making rules active in agentic systems unlocks broader opportunities for human-agent alignment in coding settings and beyond.
Latent diffusion models (LDMs) have emerged as powerful generative models in medical imaging, enabling the synthesis of high-quality brain magnetic resonance imaging (MRI) scans. In particular, predicting the evolution of a patient's brain can aid early intervention, prognosis, and treatment planning. In this study, we introduce CLIMB (Controllable Longitudinal brain Image generation via state space based latent diffusion model), a framework for modeling temporal changes in brain structure. CLIMB models the structural evolution of the brain over time, using a baseline MRI scan and its acquisition age as foundational inputs. Additionally, multiple conditional variables, including projected age, gender, disease status, genetic information, and brain structure volumes, are incorporated to enhance the temporal modeling of anatomical changes. Unlike existing LDM methods that rely on self-attention modules, which effectively capture contextual information from input images but are computationally expensive, our approach leverages a state space model (SSM) architecture that substantially reduces computational overhead while preserving high-quality image synthesis. Furthermore, we introduce a Gaussian-aligned autoencoder that extracts latent representations conforming to prior distributions without the sampling noise inherent in conventional variational autoencoders. We train and evaluate the proposed model on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, consisting of 6,306 MRI scans from 1,390 participants. When generated images are compared with real MRI scans, CLIMB achieves a structural similarity index (SSIM) of 0.9433, demonstrating notable improvements over existing methods.
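The efficiency argument for state space models is that a linear recurrence processes a sequence in one pass, so cost grows linearly with sequence length, whereas self-attention compares every pair of positions. Below is a minimal sketch of a fixed (non-selective) discrete SSM with hand-picked matrices A, B, and C. It is not CLIMB's architecture. Selective SSMs such as Mamba additionally make these parameters input-dependent.

```python
import numpy as np


def ssm_scan(x, A, B, C):
    """Discrete linear state space model evaluated in a single pass.

    h_t = A h_{t-1} + B x_t,  y_t = C h_t
    Cost grows linearly with sequence length L (vs. L^2 pairs for self-attention).
    """
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        h = A @ h + B * x_t  # B is a state-sized vector; x_t is a scalar input
        ys.append(C @ h)
    return np.array(ys)


# One-dimensional state with decay 0.5: the impulse response halves each step.
A = np.array([[0.5]])
B = np.array([1.0])
C = np.array([1.0])
y = ssm_scan([1.0, 0.0, 0.0, 0.0], A, B, C)  # [1.0, 0.5, 0.25, 0.125]
```

The hidden state h is a fixed-size summary of everything seen so far. This lets each step cost the same regardless of how long the sequence is.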
The smart healthcare industry increasingly relies on Internet of Things (IoT) devices to improve patient care and operational efficiency. However, the cryptographic algorithms that provide fundamental security and are widely used in these cyber systems are vulnerable to attacks by emerging quantum computers, a risk known as the quantum threat. This paper examines the quantum threat to healthcare IoT across the four layers of the IoT architecture: physical, network, perception, and application. It proposes a comprehensive migration framework that combines a phased hybrid approach with crypto-agility to transition healthcare IoT systems to quantum-safe cryptography. The framework prioritises resource-constrained devices, emphasises interoperability, and addresses the challenges of vendor readiness and infrastructure upgrades. The paper contributes a detailed, phased migration plan tailored to the unique security needs and resource limitations of IoT-based healthcare systems.
Planning safe trajectories under model uncertainty is a fundamental challenge. Robust planning ensures safety by considering worst-case realizations, yet it ignores uncertainty reduction and leads to overly conservative behavior. Actively reducing uncertainty on the fly during a nominal mission defines the dual control problem. Most approaches address this by adding a weighted exploration term to the cost, tuned to trade off the nominal objective and uncertainty reduction, but without formal consideration of when exploration is beneficial. Moreover, safety is enforced in some methods but not in others. We study a budget-constrained dual control problem, in which uncertainty is reduced subject to safety and to a mission-level cost budget that limits the allowable degradation in task performance due to exploration. In this work, we propose Dual-gatekeeper, a framework that integrates robust planning with active exploration under formal guarantees of safety and budget feasibility. The key idea is that exploration is pursued only when it provides a verifiable improvement without compromising safety or violating the budget, enabling the system to balance immediate task performance with long-term uncertainty reduction in a principled manner. We provide two implementations of the framework based on different safety mechanisms and demonstrate its performance on quadrotor navigation and autonomous car racing case studies under parametric uncertainty.
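The acceptance rule described above can be sketched as a small decision function: explore only when the plan is verifiably safe, within budget, and actually informative. All specifics are hypothetical, including the `Candidate` fields, the use of extra cost over the committed plan as the budget measure, and the `gatekeeper` helper. The safety certificate is taken as given rather than computed, so this shows only the gating logic, not either of the paper's implementations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    cost: float           # task cost of completing the mission with this plan
    info_gain: float      # expected uncertainty reduction (hypothetical metric)
    certified_safe: bool  # outcome of a robust safety check, assumed given


def gatekeeper(committed, exploratory, spent, budget):
    """Return (plan, spent): explore only if safe, within budget, and informative."""
    extra = max(exploratory.cost - committed.cost, 0.0)  # degradation from exploring
    if (
        exploratory.certified_safe
        and spent + extra <= budget
        and exploratory.info_gain > committed.info_gain
    ):
        return exploratory, spent + extra
    return committed, spent  # otherwise keep the previously verified plan


committed = Candidate(cost=10.0, info_gain=0.1, certified_safe=True)
explore = Candidate(cost=12.0, info_gain=0.5, certified_safe=True)
plan, spent = gatekeeper(committed, explore, spent=0.0, budget=3.0)       # accepted
plan2, spent2 = gatekeeper(committed, explore, spent=spent, budget=3.0)  # over budget
```

Falling back to the committed plan is what preserves the guarantees: the system never leaves a trajectory that has already been verified unless the replacement passes every check.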
As AI systems scale to multi-chiplet and wafer-level architectures, the demand for ultra-high bandwidth and system scalability has outpaced the capabilities of electrical interconnects and computing units. Large-scale heterogeneous electronic-photonic integrated chiplets (EPICs) provide a promising solution, but their practical adoption is limited by the lack of a unified, fabrication-aware physical design automation stack. At the same time, inverse-designed ultra-compact photonic devices offer orders-of-magnitude improvements in spatial and spectral density, yet remain constrained by insufficient design-for-manufacturing support and yield optimization. In this work, we present OptoSynthesizer, an end-to-end physical design automation flow for yield-optimized, inverse-designed EPICs. It integrates three key components across the physical design pipeline: (1) OptoSynthesizer-InvDes, a physical-AI-augmented, digital-twin-assisted photonic inverse design and photonics-aware inverse lithography framework; (2) OptoSynthesizer-Place, a GPU-accelerated routing-informed EPIC placer for large-scale routability-optimized layout; and (3) OptoSynthesizer-Route, a hierarchical curvy-aware waveguide router with global-planning-assisted electrical-optical co-routing. Together, these toolkits form a seamless flow from EPIC netlists to fabrication-ready, yield-robust GDS layouts. We demonstrate how this framework enables compact large-scale photonic tensor cores and high-bandwidth interconnect fabrics for heterogeneous EPIC platforms, providing a practical foundation for manufacturable large-scale EPICs in next-generation AI systems.
Deploying learned multi-robot models on heterogeneous robots remains challenging due to hardware heterogeneity, communication constraints, and the lack of a unified execution stack. This paper presents NeuroMesh, a multi-domain, cross-platform, and modular decentralized neural inference framework that standardizes observation encoding, message passing, aggregation, and task decoding in a unified pipeline. NeuroMesh combines a dual-aggregation paradigm for reduction- and broadcast-based information fusion with a parallelized architecture that decouples cycle time from end-to-end latency. Our high-performance C++ implementation leverages Zenoh for inter-robot communication and supports hybrid GPU/CPU inference. We validate NeuroMesh on a heterogeneous team of aerial and ground robots across collaborative perception, decentralized control, and task assignment, demonstrating robust operation across diverse task structures and payload sizes. We plan to release NeuroMesh as an open-source framework to the community.
For the distance cost $c(x,y)=|x-y|$, the set $O(\mu,\nu)$ of $W_1$-optimal plans is generally not a singleton. Under the classical absolute-continuity hypotheses in the Euclidean case, secondary variational selection by the quadratic energy $C_2$ yields the ray-monotone $W_1$-optimal plan. We provide a counterexample to an open problem posed by Santambrogio that concerns the stability of this selector under weak convergence of the marginals. More precisely, we construct a fixed absolutely continuous source $\mu$ and absolutely continuous targets $\nu_n\rightharpoonup\nu$ such that $\gamma^{\mathrm{sel}}(\mu,\nu_n)\rightharpoonup\gamma^{\mathrm{hom}}$, where $\gamma^{\mathrm{hom}}\in O(\mu,\nu)$ but $\gamma^{\mathrm{hom}}\neq\gamma^{\mathrm{sel}}(\mu,\nu)$. We also identify the narrow Kuratowski limit of the optimal-plan sets $O(\mu,\nu_n)$, derive the constrained $\Gamma$-limit for secondary energies of the form $\int \Phi(|x-y|)\,d\gamma$ with $\Phi\in C([0,2])$, and deduce a non-commutation result for the additive perturbation $c_\varepsilon(x,y)=|x-y|+\varepsilon|x-y|^2$.
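One way to read the closing non-commutation claim is sketched below. It assumes compactly supported measures (consistent with $\Phi\in C([0,2])$), so that the standard results apply, and writes $\Pi(\mu,\nu)$ for the set of all couplings (our notation). For fixed $\varepsilon>0$, the perturbed cost is a strictly convex function of $x-y$. With $\mu$ absolutely continuous, its optimal plan $\gamma_\varepsilon$ is therefore unique and stable under $\nu_n\rightharpoonup\nu$, while letting $\varepsilon\to 0$ recovers the $C_2$ selection. Combining these facts with the counterexample gives two iterated (narrow) limits that disagree. This is a sketch of the statement, not the paper's proof.

```latex
\begin{aligned}
\int c_\varepsilon\,d\gamma &= \int |x-y|\,d\gamma + \varepsilon\, C_2(\gamma),
\qquad C_2(\gamma) = \int |x-y|^2\,d\gamma,\\[4pt]
\gamma_\varepsilon(\mu,\nu) &= \operatorname*{arg\,min}_{\gamma\in\Pi(\mu,\nu)} \int c_\varepsilon\,d\gamma
\;\xrightarrow[\varepsilon\to 0]{}\;
\gamma^{\mathrm{sel}}(\mu,\nu) = \operatorname*{arg\,min}_{\gamma\in O(\mu,\nu)} C_2(\gamma),\\[4pt]
\lim_{n\to\infty}\lim_{\varepsilon\to 0}\gamma_\varepsilon(\mu,\nu_n)
&= \lim_{n\to\infty}\gamma^{\mathrm{sel}}(\mu,\nu_n) = \gamma^{\mathrm{hom}}
\;\neq\; \gamma^{\mathrm{sel}}(\mu,\nu)
= \lim_{\varepsilon\to 0}\lim_{n\to\infty}\gamma_\varepsilon(\mu,\nu_n).
\end{aligned}
```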
Trustworthiness and transparency are essential for the clinical adoption of artificial intelligence (AI) in healthcare and biomedical research. Recent deep research systems aim to accelerate evidence-grounded scientific discovery by integrating AI agents with multi-hop information retrieval, reasoning, and synthesis. However, most existing systems lack explicit and inspectable criteria for evidence appraisal, creating a risk of compounding errors and making it difficult for researchers and clinicians to assess the reliability of their outputs. In parallel, current benchmarking approaches rarely evaluate performance on complex, real-world medical questions. Here, we introduce DeepER-Med, a Deep Evidence-based Research framework for Medicine built around an agentic AI system. DeepER-Med frames deep medical research as an explicit and inspectable workflow of evidence-based generation, consisting of three modules: research planning, agentic collaboration, and evidence synthesis. To support realistic evaluation, we also present DeepER-MedQA, an evidence-grounded dataset comprising 100 expert-level research questions derived from authentic medical research scenarios and curated by a multidisciplinary panel of 11 biomedical experts. Expert manual evaluation demonstrates that DeepER-Med consistently outperforms widely used production-grade platforms across multiple criteria, including the generation of novel scientific insights. We further demonstrate the practical utility of DeepER-Med through eight real-world clinical cases. Human clinician assessment indicates that DeepER-Med's conclusions align with clinical recommendations in seven of the eight cases, highlighting its potential for medical research and decision support.
The rapid progress of Artificial Intelligence Generated Content (AIGC) tools enables images, videos, and visualizations to be created on demand for webpage design, offering a flexible and increasingly adopted paradigm for modern UI/UX. However, directly integrating such tools into automated webpage generation often leads to style inconsistency and poor global coherence, as elements are generated in isolation. We propose MM-WebAgent, a hierarchical agentic framework for multimodal webpage generation that coordinates AIGC-based element generation through hierarchical planning and iterative self-reflection. MM-WebAgent jointly optimizes global layout, local multimodal content, and their integration, producing coherent and visually consistent webpages. We further introduce a benchmark for multimodal webpage generation and a multi-level evaluation protocol for systematic assessment. Experiments demonstrate that MM-WebAgent outperforms code-generation and agent-based baselines, especially on multimodal element generation and integration. Code & Data: https://aka.ms/mm-webagent.
High-level autonomous driving requires motion planners capable of modeling multimodal future uncertainties while remaining robust in closed-loop interactions. Although diffusion-based planners are effective at modeling complex trajectory distributions, they often suffer from stochastic instabilities and the lack of corrective negative feedback when trained purely with imitation learning. To address these issues, we propose RAD-2, a unified generator-discriminator framework for closed-loop planning. Specifically, a diffusion-based generator is used to produce diverse trajectory candidates, while an RL-optimized discriminator reranks these candidates according to their long-term driving quality. This decoupled design avoids directly applying sparse scalar rewards to the full high-dimensional trajectory space, thereby improving optimization stability. To further enhance reinforcement learning, we introduce Temporally Consistent Group Relative Policy Optimization, which exploits temporal coherence to alleviate the credit assignment problem. In addition, we propose On-policy Generator Optimization, which converts closed-loop feedback into structured longitudinal optimization signals and progressively shifts the generator toward high-reward trajectory manifolds. To support efficient large-scale training, we introduce BEV-Warp, a high-throughput simulation environment that performs closed-loop evaluation directly in Bird's-Eye View feature space via spatial warping. RAD-2 reduces the collision rate by 56% compared with strong diffusion-based planners. Real-world deployment further demonstrates improved perceived safety and driving smoothness in complex urban traffic.
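The generator-discriminator split can be illustrated with a minimal sketch. It assumes that each generated candidate trajectory already carries a scalar discriminator score. The discriminator's training signal is illustrated with the standard group-relative advantage from GRPO: each reward minus the group mean, divided by the group standard deviation. RAD-2's temporally consistent extension is not described in enough detail to model, so it is omitted. The names `rerank` and `group_relative_advantages` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def rerank(candidates: Sequence[Tuple[str, float]]) -> str:
    """Return the id of the highest-scoring candidate trajectory.

    `candidates` is a hypothetical list of (trajectory_id, score) pairs; the
    score stands in for a discriminator's estimate of long-term driving quality.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    best_id, _ = max(candidates, key=lambda c: c[1])
    return best_id


def group_relative_advantages(rewards: Sequence[float],
                              eps: float = 1e-8) -> List[float]:
    """Standard GRPO-style advantage: normalize rewards within one group.

    Uses the population standard deviation; `eps` avoids division by zero
    when every reward in the group is identical.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


print(rerank([("a", 0.2), ("b", 0.9), ("c", 0.5)]))  # b
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

With this decoupling, the sparse reward is attached to a single scalar per candidate rather than to every dimension of the trajectory. The abstract credits that design for the improved optimization stability.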
Many sign language translation (SLT) systems quietly assume that brief chunks of signing map directly to spoken-language words. That assumption breaks down because signers often construct meaning on the fly using context, space, and movement. We revisit SLT and argue that it is primarily a cross-modal reasoning task rather than a straightforward video-to-text conversion. We therefore introduce a reasoning-driven SLT framework that uses an ordered sequence of latent thoughts as an explicit intermediate layer between the video and the generated text. These latent thoughts gradually extract and organize meaning over time. On top of this, we use a plan-then-ground decoding method: the model first decides what it wants to say, and then looks back at the video to find supporting evidence. This separation improves coherence and faithfulness. We also build and release a new large-scale gloss-free SLT dataset with stronger context dependencies and more realistic meanings. Experiments across several benchmarks show consistent gains over existing gloss-free methods. Our code and data are available at https://github.com/fletcherjiang/SignThought.
The MCORD detector is a modular scintillator-based system employing silicon photomultipliers (SiPMs) and FPGA-based digital signal processing, designed for applications such as cosmic muon detection, veto systems, and detector calibration support. In this work, we investigate the influence of ambient temperature variations on detector performance, with particular emphasis on SiPM gain stability. Several automatic temperature compensation loops were implemented to stabilize the operating voltage of the sensors. Based on controlled laboratory measurements, we evaluate the effectiveness of different control strategies, including variations in temperature averaging time and threshold response criteria. The performance of each approach is compared in terms of gain stability and response dynamics. We identify the optimal temperature control configuration for planned MCORD measurements. Additionally, we describe modifications made to the detector electronics since the previous publication, including new software developed to control the AFE electronics.
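A compensation loop that has the two tunable knobs mentioned above (averaging time and a threshold response criterion) can be sketched as follows. The model is an assumption, not the MCORD implementation: the SiPM breakdown voltage is taken to drift linearly with temperature at a coefficient `dv_dt` (V/°C). Temperature is averaged over a sliding window of `window` samples. The bias is corrected only when the averaged temperature has moved at least `threshold` °C since the last correction, which keeps the overvoltage roughly constant. The class and parameter names are hypothetical.

```python
from collections import deque


class BiasCompensator:
    """Hypothetical SiPM bias-voltage compensation loop (illustrative only)."""

    def __init__(self, v_ref: float, t_ref: float, dv_dt: float,
                 window: int, threshold: float) -> None:
        self.v_ref = v_ref          # bias voltage at the reference temperature (V)
        self.t_ref = t_ref          # reference temperature (°C)
        self.dv_dt = dv_dt          # assumed linear temperature coefficient (V/°C)
        self.samples = deque(maxlen=window)  # sliding window for averaging
        self.threshold = threshold  # minimum averaged change that triggers an update (°C)
        self.t_last = t_ref         # averaged temperature at the last correction
        self.v_bias = v_ref         # currently applied bias voltage (V)

    def update(self, t_measured: float) -> float:
        """Feed one temperature sample and return the bias to apply."""
        self.samples.append(t_measured)
        t_avg = sum(self.samples) / len(self.samples)
        # Only react once the averaged temperature has moved far enough.
        if abs(t_avg - self.t_last) >= self.threshold:
            # Shift the bias with the (assumed linear) breakdown-voltage drift.
            self.v_bias = self.v_ref + self.dv_dt * (t_avg - self.t_ref)
            self.t_last = t_avg
        return self.v_bias


loop = BiasCompensator(v_ref=55.0, t_ref=25.0, dv_dt=0.05, window=2, threshold=0.5)
for t in (25.2, 26.0, 26.0):
    print(round(loop.update(t), 3))
```

A longer window or a larger threshold suppresses reactions to sensor noise but delays the correction. This is the trade-off between gain stability and response dynamics that the measurements compare.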
This paper advances a methodological proposal for safety research in agentic AI. As systems acquire planning, memory, tool use, persistent identity, and sustained interaction, safety can no longer be analysed primarily at the level of the isolated model. Population-level risks arise from structured interaction among agents, through processes of communication, observation, and mutual influence that shape collective behaviour over time. As the object of analysis shifts, a methodological gap emerges. Approaches focused either on single agents or on aggregate outcomes do not identify the interaction-level mechanisms that generate collective risks or the design variables that control them. A framework is required that links local interaction structure to population-level dynamics in a causally explicit way, allowing both explanation and intervention. We introduce two linked concepts. Agentic microphysics defines the level of analysis: local interaction dynamics in which one agent's output becomes another's input under specific protocol conditions. Generative safety defines the methodology: growing phenomena and eliciting risks from micro-level conditions in order to identify sufficient mechanisms, detect thresholds, and design effective interventions.
Optimized charging of electric vehicles (EVs) at public locations involves two decisions: a continuous one (how much energy to deliver, and when) and a binary one (where to plug in). Optimizing EV charging is therefore a mixed-integer linear program (MILP), and this discreteness undermines traditional marginal pricing methods. In this paper, we develop the first marginal-price-based mechanism for pricing EV charging with binary station access constraints. Using the result of Burer (2009), we reformulate the EV charging problem as a completely positive program (CPP), whose dual is a copositive program (COP). This convex dual admits valid shadow prices even though the original allocation problem is discrete and nonconvex. By interpreting the COP dual variables as marginal prices, we construct a pricing mechanism that captures EV supply equipment (EVSE) congestion as well as charging-capacity limits. We prove that the resulting mechanism is revenue-adequate for the operator and individually rational for every EV user, in the strong sense that each user maximizes their own welfare by accepting their assigned charging plan rather than deviating to any alternative option. We further develop problem-specific inner-approximation and dimension-reduction techniques that substantially improve the computational tractability of solving the COP in our setting. Numerical experiments on both small- and large-scale charging instances demonstrate that our pricing mechanism captures discrete congestion effects and aligns user incentives with the system-optimal assignment, outperforming time-of-use (TOU) and convex relaxation benchmarks.
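For readers unfamiliar with the cited result, the generic template of Burer's (2009) reformulation is sketched below. It is stated for a mixed-binary quadratic program with linear equality constraints, and an MILP is the special case Q = 0. The EV-specific data (Q, c, a_i, b_i and the binary index set B) are not given in the abstract, so only the template is shown. As we understand the result, the equivalence requires two conditions: the feasible set L = {x >= 0 : a_i^T x = b_i} must be nonempty, and every x in L must satisfy 0 <= x_j <= 1 for j in B.

```latex
\begin{aligned}
\text{(MBQP)}\quad & \min_{x \in \mathbb{R}^n}\; x^\top Q x + 2c^\top x \\
& \text{s.t.}\;\; a_i^\top x = b_i,\; i = 1,\dots,m, \qquad x \ge 0, \qquad x_j \in \{0,1\},\; j \in B, \\[6pt]
\text{(CPP)}\quad & \min_{x,\,X}\; \langle Q, X \rangle + 2c^\top x \\
& \text{s.t.}\;\; a_i^\top x = b_i, \quad a_i^\top X a_i = b_i^2,\; i = 1,\dots,m, \\
& \phantom{\text{s.t.}\;\;} x_j = X_{jj},\; j \in B, \qquad
\begin{pmatrix} 1 & x^\top \\ x & X \end{pmatrix} \in \mathcal{CP}_{n+1}.
\end{aligned}
```

Here \mathcal{CP}_{n+1} is the cone of (n+1) x (n+1) completely positive matrices. The conic dual of (CPP) is posed over the copositive cone, and this dual is the COP whose multipliers the mechanism interprets as marginal prices.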
Coverage path planning on irregular hexagonal grids is relevant to maritime surveillance, search and rescue and environmental monitoring, yet classical methods are often compared on small ad hoc examples or on rectangular grids. This paper presents a reproducible benchmark of deterministic single-vehicle coverage path planning heuristics on irregular hexagonal graphs derived from synthetic but maritime-motivated areas of interest. The benchmark contains 10,000 Hamiltonian-feasible instances spanning compact, elongated, and irregular morphologies, 17 heuristics from seven families, and a common evaluation protocol covering Hamiltonian success, complete-coverage success, revisits, path length, heading changes, and CPU latency. Across the released dataset, heuristics with explicit shortest-path reconnection solve the relaxed coverage task reliably but almost never produce zero-revisit tours. Exact Depth-First Search confirms that every released instance is Hamiltonian-feasible. The strongest classical Hamiltonian baseline is a Warnsdorff variant that uses an index-based tie-break together with a terminal-inclusive residual-degree policy, reaching 79.0% Hamiltonian success. The dominant design choice is not tie-breaking alone, but how the residual degree is defined when the endpoint is reserved until the final move. This shows that underreported implementation details can materially affect performance on sparse geometric graphs with bottlenecks. The benchmark is intended as a controlled testbed for heuristic analysis rather than as a claim of operational optimality at fleet scale.
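A Warnsdorff-style Hamiltonian path heuristic with a reserved endpoint can be sketched on a plain adjacency-list graph; the hexagonal-grid construction itself is not shown. The precise definitions used in the benchmark are not given here, so this sketch makes three assumptions. First, the endpoint `end` may only be entered on the final move. Second, the "terminal-inclusive" residual degree of a candidate counts all of its unvisited neighbours, including `end`. Third, ties are broken by the smallest vertex index. Setting the hypothetical flag `terminal_inclusive=False` excludes `end` from the count, which is the alternative policy the abstract contrasts.

```python
from typing import Dict, List, Optional

Graph = Dict[int, List[int]]


def warnsdorff_path(graph: Graph, start: int, end: int,
                    terminal_inclusive: bool = True) -> Optional[List[int]]:
    """Greedy Hamiltonian path from `start` to `end`, or None if the walk gets stuck."""
    n = len(graph)
    visited = {start}
    path = [start]
    current = start

    def residual_degree(v: int) -> int:
        # Unvisited neighbours of v; optionally leave out the reserved endpoint.
        return sum(1 for w in graph[v]
                   if w not in visited and (terminal_inclusive or w != end))

    # Visit every vertex except the reserved endpoint.
    while len(path) < n - 1:
        candidates = [v for v in graph[current] if v not in visited and v != end]
        if not candidates:
            return None  # dead end: no unvisited, non-terminal neighbour
        # Fewest onward options first; ties broken by smallest vertex index.
        nxt = min(candidates, key=lambda v: (residual_degree(v), v))
        visited.add(nxt)
        path.append(nxt)
        current = nxt

    # Final move: the endpoint must be adjacent to the last visited vertex.
    if end in graph[current]:
        path.append(end)
        return path
    return None


cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(warnsdorff_path(cycle4, start=0, end=1))  # [0, 3, 2, 1]
```

Because the greedy walk never backtracks, it can fail on a Hamiltonian-feasible instance. This is why an exact depth-first search is still needed to certify feasibility.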