Single-cell RNA sequencing (scRNA-seq) technologies have enabled the profiling of gene expression for a collection of cells across time during a dynamic biological process. Given that each time point provides only a static snapshot, modeling and understanding the underlying cellular dynamics remains a central yet challenging task in modern genomics. To associate biological time with single-cell distributions, we develop a robust local Fr\'echet regression for interpolating the high-dimensional cellular distribution at any given time point using data observed at a finite number of time points. To allow for robustness in cell distributions, we propose to apply the unbalanced optimal transport-based Wasserstein distance in our local Fr\'echet regression analysis. We develop a computationally efficient algorithm to generate the cell distribution for a given time point using generative neural networks. The resulting generated single-cell models and the corresponding transport plans can be used to interpolate the single cells at any unobserved time point and to track the cell trajectory during the cell differentiation process. We demonstrate the methods using three single-cell differentiation data sets, including differentiation of human embryonic stem cells into embryoids, mouse hematopoietic and progenitor cell differentiation, and reprogramming of mouse embryonic fibroblasts. We show that the proposed methods lead to better single-cell interpolations, reveal different cell differentiation trajectories, and identify early genes that regulate these cell trajectories.
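For concreteness, a minimal sketch of the local Fr\'echet regression weights is given below; it assumes a Gaussian kernel and the standard local-linear weight formula, with the interpolated distribution then obtained as the (unbalanced-optimal-transport) Wasserstein barycenter of the observed cell distributions under these weights. The bandwidth h and kernel choice are illustrative, not the paper's settings.

import numpy as np

def local_frechet_weights(t_obs, t0, h):
    """Local-linear Frechet regression weights at a query time t0.

    t_obs: 1-D array of observed time points; h: kernel bandwidth (assumed).
    The interpolated cell distribution is the weighted Wasserstein barycenter
    of the observed distributions under the returned weights."""
    d = t_obs - t0
    k = np.exp(-0.5 * (d / h) ** 2)              # Gaussian kernel (illustrative)
    mu0, mu1, mu2 = k.mean(), (k * d).mean(), (k * d ** 2).mean()
    sigma2 = mu0 * mu2 - mu1 ** 2
    w = k * (mu2 - mu1 * d) / sigma2             # local-linear weights
    return w / w.sum()                           # normalize to sum to one

# toy usage: five observed time points, interpolation at t0 = 2.5
w = local_frechet_weights(np.array([0.0, 1.0, 2.0, 3.0, 4.0]), t0=2.5, h=1.0)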
High-dose-rate (HDR) brachytherapy plays a critical role in the treatment of locally advanced cervical cancer but remains highly dependent on manual treatment planning expertise. The objective of this study is to develop a fully automated HDR brachytherapy planning framework that integrates reinforcement learning (RL) and dose-based optimization to generate clinically acceptable treatment plans with improved consistency and efficiency. We propose a hierarchical two-stage autoplanning framework. In the first stage, a deep Q-network (DQN)-based RL agent iteratively selects treatment planning parameters (TPPs), which control the trade-offs between target coverage and organ-at-risk (OAR) sparing. The agent's state representation includes both dose-volume histogram (DVH) metrics and current TPP values, while its reward function incorporates clinical dose objectives and safety constraints, including D90, V150, V200 for targets, and D2cc for all relevant OARs (bladder, rectum, sigmoid, small bowel, and large bowel). In the second stage, a customized Adam-based optimizer computes the corresponding dwell time distribution for the selected TPPs using a clinically informed loss function. The framework was evaluated on a cohort of patients with complex applicator geometries. The proposed framework successfully learned clinically meaningful TPP adjustments across diverse patient anatomies. For the unseen test patients, the RL-based automated planning method achieved an average score of 93.89%, outperforming the clinical plans which averaged 91.86%. These findings are notable given that score improvements were achieved while maintaining full target coverage and reducing CTV hot spots in most cases.
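To make the reward structure concrete, the hedged sketch below scores a candidate plan from the dose metrics named above (D90, V150, V200 for the target and D2cc for the OARs). All thresholds, weights, and units are illustrative placeholders rather than the study's clinical objectives.

def plan_reward(d90, v150, v200, d2cc, prescription=7.0):
    """Illustrative reward for one planning step (all limits are assumptions).

    d90, prescription: Gy; v150, v200: fractional volumes; d2cc: dict of OAR name -> Gy."""
    reward = 1.0 if d90 >= prescription else -(prescription - d90)   # coverage term
    reward -= 0.5 * max(v150 - 0.65, 0.0)                            # hot-spot penalties
    reward -= 0.5 * max(v200 - 0.20, 0.0)
    assumed_limits = {"bladder": 5.0, "rectum": 4.0, "sigmoid": 4.0,
                      "small_bowel": 4.0, "large_bowel": 4.0}         # Gy, hypothetical
    for oar, dose in d2cc.items():
        reward -= max(dose - assumed_limits[oar], 0.0)                # OAR safety term
    return reward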
Automatic segmentation of anatomical structures is critical in medical image analysis, aiding diagnostics and treatment planning. Skin segmentation plays a key role in registering and visualising multimodal imaging data. 3D skin segmentation enables applications in personalised medicine, surgical planning, and remote monitoring, offering realistic patient models for treatment simulation, procedural visualisation, and continuous condition tracking. This paper analyses and compares algorithmic and AI-driven skin segmentation approaches, emphasising key factors to consider when selecting a strategy based on data availability and application requirements. We evaluate an iterative region-growing algorithm and the TotalSegmentator, a deep learning-based approach, across different imaging modalities and anatomical regions. Our tests show that AI segmentation excels in automation but struggles with MRI due to its CT-based training, while the graphics-based method performs better for MRIs but introduces more noise. AI-driven segmentation also automates patient bed removal in CT, whereas the graphics-based method requires manual intervention.
Musculoskeletal disorders (MSDs) are a leading cause of disability worldwide, requiring advanced diagnostic and therapeutic tools for personalised assessment and treatment. Effective management of MSDs involves the interaction of heterogeneous data sources, making the Digital Twin (DT) paradigm a valuable option. This paper introduces the Musculoskeletal Digital Twin (MS-DT), a novel framework that integrates multiscale biomechanical data with computational modelling to create a detailed, patient-specific representation of the musculoskeletal system. By combining motion capture, ultrasound imaging, electromyography, and medical imaging, the MS-DT enables the analysis of spinal kinematics, posture, and muscle function. An interactive visualisation platform provides clinicians and researchers with an intuitive interface for exploring biomechanical parameters and tracking patient-specific changes. Results demonstrate the effectiveness of MS-DT in extracting precise kinematic and dynamic tissue features, offering a comprehensive tool for monitoring spine biomechanics and rehabilitation. This framework provides high-fidelity modelling and real-time visualization to improve patient-specific diagnosis and intervention planning.
Remote sensing has emerged as a critical tool for large-scale Earth monitoring and land management. In this paper, we introduce AgriPotential, a novel benchmark dataset composed of Sentinel-2 satellite imagery spanning multiple months. The dataset provides pixel-level annotations of agricultural potentials for three major crop types - viticulture, market gardening, and field crops - across five ordinal classes. AgriPotential supports a broad range of machine learning tasks, including ordinal regression, multi-label classification, and spatio-temporal modeling. The data covers diverse areas in Southern France, offering rich spectral information. AgriPotential is the first public dataset designed specifically for agricultural potential prediction, aiming to improve data-driven approaches to sustainable land use planning. The dataset and the code are freely accessible at: https://zenodo.org/records/15556484
The collaboration and interaction of multiple robots have become integral aspects of smart manufacturing. Effective planning and management play a crucial role in achieving energy savings and minimising overall costs. This paper addresses the real-time Dynamic Multiple Sources to Single Destination (DMS-SD) navigation problem, particularly with a material distribution case for multiple intelligent robots in smart manufacturing. Enumerated solutions, such as in \cite{xiao2022efficient}, tackle the problem by generating as many optimal or near-optimal solutions as possible but do not learn patterns from previous experience, whereas the method in \cite{xiao2023collaborative} only uses limited information from the earlier trajectories. Consequently, these methods may take a considerable amount of time to compute results on large maps, rendering real-time operations impractical. To overcome this challenge, we propose a lightweight Deep Reinforcement Learning (DRL) method to address the DMS-SD problem. The proposed DRL method can be efficiently trained and rapidly converges to the optimal solution using the designed target-guided reward function. A well-trained DRL model significantly reduces the computation time for the next movement to a millisecond level, a speed-up of up to 100 times over the enumerated solutions in our experiments. Moreover, the trained DRL model can be easily deployed on lightweight devices in smart manufacturing, such as Internet of Things devices and mobile phones, which only require limited computational resources.
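As an illustration of the target-guided idea, the sketch below rewards progress toward the shared destination with a small per-step cost; the actual reward shaping, state encoding, and network used in the paper are not reproduced here.

import numpy as np

def target_guided_reward(prev_pos, new_pos, goal, step_cost=0.01, goal_bonus=1.0):
    """Hypothetical target-guided reward for one robot move on a map.

    Positive when the robot gets closer to the destination, with a bonus on arrival."""
    prev_d = np.linalg.norm(np.asarray(prev_pos, float) - np.asarray(goal, float))
    new_d = np.linalg.norm(np.asarray(new_pos, float) - np.asarray(goal, float))
    reward = (prev_d - new_d) - step_cost        # progress minus per-step cost
    if new_d == 0:
        reward += goal_bonus                     # bonus for reaching the destination
    return reward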
Relational Graph Neural Networks (R-GNNs) are a GNN-based approach for learning value functions that can generalise to unseen problems from a given planning domain. R-GNNs were theoretically motivated by the well known connection between the expressive power of GNNs and $C_2$, first-order logic with two variables and counting. In the context of planning, $C_2$ features refer to the set of formulae in $C_2$ with relations defined by the unary and binary predicates of a planning domain. Some planning domains exhibit optimal value functions that can be decomposed as arithmetic expressions of $C_2$ features. We show that, contrary to empirical results, R-GNNs cannot learn value functions defined by $C_2$ features. We also identify prior GNN architectures for planning that may better learn value functions defined by $C_2$ features.
The next generation of radio surveys is going to be transformative for cosmology and other aspects of our understanding of astrophysics. Realistic simulations of radio observations are essential for the design and planning of radio surveys. They are employed in the development of methods for tasks such as data calibration and reduction, automated analysis, and statistical studies in cosmology. We implemented software for machine learning-assisted simulations of realistic surveys with the LOFAR telescope, resulting in a synthetic radio sky model and a corresponding artificial telescope observation. We employed a diffusion model trained on LoTSS observations to generate individual radio galaxy images with control over the angular size. Single sources are assembled into a radio sky model, using an input catalog from cosmological simulations. We then transformed this sky model into visibilities corresponding to a typical LoTSS pointing. We added realistic noise to this synthetic measurement and obtained our final simulated sky maps through deconvolution. We explored different ways to evaluate our resulting sky model. We were able to simulate realistic LOFAR observations, covering a sky patch of 5x5 degrees at an effective resolution of 8.5 arcseconds. The simulated sources have flux and size distributions that match real observations, and the resulting maps have sensitivities compatible with LoTSS observations. Our diffusion model is able to synthesize high-quality realistic radio galaxy images with precise control over the source sizes. This software can readily be applied to other instruments.
This study evaluates methodological challenges and regulatory considerations of indirect treatment comparisons (ITCs) through an analysis of international health technology assessment guidelines and French Transparency Committee (TC) decisions. We conducted a pragmatic review of ITC guidelines from major health technology assessment (HTA) bodies and multistakeholder organizations. Then, we analyzed TC opinions published between 2021 and 2023. We extracted data on ITC methodology, therapeutic areas, acceptability, and limitations expressed by the TC. The targeted review of the main guidelines showed broad agreement between HTA bodies and multistakeholder organizations, with some specificities. A total of 138 TC opinions containing 195 ITCs were analyzed. Only 13.3% of these ITCs influenced TC decision-making. ITCs were more frequently accepted in genetic diseases (34.4%) compared to oncology (10.0%) and autoimmune diseases (11.1%). Methods using individual patient data showed higher acceptance rates (23.1%) than network meta-analyses (4.2%). Main limitations included heterogeneity/bias risk (59%), lack of data (48%), statistical methodology issues (29%), study design concerns (27%), small sample size (25%), and outcome definition variability (20%). When ITCs were the primary source of evidence, the proportion of important clinical benefit was lower (60.9% vs. 73.4%) than when randomized controlled trials were available. While ITCs are increasingly submitted, particularly where direct evidence is impractical, their influence on reimbursement decisions remains limited. There is a need for clear and accessible guidance so that manufacturers can produce more robust ITCs that follow regulatory guidelines, from the planning phase to execution.
This paper is concerned with a discounted stochastic optimal control problem for regime-switching diffusions over an infinite horizon. First, as a preliminary of particular interest in its own right, the global well-posedness of infinite horizon forward and backward stochastic differential equations with Markov chains and the asymptotic property of their solutions as time goes to infinity are obtained. Then, a sufficient stochastic maximum principle for optimal controls is established via a dual method under a certain convexity condition on the Hamiltonian. As an application of our maximum principle, a linear quadratic production planning problem is solved with an explicit feedback optimal production rate. The existence and uniqueness of a non-negative solution to the associated algebraic Riccati equation are proved. Numerical experiments are reported to illustrate the theoretical results, especially the monotonicity of the value function in various model parameters.
Background: Coronary artery disease (CAD) remains one of the leading causes of mortality worldwide. Precise segmentation of coronary arteries from invasive coronary angiography (ICA) is critical for effective clinical decision-making. Objective: This study aims to propose a novel deep learning model based on frequency-domain analysis to enhance the accuracy of coronary artery segmentation and stenosis detection in ICA, thereby offering robust support for the stenosis detection and treatment of CAD. Methods: We propose the Frequency-Domain Attention-Guided Diffusion Network (FAD-Net), which integrates a frequency-domain-based attention mechanism and a cascading diffusion strategy to fully exploit frequency-domain information for improved segmentation accuracy. Specifically, FAD-Net employs a Multi-Level Self-Attention (MLSA) mechanism in the frequency domain, computing the similarity between queries and keys across high- and low-frequency components in ICAs. Furthermore, a Low-Frequency Diffusion Module (LFDM) is incorporated to decompose ICAs into low- and high-frequency components via multi-level wavelet transformation. Subsequently, it refines fine-grained arterial branches and edges by reintegrating high-frequency details via inverse fusion, enabling continuous enhancement of anatomical precision. Results and Conclusions: Extensive experiments demonstrate that FAD-Net achieves a mean Dice coefficient of 0.8717 in coronary artery segmentation, outperforming existing state-of-the-art methods. In addition, it attains a true positive rate of 0.6140 and a positive predictive value of 0.6398 in stenosis detection, underscoring its clinical applicability. These findings suggest that FAD-Net holds significant potential to assist in the accurate diagnosis and treatment planning of CAD.
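As a rough illustration of the low/high-frequency split behind the LFDM, the sketch below uses PyWavelets for a multi-level wavelet decomposition and its inverse; the specific wavelet, number of levels, and fusion rule are assumptions, and the attention mechanism (MLSA) is not shown.

import pywt  # PyWavelets

def wavelet_split(image, wavelet="haar", level=2):
    """Split an angiogram into low-frequency (approximation) and
    high-frequency (detail) components via a multi-level wavelet transform."""
    coeffs = pywt.wavedec2(image, wavelet, level=level)
    return coeffs[0], coeffs[1:]                 # (low, list of detail tuples)

def inverse_fuse(low, highs, wavelet="haar"):
    """Reintegrate high-frequency details to recover fine branches and edges."""
    return pywt.waverec2([low, *highs], wavelet)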
In end-to-end autonomous driving, motion prediction plays a pivotal role in ego-vehicle planning. However, existing methods often rely on globally aggregated motion features, ignoring the fact that planning decisions are primarily influenced by a small number of locally interacting agents. Failing to attend to these critical local interactions can obscure potential risks and undermine planning reliability. In this work, we propose FocalAD, a novel end-to-end autonomous driving framework that focuses on critical local neighbors and refines planning by enhancing local motion representations. Specifically, FocalAD comprises two core modules: the Ego-Local-Agents Interactor (ELAI) and the Focal-Local-Agents Loss (FLA Loss). ELAI constructs a graph-based ego-centric interaction representation that captures motion dynamics with local neighbors to enhance both ego planning and agent motion queries. FLA Loss increases the weights of decision-critical neighboring agents, guiding the model to prioritize those more relevant to planning. Extensive experiments show that FocalAD outperforms existing state-of-the-art methods on the open-loop nuScenes dataset and the closed-loop Bench2Drive benchmark. Notably, on the robustness-focused Adv-nuScenes dataset, FocalAD achieves even greater improvements, reducing the average collision rate by 41.9% compared to DiffusionDrive and by 15.6% compared to SparseDrive.
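The weighting idea behind the FLA Loss can be sketched as follows: per-agent motion errors are re-weighted so that neighbors closer to the ego plan count more. The softmax-over-distance weighting and the temperature tau are illustrative assumptions, not the paper's exact loss.

import torch

def focal_local_agents_loss(pred_traj, gt_traj, dist_to_ego, tau=2.0):
    """pred_traj, gt_traj: [N, T, 2] predicted/ground-truth agent trajectories;
    dist_to_ego: [N] distance of each neighbor to the ego plan (smaller = more critical)."""
    per_agent = ((pred_traj - gt_traj) ** 2).mean(dim=(-1, -2))   # [N] per-agent motion error
    weights = torch.softmax(-dist_to_ego / tau, dim=0)            # closer => larger weight
    return (weights * per_agent).sum()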
The rapid increase in satellite launches in recent years, and the pressure of launches planned into the next decade, demands an improvement in the efficiency of space domain awareness facilities. Optical facilities form an important component of global space domain awareness capabilities; however, traditional optical telescopes are restricted to observing satellites during a small twilight window. In this work we explore expanding this operational period to encompass the entire day to dramatically improve the observing opportunities at a single site. We explore daytime space domain awareness observations with the Huntsman Telescope Pathfinder, an instrument built using predominantly off-the-shelf components and Canon telephoto lenses. We report successful detections and photometric light curves of 81 Starlink satellites at Sun altitudes ranging from 20 degrees to midday. Starlink satellites are found to be particularly bright at $3.6 \pm 0.05$ mag, $\sigma = 0.6 \pm 0.05$ mag in Sloan r', or $\sim 11\times$ brighter than in twilight conditions. We conclude that this surprising observed brightness is due to the contribution of Earthshine beneath the orbiting satellites. We also compare our observations to existing satellite optical brightness models and find that satellite optical brightness during the day can only be well described by a model including an Earthshine component. We find that observed light curves are more complex than simple geometric models predict, but generally agree within an order of magnitude. Finally, we suggest improvements to satellite optical brightness models by incorporating weather data to measure the actual Earthshine under a satellite.
Retrieving the phase of a complex-valued field from the measurements of its amplitude is a crucial problem with a wide range of applications in microscopy and ultracold atomic physics. In particular, obtaining an accurate and efficient solution to this problem is a key step in shaping laser beams for trapping atoms in optical tweezer arrays and applying high-fidelity entangling gates on a neutral atom quantum computer. Current approaches to this problem fail to converge on the optimal solution due to a phenomenon known as vortex formation. In this work, we present an efficient optimization algorithm using Optimal Transport. Our approach completely bypasses the creation of phase vortices and allows for a state-of-the-art solution both in terms of accuracy and efficiency. Furthermore, we show a deep theoretical connection between the Optimal Transport plan and the ray-optics limit of the Wigner distribution of the unknown complex-valued field, and show that our method can be used to retrieve the phase-space transformation of any unknown quadratic phase system. Finally, we reinterpret this problem in the modern quantum learning framework. The techniques we develop provide both useful intuition and practical tools for advancing the frontiers of phase retrieval and laser beam shaping.
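In one dimension the vortex-free OT construction can be written in a few lines: the transport map matches the cumulative distributions of the measured and target intensities, and integrating the displacement yields a smooth phase. The scale factor relating displacement to phase gradient depends on the optical setup and is left as an assumption here.

import numpy as np

def ot_phase_1d(x, source_intensity, target_intensity, k_scale=1.0):
    """1-D sketch: OT map T = F_target^{-1}(F_source); phase from integrated displacement.

    x: increasing grid; intensities: non-negative arrays on x; k_scale: setup-dependent."""
    Fs = np.cumsum(source_intensity)
    Fs = Fs / Fs[-1]
    Ft = np.cumsum(target_intensity)
    Ft = Ft / Ft[-1]
    T = np.interp(Fs, Ft, x)                      # monotone (vortex-free) transport map
    phase_grad = k_scale * (T - x)                # ray-optics / Wigner limit
    return np.cumsum(phase_grad) * (x[1] - x[0])  # integrate to obtain the phase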
We present a new program aimed at developing a new generation of micromirror devices specifically tailored for astronomical applications, multi-slit spectroscopy in particular. We first overview the general characteristics of Multi-Object-Spectrographs based on the current Digital Micromirror Devices (DMDs), with particular focus on the newly deployed SAMOS instrument at the 4.1 m SOAR telescope on Cerro Pachon. We illustrate the operational advantages of DMD-based instruments and the technical limitations of the currently available devices, the DMDs produced by Texas Instruments (TI). We then introduce the baseline and target parameters of the new Micro-Mirror-Devices (MMDs) that we plan to develop with the goal of reaching TRL-5 by mid-2029 as required by the Habitable Worlds Observatory (HWO) timeline. We conclude with a brief illustration of the exciting potential of MMD-based spectrographs for an 8 m class space telescope like HWO.
This paper proposes a control-oriented optimization platform for autonomous mobile robots (AMRs), focusing on extending battery life while ensuring task completion. The requirement of fast AMR task planning while maintaining a minimum battery state of charge, thus maximizing battery life, leads to a bilinear optimization problem. The McCormick envelope technique is proposed to linearize the bilinear term. A novel planning algorithm with relaxed constraints is also developed to handle parameter uncertainties robustly while ensuring high efficiency. Simulation results are provided to demonstrate the utility of the proposed methods in reducing battery degradation while satisfying task completion requirements.
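For reference, the textbook McCormick envelope that the abstract invokes replaces a bilinear term $w = xy$ with bounds $x \in [x_L, x_U]$, $y \in [y_L, y_U]$ by the four linear inequalities $w \ge x_L y + x y_L - x_L y_L$, $w \ge x_U y + x y_U - x_U y_U$, $w \le x_U y + x y_L - x_U y_L$, and $w \le x_L y + x y_U - x_L y_U$; how the paper embeds these cuts in its planning formulation is not detailed here.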
Urban digital twins are increasingly perceived as a way to pool the growing digital resources of cities for the purpose of a more sustainable and integrated urban planning. Models and simulations are central to this undertaking: They enable "what if?" scenarios, create insights and describe relationships between the vast data that is being collected. However, the process of integrating and subsequently using models in urban digital twins is an inherently complex undertaking. It raises questions about how to represent urban complexity, how to deal with uncertain assumptions and modeling paradigms, and how to capture underlying power relations. Existing approaches in the domain largely focus on monolithic and centralized solutions in the tradition of neoliberal city-making, oftentimes prohibiting pluralistic and open interoperable models. Using a participatory design for participatory systems approach together with the City of Hamburg, Germany, we find that an open Urban Model Platform can function both as a public technological backbone for modeling and simulation in urban digital twins and as a socio-technical framework for a collaborative and pluralistic representation of urban processes. Such a platform builds on open standards, allows for a decentralized integration of models, enables communication between models and supports a multi-model approach to representing urban systems.
Lunar surface operations impose stringent requirements on wireless communication systems, including autonomy, robustness to disruption, and the ability to adapt to environmental and mission-driven context. While Space-O-RAN provides a distributed orchestration model aligned with 3GPP standards, its decision logic is limited to static policies and lacks semantic integration. We propose a novel extension incorporating a semantic agentic layer enabled by the Model Context Protocol (MCP) and Agent-to-Agent (A2A) communication protocols, allowing context-aware decision making across real-time, near-real-time, and non-real-time control layers. Distributed cognitive agents deployed in rovers, landers, and lunar base stations implement wireless-aware coordination strategies, including delay-adaptive reasoning and bandwidth-aware semantic compression, while interacting with multiple MCP servers to reason over telemetry, locomotion planning, and mission constraints.
The increasing demand for reliable, high-capacity communication during large-scale outdoor events poses significant challenges for traditional Terrestrial Networks (TNs), which often struggle to provide consistent coverage in high-density environments. This paper presents a novel 6G radio network planning framework that integrates Non-Terrestrial Networks (NTNs) with Reconfigurable Intelligent Surfaces (RISs) to deliver ubiquitous coverage and enhanced network capacity. Our framework overcomes the limitations of conventional deployable base stations by leveraging NTN architectures, including Low Earth Orbit (LEO) satellites and passive RIS platforms seamlessly integrated with Beyond 5G (B5G) TNs. By incorporating advanced B5G technologies such as Massive Multiple Input Multiple Output (mMIMO) and beamforming, and by optimizing spectrum utilization across the C, S, and Ka bands, we implement a rigorous interference management strategy based on a dynamic SINR model. Comprehensive calculations and simulations validate the proposed framework, demonstrating significant improvements in connectivity, reliability, and cost-efficiency in crowded scenarios. This integration strategy represents a promising solution for meeting the evolving demands of future 6G networks.
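The dynamic SINR model ultimately reduces to the standard ratio of received signal power to aggregate interference plus noise; a generic sketch is below. The link-budget terms specific to the framework (RIS reflection gains, LEO path loss, band-dependent noise figures) would enter through the dBm inputs and are not modeled here.

import numpy as np

def sinr_db(signal_dbm, interferers_dbm, noise_dbm):
    """SINR in dB from a received signal power, a list of interferer powers, and noise (all in dBm)."""
    to_mw = lambda dbm: 10 ** (dbm / 10.0)                  # dBm -> mW
    interference_mw = sum(to_mw(p) for p in interferers_dbm)
    return 10 * np.log10(to_mw(signal_dbm) / (interference_mw + to_mw(noise_dbm)))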
This paper presents mRAG, a multi-agent retrieval-augmented generation (RAG) framework composed of specialized agents for subtasks such as planning, searching, reasoning, and coordination. Our system uses a self-training paradigm with reward-guided trajectory sampling to optimize inter-agent collaboration and enhance response generation. Evaluated on DataMorgana-derived datasets during the SIGIR 2025 LiveRAG competition, mRAG outperforms conventional RAG baselines. We further analyze competition outcomes and showcase the framework's strengths with case studies, demonstrating its efficacy for complex, real-world RAG tasks.
This paper presents a scenario-based robust optimization framework for short-term energy scheduling in electricity-intensive industrial plants, explicitly addressing uncertainty in planning decisions. The model is formulated as a two-stage Mixed Integer Linear Program (MILP) and integrates a hybrid scenario generation method capable of representing uncertain inputs such as electricity prices, renewable generation, and internal demand. A convex objective function combining expected and worst-case operational costs allows for tunable risk aversion, enabling planners to balance economic performance and robustness. The resulting schedule ensures feasibility across all scenarios and supports coordinated use of industrial flexibility assets, including battery energy storage and shiftable production. To isolate the effects of market volatility, the framework is applied to a real-world cement manufacturing case study considering only day-ahead electricity price uncertainty, with all other inputs treated deterministically. Results show improved resilience to forecast deviations, reduced cost variability, and more consistent operations. The proposed method offers a scalable and risk-aware approach for industrial flexibility planning under uncertainty.
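The risk-tunable objective described above typically takes the generic form $\min_{x}\, c^{\top}x + \lambda \sum_{s} p_s\, Q(x,\xi_s) + (1-\lambda) \max_{s} Q(x,\xi_s)$ with $\lambda \in [0,1]$, where $x$ are the first-stage scheduling decisions, $\xi_s$ the generated scenarios with probabilities $p_s$, and $Q(x,\xi_s)$ the optimal second-stage cost under scenario $s$; this is a standard statement of such objectives, not necessarily the paper's exact notation.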
Cardiac substructures are essential in thoracic radiation therapy planning to minimize the risk of radiation-induced heart disease. Deep learning (DL) offers efficient methods to reduce contouring burden but lacks generalizability across different modalities and overlapping structures. This work introduces and validates a Modality-AGnostic Image Cascade (MAGIC) for comprehensive and multi-modal cardiac substructure segmentation. MAGIC is implemented through replicated encoding and decoding branches of an nnU-Net-based, U-shaped backbone, conserving the function of a single model. Twenty cardiac substructures (heart, chambers, great vessels (GVs), valves, coronary arteries (CAs), and conduction nodes) from simulation CT (Sim-CT), low-field MR-Linac, and cardiac CT angiography (CCTA) modalities were manually delineated and used to train (n=76), validate (n=15), and test (n=30) MAGIC. Twelve comparison models (four segmentation subgroups across three modalities) were equivalently trained. All methods were compared for training efficiency and against reference contours using the Dice Similarity Coefficient (DSC) and two-tailed Wilcoxon Signed-Rank test (threshold, p<0.05). Average DSC scores were 0.75(0.16) for Sim-CT, 0.68(0.21) for MR-Linac, and 0.80(0.16) for CCTA. MAGIC outperforms the comparison models in 57% of cases, with limited statistical differences. MAGIC offers an effective and accurate segmentation solution that is lightweight and capable of segmenting multiple modalities and overlapping structures in a single model. MAGIC further enables clinical implementation by simplifying the computational requirements and offering unparalleled flexibility for clinical settings.
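For completeness, the evaluation metric referenced above (DSC) compares a predicted mask against the reference contour as sketched below (a standard definition, with binary numpy masks assumed).

import numpy as np

def dice_coefficient(pred_mask, ref_mask, eps=1e-8):
    """Dice Similarity Coefficient between two binary masks (numpy arrays)."""
    pred = pred_mask.astype(bool)
    ref = ref_mask.astype(bool)
    intersection = np.logical_and(pred, ref).sum()
    return 2.0 * intersection / (pred.sum() + ref.sum() + eps)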
We propose a variant of the Rapidly Exploring Random Tree Star (RRT$^{\star}$) algorithm to synthesize trajectories satisfying a given spatio-temporal specification expressed in a fragment of Signal Temporal Logic (STL) for linear systems. Previous approaches for planning trajectories under STL specifications using sampling-based methods leverage either mixed-integer or non-smooth optimization techniques, with poor scalability in the horizon and complexity of the task. We adopt instead a control-theoretic perspective on the problem, based on the notion of set forward invariance. Specifically, from a given STL task defined over polyhedral predicates, we develop a novel algorithmic framework by which the task is efficiently encoded into a time-varying set via linear programming, such that trajectories evolving within the set also satisfy the task. Forward invariance properties of the resulting set with respect to the system dynamics and input limitations are then proved via non-smooth analysis. We then present a modified RRT$^{\star}$ algorithm to synthesize asymptotically optimal and dynamically feasible trajectories satisfying a given STL specification, by sampling a tree of trajectories within the previously constructed time-varying set. We showcase two use cases of our approach involving an autonomous inspection of the International Space Station and a room-servicing task requiring timed revisits to a charging station.
Quality management in semiconductor manufacturing often relies on template matching with known golden standards. For Indium-Phosphide (InP) multi-project wafer manufacturing, low production scale and high design variability lead to such golden standards being typically unavailable. Defect detection, in turn, is manual and labor-intensive. This work addresses this challenge by proposing a methodology to generate a synthetic golden standard using Deep Neural Networks, trained to simulate photo-realistic InP wafer images from CAD data. We evaluate various training objectives and assess the quality of the simulated images on both synthetic data and InP wafer photographs. Our deep-learning-based method outperforms a baseline decision-tree-based approach, enabling the use of a 'simulated golden die' from CAD plans in any user-defined region of a wafer for more efficient defect detection. We apply our method to a template matching procedure, to demonstrate its practical utility in surface defect detection.
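A minimal sketch of how a 'simulated golden die' can drive defect detection is given below: the wafer photograph and the simulated reference are locally standardized and their absolute difference highlights candidate defects. The actual template-matching procedure in the paper is more involved; this only illustrates the principle.

import numpy as np

def anomaly_map(wafer_photo, simulated_golden, eps=1e-8):
    """Pixel-wise anomaly map between a wafer photo and its simulated golden reference."""
    def standardize(im):
        im = np.asarray(im, dtype=float)
        return (im - im.mean()) / (im.std() + eps)
    return np.abs(standardize(wafer_photo) - standardize(simulated_golden))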
Recent efforts to leverage the Multi-modal Large Language Model (MLLM) as GUI agents have yielded promising outcomes. However, these agents still struggle with long-horizon tasks in online environments, primarily due to insufficient knowledge and the inherent gap between offline and online domains. In this paper, inspired by how humans generalize knowledge in open-ended environments, we propose a Hierarchical Multimodal Skills (HMS) module to tackle the issue of insufficient knowledge. It progressively abstracts trajectories into execution skills, core skills, and ultimately meta-skills, providing a hierarchical knowledge structure for long-horizon task planning. To bridge the domain gap, we propose the Skill-Augmented Monte Carlo Tree Search (SA-MCTS) algorithm, which efficiently leverages skills acquired in offline environments to reduce the action search space during online tree exploration. Building on HMS, we propose Mirage-1, a multimodal, cross-platform, plug-and-play GUI agent. To validate the performance of Mirage-1 in real-world long-horizon scenarios, we constructed a new benchmark, AndroidLH. Experimental results show that Mirage-1 outperforms previous agents by 32\%, 19\%, 15\%, and 79\% on AndroidWorld, MobileMiniWob++, Mind2Web-Live, and AndroidLH, respectively. Project page: https://cybertronagent.github.io/Mirage-1.github.io/
StarCraft: Brood War remains a challenging benchmark for artificial intelligence research, particularly in the domain of macromanagement, where long-term strategic planning is required. Traditional approaches to StarCraft AI rely on rule-based systems or supervised deep learning, both of which face limitations in adaptability and computational efficiency. In this work, we introduce NeuroPAL, a neuroevolutionary framework that integrates Neuroevolution of Augmenting Topologies (NEAT) with Punctuated Anytime Learning (PAL) to improve the efficiency of evolutionary training. By alternating between frequent, low-fidelity training and periodic, high-fidelity evaluations, PAL enhances the sample efficiency of NEAT, enabling agents to discover effective strategies in fewer training iterations. We evaluate NeuroPAL in a fixed-map, single-race scenario in StarCraft: Brood War and compare its performance to standard NEAT-based training. Our results show that PAL significantly accelerates the learning process, allowing the agent to reach competitive levels of play in approximately half the training time required by NEAT alone. Additionally, the evolved agents exhibit emergent behaviors such as proxy barracks placement and defensive building optimization, strategies commonly used by expert human players. These findings suggest that structured evaluation mechanisms like PAL can enhance the scalability and effectiveness of neuroevolution in complex real-time strategy environments.
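The alternation at the heart of PAL can be sketched as a simple schedule around an otherwise standard NEAT loop; the generation counts, game numbers, and the evolve/evaluate callables below are hypothetical placeholders, not the study's configuration.

def punctuated_anytime_learning(population, evolve_step, evaluate,
                                cheap_games=1, full_games=10, period=5, generations=50):
    """Most generations use cheap low-fidelity evaluations; every `period` generations
    all genomes are re-scored with high-fidelity (more games) evaluations.

    evolve_step(population, fitness_list) -> new population; evaluate(genome, n_games) -> fitness."""
    for gen in range(generations):
        n_games = full_games if gen % period == 0 else cheap_games
        fitness = [evaluate(genome, n_games) for genome in population]
        population = evolve_step(population, fitness)
    return population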
Recently, agents based on multimodal large language models (MLLMs) have achieved remarkable progress across various domains. However, building a generalist agent with capabilities such as perception, planning, action, grounding, and reflection in open-world environments like Minecraft remains challenging due to insufficient domain-specific data, interference among heterogeneous tasks, and visual diversity in open-world settings. In this paper, we address these challenges through three key contributions. 1) We propose a knowledge-enhanced data generation pipeline to provide scalable and high-quality training data for agent development. 2) To mitigate interference among heterogeneous tasks, we introduce a Mixture-of-Experts (MoE) architecture with task-level routing. 3) We develop a Multimodal Reasoning-Augmented Reinforcement Learning approach to enhance the agent's reasoning ability for visual diversity in Minecraft. Built upon these innovations, we present Optimus-3, a general-purpose agent for Minecraft. Extensive experimental results demonstrate that Optimus-3 surpasses both generalist multimodal large language models and existing state-of-the-art agents across a wide range of tasks in the Minecraft environment. Project page: https://cybertronagent.github.io/Optimus-3.github.io/
Since its inception in the mid-60s, the inventory staggering problem has been explored and exploited in a wide range of application domains, such as production planning, stock control systems, warehousing, and aerospace/defense logistics. However, even with a rich history of academic focus, we are still very much in the dark when it comes to cornerstone computational questions around inventory staggering and to related structural characterizations, with our methodological toolbox being severely under-stocked. The central contribution of this paper consists in devising a host of algorithmic techniques and analytical ideas -- some being entirely novel and some leveraging well-studied concepts in combinatorics and number theory -- for surpassing essentially all known approximation guarantees for the inventory staggering problem. In particular, our work demonstrates that numerous structural properties open the door for designing polynomial-time approximation schemes, including polynomially-bounded cycle lengths, constantly-many distinct time intervals, so-called nested instances, and pairwise coprime settings. These findings offer substantial improvements over currently available constant-factor approximations and resolve outstanding open questions in their respective contexts. In parallel, we develop new theory around a number of yet-uncharted questions, related to the sampling complexity of peak inventory estimation as well as to the plausibility of groupwise synchronization. Interestingly, we establish the global nature of inventory staggering, proving that there are $n$-item instances where, for every subset of roughly $\sqrt{n}$ items, no policy improves on the worst-possible one by a factor greater than $1+\epsilon$, whereas for the entire instance, there exists a policy that outperforms the worst-possible one by a factor of nearly $2$, which is optimal.
Skin carcinoma is the most prevalent form of cancer globally, accounting for over $8 billion in annual healthcare expenditures. In clinical settings, physicians document patient visits using detailed SOAP (Subjective, Objective, Assessment, and Plan) notes. However, manually generating these notes is labor-intensive and contributes to clinician burnout. In this work, we propose a weakly supervised multimodal framework to generate clinically structured SOAP notes from limited inputs, including lesion images and sparse clinical text. Our approach reduces reliance on manual annotations, enabling scalable, clinically grounded documentation while alleviating clinician burden and reducing the need for large annotated datasets. Our method achieves performance comparable to GPT-4o, Claude, and DeepSeek Janus Pro across key clinical relevance metrics. To evaluate clinical quality, we introduce two novel metrics, MedConceptEval and the Clinical Coherence Score (CCS), which assess semantic alignment with expert medical concepts and with input features, respectively.
Less-than-truckload (LTL) shipment is vital in modern freight transportation yet is in dire need of more efficient usage of resources, higher service responsiveness and velocity, lower overall shipping cost across all parties, and better quality of life for the drivers. The industry is currently highly fragmented, with numerous small to medium-sized LTL carriers typically operating within dedicated regions or corridors, mostly disconnected from each other. This paper investigates the large-scale interconnection of LTL carriers, enabling each to leverage multi-carrier networks for cross-region services exploiting their mutual logistic hubs, in line with Physical Internet principles. In such a network, efficient open cooperation strategies are critical for optimizing multiparty relay shipment consolidation and delivery, transport and logistic operations and orchestration, and enabling inter-hub driver short hauls. To dynamically plan relay truck transportation of the involved carriers across hyperconnected hub networks, we develop an optimization-based model to build loads, coordinate shipments, and synchronize driver deliveries. We report a simulation-based experiment in a multiparty LTL network covering the eastern U.S. under three scenarios: 1) each carrier operates separately and serves its clients with end-to-end transportation, 2) each carrier operates separately and adopts relay transportation in its service region, and 3) all carriers operate jointly and serve clients in the multi-carrier hyperconnected relay network. By comparing these three scenarios, we evaluate the impact of relay transportation and carrier cooperation on cost savings, trip duration, and greenhouse gas emissions. Overall, this research advances operational efficiency through an effective collaborative solution across the LTL industry and contributes to the pursuit of sustainable logistics networks.