Offline Goal-Conditioned Reinforcement Learning seeks to train agents to reach specified goals from previously collected trajectories. Scaling that promise to long-horizon tasks remains challenging, notably due to compounding value-estimation errors. Principled geometric reasoning offers a potential solution to these issues. Following this insight, we introduce Projective Quasimetric Planning (ProQ), a compositional framework that learns an asymmetric distance and then repurposes it, first as a repulsive energy forcing a sparse set of keypoints to spread uniformly over the learned latent space, and second as a structured directional cost guiding the agent toward proximal sub-goals. In particular, ProQ couples this geometry with a Lagrangian out-of-distribution detector to ensure the learned keypoints stay within reachable areas. By unifying metric learning, keypoint coverage, and goal-conditioned control, our approach produces meaningful sub-goals and robustly drives long-horizon goal-reaching on diverse navigation benchmarks.
Imitation learning (IL), particularly when leveraging high-dimensional visual inputs for policy training, has proven intuitive and effective in complex bimanual manipulation tasks. Nonetheless, the generalization capability of visuomotor policies remains limited, especially when only small demonstration datasets are available. Accumulated errors in visuomotor policies significantly hinder their ability to complete long-horizon tasks. To address these limitations, we propose SViP, a framework that seamlessly integrates visuomotor policies into task and motion planning (TAMP). SViP partitions human demonstrations into bimanual and unimanual operations using a semantic scene graph monitor. Continuous decision variables from the key scene graph are employed to train a switching condition generator. This generator produces parameterized scripted primitives that ensure reliable performance even when encountering out-of-distribution observations. Using only 20 real-world demonstrations, we show that SViP enables visuomotor policies to generalize across out-of-distribution initial conditions without requiring object pose estimators. For previously unseen tasks, SViP automatically discovers effective solutions to achieve the goal, leveraging constraint modeling in the TAMP formalism. In real-world experiments, SViP outperforms state-of-the-art generative IL methods, indicating wider applicability for more complex tasks. Project website: https://sites.google.com/view/svip-bimanual
In this paper, we present the main actions of the Women in Physics Group of the Spanish Royal Physics Society over the period 2022 to 2023, during which we celebrated the 20th anniversary of the group. We also outline relevant equality initiatives implemented during this period by the Spanish Government and analyse their impact on the status of women in Physics in our country. In 2023, our scientific society approved the Gender Equality Plan, thus becoming a pioneer scientific society in Spain in implementing this relevant measure.
We study the Universal Solvability of Robot Motion Planning on Graphs (USolR) problem: given an undirected graph G = (V, E) and p robots, determine whether any arbitrary configuration of the robots can be transformed into any other arbitrary configuration via a sequence of valid, collision-free moves. We design a canonical accumulation procedure that maps arbitrary configurations to configurations that occupy a fixed subset of vertices, enabling us to analyze configuration reachability in terms of equivalence classes. We prove that in instances that are not universally solvable, at least half of all configurations are unreachable from a given one, and leverage this to design an efficient randomized algorithm with one-sided error, which can be derandomized with a blow-up in the running time by a factor of p. Further, we optimize our deterministic algorithm by using the structure of the input graph G = (V, E), achieving a running time of O(p * (|V| + |E|)) in sparse graphs and O(|V| + |E|) in dense graphs. Finally, we consider the Graph Edge Augmentation for Universal Solvability (EAUS) problem, where given a connected graph G that is not universally solvable for p robots, the question is to check if for a given budget b, at most b edges can be added to G to make it universally solvable for p robots. We provide an upper bound of p - 2 on b for general graphs. On the other hand, we also provide examples of graphs that require Theta(p) edges to be added. We further study the Graph Vertex and Edge Augmentation for Universal Solvability (VEAUS) problem, where a vertices and b edges can be added, and we provide lower bounds on a and b.
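The one-sided-error test this abstract describes can be sketched generically: if at least half of all configurations are unreachable whenever an instance is not universally solvable, then sampling k random configurations and checking reachability drives the error probability below 2^(-k). The following is a minimal illustrative sketch, not the paper's actual procedure; `sample_config` and `is_reachable` are placeholder stand-ins for the paper's configuration sampler and reachability check.

```python
import random

def universally_solvable_mc(sample_config, is_reachable, start, k=40):
    """Monte Carlo universal-solvability test with one-sided error.

    If the instance IS universally solvable, this always returns True.
    If it is NOT, at least half of all configurations are unreachable
    from `start` (the structural result quoted above), so each sample
    exposes a witness with probability >= 1/2, and we wrongly return
    True with probability at most 2**(-k).
    """
    for _ in range(k):
        cfg = sample_config()
        if not is_reachable(start, cfg):
            return False  # certificate: cfg is unreachable from start
    return True  # no witness found in k trials


# Toy sanity check: let "reachable" mean "same parity", which splits
# 100 toy configurations into two equal reachability classes.
random.seed(0)
sample = lambda: random.randrange(100)
same_parity = lambda s, c: (s % 2) == (c % 2)  # not universally solvable
always = lambda s, c: True                     # universally solvable
```

With `same_parity`, the test returns False with overwhelming probability; with `always`, it always returns True, matching the one-sided error guarantee.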
This paper presents a novel high-level task planning and optimal coordination framework for autonomous masonry construction using a team of heterogeneous aerial robotic workers, consisting of agents with separate skills for brick placement and mortar application. This setting introduces new challenges in scheduling and coordination, particularly due to the mortar curing deadline required for structural bonding and the need to ensure safety constraints among UAVs operating in parallel. To address this, an automated pipeline generates the wall construction plan based on the available bricks while identifying static structural dependencies and potential conflicts for safe operation. The proposed framework optimizes UAV task allocation and execution timing by incorporating dynamically coupled precedence deadline constraints that account for the curing process and static structural dependency constraints, while enforcing spatio-temporal constraints to prevent collisions and ensure safety. The primary objective of the scheduler is to minimize the overall construction makespan while reducing logistics overhead, travel time between tasks, and curing delays, so as to maintain both adhesion quality and safe workspace separation. The effectiveness of the proposed method in achieving coordinated and time-efficient aerial masonry construction is extensively validated through Gazebo-simulated missions. The results demonstrate the framework's capability to streamline UAV operations, ensuring both structural integrity and safety during the construction process.
Autonomous aerial target tracking in unstructured and GPS-denied environments remains a fundamental challenge in robotics. Many existing methods rely on motion capture systems, pre-mapped scenes, or feature-based localization to ensure safety and control, limiting their deployment in real-world conditions. We introduce NOVA, a fully onboard, object-centric framework that enables robust target tracking and collision-aware navigation using only a stereo camera and an IMU. Rather than constructing a global map or relying on absolute localization, NOVA formulates perception, estimation, and control entirely in the target's reference frame. A tightly integrated stack combines a lightweight object detector with stereo depth completion, followed by histogram-based filtering to infer robust target distances under occlusion and noise. These measurements feed a visual-inertial state estimator that recovers the full 6-DoF pose of the robot relative to the target. A nonlinear model predictive controller (NMPC) plans dynamically feasible trajectories in the target frame. To ensure safety, high-order control barrier functions are constructed online from a compact set of high-risk collision points extracted from depth, enabling real-time obstacle avoidance without maps or dense representations. We validate NOVA across challenging real-world scenarios, including urban mazes, forest trails, and repeated transitions through buildings with intermittent GPS loss and severe lighting changes that disrupt feature-based localization. Each experiment is repeated multiple times under similar conditions to assess resilience, showing consistent and reliable performance. NOVA achieves agile target following at speeds exceeding 50 km/h. These results show that high-speed vision-based tracking is possible in the wild using only onboard sensing, with no reliance on external localization or environment assumptions.
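The high-order control barrier functions mentioned above follow a standard recursive construction in the literature; the sketch below is that generic recipe, not NOVA's specific formulation. Given a safety function $h$ (e.g., signed distance to a collision point) of relative degree $m$ with respect to the dynamics, one defines

```latex
% Generic high-order CBF construction (illustrative; not NOVA's exact design).
% Safety set: C = \{ x : h(x) \ge 0 \}, with h of relative degree m.
\psi_0(x) = h(x), \qquad
\psi_i(x) = \dot{\psi}_{i-1}(x) + \alpha_i\bigl(\psi_{i-1}(x)\bigr),
\quad i = 1, \dots, m,
```

and enforces $\psi_m(x) \ge 0$ as an affine constraint on the control input inside the NMPC or a safety filter, where each $\alpha_i$ is a class-$\mathcal{K}$ function; this renders the safe set $C$ forward-invariant.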
IceTop is the cosmic-ray detector located on the surface of the IceCube Neutrino Observatory at the South Pole, consisting of 81 pairs of ice-Cherenkov tanks. The rise in the energy threshold of air-shower measurements in IceTop due to accumulating snow emphasized the need for the next generation of IceCube surface detectors. For this purpose, the Surface Array Enhancement (SAE) is set to comprise elevated scintillator panels and radio antennas controlled by hybrid DAQ systems. The detectors of the SAE are also expected to extend to the planned IceCube-Gen2 Surface Array. An initial study with a prototype station has already been conducted. We briefly review the SAE and the deployment as well as the calibration status of the upcoming stations of the planned array of 32 stations. The focus of this contribution is on the radio detection of extensive air showers. A preliminary estimation of the position of the shower maximum ($X_\mathrm{max}$), which is sensitive to the primary mass, was carried out with data from the 3 antennas of the prototype station. An extension of the method from previous analyses is also briefly discussed.
In cable-driven parallel robots (CDPRs), the payload is suspended using a network of cables whose lengths can be controlled to maneuver the payload within the workspace. Compared to rigid-link robots, CDPRs provide better maneuverability due to the flexibility of the cables and consume less power due to the high strength-to-weight ratio of the cables. However, amongst other things, the flexibility of the cables and the fact that they can only pull (and not push) render the dynamics of CDPRs complex. Hence, advanced modelling paradigms and control algorithms must be developed to fully utilize the potential of CDPRs. Furthermore, given the complex dynamics of CDPRs, the models and control algorithms proposed for them must be validated on experimental setups to ascertain their efficacy in practice. We have recently developed an elaborate experimental setup for a CDPR with three cables and validated elementary open-loop motion planning algorithms on it. In this paper, we describe several aspects of the design and fabrication of our setup, including component selection and assembly, and present our experimental results. Our setup can reproduce complex phenomena such as the transverse vibration of the cables seen in large CDPRs and will in the future be used to model and control such phenomena and also to validate more sophisticated motion planning algorithms.
Precise and flexible cart-pushing is a challenging task for mobile robots. The motion constraints during cart-pushing and the robot's redundancy lead to complex motion planning problems, while variable payloads and disturbances present complicated dynamics. In this work, we propose a novel planning and control framework for flexible whole-body coordination and robust adaptive control. Our motion planning method employs a local coordinate representation and a novel kinematic model to solve a nonlinear optimization problem, thereby enhancing motion maneuverability by generating feasible and flexible push poses. Furthermore, we present a disturbance rejection control method to resist disturbances and reduce control errors for the complex control problem without requiring an accurate dynamic model. We validate our method through extensive experiments in simulation and real-world settings, demonstrating its superiority over existing approaches. To the best of our knowledge, this is the first work to systematically evaluate the flexibility and robustness of cart-pushing methods in experiments. The video supplement is available at https://sites.google.com/view/mpac-pushing/.
This paper studies the problem of using a robot arm to manipulate a uniformly rotating chain with its bottom end fixed. Existing studies have investigated ideal rotational shapes for practical applications, yet they do not discuss how these shapes can be consistently achieved through manipulation planning. Our work presents a manipulation strategy for stable and consistent shape transitions. We find that the configuration space of such a chain is homeomorphic to a three-dimensional cube. Using this property, we suggest a strategy to manipulate the chain into different configurations, specifically from one rotation mode to another, while taking stability and feasibility into consideration. We demonstrate the effectiveness of our strategy in physical experiments by successfully transitioning from rest to the first two rotation modes. The concepts explored in our work have critical applications in ensuring the safety and efficiency of drill string and yarn spinning operations.
This paper introduces an advanced framework leveraging Large Language Model-based Multi-Agent Systems (LLMMA) for the automated search and optimization of Quantum Machine Learning (QML) algorithms. Inspired by Google DeepMind's FunSearch, the proposed system works at an abstract level to iteratively generate and refine quantum transformations of classical machine learning algorithms (concepts), such as the Multi-Layer Perceptron, forward-forward, and backpropagation algorithms. As a proof of concept, this work highlights the potential of agentic frameworks to systematically explore classical machine learning concepts and adapt them for quantum computing, paving the way for efficient and automated development of QML algorithms. Future directions include incorporating planning mechanisms and optimizing search strategies for broader applications in quantum-enhanced machine learning.
Large vision-language models (VLMs) for autonomous driving (AD) are evolving beyond perception and cognition tasks toward motion planning. However, we identify two critical challenges in this direction: (1) VLMs tend to learn shortcuts by relying heavily on history input information, achieving seemingly strong planning results without genuinely understanding the visual inputs; and (2) the chain-of-thought (CoT) reasoning processes are often misaligned with the motion planning outcomes, and how to effectively leverage the complex reasoning capability to enhance planning remains largely underexplored. In this paper, we start from a small-scale domain-specific VLM and propose Drive-R1, designed to bridge scenario reasoning and motion planning for AD. Drive-R1 first undergoes supervised fine-tuning on an elaborate dataset containing both long and short CoT data. Drive-R1 is encouraged to reason step-by-step from visual input to final planning decisions. Subsequently, Drive-R1 is trained within a reinforcement learning framework that incentivizes the discovery of reasoning paths that are more informative for planning, guided by rewards based on predicted trajectories and meta-actions. Experimental evaluations on the nuScenes and DriveLM-nuScenes benchmarks demonstrate that Drive-R1 achieves superior performance compared to existing state-of-the-art VLMs. We believe that Drive-R1 presents a promising direction for bridging reasoning and planning in AD, offering methodological insights for future research and applications.
(Abridged) Clusters of galaxies, formed in the latest stages of structure formation, are unique cosmological probes. With the advent of large CMB surveys like those from the Planck satellite, the ACT and SPT telescopes, we now have access to a large number of galaxy clusters detected at millimeter wavelengths via the thermal Sunyaev-Zel'dovich (tSZ) effect. Nevertheless, it is interesting to complement them with high-angular-resolution (tens of arcseconds) observations to target the lowest-mass and highest-redshift clusters. This is the case of observations with the NIKA2 camera, which is installed on the IRAM 30--m telescope in Pico Veleta, Spain. We used the existing 150 GHz (2 mm) data from the NIKA2 Cosmological Legacy Survey (N2CLS) Large Program to blindly search for galaxy clusters in the well-known COSMOS field, across a 877 arcmin$^2$ region centered on (R.A., Dec.)$_{J2000}$ = (10h00m28.81s, +02d17m30.44s). We first developed a dedicated data reduction pipeline to construct NIKA2 maps at 2 mm. We then used a matched-filter algorithm to extract cluster candidates assuming a universal pressure profile to model the expected cluster tSZ signal. We computed the purity and completeness of the sample by applying the previous algorithm to simulated maps of the sky signal in the COSMOS field. We find a total of 16 cluster candidates at S/N > 4, from which eight have either an optical or X-ray cluster (or group of galaxies) counterpart. This is the first blind detection of clusters of galaxies at mm wavelengths at 18" angular resolution. From this analysis, we confirm that NIKA2 and the IRAM 30--m telescope should be sensitive to low-mass clusters at intermediate and high redshift, complementing current and planned large tSZ-based cluster surveys.
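Matched filtering of the kind used above for tSZ cluster extraction can be illustrated generically: in Fourier space, weight the data by the conjugate template over the noise power spectrum and normalize so the output is an unbiased amplitude estimate. The 1-D sketch below is purely illustrative; the Gaussian template and unit white-noise model are assumptions for the toy example, not the N2CLS pipeline's universal pressure profile or noise model.

```python
import numpy as np

def matched_filter(data, template, noise_power):
    """Matched-filter amplitude map for a 1-D signal (Fourier space).

    Filter: conj(T_k) / P_k, normalized so the output at the source
    position is an unbiased estimate of the source amplitude.
    """
    T = np.fft.fft(template)
    D = np.fft.fft(data)
    norm = np.sum(np.abs(T) ** 2 / noise_power) / len(data)
    return np.real(np.fft.ifft(np.conj(T) * D / noise_power)) / norm


# Toy example: recover the amplitude of a known profile buried in noise.
n = 256
x = np.arange(n)
profile = np.exp(-0.5 * ((x - n // 2) / 4.0) ** 2)  # stand-in "pressure profile"
template = np.roll(profile, -(n // 2))              # same shape, centred at 0
rng = np.random.default_rng(1)
data = 3.0 * profile + 0.1 * rng.standard_normal(n)  # true amplitude: 3.0
amp_map = matched_filter(data, template, noise_power=1.0)
# amp_map peaks near n // 2 with a value close to the true amplitude 3.0
```

In a real survey pipeline the same construction runs in 2-D with the instrument beam convolved into the template and the noise power spectrum estimated from the maps; candidates are then thresholded on S/N in the filtered map.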
Experimental data collected from a triple-axis spectrometer (TAS) are typically analysed by considering the instrument resolution, as the resolution of a TAS instrument is often complex and significantly influences the measured results. Two Python packages, TasVisAn and InsPy, have been developed to visualize and analyse data from TAS instruments - particularly from the cold-neutron TAS Sika and the thermal-neutron TAS Taipan at the Australian Centre for Neutron Scattering. TasVisAn offers a range of functions, including data importing, reduction, plotting, contour mapping, convolution fitting, and more, for data collected on TAS instruments, especially on Sika and Taipan. It also supports data reduction for the currently popular multi-analyser and multiplexing TAS instruments, including the multiplexing mode of Sika. In addition, it includes scan simulation and batch file validation tools for both Taipan and Sika, assisting users in designing and planning experiments in advance. InsPy is a general-purpose Python package designed to calculate the four-dimensional (4D) instrument resolution in momentum-energy space for any TAS instrument. Combined with InsPy, TasVisAn supports both instrument resolution calculation and resolution-convoluted data fitting. Its flexible external data import feature further allows TasVisAn to be adapted for the visualization and convolution analysis of inelastic neutron scattering data across various TAS instruments.
The automation of composite sheet layup is essential to meet the increasing demand for composite materials in various industries. However, draping plans for the robotic layup of composite sheets are not robust. A plan that works well under a certain condition does not work well in a different condition. Changes in operating conditions due to either changes in material properties or working environment may lead a draping plan to exhibit suboptimal performance. In this paper, we present a comprehensive framework aimed at refining plans based on the observed execution performance. Our framework prioritizes the minimization of uncompacted regions while simultaneously improving time efficiency. To achieve this, we integrate human expertise with data-driven decision-making to refine expert-crafted plans for diverse production environments. We conduct experiments to validate the effectiveness of our approach, revealing significant reductions in the number of corrective paths required compared to initial expert-crafted plans. Through a combination of empirical data analysis, action-effectiveness modeling, and search-based refinement, our system achieves superior time efficiency in robotic layup. Experimental results demonstrate the efficacy of our approach in optimizing the layup process, thereby advancing the state-of-the-art in composite manufacturing automation.
We present the concept of the ultracold neutron (UCN) source with a superfluid helium-4 converter located in the thermal column of the WWR-K research reactor at the Institute of Nuclear Physics (INP) in Almaty, Kazakhstan. The conceptual design is based on the idea of accumulating UCNs in the source and effectively transporting them to experimental setups. We propose to improve the UCN density in the source by separating the heat and the UCN flows from the production volume and decreasing both the temperature of the superfluid $^{4}$He (SF $^{4}$He) converter below $\sim$1 K and the coefficient of UCN wall loss below $\sim$$10^{-4}$. To achieve operation temperatures below 1 K, we plan to use a He-3 pumping cryogenic system and minimize the thermal load on the UCN accumulation trap walls. Additional gain in the total number of accumulated UCNs can be achieved through the use of a wall material with a high critical velocity for the accumulation trap. The implementation of such a design critically depends on the availability of materials with specific UCN and cryogenic properties. This paper describes the conceptual design of the source, discusses its implementation methods and material requirements, and outlines plans for material-testing studies.
Medical image analysis increasingly relies on the integration of multiple imaging modalities to capture complementary anatomical and functional information, enabling more accurate diagnosis and treatment planning. Achieving aligned feature representations across these diverse modalities is therefore important for effective multimodal analysis. While contrastive language-image pre-training (CLIP) and its variants have enabled image-text alignment, they require explicitly paired data between any two modalities, which is difficult to acquire in medical contexts. To address this gap, we present Multimodal Medical Image Binding with Text (M\textsuperscript{3}Bind), a novel pre-training framework that enables seamless alignment of multiple medical imaging modalities through a shared text representation space without requiring explicit paired data between any two medical image modalities. Specifically, based on the insight that different images can naturally bind with text, M\textsuperscript{3}Bind first fine-tunes pre-trained CLIP-like image-text models to align their modality-specific text embedding spaces while preserving their original image-text alignments. Subsequently, we distill these modality-specific text encoders into a unified model, creating a shared text embedding space. Experiments on X-ray, CT, retina, ECG, and pathological images across multiple downstream tasks demonstrate that M\textsuperscript{3}Bind achieves state-of-the-art performance in zero-shot and few-shot classification and cross-modal retrieval tasks compared to its CLIP-like counterparts. These results validate M\textsuperscript{3}Bind's effectiveness in achieving cross-image-modal alignment for medical analysis.
Air pollution remains a leading global health threat, with fine particulate matter (PM2.5) contributing to millions of premature deaths annually. Chemical transport models (CTMs) are essential tools for evaluating how emission controls improve air quality and save lives, but they are computationally intensive. Reduced-form models accelerate simulations but sacrifice spatial-temporal granularity, accuracy, and flexibility. Here we present CleanAir, a deep-learning-based model developed as an efficient alternative to CTMs for simulating daily PM2.5 and its chemical compositions in response to precursor emission reductions at 36 km resolution, which can predict PM2.5 concentrations for a full year within 10 seconds on a single GPU, five orders of magnitude faster than the CTM it emulates. Built on a Residual Symmetric 3D U-Net architecture and trained on more than 2,400 emission reduction scenarios generated by a well-validated Community Multiscale Air Quality (CMAQ) model, CleanAir generalizes well across unseen meteorological years and emission patterns. It produces results comparable to CMAQ in both absolute concentrations and emission-induced changes, enabling efficient, full-coverage simulations across short-term interventions and long-term planning horizons. This advance empowers researchers and policymakers to rapidly evaluate a wide range of air quality strategies and assess the associated health impacts, thereby supporting more responsive and informed environmental decision-making.
Integrated sensing and communication (ISAC) networks strive to deliver both high-precision target localization and high-throughput data services across the entire coverage area. In this work, we examine the fundamental trade-off between sensing and communication from the perspective of base station (BS) deployment. Furthermore, we conceive a design that simultaneously maximizes the target localization coverage while guaranteeing the desired communication performance. In contrast to existing schemes optimized for a single target, an effective network-level approach has to ensure consistent localization accuracy throughout the entire service area. Employing time-of-flight (ToF) based localization, we first analyze the deployment problem from a localization-performance coverage perspective, aiming to minimize the area Cramer-Rao Lower Bound (A-CRLB) so as to ensure uniformly high positioning accuracy across the service area. We prove that for a fixed number of BSs, uniformly scaling the service area by a factor \kappa increases the optimal A-CRLB in proportion to \kappa^{2\beta}, where \beta is the BS-to-target pathloss exponent. Based on this, we derive an approximate scaling law that links the achievable A-CRLB across the area of interest to the dimensionality of the sensing area. We also show that cooperative BSs extend the coverage but yield marginal A-CRLB improvement as the dimensionality of the sensing area grows.
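The stated \kappa^{2\beta} law admits a short heuristic Fisher-information gloss (our sketch, not the paper's proof; it assumes \kappa scales all linear dimensions and that received power decays as $d^{-2\beta}$ with the quoted exponent convention):

```latex
% Heuristic sketch of the A-CRLB scaling (not the paper's derivation).
\mathrm{SNR}(d) \propto d^{-2\beta}
\quad\Longrightarrow\quad
\mathrm{CRLB}(d) \propto \mathrm{SNR}(d)^{-1} \propto d^{2\beta}.
% Uniform scaling maps every BS-target distance d to \kappa d, so
\mathrm{CRLB}(\kappa d) = \kappa^{2\beta}\,\mathrm{CRLB}(d),
% and averaging the pointwise bound over the scaled area preserves the factor:
\text{A-CRLB}(\kappa\,\mathcal{A}) = \kappa^{2\beta}\,\text{A-CRLB}(\mathcal{A}).
```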
World models -- generative models that simulate environment dynamics conditioned on past observations and actions -- are gaining prominence in planning, simulation, and embodied AI. However, evaluating their rollouts remains a fundamental challenge, requiring fine-grained, temporally grounded assessment of action alignment and semantic consistency -- capabilities not captured by existing metrics. Vision-Language Models (VLMs) have shown promise as automatic evaluators of generative content due to their strong multimodal reasoning abilities. Yet, their use in fine-grained, temporally sensitive evaluation tasks remains limited and requires targeted adaptation. We introduce an evaluation protocol targeting two recognition tasks -- action recognition and character recognition -- each assessed across binary, multiple-choice, and open-ended formats. To support this, we present UNIVERSE (UNIfied Vision-language Evaluator for Rollouts in Simulated Environments), a method for adapting VLMs to rollout evaluation under data and compute constraints. We conduct a large-scale study comparing full, partial, and parameter-efficient finetuning across task formats, context lengths, sampling strategies, and data compositions. The resulting unified evaluator matches the performance of task-specific baselines using a single checkpoint. Human studies confirm strong alignment with human judgments, establishing UNIVERSE as a scalable, semantics-aware evaluator for world models.
Reliable navigation in unstructured, real-world environments remains a significant challenge for embodied agents, especially when operating across diverse terrains, weather conditions, and sensor configurations. In this paper, we introduce GeNIE (Generalizable Navigation System for In-the-Wild Environments), a robust navigation framework designed for global deployment. GeNIE integrates a generalizable traversability prediction model built on SAM2 with a novel path fusion strategy that enhances planning stability in noisy and ambiguous settings. We deployed GeNIE in the Earth Rover Challenge (ERC) at ICRA 2025, where it was evaluated across six countries spanning three continents. GeNIE took first place and achieved 79% of the maximum possible score, outperforming the second-best team by 17%, and completed the entire competition without a single human intervention. These results set a new benchmark for robust, generalizable outdoor robot navigation. We will release the codebase, pretrained model weights, and newly curated datasets to support future research in real-world navigation.
Multiple unmanned aerial vehicles (UAVs) play a vital role in monitoring and data collection in wide area environments with harsh conditions. In most scenarios, issues such as real-time data retrieval and real-time UAV positioning are often disregarded, essentially neglecting the communication constraints. In this paper, we comprehensively address both the coverage of the target area and the data transmission capabilities of the flying ad hoc network (FANET). The data throughput of the network is therefore maximized by optimizing the network topology and the UAV trajectories. The resultant optimization problem is effectively solved by the proposed reinforcement learning-based trajectory planning (RL-TP) algorithm and the convex-based topology optimization (C-TOP) algorithm sequentially. The RL-TP optimizes the UAV paths while considering the constraints of FANET. The C-TOP maximizes the data throughput of the network while simultaneously constraining the neighbors and transmit powers of the UAVs, which is shown to be a convex problem that can be efficiently solved in polynomial time. Simulations and field experimental results show that the proposed optimization strategy can effectively plan the UAV trajectories and significantly improve the data throughput of the FANET over the adaptive local minimum spanning tree (A-LMST) and cyclic pruning-assisted power optimization (CPAPO) methods.
Life sciences research increasingly requires identifying, accessing, and effectively processing data from an ever-evolving array of information sources on the Linked Open Data (LOD) network. This dynamic landscape places a significant burden on researchers, as the quality of query responses depends heavily on the selection and semantic integration of data sources -- processes that are often labor-intensive, error-prone, and costly. While the adoption of FAIR (Findable, Accessible, Interoperable, and Reusable) data principles has aimed to address these challenges, barriers to efficient and accurate scientific data processing persist. In this paper, we introduce FAIRBridge, an experimental natural language-based query processing system designed to empower scientists to discover, access, and query biological databases, even when they are not FAIR-compliant. FAIRBridge harnesses the capabilities of AI to interpret query intents, map them to relevant databases described in scientific literature, and generate executable queries via intelligent resource access plans. The system also includes robust tools for mitigating low-quality query processing, ensuring high fidelity and responsiveness in the information delivered. FAIRBridge's autonomous query processing framework enables users to explore alternative data sources, make informed choices at every step, and leverage community-driven crowd curation when needed. By providing a user-friendly, automated hypothesis-testing platform in natural English, FAIRBridge significantly enhances the integration and processing of scientific data, offering researchers a powerful new tool for advancing their inquiries.
Topological complexity is a homotopy invariant that measures the minimal number of continuous rules required for motion planning in a space. In this work, we introduce persistent analogs of topological complexity and its cohomological lower bound, the zero-divisor-cup-length, for persistent topological spaces, and establish their stability. For Vietoris-Rips filtrations of compact metric spaces, we show that the erosion distances between these persistent invariants are bounded above by twice the Gromov-Hausdorff distance. We also present examples illustrating that persistent topological complexity and persistent zero-divisor-cup-length can distinguish between certain spaces more effectively than persistent homology.
Accurate reconstruction of the environment is a central goal of Simultaneous Localization and Mapping (SLAM) systems. However, the agent's trajectory can significantly affect estimation accuracy. This paper presents a new method to model map uncertainty in Active SLAM systems using an Uncertainty Map (UM). The UM uses probability distributions to capture where the map is uncertain, allowing Uncertainty Frontiers (UF) to be defined as key exploration-exploitation objectives and potential stopping criteria. In addition, the method introduces the Signed Relative Entropy (SiREn), based on the Kullback-Leibler divergence, to measure both coverage and uncertainty together. This helps balance exploration and exploitation through an easy-to-understand parameter. Unlike methods that depend on particular SLAM setups, the proposed approach is compatible with different types of sensors, such as cameras, LiDARs, and multi-sensor fusion. It also addresses common problems in exploration planning and stopping conditions. Furthermore, integrating this map modeling approach with a UF-based planning system enables the agent to autonomously explore open spaces, a behavior not previously observed in the Active SLAM literature. Code and implementation details are available as a ROS node, and all generated data are openly available for public use, facilitating broader adoption and validation of the proposed approach.
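The abstract does not give SiREn's formula, so the following is a hypothetical NumPy sketch of how a signed, KL-based per-cell score could combine coverage and uncertainty: observed cells score by how far their occupancy estimate has moved from a maximum-entropy prior, while unobserved cells receive a fixed negative penalty (an illustrative choice, not the paper's definition; `signed_relative_entropy`, `prior`, and the penalty value are all assumptions).

```python
import numpy as np

def bernoulli_kl(p, q):
    """Elementwise KL divergence KL(Bern(p) || Bern(q))."""
    p = np.clip(p, 1e-9, 1 - 1e-9)
    q = np.clip(q, 1e-9, 1 - 1e-9)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def signed_relative_entropy(occ_prob, observed, prior=0.5):
    """Hypothetical SiREn-like score (the paper's exact definition may differ):
    observed cells score by information gained over the prior (exploitation),
    unobserved cells get a fixed negative value (exploration targets)."""
    info = bernoulli_kl(occ_prob, prior)
    return np.where(observed, info, -1.0)

occ = np.array([0.5, 0.99, 0.5])           # per-cell occupancy estimates
obs = np.array([True, True, False])        # which cells have been observed
score = signed_relative_entropy(occ, obs)
```

Aggregating such a score over the map would yield a single scalar in which unexplored regions pull the value down and confidently mapped regions push it up, which is one way the exploration-exploitation balance could be exposed through a single tunable parameter.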
Exploring the outer reaches of the Solar System presents significant propulsion and mission design challenges. This study assesses the feasibility of a mission to Sedna using two advanced propulsion concepts: the Direct Fusion Drive (DFD) rocket engine, based on D-$^{3}$He thermonuclear fusion, and a solar sail that exploits thermal desorption of its coating for propulsion. Both are evaluated for a one-way Earth-to-Sedna mission; owing to their different performance, the DFD would enable orbit insertion, whereas the solar sail is envisioned for a flyby. The analysis evaluates key mission parameters, including delivered payload capacity, travel time, and potential science return. For the DFD, we assume a 1.6 MW system with constant thrust and specific impulse, while for the solar sail, we consider acceleration via thermal desorption and a gravity-assist maneuver around Jupiter. The mission analysis incorporates four key phases: departure, interplanetary acceleration, interplanetary coasting, and rendezvous. Sedna is expected to pass through the perihelion of its orbit in 2075--2076 and then recede from the Sun again. Given the distances involved, a mission targeting the object would need to be launched "relatively" soon, especially with conventional propulsion systems, which could require up to 30 years of deep-space travel. Our results indicate that the DFD could reach Sedna in approximately 10 years, with 1.5 years of thrusting, while the solar sail, assisted by Jupiter's gravity, could complete the journey in 7 years. The feasibility of science payload accommodation, power availability, and communication constraints is also considered. These findings provide a comparative foundation for future deep-space mission planning.
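As a back-of-the-envelope cross-check (ours, not from the paper), the quoted transfer times imply the following mean heliocentric speeds, taking Sedna's perihelion distance of roughly 76 AU and ignoring trajectory curvature:

```python
# Rough mean-speed estimate for the quoted transfer times.
# Assumes a straight-line distance of ~76 AU (Sedna's approximate
# perihelion distance); real trajectories are longer and curved.
AU = 1.496e11            # astronomical unit, metres
YEAR = 3.156e7           # Julian year, seconds

distance = 76 * AU       # ~1.14e13 m

v_dfd = distance / (10 * YEAR)   # 10-year DFD transfer
v_sail = distance / (7 * YEAR)   # 7-year solar-sail flyby

print(f"DFD  mean speed: {v_dfd / 1e3:.0f} km/s")
print(f"Sail mean speed: {v_sail / 1e3:.0f} km/s")
```

The result, on the order of 36 km/s for the DFD and 50 km/s for the sail, illustrates why both concepts far exceed the heliocentric speeds achievable with conventional chemical propulsion.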
Accurate segmentation of Pelvic Radiation Injury (PRI) from Magnetic Resonance Images (MRI) is crucial for more precise prognosis assessment and the development of personalized treatment plans. However, automated segmentation remains challenging due to factors such as complex organ morphologies and confusing context. To address these challenges, we propose a novel Pattern Divide-and-Conquer Network (PDC-Net) for PRI segmentation. The core idea is to use different network modules to "divide" various local and global patterns and, through flexible feature selection, to "conquer" the Regions of Interest (ROI) during the decoding phase. Specifically, considering that our ROI often manifests as strip-like or circular-like structures in MR slices, we introduce a Multi-Direction Aggregation (MDA) module. This module enhances the model's ability to fit the shape of the organ by applying strip convolutions in four distinct directions. Additionally, to mitigate the challenge of confusing context, we propose a Memory-Guided Context (MGC) module. This module explicitly maintains a memory parameter to track cross-image patterns at the dataset level, thereby enhancing the distinction between global patterns associated with the positive and negative classes. Finally, we design an Adaptive Fusion Decoder (AFD) that dynamically selects features from different patterns based on the Mixture-of-Experts (MoE) framework, ultimately generating the final segmentation results. We evaluate our method on the first large-scale pelvic radiation injury dataset, and the results demonstrate the superiority of our PDC-Net over existing approaches.
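The MDA module's exact architecture is not given in the abstract; the following is a minimal illustrative NumPy sketch (not the paper's implementation) of directional strip aggregation, using fixed length-$k$ averaging kernels along rows and columns. The paper applies strip convolutions in four directions with learned weights; for brevity only the horizontal and vertical strips are shown here.

```python
import numpy as np

def strip_aggregate(x, k=5):
    """Illustrative strip aggregation over a 2D feature map: average each
    pixel over a length-k strip, horizontally and vertically, then fuse
    the two directional responses by averaging."""
    kern = np.ones(k) / k
    conv1d = lambda v: np.convolve(v, kern, mode="same")
    horiz = np.apply_along_axis(conv1d, 1, x)  # strips along rows
    vert = np.apply_along_axis(conv1d, 0, x)   # strips along columns
    return 0.5 * (horiz + vert)

feat = np.zeros((9, 9))
feat[4, :] = 1.0                  # a horizontal strip-like structure
out = strip_aggregate(feat)
```

Elongated structures respond strongly to the strip oriented along them (here, the horizontal pass returns 1.0 on the bright row) and weakly to the perpendicular one, which is the intuition behind fitting strip-like organ shapes with direction-specific kernels.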
Recent studies on Vision-Language-Action (VLA) models have shifted from the end-to-end action-generation paradigm toward a pipeline involving task planning followed by action generation, demonstrating improved performance on various complex, long-horizon manipulation tasks. However, existing approaches vary significantly in network architectures, planning paradigms, representations, and training data sources, making it challenging for researchers to identify the precise sources of performance gains and the components that should be further improved. To systematically investigate the impact of different planning paradigms and representations in isolation from network architectures and training data, in this paper, we introduce VLA-OS, a unified VLA architecture series supporting various task planning paradigms, and design a comprehensive suite of controlled experiments across diverse object categories (rigid and deformable), visual modalities (2D and 3D), environments (simulation and real-world), and end-effectors (grippers and dexterous hands). Our results demonstrate that: 1) visually grounded planning representations are generally better than language-based planning representations; 2) the Hierarchical-VLA paradigm generally achieves performance superior or comparable to other paradigms in task performance, pretraining, generalization ability, scalability, and continual learning ability, albeit at the cost of slower training and inference speeds.
Our quality audit for three widely used public multilingual speech datasets - Mozilla Common Voice 17.0, FLEURS, and VoxPopuli - shows that in some languages, these datasets suffer from significant quality issues. We believe addressing these issues will make these datasets more useful as training and evaluation sets, and improve downstream models. We divide these quality issues into two categories: micro-level and macro-level. We find that macro-level issues are more prevalent in less institutionalized, often under-resourced languages. We provide a case analysis of Taiwanese Southern Min (nan_tw) that highlights the need for proactive language planning (e.g. orthography prescriptions, dialect boundary definition) and enhanced data quality control in the process of Automatic Speech Recognition (ASR) dataset creation. We conclude by proposing guidelines and recommendations to mitigate these issues in future dataset development, emphasizing the importance of sociolinguistic awareness in creating robust and reliable speech data resources.
In their simplest form, bulk acoustic wave (BAW) devices consist of a piezoelectric crystal between two electrodes that transduce the material's vibrations into electrical signals. They are adopted in frequency control and metrology, with well-established standards at frequencies of 5~MHz and above. Their use as a resonant-mass strain antenna for high-frequency gravitational waves has been recently proposed (Goryachev and Tobar, 2014). The estimated power spectral density sensitivity at the resonant frequencies is of the order of $10^{-21}\, \textrm{strain}/\sqrt{\textrm{Hz}}$. In this paper, after introducing the science opportunity and potential of gravitational wave detection with BAWs, we describe the two-stage BAUSCIA project plan to build a multimode antenna based on commercial BAWs, followed by an optimized array of custom BAWs. We show that commercially available BAWs already provide sensitivity comparable to current experiments around 10~MHz. Finally, we outline options for optimization of custom devices to improve sensitivity in an unexplored region, probe multiple frequencies between 0.1 and 10 MHz, and target specific signals, such as post-merger emission from neutron stars or emission from various dark matter candidates.