Optimal transport has found widespread applications in signal processing and machine learning. Among its many equivalent formulations, optimal transport seeks to reconstruct a random variable/vector with a prescribed distribution at the destination while minimizing the expected distortion relative to a given random variable/vector at the source. However, in practice, certain constraints may render the optimal transport plan infeasible. In this work, we consider three types of constraints: rate constraints, dimension constraints, and channel constraints, motivated by perception-aware lossy compression, generative principal component analysis, and deep joint source-channel coding, respectively. Special attention is given to the setting termed Gaussian Wasserstein optimal transport, where both the source and reconstruction variables are multivariate Gaussian, and the end-to-end distortion is measured by the mean squared error. We derive explicit results for the minimum achievable mean squared error under the three aforementioned constraints when the covariance matrices of the source and reconstruction variables commute.
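For context, the squared 2-Wasserstein distance between two Gaussians has a well-known closed form, and under the commuting-covariance assumption highlighted above it reduces to a sum over eigenvalues; this identity is standard background rather than a result of the paper:
\[
W_2^2\bigl(\mathcal{N}(\mu_1,\Sigma_1),\,\mathcal{N}(\mu_2,\Sigma_2)\bigr)
= \lVert \mu_1-\mu_2\rVert_2^2
+ \operatorname{tr}\!\Bigl(\Sigma_1+\Sigma_2-2\bigl(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2}\bigr)^{1/2}\Bigr),
\]
and when \(\Sigma_1\Sigma_2=\Sigma_2\Sigma_1\), so that the covariances are simultaneously diagonalizable with eigenvalues \(\lambda_i\) and \(\nu_i\), the trace term simplifies to \(\sum_i(\sqrt{\lambda_i}-\sqrt{\nu_i})^2\).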
The advancement of visual language models (VLMs) has enhanced mobile device operations, allowing simulated human-like actions to address user requirements. Current VLM-based mobile operating assistants can be structured into three levels: task, subtask, and action. The subtask level, linking high-level goals with low-level executable actions, is crucial for task completion but faces two challenges: ineffective subtasks that the lower-level agent cannot execute and inefficient subtasks that fail to contribute to the completion of the higher-level task. These challenges stem from the VLM's lack of experience in decomposing subtasks within GUI scenarios in a multi-agent architecture. To address these, we propose a new mobile assistant architecture with constrained high-frequency optimized planning (CHOP). Our approach overcomes the VLM's deficiency in GUI scenario planning by using human-planned subtasks as the basis vector. We evaluate our architecture in both English and Chinese contexts across 20 Apps, demonstrating significant improvements in both effectiveness and efficiency. Our dataset and code are available at https://github.com/Yuqi-Zhou/CHOP
The electrification of road transport, as the predominant mode of transportation in Africa, represents a great opportunity to reduce greenhouse gas emissions and dependence on costly fuel imports. However, it introduces major challenges for local energy infrastructures, including the deployment of charging stations and the impact on often fragile electricity grids. Despite its importance, research on electric mobility planning in Africa remains limited, while existing planning tools rely on detailed local mobility data that is often unavailable, especially for privately owned passenger vehicles. In this study, we introduce a novel framework designed to support private vehicle electrification in data-scarce regions and apply it to Addis Ababa, simulating the mobility patterns and charging needs of 100,000 electric vehicles. Our analysis indicates that these vehicles generate a daily charging demand of approximately 350 MWh and emphasizes the significant influence of the charging location on the spatial and temporal distribution of this demand. Notably, charging at public places can help smooth the charging demand throughout the day, mitigating peak charging loads on the electricity grid. We also estimate charging station requirements, finding that workplace charging requires approximately one charging point per three electric vehicles, while public charging requires only one per thirty. Finally, we demonstrate that photovoltaic energy can cover a substantial share of the charging needs, emphasizing the potential for renewable energy integration. This study lays the groundwork for electric mobility planning in Addis Ababa while offering a transferable framework for other African cities.
Autonomous motion planning under unknown nonlinear dynamics presents significant challenges. An agent needs to continuously explore the system dynamics to acquire its properties, such as reachability, in order to guide system navigation adaptively. In this paper, we propose a hybrid planning-control framework designed to compute a feasible trajectory toward a target. Our approach involves partitioning the state space and approximating the system by a piecewise affine (PWA) system with constrained control inputs. By abstracting the PWA system into a directed weighted graph, we incrementally update the existence of its edges via affine system identification and reach control theory, introducing a predictive reachability condition by exploiting prior information of the unknown dynamics. Heuristic weights are assigned to edges based on whether their existence is certain or remains indeterminate. Consequently, we propose a framework that adaptively collects and analyzes data during mission execution, continually updates the predictive graph, and synthesizes a controller online based on the graph search outcomes. We demonstrate the efficacy of our approach through simulation scenarios involving a mobile robot operating in unknown terrains, with its unknown dynamics abstracted as a single integrator model.
We propose a novel system for robot-to-human object handover that emulates human coworker interactions. Unlike most existing studies that focus primarily on grasping strategies and motion planning, our system focuses on (1) inferring human handover intents and (2) imagining the spatial handover configuration. The first component integrates multimodal perception, combining visual and verbal cues, to infer human intent. The second uses a diffusion-based model to generate the handover configuration, capturing the spatial relationship among the robot's gripper, the object, and the human hand, thereby mimicking the cognitive process of motor imagery. Experimental results demonstrate that our approach effectively interprets human cues and achieves fluent, human-like handovers, offering a promising solution for collaborative robotics. Code, videos, and data are available at: https://i3handover.github.io.
In many industrial robotics applications, multiple robots are working in a shared workspace to complete a set of tasks as quickly as possible. Such settings can be treated as multi-modal multi-robot multi-goal path planning problems, where each robot has to reach an ordered sequence of goals. Existing approaches to this type of problem solve it using prioritization or assume synchronous completion of tasks, and are thus neither optimal nor complete. We formalize this problem as a single path planning problem and introduce a benchmark encompassing a diverse range of problem instances, including scenarios with various robots, planning horizons, and collaborative tasks such as handovers. Along with the benchmark, we adapt an RRT* and a PRM* planner to serve as baselines for the planning problems. Both planners work in the composite space of all robots, and we introduce the changes required for them to work in our setting. Unlike existing approaches, our planners and formulation are not restricted to discretized 2D workspaces, support a changing environment, and work for heterogeneous robot teams over multiple modes with different constraints, and multiple goals. Videos and code for the benchmark and the planners are available at https://vhartman.github.io/mrmg-planning/.
This survey provides a comprehensive review of recent advancements in generative learning models for robotic manipulation, addressing key challenges in the field. Robotic manipulation faces critical bottlenecks, including insufficient data and inefficient data acquisition, long-horizon and complex task planning, and the multi-modality reasoning required for robust policy learning across diverse environments. To tackle these challenges, this survey introduces several generative model paradigms, including Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), diffusion models, probabilistic flow models, and autoregressive models, highlighting their strengths and limitations. The applications of these models are categorized into three hierarchical layers: the Foundation Layer, focusing on data generation and reward generation; the Intermediate Layer, covering language, code, visual, and state generation; and the Policy Layer, emphasizing grasp generation and trajectory generation. Each layer is explored in detail, along with notable works that have advanced the state of the art. Finally, the survey outlines future research directions and challenges, emphasizing the need for improved efficiency in data utilization, better handling of long-horizon tasks, and enhanced generalization across diverse robotic scenarios. All the related resources, including research papers, open-source data, and projects, are collected for the community at https://github.com/GAI4Manipulation/AwesomeGAIManipulation
Hemodynamic parameters such as pressure and wall shear stress play an important role in diagnosis, prognosis, and treatment planning in cardiovascular diseases. These parameters can be accurately computed using computational fluid dynamics (CFD), but CFD is computationally intensive. Hence, deep learning methods have been adopted as a surrogate to rapidly estimate CFD outcomes. A drawback of such data-driven models is the need for time-consuming reference CFD simulations for training. In this work, we introduce an active learning framework to reduce the number of CFD simulations required for the training of surrogate models, lowering the barriers to their deployment in new applications. We propose three distinct querying strategies to determine for which unlabeled samples CFD simulations should be obtained. These querying strategies are based on geometrical variance, ensemble uncertainty, and adherence to the physics governing fluid dynamics. We benchmark these methods on velocity field estimation in synthetic coronary artery bifurcations and find that they allow for substantial reductions in annotation cost. Notably, we find that our strategies reduce the number of samples required by up to 50% and make the trained models more robust to difficult cases. Our results show that active learning is a feasible strategy to increase the potential of deep learning-based CFD surrogates.
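One way to realize the ensemble-uncertainty querying strategy described above is sketched below; the variable names, array shapes, and the ensemble itself are illustrative assumptions rather than details from the paper. Candidate geometries are ranked by the disagreement between surrogate models' velocity predictions, and the most uncertain ones are sent to the CFD solver for labeling.

import numpy as np

def ensemble_uncertainty_query(predictions: np.ndarray, n_query: int) -> np.ndarray:
    # predictions: (n_models, n_samples, n_points, 3) velocity fields predicted
    # by each surrogate model for every unlabeled candidate geometry.
    # Disagreement = std across ensemble members, averaged over mesh points and components.
    disagreement = predictions.std(axis=0).mean(axis=(1, 2))
    # Indices of the n_query most uncertain candidates (descending disagreement).
    return np.argsort(disagreement)[::-1][:n_query]

# Toy example: 5 surrogate models, 100 candidate bifurcations, 200 mesh points each.
preds = np.random.rand(5, 100, 200, 3)
to_simulate = ensemble_uncertainty_query(preds, n_query=10)
print(to_simulate)  # candidates to pass to the CFD solver next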
Liver-vessel segmentation is an essential task in the pre-operative planning of liver resection. State-of-the-art 2D or 3D convolution-based methods focus on liver-vessel segmentation in 2D CT cross-sectional views and do not take into account the global liver-vessel topology. To maintain this global vessel topology, we rely on the underlying physics used in the CT reconstruction process and apply it to liver-vessel segmentation. Concretely, we introduce the concept of top-k maximum intensity projections, which mimics the CT reconstruction by replacing the integral along each projection direction with the top-k maxima along that direction. We use these top-k maximum projections to condition a diffusion model and generate 3D liver-vessel trees. We evaluate our 3D liver-vessel segmentation on the 3D-ircadb-01 dataset and achieve the highest Dice coefficient, intersection-over-union (IoU), and Sensitivity scores compared to prior work.
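The top-k projection idea can be illustrated with a short sketch (the array layout, axis convention, and function name are assumptions for illustration, not the paper's code): instead of integrating voxel intensities along each ray, only the k largest values per ray are kept.

import numpy as np

def topk_mip(volume: np.ndarray, k: int = 3, axis: int = 0) -> np.ndarray:
    # Top-k maximum intensity projection: keep the k largest voxel values along
    # the projection axis instead of summing (integrating) along it.
    sorted_vals = np.sort(volume, axis=axis)  # ascending along the ray
    start = volume.shape[axis] - k
    return np.take(sorted_vals, list(range(start, volume.shape[axis])), axis=axis)

# Example: project a toy 3D volume along the z-axis, keeping the top 3 maxima per ray.
vol = np.random.rand(64, 128, 128)
proj = topk_mip(vol, k=3, axis=0)
print(proj.shape)  # (3, 128, 128): three conditioning channels for this projection direction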
For quadrotors, achieving safe and autonomous flight in complex environments with wind disturbances and dynamic obstacles still faces significant challenges. Most existing methods address wind disturbances in either trajectory planning or control, which may lead to hazardous situations during flight. The emergence of dynamic obstacles further worsens the situation. Therefore, we propose an efficient and reliable framework for quadrotors that incorporates wind disturbance estimations during both the planning and control phases via a generalized proportional integral observer. First, we develop a real-time adaptive spatial-temporal trajectory planner that utilizes Hamilton-Jacobi (HJ) reachability analysis for the error dynamics resulting from wind disturbances. By considering the propagation of forward reachable sets on a Euclidean Signed Distance Field (ESDF) map, safety is guaranteed. Additionally, a Nonlinear Model Predictive Control (NMPC) controller considering wind disturbance compensation is implemented for robust trajectory tracking. Simulation and real-world experiments verify the effectiveness of our framework. The video and supplementary material will be available at https://github.com/Ma29-HIT/SEAL/.
We offer a new in-depth investigation of global path planning (GPP) for an unmanned ground vehicle: an autonomous mining sampling robot named ROMIE. GPP is essential for ROMIE's optimal performance, and it translates into solving the traveling salesman problem, a complex graph theory challenge that is crucial for determining the most effective route to cover all sampling locations in a mining field. This problem is central to enhancing ROMIE's operational efficiency and competitiveness against human labor by optimizing cost and time. The primary aim of this research is to advance GPP by developing, evaluating, and improving a cost-efficient software and web application. We delve into an extensive comparison and analysis of Google Operations Research (OR)-Tools optimization algorithms. Our study is driven by the goal of applying and testing the limits of OR-Tools' capabilities by integrating Reinforcement Learning techniques for the first time. This enables us to compare these methods with OR-Tools, assessing their computational effectiveness and real-world application efficiency. Our analysis seeks to provide insights into the effectiveness and practical application of each technique. Our findings indicate that Q-Learning stands out as the optimal strategy, demonstrating superior efficiency by deviating only 1.2% on average from the optimal solutions across our datasets.
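For readers unfamiliar with the toolkit, a minimal OR-Tools single-vehicle TSP setup looks roughly like the snippet below; the distance matrix and node count are placeholders rather than data from this study, and the actual ROMIE pipeline may be configured differently.

from ortools.constraint_solver import pywrapcp, routing_enums_pb2

# Hypothetical symmetric distance matrix over 4 sampling locations (node 0 = depot).
dist = [[0, 10, 15, 20], [10, 0, 35, 25], [15, 35, 0, 30], [20, 25, 30, 0]]

manager = pywrapcp.RoutingIndexManager(len(dist), 1, 0)  # nodes, vehicles, depot index
routing = pywrapcp.RoutingModel(manager)

def distance_cb(from_index, to_index):
    return dist[manager.IndexToNode(from_index)][manager.IndexToNode(to_index)]

transit = routing.RegisterTransitCallback(distance_cb)
routing.SetArcCostEvaluatorOfAllVehicles(transit)

params = pywrapcp.DefaultRoutingSearchParameters()
params.first_solution_strategy = routing_enums_pb2.FirstSolutionStrategy.PATH_CHEAPEST_ARC

solution = routing.SolveWithParameters(params)
if solution:
    index, route = routing.Start(0), []
    while not routing.IsEnd(index):
        route.append(manager.IndexToNode(index))
        index = solution.Value(routing.NextVar(index))
    print(route, solution.ObjectiveValue())  # visiting order and total tour cost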
This paper introduces the "IoT Integration Protocol for Enhanced Hospital Care", a comprehensive framework designed to leverage Internet of Things (IoT) technology to enhance patient care, improve operational efficiency, and ensure data security in hospital settings. With the growing emphasis on utilizing advanced technologies in healthcare, this protocol aims to harness the potential of IoT devices to optimize patient monitoring, enable remote care, and support clinical decision-making. By integrating IoT seamlessly into nursing workflows and patient care plans, hospitals can achieve higher levels of patient-centric care and real-time data insights, leading to better treatment outcomes and resource allocation. This paper outlines the protocol's objectives, key components, and expected benefits, while emphasizing the importance of ethical considerations and ongoing evaluation to ensure successful implementation.
Accurate motion understanding of the dynamic objects within the scene in bird's-eye-view (BEV) is critical to ensure a reliable obstacle avoidance system and smooth path planning for autonomous vehicles. However, this task has received relatively limited exploration compared to object detection and segmentation, with only a few recent vision-based approaches presenting preliminary findings that significantly deteriorate in low-light, nighttime, and adverse weather conditions such as rain. Conversely, LiDAR and radar sensors remain almost unaffected in these scenarios, and radar provides key velocity information about the objects. Therefore, we introduce BEVMOSNet, to our knowledge the first end-to-end multimodal fusion approach leveraging cameras, LiDAR, and radar to precisely predict the moving objects in BEV. In addition, we perform a deeper analysis to find the optimal strategy for deformable cross-attention-guided sensor fusion for cross-sensor knowledge sharing in BEV. While evaluating BEVMOSNet on the nuScenes dataset, we show an overall improvement in IoU score of 36.59% compared to the vision-based unimodal baseline BEV-MoSeg (Sigatapu et al., 2023), and 2.35% compared to the multimodal SimpleBEV (Harley et al., 2022), extended for the motion segmentation task, establishing this method as the state-of-the-art in BEV motion segmentation.
Particle Therapy (PT) has emerged as a powerful tool in cancer treatment, leveraging the unique dose distribution of charged particles to deliver high radiation levels to the tumor while minimizing damage to surrounding healthy tissue. Despite its advantages, further improvements in Treatment Planning Systems (TPS) are needed to address uncertainties related to the fragmentation process, which can affect both dose deposition and effectiveness. These fragmentation effects also play a critical role in Radiation Protection in Space, where astronauts are exposed to high levels of radiation, necessitating precise models for shielding optimization. The FOOT (FragmentatiOn Of Target) experiment addresses these challenges by measuring fragmentation cross sections with high precision, providing essential data for improving TPS for PT and space radiation protection strategies. This thesis contributes to the FOOT experiment in two key areas. First, it focuses on the performance of the vertex detector, which is responsible for reconstructing particle tracks and fragmentation vertices with high spatial resolution. The study evaluates the detector's reconstruction algorithm and its efficiency in detecting particles. Second, the thesis presents a preliminary calculation of the fragmentation cross section, incorporating the vertex detector for the first time in these measurements.
Efficient and safe trajectory planning plays a critical role in the application of quadrotor unmanned aerial vehicles. Currently, the inherent trade-off between constraint compliance and computational efficiency in UAV trajectory optimization has not been sufficiently addressed. To enhance the performance of UAV trajectory optimization, we propose a spatial-temporal iterative optimization framework. First, B-splines are utilized to represent UAV trajectories, with rigorous safety assurance achieved through strict enforcement of constraints on the control points. Subsequently, a set of QP-LP subproblems is derived via spatial-temporal decoupling and constraint linearization. Finally, an iterative optimization strategy incorporating guidance gradients is employed to obtain high-performance UAV trajectories in different scenarios. Both simulation and real-world experimental results validate the efficiency and high performance of the proposed optimization framework in generating safe and fast trajectories. Our source code will be released for community reference at https://hitsz-mas.github.io/STORM
Autonomous navigation is a fundamental task for robot vacuum cleaners in indoor environments. Since their core function is to clean entire areas, robots inevitably encounter dead zones in cluttered and narrow scenarios. Existing planning methods often fail to escape due to complex environmental constraints, high-dimensional search spaces, and high-difficulty maneuvers. To address these challenges, this paper proposes an embodied escaping model that leverages a reinforcement learning-based policy with an efficient action mask for dead zone escaping. To alleviate the issue of sparse rewards in training, we introduce a hybrid training policy that improves learning efficiency. To handle redundant and ineffective action options, we design a novel action representation that reshapes the discrete action space with a uniform turning radius. Furthermore, we develop an action mask strategy to select valid actions quickly, balancing precision and efficiency. In real-world experiments, our robot is equipped with a Lidar, an IMU, and two-wheel encoders. Extensive quantitative and qualitative experiments across varying difficulty levels demonstrate that our robot can consistently escape from challenging dead zones. Moreover, our approach significantly outperforms the compared path planning and reinforcement learning methods in terms of success rate and collision avoidance.
Generative AI (GenAI) tools enhance social media video creation by streamlining tasks such as scriptwriting, visual and audio generation, and editing. These tools enable the creation of new content, including text, images, audio, and video, with platforms like ChatGPT and MidJourney becoming increasingly popular among YouTube creators. Despite their growing adoption, knowledge of their specific use cases across the video production process remains limited. This study analyzes 274 YouTube how-to videos to explore GenAI's role in planning, production, editing, and uploading. The findings reveal that YouTubers use GenAI to identify topics, generate scripts, create prompts, and produce visual and audio materials. Additionally, GenAI supports editing tasks like upscaling visuals and reformatting content while also suggesting titles and subtitles. Based on these findings, we discuss future directions for incorporating GenAI to support various video creation tasks.
End-to-end autonomous driving frameworks enable seamless integration of perception and planning but often rely on one-shot trajectory prediction, which may lead to unstable control and vulnerability to occlusions in single-frame perception. To address this, we propose the Momentum-Aware Driving (MomAD) framework, which introduces trajectory momentum and perception momentum to stabilize and refine trajectory predictions. MomAD comprises two core components: (1) Topological Trajectory Matching (TTM), which employs the Hausdorff distance to select the optimal planning query that aligns with prior paths to ensure coherence; (2) Momentum Planning Interactor (MPI), which cross-attends the selected planning query with historical queries to expand the static and dynamic perception fields. This enriched query, in turn, helps regenerate long-horizon trajectories and reduce collision risks. To mitigate noise arising from dynamic environments and detection errors, we introduce robust instance denoising during training, enabling the planning model to focus on critical signals and improve its robustness. We also propose a novel Trajectory Prediction Consistency (TPC) metric to quantitatively assess planning stability. Experiments on the nuScenes dataset demonstrate that MomAD achieves superior long-term consistency (>=3s) compared to SOTA methods. Moreover, evaluations on the curated Turning-nuScenes dataset show that MomAD reduces the collision rate by 26% and improves TPC by 0.97m (33.45%) over a 6s prediction horizon, while closed-loop evaluation on Bench2Drive demonstrates an up to 16.3% improvement in success rate.
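To make the trajectory-matching step concrete, the sketch below scores candidate planning trajectories against the previous plan with the symmetric Hausdorff distance and picks the closest one; the names, shapes, and SciPy-based implementation are assumptions for illustration, not MomAD's actual code.

import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff(a: np.ndarray, b: np.ndarray) -> float:
    # Symmetric Hausdorff distance between two (N, 2) waypoint sets.
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])

def select_planning_query(candidates, prior_path: np.ndarray) -> int:
    # Index of the candidate trajectory most consistent with the prior path.
    return int(np.argmin([hausdorff(c, prior_path) for c in candidates]))

# Toy example: three candidate 6-step (x, y) trajectories vs. the previous plan.
prior = np.stack([np.linspace(0, 5, 6), np.zeros(6)], axis=1)
cands = [prior + np.array([0.0, offset]) for offset in (0.1, 1.0, 2.0)]
print(select_planning_query(cands, prior))  # 0, the least-deviating candidate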
We propose an integrated planning framework for quadrupedal locomotion over dynamically changing, unforeseen terrains. Existing approaches either rely on heuristics for instantaneous foothold selection--compromising safety and versatility--or solve expensive trajectory optimization problems with complex terrain features and long time horizons. In contrast, our framework leverages reactive synthesis to generate correct-by-construction controllers at the symbolic level, and mixed-integer convex programming (MICP) for dynamic and physically feasible footstep planning for each symbolic transition. We use a high-level manager to reduce the large state space in synthesis by incorporating local environment information, improving synthesis scalability. To handle specifications that cannot be met due to dynamic infeasibility, and to minimize costly MICP solves, we leverage a symbolic repair process to generate only necessary symbolic transitions. During online execution, re-running the MICP with real-world terrain data, along with runtime symbolic repair, bridges the gap between offline synthesis and online execution. We demonstrate, in simulation, our framework's capabilities to discover missing locomotion skills and react promptly in safety-critical environments, such as scattered stepping stones and rebars.
This paper presents ArticuBot, in which a single learned policy enables a robotics system to open diverse categories of unseen articulated objects in the real world. This task has long been challenging for robotics due to the large variations in the geometry, size, and articulation types of such objects. Our system, ArticuBot, consists of three parts: generating a large number of demonstrations in physics-based simulation, distilling all generated demonstrations into a point cloud-based neural policy via imitation learning, and performing zero-shot sim2real transfer to real robotic systems. Utilizing sampling-based grasping and motion planning, our demonstration generation pipeline is fast and effective, generating a total of 42.3k demonstrations over 322 training articulated objects. For policy learning, we propose a novel hierarchical policy representation, in which the high-level policy learns the sub-goal for the end-effector, and the low-level policy learns how to move the end-effector conditioned on the predicted goal. We demonstrate that this hierarchical approach achieves much better object-level generalization compared to the non-hierarchical version. We further propose a novel weighted displacement model for the high-level policy that grounds the prediction into the existing 3D structure of the scene, outperforming alternative policy representations. We show that our learned policy can zero-shot transfer to three different real robot settings: a fixed table-top Franka arm across two different labs, and an X-Arm on a mobile base, opening multiple unseen articulated objects across two labs, real lounges, and kitchens. Videos and code can be found on our project website: https://articubot.github.io/.
The axion is a compelling hypothetical particle that could account for the dark matter in our universe, while simultaneously explaining why quark interactions within the neutron do not appear to give rise to an electric dipole moment. The most sensitive axion detection technique in the 1 to 10 GHz frequency range makes use of the axion-photon coupling and is called the axion haloscope. Within a high-Q cavity immersed in a strong magnetic field, axions are converted to microwave photons. As searches scan up in axion mass, towards the parameter space favored by theoretical predictions, individual cavity sizes decrease in order to achieve higher frequencies. This shrinking cavity volume translates directly to a loss in signal-to-noise, motivating the plan to replace individual cavity detectors with arrays of cavities. When the transition from one cavity to N cavities occurs, haloscope searches are anticipated to become much more complicated to operate: requiring N times as many measurements, with the new requirement that all N detectors function in lockstep. To offset this anticipated increase in detector complexity, we aim to develop new tools for diagnosing low-temperature RF experiments using neural networks for pattern recognition. Current haloscope experiments monitor the scattering parameters of their RF receiver to periodically measure cavity quality factor and coupling; however, the off-resonant data remains largely unused. In this paper, we ask whether the off-resonant information contained in these VNA scans could be used to diagnose equipment failures/anomalies and measure physical conditions (e.g., temperatures and ambient magnetic field strengths). We demonstrate a proof-of-concept that AI techniques can help operators manage the overall complexity of an axion haloscope search.
The highly nonlinear dynamics of vehicles present a major challenge for the practical implementation of optimal and Model Predictive Control (MPC) approaches in path planning and following. Koopman operator theory offers a global linear representation of nonlinear dynamical systems, making it a promising framework for optimization-based vehicle control. This paper introduces a novel deep learning-based Koopman modeling approach that employs deep neural networks to capture the full vehicle dynamics, from pedal and steering inputs to chassis states, within a curvilinear Frenet frame. The superior accuracy of the Koopman model compared to identified linear models is shown for a double lane change maneuver. Furthermore, it is shown that an MPC controller deploying the Koopman model provides significantly improved performance while maintaining computational efficiency comparable to a linear MPC.
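As background on the Koopman idea (a generic EDMD-style sketch with hand-picked observables, not the paper's learned deep-network lifting): the state is lifted into a feature space where a linear operator, fit by least squares from snapshot pairs, approximately advances the dynamics one step. An MPC built on such a model can then exploit the linear lifted dynamics while the nonlinearity is absorbed into the lifting.

import numpy as np

def lift(x: np.ndarray) -> np.ndarray:
    # Assumed observables: the state itself plus simple nonlinear features.
    return np.concatenate([x, np.sin(x), x**2])

# Snapshot pairs (x_t, x_{t+1}) from a toy nonlinear map standing in for vehicle dynamics.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
Xp = np.stack([X[:, 0] + 0.1 * X[:, 1], X[:, 1] - 0.1 * np.sin(X[:, 0])], axis=1)

Phi = np.stack([lift(x) for x in X])    # lifted states, shape (N, d)
Phip = np.stack([lift(x) for x in Xp])  # lifted successor states

# Least-squares fit of the finite-dimensional Koopman approximation: Phi @ K ~= Phip.
K, *_ = np.linalg.lstsq(Phi, Phip, rcond=None)

# One-step prediction: lift, advance linearly, read back the original state coordinates.
x0 = np.array([0.3, -0.2])
x1_pred = (lift(x0) @ K)[:2]
print(x1_pred)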
Multi-Agent Path Finding (MAPF), which focuses on finding collision-free paths for multiple robots, is crucial for applications ranging from aerial swarms to warehouse automation. Solving MAPF is NP-hard, so learning-based approaches for MAPF have gained attention, particularly those leveraging deep neural networks. Nonetheless, despite the community's continued efforts, all learning-based MAPF planners still rely on decentralized planning due to variability in the number of agents and map sizes. We have developed the first centralized learning-based policy for the MAPF problem, called RAILGUN. RAILGUN is not an agent-based policy but a map-based policy. By leveraging a CNN-based architecture, RAILGUN can generalize across different maps and handle any number of agents. We collect trajectories from rule-based methods to train our model in a supervised way. In experiments, RAILGUN outperforms most baseline methods and demonstrates great zero-shot generalization capabilities on various tasks, maps, and agent numbers that were not seen in the training dataset.
Heating electrification presents opportunities and challenges for energy affordability. Without careful planning and policy, the costs of natural gas service will be borne by a shrinking customer base, driving up expenses for those who are left behind. This affordability issue is worsened by new fossil fuel investments, which risk locking communities into carbon-intensive infrastructure. Here, we introduce a framework to quantify the distributional effects of natural gas phasedown on energy affordability, integrating detailed household data with utility financial and planning documents. Applying our framework first to Massachusetts and then nationwide, we show that vulnerable communities face disproportionate affordability risks in building energy transitions. Households that do not electrify may bear up to 50% higher energy costs over the next decade. Targeted electrification may help to alleviate immediate energy burdens, but household heating transitions will ultimately require coordinated, neighborhood-scale strategies that consider the high fixed costs of legacy infrastructure.
We introduce LiteWebAgent, an open-source suite for VLM-based web agent applications. Our framework addresses a critical gap in the web agent ecosystem with a production-ready solution that combines minimal serverless backend configuration, intuitive user and browser interfaces, and extensible research capabilities in agent planning, memory, and tree search. For the core LiteWebAgent agent framework, we implement a simple yet effective baseline using recursive function calling, providing decoupled action generation and action grounding. In addition, we integrate advanced research components such as agent planning, agent workflow memory, and tree search in a modular and extensible manner. We then integrate the LiteWebAgent agent framework with a frontend and backend as deployed systems in two formats: (1) a production Vercel-based web application, which provides users with an agent-controlled remote browser, and (2) a Chrome extension leveraging LiteWebAgent's API to control an existing Chrome browser via CDP (Chrome DevTools Protocol). The LiteWebAgent framework is available at https://github.com/PathOnAI/LiteWebAgent, with the deployed frontend at https://lite-web-agent.vercel.app/.
4D flow MRI allows for the estimation of three-dimensional relative pressure fields, providing rich pressure information, unlike catheterization and Doppler echocardiography, which provide one-dimensional pressure drops only. The accuracy of one-dimensional pressure drops derived from 4D flow has been explored in previous literature, but additional work must be done to evaluate the accuracy of three-dimensional relative pressure fields. This work presents an analysis of three state-of-the-art relative pressure estimators: virtual Work-Energy Relative Pressure (vWERP), the Pressure Poisson Estimator (PPE), and the Stokes Estimator (STE). Spatiotemporal behavior and sensitivity to noise were determined in silico. Estimators were validated with a type B aortic dissection (TBAD) flow phantom with varying tear geometry and an array of twelve catheter pressure measurements. Finally, the performance of each estimator was evaluated across eight patient cases. In silico pressure field errors were lower in STE compared to PPE, although PPE pressures were less affected by noise. High velocity gradients and low spatial resolution contributed most significantly to local variations in 3D error fields. Low temporal resolution leads to highly transient peak pressure events being averaged, systematically underestimating peak pressures. In the flow phantom analysis, vWERP was the most accurate method, followed by STE and PPE. Each pressure estimator strongly correlated with ground truth pressure values despite the tendency to underestimate peak pressures. Patient case results demonstrated that the pressure estimators could be feasibly integrated into a clinical workflow.
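For orientation, the PPE referenced above is commonly formulated by taking the divergence of the pressure gradient implied by the incompressible Navier-Stokes momentum balance, evaluated on the measured velocity field; this is the standard form from the literature, not a detail specific to this work:
\[
\nabla^{2} p \;=\; \nabla\!\cdot\!\Bigl(-\rho\Bigl(\frac{\partial \mathbf{v}}{\partial t} + (\mathbf{v}\cdot\nabla)\mathbf{v}\Bigr) + \mu\,\nabla^{2}\mathbf{v}\Bigr),
\]
with Neumann boundary conditions obtained from the same momentum expression on the domain boundary, so that only relative (not absolute) pressure is recovered.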
Current embodied reasoning agents struggle to plan for long-horizon tasks that require physically interacting with the world to obtain the necessary information (e.g., 'sort the objects from lightest to heaviest'). Improving the capabilities of such agents depends heavily on the availability of relevant training environments. To facilitate the development of such systems, we introduce MuBlE, a novel simulation environment (built on top of robosuite) that uses the MuJoCo physics engine and the high-quality renderer Blender to provide realistic visual observations that are also accurate to the physical state of the scene. It is the first simulator focusing on long-horizon robot manipulation tasks while preserving accurate physics modeling. MuBlE can generate multimodal data for training and enables the design of closed-loop methods through environment interaction on two levels: a visual-action loop and a control-physics loop. Together with the simulator, we propose SHOP-VRB2, a new benchmark composed of 10 classes of multi-step reasoning scenarios that require simultaneous visual and physical measurements.
Proton computed tomography (pCT) aims to facilitate precise dose planning for hadron therapy, a promising and effective method for cancer treatment. Hadron therapy utilizes protons and heavy ions to deliver well-focused doses of radiation, leveraging the Bragg peak phenomenon to target tumors while sparing healthy tissues. The Bergen pCT Collaboration aims to develop a novel pCT scanner and accompanying reconstruction algorithms to overcome current limitations. This paper focuses on advancing the track and image reconstruction algorithms, thereby enhancing the precision of dose planning and reducing the side effects of hadron therapy. A neural-network-aided track reconstruction method is presented.
From early Movement Primitive (MP) techniques to modern Vision-Language Models (VLMs), autonomous manipulation has remained a pivotal topic in robotics. As two extremes, VLM-based methods emphasize zero-shot and adaptive manipulation but struggle with fine-grained planning. In contrast, MP-based approaches excel in precise trajectory generalization but lack decision-making ability. To leverage the strengths of the two frameworks, we propose VL-MP, which integrates a VLM with Kernelized Movement Primitives (KMP) via a low-distortion decision information transfer bridge, enabling fine-grained robotic manipulation under ambiguous situations. One key aspect of VL-MP is the accurate representation of task decision parameters through semantic keypoint constraints, leading to more precise task parameter generation. Additionally, we introduce a local trajectory feature-enhanced KMP to support VL-MP, thereby achieving shape preservation for complex trajectories. Extensive experiments conducted in complex real-world environments validate the effectiveness of VL-MP for adaptive and fine-grained manipulation.
Motion planning with simple objectives, such as collision-avoidance and goal-reaching, can be solved efficiently using modern planners. However, the complexity of the allowed tasks for these planners is limited. On the other hand, signal temporal logic (STL) can specify complex requirements, but STL-based motion planning and control algorithms often face scalability issues, especially in large multi-robot systems with complex dynamics. In this paper, we propose an algorithm that leverages the best of the two worlds. We first use a single-robot motion planner to efficiently generate a set of alternative reference paths for each robot. Then coordination requirements are specified using STL, which is defined over the assignment of paths and robots' progress along those paths. We use a Mixed Integer Linear Program (MILP) to compute task assignments and robot progress targets over time such that the STL specification is satisfied. Finally, a local controller is used to track the target progress. Simulations demonstrate that our method can handle tasks with complex constraints and scales to large multi-robot teams and intricate task allocation scenarios.