4D flow MRI allows for the estimation of three-dimensional relative pressure fields, providing rich pressure information, unlike catheterization and Doppler echocardiography, which provide one-dimensional pressure drops only. The accuracy of one-dimensional pressure drops derived from 4D flow has been explored in previous literature, but additional work must be done to evaluate the accuracy of three-dimensional relative pressure fields. This work presents an analysis of three state-of-the-art relative pressure estimators: virtual Work-Energy Relative Pressure (vWERP), the Pressure Poisson Estimator (PPE), and the Stokes Estimator (STE). Spatiotemporal behavior and sensitivity to noise were determined in silico. Estimators were validated with a type B aortic dissection (TBAD) flow phantom with varying tear geometry and an array of twelve catheter pressure measurements. Finally, the performance of each estimator was evaluated across eight patient cases. In silico pressure field errors were lower in STE compared to PPE, although PPE pressures were less affected by noise. High velocity gradients and low spatial resolution contributed most significantly to local variations in 3D error fields. Low temporal resolution leads to highly transient peak pressure events being averaged, systematically underestimating peak pressures. In the flow phantom analysis, vWERP was the most accurate method, followed by STE and PPE. Each pressure estimator strongly correlated with ground truth pressure values despite the tendency to underestimate peak pressures. Patient case results demonstrated that the pressure estimators could be feasibly integrated into a clinical workflow.
Current embodied reasoning agents struggle to plan for long-horizon tasks that require to physically interact with the world to obtain the necessary information (e.g. 'sort the objects from lightest to heaviest'). The improvement of the capabilities of such an agent is highly dependent on the availability of relevant training environments. In order to facilitate the development of such systems, we introduce a novel simulation environment (built on top of robosuite) that makes use of the MuJoCo physics engine and high-quality renderer Blender to provide realistic visual observations that are also accurate to the physical state of the scene. It is the first simulator focusing on long-horizon robot manipulation tasks preserving accurate physics modeling. MuBlE can generate mutlimodal data for training and enable design of closed-loop methods through environment interaction on two levels: visual - action loop, and control - physics loop. Together with the simulator, we propose SHOP-VRB2, a new benchmark composed of 10 classes of multi-step reasoning scenarios that require simultaneous visual and physical measurements.
Proton computed tomography (pCT) aims to facilitate precise dose planning for hadron therapy, a promising and effective method for cancer treatment. Hadron therapy utilizes protons and heavy ions to deliver well focused doses of radiation, leveraging the Bragg peak phenomenon to target tumors while sparing healthy tissues. The Bergen pCT Collaboration aims to develop a novel pCT scanner, and accompanying reconstruction algorithms to overcome current limitations. This paper focuses on advancing the track- and image reconstruction algorithms, thereby enhancing the precision of the dose planning and reducing side effects of hadron therapy. A neural network aided track reconstruction method is presented.
From early Movement Primitive (MP) techniques to modern Vision-Language Models (VLMs), autonomous manipulation has remained a pivotal topic in robotics. As two extremes, VLM-based methods emphasize zero-shot and adaptive manipulation but struggle with fine-grained planning. In contrast, MP-based approaches excel in precise trajectory generalization but lack decision-making ability. To leverage the strengths of the two frameworks, we propose VL-MP, which integrates VLM with Kernelized Movement Primitives (KMP) via a low-distortion decision information transfer bridge, enabling fine-grained robotic manipulation under ambiguous situations. One key of VL-MP is the accurate representation of task decision parameters through semantic keypoints constraints, leading to more precise task parameter generation. Additionally, we introduce a local trajectory feature-enhanced KMP to support VL-MP, thereby achieving shape preservation for complex trajectories. Extensive experiments conducted in complex real-world environments validate the effectiveness of VL-MP for adaptive and fine-grained manipulation.
Motion planning with simple objectives, such as collision-avoidance and goal-reaching, can be solved efficiently using modern planners. However, the complexity of the allowed tasks for these planners is limited. On the other hand, signal temporal logic (STL) can specify complex requirements, but STL-based motion planning and control algorithms often face scalability issues, especially in large multi-robot systems with complex dynamics. In this paper, we propose an algorithm that leverages the best of the two worlds. We first use a single-robot motion planner to efficiently generate a set of alternative reference paths for each robot. Then coordination requirements are specified using STL, which is defined over the assignment of paths and robots' progress along those paths. We use a Mixed Integer Linear Program (MILP) to compute task assignments and robot progress targets over time such that the STL specification is satisfied. Finally, a local controller is used to track the target progress. Simulations demonstrate that our method can handle tasks with complex constraints and scales to large multi-robot teams and intricate task allocation scenarios.
Autonomous navigation is reshaping various domains in people's life by enabling efficient and safe movement in complex environments. Reliable navigation requires algorithmic approaches that compute optimal or near-optimal trajectories while satisfying task-specific constraints and ensuring obstacle avoidance. However, existing methods struggle with slow convergence and suboptimal solutions, particularly in complex environments, limiting their real-world applicability. To address these limitations, this paper presents the Multi-Strategy Enhanced Crayfish Optimization Algorithm (MCOA), a novel approach integrating three key strategies: 1) Refractive Opposition Learning, enhancing population diversity and global exploration, 2) Stochastic Centroid-Guided Exploration, balancing global and local search to prevent premature convergence, and 3) Adaptive Competition-Based Selection, dynamically adjusting selection pressure for faster convergence and improved solution quality. Empirical evaluations underscore the remarkable planning speed and the amazing solution quality of MCOA in both 3D Unmanned Aerial Vehicle (UAV) and 2D mobile robot path planning. Against 11 baseline algorithms, MCOA achieved a 69.2% reduction in computational time and a 16.7% improvement in minimizing overall path cost in 3D UAV scenarios. Furthermore, in 2D path planning, MCOA outperformed baseline approaches by 44% on average, with an impressive 75.6% advantage in the largest 60*60 grid setting. These findings validate MCOA as a powerful tool for optimizing autonomous navigation in complex environments. The source code is available at: https://github.com/coedv-hub/MCOA.
Hub Covering Problems arise in various practical domains, such as urban planning, cargo delivery systems, airline networks, telecommunication network design, and e-mobility. The task is to select a set of hubs that enable tours between designated origin-destination pairs while ensuring that any tour includes no more than two hubs and that either the overall tour length or the longest individual edge is kept within prescribed limits. In literature, three primary variants of this problem are distinguished by their specific constraints. Each version exists in a single and multi allocation version, resulting in multiple distinct problem statements. Furthermore, the capacitated versions of these problems introduce additional restrictions on the maximum number of hubs that can be opened. It is currently unclear whether some variants are more complex than others, and no approximation bound is known. In this paper, we establish a hierarchy among these problems, demonstrating that certain variants are indeed special cases of others. For each problem, we either determine the absence of any approximation bound or provide both upper and lower bounds on the approximation guarantee.
The UAV-VLPA* (Visual-Language-Planning-and-Action) system represents a cutting-edge advancement in aerial robotics, designed to enhance communication and operational efficiency for unmanned aerial vehicles (UAVs). By integrating advanced planning capabilities, the system addresses the Traveling Salesman Problem (TSP) to optimize flight paths, reducing the total trajectory length by 18.5\% compared to traditional methods. Additionally, the incorporation of the A* algorithm enables robust obstacle avoidance, ensuring safe and efficient navigation in complex environments. The system leverages satellite imagery processing combined with the Visual Language Model (VLM) and GPT's natural language processing capabilities, allowing users to generate detailed flight plans through simple text commands. This seamless fusion of visual and linguistic analysis empowers precise decision-making and mission planning, making UAV-VLPA* a transformative tool for modern aerial operations. With its unmatched operational efficiency, navigational safety, and user-friendly functionality, UAV-VLPA* sets a new standard in autonomous aerial robotics, paving the way for future innovations in the field.
Autonomous navigation of car-like robots on uneven terrain poses unique challenges compared to flat terrain, particularly in traversability assessment and terrain-associated kinematic modelling for motion planning. This paper introduces SEB-Naver, a novel SE(2)-based local navigation framework designed to overcome these challenges. First, we propose an efficient traversability assessment method for SE(2) grids, leveraging GPU parallel computing to enable real-time updates and maintenance of local maps. Second, inspired by differential flatness, we present an optimization-based trajectory planning method that integrates terrain-associated kinematic models, significantly improving both planning efficiency and trajectory quality. Finally, we unify these components into SEB-Naver, achieving real-time terrain assessment and trajectory optimization. Extensive simulations and real-world experiments demonstrate the effectiveness and efficiency of our approach. The code is at https://github.com/ZJU-FAST-Lab/seb_naver.
The Entrance Dependent Vehicle Routing Problem (EDVRP) is a variant of the Vehicle Routing Problem (VRP) where the scale of cities influences routing outcomes, necessitating consideration of their entrances. This paper addresses EDVRP in agriculture, focusing on multi-parameter vehicle planning for irregularly shaped fields. To address the limitations of traditional methods, such as heuristic approaches, which often overlook field geometry and entrance constraints, we propose a Joint Probability Distribution Sampling Neural Network (JPDS-NN) to effectively solve the EDVRP. The network uses an encoder-decoder architecture with graph transformers and attention mechanisms to model routing as a Markov Decision Process, and is trained via reinforcement learning for efficient and rapid end-to-end planning. Experimental results indicate that JPDS-NN reduces travel distances by 48.4-65.4%, lowers fuel consumption by 14.0-17.6%, and computes two orders of magnitude faster than baseline methods, while demonstrating 15-25% superior performance in dynamic arrangement scenarios. Ablation studies validate the necessity of cross-attention and pre-training. The framework enables scalable, intelligent routing for large-scale farming under dynamic constraints.
Diffusion models have recently gained significant attention in robotics due to their ability to generate multi-modal distributions of system states and behaviors. However, a key challenge remains: ensuring precise control over the generated outcomes without compromising realism. This is crucial for applications such as motion planning or trajectory forecasting, where adherence to physical constraints and task-specific objectives is essential. We propose a novel framework that enhances controllability in diffusion models by leveraging multi-modal prior distributions and enforcing strong modal coupling. This allows us to initiate the denoising process directly from distinct prior modes that correspond to different possible system behaviors, ensuring sampling to align with the training distribution. We evaluate our approach on motion prediction using the Waymo dataset and multi-task control in Maze2D environments. Experimental results show that our framework outperforms both guidance-based techniques and conditioned models with unimodal priors, achieving superior fidelity, diversity, and controllability, even in the absence of explicit conditioning. Overall, our approach provides a more reliable and scalable solution for controllable motion generation in robotics.
Selecting the right monitoring level in Remote Patient Monitoring (RPM) systems for e-healthcare is crucial for balancing patient outcomes, various resources, and patient's quality of life. A prior work has used one-dimensional health representations, but patient health is inherently multidimensional and typically consists of many measurable physiological factors. In this paper, we introduce a multidimensional health state model within the RPM framework and use dynamic programming to study optimal monitoring strategies. Our analysis reveals that the optimal control is characterized by switching curves (for two-dimensional health states) or switching hyper-surfaces (in general): patients switch to intensive monitoring when health measurements cross a specific multidimensional surface. We further study how the optimal switching curve varies for different medical conditions and model parameters. This finding of the optimal control structure provides actionable insights for clinicians and aids in resource planning. The tunable modeling framework enhances the applicability and effectiveness of RPM services across various medical conditions.
Autonomous robotic systems hold potential for improving renal tumor resection accuracy and patient outcomes. We present a fluorescence-guided robotic system capable of planning and executing incision paths around exophytic renal tumors with a clinically relevant resection margin. Leveraging point cloud observations, the system handles irregular tumor shapes and distinguishes healthy from tumorous tissue based on near-infrared imaging, akin to indocyanine green staining in partial nephrectomy. Tissue-mimicking phantoms are crucial for the development of autonomous robotic surgical systems for interventions where acquiring ex-vivo animal tissue is infeasible, such as cancer of the kidney and renal pelvis. To this end, we propose novel hydrogel-based kidney phantoms with exophytic tumors that mimic the physical and visual behavior of tissue, and are compatible with electrosurgical instruments, a common limitation of silicone-based phantoms. In contrast to previous hydrogel phantoms, we mix the material with near-infrared dye to enable fluorescence-guided tumor segmentation. Autonomous real-world robotic experiments validate our system and phantoms, achieving an average margin accuracy of 1.44 mm in a completion time of 69 sec.
Object Goal Navigation-requiring an agent to locate a specific object in an unseen environment-remains a core challenge in embodied AI. Although recent progress in Vision-Language Model (VLM)-based agents has demonstrated promising perception and decision-making abilities through prompting, none has yet established a fully modular world model design that reduces risky and costly interactions with the environment by predicting the future state of the world. We introduce WMNav, a novel World Model-based Navigation framework powered by Vision-Language Models (VLMs). It predicts possible outcomes of decisions and builds memories to provide feedback to the policy module. To retain the predicted state of the environment, WMNav proposes the online maintained Curiosity Value Map as part of the world model memory to provide dynamic configuration for navigation policy. By decomposing according to a human-like thinking process, WMNav effectively alleviates the impact of model hallucination by making decisions based on the feedback difference between the world model plan and observation. To further boost efficiency, we implement a two-stage action proposer strategy: broad exploration followed by precise localization. Extensive evaluation on HM3D and MP3D validates WMNav surpasses existing zero-shot benchmarks in both success rate and exploration efficiency (absolute improvement: +3.2% SR and +3.2% SPL on HM3D, +13.5% SR and +1.1% SPL on MP3D). Project page: https://b0b8k1ng.github.io/WMNav/.
The very few studies that have attempted to formulate multi-agent reinforcement learning (RL) algorithms for adaptive traffic signal control have mainly used value-based RL methods although recent literature has shown that policy-based methods may perform better in partially observable environments. Additionally, because of the simplifying assumptions on signal timing made almost universally across previous studies, RL methods remain largely untested for real-world signal timing plans. This study formulates a multi-agent proximal policy optimization (MA-PPO) algorithm to implement adaptive and coordinated traffic control along an arterial corridor. The formulated MA-PPO has centralized critic architecture under the centralized training and decentralized execution framework. All agents are formulated to allow selection and implementation of up to eight signal phases as commonly implemented in the field controllers. The formulated algorithm is tested on a simulated real-world corridor with seven intersections, actual/complete traffic movements and signal phases, traffic volumes, and network geometry including intersection spacings. The performance of the formulated MA-PPO adaptive control algorithm is compared with the field implemented coordinated and actuated signal control (ASC) plans modeled using Vissim-MaxTime software in the loop simulation (SILs). The speed of convergence for each agent largely depended on the size of the action space which in turn depended on the number and sequence of signal phases. Compared with the currently implemented ASC signal timings, MA-PPO showed a travel time reduction of about 14% and 29%, respectively for the two through movements across the entire test corridor. Through volume sensitivity experiments, the formulated MA-PPO showed good stability, robustness and adaptability to changes in traffic demand.
As autonomous systems are increasingly deployed in open and uncertain settings, there is a growing need for trustworthy world models that can reliably predict future high-dimensional observations. The learned latent representations in world models lack direct mapping to meaningful physical quantities and dynamics, limiting their utility and interpretability in downstream planning, control, and safety verification. In this paper, we argue for a fundamental shift from physically informed to physically interpretable world models - and crystallize four principles that leverage symbolic knowledge to achieve these ends: (1) structuring latent spaces according to the physical intent of variables, (2) learning aligned invariant and equivariant representations of the physical world, (3) adapting training to the varied granularity of supervision signals, and (4) partitioning generative outputs to support scalability and verifiability. We experimentally demonstrate the value of each principle on two benchmarks. This paper opens several intriguing research directions to achieve and capitalize on full physical interpretability in world models.
Motion planning in navigation systems is highly susceptible to upstream perceptual errors, particularly in human detection and tracking. To mitigate this issue, the concept of guidance points--a novel directional cue within a reinforcement learning-based framework--is introduced. A structured method for identifying guidance points is developed, consisting of obstacle boundary extraction, potential guidance point detection, and redundancy elimination. To integrate guidance points into the navigation pipeline, a perception-to-planning mapping strategy is proposed, unifying guidance points with other perceptual inputs and enabling the RL agent to effectively leverage the complementary relationships among raw laser data, human detection and tracking, and guidance points. Qualitative and quantitative simulations demonstrate that the proposed approach achieves the highest success rate and near-optimal travel times, greatly improving both safety and efficiency. Furthermore, real-world experiments in dynamic corridors and lobbies validate the robot's ability to confidently navigate around obstacles and robustly avoid pedestrians.
We propose an approach to formally specifying the behavioral properties of systems that rely on a perception model for interactions with the physical world. The key idea is to introduce embeddings -- mathematical representations of a real-world concept -- as a first-class construct in a specification language, where properties are expressed in terms of distances between a pair of ideal and observed embeddings. To realize this approach, we propose a new type of temporal logic called Embedding Temporal Logic (ETL), and describe how it can be used to express a wider range of properties about AI-enabled systems than previously possible. We demonstrate the applicability of ETL through a preliminary evaluation involving planning tasks in robots that are driven by foundation models; the results are promising, showing that embedding-based specifications can be used to steer a system towards desirable behaviors.
Rural and urban areas exhibit distinct mobility patterns, yet a systematic understanding of how these trends differ across regions and contexts remains underexplored. By using origin-destination matrices from Location-Based Services data in Colombia, India, and Mexico, we delineate urban and rural boundaries through network percolation, reducing reliance on conventional urbanisation metrics tied to the built environment. We gauge mobility dynamics across regions developing a measure for routing inefficiency, which measures how much longer empirical trips are than their optimal shortest path. Our findings reveal that rural areas experience greater inefficiencies, particularly for longer trips made later in the day. At the urban level, we determine the misalignment between urban mobility efficiency and public transit accessibility, by measuring the difference between their respective vector fields. We observe that most cities experience misalignment during regular commuting hours, with Colombian cities exhibiting particularly high alignment. Meanwhile, mobility inefficiency in rural areas are associated with their orientation around their most proximate city. City-level analyses uncover disparities in the functions of rural and urban areas, with significant variations between weekdays and weekends, reflecting distinct roles in commuting and access to services. These findings highlight the importance of tailored, context-sensitive approaches to improving connectivity and reducing disparities. This study offers new insights into the spatial and temporal dynamics of mobility inefficiency, contributing to equitable regional planning and sustainable mobility solutions.
Primary cosmic rays when interact with our atmosphere, produce a cascade of lighter secondary particles namely pion, kaon, neutrons, muons, electrons, positrons and neutrinos. Muons are one of the most abundant and easily detectable particles at the ground surface using a large variety of particle detectors. Resistive Plate Chambers (RPCs) of 2 m x 2 m in dimension were developed to be used in large scale as the active detector elements in the Iron Calorimeter (ICAL) which was planned to be built by the India-based Neutrino Observatory (INO). As a spin-off of this work, a portable stack of eight, one square foot RPC detectors is developed. While the main purpose of this Cosmic Muon Tracker (CMT) detector is training of students, outreach and science popularisation, it could also be used to conduct small-scale particle detector experiments. We will discuss design, integration, characterisation and some of the applications of this detector in this paper.
We introduce a novel gradient-based approach for solving sequential tasks by dynamically adjusting the underlying myopic potential field in response to feedback and the world's regularities. This adjustment implicitly considers subgoals encoded in these regularities, enabling the solution of long sequential tasks, as demonstrated by solving the traditional planning domain of Blocks World - without any planning. Unlike conventional planning methods, our feedback-driven approach adapts to uncertain and dynamic environments, as demonstrated by one hundred real-world trials involving drawer manipulation. These experiments highlight the robustness of our method compared to planning and show how interactive perception and error recovery naturally emerge from gradient descent without explicitly implementing them. This offers a computationally efficient alternative to planning for a variety of sequential tasks, while aligning with observations on biological problem-solving strategies.
We present the design and science case for a new array of radio antennas to be located at the Pierre Auger Observatory. Six stations of three SKALA antennas each will be deployed around a single water-Cherenkov surface detector triggering the radio readout. The planned antenna layout will allow for the detection of cosmic rays above a few tens of PeV and reach full efficiency for vertical air showers at several hundred PeV in primary energy. The array will thus be a pathfinder to demonstrate that fully-efficient radio detection in combination with the underground muon detectors already present at the location is possible. This will enable combined studies of the muon component and the depth of shower maximum, relevant for hadronic interaction models studies and more accurate determination of the cosmic-ray mass composition in the energy range of the Galactic-to-extragalactic transition.
This paper introduces RoboDexVLM, an innovative framework for robot task planning and grasp detection tailored for a collaborative manipulator equipped with a dexterous hand. Previous methods focus on simplified and limited manipulation tasks, which often neglect the complexities associated with grasping a diverse array of objects in a long-horizon manner. In contrast, our proposed framework utilizes a dexterous hand capable of grasping objects of varying shapes and sizes while executing tasks based on natural language commands. The proposed approach has the following core components: First, a robust task planner with a task-level recovery mechanism that leverages vision-language models (VLMs) is designed, which enables the system to interpret and execute open-vocabulary commands for long sequence tasks. Second, a language-guided dexterous grasp perception algorithm is presented based on robot kinematics and formal methods, tailored for zero-shot dexterous manipulation with diverse objects and commands. Comprehensive experimental results validate the effectiveness, adaptability, and robustness of RoboDexVLM in handling long-horizon scenarios and performing dexterous grasping. These results highlight the framework's ability to operate in complex environments, showcasing its potential for open-vocabulary dexterous manipulation. Our open-source project page can be found at https://henryhcliu.github.io/robodexvlm.
We present estimates for the production cross sections of exotic states ($\Lambda_c, \Sigma_c, \Xi_c, D\, \mathrm{and}\, D_s$) from secondary $\overline B + B$ annihilations in p+A reactions from $E_\mathrm{lab}=10-30A$~GeV. We focus specifically on the newly planned hadron physics program of CBM at FAIR. These estimates for the production of exotic states are based on the achievable number of $\overline B + B$ annihilations and their invariant mass distributions calculated in the UrQMD transport model.
Viewpoint planning is crucial for 3D data collection and autonomous navigation, yet existing methods often miss key optimization objectives for static LiDAR, resulting in suboptimal network designs. The Viewpoint Planning Problem (VPP), which builds upon the Art Gallery Problem (AGP), requires not only full coverage but also robust registrability and connectivity under limited sensor views. We introduce a greedy optimization algorithm that tackles these VPP and AGP challenges through a novel Visibility Field (VF) approach. The VF captures visibility characteristics unique to static LiDAR, enabling a reduction from 2D to 1D by focusing on medial axis and joints. This leads to a minimal, fully connected viewpoint network with comprehensive coverage and minimal redundancy. Experiments across diverse environments show that our method achieves high efficiency and scalability, matching or surpassing expert designs. Compared to state-of-the-art methods, our approach achieves comparable viewpoint counts (VC) while reducing Weighted Average Path Length (WAPL) by approximately 95\%, indicating a much more compact and connected network. Dataset and source code will be released upon acceptance.
Path planning for robotic exploration is challenging, requiring reasoning over unknown spaces and anticipating future observations. Efficient exploration requires selecting budget-constrained paths that maximize information gain. Despite advances in autonomous exploration, existing algorithms still fall short of human performance, particularly in structured environments where predictive cues exist but are underutilized. Guided by insights from our user study, we introduce MapExRL, which improves robot exploration efficiency in structured indoor environments by enabling longer-horizon planning through reinforcement learning (RL) and global map predictions. Unlike many RL-based exploration methods that use motion primitives as the action space, our approach leverages frontiers for more efficient model learning and longer horizon reasoning. Our framework generates global map predictions from the observed map, which our policy utilizes, along with the prediction uncertainty, estimated sensor coverage, frontier distance, and remaining distance budget, to assess the strategic long-term value of frontiers. By leveraging multiple frontier scoring methods and additional context, our policy makes more informed decisions at each stage of the exploration. We evaluate our framework on a real-world indoor map dataset, achieving up to an 18.8% improvement over the strongest state-of-the-art baseline, with even greater gains compared to conventional frontier-based algorithms.
Formulating the intended behavior of a dynamic system can be challenging. Signal temporal logic (STL) is frequently used for this purpose due to its suitability in formalizing comprehensible, modular, and versatile spatiotemporal specifications. Due to scaling issues with respect to the complexity of the specifications and the potential occurrence of non-differentiable terms, classical optimization methods often solve STL-based problems inefficiently. Smoothing and approximation techniques can alleviate these issues but require changing the optimization problem. This paper proposes a novel sampling-based method based on model predictive path integral control to solve optimal control problems with STL cost functions. We demonstrate the effectiveness of our method on benchmark motion planning problems and compare its performance with state-of-the-art methods. The results show that our method efficiently solves optimal control problems with STL costs.
This paper contributes the Aerial Gym Simulator, a highly parallelized, modular framework for simulation and rendering of arbitrary multirotor platforms based on NVIDIA Isaac Gym. Aerial Gym supports the simulation of under-, fully- and over-actuated multirotors offering parallelized geometric controllers, alongside a custom GPU-accelerated rendering framework for ray-casting capable of capturing depth, segmentation and vertex-level annotations from the environment. Multiple examples for key tasks, such as depth-based navigation through reinforcement learning are provided. The comprehensive set of tools developed within the framework makes it a powerful resource for research on learning for control, planning, and navigation using state information as well as exteroceptive sensor observations. Extensive simulation studies are conducted and successful sim2real transfer of trained policies is demonstrated. The Aerial Gym Simulator is open-sourced at: https://github.com/ntnu-arl/aerial_gym_simulator.
Anticipating future events is crucial for various application domains such as healthcare, smart home technology, and surveillance. Narrative event descriptions provide context-rich information, enhancing a system's future planning and decision-making capabilities. We propose a novel task: $\textit{long-term future narration generation}$, which extends beyond traditional action anticipation by generating detailed narrations of future daily activities. We introduce a visual-language model, ViNa, specifically designed to address this challenging task. ViNa integrates long-term videos and corresponding narrations to generate a sequence of future narrations that predict subsequent events and actions over extended time horizons. ViNa extends existing multimodal models that perform only short-term predictions or describe observed videos by generating long-term future narrations for a broader range of daily activities. We also present a novel downstream application that leverages the generated narrations called future video retrieval to help users improve planning for a task by visualizing the future. We evaluate future narration generation on the largest egocentric dataset Ego4D.
This study presents an AI-driven pipeline for automated retinal segmentation and thickness analysis in diabetic retinopathy (DR) using SD-OCT imaging. A deep neural network was trained to segment ten retinal layers, intra-retinal fluid, and hyperreflective foci (HRF), with performance evaluated across multiple architectures. SwinUNETR achieved the highest segmentation accuracy, while VM-Unet excelled in specific layers. Analysis revealed distinct thickness variations between NPDR and PDR, with correlations between layer thickness and visual acuity. The proposed method enhances DR assessment by reducing manual annotation effort and providing clinically relevant thickness maps for disease monitoring and treatment planning.