The increasing availability of intraoral scanning devices has heightened their importance in modern clinical orthodontics. Clinicians utilize advanced Computer-Aided Design techniques to create patient-specific treatment plans that include laboriously identifying crucial landmarks such as cusps, mesial-distal locations, facial axis points, and tooth-gingiva boundaries. Detecting such landmarks automatically presents challenges, including limited dataset sizes, significant anatomical variability among subjects, and the geometric nature of the data. We present our experiments from the 3DTeethLand Grand Challenge at MICCAI 2024. Our method leverages recent advancements in point cloud learning through transformer architectures. We design a Point Transformer v3-inspired module to capture meaningful geometric and anatomical features, which a lightweight decoder processes into per-point distance predictions that are then refined by graph-based non-minima suppression. We report promising results and discuss insights on learned feature interpretability.
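A minimal sketch of graph-based non-minima suppression over predicted per-point distances, assuming a k-NN graph on the scan vertices and a detection threshold (neither is specified in the abstract, so both are illustrative assumptions):

```python
# Hedged sketch, not the authors' exact post-processing: keep points whose
# predicted distance-to-landmark is a local minimum over a k-NN graph and below
# a detection threshold.
import numpy as np
from scipy.spatial import cKDTree

def non_minima_suppression(points, pred_dist, k=16, dist_thresh=2.0):
    """points: (N, 3) scan vertices; pred_dist: (N,) predicted distances (units assumed)."""
    tree = cKDTree(points)
    _, nbrs = tree.query(points, k=k)            # k-NN graph: row i -> its k neighbours
    is_local_min = pred_dist <= pred_dist[nbrs].min(axis=1)
    keep = is_local_min & (pred_dist < dist_thresh)
    return np.flatnonzero(keep)                  # indices of candidate landmark points
```

In practice the surviving candidates would be matched per landmark class using the corresponding predicted distance channel; the thresholds here are placeholders.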
The structure of causal language model training assumes that each token can be accurately predicted from the previous context. This contrasts with humans' natural writing and reasoning process, where goals are typically known before the exact argument or phrasing. While this mismatch has been well studied in the literature, the working assumption has been that architectural changes are needed to address it. We argue that rearranging and processing the training data sequences can allow models to more accurately imitate the true data-generating process, and does not require any other changes to the architecture or training infrastructure. We demonstrate that this technique, Trelawney, and the inference algorithms derived from it allow us to improve performance on several key benchmarks that span planning, algorithmic reasoning, and story generation tasks. Finally, our method naturally enables the generation of long-term goals at no additional cost. We investigate how using the model's goal-generation capability can further improve planning and reasoning. Additionally, we believe Trelawney could potentially open doors to new capabilities beyond the current language modeling paradigm.
This paper introduces the Robotability Score ($R$), a novel metric that quantifies the suitability of urban environments for autonomous robot navigation. Through expert interviews and surveys, we identify and weight key features contributing to R for wheeled robots on urban streets. Our findings reveal that pedestrian density, crowd dynamics, and pedestrian flow are the most critical factors, collectively accounting for 28% of the total score. Computing robotability across New York City yields significant variation; the area of highest R is 3.0 times more "robotable" than the area of lowest R. Deployments of a physical robot in high- and low-robotability areas demonstrate the score's adequacy in anticipating the ease of robot navigation. This new framework for evaluating urban landscapes aims to reduce uncertainty in robot deployment while respecting established mobility patterns and urban planning principles, contributing to the discourse on harmonious human-robot environments.
In this paper we determine quantitative stability bounds for the Hessian of entropic potentials, i.e., the dual solution to the entropic optimal transport problem. To the best of the authors' knowledge, this is the first work addressing this second-order quantitative stability estimate in general unbounded settings. Our proof strategy relies on semiconcavity properties of entropic potentials and on the representation of entropic transport plans as laws of forward and backward diffusion processes, known as Schr\"odinger bridges. Moreover, our approach also allows us to deduce a stochastic proof of quantitative entropic stability estimates and integrated gradient estimates. Finally, as a direct consequence of these stability bounds, we deduce exponential convergence rates for the gradients and Hessians of Sinkhorn iterates along Sinkhorn's algorithm, a problem that was still open in unbounded settings. Our rates have a polynomial dependence on the regularization parameter.
The extensive and ambitious physics program planned at the Future Circular Collider for electrons and positrons (FCC-ee) imposes strict constraints on detector performance. This work investigates how different detector properties impact jet flavor identification and their subsequent effects on high-profile physics analyses. Using Higgs boson coupling measurements and searches for invisible Higgs decays as benchmarks, we systematically evaluate the sensitivity of these analyses to tracker and calorimeter detector configurations. We examine variations in single-point resolution, material budget, silicon layer placement, and particle identification capabilities, quantifying their effects on flavor-tagging performance. Additionally, we present the first comprehensive study of Higgs-to-invisible decay detection using full detector simulation, providing important insights for optimizing future detector designs at lepton colliders.
Online map construction is essential for autonomous robots to navigate in unknown environments. However, the presence of dynamic objects may introduce artifacts into the map, which can significantly degrade the performance of localization and path planning. To tackle this problem, we propose FreeDOM, a novel online dynamic object removal framework for static map construction based on conservative free space estimation, consisting of a scan-removal front-end and a map-refinement back-end. First, we propose a multi-resolution map structure for fast computation and effective map representation. In the scan-removal front-end, we employ raycast enhancement to improve free space estimation and segment the LiDAR scan based on the estimated free space. In the map-refinement back-end, we further eliminate residual dynamic objects in the map by leveraging incremental free space information. As experimentally verified on SemanticKITTI, HeLiMOS, and indoor datasets with various sensors, our proposed framework overcomes the limitations of visibility-based methods and outperforms state-of-the-art methods with an average F1-score improvement of 9.7%.
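As a rough illustration of the visibility principle behind conservative free-space estimation (this is not FreeDOM's multi-resolution map or raycast enhancement; the voxel size and sampling scheme are assumptions), one can accumulate free-space evidence along each LiDAR ray:

```python
# Hedged sketch: voxels traversed by a ray before its endpoint are evidence of
# free space; points later observed inside confidently-free voxels can be flagged
# as dynamic and removed from the static map.
import numpy as np

def traversed_voxels(origin, endpoint, voxel=0.2):
    """Approximate the voxels crossed by one ray by uniform sampling along it."""
    length = np.linalg.norm(endpoint - origin)
    n = max(2, int(length / voxel))
    ts = np.linspace(0.0, 1.0, n, endpoint=False)       # stop short of the hit voxel
    pts = origin + ts[:, None] * (endpoint - origin)
    return {tuple(v) for v in np.floor(pts / voxel).astype(int)}

def update_free_space(free_counts, origin, scan_points):
    for p in scan_points:
        for v in traversed_voxels(origin, p):
            free_counts[v] = free_counts.get(v, 0) + 1  # conservative free evidence
    return free_counts
```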
More than 10 billion tons of construction and demolition waste (CW) are generated globally each year, exerting a significant impact on the environment. In the CW recycling process, the government and the carrier are the two primary stakeholders. The carrier is responsible for transporting CW from production sites to backfill sites or processing facilities, with a primary focus on transport efficiency and revenue. Meanwhile, the government aims to minimize pollution from the recycling system, which is influenced by transport modes, shipment distances, and the processing methods used for CW. This paper develops a bi-objective, bi-level optimization model to address these challenges. The upper-level model is a linear programming model that optimizes the government's subsidy scheme, while the lower-level model is a minimum-cost flow model that optimizes the carrier's recycling plan. A hybrid heuristic solution method is proposed to tackle the problem's complexity. A case study in Chengdu, China, demonstrates the computational efficiency of the model and its small solution gap. With an optimized subsidy scheme and recycling plan, pollution can be reduced by over 29.29% through a relatively small investment in subsidies.
Accurate rib fracture identification and classification are essential for treatment planning. However, existing datasets often lack fine-grained annotations, particularly regarding rib fracture characterization, type, and precise anatomical location on individual ribs. To address this, we introduce a novel rib fracture annotation protocol tailored for fracture classification. Further, we enhance fracture classification by leveraging cross-modal embeddings that bridge radiological images and clinical descriptions. Our approach employs hyperbolic embeddings to capture the hierarchical nature of fractures, mapping visual features and textual descriptions into a shared non-Euclidean manifold. This framework enables more nuanced similarity computations between imaging characteristics and clinical descriptions, accounting for the inherent hierarchical relationships in fracture taxonomy. Experimental results demonstrate that our approach outperforms existing methods across multiple classification tasks, with average recall improvements of 6% on the AirRib dataset and 17.5% on the public RibFrac dataset.
The problem of finding a path between two points while avoiding obstacles is critical in robotic path planning. We focus on the feasibility problem: determining whether such a path exists. We model the robot as a query-specific rectangular object capable of moving parallel to its sides. The obstacles are axis-aligned, rectangular, and may overlap. Most previous works only consider nondisjoint rectangular objects and point-sized or statically sized robots. Our approach introduces a novel technique leveraging generalized Gabriel graphs and constructs a data structure to facilitate online queries regarding path feasibility with varying robot sizes in sublinear time. To efficiently handle feasibility queries, we propose an online algorithm utilizing a sweep line to construct a generalized Gabriel graph under the $L_\infty$ norm, capturing key gap constraints between obstacles. We utilize a persistent disjoint-set union data structure to answer feasibility queries in $\mathcal{O}(\log n)$ time using $\mathcal{O}(n)$ total space.
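The core threshold-connectivity idea can be sketched as follows (this simplifies away the persistent data structure and the $L_\infty$ Gabriel graph construction; boundary walls are modeled as extra nodes and edge weights are assumed to be precomputed gaps):

```python
# Hedged sketch: obstacles whose mutual gap is smaller than the robot's width
# merge into barriers; a query becomes infeasible once a barrier connects the two
# boundary walls that separate the start from the goal.
class DSU:
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def blocking_gap(gap_edges, n_nodes, wall_a, wall_b):
    """gap_edges: (gap, i, j) between obstacles/walls. Returns the gap value at
    which the walls first connect; robots wider than this value are blocked."""
    dsu = DSU(n_nodes)
    for gap, i, j in sorted(gap_edges):
        dsu.union(i, j)
        if dsu.find(wall_a) == dsu.find(wall_b):
            return gap
    return float("inf")      # never blocked, regardless of robot width
```

Intuitively, a persistent version of this union history lets each query be answered without redoing the unions.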
This paper proposes SafeGPT, a two-tiered framework that integrates generative pretrained transformers (GPTs) with reinforcement learning (RL) for efficient and reliable unmanned aerial vehicle (UAV) last-mile deliveries. In the proposed design, a Global GPT module assigns high-level tasks such as sector allocation, while an On-Device GPT manages real-time local route planning. An RL-based safety filter monitors each GPT decision and overrides unsafe actions that could lead to battery depletion or duplicate visits, effectively mitigating hallucinations. Furthermore, a dual replay buffer mechanism helps both the GPT modules and the RL agent refine their strategies over time. Simulation results demonstrate that SafeGPT achieves higher delivery success rates compared to a GPT-only baseline, while substantially reducing battery consumption and travel distance. These findings validate the efficacy of combining GPT-based semantic reasoning with formal safety guarantees, contributing a viable solution for robust and energy-efficient UAV logistics.
Navigating in crowded environments requires the robot to be equipped with high-level reasoning and planning techniques. Existing works focus on developing complex and heavyweight planners while ignoring the role of human intelligence. Since humans are highly capable agents who are also widely available in a crowd navigation setting, we propose an alternative scheme where the robot utilises people as planners to benefit from their effective planning decisions and social behaviours. Through a set of rule-based evaluations, we identify suitable human leaders who exhibit the potential to guide the robot towards its goal. Using a simple base planner, the robot follows the selected leader through short-horizon subgoals that are designed to be straightforward to achieve. We demonstrate through both simulated and real-world experiments that our novel framework generates safe and efficient robot plans compared to existing planners, even without predictive or data-driven modules. Our method also brings human-like robot behaviours without explicitly defining traffic rules and social norms. Code will be available at https://github.com/centiLinda/PeopleAsPlanner.git.
Blockchain technology has revolutionized contractual processes, enhancing efficiency and trust through smart contracts. Ethereum, as a pioneer in this domain, offers a platform for decentralized applications but is challenged by the immutability of smart contracts, which makes upgrades cumbersome. Existing design patterns, while addressing upgradability, introduce complexity, increased development effort, and higher gas costs, thus limiting their effectiveness. In response, we introduce FlexiContracts, an innovative scheme that reimagines the evolution of smart contracts on Ethereum. By enabling secure, in-place upgrades without losing historical data, FlexiContracts surpasses existing approaches, introducing a previously unexplored path in smart contract evolution. Its streamlined design transcends the limitations of current design patterns by simplifying smart contract development, eliminating the need for extensive upfront planning, and significantly reducing the complexity of the design process. This advancement fosters an environment for continuous improvement and adaptation to new requirements, redefining the possibilities for dynamic, upgradable smart contracts.
In this work, we leverage GPUs to construct probabilistically collision-free convex sets in robot configuration space on the fly. This extends the use of modern motion planning algorithms that leverage such representations to changing environments. These planners rapidly and reliably optimize high-quality trajectories, without the burden of challenging nonconvex collision-avoidance constraints. We present an algorithm that inflates collision-free piecewise linear paths into sequences of convex sets (SCS) that are probabilistically collision-free using massive parallelism. We then integrate this algorithm into a motion planning pipeline, which leverages dynamic roadmaps to rapidly find one or multiple collision-free paths, and inflates them. We then optimize the trajectory through the probabilistically collision-free sets, simultaneously using the candidate trajectory to detect and remove collisions from the sets. We demonstrate the efficacy of our approach on a simulation benchmark and a KUKA iiwa 7 robot manipulator with perception in the loop. On our benchmark, our approach runs 17.1 times faster and yields a 27.9% increase in reliability over the nonlinear trajectory optimization baseline, while still producing high-quality motion plans.
This paper proposes a bidirectional rapidly-exploring random trees (RRT) algorithm to solve the motion planning problem for hybrid systems. The proposed algorithm, called HyRRT-Connect, propagates in both forward and backward directions in hybrid time until an overlap between the forward and backward propagation results is detected. Then, HyRRT-Connect constructs a motion plan through the reversal and concatenation of functions defined on hybrid time domains, ensuring that the motion plan satisfies the given hybrid dynamics. To address the potential discontinuity along the flow caused by tolerating some distance between the forward and backward partial motion plans, we reconstruct the backward partial motion plan by a forward-in-hybrid-time simulation from the final state of the forward partial motion plan, effectively eliminating the discontinuity. The proposed algorithm is applied to an actuated bouncing ball system and a walking robot example to highlight its computational improvement.
This study develops a novel predictive framework for power grid vulnerability based on the statistical signatures of Self-Organized Criticality (SOC). By analyzing the evolution of the power law critical exponents in outage size distributions from the Texas grid during 2014-2022, we demonstrate the method's ability to forecast system-wide vulnerability to catastrophic failures. Our results reveal a systematic decline in the critical exponent from 1.45 in 2018 to 0.95 in 2020, followed by a drop below the theoretical critical threshold ($\alpha$ = 1) to 0.62 in 2021, coinciding precisely with the catastrophic February 2021 power crisis. This predictive signal emerged 6-12 months before the crisis. By monitoring critical exponent transitions through subcritical and supercritical regimes, we provide quantitative early warning capabilities for catastrophic infrastructure failures, with significant implications for grid resilience planning, risk assessment, and emergency preparedness in increasingly stressed power systems.
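One standard way to track such an exponent, assuming the continuous power-law maximum-likelihood estimator of Clauset et al. and a fixed lower cutoff (the study's exact fitting procedure is not given here):

```python
# Hedged illustration: Hill-type MLE for a continuous power law fit to outage
# sizes above x_min, repeated per year to monitor drift toward alpha = 1.
import numpy as np

def power_law_alpha(outage_sizes, x_min):
    """alpha_hat = 1 + n / sum(ln(x_i / x_min)) over samples x_i >= x_min."""
    x = np.asarray(outage_sizes, dtype=float)
    x = x[x >= x_min]
    return 1.0 + len(x) / np.log(x / x_min).sum()

# Monitoring idea: a sustained decline of alpha toward and below 1 is the early
# warning signal described above.
# yearly_alpha = {year: power_law_alpha(sizes, x_min) for year, sizes in outages.items()}
```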
Controllable scene generation could substantially reduce the cost of diverse data collection for autonomous driving. Prior works formulate traffic layout generation as a predictive process, either by denoising entire sequences at once or by iteratively predicting the next frame. However, full-sequence denoising hinders online reaction, while the latter's short-sighted next-frame prediction lacks precise goal-state guidance. Further, the learned model struggles to generate complex or challenging scenarios because open datasets are dominated by safe, ordinary driving behaviors. To overcome these limitations, we introduce Nexus, a decoupled scene generation framework that improves reactivity and goal conditioning by simulating both ordinary and challenging scenarios from fine-grained tokens with independent noise states. At the core of the decoupled pipeline is the integration of a partial noise-masking training strategy and a noise-aware schedule that ensures timely environmental updates throughout the denoising process. To complement challenging scenario generation, we collect a dataset consisting of complex corner cases. It covers 540 hours of simulated data, including high-risk interactions such as cut-in, sudden braking, and collision. Nexus achieves superior generation realism while preserving reactivity and goal orientation, with a 40% reduction in displacement error. We further demonstrate that Nexus improves closed-loop planning by 20% through data augmentation and showcase its capability in safety-critical data generation.
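A toy sketch of what per-token independent noise states could look like (shapes, schedule, and masking policy are illustrative assumptions, not Nexus's implementation):

```python
# Hedged sketch: only tokens selected by the mask receive diffusion noise at
# their own timestep, so observed agents stay clean while others are denoised.
import torch

def partial_noise(x, t, mask, alpha_bar):
    """x: (B, N, D) scene tokens; t: (B, N) per-token timesteps;
    mask: (B, N) bool, True where noise is applied; alpha_bar: (T,) schedule."""
    noise = torch.randn_like(x)
    a_bar = alpha_bar[t].unsqueeze(-1)                       # (B, N, 1)
    noisy = a_bar.sqrt() * x + (1.0 - a_bar).sqrt() * noise  # standard DDPM forward
    return torch.where(mask.unsqueeze(-1), noisy, x)
```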
Layered control architectures have been a standard paradigm for efficiently managing complex constrained systems. A typical architecture consists of: i) a higher layer, where a low-frequency planner controls a simple model of the system, and ii) a lower layer, where a high-frequency tracking controller guides a detailed model of the system toward the output of the higher-layer model. A fundamental problem in this layered architecture is the design of planners and tracking controllers that guarantee both higher- and lower-layer system constraints are satisfied. Toward addressing this problem, we introduce a principled approach for layered multirate control of linear systems subject to output and input constraints. Inspired by discrete-time simulation functions, we propose a streamlined control design that guarantees the lower-layer system tracks the output of the higher-layer system with computable precision. Using this design, we derive conditions and present a method for propagating the constraints of the lower-layer system to the higher-layer system. The propagated constraints are integrated into the design of an arbitrary planner that can handle higher-layer system constraints. Our framework ensures that the output constraints of the lower-layer system are satisfied at all high-level time steps, while respecting its input constraints at all low-level time steps. We apply our approach in a scenario of motion planning, highlighting its critical role in ensuring collision avoidance.
Surgical planning for congenital heart disease traditionally relies on collaborative group examinations of a patient's 3D-printed heart model, a process that lacks flexibility and accessibility. While mobile augmented reality (AR) offers a promising alternative with its portability and familiar interaction gestures, existing solutions limit collaboration to users in the same physical space. We developed HybridCollab, the first iOS AR application that introduces a novel paradigm enabling both in-person and remote medical teams to interact with a shared AR heart model in a single surgical planning session. For example, a team of two doctors in one hospital room can collaborate in real time with another team in a different hospital. Our approach is the first to leverage Apple's GameKit service for surgical planning, ensuring an identical collaborative experience for all participants, regardless of location. Additionally, co-located users can interact with the same anchored heart model in their shared physical space. By bridging the gap between remote and in-person collaboration across medical teams, HybridCollab has the potential for significant real-world impact, streamlining communication and enhancing the effectiveness of surgical planning. Watch the demo: https://youtu.be/hElqJYDuvLM.
Intensity-modulated proton therapy (IMPT) offers superior dose conformity with reduced exposure to surrounding healthy tissues compared to conventional photon therapy. Improving IMPT delivery efficiency reduces motion-related uncertainties, enhances plan robustness, and benefits breath-hold techniques by shortening treatment time. Among various factors, energy switching time plays a critical role, making energy layer optimization (ELO) essential. This work develops an energy layer optimization method based on a mixed-integer model and a variational quantum computing algorithm to enhance the efficiency of IMPT. The energy layer optimization problem is modeled as a mixed-integer program, where continuous variables optimize the dose distribution and binary variables indicate energy layer selection. To solve it, iterative convex relaxation decouples the dose-volume constraints, followed by the alternating direction method of multipliers (ADMM) to separate the mixed-variable optimization and the minimum monitor unit (MMU) constraint. The resulting beam intensity subproblem, subject to the MMU constraint, either admits a closed-form solution or is efficiently solvable via conjugate gradient. The binary subproblem is cast as a quadratic unconstrained binary optimization (QUBO) problem, solvable using variational quantum computing algorithms. With nearly the same plan quality, the proposed method noticeably reduces the number of energy layers used. For example, compared to conventional IMPT, the proposed method reduces the number of energy layers from 61 to 35 in the head-and-neck case, from 56 to 35 in the lung case, and from 59 to 32 in the abdomen case. The reduced number of energy layers also shortens delivery time; for example, delivery times decrease from 100.6, 232.0, and 185.3 seconds to 90.7, 215.4, and 154.0 seconds, respectively.
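To make the binary subproblem concrete, a toy QUBO over a handful of energy layers can be written down and solved exactly by enumeration (the matrix values and scale are illustrative; the clinical problem is far larger, which is what motivates the variational quantum solver):

```python
# Hedged toy example: energy-layer selection as minimizing x^T Q x over x in {0,1}^L.
import itertools
import numpy as np

def solve_qubo_bruteforce(Q):
    L = Q.shape[0]
    best_x, best_val = None, np.inf
    for bits in itertools.product([0, 1], repeat=L):
        x = np.array(bits, dtype=float)
        val = x @ Q @ x
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val

# Diagonal terms: per-layer switching cost; off-diagonal terms: (made-up) couplings
# rewarding layer pairs that jointly cover the target volume.
Q = np.array([[ 1.0, -0.6,  0.0],
              [-0.6,  1.0, -0.8],
              [ 0.0, -0.8,  1.2]])
x_opt, val = solve_qubo_bruteforce(Q)
```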
Muons are elementary particles and provide cleaner collision events that can explore higher energies compared to composite particles like protons. Muons are also far heavier than their electron cousins, meaning that they emit less synchrotron radiation that effectively limits the energies of circular electron-positron colliders. These characteristics open up the possibility for a Muon Collider to surpass the direct energy reach of the Large Hadron Collider while achieving unprecedented precision measurements of Standard Model processes. In this Essay, after briefly summarizing the progress achieved so far, I identify important missing R&D steps and envision a compelling plan to bring a Muon Collider to reality in the next two decades. A Muon Collider could allow for the exploration of physics that is not available with current technologies. For example, it may provide a way to study the Higgs boson directly or probe new particles, including those related to dark matter or other phenomena beyond the Standard Model.
The classical g-g diagram, representing the achievable acceleration space for a vehicle, is commonly used as a constraint in trajectory planning and control due to its computational simplicity. To address non-planar road geometries, this concept can be extended to incorporate g-g constraints as a function of vehicle speed and vertical acceleration, commonly referred to as g-g-g-v diagrams. However, the estimation of g-g-g-v diagrams is an open problem. Existing simulation-based approaches struggle to isolate non-transient, open-loop stable states across all combinations of speed and acceleration, while optimization-based methods often require simplified vehicle equations and have potential convergence issues. In this paper, we present a novel, open-source, quasi-steady-state (QSS) black box simulation approach that applies a virtual inertial force in the longitudinal direction. The method emulates the load conditions associated with a specified longitudinal acceleration while maintaining constant vehicle speed, enabling open-loop steering ramps in a purely QSS manner. Appropriate regulation of the ramp steer rate inherently mitigates transient vehicle dynamics when determining the maximum feasible lateral acceleration. Moreover, treating the vehicle model as a black box eliminates model mismatch issues, allowing the use of high-fidelity or proprietary vehicle dynamics models typically unsuited for optimization approaches. An open-source version of the proposed method is available at: https://github.com/TUM-AVS/GGGVDiagrams
Graphical User Interface (GUI) agents offer cross-platform solutions for automating complex digital tasks, with significant potential to transform productivity workflows. However, their performance is often constrained by the scarcity of high-quality trajectory data. To address this limitation, we propose training Vision Language Models (VLMs) on data-rich, reasoning-intensive tasks during a dedicated mid-training stage, and then examine how incorporating these tasks facilitates generalization to GUI planning scenarios. Specifically, we explore a range of tasks with readily available instruction-tuning data, including GUI perception, multimodal reasoning, and textual reasoning. Through extensive experiments across 11 mid-training tasks, we demonstrate that: (1) Task generalization proves highly effective, yielding substantial improvements across most settings. For instance, multimodal mathematical reasoning enhances performance on AndroidWorld by an absolute 6.3%. Remarkably, text-only mathematical data significantly boosts GUI web agent performance, achieving a 5.6% improvement on WebArena and 5.4% improvement on AndroidWorld, underscoring notable cross-modal generalization from text-based to visual domains; (2) Contrary to prior assumptions, GUI perception data - previously considered closely aligned with GUI agent tasks and widely utilized for training - has a comparatively limited impact on final performance; (3) Building on these insights, we identify the most effective mid-training tasks and curate optimized mixture datasets, resulting in absolute performance gains of 8.0% on WebArena and 12.2% on AndroidWorld. Our work provides valuable insights into cross-domain knowledge transfer for GUI agents and offers a practical approach to addressing data scarcity challenges in this emerging field. The code, data and models will be available at https://github.com/hkust-nlp/GUIMid.
Rural economies are largely dependent upon agriculture, which is greatly determined by climatic conditions such as rainfall. This study aims to forecast agricultural production in Maharashtra, India, using annual data from 1962 to 2021. Since rainfall plays a major role in crop yield, we analyze its impact using four time series models: ARIMA, ARIMAX, GARCH-ARIMA, and GARCH-ARIMAX. We use rainfall as an external regressor to examine whether it improves model performance. 1-step, 2-step, and 3-step ahead forecasts are obtained, and model performance is assessed using MAE and RMSE. The models predict more accurately when rainfall is included as a predictor than when relying solely on historical production trends, with the largest improvements seen in the ARIMAX and GARCH-ARIMAX models. These findings underscore the need for climate-aware forecasting techniques that provide useful information to policymakers and farmers to aid in agricultural planning.
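A minimal version of the ARIMAX setup, assuming a hypothetical data file and column names and an illustrative (1, 1, 1) order rather than the study's fitted specification:

```python
# Hedged sketch: SARIMAX with rainfall as an exogenous regressor, evaluated on
# held-out years with MAE.
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

df = pd.read_csv("maharashtra_production.csv", index_col="year")  # hypothetical file
train, test = df.iloc[:-3], df.iloc[-3:]

model = SARIMAX(train["production"], exog=train[["rainfall"]], order=(1, 1, 1))
fit = model.fit(disp=False)
forecast = fit.forecast(steps=len(test), exog=test[["rainfall"]])
mae = np.abs(forecast.values - test["production"].values).mean()
```

The GARCH-ARIMA(X) variants additionally model conditional heteroskedasticity in the residuals, which this sketch omits.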
In recent decades, photosensitive materials have been used for the development of optical devices not only for light, but also for cold and very cold neutrons. We show that holographically recorded gratings in nanodiamond-polymer composites (nDPC) form ideal diffraction elements for very cold neutrons. Their advantage of high diffraction efficiency, combined with low angular selectivity as a two-port beam splitter, meets the necessary conditions for application in a very cold neutron interferometer. We provide an overview of the latest achievements in the construction of such a triple Laue interferometer. A first operational test of the interferometer is planned immediately after this conference in May 2025.
Patterns are fundamental to human cognition, enabling the recognition of structure and regularity across diverse domains. In this work, we focus on structural repeats, patterns that arise from the repetition of hierarchical relations within sequential data, and develop a candidate computational model of how humans detect and understand such structural repeats. Based on a weighted deduction system, our model infers the minimal generative process of a given sequence in the form of a Template program, a formalism that enriches the context-free grammar with repetition combinators. Such representation efficiently encodes the repetition of sub-computations in a recursive manner. As a proof of concept, we demonstrate the expressiveness of our model on short sequences from music and action planning. The proposed model offers broader insights into the mental representations and cognitive mechanisms underlying human pattern recognition.
Diabetic retinopathy (DR) is a leading cause of vision impairment, making its early diagnosis through fundus imaging critical for effective treatment planning. However, poor-quality fundus images caused by factors such as inadequate illumination, noise, blurring, and other motion artifacts pose a significant challenge for accurate DR screening. In this study, we propose progressive transfer learning (PTL) for multi-pass restoration to iteratively enhance the quality of degraded fundus images, ensuring more reliable DR screening. Unlike previous methods that often focus on single-pass restoration, multi-pass restoration via PTL can achieve superior blind restoration performance and can even improve most of the good-quality fundus images in the dataset. Initially, a CycleGAN model is trained to restore low-quality images, followed by PTL-induced restoration passes over the latest restored outputs to improve overall quality in each pass. The proposed method can learn blind restoration without requiring any paired data, while surpassing its limitations by leveraging progressive learning and fine-tuning strategies to minimize distortions and preserve critical retinal features. To evaluate PTL's effectiveness on multi-pass restoration, we conducted experiments on DeepDRiD, a large-scale fundus imaging dataset specifically curated for diabetic retinopathy detection. Our results demonstrate state-of-the-art performance, showcasing PTL's potential as a superior approach to iterative image quality restoration.
Visual navigation, a fundamental challenge in mobile robotics, demands versatile policies to handle diverse environments. Classical methods leverage geometric solutions to minimize specific costs, offering adaptability to new scenarios but remaining prone to system errors due to their multi-modular design and reliance on hand-crafted rules. Learning-based methods, while achieving high planning success rates, face difficulties in generalizing to unseen environments beyond the training data and often require extensive training. To address these limitations, we propose a hybrid approach that combines the strengths of learning-based methods and classical approaches for RGB-only visual navigation. Our method first trains a conditional diffusion model on diverse path-RGB observation pairs. During inference, it integrates the gradients of differentiable scene-specific and task-level costs, guiding the diffusion model to generate valid paths that meet the constraints. This approach alleviates the need for retraining, offering a plug-and-play solution. Extensive experiments in both indoor and outdoor settings, across simulated and real-world scenarios, demonstrate the zero-shot transfer capability of our approach, achieving higher success rates and fewer collisions compared to baseline methods. Code will be released at https://github.com/SYSU-RoboticsLab/NaviD.
Floor plans can provide valuable prior information that helps enhance the accuracy of indoor positioning systems. However, existing research typically faces challenges in efficiently leveraging floor plan information and applying it to complex indoor layouts. To fully exploit information from floor plans for positioning, we propose a floor plan-assisted fusion positioning algorithm (FP-BP) using Bluetooth low energy (BLE) and pedestrian dead reckoning (PDR). In the considered system, a user holding a smartphone walks through a positioning area with BLE beacons installed on the ceiling and can localize themselves in real time. In particular, FP-BP consists of two phases. In the offline phase, FP-BP programmatically extracts map features from a stylized floor plan based on their binary masks, and constructs a mapping function to identify the corresponding map feature of any given position on the map. In the online phase, FP-BP continuously computes BLE positions and PDR results from BLE signals and smartphone sensors, where a novel grid-based maximum likelihood estimation (GML) algorithm is introduced to enhance BLE positioning. Then, a particle filter is used to fuse them and obtain an initial position estimate. Finally, FP-BP performs post-position correction to obtain the final position based on its specific map feature. Experimental results show that FP-BP can achieve a real-time mean positioning accuracy of 1.19 m, representing an improvement of over 28% compared to existing floor plan-fused baseline algorithms.
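A simplified version of a grid-based maximum-likelihood BLE estimate, assuming a log-distance path-loss model with Gaussian RSSI noise (FP-BP's exact GML formulation and parameters are not reproduced here):

```python
# Hedged sketch: evaluate the RSSI likelihood at every candidate grid cell and
# return the most likely cell; a particle filter would then fuse this with PDR.
import numpy as np

def gml_position(grid_xy, beacon_xy, rssi, p0=-59.0, n=2.0, sigma=4.0):
    """grid_xy: (G, 2) candidate positions; beacon_xy: (B, 2); rssi: (B,) dBm.
    p0, n, sigma are assumed path-loss / noise parameters."""
    d = np.linalg.norm(grid_xy[:, None, :] - beacon_xy[None, :, :], axis=2)   # (G, B)
    expected = p0 - 10.0 * n * np.log10(np.maximum(d, 0.1))
    log_lik = -0.5 * ((rssi[None, :] - expected) / sigma) ** 2
    return grid_xy[log_lik.sum(axis=1).argmax()]
```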
Graph visualizations have been studied for tasks such as clustering and temporal analysis, but how these visual similarities relate to established graph similarity measures remains unclear. In this paper, we explore the potential of Vision Language Models (VLMs) to approximate human-like perception of graph similarity. We generate graph datasets of various sizes and densities and compare VLM-derived visual similarity scores with feature-based measures. Our findings indicate VLMs can assess graph similarity in a manner similar to feature-based measures, even though differences among the measures exist. In future work, we plan to extend our research by conducting experiments on human visual graph perception.
We propose a novel control-theoretic framework that leverages principles from generative modeling -- specifically, Denoising Diffusion Probabilistic Models (DDPMs) -- to stabilize control-affine systems with nonholonomic constraints. Unlike traditional stochastic approaches, which rely on noise-driven dynamics in both forward and reverse processes, our method crucially eliminates the need for noise in the reverse phase, making it particularly relevant for control applications. We introduce two formulations: one where noise perturbs all state dimensions during the forward phase while the control system enforces time reversal deterministically, and another where noise is restricted to the control channels, embedding system constraints directly into the forward process. For controllable nonlinear drift-free systems, we prove that deterministic feedback laws can exactly reverse the forward process, ensuring that the system's probability density evolves correctly without requiring artificial diffusion in the reverse phase. Furthermore, for linear time-invariant systems, we establish a time-reversal result under the second formulation. By eliminating noise in the backward process, our approach provides a more practical alternative to machine learning-based denoising methods, which are unsuitable for control applications due to the presence of stochasticity. We validate our results through numerical simulations on benchmark systems, including a unicycle model in a domain with obstacles, a driftless five-dimensional system, and a four-dimensional linear system, demonstrating the potential of applying diffusion-inspired techniques in linear, nonlinear, and state-constrained settings.