planning - 2025-04-27

LUIDA: Large-scale Unified Infrastructure for Digital Assessments based on Commercial Metaverse Platform

Authors:Yong-Hao Hu, Sotaro Yokoi, Yuji Hatada, Yuichi Hiroi, Takuji Narumi, Takefumi Hiraki
Date:2025-04-24 16:11:12

Online experiments using metaverse platforms have gained significant traction in Human-Computer Interaction and Virtual Reality (VR) research. However, current research workflows are highly fragmented, as researchers must use separate tools for system implementation, participant recruitment, experiment execution, and data collection, reducing consistency and increasing workload. We present LUIDA (Large-scale Unified Infrastructure for Digital Assessments), a metaverse-based framework that integrates these fragmented processes. LUIDA automatically allocates interconnected virtual environments for parallel experiment execution and provides implementation templates adaptable to various VR research domains, requiring minimal metaverse development expertise. Our evaluation included two studies using a prototype built on Cluster, a commercial metaverse platform. First, VR researchers using LUIDA to develop and run experiments reported high usability scores (SUS: 73.75) and moderate workload (NASA-TLX: 24.11) for overall usage, with interviews confirming streamlined workflows compared to traditional laboratory experiments. Second, we conducted three replicated experiments with public Cluster users, each recruiting approximately 200 participants within one week. These experiments produced results that closely matched the original studies, validating the experimental integrity of LUIDA across research domains. After technical refinements, we plan to release LUIDA as an open platform, providing a standardized protocol to improve research efficiency and experimental reproducibility in VR studies.

BIM-Constrained Optimization for Accurate Localization and Deviation Correction in Construction Monitoring

Authors:Asier Bikandi, Muhammad Shaheer, Hriday Bavle, Jayan Jevanesan, Holger Voos, Jose Luis Sanchez-Lopez
Date:2025-04-24 16:02:02

Augmented reality (AR) applications for construction monitoring rely on real-time environmental tracking to visualize architectural elements. However, construction sites present significant challenges for traditional tracking methods due to featureless surfaces, dynamic changes, and drift accumulation, leading to misalignment between digital models and the physical world. This paper proposes a BIM-aware drift correction method to address these challenges. Instead of relying solely on SLAM-based localization, we align "as-built" detected planes from the real-world environment with "as-planned" architectural planes in BIM. Our method performs robust plane matching and computes a transformation (TF) between SLAM (S) and BIM (B) origin frames using optimization techniques, minimizing drift over time. By incorporating BIM as prior structural knowledge, we can achieve improved long-term localization and enhanced AR visualization accuracy in noisy construction environments. The method is evaluated through real-world experiments, showing significant reductions in drift-induced errors and optimized alignment consistency. On average, our system achieves a reduction of 52.24% in angular deviations and a reduction of 60.8% in the distance error of the matched walls compared to the initial manual alignment by the user.
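
As a rough illustration of the plane-to-plane alignment step, the sketch below recovers a rigid transform between the SLAM and BIM frames from matched plane parameters. It is a minimal least-squares stand-in (Kabsch on the normals, a linear solve for the translation), not the authors' optimization, and all names are ours.

```python
import numpy as np

def align_frames(n_slam, d_slam, n_bim, d_bim):
    """Estimate a SLAM->BIM transform from k matched planes n.x = d.
    Under x' = R x + t, a plane (n, d) maps to (R n, d + (R n).t).
    n_slam, n_bim: (k, 3) unit normals; d_slam, d_bim: (k,) offsets."""
    H = n_slam.T @ n_bim                      # 3x3 covariance of matched normals
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ np.diag([1, 1, np.linalg.det(Vt.T @ U.T)]) @ U.T  # Kabsch rotation
    M = (R @ n_slam.T).T                      # rotated as-built normals
    t, *_ = np.linalg.lstsq(M, d_bim - d_slam, rcond=None)  # offsets determine t
    return R, t
```

At least three matched planes with non-parallel normals are needed for the translation solve to be well-posed.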

Modular Cosmic Ray Detector (MCORD) and its Potential Use in Various Physics Experiments, Astrophysics and Geophysics

Authors:M. Bielewicz, M. Kiecana, A. Bancer, J. Grzyb, L. Swiderski, M. Grodzicka-Kobylka, T. Szczesniak, A. Dziedzic, K. Grodzicki, E. Jaworska, A. Syntfeld-Kazuch
Date:2025-04-24 15:04:43

As part of the collaboration building a set of detectors for the new collider, our group was tasked with designing and building a large-scale cosmic ray detector to complement the capabilities of the MPD (Dubna) detector set. The detector was planned as a trigger for cosmic ray particles and to be used to calibrate and test other systems. Additional functions were the detection of pairs of high-energy muons originating from particle decay processes generated during collisions and continuous observation of the cosmic muon stream in order to detect multi-muon events. From the very beginning, the detector was designed as a scalable and universal device for many applications. The following work presents the basic features and parameters of the Modular COsmic Ray Detector (MCORD) and examples of its possible use in high-energy physics, astrophysics and geology. Thanks to its universal nature, MCORD can potentially be used as a fast trigger, neutron veto detector, muon detector and as a tool in muon tomography.

Flying through cluttered and dynamic environments with LiDAR

Authors:Huajie Wu, Wenyi Liu, Yunfan Ren, Zheng Liu, Hairuo Wei, Fangcheng Zhu, Haotian Li, Fu Zhang
Date:2025-04-24 13:59:06

Navigating unmanned aerial vehicles (UAVs) through cluttered and dynamic environments remains a significant challenge, particularly when dealing with fast-moving or suddenly appearing obstacles. This paper introduces a complete LiDAR-based system designed to enable UAVs to avoid various moving obstacles in complex environments. Benefiting from the high computational efficiency of its perception and planning modules, the system can operate in real time using onboard computing resources with low latency. For dynamic environment perception, we have integrated our previous work, M-detector, into the system. M-detector ensures that moving objects of different sizes, colors, and types are reliably detected. For dynamic environment planning, we incorporate dynamic object predictions into the integrated planning and control (IPC) framework, namely DynIPC. This integration allows the UAV to utilize predictions about dynamic obstacles to effectively evade them. We validate our proposed system through both simulations and real-world experiments. In simulation tests, our system outperforms state-of-the-art baselines across several metrics, including success rate, time consumption, average flight time, and maximum velocity. In real-world trials, our system successfully navigates through forests, avoiding moving obstacles along its path.

Unsupervised Urban Land Use Mapping with Street View Contrastive Clustering and a Geographical Prior

Authors:Lin Che, Yizi Chen, Tanhua Jin, Martin Raubal, Konrad Schindler, Peter Kiefer
Date:2025-04-24 13:41:27

Urban land use classification and mapping are critical for urban planning, resource management, and environmental monitoring. Existing remote sensing techniques often lack precision in complex urban environments due to the absence of ground-level details. Unlike aerial perspectives, street view images provide a ground-level view that captures more human and social activities relevant to land use in complex urban scenes. Existing street view-based methods primarily rely on supervised classification, which is challenged by the scarcity of high-quality labeled data and the difficulty of generalizing across diverse urban landscapes. This study introduces an unsupervised contrastive clustering model for street view images with a built-in geographical prior, to enhance clustering performance. When combined with a simple visual assignment of the clusters, our approach offers a flexible and customizable solution to land use mapping, tailored to the specific needs of urban planners. We experimentally show that our method can generate land use maps from geotagged street view image datasets of two cities. As our methodology relies on the universal spatial coherence of geospatial data ("Tobler's law"), it can be adapted to various settings where street view images are available, to enable scalable, unsupervised land use mapping and updating. The code will be available at https://github.com/lin102/CCGP.
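
To make the geographical prior concrete, here is a minimal NumPy sketch of an InfoNCE-style objective in which street view images captured close together are treated as positive pairs, a direct reading of Tobler's law. The distance threshold, temperature, and exact loss form are illustrative assumptions, not the released CCGP code.

```python
import numpy as np

def geo_contrastive_loss(z, coords, radius=100.0, tau=0.1):
    """InfoNCE-style loss where images within `radius` meters of the anchor
    count as positives (the geographical prior).
    z: (N, D) L2-normalized embeddings; coords: (N, 2) in projected meters."""
    sim = z @ z.T / tau                               # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                    # exclude self-pairs
    dist = np.linalg.norm(coords[:, None] - coords[None], axis=-1)
    pos = (dist < radius) & ~np.eye(len(z), dtype=bool)
    log_denom = np.log(np.exp(sim).sum(axis=1))       # per-anchor normalizer
    per_anchor = [-(sim[i, pos[i]] - log_denom[i]).mean()
                  for i in range(len(z)) if pos[i].any()]
    return float(np.mean(per_anchor))
```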

SimFLEX: a methodology for comparative analysis of urban areas for implementing new on-demand feeder bus services

Authors:Hanna Vasiutina, Olha Shulika, Michał Bujak, Farnoud Ghasemi, Rafał Kucharski
Date:2025-04-24 13:27:49

On-demand feeder bus services present an innovative solution to urban mobility challenges, yet their success depends on thorough assessment and strategic planning. Despite their potential, a comprehensive framework for evaluating feasibility and identifying suitable service areas remains underdeveloped. We introduce the Simulation Framework for Feeder Location Evaluation (SimFLEX), which uses spatial, demographic, and transport-specific data to run microsimulations and compute key performance indicators (KPIs), including service attractiveness, waiting time reduction, and added value. SimFLEX employs multiple replications to estimate demand and mode choices and integrates OpenTripPlanner (OTP) for public transport routing and ExMAS for calculating shared trip attributes and KPIs. For each demand scenario, we model the traveler learning process using the method of successive averages (MSA), stabilizing the system. After stabilization, we calculate KPIs for comparative and sensitivity analyses. We applied SimFLEX to compare two remote urban areas in Krakow, Poland (Bronowice and Skotniki), both candidates for service launch. Our analysis revealed notable differences between the analyzed areas: Skotniki exhibited higher service attractiveness (up to 30%) and added value (up to 7%), while Bronowice showed greater potential for reducing waiting times (by nearly 77%). To assess the reliability of our model output, we conducted a sensitivity analysis across a range of alternative-specific constants (ASC). The results consistently confirmed Skotniki as the superior candidate for service implementation. SimFLEX is publicly available and applicable to various use cases, and can help policymakers estimate the performance of a new service in a considered area. It can integrate alternative models and approaches, making it a versatile tool for policymakers and urban planners to enhance urban mobility.
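
The method of successive averages at the core of the stabilization loop is simple enough to state in a few lines. The sketch below is a generic MSA fixed-point iteration with a toy mode-choice response, not SimFLEX's actual demand model.

```python
import numpy as np

def msa_fixed_point(response, x0, iters=100):
    """Method of successive averages: x_{k+1} = x_k + (y_k - x_k) / k,
    averaging travelers' responses until the state is self-consistent."""
    x = np.asarray(x0, dtype=float)
    for k in range(1, iters + 1):
        y = response(x)            # travelers' response to the current state
        x = x + (y - x) / k        # diminishing MSA step size
    return x

# toy example: mode share reacting to congestion (hypothetical logit response)
share = msa_fixed_point(lambda s: 1.0 / (1.0 + np.exp(4.0 * s - 2.0)), x0=0.5)
```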

Learning Isometric Embeddings of Road Networks using Multidimensional Scaling

Authors:Juan Carlos Climent Pardo
Date:2025-04-24 13:20:32

The lack of generalization in learning-based autonomous driving applications is shown by the narrow range of road scenarios that vehicles can currently cover. A generalizable approach should capture many distinct road structures and topologies, as well as consider traffic participants and dynamic changes in the environment, so that vehicles can navigate and perform motion planning tasks even in the most difficult situations. Designing suitable feature spaces for neural network-based motion planners that encapsulate all kinds of road scenarios is still an open research challenge. This paper tackles this learning-based generalization challenge and shows how graph representations of road networks can be leveraged by using multidimensional scaling (MDS) techniques in order to obtain such feature spaces. State-of-the-art graph representations and MDS approaches are analyzed for the autonomous driving use case. Finally, the option of embedding graph nodes is discussed in order to perform easier learning procedures and obtain dimensionality reduction.
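
A concrete instance of the idea: classical MDS applied to shortest-path distances of a toy road graph yields node coordinates whose Euclidean distances approximate network distances. The snippet assumes networkx and is illustrative, not the paper's exact pipeline.

```python
import networkx as nx
import numpy as np

G = nx.grid_2d_graph(4, 4)                      # toy stand-in for a road network
nodes = list(G)
D = np.array([[nx.shortest_path_length(G, u, v) for v in nodes] for u in nodes],
             dtype=float)

# classical MDS: double-center the squared distances, then eigen-decompose
n = len(D)
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
w, V = np.linalg.eigh(B)
top = np.argsort(w)[::-1][:2]                   # two largest eigenvalues
X = V[:, top] * np.sqrt(np.maximum(w[top], 0))  # 2-D node embedding
```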

Towards Equitable Rail Service Allocation Through Fairness-Oriented Timetabling in Liberalized Markets

Authors:David Muñoz-Valero, Juan Moreno-Garcia, Julio Alberto López-Gómez, Enrique Adrian Villarrubia-Martin
Date:2025-04-24 12:30:27

Over the last few decades, European rail transport has undergone major changes as part of the process of liberalization set out in European regulations. In this context of liberalization, railway undertakings compete with each other for the limited infrastructure capacity available to offer their rail services. The infrastructure manager is responsible for the equitable allocation of infrastructure between all companies in the market, which is essential to ensure the efficiency and sustainability of this competitive ecosystem. In this paper, a methodology based on Jain, Gini and Atkinson equity metrics is used to solve the rail service allocation problem in a liberalized railway market, analyzing the solutions obtained. The results show that the proposed methodology and the equity metrics used allow for equitable planning in different competitiveness scenarios. These results contrast with solutions where the objective of the infrastructure manager is to maximize its own profit, without regard for the equitable allocation of infrastructure. Therefore, the computational tests support the methodology and metrics used as a planning and decision support tool in a liberalized railway market.
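
For reference, the three equity metrics have closed forms that are easy to compute over per-undertaking allocations. This is a generic sketch of the standard definitions (the inequality-aversion parameter in Atkinson is chosen arbitrarily), not the paper's code.

```python
import numpy as np

def jain(x):
    """Jain index: 1/n (one undertaking gets everything) up to 1 (equal)."""
    x = np.asarray(x, dtype=float)
    return x.sum() ** 2 / (len(x) * (x ** 2).sum())

def gini(x):
    """Gini coefficient: 0 (equal) up to 1 (maximally unequal)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    return (2 * np.arange(1, n + 1) - n - 1) @ x / (n * x.sum())

def atkinson(x, eps=0.5):
    """Atkinson index with inequality aversion eps: 0 (equal) up to 1."""
    x = np.asarray(x, dtype=float)
    if eps == 1:
        return 1 - np.exp(np.log(x).mean()) / x.mean()
    return 1 - np.mean(x ** (1 - eps)) ** (1 / (1 - eps)) / x.mean()

capacity = [40, 35, 15, 10]   # toy train-path allocation across 4 undertakings
print(jain(capacity), gini(capacity), atkinson(capacity))
```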

Probabilistic modeling of delays for train journeys with transfers

Authors:Nikolaus Stratil-Sauer, Nils Breyer
Date:2025-04-24 12:13:55

Reliability plays a key role in the experience of a rail traveler. The reliability of journeys involving transfers is affected by the reliability of the transfers and the consequences of missing a transfer, as well as the possible delay of the train used to reach the destination. In this paper, we propose a flexible method to model the reliability of train journeys with any number of transfers. The method combines a transfer reliability model based on gradient boosting, responsible for predicting the reliability of transfers between trains, with a delay model based on probabilistic Bayesian regression, which is used to model train arrival delays. The models are trained on delay data from four Swedish train stations and evaluated on delay data from another two stations, in order to evaluate the generalization performance of the models. We show that the probabilistic delay model, which models train delays following a mixture distribution with two lognormal components, captures the distribution of actual train delays much more realistically than a standard lognormal model. Finally, we show how these models can be used together to sample the arrival delay at the final destination of the entire journey. The results indicate that the method accurately predicts the reliability for nine out of ten tested journeys. The method could be used to improve journey planners by providing reliability information to travelers. Further applications include timetable planning and transport modeling.
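
A minimal Monte Carlo sketch of the sampling idea: draw arrival delays from a two-component lognormal mixture and propagate them through a transfer. All parameters (mixture weights, transfer buffer, rebooking penalty) are invented for illustration, not the fitted Swedish values.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_delay(n, w=0.7, mu=(0.5, 2.0), sigma=(0.6, 0.8)):
    """Two-component lognormal mixture for arrival delays in minutes
    (weights and parameters are invented, not the paper's fitted values)."""
    first = rng.random(n) < w
    return np.where(first,
                    rng.lognormal(mu[0], sigma[0], n),
                    rng.lognormal(mu[1], sigma[1], n))

# journey with one transfer: miss it if the feeder's delay exceeds the buffer
buffer_min, rebook_penalty = 5.0, 30.0
feeder_delay = sample_delay(10_000)
missed = feeder_delay > buffer_min
final_delay = np.where(missed,
                       rebook_penalty + sample_delay(10_000),
                       sample_delay(10_000))
print(f"P(miss transfer) = {missed.mean():.2f}, "
      f"median final delay = {np.median(final_delay):.1f} min")
```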

AGCo-MATA: Air-Ground Collaborative Multi-Agent Task Allocation in Mobile Crowdsensing

Authors:Tianhao Shao, Bohan Feng, Yingying Zhou, Bin Guo, Kaixing Zhao
Date:2025-04-24 10:01:54

Rapid progress in intelligent unmanned systems has presented new opportunities for mobile crowd sensing (MCS). Today, heterogeneous air-ground collaborative multi-agent frameworks, which comprise unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs), offer superior flexibility and efficiency compared to traditional homogeneous frameworks in complex sensing tasks. Within this context, task allocation among different agents always plays an important role in improving overall MCS quality. In order to better allocate tasks among heterogeneous collaborative agents, in this paper we investigate two representative complex multi-agent task allocation scenarios with dual optimization objectives: (1) the AG-FAMT (Air-Ground Few Agents More Tasks) scenario, whose objectives are to maximize task completion while minimizing the total travel distance; (2) the AG-MAFT (Air-Ground More Agents Few Tasks) scenario, in which agents are allocated based on their locations and the objectives are to minimize the total travel distance while reducing travel time cost. To achieve this, we propose a Multi-Task Minimum Cost Maximum Flow (MT-MCMF) optimization algorithm tailored for AG-FAMT, along with a multi-objective optimization algorithm called W-ILP designed for AG-MAFT, with a particular focus on optimizing the charging path planning of UAVs. Our experiments based on a large-scale real-world dataset demonstrate that the two proposed algorithms both outperform baseline approaches under varying experimental settings, including task quantity, task difficulty, and task distribution, providing a novel way to improve the overall quality of mobile crowdsensing tasks.
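
To illustrate the flow formulation behind an MCMF-style allocator, the toy below casts agent-task assignment as a min-cost max-flow instance with networkx. Capacities, costs, and the single-commodity simplification are our assumptions, not the paper's full multi-task algorithm.

```python
import networkx as nx

# toy instance: 2 agents (UAV/UGV), 3 sensing tasks, edge weight = travel distance
G = nx.DiGraph()
dist = {("uav", "t1"): 4, ("uav", "t2"): 2, ("uav", "t3"): 7,
        ("ugv", "t1"): 3, ("ugv", "t2"): 6, ("ugv", "t3"): 1}
for agent in ("uav", "ugv"):
    G.add_edge("src", agent, capacity=2, weight=0)   # each agent takes <= 2 tasks
for task in ("t1", "t2", "t3"):
    G.add_edge(task, "sink", capacity=1, weight=0)   # each task served once
for (agent, task), d in dist.items():
    G.add_edge(agent, task, capacity=1, weight=d)

flow = nx.max_flow_min_cost(G, "src", "sink")  # max tasks served, min distance
assignment = {a: [t for t, f in flow[a].items() if f] for a in ("uav", "ugv")}
print(assignment)
```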

Highly Accurate and Diverse Traffic Data: The DeepScenario Open 3D Dataset

Authors:Oussema Dhaouadi, Johannes Meier, Luca Wahl, Jacques Kaiser, Luca Scalerandi, Nick Wandelburg, Zhuolun Zhou, Nijanthan Berinpanathan, Holger Banzhaf, Daniel Cremers
Date:2025-04-24 08:43:48

Accurate 3D trajectory data is crucial for advancing autonomous driving. Yet, traditional datasets are usually captured by fixed sensors mounted on a car and are susceptible to occlusion. Additionally, such an approach can precisely reconstruct the dynamic environment in the close vicinity of the measurement vehicle only, while neglecting objects that are further away. In this paper, we introduce the DeepScenario Open 3D Dataset (DSC3D), a high-quality, occlusion-free dataset of 6-degrees-of-freedom bounding box trajectories acquired through a novel monocular camera drone tracking pipeline. Our dataset includes more than 175,000 trajectories of 14 types of traffic participants and significantly exceeds existing datasets in terms of diversity and scale, containing many unprecedented scenarios such as complex vehicle-pedestrian interaction on highly populated urban streets and comprehensive parking maneuvers from entry to exit. The DSC3D dataset was captured at five diverse locations in Europe and the United States, including a parking lot, a crowded inner city, a steep urban intersection, a federal highway, and a suburban intersection. Our 3D trajectory dataset aims to enhance autonomous driving systems by providing detailed environmental 3D representations, which could lead to improved obstacle interactions and safety. We demonstrate its utility across multiple applications including motion prediction, motion planning, scenario mining, and generative reactive traffic agents. Our interactive online visualization platform and the complete dataset are publicly available at app.deepscenario.com, facilitating research in motion prediction, behavior modeling, and safety validation.

Tokenizing Stock Prices for Enhanced Multi-Step Forecast and Prediction

Authors:Zhuohang Zhu, Haodong Chen, Qiang Qu, Xiaoming Chen, Vera Chung
Date:2025-04-24 07:15:05

Effective stock price forecasting (estimating future prices) and prediction (estimating future price changes) are pivotal for investors, regulatory agencies, and policymakers. These tasks enable informed decision-making, risk management, strategic planning, and superior portfolio returns. Despite their importance, forecasting and prediction are challenging due to the dynamic nature of stock price data, which exhibit significant temporal variations in distribution and statistical properties. Additionally, while both forecasting and prediction targets are derived from the same dataset, their statistical characteristics differ significantly. Forecasting targets typically follow a log-normal distribution, characterized by significant shifts in mean and variance over time, whereas prediction targets adhere to a normal distribution. Furthermore, although multi-step forecasting and prediction offer a broader perspective and richer information compared to single-step approaches, they are much more challenging due to factors such as cumulative errors and long-term temporal variance. As a result, many previous works have tackled either single-step stock price forecasting or prediction instead. To address these issues, we introduce a novel model, termed Patched Channel Integration Encoder (PCIE), to tackle both stock price forecasting and prediction. In this model, we utilize multiple stock channels that cover both historical prices and price changes, and design a novel tokenization method to effectively embed these channels in a cross-channel and temporally efficient manner. Specifically, the tokenization process involves univariate patching and temporal learning with a channel-mixing encoder to reduce cumulative errors. Comprehensive experiments validate that PCIE outperforms current state-of-the-art models in forecast and prediction tasks.
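
The univariate patching step can be pictured in a few lines of NumPy: each channel (prices, price changes) is sliced into overlapping windows that become tokens. Patch length and stride here are arbitrary choices, and the embedding and channel-mixing encoder are omitted.

```python
import numpy as np

def patchify(series, patch_len=16, stride=8):
    """Slice one channel into overlapping patches that serve as tokens
    (patch length and stride are illustrative, not PCIE's exact settings)."""
    n = (len(series) - patch_len) // stride + 1
    return np.stack([series[i * stride: i * stride + patch_len] for i in range(n)])

prices = np.cumsum(np.random.randn(256))            # synthetic price channel
changes = np.diff(prices, prepend=prices[0])        # price-change channel
tokens = np.stack([patchify(prices), patchify(changes)])  # (channel, patch, len)
```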

Effect of Electrode Array Position on Electric Field Intensity in Glioblastoma Patients Undergoing Electric Field Therapy

Authors:Yousun Ko, Sangcheol Kim, Tae Hyun Kim, Dongho Shin, Haksoo Kim, Sung Uk Lee, Jonghyun Kim, Myonggeun Yoon
Date:2025-04-24 01:36:29

Background: The intensity of the electric field applied to a brain tumor by electric field therapy is influenced by the position of the electrode array, which should be optimized based on the patient's head shape and tumor characteristics. This study assessed the effects of varying electrode positions on electric field intensity in glioblastoma multiforme (GBM) patients. Methods: This study enrolled 13 GBM patients. The center of the MR slice corresponding to the center of the tumor was set as the reference point for the electrodes, creating pairs of electrode arrays in the top-rear and left-right positions. Based on this reference plan, four additional treatment plans were generated by rotating three of the four electrode arrays, all except the top electrode array, by 15$^\circ$ and 30$^\circ$ from their reference positions, resulting in a total of five treatment plans per patient. Electric field frequency was set at 200 kHz, and current density at 31 mArms/cm$^2$. The minimum and mean electric field intensities, homogeneity index (HI), and coverage index (CovI) were calculated and compared. Results: The optimal plans showed differences ranging from -0.39% to 24.20% for minimum intensity and -14.29% to 16.67% for mean intensity compared to reference plans. HI and CovI varied from 0.00% to 48.65% and 0.00% to 95.3%, respectively. The average improvements across all patients were 8.96% for minimum intensity, 5.11% for mean intensity, 15.65% for HI, and 17.84% for CovI. Conclusions: Optimizing electrode angle improves electric field therapy outcomes in GBM patients by maximizing field intensity and coverage. Keywords: electric field therapy; glioblastoma multiforme (GBM); treatment planning system (TPS); electrode array position; tumor coverage

Evaluating Learned Query Performance Prediction Models at LinkedIn: Challenges, Opportunities, and Findings

Authors:Chujun Song, Slim Bouguerra, Erik Krogen, Daniel Abadi
Date:2025-04-24 01:35:34

Recent advancements in learning-based query performance prediction models have demonstrated remarkable efficacy. However, these models are predominantly validated using synthetic datasets focused on cardinality or latency estimations. This paper explores the application of these models to LinkedIn's complex real-world OLAP queries executed on Trino, addressing four primary research questions: (1) How do these models perform on real-world industrial data with limited information? (2) Can these models generalize to new tasks, such as CPU time prediction and classification? (3) What additional information available from the query plan could be utilized by these models to enhance their performance? (4) What are the theoretical performance limits of these models given the available data? To address these questions, we evaluate several models, including TLSTM, TCNN, QueryFormer, and XGBoost, against the industrial query workload at LinkedIn, and extend our analysis to CPU time regression and classification tasks. We also propose a multi-task learning approach to incorporate underutilized operator-level metrics that could enhance model understanding. Additionally, we empirically analyze the inherent upper bound that can be achieved from the models.

Soft-Photon Contribution into Two-Photon Exchange Corrections for Azimuthal Asymmetries of SIDIS

Authors:Stinson Lee, Andrei Afanasev
Date:2025-04-23 22:27:08

It is demonstrated that two-photon exchange (TPE) corrections to the cross-section of unpolarized semi-inclusive deep-inelastic scattering (SIDIS) generate azimuthal-dependent terms and the corresponding $\langle\cos(n\phi)\rangle$ moments. A quark-diquark model of a nucleon was used in the calculations along with a soft-photon approximation. The infrared divergences in the intermediate steps of calculation are regularized with the fictitious photon mass that cancels out in the final result when the soft-photon bremsstrahlung process and interference terms are added. The calculations employ the Mathematica "LoopTools" package to evaluate the loop integrals. TPE corrections are analyzed in the kinematics of planned experiments at Jefferson Lab.

Latent Diffusion Planning for Imitation Learning

Authors:Amber Xie, Oleh Rybkin, Dorsa Sadigh, Chelsea Finn
Date:2025-04-23 17:53:34

Recent progress in imitation learning has been enabled by policy architectures that scale to complex visuomotor tasks, multimodal distributions, and large datasets. However, these methods often rely on learning from large amounts of expert demonstrations. To address this shortcoming, we propose Latent Diffusion Planning (LDP), a modular approach consisting of a planner which can leverage action-free demonstrations, and an inverse dynamics model which can leverage suboptimal data, that both operate over a learned latent space. First, we learn a compact latent space through a variational autoencoder, enabling effective forecasting of future states in image-based domains. Then, we train a planner and an inverse dynamics model with diffusion objectives. By separating planning from action prediction, LDP can benefit from the denser supervision signals of suboptimal and action-free data. On simulated visual robotic manipulation tasks, LDP outperforms state-of-the-art imitation learning approaches, as they cannot leverage such additional data.
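
Schematically, inference decomposes as below: encode the observation, plan a sequence of future latents with the diffusion planner, then recover actions pairwise with the inverse dynamics model. The callables are hypothetical placeholders for the three learned modules, not the authors' interfaces.

```python
def ldp_rollout(obs, encode, plan_latents, inverse_dynamics, horizon=8):
    """Latent-diffusion-planning inference skeleton (modules are hypothetical).
    encode: obs -> latent; plan_latents: (latent, H) -> list of H future latents;
    inverse_dynamics: (latent_t, latent_next) -> action."""
    z = encode(obs)                              # VAE encoder to compact latent
    z_future = plan_latents(z, horizon)          # diffusion planner over latents
    pairs = zip([z] + z_future[:-1], z_future)   # consecutive latent pairs
    return [inverse_dynamics(a, b) for a, b in pairs]
```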

Zero-shot Sim-to-Real Transfer for Reinforcement Learning-based Visual Servoing of Soft Continuum Arms

Authors:Hsin-Jung Yang, Mahsa Khosravi, Benjamin Walt, Girish Krishnan, Soumik Sarkar
Date:2025-04-23 17:41:55

The soft and deformable nature of soft continuum arms (SCAs) presents challenges in modeling and control due to their infinite degrees of freedom and non-linear behavior. This work introduces a reinforcement learning (RL)-based framework for visual servoing tasks on SCAs with zero-shot sim-to-real transfer capabilities, demonstrated on a single-section pneumatic manipulator capable of bending and twisting. The framework decouples kinematics from mechanical properties using an RL kinematic controller for motion planning and a local controller for actuation refinement, leveraging minimal sensing with visual feedback. Trained entirely in simulation, the RL controller achieved a 99.8% success rate. When deployed on hardware, it achieved a 67% success rate in zero-shot sim-to-real transfer, demonstrating robustness and adaptability. This approach offers a scalable solution for SCAs in 3D visual servoing, with potential for further refinement and expanded applications.

Computing Optimal Transport Plans via Min-Max Gradient Flows

Authors:Lauren Conger, Franca Hoffmann, Ricardo Baptista, Eric Mazumdar
Date:2025-04-23 17:11:34

We pose the Kantorovich optimal transport problem as a min-max problem with a Nash equilibrium that can be obtained dynamically via a two-player game, providing a framework for approximating optimal couplings. We prove convergence of the timescale-separated gradient descent dynamics to the optimal transport plan, and implement the gradient descent algorithm with a particle method, where the marginal constraints are enforced weakly using the KL divergence, automatically selecting a dynamical adaptation of the regularizer. The numerical results highlight the different advantages of using the standard Kullback-Leibler (KL) divergence versus the reverse KL divergence with this approach, opening the door for new methodologies.
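
As a toy discrete analogue of the two-player dynamics (the paper works with particle approximations and KL-weakened marginal constraints in continuous space), projected gradient descent-ascent on the Kantorovich Lagrangian looks like this; step sizes and the timescale separation are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
a = b = np.full(n, 1 / n)                    # source / target marginals
C = rng.random((n, n))                       # ground-cost matrix
P = np.full((n, n), 1 / n**2)                # coupling: the minimizing player
f = np.zeros(n); g = np.zeros(n)             # potentials: the maximizing player

lr_p, lr_d = 0.05, 0.5                       # separated timescales
for _ in range(20_000):
    grad_P = C + f[:, None] + g[None, :]     # d/dP of the Lagrangian
    P = np.clip(P - lr_p * grad_P, 0.0, None)
    f += lr_d * (P.sum(axis=1) - a)          # ascend on marginal violations
    g += lr_d * (P.sum(axis=0) - b)

print("row-marginal error:", np.abs(P.sum(axis=1) - a).max())
```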

Physically Consistent Humanoid Loco-Manipulation using Latent Diffusion Models

Authors:Ilyass Taouil, Haizhou Zhao, Angela Dai, Majid Khadiv
Date:2025-04-23 16:07:02

This paper uses the capabilities of latent diffusion models (LDMs) to generate realistic RGB human-object interaction scenes to guide humanoid loco-manipulation planning. To do so, we extract from the generated images both the contact locations and robot configurations that are then used inside a whole-body trajectory optimization (TO) formulation to generate physically consistent trajectories for humanoids. We validate our full pipeline in simulation for different long-horizon loco-manipulation scenarios and perform an extensive analysis of the proposed contact and robot configuration extraction pipeline. Our results show that using the information extracted from LDMs, we can generate physically consistent trajectories that require long-horizon reasoning.

Evaluating the Impact of CT-to-RED Calibration Curves on Dosimetric Accuracy in Brain Radiotherapy Dose Distribution

Authors:Hossam Donya, Duong Thanh Tai, Islam G. Ali
Date:2025-04-23 15:24:57

Accurate dose calculation is crucial in radiotherapy, as tissue relative electron densities (RED) derived from CT scans play a vital role. This study investigated the impact of different CT-to-RED calibration curves on brain cancer treatment plans. Three calibration curves were compared: CIRS phantom-derived, Catphan phantom-derived, and the default curve in the Monaco Treatment Planning System. Ten volumetric modulated arc therapy (VMAT) plans were generated and recalculated using each curve. Dosimetric parameters for Planning Target Volume (PTV) and Organs at Risk (OARs) were analyzed. Results showed significant differences in PTV dose distribution between the CIRS-derived and default curves, while no significant differences were found between Catphan-derived and default curves. The CIRS-derived curve demonstrated superior performance in representing brain tissue electron densities. These findings emphasize the importance of using site-specific CT-to-RED calibration curves for accurate dose calculations in brain radiotherapy, potentially improving treatment safety and efficacy.
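
Mechanically, a calibration curve is a piecewise-linear lookup from CT numbers to relative electron densities. The sketch below shows that mechanics with invented, non-clinical calibration points.

```python
import numpy as np

# illustrative CT-number-to-RED calibration points (invented, non-clinical)
hu = np.array([-1000.0, -700.0, -90.0, 0.0, 300.0, 1200.0])
red = np.array([0.00, 0.29, 0.95, 1.00, 1.15, 1.70])

def to_red(ct_numbers):
    """Piecewise-linear lookup from CT numbers (HU) to relative electron density."""
    return np.interp(ct_numbers, hu, red)

print(to_red([-50, 40, 250]))   # e.g. soft-tissue and bone-adjacent voxels
```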

Credible plan-driven RAG method for Multi-hop Question Answering

Authors:Ningning Zhang, Chi Zhang, Zhizhong Tan, Xingxing Yang, Weiping Deng, Wenyong Wang
Date:2025-04-23 15:03:17

Multi-hop question answering (QA) presents a considerable challenge for Retrieval-Augmented Generation (RAG), requiring the structured decomposition of complex queries into logical reasoning paths and the generation of dependable intermediate results. However, deviations in reasoning paths or errors in intermediate results, which are common in current RAG methods, may propagate and accumulate throughout the reasoning process, diminishing the accuracy of the answer to complex queries. To address this challenge, we propose the Plan-then-Act-and-Review (PAR RAG) framework, which is organized into three key stages: plan, act, and review, and aims to offer an interpretable and incremental reasoning paradigm for accurate and reliable multi-hop question answering by mitigating error propagation. PAR RAG initially applies a top-down problem decomposition strategy, formulating a comprehensive plan that integrates multiple executable steps from a holistic viewpoint. This approach avoids the pitfalls of local optima common in traditional RAG methods, ensuring the accuracy of the entire reasoning path. Subsequently, PAR RAG incorporates a plan execution mechanism based on multi-granularity verification. By utilizing both coarse-grained similarity information and fine-grained relevant data, the framework thoroughly checks and adjusts intermediate results, ensuring process accuracy while effectively managing error propagation and amplification. Experimental results on multi-hop QA datasets demonstrate that the PAR RAG framework substantially outperforms existing state-of-the-art methods in key metrics, including EM and F1 scores.
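
The control flow can be summarized as a plan-act-review loop. The skeleton below is our reading of the three stages, with the planner, actor, and reviewer left as hypothetical callables rather than the paper's actual prompts and retrievers.

```python
from typing import Callable, List

def par_rag(question: str,
            plan: Callable[[str], List[str]],     # hypothetical LLM planner
            act: Callable[[str], str],            # hypothetical retrieve-and-answer
            review: Callable[[str, str], str]) -> str:
    """Plan-then-Act-and-Review skeleton: decompose the query, execute each
    step, and verify intermediate results before they can propagate errors."""
    steps = plan(question)                    # top-down decomposition
    evidence = []
    for step in steps:
        draft = act(step)                     # retrieve evidence, draft an answer
        checked = review(step, draft)         # multi-granularity verification
        evidence.append(f"{step}: {checked}")
    return act("Given " + " | ".join(evidence) + f", answer: {question}")
```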

STFM: A Spatio-Temporal Information Fusion Model Based on Phase Space Reconstruction for Sea Surface Temperature Prediction

Authors:Yin Wang, Chunlin Gong, Xiang Wu, Hanleran Zhang
Date:2025-04-23 14:14:59

The sea surface temperature (SST), a key environmental parameter, is crucial to optimizing production planning, making its accurate prediction a vital research topic. However, the inherent nonlinearity of the marine dynamic system presents significant challenges. Current forecasting methods mainly include physics-based numerical simulations and data-driven machine learning approaches. The former, while describing SST evolution through differential equations, suffers from high computational complexity and limited applicability, whereas the latter, despite its computational benefits, requires large datasets and faces interpretability challenges. This study presents a prediction framework based solely on data-driven techniques. Using phase space reconstruction, we construct initial-delay attractor pairs with a mathematical homeomorphism and design a Spatio-Temporal Fusion Mapping (STFM) to uncover their intrinsic connections. Unlike conventional models, our method captures SST dynamics efficiently through phase space reconstruction and achieves high prediction accuracy with minimal training data in comparative tests.
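
The phase space reconstruction step is a Takens-style time-delay embedding. A minimal sketch follows, with embedding dimension and delay chosen arbitrarily rather than by the paper's procedure.

```python
import numpy as np

def delay_embed(x, dim=3, tau=5):
    """Map a scalar series to phase-space states
    [x(t), x(t+tau), ..., x(t+(dim-1)*tau)]."""
    n = len(x) - (dim - 1) * tau
    return np.stack([x[i * tau: i * tau + n] for i in range(dim)], axis=1)

sst = np.sin(np.linspace(0, 20 * np.pi, 2000))   # synthetic SST-like signal
states = delay_embed(sst)                        # (n_states, dim) attractor points
```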

MOSAIC: A Skill-Centric Algorithmic Framework for Long-Horizon Manipulation Planning

Authors:Itamar Mishani, Yorai Shaoul, Maxim Likhachev
Date:2025-04-23 14:09:42

Planning long-horizon motions using a set of predefined skills is a key challenge in robotics and AI. Addressing this challenge requires methods that systematically explore skill combinations to uncover task-solving sequences, harness generic, easy-to-learn skills (e.g., pushing, grasping) to generalize across unseen tasks, and bypass reliance on symbolic world representations that demand extensive domain and task-specific knowledge. Despite significant progress, these elements remain largely disjoint in existing approaches, leaving a critical gap in achieving robust, scalable solutions for complex, long-horizon problems. In this work, we present MOSAIC, a skill-centric framework that unifies these elements by using the skills themselves to guide the planning process. MOSAIC uses two families of skills: Generators compute executable trajectories and world configurations, and Connectors link these independently generated skill trajectories by solving boundary value problems, enabling progress toward completing the overall task. By breaking away from the conventional paradigm of incrementally discovering skills from predefined start or goal states -- a limitation that significantly restricts exploration -- MOSAIC focuses planning efforts on regions where skills are inherently effective. We demonstrate the efficacy of MOSAIC in both simulated and real-world robotic manipulation tasks, showcasing its ability to solve complex long-horizon planning problems using a diverse set of skills incorporating generative diffusion models, motion planning algorithms, and manipulation-specific models. Visit https://skill-mosaic.github.io for demonstrations and examples.

DYNUS: Uncertainty-aware Trajectory Planner in Dynamic Unknown Environments

Authors:Kota Kondo, Mason Peterson, Nicholas Rober, Juan Rached Viso, Lucas Jia, Jialin Chen, Harvey Merton, Jonathan P. How
Date:2025-04-23 14:05:04

This paper introduces DYNUS, an uncertainty-aware trajectory planner designed for dynamic unknown environments. Operating in such settings presents many challenges -- most notably, because the agent cannot predict the ground-truth future paths of obstacles, a previously planned trajectory can become unsafe at any moment, requiring rapid replanning to avoid collisions. Recently developed planners have used soft-constraint approaches to achieve the necessary fast computation times; however, these methods do not guarantee collision-free paths even with static obstacles. In contrast, hard-constraint methods ensure collision-free safety, but typically have longer computation times. To address these issues, we propose three key contributions. First, the DYNUS Global Planner (DGP) and Temporal Safe Corridor Generation operate in spatio-temporal space and handle both static and dynamic obstacles in the 3D environment. Second, the Safe Planning Framework leverages a combination of exploratory, safe, and contingency trajectories to flexibly re-route when potential future collisions with dynamic obstacles are detected. Finally, the Fast Hard-Constraint Local Trajectory Formulation uses a variable elimination approach to reduce the problem size and enable faster computation by pre-computing dependencies between free and dependent variables while still ensuring collision-free trajectories. We evaluated DYNUS in a variety of simulations, including dense forests, confined office spaces, cave systems, and dynamic environments. Our experiments show that DYNUS achieves a success rate of 100% and travel times that are approximately 25.0% faster than state-of-the-art methods. We also evaluated DYNUS on multiple platforms -- a quadrotor, a wheeled robot, and a quadruped -- in both simulation and hardware experiments.

PP-Tac: Paper Picking Using Tactile Feedback in Dexterous Robotic Hands

Authors:Pei Lin, Yuzhe Huang, Wanlin Li, Jianpeng Ma, Chenxi Xiao, Ziyuan Jiao
Date:2025-04-23 12:10:11

Robots are increasingly envisioned as human companions, assisting with everyday tasks that often involve manipulating deformable objects. Although recent advances in robotic hardware and embodied AI have expanded their capabilities, current systems still struggle with handling thin, flat, and deformable objects such as paper and fabric. This limitation arises from the lack of suitable perception techniques for robust state estimation under diverse object appearances, as well as the absence of planning techniques for generating appropriate grasp motions. To bridge these gaps, this paper introduces PP-Tac, a robotic system for picking up paper-like objects. PP-Tac features a multi-fingered robotic hand with high-resolution omnidirectional tactile sensors. This hardware configuration enables real-time slip detection and online frictional force control that mitigates such slips. Furthermore, grasp motion generation is achieved through a trajectory synthesis pipeline, which first constructs a dataset of finger pinching motions. Based on this dataset, a diffusion-based policy is trained to control the hand-arm robotic system. Experiments demonstrate that PP-Tac can effectively grasp paper-like objects of varying material, thickness, and stiffness, achieving an overall success rate of 87.5%. To our knowledge, this work is the first attempt to grasp paper-like deformable objects using a tactile dexterous hand. Our project webpage can be found at: https://peilin-666.github.io/projects/PP-Tac/

Path Matters: Industrial Data Meet Quantum Optimization

Authors:Lukas Schmidbauer, Carlos A. Riofrío, Florian Heinrich, Vanessa Junk, Ulrich Schwenk, Thomas Husslein, Wolfgang Mauerer
Date:2025-04-23 10:45:38

Real-world optimization problems must undergo a series of transformations before becoming solvable on current quantum hardware. Even for a fixed problem, the number of possible transformation paths -- from industry-relevant formulations through binary constrained linear programs (BILPs), to quadratic unconstrained binary optimization (QUBO), and finally to a hardware-executable representation -- is remarkably large. Each step introduces free parameters, such as Lagrange multipliers, encoding strategies, slack variables, rounding schemes or algorithmic choices -- making brute-force exploration of all paths intractable. In this work, we benchmark a representative subset of these transformation paths using a real-world industrial production planning problem with industry data: the optimization of work allocation in a press shop producing vehicle parts. We focus on QUBO reformulations and algorithmic parameters for both quantum annealing (QA) and the Linear Ramp Quantum Approximate Optimization Algorithm (LR-QAOA). Our goal is to identify a reduced set of effective configurations applicable to similar industrial settings. Our results show that QA on D-Wave hardware consistently produces near-optimal solutions, whereas LR-QAOA on IBM quantum devices struggles to reach comparable performance. Hence, the choice of hardware and solver strategy significantly impacts performance. The problem formulation and especially the penalization strategy determine the solution quality. Most importantly, mathematically defined penalization strategies are as successful as hand-picked penalty factors, paving the way for automated QUBO formulation. Moreover, we observe a strong correlation between simulated and quantum annealing performance metrics, offering a scalable proxy for predicting QA behavior on larger problem instances.
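
To make the BILP-to-QUBO step concrete: an equality constraint enters the objective as a quadratic penalty with multiplier lambda. The toy below (our own three-variable example, not the press-shop instance) builds the Q matrix and checks it by brute force.

```python
import itertools
import numpy as np

# toy BILP: maximize 3*x0 + 2*x1 + 2*x2  subject to  x0 + x1 + x2 == 1
c, lam = np.array([3.0, 2.0, 2.0]), 10.0    # lam: penalty (Lagrange) multiplier

# since x_i^2 = x_i, (sum(x) - 1)^2 = x^T (J - 2I) x + 1, with J the all-ones matrix
n = len(c)
Q = lam * (np.ones((n, n)) - 2 * np.eye(n)) - np.diag(c)  # minimize x^T Q x

best = min(itertools.product((0, 1), repeat=n),
           key=lambda x: np.asarray(x) @ Q @ np.asarray(x))
print(best)   # (1, 0, 0): the feasible assignment with the largest objective
```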

Structuring Competency-Based Courses Through Skill Trees

Authors:Hildo Bijl
Date:2025-04-23 10:44:36

Computer science education has seen two important trends. One has been a shift from raw theory towards skills: competency-based teaching. Another has been increasing student numbers, with more automation in teaching as a result. When automating education, it is crucial to properly structure courses, both to manage digitalized educational resources and to facilitate automated coaching algorithms. Currently existing structuring methodologies are focused around theory and not around skills, and are incapable of modeling the dependency links between skills. Because of this, a new didactic framework is needed. This paper presents a new method of structuring educational contents around skills: something that a student is expected to be able to do. It defines Skill Trees that show dependencies between skills, and subsequently couples these to Concept Trees that contain intuitive ideas/notional machines. Due to the algorithmic nature of computer science, this step-wise approach is especially well-suited to this field of education. Next to formal definitions of Skill Trees and Concept Trees, guidelines are given on how to design them and how to plan a course using them. The Skill Trees framework has been applied to improve the structure of a university database course. Student interviews indicated reduced confusion/stress and less study time required for students to meet their desired skill level.
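
In code, a Skill Tree is naturally a DAG whose edges point from prerequisite to dependent skill, and a topological sort yields a valid teaching order. The toy below uses invented database-course skills, not the paper's actual tree.

```python
import networkx as nx

# invented Skill Tree fragment: edge = prerequisite -> dependent skill
tree = nx.DiGraph([
    ("write a SELECT query", "filter rows with WHERE"),
    ("filter rows with WHERE", "join tables"),
    ("write a SELECT query", "aggregate with GROUP BY"),
    ("aggregate with GROUP BY", "join tables"),
])
assert nx.is_directed_acyclic_graph(tree)     # skill dependencies must not cycle
print(list(nx.topological_sort(tree)))        # one valid order to teach the skills
```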

Partitioning of multiple brain metastases improves dose gradients in single-isocenter radiosurgery

Authors:Johan Sundström, Anton Finnson, Elin Hynning, Geert De Kerf, Albin Fredriksson
Date:2025-04-23 09:02:57

Background: A growing number of cancer patients with brain metastases can benefit from stereotactic radiosurgery (SRS) thanks to recent advances in systemic therapies. With an increasing patient load, single-isocenter treatments on widely available C-arm linear accelerators are an attractive option. However, the planning of such treatments is challenging for multi-target cases due to the island blocking problem, which occurs when the multi-leaf collimator cannot conform to all targets simultaneously. Purpose: We propose a multi-target partitioning algorithm that mitigates excessive exposure of normal tissue caused by the island blocking problem. Methods: The algorithm divides (partitions) the set of targets into subsets to treat with separate arc passes, optimizing both subsets and collimator angles to minimize island blocking. The algorithm was incorporated into a fully automated treatment planning script and evaluated on 20 simulated patient cases, each with 10 brain metastases and 21 Gy prescriptions. It was also retrospectively evaluated on six clinical cases. Results: Partitioning significantly improved the gradient index, global efficiency index, and brain V12Gy compared to simultaneous treatment of all metastases. For example, with between 3 and 9 arcs, the average gradient index improved from 5.9 to 3.3, the global efficiency index from 0.32 to 0.46, and normal brain V12Gy from 49 cm3 to 26 cm3. The proposed algorithm outperformed baselines in utilizing a limited number of arcs. All target partitioning strategies increased the total number of monitor units (MUs). Conclusions: The dose gradient in single-isocenter VMAT plans can be substantially improved by treating a smaller subset of metastases at a time. This requires more MUs and arcs, implying a trade-off between delivery time and plan quality which can be explored using the algorithm proposed in this paper.

TraveLLaMA: Facilitating Multi-modal Large Language Models to Understand Urban Scenes and Provide Travel Assistance

Authors:Meng Chu, Yukang Chen, Haokun Gui, Shaozuo Yu, Yi Wang, Jiaya Jia
Date:2025-04-23 08:32:25

Tourism and travel planning increasingly rely on digital assistance, yet existing multimodal AI systems often lack specialized knowledge and contextual understanding of urban environments. We present TraveLLaMA, a specialized multimodal language model designed for urban scene understanding and travel assistance. Our work addresses the fundamental challenge of developing practical AI travel assistants through a novel large-scale dataset of 220k question-answer pairs. This comprehensive dataset uniquely combines 130k text QA pairs meticulously curated from authentic travel forums with GPT-enhanced responses, alongside 90k vision-language QA pairs specifically focused on map understanding and scene comprehension. Through extensive fine-tuning experiments on state-of-the-art vision-language models (LLaVA, Qwen-VL, Shikra), we demonstrate significant performance improvements ranging from 6.5%-9.4% in both pure text travel understanding and visual question answering tasks. Our model exhibits exceptional capabilities in providing contextual travel recommendations, interpreting map locations, and understanding place-specific imagery while offering practical information such as operating hours and visitor reviews. Comparative evaluations show TraveLLaMA significantly outperforms general-purpose models in travel-specific tasks, establishing a new benchmark for multi-modal travel assistance systems.

Circinus: Efficient Query Planner for Compound ML Serving

Authors:Banruo Liu, Wei-Yu Lin, Minghao Fang, Yihan Jiang, Fan Lai
Date:2025-04-23 03:57:24

The rise of compound AI serving -- integrating multiple operators in a pipeline that may span edge and cloud tiers -- enables end-user applications such as autonomous driving, generative AI-powered meeting companions, and immersive gaming. Achieving high service goodput -- i.e., meeting service level objectives (SLOs) for pipeline latency, accuracy, and costs -- requires effective planning of operator placement, configuration, and resource allocation across infrastructure tiers. However, the diverse SLO requirements, varying edge capabilities, and high query volumes create an enormous planning search space, rendering current solutions fundamentally limited for real-time serving and cost-efficient deployments. This paper presents Circinus, an SLO-aware query planner for large-scale compound AI workloads. Circinus novelly decomposes multi-query planning and multi-dimensional SLO objectives while preserving global decision quality. By exploiting plan similarities within and across queries, it significantly reduces search steps. It further improves per-step efficiency with a precision-aware plan profiler that incrementally profiles and strategically applies early stopping based on imprecise estimates of plan performance. At scale, Circinus selects query-plan combinations to maximize global SLO goodput. Evaluations in real-world settings show that Circinus improves service goodput by 3.2-5.0$\times$, accelerates query planning by 4.2-5.8$\times$, achieving query response in seconds, while reducing deployment costs by 3.2-4.0$\times$ over state-of-the-art systems, even in their intended single-tier deployments.