planning - 2025-08-19

Human Digital Twin: Data, Models, Applications, and Challenges

Authors:Rong Pan, Hongyue Sun, Xiaoyu Chen, Giulia Pedrielli, Jiapeng Huang
Date:2025-08-18 17:50:25

Human digital twins (HDTs) are dynamic, data-driven virtual representations of individuals, continuously updated with multimodal data to simulate, monitor, and predict health trajectories. By integrating clinical, physiological, behavioral, and environmental inputs, HDTs enable personalized diagnostics, treatment planning, and anomaly detection. This paper reviews current approaches to HDT modeling, with a focus on statistical and machine learning techniques, including recent advances in anomaly detection and failure prediction. It also discusses data integration, computational methods, and ethical, technological, and regulatory challenges in deploying HDTs for precision healthcare.

Contrastive Representations for Temporal Reasoning

Authors:Alicja Ziarko, Michal Bortkiewicz, Michal Zawalski, Benjamin Eysenbach, Piotr Milos
Date:2025-08-18 17:20:08

In classical AI, perception relies on learning state-based representations, while planning, which can be thought of as temporal reasoning over action sequences, is typically achieved through search. We study whether such reasoning can instead emerge from representations that capture both perceptual and temporal structure. We show that standard temporal contrastive learning, despite its popularity, often fails to capture temporal structure due to its reliance on spurious features. To address this, we introduce Combinatorial Representations for Temporal Reasoning (CRTR), a method that uses a negative sampling scheme to provably remove these spurious features and facilitate temporal reasoning. CRTR achieves strong results on domains with complex temporal structure, such as Sokoban and Rubik's Cube. In particular, for the Rubik's Cube, CRTR learns representations that generalize across all initial states and allow it to solve the puzzle using fewer search steps than BestFS, though with longer solutions. To our knowledge, this is the first method that efficiently solves arbitrary Cube states using only learned representations, without relying on an external search algorithm.

Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey

Authors:Rui Shao, Wei Li, Lingsen Zhang, Renshan Zhang, Zhiyang Liu, Ran Chen, Liqiang Nie
Date:2025-08-18 16:45:48

Robotic manipulation, a key frontier in robotics and embodied AI, requires precise motor control and multimodal understanding, yet traditional rule-based methods fail to scale or generalize in unstructured, novel environments. In recent years, Vision-Language-Action (VLA) models, built upon Large Vision-Language Models (VLMs) pretrained on vast image-text datasets, have emerged as a transformative paradigm. This survey provides the first systematic, taxonomy-oriented review of large VLM-based VLA models for robotic manipulation. We begin by clearly defining large VLM-based VLA models and delineating two principal architectural paradigms: (1) monolithic models, encompassing single-system and dual-system designs with differing levels of integration; and (2) hierarchical models, which explicitly decouple planning from execution via interpretable intermediate representations. Building on this foundation, we present an in-depth examination of large VLM-based VLA models: (1) integration with advanced domains, including reinforcement learning, training-free optimization, learning from human videos, and world model integration; (2) synthesis of distinctive characteristics, consolidating architectural traits, operational strengths, and the datasets and benchmarks that support their development; (3) identification of promising directions, including memory mechanisms, 4D perception, efficient adaptation, multi-agent cooperation, and other emerging capabilities. This survey consolidates recent advances to resolve inconsistencies in existing taxonomies, mitigate research fragmentation, and fill a critical gap through the systematic integration of studies at the intersection of large VLMs and robotic manipulation. We provide a regularly updated project page to document ongoing progress: https://github.com/JiuTian-VL/Large-VLM-based-VLA-for-Robotic-Manipulation.

Reinforced Context Order Recovery for Adaptive Reasoning and Planning

Authors:Long Ma, Fangwei Zhong, Yizhou Wang
Date:2025-08-18 16:42:55

Modern causal language models, followed by rapid developments in discrete diffusion models, can now produce a wide variety of interesting and useful content. However, these families of models are predominantly trained to output tokens with a fixed (left-to-right) or random order, which may deviate from the logical order in which tokens are generated originally. In this paper, we observe that current causal and diffusion models encounter difficulties in problems that require adaptive token generation orders to solve tractably, which we characterize with the $\mathcal{V}$-information framework. Motivated by this, we propose Reinforced Context Order Recovery (ReCOR), a reinforcement-learning-based framework to extract adaptive, data-dependent token generation orders from text data without annotations. Self-supervised by token prediction statistics, ReCOR estimates the hardness of predicting every unfilled token and adaptively selects the next token during both training and inference. Experiments on challenging reasoning and planning datasets demonstrate the superior performance of ReCOR compared with baselines, sometimes outperforming oracle models supervised with the ground-truth order.

Hierarchical Evaluation Function (HEF): A Multi-Metric Approach for Optimizing Demand Forecasting Models

Authors:Adolfo González, Víctor Parada
Date:2025-08-18 16:25:49

Demand forecasting is essential for strategic planning in competitive environments, enabling resource optimization and improved responsiveness to market dynamics. However, multivariate time series modeling faces challenges due to data complexity, uncertainty, and frequent regime shifts. Traditional evaluation metrics can introduce biases and limit generalization. This work compares two custom evaluation functions: FMAE (Focused Mean Absolute Error), focused on minimizing absolute errors, and HEF (Hierarchical Evaluation Function), designed to weight global metrics and penalize large deviations. Experiments were conducted under different data splits (91:9, 80:20, 70:30) using three optimizers (Grid Search, PSO, Optuna), assessing fit, relative accuracy, robustness, and computational efficiency. Results show that HEF consistently outperforms FMAE in global metrics (R2, Relative Accuracy, RMSE, RMSSE), enhancing model robustness and explanatory power. These findings were confirmed via visualizations and statistical tests. Conversely, FMAE offers advantages in local metrics (MAE, MASE) and execution time, making it suitable for short-term scenarios. The study highlights a methodological trade-off: HEF is ideal for strategic planning, while FMAE is better suited for operational efficiency. A replicable framework is proposed for optimizing predictive models in dynamic environments.
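The abstract does not define FMAE or HEF, so the following is only a minimal sketch of why an absolute-error metric and a squared-error metric can rank forecasts differently. It uses the standard MAE and RMSE; the function names and the toy demand series are illustrative assumptions, not the authors' evaluation functions.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: every unit of error counts equally."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: large deviations are penalized more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical demand series and two hypothetical forecasts.
actual = [100, 120, 80, 100]
forecast_a = [110, 130, 90, 110]   # off by 10 at every step
forecast_b = [100, 120, 80, 140]   # exact except for one miss of 40

print(mae(actual, forecast_a), mae(actual, forecast_b))    # 10.0 10.0
print(rmse(actual, forecast_a), rmse(actual, forecast_b))  # 10.0 20.0
```

Both forecasts have the same MAE, but the one with a single large miss has twice the RMSE. This is the kind of local-versus-global trade-off the abstract describes.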

BOW: Bayesian Optimization over Windows for Motion Planning in Complex Environments

Authors:Sourav Raxit, Abdullah Al Redwan Newaz, Paulo Padrao, Jose Fuentes, Leonardo Bobadilla
Date:2025-08-18 16:19:28

This paper introduces the BOW Planner, a scalable motion planning algorithm designed to navigate robots through complex environments using constrained Bayesian optimization (CBO). Unlike traditional methods, which often struggle with kinodynamic constraints such as velocity and acceleration limits, the BOW Planner excels by concentrating on a planning window of reachable velocities and employing CBO to sample control inputs efficiently. This approach enables the planner to manage high-dimensional objective functions and stringent safety constraints with minimal sampling, ensuring rapid and secure trajectory generation. Theoretical analysis confirms the algorithm's asymptotic convergence to near-optimal solutions, while extensive evaluations in cluttered and constrained settings reveal substantial improvements in computation times, trajectory lengths, and solution times compared to existing techniques. Successfully deployed across various real-world robotic systems, the BOW Planner demonstrates its practical significance through exceptional sample efficiency, safety-aware optimization, and rapid planning capabilities, making it a valuable tool for advancing robotic applications. The BOW Planner is released as an open-source package and videos of real-world and simulated experiments are available at https://bow-web.github.io.

On the complexity of constrained reconfiguration and motion planning

Authors:Nicolas Bousquet, Remy El Sabeh, Amer E. Mouawad, Naomi Nishimura
Date:2025-08-18 15:50:57

Coordinating the motion of multiple agents in constrained environments is a fundamental challenge in robotics, motion planning, and scheduling. A motivating example involves $n$ robotic arms, each represented as a line segment. The objective is to rotate each arm to its vertical orientation, one at a time (clockwise or counterclockwise), without collisions and without rotating any arm more than once. This scenario is an example of the more general $k$-Compatible Ordering problem, where $n$ agents, each capable of $k$ state-changing actions, must transition to specific target states under constraints encoded as a set $\mathcal{G}$ of $k$ pairs of directed graphs. We show that $k$-Compatible Ordering is $\mathsf{NP}$-complete, even when $\mathcal{G}$ is planar, degenerate, or acyclic. On the positive side, we provide polynomial-time algorithms for cases such as when $k = 1$ or $\mathcal{G}$ has bounded treewidth. We also introduce generalized variants supporting multiple state-changing actions per agent, broadening the applicability of our framework. These results extend to a wide range of scheduling, reconfiguration, and motion planning applications in constrained environments.

PC-Sampler: Position-Aware Calibration of Decoding Bias in Masked Diffusion Models

Authors:Pengcheng Huang, Shuhao Liu, Zhenghao Liu, Yukun Yan, Shuo Wang, Zulong Chen, Tong Xiao
Date:2025-08-18 15:38:37

Recent advances in masked diffusion models (MDMs) have established them as powerful non-autoregressive alternatives for sequence generation. Nevertheless, our preliminary experiments reveal that the generation quality of MDMs is still highly sensitive to the choice of decoding strategy. In particular, widely adopted uncertainty-based samplers suffer from two key limitations: a lack of global trajectory control and a pronounced bias toward trivial tokens in the early stages of decoding. These shortcomings restrict the full potential of MDMs. In this work, we introduce Position-Aware Confidence-Calibrated Sampling (PC-Sampler), a novel decoding strategy that unifies global trajectory planning with content-aware informativeness maximization. PC-Sampler incorporates a position-aware weighting mechanism to regulate the decoding path and a calibrated confidence score to suppress the premature selection of trivial tokens. Extensive experiments on three advanced MDMs across seven challenging benchmarks, including logical reasoning and planning tasks, demonstrate that PC-Sampler consistently outperforms existing MDM decoding strategies by more than 10% on average, significantly narrowing the performance gap with state-of-the-art autoregressive models. All code is available at https://github.com/NEUIR/PC-Sampler.

Vitamin N: Benefits of Different Forms of Public Greenery for Urban Health

Authors:Sanja Šćepanović, Sagar Joglekar, Stephen Law, Daniele Quercia, Ke Zhou, Alice Battiston, Rossano Schifanella
Date:2025-08-18 15:17:33

Urban greenery is often linked to better health, yet findings from past research have been inconsistent. One reason is that official greenery metrics measure the amount or nearness of greenery but ignore how often people may actually see or use it in daily life. To address this gap, we introduced a new classification that separates on-road greenery, which people see while walking through streets, from off-road greenery, which requires planned visits. We did so by combining aerial imagery of Greater London and greenery data from OpenStreetMap with quantified greenery from over 100,000 Google Street View images and accessibility estimates based on 160,000 road segments. We linked these measures to 7.45 billion medical prescriptions issued by the National Health Service and processed through our methodology. These prescriptions cover five conditions: diabetes, hypertension, asthma, depression, and anxiety, as well as opioid use. As hypothesized, we found that on-road greenery was more strongly linked to better health than four widely used official measures. For example, hypertension prescriptions dropped by 3.68% in wards with on-road greenery above the median citywide level compared to those below it. If all below-median wards reached the citywide median in on-road greenery, prescription costs could fall by up to £3.15 million each year. These results suggest that greenery seen in daily life may be more relevant than public yet secluded greenery, and that official metrics commonly used in the literature have important limitations.

Scaling Whole-body Multi-contact Manipulation with Contact Optimization

Authors:Victor Levé, João Moura, Sachiya Fujita, Tamon Miyake, Steve Tonneau, Sethu Vijayakumar
Date:2025-08-18 14:56:37

Daily tasks require us to use our whole body to manipulate objects, for instance when our hands are unavailable. We consider the issue of providing humanoid robots with the ability to autonomously perform similar whole-body manipulation tasks. In this context, the infinite possibilities for where and how contact can occur on the robot and object surfaces hinder the scalability of existing planning methods, which predominantly rely on discrete sampling. Given the continuous nature of contact surfaces, gradient-based optimization offers a more suitable approach for finding solutions. However, a key remaining challenge is the lack of an efficient representation of robot surfaces. In this work, we propose (i) a representation of robot and object surfaces that enables closed-form computation of proximity points, and (ii) a cost design that effectively guides whole-body manipulation planning. Our experiments demonstrate that the proposed framework can solve problems unaddressed by existing methods, and achieves a 77% improvement in planning time over the state of the art. We also validate the suitability of our approach on real hardware through the whole-body manipulation of boxes by a humanoid robot.

Multi-Phase Automated Segmentation of Dental Structures in CBCT Using a Lightweight Auto3DSeg and SegResNet Implementation

Authors:Dominic LaBella, Keshav Jha, Jared Robbins, Esther Yu
Date:2025-08-18 14:35:26

Cone-beam computed tomography (CBCT) has become an invaluable imaging modality in dentistry, enabling 3D visualization of teeth and surrounding structures for diagnosis and treatment planning. Automated segmentation of dental structures in CBCT can efficiently assist in identifying pathology (e.g., pulpal or periapical lesions) and facilitate radiation therapy planning in head and neck cancer patients. We describe the DLaBella29 team's approach for the MICCAI 2025 ToothFairy3 Challenge, which involves a deep learning pipeline for multi-class tooth segmentation. We utilized the MONAI Auto3DSeg framework with a 3D SegResNet architecture, trained on a subset of the ToothFairy3 dataset (63 CBCT scans) with 5-fold cross-validation. Key preprocessing steps included image resampling to 0.6 mm isotropic resolution and intensity clipping. We applied an ensemble fusion using Multi-Label STAPLE on the 5-fold predictions to infer a Phase 1 segmentation and then conducted tight cropping around the easily segmented Phase 1 mandible to perform Phase 2 segmentation on the smaller nerve structures. Our method achieved an average Dice of 0.87 on the ToothFairy3 challenge out-of-sample validation set. This paper details the clinical context, data preparation, model development, results of our approach, and discusses the relevance of automated dental segmentation for improving patient care in radiation oncology.
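For readers unfamiliar with the reported metric, here is a minimal sketch of the Dice coefficient for one binary label, computed over flattened voxel masks. The function name and toy masks are illustrative assumptions; the abstract does not say how the challenge averages Dice across classes or cases.

```python
def dice(pred, truth):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two equal-length binary masks."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Both masks are empty; treating this as perfect agreement is a convention.
        return 1.0
    return 2.0 * intersection / total

# Hypothetical flattened masks for a single structure (1 = voxel belongs to it).
pred_mask = [1, 1, 1, 0, 0, 0]
true_mask = [0, 1, 1, 1, 0, 0]

print(dice(pred_mask, true_mask))  # two shared voxels out of 3 + 3, i.e. 2/3
```

A multi-class score would typically apply this per label and then average the results.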

Simultaneous Contact Sequence and Patch Planning for Dynamic Locomotion

Authors:Victor Dhédin, Haizhou Zhao, Majid Khadiv
Date:2025-08-18 13:53:38

Legged robots have the potential to traverse highly constrained environments with agile maneuvers. However, planning such motions requires solving a highly challenging optimization problem with a mixture of continuous and discrete decision variables. In this paper, we present a full pipeline based on Monte-Carlo tree search (MCTS) and whole-body trajectory optimization (TO) to perform simultaneous contact sequence and patch selection in highly challenging environments. Through extensive simulation experiments, we show that our framework can quickly find a diverse set of dynamically consistent plans. We experimentally show that these plans are transferable to a real quadruped robot. We further show that the same framework can find highly complex acyclic humanoid maneuvers. To the best of our knowledge, this is the first demonstration of simultaneous contact sequence and patch selection for acyclic multi-contact locomotion using the whole-body dynamics of a quadruped.

Evaluating the Quality of Open Building Datasets for Mapping Urban Inequality: A Comparative Analysis Across 5 Cities

Authors:Franz Okyere, Meng Lu, Ansgar Brunn
Date:2025-08-18 12:14:57

While informal settlements lack focused development and are highly dynamic, the quality of spatial data for these places may be uncertain. This study evaluates the quality and biases of AI-generated Open Building Datasets (OBDs) generated by Google and Microsoft against OpenStreetMap (OSM) data, across diverse global cities including Accra, Nairobi, Caracas, Berlin, and Houston. The Intersection over Union (IoU), overlap analysis and a positional accuracy algorithm are used to analyse the similarity and alignment of the datasets. The paper also analyses the size distribution of the building polygon area, and completeness using predefined but regular spatial units. The results indicate significant variance in data quality, with Houston and Berlin demonstrating high alignment and completeness, reflecting their structured urban environments. There are gaps in the datasets analysed, and cities like Accra and Caracas may be under-represented. This could highlight difficulties in capturing complex or informal regions. The study also notes different building size distributions, which may be indicative of the global socio-economic divide. These findings may emphasise the need to consider the quality of global building datasets to avoid misrepresentation, which is an important element of planning and resource distribution.
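As a rough illustration of the Intersection over Union measure mentioned above, the sketch below computes IoU for two axis-aligned rectangles. This is a simplifying assumption: real building footprints are arbitrary polygons and would need a geometry library. The function name and coordinates are hypothetical.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (xmin, ymin, xmax, ymax)."""
    # Width and height of the overlap region (non-positive if the boxes are disjoint).
    overlap_w = min(a[2], b[2]) - max(a[0], b[0])
    overlap_h = min(a[3], b[3]) - max(a[1], b[1])
    if overlap_w <= 0 or overlap_h <= 0:
        return 0.0
    intersection = overlap_w * overlap_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return intersection / (area_a + area_b - intersection)

# Hypothetical footprints of one building in two datasets, in metres.
footprint_a = (0.0, 0.0, 10.0, 10.0)
footprint_b = (5.0, 0.0, 15.0, 10.0)

print(box_iou(footprint_a, footprint_b))  # overlap 50 over union 150, i.e. 1/3
```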

CAMAR: Continuous Actions Multi-Agent Routing

Authors:Artem Pshenitsyn, Aleksandr Panov, Alexey Skrynnik
Date:2025-08-18 11:32:26

Multi-agent reinforcement learning (MARL) is a powerful paradigm for solving cooperative and competitive decision-making problems. While many MARL benchmarks have been proposed, few combine continuous state and action spaces with challenging coordination and planning tasks. We introduce CAMAR, a new MARL benchmark designed explicitly for multi-agent pathfinding in environments with continuous actions. CAMAR supports cooperative and competitive interactions between agents and runs efficiently at up to 100,000 environment steps per second. We also propose a three-tier evaluation protocol to better track algorithmic progress and enable deeper analysis of performance. In addition, CAMAR allows the integration of classical planning methods such as RRT and RRT* into MARL pipelines. We use them as standalone baselines and combine RRT* with popular MARL algorithms to create hybrid approaches. We provide a suite of test scenarios and benchmarking tools to ensure reproducibility and fair comparison. Experiments show that CAMAR presents a challenging and realistic testbed for the MARL community.

Scaling Multi-Agent Epistemic Planning through GNN-Derived Heuristics

Authors:Giovanni Briglia, Francesco Fabiano, Stefano Mariani
Date:2025-08-18 11:26:20

Multi-agent Epistemic Planning (MEP) is an autonomous planning framework for reasoning about both the physical world and the beliefs of agents, with applications in domains where information flow and awareness among agents are critical. The richness of MEP requires states to be represented as Kripke structures, i.e., directed labeled graphs. This representation limits the applicability of existing heuristics, hindering the scalability of epistemic solvers, which must explore an exponential search space without guidance, resulting often in intractability. To address this, we exploit Graph Neural Networks (GNNs) to learn patterns and relational structures within epistemic states, to guide the planning process. GNNs, which naturally capture the graph-like nature of Kripke models, allow us to derive meaningful estimates of state quality -- e.g., the distance from the nearest goal -- by generalizing knowledge obtained from previously solved planning instances. We integrate these predictive heuristics into an epistemic planning pipeline and evaluate them against standard baselines, showing significant improvements in the scalability of multi-agent epistemic planning.

The Geometry of Motion, Vol. I: Mechanics as Geometries

Authors:Patrick Iglesias-Zemmour
Date:2025-08-18 10:46:30

This work presents a group-theoretic interpretation of the historical evolution of mechanics, proposing that each fundamental theory of motion corresponds to a distinct geometry in the sense of Felix Klein. The character of each geometry is uniquely determined by its Inertia Group, the group of spacetime transformations that preserves its privileged class of inertial motions. We trace the three major epistemological ruptures in the history of mechanics by translating foundational physical principles into the unambiguous language of group theory. The analysis begins with Aristotelian mechanics, where absolute Space and Time are shown to be homogeneous spaces of the Group of Aristotle, defined by its preservation of rest. The second rupture, driven by the relativity of motion (Bruno, Galileo), leads to the abandonment of absolute Space and the construction of the Galilean Group, which preserves the class of uniform rectilinear motions. The final rupture, precipitated by the crisis in electromagnetism, results in the dissolution of absolute Time and the emergence of the Poincaré Group, preserving affine lines and the Minkowski metric. Central results of this approach are theorems demonstrating, with mathematical certainty, the non-existence of a Galilean-invariant "Space" and a Poincaré-invariant "Time," where invariance is defined with respect to the action of the corresponding inertia group. This geometric framework provides a unified perspective on the transition from classical to modern physics and allows for a rigorous distinction between 'primary' epistemological ruptures, which alter the Inertia Group itself, and 'secondary' ruptures, which introduce new formalisms for the management of dynamics within a stable geometry.

Relativistic atomic structure calculations in support of spectroscopy

Authors:L. F. Pašteka, E. Eliav, M. L. Reitsma, A. Borschevsky
Date:2025-08-18 09:23:27

Theory can provide important support at all the stages of spectroscopic experiments, from planning the measurements to the interpretation of the results. Such support is particularly valuable for the challenging experiments on heavy, unstable, and superheavy elements and for precision measurements aimed at testing the Standard Model of particle physics. To be reliable and useful in an experimental context, theoretical predictions should be based on high-accuracy calculations. For heavy elements, such calculations must treat both relativistic effects and electron correlation at the highest possible level. Relativistic coupled cluster is considered one of the most powerful methods for accurate calculations on heavy many-electron atoms and molecules. This approach is highly accurate and versatile and can be used to obtain energies and a variety of atomic and molecular properties. Furthermore, its robust and transparent formulation allows for systematic improvement of the accuracy of the calculated results and for assigning uncertainties on theoretical values. The Fock-space coupled cluster (FSCC) variant of this method is particularly useful in the context of spectroscopic measurements as it provides access to atomic spectra and properties of the excited states. In this review, we present in detail the relativistic coupled cluster approach and its FSCC variant. We provide a description of the computational procedure used for accurate calculations and for assigning uncertainties. Outstanding recent examples of application to atomic properties, focusing on the experimental context, are presented. Finally, we provide a brief discussion of the perspectives for future developments and applications of the CC approach.

MCTR: Midpoint Corrected Triangulation for Autonomous Racing via Digital Twin Simulation in CARLA

Authors:Junhao Ye, Cheng Hu, Yiqin Wang, Weizhan Huang, Nicolas Baumann, Jie He, Meixun Qu, Lei Xie, Hongye Su
Date:2025-08-18 08:53:07

In autonomous racing, reactive controllers eliminate the computational burden of the full See-Think-Act autonomy stack by directly mapping sensor inputs to control actions. This bypasses the need for explicit localization and trajectory planning. A widely adopted baseline in this category is the Follow-The-Gap method, which performs trajectory planning using LiDAR data. Building on FTG, the Delaunay Triangulation-based Racing algorithm introduces further enhancements. However, DTR's use of circumcircles for trajectory generation often results in insufficiently smooth paths, ultimately degrading performance. Additionally, the commonly used F1TENTH-simulator for autonomous racing competitions lacks support for 3D LiDAR perception, limiting its effectiveness in realistic testing. To address these challenges, this work proposes the MCTR algorithm. MCTR improves trajectory smoothness through the use of Curvature Corrected Moving Average and implements a digital twin system within the CARLA simulator to validate the algorithm's robustness under 3D LiDAR perception. The proposed algorithm has been thoroughly validated through both simulation and real-world vehicle experiments.
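The Follow-The-Gap baseline mentioned above can be sketched in a few lines. This is a simplified, assumed version of the general idea (mask a safety bubble around the nearest obstacle, then steer toward the middle of the widest free gap). It is not the FTG, DTR, or MCTR implementation from the paper, and the parameter names and thresholds are hypothetical.

```python
def follow_the_gap(ranges, angle_min, angle_step, bubble=2, min_clear=1.0):
    """Return a steering angle toward the middle of the widest free gap, or None."""
    # A beam is free if its range exceeds the clearance threshold.
    free = [r > min_clear for r in ranges]
    # Mask a safety bubble of beams around the closest obstacle.
    closest = min(range(len(ranges)), key=lambda i: ranges[i])
    for i in range(max(0, closest - bubble), min(len(ranges), closest + bubble + 1)):
        free[i] = False
    # Find the longest run of consecutive free beams (sentinel False closes the last run).
    best_start, best_len, start = 0, 0, None
    for i, ok in enumerate(free + [False]):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    if best_len == 0:
        return None  # no free gap found
    middle = best_start + (best_len - 1) / 2
    return angle_min + middle * angle_step

# Hypothetical 9-beam scan from -40 to +40 degrees in 10-degree steps (ranges in metres).
scan = [0.5, 0.6, 3.0, 3.2, 3.1, 3.0, 0.8, 0.4, 0.7]
print(follow_the_gap(scan, angle_min=-40.0, angle_step=10.0, bubble=1))  # -5.0
```

Here the widest gap spans beams 2 to 5, so the heading is the midpoint between -20 and +10 degrees.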

The Maximum Coverage Model and Recommendation System for UAV Vertiports Location Planning

Authors:Chunliang Hua, Xiao Hu, Jiayang Sun, Zeyuan Yang
Date:2025-08-18 06:31:08

As urban aerial mobility (UAM) infrastructure development accelerates globally, cities like Shenzhen are planning large-scale vertiport networks (e.g., 1,200+ facilities by 2026). Existing planning frameworks remain inadequate for this complexity due to historical limitations in data granularity and real-world applicability. This paper addresses these gaps by first proposing the Capacitated Dynamic Maximum Covering Location Problem (CDMCLP), a novel optimization framework that simultaneously models urban-scale spatial-temporal demand, heterogeneous user behaviors, and infrastructure capacity constraints. Building on this foundation, we introduce an Integrated Planning Recommendation System that combines CDMCLP with socio-economic factors and dynamic clustering initialization. This system leverages adaptive parameter tuning based on empirical user behavior to generate practical planning solutions. Validation in a Chinese center city demonstrates the effectiveness of the new optimization framework and recommendation system. Under the evaluation and optimization of CDMCLP, the quantitative performance of traditional location methods is exposed and can be improved by 38%-52%, while the recommendation system shows user-friendliness and the effective integration of complex elements. By integrating mathematical rigor with practical implementation considerations, this hybrid approach bridges the gap between theoretical location modeling and real-world UAM infrastructure planning, offering municipalities a pragmatic tool for vertiport network design.
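To make the underlying idea concrete, here is a minimal sketch of the classic, uncapacitated and static maximum covering location problem, solved with the standard greedy heuristic. It is not the CDMCLP formulation or the authors' solver: capacity limits, time-varying demand, and user behavior are all omitted, and the site names and demand values are hypothetical.

```python
def greedy_max_coverage(coverage, demand, p):
    """Pick up to p sites, each time adding the site that covers the most new demand."""
    chosen, covered = [], set()
    for _ in range(p):
        best_site, best_gain = None, 0
        for site, points in coverage.items():
            if site in chosen:
                continue
            # Demand this site would add beyond what is already covered.
            gain = sum(demand[pt] for pt in points - covered)
            if gain > best_gain:
                best_site, best_gain = site, gain
        if best_site is None:
            break  # no remaining site adds any demand
        chosen.append(best_site)
        covered |= coverage[best_site]
    return chosen, sum(demand[pt] for pt in covered)

# Hypothetical demand points (trips per day) and candidate vertiport sites.
demand = {"a": 5, "b": 3, "c": 4, "d": 2}
coverage = {"s1": {"a", "b"}, "s2": {"b", "c"}, "s3": {"c", "d"}}

print(greedy_max_coverage(coverage, demand, p=2))  # (['s1', 's3'], 14)
```

The greedy rule is a well-known approximation for maximum coverage. An exact model would normally be written as an integer program.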

ViLaD: A Large Vision Language Diffusion Framework for End-to-End Autonomous Driving

Authors:Can Cui, Yupeng Zhou, Juntong Peng, Sung-Yeon Park, Zichong Yang, Prashanth Sankaranarayanan, Jiaru Zhang, Ruqi Zhang, Ziran Wang
Date:2025-08-18 04:01:56

End-to-end autonomous driving systems built on Vision Language Models (VLMs) have shown significant promise, yet their reliance on autoregressive architectures introduces some limitations for real-world applications. The sequential, token-by-token generation process of these models results in high inference latency and precludes bidirectional reasoning, making them unsuitable for dynamic, safety-critical environments. To overcome these challenges, we introduce ViLaD, a novel Large Vision Language Diffusion (LVLD) framework for end-to-end autonomous driving that represents a paradigm shift. ViLaD leverages a masked diffusion model that enables parallel generation of entire driving decision sequences, significantly reducing computational latency. Moreover, its architecture supports bidirectional reasoning, allowing the model to consider both past and future simultaneously, and supports progressive easy-first generation to iteratively improve decision quality. We conduct comprehensive experiments on the nuScenes dataset, where ViLaD outperforms state-of-the-art autoregressive VLM baselines in both planning accuracy and inference speed, while achieving a near-zero failure rate. Furthermore, we demonstrate the framework's practical viability through a real-world deployment on an autonomous vehicle for an interactive parking task, confirming its effectiveness and soundness for practical applications.

The highest mass Kepler red giants -- II. Spectroscopic parameters, the amplitude-activity relation, and unexpected halo orbits

Authors:Courtney L. Crawford, Yaguang Li, Daniel Huber, Jie Yu, Timothy R. Bedding, Sarah L. Martell, Benjamin T. Montet, Dennis Stello, Howard Isaacson, Andrew W. Howard, Benjamin J. Fulton, Jingwen Zhang, Alex S. Polanski, Lauren M. Weiss
Date:2025-08-18 02:41:53

The high-mass ($M > 2\,M_\odot$) Kepler red giant stars are less well-studied than their lower-mass counterparts. In the previous article, we presented a sample of 48 high-mass Kepler red giants and measured their asteroseismic parameters. This article presents spectroscopic measurements from the same sample, using high-resolution Keck/HIRES spectra to determine $T_{\rm eff}$, [Fe/H], $\log g$, and $v \sin i$. We refined our previous estimates of the stellar masses and radii based on the new $T_{\rm eff}$. We also examined spectral features that could indicate binary activity, such as the Li line and [C/N] ratios. We found no Li-rich stars or clear [C/N] anomalies, but we observed a correlation between [C/N] and [Fe/H]. We measured chromospheric activity using the $S$-index of the Ca II H & K lines and found no correlation with internal magnetic fields. However, we confirmed an anti-correlation between surface chromospheric activity and radial mode oscillation amplitudes, which indicates that strong surface magnetic fields weaken stellar oscillations. Finally, we used the Gaia DR3 astrometric data to show that our sample of stars has orbits consistent with all three Galactic kinematic regions. Although these stars are quite young, their orbits carry them into the thick disk and even the halo, raising questions about the accuracy and viability of kinematics in unravelling Galactic history. In future work, we plan to use the spectroscopic parameters measured here to provide better constraints for boutique frequency modelling, which will allow us to test the asteroseismic scaling relations at the high-mass regime.

Techno-Economic Planning of Spatially-Resolved Battery Storage Systems in Renewable-Dominant Grids Under Weather Variability

Authors:Seyed Ehsan Ahmadi, Elnaz Kabir, Mohammad Fattahi, Mousa Marzband, Dongjun Li
Date:2025-08-17 23:33:36

The ongoing energy transition is significantly increasing the share of renewable energy sources (RES) in power systems; however, their intermittency and variability pose substantial challenges, including load shedding and system congestion. This study examines the role of the battery storage system (BSS) in mitigating these challenges by balancing power supply and demand. We optimize the location, size, and type of batteries using a two-stage stochastic program, with the second stage involving hourly operational decisions over an entire year. Unlike previous research, we incorporate the comprehensive technical and economic characteristics of battery technologies. The New York State (NYS) power system, currently undergoing a significant shift towards increased RES generation, serves as our case study. Using available load and weather data from 1980-2019, we account for the uncertainty of both load and RES generation through a sample average approximation approach. Our findings indicate that BSS can reduce renewable curtailment by 34% and load shedding by 21%, contributing to a more resilient power system in achieving NYS 2030 energy targets. Furthermore, the cost of employing BSS for the reduction of load shedding and RES curtailment does not increase linearly with additional capacity, revealing a complex relationship between costs and renewable penetration. This study provides valuable insights into strategic BSS deployment for a cost-effective and reliable power system in the energy transition, as well as into the feasibility of the NYS 2030 energy targets.
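
To illustrate the idea of a two-stage stochastic program solved by sample average approximation (SAA), here is a deliberately tiny Python sketch. It is not the paper's model: the single capacity variable, the function name `saa_capacity`, the cost figures, and the scenario values are all hypothetical, and the second stage is reduced to a one-line shedding calculation, whereas the paper's second stage covers hourly operation over a year.

```python
import numpy as np

def saa_capacity(shortfalls, unit_cost, shed_penalty, candidates):
    """Choose a first-stage capacity by sample average approximation.

    shortfalls   : sampled supply shortfalls, one per scenario (hypothetical data)
    unit_cost    : investment cost per unit of capacity
    shed_penalty : penalty per unit of unmet demand
    candidates   : capacities to evaluate
    """
    shortfalls = np.asarray(shortfalls, dtype=float)
    best_c, best_obj = None, np.inf
    for c in candidates:
        # Second stage (recourse): unmet demand in each scenario given capacity c.
        shed = np.maximum(shortfalls - c, 0.0)
        # SAA objective: investment cost plus the scenario-average penalty.
        obj = unit_cost * c + shed_penalty * shed.mean()
        if obj < best_obj:
            best_c, best_obj = c, obj
    return best_c, best_obj

# Four equally weighted toy scenarios and four candidate capacities.
capacity, cost = saa_capacity([0, 10, 20, 30], unit_cost=1.0,
                              shed_penalty=3.0, candidates=[0, 10, 20, 30])
print(capacity, cost)
```

The expectation over uncertain load and generation is replaced by an average over sampled scenarios, which is the core of SAA; a realistic model would replace the enumeration with a mathematical programming solver.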

An Initial Study of Bird's-Eye View Generation for Autonomous Vehicles using Cross-View Transformers

Authors:Felipe Carlos dos Santos, Eric Aislan Antonelo, Gustavo Claudio Karl Couto
Date:2025-08-17 23:05:00

Bird's-Eye View (BEV) maps provide a structured, top-down abstraction that is crucial for autonomous-driving perception. In this work, we employ Cross-View Transformers (CVT) for learning to map camera images to three BEV channels - road, lane markings, and planned trajectory - using a realistic simulator for urban driving. Our study examines generalization to unseen towns, the effect of different camera layouts, and two loss formulations (focal and L1). Using training data from only a single town, a four-camera CVT trained with the L1 loss delivers the most robust test performance when evaluated in a new town. Overall, our results underscore CVT's promise for mapping camera inputs to reasonably accurate BEV maps.
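
For readers unfamiliar with the two loss formulations mentioned above, the following NumPy sketch shows the usual textbook definitions of the L1 loss and the binary focal loss on a per-pixel probability map. The abstract does not give the authors' exact variants (for example, class weighting or the value of the focusing parameter), so `gamma=2.0` and the unweighted form are assumptions made for illustration.

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error between predicted and target maps."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean(np.abs(pred - target)))

def focal_loss(prob, target, gamma=2.0, eps=1e-7):
    """Unweighted binary focal loss: mean of -(1 - p_t)^gamma * log(p_t)."""
    prob = np.clip(np.asarray(prob, dtype=float), eps, 1.0 - eps)
    target = np.asarray(target)
    # p_t is the probability the model assigns to the true class of each pixel.
    p_t = np.where(target == 1, prob, 1.0 - prob)
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))

# Toy two-pixel "map": both pixels are predicted fairly confidently and correctly.
prob = np.array([0.9, 0.1])
target = np.array([1, 0])
print(l1_loss(prob, target), focal_loss(prob, target))
```

With `gamma=0` the focal loss reduces to ordinary binary cross-entropy; larger values down-weight pixels that are already classified well, which matters when most BEV pixels are background.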

An Introduction to Sliced Optimal Transport

Authors:Khai Nguyen
Date:2025-08-17 22:53:19

Sliced Optimal Transport (SOT) is a rapidly developing branch of optimal transport (OT) that exploits the tractability of one-dimensional OT problems. By combining tools from OT, integral geometry, and computational statistics, SOT enables fast and scalable computation of distances, barycenters, and kernels for probability measures, while retaining rich geometric structure. This paper provides a comprehensive review of SOT, covering its mathematical foundations, methodological advances, computational methods, and applications. We discuss key concepts of OT and one-dimensional OT, the role of tools from integral geometry such as the Radon transform in projecting measures, and statistical techniques for estimating sliced distances. The paper further explores recent methodological advances, including non-linear projections, improved Monte Carlo approximations, statistical estimation techniques for one-dimensional optimal transport, weighted slicing techniques, and transportation plan estimation methods. Variational problems, such as minimum sliced Wasserstein estimation, barycenters, gradient flows, kernel constructions, and embeddings, are examined alongside extensions to unbalanced, partial, multi-marginal, and Gromov-Wasserstein settings. Applications span machine learning, statistics, computer graphics, and computer vision, highlighting SOT's versatility as a practical computational tool. This work will be of interest to researchers and practitioners in machine learning, data sciences, and computational disciplines seeking efficient alternatives to classical OT.
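
The core computational idea, projecting onto random directions and solving one-dimensional OT by sorting, can be sketched in a few lines of NumPy. This is a minimal Monte Carlo estimator under simplifying assumptions (two empirical samples of equal size with uniform weights, linear projections); the function name and defaults are illustrative and not taken from the paper.

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=200, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance.

    x, y : arrays of shape (n, d) holding two equal-size samples.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Random directions drawn uniformly on the unit sphere.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both samples onto every direction: shape (n, n_projections).
    x_proj = x @ theta.T
    y_proj = y @ theta.T
    # One-dimensional OT between equal-size samples is solved by sorting.
    x_sorted = np.sort(x_proj, axis=0)
    y_sorted = np.sort(y_proj, axis=0)
    # Average the p-th power cost over points and directions, then take the root.
    return float(np.mean(np.abs(x_sorted - y_sorted) ** p) ** (1.0 / p))

# Toy check: a Gaussian cloud and a shifted copy of it.
rng = np.random.default_rng(1)
x = rng.normal(size=(500, 2))
y = x + np.array([3.0, 0.0])
print(sliced_wasserstein(x, y))
```

For a pure translation by a vector t in d dimensions, the sliced 2-Wasserstein distance equals the norm of t divided by the square root of d, so the printed value should be close to 3/sqrt(2) up to Monte Carlo error. Each projection costs one sort, which is what makes the approach scale.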

Machine Learning-Based Manufacturing Cost Prediction from 2D Engineering Drawings via Geometric Features

Authors:Ahmet Bilal Arıkan, Şener Özönder, Mustafa Taha Koçyiğit, Hüseyin Oktay Altun, H. Kübra Küçükkartal, Murat Arslanoğlu, Fatih Çağırankaya, Berk Ayvaz
Date:2025-08-17 17:16:38

We present an integrated machine learning framework that transforms how manufacturing cost is estimated from 2D engineering drawings. Unlike traditional quotation workflows that require labor-intensive process planning, our approach extracts about 200 geometric and statistical descriptors directly from 13,684 DWG drawings of automotive suspension and steering parts spanning 24 product groups. Gradient-boosted decision tree models (XGBoost, CatBoost, LightGBM) trained on these features achieve nearly 10% mean absolute percentage error across groups, demonstrating robust scalability beyond part-specific heuristics. By coupling cost prediction with explainability tools such as SHAP, the framework identifies geometric design drivers including rotated dimension maxima, arc statistics and divergence metrics, offering actionable insights for cost-aware design. This end-to-end CAD-to-cost pipeline shortens quotation lead times, ensures consistent and transparent cost assessments across part families and provides a deployable pathway toward real-time, ERP-integrated decision support in Industry 4.0 manufacturing environments.
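
As a reminder of what the headline metric measures, the snippet below computes mean absolute percentage error (MAPE) in NumPy. The cost values are made up for illustration and do not come from the paper.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Assumes no true value is zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

# Hypothetical true and predicted part costs, each off by 10% of the true value.
true_cost = [100.0, 200.0, 50.0]
pred_cost = [110.0, 180.0, 55.0]
print(mape(true_cost, pred_cost))  # approximately 10
```

Because each error is scaled by the true cost, MAPE treats a 10% miss on a cheap part the same as a 10% miss on an expensive one, which suits comparisons across heterogeneous product groups.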

Geodesic Tracing-Based Kinematic Integration of Rolling and Sliding Contact on Manifold Meshes for Dexterous In-Hand Manipulation

Authors:Sunyu Wang, Arjun S. Lakshmipathy, Jean Oh, Nancy S. Pollard
Date:2025-08-17 17:13:25

Reasoning about rolling and sliding contact, or roll-slide contact for short, is critical for dexterous manipulation tasks that involve intricate geometries. But existing works on roll-slide contact mostly focus on continuous shapes with differentiable parametrizations. This work extends roll-slide contact modeling to manifold meshes. Specifically, we present an integration scheme based on geodesic tracing to first-order time-integrate roll-slide contact directly on meshes, enabling dexterous manipulation to reason over high-fidelity discrete representations of an object's true geometry. Using our method, we planned dexterous motions of a multi-finger robotic hand manipulating five objects in-hand in simulation. The planning was achieved with a least-squares optimizer that strives to maintain the most stable instantaneous grasp by minimizing contact sliding and spinning. Then, we evaluated our method against a baseline using collision detection and a baseline using primitive shapes. The results show that our method performed the best in accuracy and precision, even for coarse meshes. We conclude with a future work discussion on incorporating multiple contacts and contact forces to achieve accurate and robust mesh-based surface contact modeling.

LMAD: Integrated End-to-End Vision-Language Model for Explainable Autonomous Driving

Authors:Nan Song, Bozhou Zhang, Xiatian Zhu, Jiankang Deng, Li Zhang
Date:2025-08-17 15:42:54

Large vision-language models (VLMs) have shown promising capabilities in scene understanding, enhancing the explainability of driving behaviors and interactivity with users. Existing methods primarily fine-tune VLMs on on-board multi-view images and scene reasoning text, but this approach often lacks the holistic and nuanced scene recognition and powerful spatial awareness required for autonomous driving, especially in complex situations. To address this gap, we propose a novel vision-language framework tailored for autonomous driving, called LMAD. Our framework emulates modern end-to-end driving paradigms by incorporating comprehensive scene understanding and a task-specialized structure with VLMs. In particular, we introduce preliminary scene interaction and specialized expert adapters within the same driving task structure, which better align VLMs with autonomous driving scenarios. Furthermore, our approach is designed to be fully compatible with existing VLMs while seamlessly integrating with planning-oriented driving systems. Extensive experiments on the DriveLM and nuScenes-QA datasets demonstrate that LMAD significantly boosts the performance of existing VLMs on driving reasoning tasks, setting a new standard in explainable autonomous driving.

Engaging young people for a more inclusive national energy transition: A participatory modelling framework

Authors:Muhammad Shahzad Javed, Karin Fossheim, Maximilian Roithner, Matylda N. Guzik, James Price, Beate Seibt, Marianne Zeyringer
Date:2025-08-17 12:51:37

Participatory research in energy system modelling can generate bottom-up knowledge to explore co-designed future net-zero energy system scenarios. However, it often fails to facilitate collective learning, explore explicitly informed perspectives, and frequently ignores underrepresented groups like youth, among whom distrust about the energy transformation process is growing. By modifying a national electricity system model to reflect young people's socio-techno-environmental insights gathered through school workshops, this study presents a framework for envisioning future net-zero power systems in Norway. Given pupil priorities regarding certain power system aspects and their cumulative impact, substantial shifts occur in national renewable capacity potentials (approximately plus or minus 50%), system costs (-7% to +25%), technology mixes (notably onshore wind from 40% to 0%), transmission capacities (near doubling), and regional equity assessments. We find that costly youth-driven system designs do not necessarily guarantee equitable systems. Although applied to young people in Norway, the proposed workshop-informed modelling framework serves as a tool to meaningfully engage diverse groups and capture their perspectives, thereby further democratising energy system planning. The approach is expected to help address social acceptance challenges through enhanced understanding of trade-offs in the energy transformation process.

EgoLoc: A Generalizable Solution for Temporal Interaction Localization in Egocentric Videos

Authors:Junyi Ma, Erhang Zhang, Yin-Dong Zheng, Yuchen Xie, Yixuan Zhou, Hesheng Wang
Date:2025-08-17 12:38:56

Analyzing hand-object interaction in egocentric vision facilitates VR/AR applications and human-robot policy transfer. Existing research has mostly focused on modeling the behavior paradigm of interactive actions (i.e., "how to interact"). However, the more challenging and fine-grained problem of capturing the critical moments of contact and separation between the hand and the target object (i.e., "when to interact") is still underexplored, which is crucial for immersive interactive experiences in mixed reality and robotic motion planning. Therefore, we formulate this problem as temporal interaction localization (TIL). Some recent works extract semantic masks as TIL references, but suffer from inaccurate object grounding and cluttered scenarios. Although current temporal action localization (TAL) methods perform well in detecting verb-noun action segments, they rely on category annotations during training and exhibit limited precision in localizing hand-object contact/separation moments. To address these issues, we propose a novel zero-shot approach dubbed EgoLoc to localize hand-object contact and separation timestamps in egocentric videos. EgoLoc introduces hand-dynamics-guided sampling to generate high-quality visual prompts. It exploits the vision-language model to identify contact/separation attributes, localize specific timestamps, and provide closed-loop feedback for further refinement. EgoLoc eliminates the need for object masks and verb-noun taxonomies, leading to generalizable zero-shot implementation. Comprehensive experiments on the public dataset and our novel benchmarks demonstrate that EgoLoc achieves plausible TIL for egocentric videos. It is also validated to effectively facilitate multiple downstream applications in egocentric vision and robotic manipulation tasks. Code and relevant data will be released at https://github.com/IRMVLab/EgoLoc.

Mantis: A Simulation-Grounded Foundation Model for Disease Forecasting

Authors:Carson Dudley, Reiden Magdaleno, Christopher Harding, Ananya Sharma, Emily Martin, Marisa Eisenberg
Date:2025-08-17 06:55:29

Infectious disease forecasting in novel outbreaks or low-resource settings has been limited by the need for disease-specific data, bespoke training, and expert tuning. We introduce Mantis, a foundation model trained entirely on mechanistic simulations, which enables out-of-the-box forecasting across diseases, regions, and outcomes, even in settings with limited historical data. Mantis is built on over 400 million simulated days of outbreak dynamics spanning diverse pathogens, transmission modes, interventions, and surveillance artifacts. Despite requiring no real-world data during training, Mantis outperformed 39 expert-tuned models we tested across six diseases, including all models in the CDC's COVID-19 Forecast Hub. Mantis generalized to novel epidemiological regimes, including diseases with held-out transmission mechanisms, demonstrating that it captures fundamental contagion dynamics. Critically, Mantis is mechanistically interpretable, enabling public health decision-makers to identify the latent drivers behind its predictions. Finally, Mantis delivers accurate forecasts at 8-week horizons, more than doubling the actionable range of most models, enabling proactive public health planning. Together, these capabilities position Mantis as a foundation for next-generation disease forecasting systems: general, interpretable, and deployable where traditional models fail.