planning - 2025-11-10

EveryDayVLA: A Vision-Language-Action Model for Affordable Robotic Manipulation

Authors:Samarth Chopra, Alex McMoil, Ben Carnovale, Evan Sokolson, Rajkumar Kubendran, Samuel Dickerson

Date:2025-11-07 16:24:45

While Vision-Language-Action (VLA) models map visual inputs and language instructions directly to robot actions, they often rely on costly hardware and struggle in novel or cluttered scenes. We introduce EverydayVLA, a 6-DOF manipulator that can be assembled for under $300, capable of modest payloads and workspace. A single unified model jointly outputs discrete and continuous actions, and our adaptive-horizon ensemble monitors motion uncertainty to trigger on-the-fly re-planning for safe, reliable operation. On LIBERO, EverydayVLA matches state-of-the-art success rates, and in real-world tests it outperforms prior methods by 49% in-distribution and 34.9% out-of-distribution. By combining a state-of-the-art VLA with cost-effective hardware, EverydayVLA democratizes access to a robotic foundation model and paves the way for economical use in homes and research labs alike. Experiment videos and details: https://everydayvla.github.io/

Reasoning Is All You Need for Urban Planning AI

Authors:Sijie Yang, Jiatong Li, Filip Biljecki

Date:2025-11-07 15:59:06

AI has proven highly successful at urban planning analysis -- learning patterns from data to predict future conditions. The next frontier is AI-assisted decision-making: agents that recommend sites, allocate resources, and evaluate trade-offs while reasoning transparently about constraints and stakeholder values. Recent breakthroughs in reasoning AI -- CoT prompting, ReAct, and multi-agent collaboration frameworks -- now make this vision achievable. This position paper presents the Agentic Urban Planning AI Framework for reasoning-capable planning agents that integrates three cognitive layers (Perception, Foundation, Reasoning) with six logic components (Analysis, Generation, Verification, Evaluation, Collaboration, Decision) through a multi-agents collaboration framework. We demonstrate why planning decisions require explicit reasoning capabilities that are value-based (applying normative principles), rule-grounded (guaranteeing constraint satisfaction), and explainable (generating transparent justifications) -- requirements that statistical learning alone cannot fulfill. We compare reasoning agents with statistical learning, present a comprehensive architecture with benchmark evaluation metrics, and outline critical research challenges. This framework shows how AI agents can augment human planners by systematically exploring solution spaces, verifying regulatory compliance, and deliberating over trade-offs transparently -- not replacing human judgment but amplifying it with computational reasoning capabilities.

SAD-Flower: Flow Matching for Safe, Admissible, and Dynamically Consistent Planning

Authors:Tzu-Yuan Huang, Armin Lederer, Dai-Jie Wu, Xiaobing Dai, Sihua Zhang, Stefan Sosnowski, Shao-Hua Sun, Sandra Hirche

Date:2025-11-07 15:46:44

Flow matching (FM) has shown promising results in data-driven planning. However, it inherently lacks formal guarantees for ensuring state and action constraints, whose satisfaction is a fundamental and crucial requirement for the safety and admissibility of planned trajectories on various systems. Moreover, existing FM planners do not ensure the dynamical consistency, which potentially renders trajectories inexecutable. We address these shortcomings by proposing SAD-Flower, a novel framework for generating Safe, Admissible, and Dynamically consistent trajectories. Our approach relies on an augmentation of the flow with a virtual control input. Thereby, principled guidance can be derived using techniques from nonlinear control theory, providing formal guarantees for state constraints, action constraints, and dynamic consistency. Crucially, SAD-Flower operates without retraining, enabling test-time satisfaction of unseen constraints. Through extensive experiments across several tasks, we demonstrate that SAD-Flower outperforms various generative-model-based baselines in ensuring constraint satisfaction.

Force-Safe Environment Maps and Real-Time Detection for Soft Robot Manipulators

Authors:Akua K. Dickson, Juan C. Pacheco Garcia, Andrew P. Sabelhaus

Date:2025-11-07 15:07:01

Soft robot manipulators have the potential for deployment in delicate environments to perform complex manipulation tasks. However, existing obstacle detection and avoidance methods do not consider limits on the forces that manipulators may exert upon contact with delicate obstacles. This work introduces a framework that maps force safety criteria from task space (i.e. positions along the robot's body) to configuration space (i.e. the robot's joint angles) and enables real-time force safety detection. We incorporate limits on allowable environmental contact forces for given task-space obstacles, and map them into configuration space (C-space) through the manipulator's forward kinematics. This formulation ensures that configurations classified as safe are provably below the maximum force thresholds, thereby allowing us to determine force-safe configurations of the soft robot manipulator in real-time. We validate our approach in simulation and hardware experiments on a two-segment pneumatic soft robot manipulator. Results demonstrate that the proposed method accurately detects force safety during interactions with deformable obstacles, thereby laying the foundation for real-time safe planning of soft manipulators in delicate, cluttered environments.

Beyond Master and Apprentice: Grounding Foundation Models for Symbiotic Interactive Learning in a Shared Latent Space

Authors:Linus Nwankwo, Björn Ellensohn, Christian Rauch, Elmar Rueckert

Date:2025-11-07 12:43:07

Today's autonomous agents can understand free-form natural language instructions and execute long-horizon tasks in a manner akin to human-level reasoning. These capabilities are mostly driven by large-scale pre-trained foundation models (FMs). However, the approaches with which these models are grounded for human-robot interaction (HRI) perpetuate a master-apprentice model, where the apprentice (embodied agent) passively receives and executes the master's (human's) commands without reciprocal learning. This reactive interaction approach does not capture the co-adaptive dynamics inherent in everyday multi-turn human-human interactions. To address this, we propose a Symbiotic Interactive Learning (SIL) approach that enables both the master and the apprentice to co-adapt through mutual, bidirectional interactions. We formalised SIL as a co-adaptation process within a shared latent task space, where the agent and human maintain joint belief states that evolve based on interaction history. This enables the agent to move beyond reactive execution to proactive clarification, adaptive suggestions, and shared plan refinement. To realise these novel behaviours, we leveraged pre-trained FMs for spatial perception and reasoning, alongside a lightweight latent encoder that grounds the models' outputs into task-specific representations. Furthermore, to ensure stability as the tasks evolve, we augment SIL with a memory architecture that prevents the forgetting of learned task-space representations. We validate SIL on both simulated and real-world embodied tasks, including instruction following, information retrieval, query-oriented reasoning, and interactive dialogues. Demos and resources are public at:~\href{https://linusnep.github.io/SIL/}{https://linusnep.github.io/SIL/}.

Rotational Splittings in Diatomic Molecules of Interest to Searches for New Physics

Authors:Ayaki Sunaga, Timo Fleig

Date:2025-11-07 12:13:54

Diatomic molecules with an energetically low-lying $^3 \Delta_1$ state are attractive platforms to detect new physics beyond the Standard Model, such as parity- and time-reversal violating phenomena. One of the advantages of using a $^3 \Delta_1$ state is its tiny $\Lambda$-splitting due to the coupling between the electronic and rotational angular momenta, which facilitates polarizing the molecules in small external electric fields. Theoretical estimation of the magnitude of the $\Lambda$-splitting is helpful for planning new experiments. In this study, we present a theoretical model to calculate the $\Lambda$-splitting. Our model integrates the relativistic four-component wavefunction and the traditional rotational Hamiltonian based on Hund's case (a). The multireference character of the wavefunction is taken into account. Our calculations for PtH and ThF$^+$ molecules qualitatively agree with experiment. The $\Lambda$-splitting of TaO$^+$ for the rotational ground state is predicted to be around 9 kHz. This tiny splitting can reduce the systematic uncertainty, but in a practical experiment, it may cause depolarization during rotation ramp-up.

PySlyde: A Lightweight, Open-Source Toolkit for Pathology Preprocessing

Authors:Gregory Verghese, Anthony Baptista, Chima Eke, Holly Rafique, Mengyuan Li, Fathima Mohamed, Ananya Bhalla, Lucy Ryan, Michael Pitcher, Enrico Parisini, Concetta Piazzese, Liz Ing-Simmons, Anita Grigoriadis

Date:2025-11-07 12:03:24

The integration of artificial intelligence (AI) into pathology is advancing precision medicine by improving diagnosis, treatment planning, and patient outcomes. Digitised whole-slide images (WSIs) capture rich spatial and morphological information vital for understanding disease biology, yet their gigapixel scale and variability pose major challenges for standardisation and analysis. Robust preprocessing, covering tissue detection, tessellation, stain normalisation, and annotation parsing is critical but often limited by fragmented and inconsistent workflows. We present PySlyde, a lightweight, open-source Python toolkit built on OpenSlide to simplify and standardise WSI preprocessing. PySlyde provides an intuitive API for slide loading, annotation management, tissue detection, tiling, and feature extraction, compatible with modern pathology foundation models. By unifying these processes, it streamlines WSI preprocessing, enhances reproducibility, and accelerates the generation of AI-ready datasets, enabling researchers to focus on model development and downstream analysis.

Multimodal Deep Learning for Prediction of Progression-Free Survival in Patients with Neuroendocrine Tumors Undergoing 177Lu-based Peptide Receptor Radionuclide Therapy

Authors:Simon Baur, Tristan Ruhwedel, Ekin Böke, Zuzanna Kobus, Gergana Lishkova, Christoph Wetz, Holger Amthauer, Christoph Roderburg, Frank Tacke, Julian M. Rogasch, Wojciech Samek, Henning Jann, Jackie Ma, Johannes Eschrich

Date:2025-11-07 11:39:21

Peptide receptor radionuclide therapy (PRRT) is an established treatment for metastatic neuroendocrine tumors (NETs), yet long-term disease control occurs only in a subset of patients. Predicting progression-free survival (PFS) could support individualized treatment planning. This study evaluates laboratory, imaging, and multimodal deep learning models for PFS prediction in PRRT-treated patients. In this retrospective, single-center study 116 patients with metastatic NETs undergoing 177Lu-DOTATOC were included. Clinical characteristics, laboratory values, and pretherapeutic somatostatin receptor positron emission tomography/computed tomographies (SR-PET/CT) were collected. Seven models were trained to classify low- vs. high-PFS groups, including unimodal (laboratory, SR-PET, or CT) and multimodal fusion approaches. Explainability was evaluated by feature importance analysis and gradient maps. Forty-two patients (36%) had short PFS (< 1 year), 74 patients long PFS (>1 year). Groups were similar in most characteristics, except for higher baseline chromogranin A (p = 0.003), elevated gamma-GT (p = 0.002), and fewer PRRT cycles (p < 0.001) in short-PFS patients. The Random Forest model trained only on laboratory biomarkers reached an AUROC of 0.59 +- 0.02. Unimodal three-dimensional convolutional neural networks using SR-PET or CT performed worse (AUROC 0.42 +- 0.03 and 0.54 +- 0.01, respectively). A multimodal fusion model laboratory values, SR-PET, and CT -augmented with a pretrained CT branch - achieved the best results (AUROC 0.72 +- 0.01, AUPRC 0.80 +- 0.01). Multimodal deep learning combining SR-PET, CT, and laboratory biomarkers outperformed unimodal approaches for PFS prediction after PRRT. Upon external validation, such models may support risk-adapted follow-up strategies.

Variational models of robust optimal transport

Authors:Luigi De Masi, Andrea Marchese, Annalisa Massaccesi

Date:2025-11-07 10:57:17

This paper introduces two variational formulations for a model of robust optimal transport, that is, the problem of designing optimal transport networks that are resilient to potential damages, balancing construction costs against the benefit of maintaining partial functionality when parts of the network are damaged. We propose a Eulerian formulation, where the network is modeled by a rectifiable measure and recovery plans are represented by 1-dimensional normal currents. This framework allows for changes in the direction of the transportation in response to damages but restricts damages to be characteristic functions of closed sets. We also propose a Lagrangian formulation, where the network is a traffic plan (that is, a measure on the space of Lipschitz curves) and recovery plans are sub-traffic plans. This approach prescribes the network's orientation but allows for a wider class of damages. We prove existence of minimizers in both settings. The two models are compared through examples that illustrate their main differences: the Eulerian formulation's necessity for an unoriented network to achieve existence, the Lagrangian formulation's ability to handle general damages and its requirement for a positive distance between the supports of the source and target measures.

TAPOM: Task-Space Topology-Guided Motion Planning for Manipulating Elongated Object in Cluttered Environments

Authors:Zihao Li, Yiming Zhu, Zhe Zhong, Qinyuan Ren, Yijiang Huang

Date:2025-11-07 07:49:54

Robotic manipulation in complex, constrained spaces is vital for widespread applications but challenging, particularly when navigating narrow passages with elongated objects. Existing planning methods often fail in these low-clearance scenarios due to the sampling difficulties or the local minima. This work proposes Topology-Aware Planning for Object Manipulation (TAPOM), which explicitly incorporates task-space topological analysis to enable efficient planning. TAPOM uses a high-level analysis to identify critical pathways and generate guiding keyframes, which are utilized in a low-level planner to find feasible configuration space trajectories. Experimental validation demonstrates significantly high success rates and improved efficiency over state-of-the-art methods on low-clearance manipulation tasks. This approach offers broad implications for enhancing manipulation capabilities of robots in complex real-world environments.

Carbon Price Forecasting with Structural Breaks: A Comparative Study of Deep Learning Models

Authors:Runsheng Ren, Jing Li, Yanxiu Li, Shixun Huang, Jun Shen, Wanqing Li, John Le, Sheng Wang

Date:2025-11-07 05:16:56

Accurately forecasting carbon prices is essential for informed energy market decision-making, guiding sustainable energy planning, and supporting effective decarbonization strategies. However, it remains challenging due to structural breaks and high-frequency noise caused by frequent policy interventions and market shocks. Existing studies, including the most recent baseline approaches, have attempted to incorporate breakpoints but often treat denoising and modeling as separate processes and lack systematic evaluation across advanced deep learning architectures, limiting the robustness and the generalization capability. To address these gaps, this paper proposes a comprehensive hybrid framework that integrates structural break detection (Bai-Perron, ICSS, and PELT algorithms), wavelet signal denoising, and three state-of-the-art deep learning models (LSTM, GRU, and TCN). Using European Union Allowance (EUA) spot prices from 2007 to 2024 and exogenous features such as energy prices and policy indicators, the framework constructs univariate and multivariate datasets for comparative evaluation. Experimental results demonstrate that our proposed PELT-WT-TCN achieves the highest prediction accuracy, reducing forecasting errors by 22.35% in RMSE and 18.63% in MAE compared to the state-of-the-art baseline model (Breakpoints with Wavelet and LSTM), and by 70.55% in RMSE and 74.42% in MAE compared to the original LSTM without decomposition from the same baseline study. These findings underscore the value of integrating structural awareness and multiscale decomposition into deep learning architectures to enhance accuracy and interpretability in carbon price forecasting and other nonstationary financial time series.

iFlyBot-VLM Technical Report

Authors:Xin Nie, Zhiyuan Cheng, Yuan Zhang, Chao Ji, Jiajia Wu, Yuhan Zhang, Jia Pan

Date:2025-11-07 04:27:15

We introduce iFlyBot-VLM, a general-purpose Vision-Language Model (VLM) used to improve the domain of Embodied Intelligence. The central objective of iFlyBot-VLM is to bridge the cross-modal semantic gap between high-dimensional environmental perception and low-level robotic motion control. To this end, the model abstracts complex visual and spatial information into a body-agnostic and transferable Operational Language, thereby enabling seamless perception-action closed-loop coordination across diverse robotic platforms. The architecture of iFlyBot-VLM is systematically designed to realize four key functional capabilities essential for embodied intelligence: 1) Spatial Understanding and Metric Reasoning; 2) Interactive Target Grounding; 3) Action Abstraction and Control Parameter Generation; 4) Task Planning and Skill Sequencing. We envision iFlyBot-VLM as a scalable and generalizable foundation model for embodied AI, facilitating the progression from specialized task-oriented systems toward generalist, cognitively capable agents. We conducted evaluations on 10 current mainstream embodied intelligence-related VLM benchmark datasets, such as Blink and Where2Place, and achieved optimal performance while preserving the model's general capabilities. We will publicly release both the training data and model weights to foster further research and development in the field of Embodied Intelligence.

A benchmark multimodal oro-dental dataset for large vision-language models

Authors:Haoxin Lv, Ijazul Haq, Jin Du, Jiaxin Ma, Binnian Zhu, Xiaobing Dang, Chaoan Liang, Ruxu Du, Yingjie Zhang, Muhammad Saqib

Date:2025-11-07 03:14:20

The advancement of artificial intelligence in oral healthcare relies on the availability of large-scale multimodal datasets that capture the complexity of clinical practice. In this paper, we present a comprehensive multimodal dataset, comprising 8775 dental checkups from 4800 patients collected over eight years (2018-2025), with patients ranging from 10 to 90 years of age. The dataset includes 50000 intraoral images, 8056 radiographs, and detailed textual records, including diagnoses, treatment plans, and follow-up notes. The data were collected under standard ethical guidelines and annotated for benchmarking. To demonstrate its utility, we fine-tuned state-of-the-art large vision-language models, Qwen-VL 3B and 7B, and evaluated them on two tasks: classification of six oro-dental anomalies and generation of complete diagnostic reports from multimodal inputs. We compared the fine-tuned models with their base counterparts and GPT-4o. The fine-tuned models achieved substantial gains over these baselines, validating the dataset and underscoring its effectiveness in advancing AI-driven oro-dental healthcare solutions. The dataset is publicly available, providing an essential resource for future research in AI dentistry.

Real-Time Reasoning Agents in Evolving Environments

Authors:Yule Wen, Yixin Ye, Yanzhe Zhang, Diyi Yang, Hao Zhu

Date:2025-11-07 00:51:02

Agents in the real world must make not only logical but also timely judgments. This requires continuous awareness of the dynamic environment: hazards emerge, opportunities arise, and other agents act, while the agent's reasoning is still unfolding. Despite advances in language model reasoning, existing approaches fail to account for this dynamic nature. We introduce real-time reasoning as a new problem formulation for agents in evolving environments and build Real-Time Reasoning Gym to demonstrate it. We study two paradigms for deploying language models in agents: (1) reactive agents, which employ language models with bounded reasoning computation for rapid responses, and (2) planning agents, which allow extended reasoning computation for complex problems. Our experiments show that even state-of-the-art models struggle with making logical and timely judgments in either paradigm. To address this limitation, we propose AgileThinker, which simultaneously engages both reasoning paradigms. AgileThinker consistently outperforms agents engaging only one reasoning paradigm as the task difficulty and time pressure rise, effectively balancing reasoning depth and response latency. Our work establishes real-time reasoning as a critical testbed for developing practical agents and provides a foundation for research in temporally constrained AI systems, highlighting a path toward real-time capable agents.

Conformalized Non-uniform Sampling Strategies for Accelerated Sampling-based Motion Planning

Authors:Shubham Natraj, Bruno Sinopoli, Yiannis Kantaros

Date:2025-11-06 21:54:34

Sampling-based motion planners (SBMPs) are widely used to compute dynamically feasible robot paths. However, their reliance on uniform sampling often leads to poor efficiency and slow planning in complex environments. We introduce a novel non-uniform sampling strategy that integrates into existing SBMPs by biasing sampling toward `certified' regions. These regions are constructed by (i) generating an initial, possibly infeasible, path using any heuristic path predictor (e.g., A* or vision-language models) and (ii) applying conformal prediction to quantify the predictor's uncertainty. This process yields prediction sets around the initial-guess path that are guaranteed, with user-specified probability, to contain the optimal solution. To our knowledge, this is the first non-uniform sampling approach for SBMPs that provides such probabilistically correct guarantees on the sampling regions. Extensive evaluations demonstrate that our method consistently finds feasible paths faster and generalizes better to unseen environments than existing baselines.

Agentic Refactoring: An Empirical Study of AI Coding Agents

Authors:Kosei Horikawa, Hao Li, Yutaro Kashiwa, Bram Adams, Hajimu Iida, Ahmed E. Hassan

Date:2025-11-06 21:24:38

Agentic coding tools, such as OpenAI Codex, Claude Code, and Cursor, are transforming the software engineering landscape. These AI-powered systems function as autonomous teammates capable of planning and executing complex development tasks. Agents have become active participants in refactoring, a cornerstone of sustainable software development aimed at improving internal code quality without altering observable behavior. Despite their increasing adoption, there is a critical lack of empirical understanding regarding how agentic refactoring is utilized in practice, how it compares to human-driven refactoring, and what impact it has on code quality. To address this empirical gap, we present a large-scale study of AI agent-generated refactorings in real-world open-source Java projects, analyzing 15,451 refactoring instances across 12,256 pull requests and 14,988 commits derived from the AIDev dataset. Our empirical analysis shows that refactoring is a common and intentional activity in this development paradigm, with agents explicitly targeting refactoring in 26.1% of commits. Analysis of refactoring types reveals that agentic efforts are dominated by low-level, consistency-oriented edits, such as Change Variable Type (11.8%), Rename Parameter (10.4%), and Rename Variable (8.5%), reflecting a preference for localized improvements over the high-level design changes common in human refactoring. Additionally, the motivations behind agentic refactoring focus overwhelmingly on internal quality concerns, with maintainability (52.5%) and readability (28.1%). Furthermore, quantitative evaluation of code quality metrics shows that agentic refactoring yields small but statistically significant improvements in structural metrics, particularly for medium-level changes, reducing class size and complexity (e.g., Class LOC median $\Delta$ = -15.25).

Comparative Analysis of 10 - 50 MeV Solar Proton Events at Lagrange Point 1 and the Geostationary Orbit

Authors:Aatiya Ali, Viacheslav Sadykov

Date:2025-11-06 20:23:54

Solar proton events (SPEs) pose radiation hazards, disrupt technology, and impact operations on Earth and in space, making continuous monitoring essential. We compare 10-50 MeV proton flux measurements from SOHO/EPHIN at Lagrange Point 1 (L1) with those from NOAA/GOES in geostationary orbit (GEO) during Solar Cycle 23 and most of Cycle 24. We identify 83 >=10 pfu SPEs observed at both locations and classify them into S1-S4 categories (comparable to NOAA's solar radiation storm scales). EPHIN detected earlier onsets and longer durations across all categories, along with earlier peaks and ends for S1-S3, while GOES recorded slightly earlier peak and end times for S4. S1 median timing offsets (EPHIN relative to GOES) were -20 +/- 50 min (onsets), -1.00 +/- 1.42 hr (peaks), and -1.08 +/- 2.21 hr (ends), with similar trends for S2-S3 and near-simultaneity for S4 (peaks ~ -0.17 +/- 1.62 hr; ends ~ +0.04 +/- 3.33 hr). Flux comparisons show that EPHIN measurements modestly exceed GOES for S1 (median ratios ~1.11 for peaks and ~1.06 for fluence) and are lower than GOES for stronger events (peaks ~0.97 +/- 0.29, 0.84 +/- 0.21; fluence ~0.84 +/- 0.16, 0.75 +/- 0.16 for S2-S3). The EPHIN-to-GOES peak flux and fluence ratios reach 0.16 +/- 0.03 and 0.29 +/- 0.07, respectively, for S4 events, originating from contamination of lower-energy GOES channels. Correlation analyses show no significant flux dependence on geomagnetic indices, field strength, or spacecraft position, suggesting minimal near-Earth modulation of >=10 MeV proton access at GEO. These results highlight systematic differences in how SPEs manifest at L1 versus GEO and offer practical guidance for forecasting beyond Earth's magnetosphere, supporting mission planning for near-Earth and cislunar exploration, including Artemis.

OPF-Based Optimal Power System Network Restoration Considering Frequency Dynamics

Authors:Dawn Virginillo, Asja Derviškadić, Mario Paolone

Date:2025-11-06 19:53:16

Due to recent blackout and system split incidents in power grids worldwide, as well as increased system complexity in view of the energy transition, there has been increasing interest in re-evaluating existing Power System Restoration (PSR) plans. In restoration scenarios, due to low island inertia, it is necessary to ensure not only the static, but also the dynamic stability of the system. In this paper, we pose and solve a formulation of the optimal PSR problem including frequency dynamics. We validate the switching constraints for global optimality within a static version of the formulation using a brute-force tree search method. We apply the dynamic problem formulation to the IEEE 9-Bus model, and show that the optimal switching sequence using the static formulation would violate dynamic constraints, illustrating the importance of dynamic considerations in PSR planning.

ScheduleStream: Temporal Planning with Samplers for GPU-Accelerated Multi-Arm Task and Motion Planning & Scheduling

Authors:Caelan Garrett, Fabio Ramos

Date:2025-11-06 19:17:42

Bimanual and humanoid robots are appealing because of their human-like ability to leverage multiple arms to efficiently complete tasks. However, controlling multiple arms at once is computationally challenging due to the growth in the hybrid discrete-continuous action space. Task and Motion Planning (TAMP) algorithms can efficiently plan in hybrid spaces but generally produce plans, where only one arm is moving at a time, rather than schedules that allow for parallel arm motion. In order to extend TAMP to produce schedules, we present ScheduleStream, the first general-purpose framework for planning & scheduling with sampling operations. ScheduleStream models temporal dynamics using hybrid durative actions, which can be started asynchronously and persist for a duration that's a function of their parameters. We propose domain-independent algorithms that solve ScheduleStream problems without any application-specific mechanisms. We apply ScheduleStream to Task and Motion Planning & Scheduling (TAMPAS), where we use GPU acceleration within samplers to expedite planning. We compare ScheduleStream algorithms to several ablations in simulation and find that they produce more efficient solutions. We demonstrate ScheduleStream on several real-world bimanual robot tasks at https://schedulestream.github.io.

ARETE: an R package for Automated REtrieval from TExt with large language models

Authors:Vasco V. Branco, Jandó Benedek, Lidia Pivovarova, Luís Correia, Pedro Cardoso

Date:2025-11-06 17:26:48

1. A hard stop for the implementation of rigorous conservation initiatives is our lack of key species data, especially occurrence data. Furthermore, researchers have to contend with an accelerated speed at which new information must be collected and processed due to anthropogenic activity. Publications ranging from scientific papers to gray literature contain this crucial information but their data are often not machine-readable, requiring extensive human work to be retrieved. 2. We present the ARETE R package, an open-source software aiming to automate data extraction of species occurrences powered by large language models, namely using the chatGPT Application Programming Interface. This R package integrates all steps of the data extraction and validation process, from Optical Character Recognition to detection of outliers and output in tabular format. Furthermore, we validate ARETE through systematic comparison between what is modelled and the work of human annotators. 3. We demonstrate the usefulness of the approach by comparing range maps produced using GBIF data and with those automatically extracted for 100 species of spiders. Newly extracted data allowed to expand the known Extent of Occurrence by a mean three orders of magnitude, revealing new areas where the species were found in the past, which mayhave important implications for spatial conservation planning and extinction risk assessments. 4. ARETE allows faster access to hitherto untapped occurrence data, a potential game changer in projects requiring such data. Researchers will be able to better prioritize resources, manually verifying selected species while maintaining automated extraction for the majority. This workflow also allows predicting available bibliographic data during project planning.

A New Probabilistic Mobile Byzantine Failure Model for Self-Protecting Systems

Authors:Silvia Bonomi, Giovanni Farina, Roy Friedman, Eviatar B. Procaccia, Sebastien Tixeuil

Date:2025-11-06 16:38:43

Modern distributed systems face growing security threats, as attackers continuously enhance their skills and vulnerabilities span across the entire system stack, from hardware to the application layer. In the system design phase, fault tolerance techniques can be employed to safeguard systems. From a theoretical perspective, an attacker attempting to compromise a system can be abstracted by considering the presence of Byzantine processes in the system. Although this approach enhances the resilience of the distributed system, it introduces certain limitations regarding the accuracy of the model in reflecting real-world scenarios. In this paper, we consider a self-protecting distributed system based on the \emph{Monitoring-Analyse-Plan-Execute over a shared Knowledge} (MAPE-K) architecture, and we propose a new probabilistic Mobile Byzantine Failure (MBF) that can be plugged into the Analysis component. Our new model captures the dynamics of evolving attacks and can be used to drive the self-protection and reconfiguration strategy. We analyze mathematically the time that it takes until the number of Byzantine nodes crosses given thresholds, or for the system to self-recover back into a safe state, depending on the rates of Byzantine infection spreading \emph{vs.} the rate of self-recovery. We also provide simulation results that illustrate the behavior of the system under such assumptions.

Robust mean-field control under common noise uncertainty

Authors:Mathieu Laurière, Ariel Neufeld, Kyunghyun Park

Date:2025-11-06 16:31:49

We propose and analyze a framework for discrete-time robust mean-field control problems under common noise uncertainty. In this framework, the mean-field interaction describes the collective behavior of infinitely many cooperative agents' state and action, while the common noise -- a random disturbance affecting all agents' state dynamics -- is uncertain. A social planner optimizes over open-loop controls on an infinite horizon to maximize the representative agent's worst-case expected reward, where worst-case corresponds to the most adverse probability measure among all candidates inducing the unknown true law of the common noise process. We refer to this optimization as a robust mean-field control problem under common noise uncertainty. We first show that this problem arises as the asymptotic limit of a cooperative $N$-agent robust optimization problem, commonly known as propagation of chaos. We then prove the existence of an optimal open-loop control by linking the robust mean field control problem to a lifted robust Markov decision problem on the space of probability measures and by establishing the dynamic programming principle and Bellman--Isaac fixed point theorem for the lifted robust Markov decision problem. Finally, we complement our theoretical results with numerical experiments motivated by distribution planning and systemic risk in finance, highlighting the advantages of accounting for common noise uncertainty.

GraSP-VLA: Graph-based Symbolic Action Representation for Long-Horizon Planning with VLA Policies

Authors:Maëlic Neau, Zoe Falomir, Paulo E. Santos, Anne-Gwenn Bosser, Cédric Buche

Date:2025-11-06 13:39:38

Deploying autonomous robots that can learn new skills from demonstrations is an important challenge of modern robotics. Existing solutions often apply end-to-end imitation learning with Vision-Language Action (VLA) models or symbolic approaches with Action Model Learning (AML). On the one hand, current VLA models are limited by the lack of high-level symbolic planning, which hinders their abilities in long-horizon tasks. On the other hand, symbolic approaches in AML lack generalization and scalability perspectives. In this paper we present a new neuro-symbolic approach, GraSP-VLA, a framework that uses a Continuous Scene Graph representation to generate a symbolic representation of human demonstrations. This representation is used to generate new planning domains during inference and serves as an orchestrator for low-level VLA policies, scaling up the number of actions that can be reproduced in a row. Our results show that GraSP-VLA is effective for modeling symbolic representations on the task of automatic planning domain generation from observations. In addition, results on real-world experiments show the potential of our Continuous Scene Graph representation to orchestrate low-level VLA policies in long-horizon tasks.

Differential Flatness of Quasi-Static Slider-Pusher Models with Applications in Control

Authors:Sander De Witte, Tom Lefebvre, Thomas Neve, Andras Retzler, Guillaume Crevecoeur

Date:2025-11-06 10:33:24

This paper investigates the dynamic properties of planar slider-pusher systems as a motion primitive in manipulation tasks. To that end, we construct a differential kinematic model deriving from the limit surface approach under the quasi-static assumption and with negligible contact friction. The quasi-static model applies to generic slider shapes and circular pusher geometries, enabling a differential kinematic representation of the system. From this model, we analyze differential flatness - a property advantageous for control synthesis and planning - and find that slider-pusher systems with polygon sliders and circular pushers exhibit flatness with the centre of mass as a flat output. Leveraging this property, we propose two control strategies for trajectory tracking: a cascaded quasi-static feedback strategy and a dynamic feedback linearization approach. We validate these strategies through closed-loop simulations incorporating perturbed models and input noise, as well as experimental results using a physical setup with a finger-like pusher and vision-based state detection. The real-world experiments confirm the applicability of the simulation gains, highlighting the potential of the proposed methods for

Transforming Mentorship: An AI Powered Chatbot Approach to University Guidance

Authors:Mashrur Rahman, Mantaqa abedin, Monowar Zamil Abir, Faizul Islam Ansari, Adib Reza, Farig Yousuf Sadeque, Niloy Farhan

Date:2025-11-06 08:24:52

University students face immense challenges during their undergraduate lives, often being deprived of personalized on-demand guidance that mentors fail to provide at scale. Digital tools exist, but there is a serious lack of customized coaching for newcomers. This paper presents an AI-powered chatbot that will serve as a mentor for the students of BRAC University. The main component is a data ingestion pipeline that efficiently processes and updates information from diverse sources, such as CSV files and university webpages. The chatbot retrieves information through a hybrid approach, combining BM25 lexical ranking with ChromaDB semantic retrieval, and uses a Large Language Model, LLaMA-3.3-70B, to generate conversational responses. The generated text was found to be semantically highly relevant, with a BERTScore of 0.831 and a METEOR score of 0.809. The data pipeline was also very efficient, taking 106.82 seconds for updates, compared to 368.62 seconds for new data. This chatbot will be able to help students by responding to their queries, helping them to get a better understanding of university life, and assisting them to plan better routines for their semester in the open-credit university.

Automated Tennis Player and Ball Tracking with Court Keypoints Detection (Hawk Eye System)

Authors:Venkata Manikanta Desu, Syed Fawaz Ali

Date:2025-11-06 07:18:54

This study presents a complete pipeline for automated tennis match analysis. Our framework integrates multiple deep learning models to detect and track players and the tennis ball in real time, while also identifying court keypoints for spatial reference. Using YOLOv8 for player detection, a custom-trained YOLOv5 model for ball tracking, and a ResNet50-based architecture for court keypoint detection, our system provides detailed analytics including player movement patterns, ball speed, shot accuracy, and player reaction times. The experimental results demonstrate robust performance in varying court conditions and match scenarios. The model outputs an annotated video along with detailed performance metrics, enabling coaches, broadcasters, and players to gain actionable insights into the dynamics of the game.

KoTaP: A Panel Dataset for Corporate Tax Avoidance, Performance, and Governance in Korea

Authors:Hyungjong Na, Wonho Song, Seungyong Han, Donghyeon Jo, Sejin Myung, Hyungjoon Kim

Date:2025-11-06 06:13:53

This study introduces the Korean Tax Avoidance Panel (KoTaP), a long-term panel dataset of non-financial firms listed on KOSPI and KOSDAQ between 2011 and 2024. After excluding financial firms, firms with non-December fiscal year ends, capital impairment, and negative pre-tax income, the final dataset consists of 12,653 firm-year observations from 1,754 firms. KoTaP is designed to treat corporate tax avoidance as a predictor variable and link it to multiple domains, including earnings management (accrual- and activity-based), profitability (ROA, ROE, CFO, LOSS), stability (LEV, CUR, SIZE, PPE, AGE, INVREC), growth (GRW, MB, TQ), and governance (BIG4, FORN, OWN). Tax avoidance itself is measured using complementary indicators cash effective tax rate (CETR), GAAP effective tax rate (GETR), and book-tax difference measures (TSTA, TSDA) with adjustments to ensure interpretability. A key strength of KoTaP is its balanced panel structure with standardized variables and its consistency with international literature on the distribution and correlation of core indicators. At the same time, it reflects distinctive institutional features of Korean firms, such as concentrated ownership, high foreign shareholding, and elevated liquidity ratios, providing both international comparability and contextual uniqueness. KoTaP enables applications in benchmarking econometric and deep learning models, external validity checks, and explainable AI analyses. It further supports policy evaluation, audit planning, and investment analysis, making it a critical open resource for accounting, finance, and interdisciplinary research.

When Swin Transformer Meets KANs: An Improved Transformer Architecture for Medical Image Segmentation

Authors:Nishchal Sapkota, Haoyan Shi, Yejia Zhang, Xianshi Ma, Bofang Zheng, Danny Z. Chen

Date:2025-11-06 05:44:57

Medical image segmentation is critical for accurate diagnostics and treatment planning, but remains challenging due to complex anatomical structures and limited annotated training data. CNN-based segmentation methods excel at local feature extraction, but struggle with modeling long-range dependencies. Transformers, on the other hand, capture global context more effectively, but are inherently data-hungry and computationally expensive. In this work, we introduce UKAST, a U-Net like architecture that integrates rational-function based Kolmogorov-Arnold Networks (KANs) into Swin Transformer encoders. By leveraging rational base functions and Group Rational KANs (GR-KANs) from the Kolmogorov-Arnold Transformer (KAT), our architecture addresses the inefficiencies of vanilla spline-based KANs, yielding a more expressive and data-efficient framework with reduced FLOPs and only a very small increase in parameter count compared to SwinUNETR. UKAST achieves state-of-the-art performance on four diverse 2D and 3D medical image segmentation benchmarks, consistently surpassing both CNN- and Transformer-based baselines. Notably, it attains superior accuracy in data-scarce settings, alleviating the data-hungry limitations of standard Vision Transformers. These results show the potential of KAN-enhanced Transformers to advance data-efficient medical image segmentation. Code is available at: https://github.com/nsapkota417/UKAST

Two Decades of Research at the University of Lagos (2004-2023): A Scientometric Analysis of Productivity, Collaboration, and Impact

Authors:Muneer Ahmad, Samuel Ibor Ubi

Date:2025-11-06 05:26:17

This paper presents a scientometric analysis of research output from the University of Lagos, focusing on the two decades spanning 2004 to 2023. Using bibliometric data retrieved from the Web of Science, we examine trends in publication volume, collaboration patterns, citation impact, and the most prolific authors, departments, and research domains at the university. The study reveals a consistent increase in research productivity, with the highest publication output recorded in 2023. Health Sciences, Engineering, and Social Sciences are identified as dominant fields, reflecting the university's interdisciplinary research strengths. Collaborative efforts, both locally and internationally, show a positive correlation with higher citation impact, with the United States and the United Kingdom being the leading international collaborators. Notably, open-access publications account for a significant portion of the university's research output, enhancing visibility and citation rates. The findings offer valuable insights into the university's research performance over the past two decades, providing a foundation for strategic planning and policy formulation to foster research excellence and global impact.

Integrating Ergonomics and Manipulability for Upper Limb Postural Optimization in Bimanual Human-Robot Collaboration

Authors:Chenzui Li, Yiming Chen, Xi Wu, Giacinto Barresi, Fei Chen

Date:2025-11-06 03:16:39

This paper introduces an upper limb postural optimization method for enhancing physical ergonomics and force manipulability during bimanual human-robot co-carrying tasks. Existing research typically emphasizes human safety or manipulative efficiency, whereas our proposed method uniquely integrates both aspects to strengthen collaboration across diverse conditions (e.g., different grasping postures of humans, and different shapes of objects). Specifically, the joint angles of a simplified human skeleton model are optimized by minimizing the cost function to prioritize safety and manipulative capability. To guide humans towards the optimized posture, the reference end-effector poses of the robot are generated through a transformation module. A bimanual model predictive impedance controller (MPIC) is proposed for our human-like robot, CURI, to recalibrate the end effector poses through planned trajectories. The proposed method has been validated through various subjects and objects during human-human collaboration (HHC) and human-robot collaboration (HRC). The experimental results demonstrate significant improvement in muscle conditions by comparing the activation of target muscles before and after optimization.