planning - 2025-08-06

Are We on the Right Way for Assessing Document Retrieval-Augmented Generation?

Authors:Wenxuan Shen, Mingjia Wang, Yaochen Wang, Dongping Chen, Junjie Yang, Yao Wan, Weiwei Lin

Date:2025-08-05 16:55:02

Retrieval-Augmented Generation (RAG) systems using Multimodal Large Language Models (MLLMs) show great promise for complex document understanding, yet their development is critically hampered by inadequate evaluation. Current benchmarks often focus on specific part of document RAG system and use synthetic data with incomplete ground truth and evidence labels, therefore failing to reflect real-world bottlenecks and challenges. To overcome these limitations, we introduce Double-Bench: a new large-scale, multilingual, and multimodal evaluation system that is able to produce fine-grained assessment to each component within document RAG systems. It comprises 3,276 documents (72,880 pages) and 5,168 single- and multi-hop queries across 6 languages and 4 document types with streamlined dynamic update support for potential data contamination issues. Queries are grounded in exhaustively scanned evidence pages and verified by human experts to ensure maximum quality and completeness. Our comprehensive experiments across 9 state-of-the-art embedding models, 4 MLLMs and 4 end-to-end document RAG frameworks demonstrate the gap between text and visual embedding models is narrowing, highlighting the need in building stronger document retrieval models. Our findings also reveal the over-confidence dilemma within current document RAG frameworks that tend to provide answer even without evidence support. We hope our fully open-source Double-Bench provide a rigorous foundation for future research in advanced document RAG systems. We plan to retrieve timely corpus and release new benchmarks on an annual basis.

$β$-Ga$_2$O$_3$--Based Radiation Detector for Proton Therapy

Authors:Hunter D. Ellis, Imteaz Rahaman, Apostoli Hillas, Botong Li, Vikren Sarkar, Kai Fu

Date:2025-08-05 16:20:16

Intensity modulated proton therapy (IMPT) is an advanced cancer treatment modality that offers significant advantages over conventional X-ray therapies, particularly in its ability to minimize radiation dose beyond the tumor target. This reduction in unnecessary irradiation exposure significantly lowers the risk to surrounding healthy tissue and reduces side effects compared to conventional X-ray treatments. However, due to the high complexity of IMPT plans, each plan must be independently validated to ensure the safety and efficacy of the radiation exposure to the patient. While ion chambers are currently used for this purpose, their limitations-particularly in angled-beam measurements and multi-depth assessments-hinder their effectiveness. Silicon-based detectors, commonly used in X-ray therapy, are unsuitable for IMPT due to their rapid degradation under proton irradiation. In this study, a $\beta$-Ga$_2$O$_3$-based metal-semiconductor-metal (MSM) detector was evaluated and compared with a commercial ion chamber using a MEVION S250i proton accelerator. The $\beta$-Ga$_2$O$_3$ detector demonstrated reliable detection of single-pulse proton doses as low as 0.26 MU and exhibited a linear charge-to-dose relationship across a wide range of irradiation conditions. Furthermore, its measurement variability was comparable to that of the ion chamber, with improved sensitivity observed at higher bias voltages. These results highlight the strong potential of $\beta$-Ga$_2$O$_3$ as a radiation-hard detector material for accurate dose verification in IMPT.

CollaBot: Vision-Language Guided Simultaneous Collaborative Manipulation

Authors:Kun Song, Shentao Ma, Gaoming Chen, Ninglong Jin, Guangbao Zhao, Mingyu Ding, Zhenhua Xiong, Jia Pan

Date:2025-08-05 14:57:37

A central research topic in robotics is how to use this system to interact with the physical world. Traditional manipulation tasks primarily focus on small objects. However, in factory or home environments, there is often a need for the movement of large objects, such as moving tables. These tasks typically require multi-robot systems to work collaboratively. Previous research lacks a framework that can scale to arbitrary sizes of robots and generalize to various kinds of tasks. In this work, we propose CollaBot, a generalist framework for simultaneous collaborative manipulation. First, we use SEEM for scene segmentation and point cloud extraction of the target object. Then, we propose a collaborative grasping framework, which decomposes the task into local grasp pose generation and global collaboration. Finally, we design a 2-stage planning module that can generate collision-free trajectories to achieve this task. Experiments show a success rate of 52% across different numbers of robots, objects, and tasks, indicating the effectiveness of the proposed framework.

A Genetic Algorithm Framework for Optimizing Three-Impulse Orbital Transfers with Poliastro Simulation

Authors:Phuc Hao Do, Tran Duc Le

Date:2025-08-05 14:03:35

Orbital maneuver planning is a critical aspect of mission design, aimed at minimizing propellant consumption, which is directly correlated with the total velocity change ($\Delta V$). While analytical solutions like the Hohmann and Bi-elliptic transfers offer optimal strategies for specific cases, they lack the flexibility for more general optimization problems. This paper presents a computational framework that couples a Genetic Algorithm (GA) with the Poliastro orbital mechanics library to autonomously discover fuel-optimal, three-impulse transfer trajectories between coplanar circular orbits. We validate this framework across two distinct scenarios: a low-energy transfer from Low Earth Orbit (LEO) to a Geostationary Orbit (GEO), and a high-energy transfer to a distant orbit with a radius 20 times that of LEO. Our results demonstrate the framework's remarkable adaptability. For the LEO-to-GEO transfer, the GA precisely converges to the classical Hohmann transfer, achieving an identical $\Delta V$ of 3853.96 m/s and validating the method's accuracy. Conversely, for the high-energy transfer, the GA identifies a superior Bi-elliptic trajectory that yields a significant $\Delta V$ saving of 213.47 m/s compared to the Hohmann transfer. This fuel efficiency, however, necessitates a trade-off, extending the mission duration from approximately 1 day to over 140 years. This work demonstrates an accessible and powerful toolchain for the rapid prototyping of optimal trajectories, showcasing how combining evolutionary algorithms with open-source libraries provides a robust method for solving complex astrodynamics problems and quantifying their critical design trade-offs.

AVPDN: Learning Motion-Robust and Scale-Adaptive Representations for Video-Based Polyp Detection

Authors:Zilin Chen, Shengnan Lu

Date:2025-08-05 13:59:18

Accurate detection of polyps is of critical importance for the early and intermediate stages of colorectal cancer diagnosis. Compared to static images, dynamic colonoscopy videos provide more comprehensive visual information, which can facilitate the development of effective treatment plans. However, unlike fixed-camera recordings, colonoscopy videos often exhibit rapid camera movement, introducing substantial background noise that disrupts the structural integrity of the scene and increases the risk of false positives. To address these challenges, we propose the Adaptive Video Polyp Detection Network (AVPDN), a robust framework for multi-scale polyp detection in colonoscopy videos. AVPDN incorporates two key components: the Adaptive Feature Interaction and Augmentation (AFIA) module and the Scale-Aware Context Integration (SACI) module. The AFIA module adopts a triple-branch architecture to enhance feature representation. It employs dense self-attention for global context modeling, sparse self-attention to mitigate the influence of low query-key similarity in feature aggregation, and channel shuffle operations to facilitate inter-branch information exchange. In parallel, the SACI module is designed to strengthen multi-scale feature integration. It utilizes dilated convolutions with varying receptive fields to capture contextual information at multiple spatial scales, thereby improving the model's denoising capability. Experiments conducted on several challenging public benchmarks demonstrate the effectiveness and generalization ability of the proposed method, achieving competitive performance in video-based polyp detection tasks.

Visual Document Understanding and Question Answering: A Multi-Agent Collaboration Framework with Test-Time Scaling

Authors:Xinlei Yu, Zhangquan Chen, Yudong Zhang, Shilin Lu, Ruolin Shen, Jiangning Zhang, Xiaobin Hu, Yanwei Fu, Shuicheng Yan

Date:2025-08-05 12:52:09

Existing vision-language models (VLMs), whether generalists or specialists, remain constrained by their parameter scale, lack robust self-correction capabilities, and underperform in tasks involving long visual contexts and complex reasoning, resulting in suboptimal performance on document-based tasks. To address this, we propose MACT, a Multi-Agent Collaboration framework with Test-Time scaling, tailored for visual document understanding and visual question answering (VQA). It comprises four distinct small-scale agents, i.e., planning, execution, judgment, and answer agents, with clearly defined roles and effective collaboration. Notably, the judgment agent exclusively verifies correctness and redirects to prior agents for revisions, outperforming conventional correction strategies. To further expand the capability boundaries of the framework, we propose mixed reward modeling that balances agent-specific abilities and global collaboration, as well as agent-wise hybrid test-time scaling, which customizes different scaling strategies for each agent based on their functions. Evaluated on benchmarks spanning both document-based and non-document-based settings, our MACT shows superior performance with a smaller parameter scale without sacrificing the ability of general and mathematical tasks. Especially, it stands out in benchmarks involving long visual contexts and complicated reasoning. The three variants of MACT consistently hold the top three positions in average scores, leading in 13 of the 15 benchmarks. Code will be available at: https://github.com/YU-deep/MACT.git.

Decadal upgrade strategy for KAGRA toward post-O5 gravitational-wave astronomy

Authors:KAGRA Collaboration, T. Akutsu, M. Ando, M. Aoumi, A. Araya, Y. Aso, L. Baiotti, R. Bajpai, K. Cannon, A. H. -Y. Chen, D. Chen, H. Chen, A. Chiba, C. Chou, M. Eisenmann, K. Endo, T. Fujimori, S. Garg, D. Haba, S. Haino, R. Harada, H. Hayakawa, K. Hayama, S. Fujii, Y. Himemoto, N. Hirata, C. Hirose, H. -F. Hsieh, H. -Y. Hsieh, C. Hsiung, S. -H. Hsu, K. Ide, R. Iden, S. Ikeda, H. Imafuku, R. Ishikawa, Y. Itoh, M. Iwaya, H-B. Jin, K. Jung, T. Kajita, I. Kaku, M. Kamiizumi, N. Kanda, H. Kato, T. Kato, R. Kawamoto, S. Kim, C. Kim, K. Kobayashi, K. Kohri, K. Kokeyama, K. Komori, A. K. H. Kong, T. Koyama, J. Kume, S. Kuroyanagi, S. Kuwahara, K. Kwak, S. Kwon, H. W. Lee, R. Lee, S. Lee, K. L. Li, L. C. -C. Lin, E. -T. -Lin, Y. -C. Lin, G. C. Liu, K. Maeda, M. Meyer-Conde, Y. Michimura, K. Mitsuhashi, O. Miyakawa, S. Miyoki, S. Morisaki, Y. Moriwaki, M. Murakoshi, K. Nakagaki, K. Nakamura, H. Nakano, T. Narikawa, L. Naticchioni, L. Nguyen Quynh, Y. Nishino, A. Nishizawa, K. Obayashi, M. Ohashi, M. Onishi, K. Oohara, S. Oshino, R. Ozaki, M. A. Page, K. -C. Pan, B. -J. Park, J. Park, F. E. Pena Arellano, N. Ruhama, S. Saha, K. Sakai, Y. Sakai, R. Sato, S. Sato, Y. Sato, Y. Sato, T. Sawada, Y. Sekiguchi, N. Sembo, L. Shao, Z. -H. Shi, R. Shimomura, H. Shinkai, S. Singh, K. Somiya, I. Song, H. Sotani, Y. Sudo, K. Suzuki, M. Suzuki, H. Tagoshi, K. Takada, H. Takahashi, R. Takahashi, A. Takamori, S. Takano, H. Takeda, K. Takeshita, M. Tamaki, K. Tanaka, S. J. Tanaka, A. Taruya, T. Tomaru, T. Tomura, S. Tsuchida, N. Uchikata, T. Uchiyama, T. Uehara, K. Ueno, T. Ushiba, H. Wang, T. Washimi, C. Wu, H. Wu, K. Yamamoto, T. Yamamoto, T. S. Yamamoto, R. Yamazaki, Y. Yang, S. -W. Yeh, J. Yokoyama, T. Yokozawa, H. Yuzurihara, Z. -C. Zhao, X. Zhu, Z. -H. Zhu

Date:2025-08-05 12:42:12

The KAGRA Collaboration has investigated a ten-year upgrade strategy for the KAGRA gravitational wave detector, considering a total of 14 upgrade options that vary in mirror mass, quantum noise reduction techniques, and the quality of cryogenic suspensions. We evaluated the scientific potential of these configurations with a focus on key targets such as parameter estimation of compact binary coalescences, binary neutron star post-merger signals, and continuous gravitational waves. Rather than aiming to improve all science cases uniformly, we prioritized those most sensitive to the detector configuration. Technical feasibility was assessed based on required hardware developments, associated R\&D efforts, cost, and risk. Our study finds that a high-frequency upgrade plan that enhances sensitivity over a broad frequency range above ~200 Hz offers the best balance between scientific return and technical feasibility. Such an upgrade would enable sky localization of binary neutron star mergers at 100 Mpc to better than 0.5 deg$^2$ in a LIGO-Virgo-KAGRA network, and improve the measurement precision of tidal deformability parameter by approximately 10% at median, compared to a network without KAGRA.

Macro-from-Micro Planning for High-Quality and Parallelized Autoregressive Long Video Generation

Authors:Xunzhi Xiang, Yabo Chen, Guiyu Zhang, Zhongyu Wang, Zhe Gao, Quanming Xiang, Gonghu Shang, Junqi Liu, Haibin Huang, Yang Gao, Chi Zhang, Qi Fan, Xuelong Li

Date:2025-08-05 11:21:54

Current autoregressive diffusion models excel at video generation but are generally limited to short temporal durations. Our theoretical analysis indicates that the autoregressive modeling typically suffers from temporal drift caused by error accumulation and hinders parallelization in long video synthesis. To address these limitations, we propose a novel planning-then-populating framework centered on Macro-from-Micro Planning (MMPL) for long video generation. MMPL sketches a global storyline for the entire video through two hierarchical stages: Micro Planning and Macro Planning. Specifically, Micro Planning predicts a sparse set of future keyframes within each short video segment, offering motion and appearance priors to guide high-quality video segment generation. Macro Planning extends the in-segment keyframes planning across the entire video through an autoregressive chain of micro plans, ensuring long-term consistency across video segments. Subsequently, MMPL-based Content Populating generates all intermediate frames in parallel across segments, enabling efficient parallelization of autoregressive generation. The parallelization is further optimized by Adaptive Workload Scheduling for balanced GPU execution and accelerated autoregressive video generation. Extensive experiments confirm that our method outperforms existing long video generation models in quality and stability. Generated videos and comparison results are in our project page.

A Simulation of the Fermilab Main Injector Dual Power Amplifier Cavities

Authors:Susanna Stevenson

Date:2025-08-05 10:46:45

The Fermilab Main Injector accelerating cavities have sparking issues when they are run at voltages higher than those required by the PIP-II project. This is a problem Fermilab is working on as planning begins for the next upgrade to the accelerator complex. One of the methods being used to address the issue is the development of a CST Microwave Studio simulation to accurately model the PIP-II dual power amplifier cavities and identify which part(s) of the cavity is causing sparking to develop. The model will also be used to determine if changes to the cavity geometry may allow the cavity to be used at higher voltages before sparking occurs.

CookBench: A Long-Horizon Embodied Planning Benchmark for Complex Cooking Scenarios

Authors:Muzhen Cai, Xiubo Chen, Yining An, Jiaxin Zhang, Xuesong Wang, Wang Xu, Weinan Zhang, Ting Liu

Date:2025-08-05 09:01:47

Embodied Planning is dedicated to the goal of creating agents capable of executing long-horizon tasks in complex physical worlds. However, existing embodied planning benchmarks frequently feature short-horizon tasks and coarse-grained action primitives. To address this challenge, we introduce CookBench, a benchmark for long-horizon planning in complex cooking scenarios. By leveraging a high-fidelity simulation environment built upon the powerful Unity game engine, we define frontier AI challenges in a complex, realistic environment. The core task in CookBench is designed as a two-stage process. First, in Intention Recognition, an agent needs to accurately parse a user's complex intent. Second, in Embodied Interaction, the agent should execute the identified cooking goal through a long-horizon, fine-grained sequence of physical actions. Unlike existing embodied planning benchmarks, we refine the action granularity to a spatial level that considers crucial operational information while abstracting away low-level robotic control. Besides, We provide a comprehensive toolset that encapsulates the simulator. Its unified API supports both macro-level operations, such as placing orders and purchasing ingredients, and a rich set of fine-grained embodied actions for physical interaction, enabling researchers to focus on high-level planning and decision-making. Furthermore, we present an in-depth analysis of state-of-the-art, closed-source Large Language Model and Vision-Language Model, revealing their major shortcomings and challenges posed by complex, long-horizon tasks. The full benchmark will be open-sourced to facilitate future research.

ActionSink: Toward Precise Robot Manipulation with Dynamic Integration of Action Flow

Authors:Shanshan Guo, Xiwen Liang, Junfan Lin, Yuzheng Zhuang, Liang Lin, Xiaodan Liang

Date:2025-08-05 08:46:17

Language-instructed robot manipulation has garnered significant interest due to the potential of learning from collected data. While the challenges in high-level perception and planning are continually addressed along the progress of general large pre-trained models, the low precision of low-level action estimation has emerged as the key limiting factor in manipulation performance. To this end, this paper introduces a novel robot manipulation framework, i.e., ActionSink, to pave the way toward precise action estimations in the field of learning-based robot manipulation. As the name suggests, ActionSink reformulates the actions of robots as action-caused optical flows from videos, called "action flow", in a self-supervised manner, which are then used to be retrieved and integrated to enhance the action estimation. Specifically, ActionSink incorporates two primary modules. The first module is a coarse-to-fine action flow matcher, which continuously refines the accuracy of action flow via iterative retrieval and denoising process. The second module is a dynamic action flow integrator, which employs a working memory pool that dynamically and efficiently manages the historical action flows that should be used to integrate to enhance the current action estimation. In this module, a multi-layer fusion module is proposed to integrate direct estimation and action flows from both the current and the working memory, achieving highly accurate action estimation through a series of estimation-integration processes. Our ActionSink framework outperformed prior SOTA on the LIBERO benchmark by a 7.9\% success rate, and obtained nearly an 8\% accuracy gain on the challenging long-horizon visual task LIBERO-Long.

Scalability and Performance Evaluation of IEEE 802.11ah IoT Deployments: A Testbed Approach

Authors:Kostas Chounos, Katerina Kyriakou, Thanasis Korakis

Date:2025-08-05 06:46:31

This work focuses on the development and assessment of modern wireless Internet of Things (IoT) architectures, with relevance to emerging 5G and beyond applications. To analyze the growing demands for data, and their impact, we built an IEEE 802.11ah (WiFi Halow) office testbed for real-world experimentation. This deployment allows us to uncover the practical performance and scalability limitations of such networks under various challenging scenarios. To the best of our knowledge, this is the first study to consider complex real-world IEEE 802.11ah implementations, aiming specifically to reveal unexpected performance behaviors, such as significant throughput degradation arising in closely deployed wireless links. Our findings show that intense network contention and Adjacent Channel Interference (ACI), drastically impact the performance of the wireless links involved. Beyond evaluating network performance, our experimental analysis also considers the energy consumption of the devices under test, offering a more holistic perspective on the feasibility of IEEE 802.11ah in real-world deployments. The effective disclosure of such unexpected phenomena, can lead to well planned decisions and energy consumption optimization across the IoT to Cloud continuum.

Language as Cost: Proactive Hazard Mapping using VLM for Robot Navigation

Authors:Mintaek Oh, Chan Kim, Seung-Woo Seo, Seong-Woo Kim

Date:2025-08-05 06:35:37

Robots operating in human-centric or hazardous environments must proactively anticipate and mitigate dangers beyond basic obstacle detection. Traditional navigation systems often depend on static maps, which struggle to account for dynamic risks, such as a person emerging from a suddenly opening door. As a result, these systems tend to be reactive rather than anticipatory when handling dynamic hazards. Recent advancements in pre-trained large language models and vision-language models (VLMs) create new opportunities for proactive hazard avoidance. In this work, we propose a zero-shot language-as-cost mapping framework that leverages VLMs to interpret visual scenes, assess potential dynamic risks, and assign risk-aware navigation costs preemptively, enabling robots to anticipate hazards before they materialize. By integrating this language-based cost map with a geometric obstacle map, the robot not only identifies existing obstacles but also anticipates and proactively plans around potential hazards arising from environmental dynamics. Experiments in simulated and diverse dynamic environments demonstrate that the proposed method significantly improves navigation success rates and reduces hazard encounters, compared to reactive baseline planners. Code and supplementary materials are available at https://github.com/Taekmino/LaC.

Temperature and wind characteristics of Lenghu site for ventilation and structural design of large telescope enclosure

Authors:Taoran Li, Lu Feng

Date:2025-08-05 06:11:00

In recent years, a significant number of observatories and universities have been planning to construct optical and infrared telescopes at the Lenghu site in Qinghai Province due to the site's excellent seeing and clear night sky fraction. Although astronomical performances of the Lenghu site have been reported in detail by numerous papers, there were few reports showing statistics of temperature and wind characteristics in the traditional way required for the design of steel structures of large astronomical telescopes and enclosures, as well as the ventilation and air conditioning systems of these enclosures. This paper aims to present such new statistical data on temperature and wind conditions at the site, which could be helpful to inform and aid in such design decisions at the Lenghu site

SkeNa: Learning to Navigate Unseen Environments Based on Abstract Hand-Drawn Maps

Authors:Haojun Xu, Jiaqi Xiang, Wu Wei, Jinyu Chen, Linqing Zhong, Linjiang Huang, Hongyu Yang, Si Liu

Date:2025-08-05 03:56:32

A typical human strategy for giving navigation guidance is to sketch route maps based on the environmental layout. Inspired by this, we introduce Sketch map-based visual Navigation (SkeNa), an embodied navigation task in which an agent must reach a goal in an unseen environment using only a hand-drawn sketch map as guidance. To support research for SkeNa, we present a large-scale dataset named SoR, comprising 54k trajectory and sketch map pairs across 71 indoor scenes. In SoR, we introduce two navigation validation sets with varying levels of abstraction in hand-drawn sketches, categorized based on their preservation of spatial scales in the environment, to facilitate future research. To construct SoR, we develop an automated sketch-generation pipeline that efficiently converts floor plans into hand-drawn representations. To solve SkeNa, we propose SkeNavigator, a navigation framework that aligns visual observations with hand-drawn maps to estimate navigation targets. It employs a Ray-based Map Descriptor (RMD) to enhance sketch map valid feature representation using equidistant sampling points and boundary distances. To improve alignment with visual observations, a Dual-Map Aligned Goal Predictor (DAGP) leverages the correspondence between sketch map features and on-site constructed exploration map features to predict goal position and guide navigation. SkeNavigator outperforms prior floor plan navigation methods by a large margin, improving SPL on the high-abstract validation set by 105% relatively. Our code and dataset will be released.

CogniPlan: Uncertainty-Guided Path Planning with Conditional Generative Layout Prediction

Authors:Yizhuo Wang, Haodong He, Jingsong Liang, Yuhong Cao, Ritabrata Chakraborty, Guillaume Sartoretti

Date:2025-08-05 03:12:02

Path planning in unknown environments is a crucial yet inherently challenging capability for mobile robots, which primarily encompasses two coupled tasks: autonomous exploration and point-goal navigation. In both cases, the robot must perceive the environment, update its belief, and accurately estimate potential information gain on-the-fly to guide planning. In this work, we propose CogniPlan, a novel path planning framework that leverages multiple plausible layouts predicted by a COnditional GeNerative Inpainting model, mirroring how humans rely on cognitive maps during navigation. These predictions, based on the partially observed map and a set of layout conditioning vectors, enable our planner to reason effectively under uncertainty. We demonstrate strong synergy between generative image-based layout prediction and graph-attention-based path planning, allowing CogniPlan to combine the scalability of graph representations with the fidelity and predictiveness of occupancy maps, yielding notable performance gains in both exploration and navigation. We extensively evaluate CogniPlan on two datasets (hundreds of maps and realistic floor plans), consistently outperforming state-of-the-art planners. We further deploy it in a high-fidelity simulator and on hardware, showcasing its high-quality path planning and real-world applicability.

Beyond Policy Optimization: A Data Curation Flywheel for Sparse-Reward Long-Horizon Planning

Authors:Yutong Wang, Pengliang Ji, Kaixin Li, Baolong Bi, Tao Feng, Guillaume Sartoretti

Date:2025-08-05 02:56:58

Large Language Reasoning Models have demonstrated remarkable success on static tasks, yet their application to multi-round agentic planning in interactive environments faces two fundamental challenges. First, the intractable credit assignment problem renders conventional reinforcement learning ineffective in sparse-reward settings. Second, the computational overhead of verbose, step-by-step reasoning histories is prohibitive. To address these challenges, we propose BPO, a three-stage framework (bootstrapping, extrapolation, and refinement) that establishes a self-improving data flywheel to develop robust reasoning models for long-horizon, sparse-reward environments. Our framework first bootstraps efficient reasoning using the proposed planning quaternions with long-short chain-of-thought fusion. It then extrapolates to out-of-distribution tasks through complexity-stratified curriculum learning. Finally, the model iteratively refines itself by learning exclusively on experiences selected via reward-gated rejection sampling. Experiments on ALFWorld, ScienceWorld, and WebShop demonstrate that our approach achieves state-of-the-art with significant token efficiency, providing a new recipe for reasoning models in agentic planning.

ClinicalFMamba: Advancing Clinical Assessment using Mamba-based Multimodal Neuroimaging Fusion

Authors:Meng Zhou, Farzad Khalvati

Date:2025-08-05 02:25:53

Multimodal medical image fusion integrates complementary information from different imaging modalities to enhance diagnostic accuracy and treatment planning. While deep learning methods have advanced performance, existing approaches face critical limitations: Convolutional Neural Networks (CNNs) excel at local feature extraction but struggle to model global context effectively, while Transformers achieve superior long-range modeling at the cost of quadratic computational complexity, limiting clinical deployment. Recent State Space Models (SSMs) offer a promising alternative, enabling efficient long-range dependency modeling in linear time through selective scan mechanisms. Despite these advances, the extension to 3D volumetric data and the clinical validation of fused images remains underexplored. In this work, we propose ClinicalFMamba, a novel end-to-end CNN-Mamba hybrid architecture that synergistically combines local and global feature modeling for 2D and 3D images. We further design a tri-plane scanning strategy for effectively learning volumetric dependencies in 3D images. Comprehensive evaluations on three datasets demonstrate the superior fusion performance across multiple quantitative metrics while achieving real-time fusion. We further validate the clinical utility of our approach on downstream 2D/3D brain tumor classification tasks, achieving superior performance over baseline methods. Our method establishes a new paradigm for efficient multimodal medical image fusion suitable for real-time clinical deployment.

Integrating Upstream Supply Chains into Generation Expansion Planning

Authors:Boyu Yao, Andrey Bernstein, Yury Dvorkin

Date:2025-08-05 02:06:07

Rising electricity demand underscores the need for secure and reliable generation expansion planning that accounts for upstream supply chain constraints. Traditional models often overlook limitations in materials, manufacturing capacity, lead times for deployment, and field availability, which can delay availability of planned resources and thus to threaten system reliability. This paper introduces a multi-stage supply chain-constrained generation expansion planning (SC-GEP) model that optimizes long-term investments while capturing material availability, production limits, spatial and temporal constraints, and material reuse from retired assets. A decomposition algorithm efficiently solves the resulting MILP. A Maryland case study shows that supply chain constraints shift technology choices, amplify deployment delays caused by lead times, and prompt earlier investment in shorter lead-time, low-material-intensity options. In the low-demand scenario, supply chain constraints raise investment costs by $1.2 billion. Under high demand, persistent generation and reserve shortfalls emerge, underscoring the need to integrate upstream constraints into long-term planning.

Physics-informed Neural Time Fields for Prehensile Object Manipulation

Authors:Hanwen Ren, Ruiqi Ni, Ahmed H. Qureshi

Date:2025-08-05 00:55:28

Object manipulation skills are necessary for robots operating in various daily-life scenarios, ranging from warehouses to hospitals. They allow the robots to manipulate the given object to their desired arrangement in the cluttered environment. The existing approaches to solving object manipulations are either inefficient sampling based techniques, require expert demonstrations, or learn by trial and error, making them less ideal for practical scenarios. In this paper, we propose a novel, multimodal physics-informed neural network (PINN) for solving object manipulation tasks. Our approach efficiently learns to solve the Eikonal equation without expert data and finds object manipulation trajectories fast in complex, cluttered environments. Our method is multimodal as it also reactively replans the robot's grasps during manipulation to achieve the desired object poses. We demonstrate our approach in both simulation and real-world scenarios and compare it against state-of-the-art baseline methods. The results indicate that our approach is effective across various objects, has efficient training compared to previous learning-based methods, and demonstrates high performance in planning time, trajectory length, and success rates. Our demonstration videos can be found at https://youtu.be/FaQLkTV9knI.

Autonomous Inorganic Materials Discovery via Multi-Agent Physics-Aware Scientific Reasoning

Authors:Alireza Ghafarollahi, Markus J. Buehler

Date:2025-08-04 23:40:43

Conventional machine learning approaches accelerate inorganic materials design via accurate property prediction and targeted material generation, yet they operate as single-shot models limited by the latent knowledge baked into their training data. A central challenge lies in creating an intelligent system capable of autonomously executing the full inorganic materials discovery cycle, from ideation and planning to experimentation and iterative refinement. We introduce SparksMatter, a multi-agent AI model for automated inorganic materials design that addresses user queries by generating ideas, designing and executing experimental workflows, continuously evaluating and refining results, and ultimately proposing candidate materials that meet the target objectives. SparksMatter also critiques and improves its own responses, identifies research gaps and limitations, and suggests rigorous follow-up validation steps, including DFT calculations and experimental synthesis and characterization, embedded in a well-structured final report. The model's performance is evaluated across case studies in thermoelectrics, semiconductors, and perovskite oxides materials design. The results demonstrate the capacity of SparksMatter to generate novel stable inorganic structures that target the user's needs. Benchmarking against frontier models reveals that SparksMatter consistently achieves higher scores in relevance, novelty, and scientific rigor, with a significant improvement in novelty across multiple real-world design tasks as assessed by a blinded evaluator. These results demonstrate SparksMatter's unique capacity to generate chemically valid, physically meaningful, and creative inorganic materials hypotheses beyond existing materials knowledge.

Optimal Trajectory Planning in a Vertically Undulating Snake Locomotion using Contact-implicit Optimization

Authors:Adarsh Salagame, Eric Sihite, Alireza Ramezani

Date:2025-08-04 23:21:13

Contact-rich problems, such as snake robot locomotion, offer unexplored yet rich opportunities for optimization-based trajectory and acyclic contact planning. So far, a substantial body of control research has focused on emulating snake locomotion and replicating its distinctive movement patterns using shape functions that either ignore the complexity of interactions or focus on complex interactions with matter (e.g., burrowing movements). However, models and control frameworks that lie in between these two paradigms and are based on simple, fundamental rigid body dynamics, which alleviate the challenging contact and control allocation problems in snake locomotion, remain absent. This work makes meaningful contributions, substantiated by simulations and experiments, in the following directions: 1) introducing a reduced-order model based on Moreau's stepping-forward approach from differential inclusion mathematics, 2) verifying model accuracy, 3) experimental validation.

Clinically Grounded Agent-based Report Evaluation: An Interpretable Metric for Radiology Report Generation

Authors:Radhika Dua, Young Joon, Kwon, Siddhant Dogra, Daniel Freedman, Diana Ruan, Motaz Nashawaty, Danielle Rigau, Daniel Alexander Alber, Kang Zhang, Kyunghyun Cho, Eric Karl Oermann

Date:2025-08-04 18:28:03

Radiological imaging is central to diagnosis, treatment planning, and clinical decision-making. Vision-language foundation models have spurred interest in automated radiology report generation (RRG), but safe deployment requires reliable clinical evaluation of generated reports. Existing metrics often rely on surface-level similarity or behave as black boxes, lacking interpretability. We introduce ICARE (Interpretable and Clinically-grounded Agent-based Report Evaluation), an interpretable evaluation framework leveraging large language model agents and dynamic multiple-choice question answering (MCQA). Two agents, each with either the ground-truth or generated report, generate clinically meaningful questions and quiz each other. Agreement on answers captures preservation and consistency of findings, serving as interpretable proxies for clinical precision and recall. By linking scores to question-answer pairs, ICARE enables transparent, and interpretable assessment. Clinician studies show ICARE aligns significantly more with expert judgment than prior metrics. Perturbation analyses confirm sensitivity to clinical content and reproducibility, while model comparisons reveal interpretable error patterns.

Actionable Counterfactual Explanations Using Bayesian Networks and Path Planning with Applications to Environmental Quality Improvement

Authors:Enrique Valero-Leal, Pedro Larrañaga, Concha Bielza

Date:2025-08-04 17:20:50

Counterfactual explanations study what should have changed in order to get an alternative result, enabling end-users to understand machine learning mechanisms with counterexamples. Actionability is defined as the ability to transform the original case to be explained into a counterfactual one. We develop a method for actionable counterfactual explanations that, unlike predecessors, does not directly leverage training data. Rather, data is only used to learn a density estimator, creating a search landscape in which to apply path planning algorithms to solve the problem and masking the endogenous data, which can be sensitive or private. We put special focus on estimating the data density using Bayesian networks, demonstrating how their enhanced interpretability is useful in high-stakes scenarios in which fairness is raising concern. Using a synthetic benchmark comprised of 15 datasets, our proposal finds more actionable and simpler counterfactuals than the current state-of-the-art algorithms. We also test our algorithm with a real-world Environmental Protection Agency dataset, facilitating a more efficient and equitable study of policies to improve the quality of life in United States of America counties. Our proposal captures the interaction of variables, ensuring equity in decisions, as policies to improve certain domains of study (air, water quality, etc.) can be detrimental in others. In particular, the sociodemographic domain is often involved, where we find important variables related to the ongoing housing crisis that can potentially have a severe negative impact on communities.

Nuclear modification factor $I_{AA}$ in $AA$ collisions at RHIC and LHC energies in scenarios with and without quark-gluon plasma formation in $pp$ collisions

Authors:B. G. Zakharov

Date:2025-08-04 17:12:30

We calculate the away-side hadron-triggered modification factor $I_{AA}$ in $AA$ collisions at RHIC and LHC energies for scenarios with and without quark-gluon plasma formation in $pp$ collision. We find that for both scenarios theoretical results for $I_{AA}$ agree well with the available data for 2.76 TeV Pb+Pb and 0.2 TeV Au+Au collisions. We make predictions for $I_{AA}$ in 7 TeV O+O collisions that are planned at the LHC. Our results show that measuring $I_{OO}$ in the whole centrality interval and at small centrality ($\lesssim 5$%) may give information on the presence of jet quenching in $pp$ collisions.

HealthFlow: A Self-Evolving AI Agent with Meta Planning for Autonomous Healthcare Research

Authors:Yinghao Zhu, Yifan Qi, Zixiang Wang, Lei Gu, Dehao Sui, Haoran Hu, Xichen Zhang, Ziyi He, Liantao Ma, Lequan Yu

Date:2025-08-04 17:08:47

The efficacy of AI agents in healthcare research is hindered by their reliance on static, predefined strategies. This creates a critical limitation: agents can become better tool-users but cannot learn to become better strategic planners, a crucial skill for complex domains like healthcare. We introduce HealthFlow, a self-evolving AI agent that overcomes this limitation through a novel meta-level evolution mechanism. HealthFlow autonomously refines its own high-level problem-solving policies by distilling procedural successes and failures into a durable, strategic knowledge base. To anchor our research and facilitate reproducible evaluation, we introduce EHRFlowBench, a new benchmark featuring complex, realistic health data analysis tasks derived from peer-reviewed clinical research. Our comprehensive experiments demonstrate that HealthFlow's self-evolving approach significantly outperforms state-of-the-art agent frameworks. This work marks a necessary shift from building better tool-users to designing smarter, self-evolving task-managers, paving the way for more autonomous and effective AI for scientific discovery.

Periodic robust robotic rock chop via virtual model control

Authors:Yi Zhang, Fumiya Iida, Fulvio Forni

Date:2025-08-04 16:56:32

Robotic cutting is a challenging contact-rich manipulation task where the robot must simultaneously negotiate unknown object mechanics, large contact forces, and precise motion requirements. We introduce a new virtual-model control scheme that enables knife rocking motion for robot manipulators, without pre-planned trajectories or precise information of the environment. Motion is generated through interconnection with virtual mechanisms, given by virtual springs, dampers, and masses arranged in a suitable way. Through analysis and experiments, we demonstrate that the controlled robot behavior settles into a periodic motion. Experiments with a Franka manipulator demonstrate robust cuts with five different vegetables, and sub-millimeter slice accuracy from 1 mm to 6 mm at nearly one cut per second. The same controller survives changes in knife shape and cutting board height, and adaptation to a different humanoid manipulator, demonstrating robustness and platform independence.

Intensional FOL over Belnap's Billatice for Strong-AI Robotics

Authors:Zoran Majkic

Date:2025-08-04 16:43:44

AGI (Strong AI) aims to create intelligent robots that are quasi indistinguishable from the human mind. Like a child, the AGI robot would have to learn through input and experiences, constantly progressing and advancing its abilities over time. The AGI robot would require an intelligence more close to human's intelligence: it would have a self-aware consciousness that has the ability to solve problems, learn, and plan. Based on this approach an Intensional many-sorted First-order Logic (IFOL), as an extension of a standard FOL with Tarskian's semantics, is proposed in order to avoid the problems of standard 2-valued FOL with paradoxes (inconsistent formulae) and a necessity for robots to work with incomplete (unknown) knowledge as well. This is a more sophisticated version of IFOL with the same syntax but different semantics, able to deal with truth-ordering and knowledge-ordering as well, based on the well known Belnap's billatice with four truth-values that extend the set of classical two truth-values.

Automated Construction of Artificial Lattice Structures with Designer Electronic States

Authors:Ganesh Narasimha, Mykola Telychko, Wooin Yang, Arthur P. Baddorf, P. Ganesh, An-Ping Li, Rama Vasudevan

Date:2025-08-04 16:38:45

Manipulating matter with a scanning tunneling microscope (STM) enables creation of atomically defined artificial structures that host designer quantum states. However, the time-consuming nature of the manipulation process, coupled with the sensitivity of the STM tip, constrains the exploration of diverse configurations and limits the size of designed features. In this study, we present a reinforcement learning (RL)-based framework for creating artificial structures by spatially manipulating carbon monoxide (CO) molecules on a copper substrate using the STM tip. The automated workflow combines molecule detection and manipulation, employing deep learning-based object detection to locate CO molecules and linear assignment algorithms to allocate these molecules to designated target sites. We initially perform molecule maneuvering based on randomized parameter sampling for sample bias, tunneling current setpoint and manipulation speed. This dataset is then structured into an action trajectory used to train an RL agent. The model is subsequently deployed on the STM for real-time fine-tuning of manipulation parameters during structure construction. Our approach incorporates path planning protocols coupled with active drift compensation to enable atomically precise fabrication of structures with significantly reduced human input while realizing larger-scale artificial lattices with desired electronic properties. To underpin of efficiency of our approach we demonstrate the automated construction of an extended artificial graphene lattice and confirm the existence of characteristic Dirac point in its electronic structure. Further challenges to RL-based structural assembly scalability are discussed.

RL-U$^2$Net: A Dual-Branch UNet with Reinforcement Learning-Assisted Multimodal Feature Fusion for Accurate 3D Whole-Heart Segmentation

Authors:Jierui Qu, Jianchun Zhao

Date:2025-08-04 16:12:06

Accurate whole-heart segmentation is a critical component in the precise diagnosis and interventional planning of cardiovascular diseases. Integrating complementary information from modalities such as computed tomography (CT) and magnetic resonance imaging (MRI) can significantly enhance segmentation accuracy and robustness. However, existing multi-modal segmentation methods face several limitations: severe spatial inconsistency between modalities hinders effective feature fusion; fusion strategies are often static and lack adaptability; and the processes of feature alignment and segmentation are decoupled and inefficient. To address these challenges, we propose a dual-branch U-Net architecture enhanced by reinforcement learning for feature alignment, termed RL-U$^2$Net, designed for precise and efficient multi-modal 3D whole-heart segmentation. The model employs a dual-branch U-shaped network to process CT and MRI patches in parallel, and introduces a novel RL-XAlign module between the encoders. The module employs a cross-modal attention mechanism to capture semantic correspondences between modalities and a reinforcement-learning agent learns an optimal rotation strategy that consistently aligns anatomical pose and texture features. The aligned features are then reconstructed through their respective decoders. Finally, an ensemble-learning-based decision module integrates the predictions from individual patches to produce the final segmentation result. Experimental results on the publicly available MM-WHS 2017 dataset demonstrate that the proposed RL-U$^2$Net outperforms existing state-of-the-art methods, achieving Dice coefficients of 93.1% on CT and 87.0% on MRI, thereby validating the effectiveness and superiority of the proposed approach.