planning - 2025-06-10

Realistic Urban Traffic Generator using Decentralized Federated Learning for the SUMO simulator

Authors:Alberto Bazán-Guillén, Carlos Beis-Penedo, Diego Cajaraville-Aboy, Pablo Barbecho-Bautista, Rebeca P. Díaz-Redondo, Luis J. de la Cruz Llopis, Ana Fernández-Vilas, Mónica Aguilar Igartua, Manuel Fernández-Veiga
Date:2025-06-09 17:51:45

Realistic urban traffic simulation is essential for sustainable urban planning and the development of intelligent transportation systems. However, generating high-fidelity, time-varying traffic profiles that accurately reflect real-world conditions, especially in large-scale scenarios, remains a major challenge. Existing methods often suffer from limitations in accuracy, scalability, or raise privacy concerns due to centralized data processing. This work introduces DesRUTGe (Decentralized Realistic Urban Traffic Generator), a novel framework that integrates Deep Reinforcement Learning (DRL) agents with the SUMO simulator to generate realistic 24-hour traffic patterns. A key innovation of DesRUTGe is its use of Decentralized Federated Learning (DFL), wherein each traffic detector and its corresponding urban zone function as an independent learning node. These nodes train local DRL models using minimal historical data and collaboratively refine their performance by exchanging model parameters with selected peers (e.g., geographically adjacent zones), without requiring a central coordinator. Evaluated using real-world data from the city of Barcelona, DesRUTGe outperforms standard SUMO-based tools such as RouteSampler, as well as other centralized learning approaches, by delivering more accurate and privacy-preserving traffic pattern generation.

Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction

Authors:Junhong Shen, Hao Bai, Lunjun Zhang, Yifei Zhou, Amrith Setlur, Shengbang Tong, Diego Caples, Nan Jiang, Tong Zhang, Ameet Talwalkar, Aviral Kumar
Date:2025-06-09 17:50:02

The current paradigm of test-time scaling relies on generating long reasoning traces ("thinking" more) before producing a response. In agent problems that require interaction, this can be done by generating thinking traces before acting in the world. However, this process does not allow agents to acquire new information from the environment or adapt their behavior over time. In this work, we propose to scale test-time interaction, an untapped dimension of test-time scaling that increases the agent's interaction horizon to enable running rich behaviors such as exploration, backtracking, and dynamic re-planning within a single rollout. To demonstrate the promise of this scaling dimension, we study the domain of web agents. We first show that even prompting-based interaction scaling without any training can improve task success on web benchmarks non-trivially. Building on this, we introduce TTI (Test-Time Interaction), a curriculum-based online reinforcement learning (RL) approach that trains agents by adaptively adjusting their rollout lengths. Using a Gemma 3 12B model, TTI produces state-of-the-art open-source, open-data web agents on WebVoyager and WebArena benchmarks. We further show that TTI enables agents to balance exploration and exploitation adaptively. Our results establish interaction scaling as a powerful, complementary axis to scaling per-step compute, offering new avenues for training adaptive agents.

CrosswalkNet: An Optimized Deep Learning Framework for Pedestrian Crosswalk Detection in Aerial Images with High-Performance Computing

Authors:Zubin Bhuyan, Yuanchang Xie, AngkeaReach Rith, Xintong Yan, Nasko Apostolov, Jimi Oke, Chengbo Ai
Date:2025-06-09 15:56:24

With the increasing availability of aerial and satellite imagery, deep learning presents significant potential for transportation asset management, safety analysis, and urban planning. This study introduces CrosswalkNet, a robust and efficient deep learning framework designed to detect various types of pedestrian crosswalks from 15-cm resolution aerial images. CrosswalkNet incorporates a novel detection approach that improves upon traditional object detection strategies by utilizing oriented bounding boxes (OBB), enhancing detection precision by accurately capturing crosswalks regardless of their orientation. Several optimization techniques, including Convolutional Block Attention, a dual-branch Spatial Pyramid Pooling-Fast module, and cosine annealing, are implemented to maximize performance and efficiency. A comprehensive dataset comprising over 23,000 annotated crosswalk instances is utilized to train and validate the proposed framework. The best-performing model achieves an impressive precision of 96.5% and a recall of 93.3% on aerial imagery from Massachusetts, demonstrating its accuracy and effectiveness. CrosswalkNet has also been successfully applied to datasets from New Hampshire, Virginia, and Maine without transfer learning or fine-tuning, showcasing its robustness and strong generalization capability. Additionally, the crosswalk detection results, processed using High-Performance Computing (HPC) platforms and provided in polygon shapefile format, have been shown to accelerate data processing and detection, supporting real-time analysis for safety and mobility applications. This integration offers policymakers, transportation engineers, and urban planners an effective instrument to enhance pedestrian safety and improve urban mobility.

A distributed motion planning approach to cooperative underwater acoustic source tracking and pursuit

Authors:Andrea Tiranti, Francesco Wanderlingh, Enrico Simetti, Marco Baglietto, Giovanni Indiveri, Antonio Pascoal
Date:2025-06-09 15:49:36

This paper addresses the problem of underwater acoustic source tracking and pursuit with a team of autonomous underwater vehicles. Producing distributed control strategies in an underwater sensor network is not trivial since communication is primarily acoustic, which makes it intermittent and often plagued with major difficulties. For this reason, we propose an optimization scheme based on a Partially Observable Markov Decision Process for improving the performance of underwater mobile sensor networks, in which autonomous underwater vehicles (agents) play the role of moving nodes of a network. The key idea is to adjust the agents' guidance strategies to achieve coordinated motion planning, enabling optimal geometric configurations between the agents and the target to enhance tracking performance. Such a problem is cast as a multi-objective optimization problem that is solved through a receding horizon lookahead optimization scheme since we are interested in long-term tracking accuracy. The planning strategy is distributed using the sequential multi-agent decision-making paradigm to make the solving tractable since the optimization depends on the joint action domain. A distributed control framework has been implemented in a simulation environment to validate the proposed approach, which explicitly accounts for the major limitations imposed by acoustic communications.

F2Net: A Frequency-Fused Network for Ultra-High Resolution Remote Sensing Segmentation

Authors:Hengzhi Chen, Liqian Feng, Wenhua Wu, Xiaogang Zhu, Shawn Leo, Kun Hu
Date:2025-06-09 15:09:49

Semantic segmentation of ultra-high-resolution (UHR) remote sensing imagery is critical for applications like environmental monitoring and urban planning but faces computational and optimization challenges. Conventional methods either lose fine details through downsampling or fragment global context via patch processing. While multi-branch networks address this trade-off, they suffer from computational inefficiency and conflicting gradient dynamics during training. We propose F2Net, a frequency-aware framework that decomposes UHR images into high- and low-frequency components for specialized processing. The high-frequency branch preserves full-resolution structural details, while the low-frequency branch processes downsampled inputs through dual sub-branches capturing short- and long-range dependencies. A Hybrid-Frequency Fusion module integrates these observations, guided by two novel objectives: Cross-Frequency Alignment Loss ensures semantic consistency between frequency components, and Cross-Frequency Balance Loss regulates gradient magnitudes across branches to stabilize training. Evaluated on DeepGlobe and Inria Aerial benchmarks, F2Net achieves state-of-the-art performance with mIoU of 80.22 and 83.39, respectively. Our code will be publicly available.

Accelerating Diffusion Models in Offline RL via Reward-Aware Consistency Trajectory Distillation

Authors:Xintong Duan, Yutong He, Fahim Tajwar, Ruslan Salakhutdinov, J. Zico Kolter, Jeff Schneider
Date:2025-06-09 14:48:19

Although diffusion models have achieved strong results in decision-making tasks, their slow inference speed remains a key limitation. While the consistency model offers a potential solution, its applications to decision-making often struggle with suboptimal demonstrations or rely on complex concurrent training of multiple networks. In this work, we propose a novel approach to consistency distillation for offline reinforcement learning that directly incorporates reward optimization into the distillation process. Our method enables single-step generation while maintaining higher performance and simpler training. Empirical evaluations on the Gym MuJoCo benchmarks and long horizon planning demonstrate that our approach can achieve an 8.7% improvement over previous state-of-the-art while offering up to 142x speedup over diffusion counterparts in inference time.

SMaRCSim: Maritime Robotics Simulation Modules

Authors:Mart Kartašev, David Dörner, Özer Özkahraman, Petter Ögren, Ivan Stenius, John Folkesson
Date:2025-06-09 13:57:32

Developing new functionality for underwater robots and testing them in the real world is time-consuming and resource-intensive. Simulation environments allow for rapid testing before field deployment. However, existing tools lack certain functionality for use cases in our project: i) developing learning-based methods for underwater vehicles; ii) creating teams of autonomous underwater, surface, and aerial vehicles; iii) integrating the simulation with mission planning for field experiments. A holistic solution to these problems presents great potential for bringing novel functionality into the underwater domain. In this paper we present SMaRCSim, a set of simulation packages that we have developed to help us address these issues.

Decoding Saccadic Eye Movements from Brain Signals Using an Endovascular Neural Interface

Authors:Suleman Rasheed, James Bennett, Peter E. Yoo, Anthony N. Burkitt, David B. Grayden
Date:2025-06-09 06:56:47

An Oculomotor Brain-Computer Interface (BCI) records neural activity from regions of the brain involved in planning eye movements and translates this activity into control commands. While previous successful oculomotor BCI studies primarily relied on invasive microelectrode implants in non-human primates, this study investigates the feasibility of an oculomotor BCI using a minimally invasive endovascular Stentrode device implanted near the supplementary motor area in a patient with amyotrophic lateral sclerosis (ALS). To achieve this, self-paced visually-guided and free-viewing saccade tasks were designed, in which the participant performed saccades in four directions (left, right, up, down), with simultaneous recording of endovascular EEG and eye gaze. The visually guided saccades were cued with visual stimuli, whereas the free-viewing saccades were self-directed without explicit cues. The results showed that while the neural responses of visually guided saccades overlapped with the cue-evoked potentials, the free-viewing saccades exhibited distinct saccade-related potentials that began shortly before eye movement, peaked approximately 50 ms after saccade onset, and persisted for around 200 ms. In the frequency domain, these responses appeared as a low-frequency synchronisation below 15 Hz. Classification of 'fixation vs. saccade' was robust, achieving mean area under the receiver operating characteristic curve (AUC) scores of 0.88 within sessions and 0.86 between sessions. In contrast, classifying saccade direction proved more challenging, yielding within-session AUC scores of 0.67 for four-class decoding and up to 0.75 for the best-performing binary comparisons (left vs. up and left vs. down). This proof-of-concept study demonstrates the feasibility of an endovascular oculomotor BCI in an ALS patient, establishing a foundation for future oculomotor BCI studies in human subjects.

Text-guided multi-stage cross-perception network for medical image segmentation

Authors:Gaoyu Chen
Date:2025-06-09 06:50:15

Medical image segmentation plays a crucial role in clinical medicine, serving as a tool for auxiliary diagnosis, treatment planning, and disease monitoring, thus facilitating physicians in the study and treatment of diseases. However, existing medical image segmentation methods are limited by the weak semantic expression of the target segmentation regions, which is caused by the low contrast between the target and non-target segmentation regions. To address this limitation, text prompt information has greast potential to capture the lesion location. However, existing text-guided methods suffer from insufficient cross-modal interaction and inadequate cross-modal feature expression. To resolve these issues, we propose the Text-guided Multi-stage Cross-perception network (TMC). In TMC, we introduce a multistage cross-attention module to enhance the model's understanding of semantic details and a multi-stage alignment loss to improve the consistency of cross-modal semantics. The results of the experiments demonstrate that our TMC achieves a superior performance with Dice of 84.77%, 78.50%, 88.73% in three public datasets (QaTa-COV19, MosMedData and Breast), outperforming UNet based networks and text-guided methods.

Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models

Authors:Mickel Liu, Liwei Jiang, Yancheng Liang, Simon Shaolei Du, Yejin Choi, Tim Althoff, Natasha Jaques
Date:2025-06-09 06:35:12

Conventional language model (LM) safety alignment relies on a reactive, disjoint procedure: attackers exploit a static model, followed by defensive fine-tuning to patch exposed vulnerabilities. This sequential approach creates a mismatch -- attackers overfit to obsolete defenses, while defenders perpetually lag behind emerging threats. To address this, we propose Self-RedTeam, an online self-play reinforcement learning algorithm where an attacker and defender agent co-evolve through continuous interaction. We cast safety alignment as a two-player zero-sum game, where a single model alternates between attacker and defender roles -- generating adversarial prompts and safeguarding against them -- while a reward LM adjudicates outcomes. This enables dynamic co-adaptation. Grounded in the game-theoretic framework of zero-sum games, we establish a theoretical safety guarantee which motivates the design of our method: if self-play converges to a Nash Equilibrium, the defender will reliably produce safe responses to any adversarial input. Empirically, Self-RedTeam uncovers more diverse attacks (+21.8% SBERT) compared to attackers trained against static defenders and achieves higher robustness on safety benchmarks (e.g., +65.5% on WildJailBreak) than defenders trained against static attackers. We further propose hidden Chain-of-Thought, allowing agents to plan privately, which boosts adversarial diversity and reduces over-refusals. Our results motivate a shift from reactive patching to proactive co-evolution in LM safety training, enabling scalable, autonomous, and robust self-improvement of LMs via multi-agent reinforcement learning (MARL).

Addressing tokens dynamic generation, propagation, storage and renewal to secure the GlideinWMS pilot based jobs and system

Authors:Bruno Moreira Coimbra, Marco Mambelli
Date:2025-06-09 02:58:42

GlideinWMS has been one of the first middleware in the WLCG community to transition from X.509 to support also tokens. The first step was to get from the prototype in 2019 to using tokens in production in 2022. This paper will present the challenges introduced by the wider adoption of tokens and the evolution plans for securing the pilot infrastructure of GlideinWMS and supporting the new requirements. In the last couple of years, the GlideinWMS team supported the migration of experiments and resources to tokens. Inadequate support in the current infrastructure, more stringent requirements, and the higher spatial and temporal granularity forced GlideinWMS to revisit once more how credentials are generated, used, and propagated. The new credential modules have been designed to be used in multiple systems (GlideinWMS, HEPCloud) and use a model where credentials have type, purpose, and different flows. Credentials are dynamically generated in order to customize the duration and limit the scope to the targeted resource. This allows to enforce the least privilege principle. Finally, we also considered adding credential storage, renewal, and invalidation mechanisms within the GlideinWMS infrastructure to better serve the experiments' needs.

Digital Twin-based Smart Manufacturing: Dynamic Line Reconfiguration for Disturbance Handling

Authors:Bo Fu, Mingjie Bi, Shota Umeda, Takahiro Nakano, Youichi Nonaka, Quan Zhou, Takaharu Matsui, Dawn M. Tilbury, Kira Barton
Date:2025-06-09 00:16:52

The increasing complexity of modern manufacturing, coupled with demand fluctuation, supply chain uncertainties, and product customization, underscores the need for manufacturing systems that can flexibly update their configurations and swiftly adapt to disturbances. However, current research falls short in providing a holistic reconfigurable manufacturing framework that seamlessly monitors system disturbances, optimizes alternative line configurations based on machine capabilities, and automates simulation evaluation for swift adaptations. This paper presents a dynamic manufacturing line reconfiguration framework to handle disturbances that result in operation time changes. The framework incorporates a system process digital twin for monitoring disturbances and triggering reconfigurations, a capability-based ontology model capturing available agent and resource options, a configuration optimizer generating optimal line configurations, and a simulation generation program initializing simulation setups and evaluating line configurations at approximately 400x real-time speed. A case study of a battery production line has been conducted to evaluate the proposed framework. In two implemented disturbance scenarios, the framework successfully recovers system throughput with limited resources, preventing the 26% and 63% throughput drops that would have occurred without a reconfiguration plan. The reconfiguration optimizer efficiently finds optimal solutions, taking an average of 0.03 seconds to find a reconfiguration plan for a manufacturing line with 51 operations and 40 available agents across 8 agent types.

BR-MPPI: Barrier Rate guided MPPI for Enforcing Multiple Inequality Constraints with Learned Signed Distance Field

Authors:Hardik Parwana, Taekyung Kim, Kehan Long, Bardh Hoxha, Hideki Okamoto, Georgios Fainekos, Dimitra Panagou
Date:2025-06-08 23:45:14

Model Predictive Path Integral (MPPI) controller is used to solve unconstrained optimal control problems and Control Barrier Function (CBF) is a tool to impose strict inequality constraints, a.k.a, barrier constraints. In this work, we propose an integration of these two methods that employ CBF-like conditions to guide the control sampling procedure of MPPI. CBFs provide an inequality constraint restricting the rate of change of barrier functions by a classK function of the barrier itself. We instead impose the CBF condition as an equality constraint by choosing a parametric linear classK function and treating this parameter as a state in an augmented system. The time derivative of this parameter acts as an additional control input that is designed by MPPI. A cost function is further designed to reignite Nagumo's theorem at the boundary of the safe set by promoting specific values of classK parameter to enforce safety. Our problem formulation results in an MPPI subject to multiple state and control-dependent equality constraints which are non-trivial to satisfy with randomly sampled control inputs. We therefore also introduce state transformations and control projection operations, inspired by the literature on path planning for manifolds, to resolve the aforementioned issue. We show empirically through simulations and experiments on quadrotor that our proposed algorithm exhibits better sampled efficiency and enhanced capability to operate closer to the safe set boundary over vanilla MPPI.

IDEIA: A Generative AI-Based System for Real-Time Editorial Ideation in Digital Journalism

Authors:Victor B. Santos, Cauã O. Jordão, Leonardo J. O. Ibiapina, Gabriel M. Silva, Mirella E. B. Santana, Matheus A. Garrido, Lucas R. C. Farias
Date:2025-06-08 20:46:01

This paper presents IDEIA (Intelligent Engine for Editorial Ideation and Assistance), a generative AI-powered system designed to optimize the journalistic ideation process by combining real-time trend analysis with automated content suggestion. Developed in collaboration with the Sistema Jornal do Commercio de Comunica\c{c}\~ao (SJCC), the largest media conglomerate in Brazil's North and Northeast regions, IDEIA integrates the Google Trends API for data-driven topic monitoring and the Google Gemini API for the generation of context-aware headlines and summaries. The system adopts a modular architecture based on Node.js, React, and PostgreSQL, supported by Docker containerization and a CI/CD pipeline using GitHub Actions and Vercel. Empirical results demonstrate a significant reduction in the time and cognitive effort required for editorial planning, with reported gains of up to 70\% in the content ideation stage. This work contributes to the field of computational journalism by showcasing how intelligent automation can enhance productivity while maintaining editorial quality. It also discusses the technical and ethical implications of incorporating generative models into newsroom workflows, highlighting scalability and future applicability across sectors beyond journalism.

A Narrative Review on Large AI Models in Lung Cancer Screening, Diagnosis, and Treatment Planning

Authors:Jiachen Zhong, Yiting Wang, Di Zhu, Ziwei Wang
Date:2025-06-08 17:42:24

Lung cancer remains one of the most prevalent and fatal diseases worldwide, demanding accurate and timely diagnosis and treatment. Recent advancements in large AI models have significantly enhanced medical image understanding and clinical decision-making. This review systematically surveys the state-of-the-art in applying large AI models to lung cancer screening, diagnosis, prognosis, and treatment. We categorize existing models into modality-specific encoders, encoder-decoder frameworks, and joint encoder architectures, highlighting key examples such as CLIP, BLIP, Flamingo, BioViL-T, and GLoRIA. We further examine their performance in multimodal learning tasks using benchmark datasets like LIDC-IDRI, NLST, and MIMIC-CXR. Applications span pulmonary nodule detection, gene mutation prediction, multi-omics integration, and personalized treatment planning, with emerging evidence of clinical deployment and validation. Finally, we discuss current limitations in generalizability, interpretability, and regulatory compliance, proposing future directions for building scalable, explainable, and clinically integrated AI systems. Our review underscores the transformative potential of large AI models to personalize and optimize lung cancer care.

SAP-Bench: Benchmarking Multimodal Large Language Models in Surgical Action Planning

Authors:Mengya Xu, Zhongzhen Huang, Dillan Imans, Yiru Ye, Xiaofan Zhang, Qi Dou
Date:2025-06-08 15:30:04

Effective evaluation is critical for driving advancements in MLLM research. The surgical action planning (SAP) task, which aims to generate future action sequences from visual inputs, demands precise and sophisticated analytical capabilities. Unlike mathematical reasoning, surgical decision-making operates in life-critical domains and requires meticulous, verifiable processes to ensure reliability and patient safety. This task demands the ability to distinguish between atomic visual actions and coordinate complex, long-horizon procedures, capabilities that are inadequately evaluated by current benchmarks. To address this gap, we introduce SAP-Bench, a large-scale, high-quality dataset designed to enable multimodal large language models (MLLMs) to perform interpretable surgical action planning. Our SAP-Bench benchmark, derived from the cholecystectomy procedures context with the mean duration of 1137.5s, and introduces temporally-grounded surgical action annotations, comprising the 1,226 clinically validated action clips (mean duration: 68.7s) capturing five fundamental surgical actions across 74 procedures. The dataset provides 1,152 strategically sampled current frames, each paired with the corresponding next action as multimodal analysis anchors. We propose the MLLM-SAP framework that leverages MLLMs to generate next action recommendations from the current surgical scene and natural language instructions, enhanced with injected surgical domain knowledge. To assess our dataset's effectiveness and the broader capabilities of current models, we evaluate seven state-of-the-art MLLMs (e.g., OpenAI-o1, GPT-4o, QwenVL2.5-72B, Claude-3.5-Sonnet, GeminiPro2.5, Step-1o, and GLM-4v) and reveal critical gaps in next action prediction performance.

Regularized Adaptive Graph Learning for Large-Scale Traffic Forecasting

Authors:Kaiqi Wu, Weiyang Kong, Sen Zhang, Yubao Liu, Zitong Chen
Date:2025-06-08 14:58:27

Traffic prediction is a critical task in spatial-temporal forecasting with broad applications in travel planning and urban management. Adaptive graph convolution networks have emerged as mainstream solutions due to their ability to learn node embeddings in a data-driven manner and capture complex latent dependencies. However, existing adaptive graph learning methods for traffic forecasting often either ignore the regularization of node embeddings, which account for a significant proportion of model parameters, or face scalability issues from expensive graph convolution operations. To address these challenges, we propose a Regularized Adaptive Graph Learning (RAGL) model. First, we introduce a regularized adaptive graph learning framework that synergizes Stochastic Shared Embedding (SSE) and adaptive graph convolution via a residual difference mechanism, achieving both embedding regularization and noise suppression. Second, to ensure scalability on large road networks, we develop the Efficient Cosine Operator (ECO), which performs graph convolution based on the cosine similarity of regularized embeddings with linear time complexity. Extensive experiments on four large-scale real-world traffic datasets show that RAGL consistently outperforms state-of-the-art methods in terms of prediction accuracy and exhibits competitive computational efficiency.

Accelerating 3D Gaussian Splatting with Neural Sorting and Axis-Oriented Rasterization

Authors:Zhican Wang, Guanghui He, Dantong Liu, Lingjun Gao, Shell Xu Hu, Chen Zhang, Zhuoran Song, Nicholas Lane, Wayne Luk, Hongxiang Fan
Date:2025-06-08 10:14:54

3D Gaussian Splatting (3DGS) has recently gained significant attention for high-quality and efficient view synthesis, making it widely adopted in fields such as AR/VR, robotics, and autonomous driving. Despite its impressive algorithmic performance, real-time rendering on resource-constrained devices remains a major challenge due to tight power and area budgets. This paper presents an architecture-algorithm co-design to address these inefficiencies. First, we reveal substantial redundancy caused by repeated computation of common terms/expressions during the conventional rasterization. To resolve this, we propose axis-oriented rasterization, which pre-computes and reuses shared terms along both the X and Y axes through a dedicated hardware design, effectively reducing multiply-and-add (MAC) operations by up to 63%. Second, by identifying the resource and performance inefficiency of the sorting process, we introduce a novel neural sorting approach that predicts order-independent blending weights using an efficient neural network, eliminating the need for costly hardware sorters. A dedicated training framework is also proposed to improve its algorithmic stability. Third, to uniformly support rasterization and neural network inference, we design an efficient reconfigurable processing array that maximizes hardware utilization and throughput. Furthermore, we introduce a $\pi$-trajectory tile schedule, inspired by Morton encoding and Hilbert curve, to optimize Gaussian reuse and reduce memory access overhead. Comprehensive experiments demonstrate that the proposed design preserves rendering quality while achieving a speedup of $23.4\sim27.8\times$ and energy savings of $28.8\sim51.4\times$ compared to edge GPUs for real-world scenes. We plan to open-source our design to foster further development in this field.

Mixture Experts with Test-Time Self-Supervised Aggregation for Tabular Imbalanced Regression

Authors:Yung-Chien Wang, Kuang-Da Wang, Wei-Yao Wang, Wen-Chih Peng
Date:2025-06-08 07:53:03

Tabular data serve as a fundamental and ubiquitous representation of structured information in numerous real-world applications, e.g., finance and urban planning. In the realm of tabular imbalanced applications, data imbalance has been investigated in classification tasks with insufficient instances in certain labels, causing the model's ineffective generalizability. However, the imbalance issue of tabular regression tasks is underexplored, and yet is critical due to unclear boundaries for continuous labels and simplifying assumptions in existing imbalance regression work, which often rely on known and balanced test distributions. Such assumptions may not hold in practice and can lead to performance degradation. To address these issues, we propose MATI: Mixture Experts with Test-Time Self-Supervised Aggregation for Tabular Imbalance Regression, featuring two key innovations: (i) the Region-Aware Mixture Expert, which adopts a Gaussian Mixture Model to capture the underlying related regions. The statistical information of each Gaussian component is then used to synthesize and train region-specific experts to capture the unique characteristics of their respective regions. (ii) Test-Time Self-Supervised Expert Aggregation, which dynamically adjusts region expert weights based on test data features to reinforce expert adaptation across varying test distributions. We evaluated MATI on four real-world tabular imbalance regression datasets, including house pricing, bike sharing, and age prediction. To reflect realistic deployment scenarios, we adopted three types of test distributions: a balanced distribution with uniform target frequencies, a normal distribution that follows the training data, and an inverse distribution that emphasizes rare target regions. On average across these three test distributions, MATI achieved a 7.1% improvement in MAE compared to existing methods.

Optimizing rake-links independently of timetables in railway operations

Authors:Sourav Dey
Date:2025-06-08 07:20:45

This study addresses optimal rake-link formation in large-scale timetabled rail operations by modeling the problem as a directed acyclic graph and solving it via the minimum path cover algorithm. It enables efficient rake-to-service assignment while minimizing fleet size. Crucially, it decouples rake-link optimization from the timetable planning process, allowing planners to evaluate feasible rake configurations independently. The model incorporates operational constraints such as deadhead limits, service balance, and slack allowances. Applied to real-world data from Indian Railways, the results reveal clustered Pareto fronts in the decision space, indicating robust and redundant solutions. The approach lays a foundation for resilient, adaptive rail management via digital twin systems.

Deep RL Needs Deep Behavior Analysis: Exploring Implicit Planning by Model-Free Agents in Open-Ended Environments

Authors:Riley Simmons-Edler, Ryan P. Badman, Felix Baastad Berg, Raymond Chua, John J. Vastola, Joshua Lunger, William Qian, Kanaka Rajan
Date:2025-06-08 03:43:48

Understanding the behavior of deep reinforcement learning (DRL) agents -- particularly as task and agent sophistication increase -- requires more than simple comparison of reward curves, yet standard methods for behavioral analysis remain underdeveloped in DRL. We apply tools from neuroscience and ethology to study DRL agents in a novel, complex, partially observable environment, ForageWorld, designed to capture key aspects of real-world animal foraging -- including sparse, depleting resource patches, predator threats, and spatially extended arenas. We use this environment as a platform for applying joint behavioral and neural analysis to agents, revealing detailed, quantitatively grounded insights into agent strategies, memory, and planning. Contrary to common assumptions, we find that model-free RNN-based DRL agents can exhibit structured, planning-like behavior purely through emergent dynamics -- without requiring explicit memory modules or world models. Our results show that studying DRL agents like animals -- analyzing them with neuroethology-inspired tools that reveal structure in both behavior and neural dynamics -- uncovers rich structure in their learning dynamics that would otherwise remain invisible. We distill these tools into a general analysis framework linking core behavioral and representational features to diagnostic methods, which can be reused for a wide range of tasks and agents. As agents grow more complex and autonomous, bridging neuroscience, cognitive science, and AI will be essential -- not just for understanding their behavior, but for ensuring safe alignment and maximizing desirable behaviors that are hard to measure via reward. We show how this can be done by drawing on lessons from how biological intelligence is studied.

Graph Neural Networks in Modern AI-aided Drug Discovery

Authors:Odin Zhang, Haitao Lin, Xujun Zhang, Xiaorui Wang, Zhenxing Wu, Qing Ye, Weibo Zhao, Jike Wang, Kejun Ying, Yu Kang, Chang-yu Hsieh, Tingjun Hou
Date:2025-06-07 20:29:59

Graph neural networks (GNNs), as topology/structure-aware models within deep learning, have emerged as powerful tools for AI-aided drug discovery (AIDD). By directly operating on molecular graphs, GNNs offer an intuitive and expressive framework for learning the complex topological and geometric features of drug-like molecules, cementing their role in modern molecular modeling. This review provides a comprehensive overview of the methodological foundations and representative applications of GNNs in drug discovery, spanning tasks such as molecular property prediction, virtual screening, molecular generation, biomedical knowledge graph construction, and synthesis planning. Particular attention is given to recent methodological advances, including geometric GNNs, interpretable models, uncertainty quantification, scalable graph architectures, and graph generative frameworks. We also discuss how these models integrate with modern deep learning approaches, such as self-supervised learning, multi-task learning, meta-learning and pre-training. Throughout this review, we highlight the practical challenges and methodological bottlenecks encountered when applying GNNs to real-world drug discovery pipelines, and conclude with a discussion on future directions.

RF-Source Seeking with Obstacle Avoidance using Real-time Modified Artificial Potential Fields in Unknown Environments

Authors:Shahid Mohammad Mulla, Aryan Kanakapudi, Lakshmi Narasimhan, Anuj Tiwari
Date:2025-06-07 14:20:58

Navigation of UAVs in unknown environments with obstacles is essential for applications in disaster response and infrastructure monitoring. However, existing obstacle avoidance algorithms, such as Artificial Potential Field (APF) are unable to generalize across environments with different obstacle configurations. Furthermore, the precise location of the final target may not be available in applications such as search and rescue, in which case approaches such as RF source seeking can be used to align towards the target location. This paper proposes a real-time trajectory planning method, which involves real-time adaptation of APF through a sampling-based approach. The proposed approach utilizes only the bearing angle of the target without its precise location, and adjusts the potential field parameters according to the environment with new obstacle configurations in real time. The main contributions of the article are i) an RF source seeking algorithm to provide a bearing angle estimate using RF signal calculations based on antenna placement, and ii) a modified APF for adaptable collision avoidance in changing environments, which are evaluated separately in the simulation software Gazebo, using ROS2 for communication. Simulation results show that the RF source-seeking algorithm achieves high accuracy, with an average angular error of just 1.48 degrees, and with this estimate, the proposed navigation algorithm improves the success rate of reaching the target by 46% and reduces the trajectory length by 1.2% compared to standard potential fields.

Spatial Disparities in Fire Shelter Accessibility: Capacity Challenges in the Palisades and Eaton Fires

Authors:Su Yeon Han, Yubin Lee, Jooyoung Yoo, Jeon-Young Kang, Jinwoo Park, Soe W. Myint, Eunsang Cho, Xin Gu, Joon-Seok Kim
Date:2025-06-07 13:55:27

The increasing frequency and severity of wildfire in California, exacerbated by prolonged drought and environmental changes, pose significant challenges to urban community resilience and equitable emergency response. The study investigates issues of accessibility to shelters during the Palisades and Eaton Fires which started in January 2025 in Southern California that led to over 180,000 displacements and the loss of 16,000 structures. Despite coordinated efforts of many organizations' emergency assistance, shelter shortages left many evacuees without safety or accessible refuge. This research aims to measure shelter accessibility during the fires' peak, evaluate whether existing shelter capacity met the demand, and identify spatial disparities in access. Results reveal severe shelter shortages and pronounced inequities in access to shelters, particularly in geographically isolated regions and mountainous areas. Our simulations of shelter placement strategies using a capacity-based algorithm and a proximity-based approach demonstrate potential improvements in both shelter accessibility and equitable access to shelters. The findings underscore the critical need for strategic shelter planning and infrastructure development to enhance disaster readiness and reduce vulnerability in regions that frequently experience wildfires.

Depth-Optimal Quantum Layout Synthesis as SAT

Authors:Anna B. Jakobsen, Anders B. Clausen, Jaco van de Pol, Irfansha Shaik
Date:2025-06-07 10:47:58

Quantum circuits consist of gates applied to qubits. Current quantum hardware platforms impose connectivity restrictions on binary CX gates. Hence, Layout Synthesis is an important step to transpile quantum circuits before they can be executed. Since CX gates are noisy, it is important to reduce the CX count or CX depth of the mapped circuits. We provide a new and efficient encoding of Quantum-circuit Layout Synthesis in SAT. Previous SAT encodings focused on gate count and CX-gate count. Our encoding instead guarantees that we find mapped circuits with minimal circuit depth or minimal CX-gate depth. We use incremental SAT solving and parallel plans for an efficient encoding. This results in speedups of more than 10-100x compared to OLSQ2, which guarantees depth-optimality. But minimizing depth still takes more time than minimizing gate count with Q-Synth. We correlate the noise reduction achieved by simulating circuits after (CX)-count and (CX)-depth reduction. We find that minimizing for CX-count correlates better with reducing noise than minimizing for CX-depth. However, taking into account both CX-count and CX-depth provides the best noise reduction.

Integrating AI Planning Semantics into SysML System Models for Automated PDDL File Generation

Authors:Hamied Nabizada, Tom Jeleniewski, Lasse Beers, Maximilian Weigand, Felix Gehlhoff, Alexander Fay
Date:2025-06-07 08:46:14

This paper presents a SysML profile that enables the direct integration of planning semantics based on the Planning Domain Definition Language (PDDL) into system models. Reusable stereotypes are defined for key PDDL concepts such as types, predicates, functions and actions, while formal OCL constraints ensure syntactic consistency. The profile was derived from the Backus-Naur Form (BNF) definition of PDDL 3.1 to align with SysML modeling practices. A case study from aircraft manufacturing demonstrates the application of the profile: a robotic system with interchangeable end effectors is modeled and enriched to generate both domain and problem descriptions in PDDL format. These are used as input to a PDDL solver to derive optimized execution plans. The approach supports automated and model-based generation of planning descriptions and provides a reusable bridge between system modeling and AI planning in engineering design.

High count rate effects in event processing for XRISM/Resolve x-ray microcalorimeter: II. Energy scale and resolution in orbit

Authors:Misaki Mizumoto, Yoshiaki Kanemaru, Shinya Yamada, Caroline A. Kilbourne, Megan E. Eckart, Edmund Hodges-Kluck, Yoshitaka Ishisaki, Frederick S. Porter, Katja Pottschmidt, Tsubasa Tamba
Date:2025-06-07 07:15:52

The Resolve instrument on the X-ray Imaging and Spectroscopy Mission (XRISM) uses a 36-pixel microcalorimeter designed to deliver high-resolution, non-dispersive X-ray spectroscopy. Although it is optimized for extended sources with low count rates, Resolve observations of bright point sources are still able to provide unique insights into the physics of these objects, as long as high count rate effects are addressed in the analysis. These effects include {the loss of exposure time for each pixel}, change on the energy scale, and change on the energy resolution. To investigate these effects under realistic observational conditions, we observed the bright X-ray source, the Crab Nebula, with XRISM at several offset positions with respect to the Resolve field of view and with continuous illumination from {$^{55}$Fe sources} on the filter wheel. For the spectral analysis, we excluded data where exposure time loss was too significant to ensure reliable spectral statistics. The energy scale at 6 keV shows a slight negative shift in the high-count-rate regime. The energy resolution at 6 keV worsens as the count rate in electrically neighboring pixels increases, but can be restored by applying a nearest-neighbor coincidence cut (``cross-talk cut''). We examined how these effects influence the observation of bright point sources, using GX 13+1 as a test case, and identified an eV-scale energy offset at 6 keV between the inner (brighter) and outer (fainter) pixels. Users who seek to analyze velocity structures on the order of tens of km~s$^{-1}$ should account for such high count rate effects. These findings will aid in the interpretation of Resolve data from bright sources and provide valuable considerations for designing and planning for future microcalorimeter missions.

SpikePingpong: High-Frequency Spike Vision-based Robot Learning for Precise Striking in Table Tennis Game

Authors:Hao Wang, Chengkai Hou, Xianglong Li, Yankai Fu, Chenxuan Li, Ning Chen, Gaole Dai, Jiaming Liu, Tiejun Huang, Shanghang Zhang
Date:2025-06-07 07:04:48

Learning to control high-speed objects in the real world remains a challenging frontier in robotics. Table tennis serves as an ideal testbed for this problem, demanding both rapid interception of fast-moving balls and precise adjustment of their trajectories. This task presents two fundamental challenges: it requires a high-precision vision system capable of accurately predicting ball trajectories, and it necessitates intelligent strategic planning to ensure precise ball placement to target regions. The dynamic nature of table tennis, coupled with its real-time response requirements, makes it particularly well-suited for advancing robotic control capabilities in fast-paced, precision-critical domains. In this paper, we present SpikePingpong, a novel system that integrates spike-based vision with imitation learning for high-precision robotic table tennis. Our approach introduces two key attempts that directly address the aforementioned challenges: SONIC, a spike camera-based module that achieves millimeter-level precision in ball-racket contact prediction by compensating for real-world uncertainties such as air resistance and friction; and IMPACT, a strategic planning module that enables accurate ball placement to targeted table regions. The system harnesses a 20 kHz spike camera for high-temporal resolution ball tracking, combined with efficient neural network models for real-time trajectory correction and stroke planning. Experimental results demonstrate that SpikePingpong achieves a remarkable 91% success rate for 30 cm accuracy target area and 71% in the more challenging 20 cm accuracy task, surpassing previous state-of-the-art approaches by 38% and 37% respectively. These significant performance improvements enable the robust implementation of sophisticated tactical gameplay strategies, providing a new research perspective for robotic control in high-speed dynamic tasks.

Optimizing Battery and Line Undergrounding Investments for Transmission Systems under Wildfire Risk Scenarios: A Benders Decomposition Approach

Authors:Ryan Piansky, Rahul K. Gupta, Daniel K. Molzahn
Date:2025-06-07 06:53:13

With electric power infrastructure posing an increasing risk of igniting wildfires under continuing climate change, utilities are frequently de-energizing power lines to mitigate wildfire ignition risk, which can cause load shedding. Recent research advocates for installing battery energy storage systems as well as undergrounding risky overhead lines to reduce the load shedding during such de-energizations. Since wildfire ignition risk can exhibit substantial geographic and temporal variations, it is important to plan battery installation and line undergrounding investments while considering multiple possible scenarios. This paper presents a scenario-based framework for optimizing battery installation and line undergrounding investments while considering many scenarios, each consisting of a day-long time series of uncertain parameters for the load demand, renewable generation, and wildfire ignition risks. This problem is difficult to solve due to a large number of scenarios and binary variables associated with the battery placements as well as the lines to be undergrounded. To address the computational challenges, we decompose the problem in a two-stage scheme via a Benders decomposition approach. The first stage is a master problem formulated as a mixed integer linear programming (MILP) model that makes decisions on the locations and sizes of batteries as well as the lines to be undergrounded. The second stage consists of a linear programming model that assesses these battery and line undergrounding decisions as modeled by a DC OPF formulation. We demonstrate the effectiveness of the proposed scheme on a large-scale transmission network with real world data on wildfire ignition risks, load, and renewable generation.

RoboCerebra: A Large-scale Benchmark for Long-horizon Robotic Manipulation Evaluation

Authors:Songhao Han, Boxiang Qiu, Yue Liao, Siyuan Huang, Chen Gao, Shuicheng Yan, Si Liu
Date:2025-06-07 06:15:49

Recent advances in vision-language models (VLMs) have enabled instruction-conditioned robotic systems with improved generalization. However, most existing work focuses on reactive System 1 policies, underutilizing VLMs' strengths in semantic reasoning and long-horizon planning. These System 2 capabilities-characterized by deliberative, goal-directed thinking-remain under explored due to the limited temporal scale and structural complexity of current benchmarks. To address this gap, we introduce RoboCerebra, a benchmark for evaluating high-level reasoning in long-horizon robotic manipulation. RoboCerebra includes: (1) a large-scale simulation dataset with extended task horizons and diverse subtask sequences in household environments; (2) a hierarchical framework combining a high-level VLM planner with a low-level vision-language-action (VLA) controller; and (3) an evaluation protocol targeting planning, reflection, and memory through structured System 1-System 2 interaction. The dataset is constructed via a top-down pipeline, where GPT generates task instructions and decomposes them into subtask sequences. Human operators execute the subtasks in simulation, yielding high-quality trajectories with dynamic object variations. Compared to prior benchmarks, RoboCerebra features significantly longer action sequences and denser annotations. We further benchmark state-of-the-art VLMs as System 2 modules and analyze their performance across key cognitive dimensions, advancing the development of more capable and generalizable robotic planners.