planning - 2026-01-06

A translation of "What is differential geometry: curves and surfaces"

Authors:Anton Petrunin, Sergio Zamora Barrera

Date:2026-01-05 18:16:44

These notes are designed for those who either plan to work in differential geometry, or at least want to have a good reason not to do it. We discuss smooth curves and surfaces -- the main gate to differential geometry. We focus on the techniques that are absolutely essential for further study, keeping it problem-centered, elementary, visual, and virtually rigorous.

Impact of Spatial Proximity on Drone Services

Authors:Vejaykarthy Srithar, Syeda Amna Rizvi, Amani Abusafia, Athman Bouguettaya, Balsam Alkouz

Date:2026-01-05 15:32:21

We demonstrate the peer-to-peer impact of drones flying in close proximity. Understanding these impacts is crucial for planning efficient drone delivery services. In this regard, we conducted a set of experiments using drones at varying positions in a 3D space under different wind conditions. We collected data on drone energy consumption traveling in a skyway segment. We developed a Graphical User Interface (GUI) that plots drone trajectories within a segment. The GUI facilitates analyzing the peer-to-peer influence of drones on their energy consumption. The analysis includes drones' positions, distance of separation, and wind impact.

Risk-Averse Markov Decision Processes: Applications to Electricity Grid and Reservoir Management

Authors:Arash Khojaste, Jonathan Pearce, Daniela Pucci de Farias, Geoffrey Pritchard, Golbon Zakeri

Date:2026-01-05 15:31:27

This paper develops risk-averse models to support system operators in planning and operating the electricity grid under uncertainty from renewable power generation. We incorporate financial risk hedging using conditional value at risk (CVaR) within a Markov Decision Process (MDP) framework and propose efficient, exact solution methods for these models. In addition, we introduce a power reliability-oriented risk measure and present new, computationally efficient models for risk-averse grid planning and operations.

ACDZero: Graph-Embedding-Based Tree Search for Mastering Automated Cyber Defense

Authors:Yu Li, Sizhe Tang, Rongqian Chen, Fei Xu Yu, Guangyu Jiang, Mahdi Imani, Nathaniel D. Bastian, Tian Lan

Date:2026-01-05 15:18:54

Automated cyber defense (ACD) seeks to protect computer networks with minimal or no human intervention, reacting to intrusions by taking corrective actions such as isolating hosts, resetting services, deploying decoys, or updating access controls. However, existing approaches for ACD, such as deep reinforcement learning (RL), often face difficult exploration in complex networks with large decision/state spaces and thus require an expensive amount of samples. Inspired by the need to learn sample-efficient defense policies, we frame ACD in CAGE Challenge 4 (CAGE-4 / CC4) as a context-based partially observable Markov decision problem and propose a planning-centric defense policy based on Monte Carlo Tree Search (MCTS). It explicitly models the exploration-exploitation tradeoff in ACD and uses statistical sampling to guide exploration and decision making. We make novel use of graph neural networks (GNNs) to embed observations from the network as attributed graphs, to enable permutation-invariant reasoning over hosts and their relationships. To make our solution practical in complex search spaces, we guide MCTS with learned graph embeddings and priors over graph-edit actions, combining model-free generalization and policy distillation with look-ahead planning. We evaluate the resulting agent on CC4 scenarios involving diverse network structures and adversary behaviors, and show that our search-guided, graph-embedding-based planning improves defense reward and robustness relative to state-of-the-art RL baselines.

AI-Driven Stabilization in Power Grids through Controlling Line Admittances

Authors:Sangjoon Park, Hoyun Choi, Yongsun Lee, Seungchan Jo, Jürgen Kurths, B. Kahng

Date:2026-01-05 13:44:01

The global transition from traditional power plants to renewable energy sources introduces new challenges in grid stability, primarily because inverter-based technologies provide insufficient inertia. To address this, we introduce an artificial intelligence algorithm that autonomously stabilizes power grids by adaptively tuning admittance regulators in response to disturbances. This Adaptive Admittance Controller (AAC) algorithm not only stabilizes the system in real time but also identifies the best regulator locations, thereby unifying grid planning and real time control within a single framework. When tested on a real UK power grid, the AAC markedly reduces frequency deviations and rapidly restores nominal operation. In addition, the algorithm isolates a small number of key regulators and intervenes only on these, lowering both system complexity and cost. The AAC algorithm further reduces the nonlinearity effect, quickly stabilizing the frequency and power flow. This intelligent control scheme enables power grids to reliably return to stable operating conditions under a broad spectrum of fault scenarios. The proposed framework can also be used to mitigate cascading failures by adaptively controlling critical links in a variety of networked infrastructures, such as cascades of traffic congestion on road networks or fuse failures in energy-saving systems.

Dancing Points: Synthesizing Ballroom Dancing with Three-Point Inputs

Authors:Peizhuo Li, Sebastian Starke, Yuting Ye, Olga Sorkine-Hornung

Date:2026-01-05 13:24:12

Ballroom dancing is a structured yet expressive motion category. Its highly diverse movement and complex interactions between leader and follower dancers make the understanding and synthesis challenging. We demonstrate that the three-point trajectory available from a virtual reality (VR) device can effectively serve as a dancer's motion descriptor, simplifying the modeling and synthesis of interplay between dancers' full-body motions down to sparse trajectories. Thanks to the low dimensionality, we can employ an efficient MLP network to predict the follower's three-point trajectory directly from the leader's three-point input for certain types of ballroom dancing, addressing the challenge of modeling high-dimensional full-body interaction. It also prevents our method from overfitting thanks to its compact yet explicit representation. By leveraging the inherent structure of the movements and carefully planning the autoregressive procedure, we show a deterministic neural network is able to translate three-point trajectories into a virtual embodied avatar, which is typically considered under-constrained and requires generative models for common motions. In addition, we demonstrate this deterministic approach generalizes beyond small, structured datasets like ballroom dancing, and performs robustly on larger, more diverse datasets such as LaFAN. Our method provides a computationally- and data-efficient solution, opening new possibilities for immersive paired dancing applications. Code and pre-trained models for this paper are available at https://peizhuoli.github.io/dancing-points.

PhysSFI-Net: Physics-informed Geometric Learning of Skeletal and Facial Interactions for Orthognathic Surgical Outcome Prediction

Authors:Jiahao Bao, Huazhen Liu, Yu Zhuang, Leran Tao, Xinyu Xu, Yongtao Shi, Mengjia Cheng, Yiming Wang, Congshuang Ku, Ting Zeng, Yilang Du, Siyi Chen, Shunyao Shen, Suncheng Xiang, Hongbo Yu

Date:2026-01-05 13:14:19

Orthognathic surgery repositions jaw bones to restore occlusion and enhance facial aesthetics. Accurate simulation of postoperative facial morphology is essential for preoperative planning. However, traditional biomechanical models are computationally expensive, while geometric deep learning approaches often lack interpretability. In this study, we develop and validate a physics-informed geometric deep learning framework named PhysSFI-Net for precise prediction of soft tissue deformation following orthognathic surgery. PhysSFI-Net consists of three components: a hierarchical graph module with craniofacial and surgical plan encoders combined with attention mechanisms to extract skeletal-facial interaction features; a Long Short-Term Memory (LSTM)-based sequential predictor for incremental soft tissue deformation; and a biomechanics-inspired module for high-resolution facial surface reconstruction. Model performance was assessed using point cloud shape error (Hausdorff distance), surface deviation error, and landmark localization error (Euclidean distances of craniomaxillofacial landmarks) between predicted facial shapes and corresponding ground truths. A total of 135 patients who underwent combined orthodontic and orthognathic treatment were included for model training and validation. Quantitative analysis demonstrated that PhysSFI-Net achieved a point cloud shape error of 1.070 +/- 0.088 mm, a surface deviation error of 1.296 +/- 0.349 mm, and a landmark localization error of 2.445 +/- 1.326 mm. Comparative experiments indicated that PhysSFI-Net outperformed the state-of-the-art method ACMT-Net in prediction accuracy. In conclusion, PhysSFI-Net enables interpretable, high-resolution prediction of postoperative facial morphology with superior accuracy, showing strong potential for clinical application in orthognathic surgical planning and simulation.

Agentic Retoucher for Text-To-Image Generation

Authors:Shaocheng Shen, Jianfeng Liang. Chunlei Cai, Cong Geng, Huiyu Duan, Xiaoyun Zhang, Qiang Hu, Guangtao Zhai

Date:2026-01-05 12:06:43

Text-to-image (T2I) diffusion models such as SDXL and FLUX have achieved impressive photorealism, yet small-scale distortions remain pervasive in limbs, face, text and so on. Existing refinement approaches either perform costly iterative re-generation or rely on vision-language models (VLMs) with weak spatial grounding, leading to semantic drift and unreliable local edits. To close this gap, we propose Agentic Retoucher, a hierarchical decision-driven framework that reformulates post-generation correction as a human-like perception-reasoning-action loop. Specifically, we design (1) a perception agent that learns contextual saliency for fine-grained distortion localization under text-image consistency cues, (2) a reasoning agent that performs human-aligned inferential diagnosis via progressive preference alignment, and (3) an action agent that adaptively plans localized inpainting guided by user preference. This design integrates perceptual evidence, linguistic reasoning, and controllable correction into a unified, self-corrective decision process. To enable fine-grained supervision and quantitative evaluation, we further construct GenBlemish-27K, a dataset of 6K T2I images with 27K annotated artifact regions across 12 categories. Extensive experiments demonstrate that Agentic Retoucher consistently outperforms state-of-the-art methods in perceptual quality, distortion localization and human preference alignment, establishing a new paradigm for self-corrective and perceptually reliable T2I generation.

Bringing computation to the data: A MOEA-driven approach for optimising data processing in the context of the SKA and SRCNet

Authors:Manuel Parra-Royón, Álvaro Rodríguez-Gallardo, Susana Sánchez-Expósito, Laura Darriba-Pol, Jesús Sánchez-Castañeda, M. Ángeles Mendoza, Julián Garrido, Javier Moldón, Lourdes Verdes-Montenegro

Date:2026-01-05 10:35:34

The Square Kilometre Array (SKA) will generate unprecedented data volumes, making efficient data processing a critical challenge. Within this context, the SKA Regional Centres Network (SRCNet) must operate in a near-exascale environment where traditional data-centric computing models based on moving large datasets to centralised resources are no longer viable due to network and storage bottlenecks. To address this limitation, this work proposes a shift towards distributed and in-situ computing, where computation is moved closer to the data. We explore the integration of Function-as-a-Service (FaaS) with an intelligent decision-making entity based on Evolutionary Algorithms (EAs) to optimise data-intensive workflows within SRCNet. FaaS enables lightweight and modular function execution near data sources while abstracting infrastructure management. The proposed decision-making entity employs Multi-Objective Evolutionary Algorithms (MOEAs) to explore near-optimal execution plans considering execution time and energy consumption, together with constraints related to data location and transfer costs. This work establishes a baseline framework for efficient and cost-aware computation-to-data strategies within the SRCNet architecture.

Learning Diffusion Policy from Primitive Skills for Robot Manipulation

Authors:Zhihao Gu, Ming Yang, Difan Zou, Dong Xu

Date:2026-01-05 09:56:24

Diffusion policies (DP) have recently shown great promise for generating actions in robotic manipulation. However, existing approaches often rely on global instructions to produce short-term control signals, which can result in misalignment in action generation. We conjecture that the primitive skills, referred to as fine-grained, short-horizon manipulations, such as ``move up'' and ``open the gripper'', provide a more intuitive and effective interface for robot learning. To bridge this gap, we propose SDP, a skill-conditioned DP that integrates interpretable skill learning with conditional action planning. SDP abstracts eight reusable primitive skills across tasks and employs a vision-language model to extract discrete representations from visual observations and language instructions. Based on them, a lightweight router network is designed to assign a desired primitive skill for each state, which helps construct a single-skill policy to generate skill-aligned actions. By decomposing complex tasks into a sequence of primitive skills and selecting a single-skill policy, SDP ensures skill-consistent behavior across diverse tasks. Extensive experiments on two challenging simulation benchmarks and real-world robot deployments demonstrate that SDP consistently outperforms SOTA methods, providing a new paradigm for skill-based robot learning with diffusion policies.

SynRXN: An Open Benchmark and Curated Dataset for Computational Reaction Modeling

Authors:Tieu-Long Phan, Nhu-Ngoc Nguyen Song, Peter F. Stadler

Date:2026-01-05 09:49:58

We present SynRXN, a unified benchmarking framework and open-data resource for computer-aided synthesis planning (CASP). SynRXN decomposes end-to-end synthesis planning into five task families, covering reaction rebalancing, atom-to-atom mapping, reaction classification, reaction property prediction, and synthesis route design. Curated, provenance-tracked reaction corpora are assembled from heterogeneous public sources into a harmonized representation and packaged as versioned datasets for each task family, with explicit source metadata, licence tags, and machine-readable manifests that record checksums, and row counts. For every task, SynRXN provides transparent splitting functions that generate leakage-aware train, validation, and test partitions, together with standardized evaluation workflows and metric suites tailored to classification, regression, and structured prediction settings. For sensitive benchmarking, we combine public training and validation data with held-out gold-standard test sets, and contamination-prone tasks such as reaction rebalancing and atom-to-atom mapping are distributed only as evaluation sets and are explicitly not intended for model training. Scripted build recipes enable bitwise-reproducible regeneration of all corpora across machines and over time, and the entire resource is released under permissive open licences to support reuse and extension. By removing dataset heterogeneity and packaging transparent, reusable evaluation scaffolding, SynRXN enables fair longitudinal comparison of CASP methods, supports rigorous ablations and stress tests along the full reaction-informatics pipeline, and lowers the barrier for practitioners who seek robust and comparable performance estimates for real-world synthesis planning workloads.

MMP-A*: Multimodal Perception Enhanced Incremental Heuristic Search on Path Planning

Authors:Minh Hieu Ha, Khanh Ly Ta, Hung Phan, Tung Doan, Tung Dao, Dao Tran, Huynh Thi Thanh Binh

Date:2026-01-05 08:55:27

Autonomous path planning requires a synergy between global reasoning and geometric precision, especially in complex or cluttered environments. While classical A* is valued for its optimality, it incurs prohibitive computational and memory costs in large-scale scenarios. Recent attempts to mitigate these limitations by using Large Language Models for waypoint guidance remain insufficient, as they rely only on text-based reasoning without spatial grounding. As a result, such models often produce incorrect waypoints in topologically complex environments with dead ends, and lack the perceptual capacity to interpret ambiguous physical boundaries. These inconsistencies lead to costly corrective expansions and undermine the intended computational efficiency. We introduce MMP-A*, a multimodal framework that integrates the spatial grounding capabilities of vision-language models with a novel adaptive decay mechanism. By anchoring high-level reasoning in physical geometry, the framework produces coherent waypoint guidance that addresses the limitations of text-only planners. The adaptive decay mechanism dynamically regulates the influence of uncertain waypoints within the heuristic, ensuring geometric validity while substantially reducing memory overhead. To evaluate robustness, we test the framework in challenging environments characterized by severe clutter and topological complexity. Experimental results show that MMP-A* achieves near-optimal trajectories with significantly reduced operational costs, demonstrating its potential as a perception-grounded and computationally efficient paradigm for autonomous navigation.

Agentic AI in Remote Sensing: Foundations, Taxonomy, and Emerging Systems

Authors:Niloufar Alipour Talemi, Julia Boone, Fatemeh Afghah

Date:2026-01-05 08:34:17

The paradigm of Earth Observation analysis is shifting from static deep learning models to autonomous agentic AI. Although recent vision foundation models and multimodal large language models advance representation learning, they often lack the sequential planning and active tool orchestration required for complex geospatial workflows. This survey presents the first comprehensive review of agentic AI in remote sensing. We introduce a unified taxonomy distinguishing between single-agent copilots and multi-agent systems while analyzing architectural foundations such as planning mechanisms, retrieval-augmented generation, and memory structures. Furthermore, we review emerging benchmarks that move the evaluation from pixel-level accuracy to trajectory-aware reasoning correctness. By critically examining limitations in grounding, safety, and orchestration, this work outlines a strategic roadmap for the development of robust, autonomous geospatial intelligence.

Extending SST Anomaly Forecasts Through Simultaneous Decomposition of Seasonal and PDO Modes

Authors:Rameshan Kallummal

Date:2026-01-05 07:47:12

We present a new approach to forecasting North Pacific Sea Surface Temperatures (SST) by recognizing that interannual variability primarily reflects amplitude changes in four dominant seasonal cycles. Our multivariate linear model simultaneously captures these amplitude-modulated seasonal cycles along with the Pacific Decadal Oscillation (PDO), which naturally emerges as an intrinsic feature of the system rather than a separate phenomenon. Using sixteen-dimensional regression based on four spatially distributed time series per variable, the model delivers unprecedented forecast accuracy for both interannual amplitude modulations and PDO evolution, maintaining skill beyond 36 months -- a substantial improvement over current operational and research forecasts, including machine learning methods. Predictions initialized in 2024 project that the PDO will remain in its negative phase through late 2026, implying reduced likelihood of severe marine heatwaves in the eastern North Pacific during this period. These findings have direct implications for regional climate impacts, including storm tracks, precipitation patterns, and marine ecosystem health. By treating seasonal and interannual variability as coupled rather than independent processes, this framework advances our understanding of North Pacific climate dynamics and provides a powerful tool for stakeholders managing climate-sensitive resources and planning adaptation strategies in regions strongly influenced by North Pacific conditions.

The Machine Learning Canvas: Empirical Findings on Why Strategy Matters More Than AI Code Generation

Authors:Martin Prause

Date:2026-01-05 07:02:58

Despite the growing popularity of AI coding assistants, over 80% of machine learning (ML) projects fail to deliver real business value. This study creates and tests a Machine Learning Canvas, a practical framework that combines business strategy, software engineering, and data science in order to determine the factors that lead to the success of ML projects. We surveyed 150 data scientists and analyzed their responses using statistical modeling. We identified four key success factors: Strategy (clear goals and planning), Process (how work gets done), Ecosystem (tools and infrastructure), and Support (organizational backing and resources). Our results show that these factors are interconnected - each one affects the next. For instance, strong organizational support results in a clearer strategy (β= 0.432, p < 0.001), which improves work processes (β= 0.428, p < 0.001) and builds better infrastructure (β= 0.547, p < 0.001). Together, these elements determine whether a project succeeds. The surprising finding? Although AI assistants make coding faster, they don't guarantee project success. AI assists with the "how" of coding but cannot replace the "why" and "what" of strategic thinking.

PsychEval: A Multi-Session and Multi-Therapy Benchmark for High-Realism and Comprehensive AI Psychological Counselor

Authors:Qianjun Pan, Junyi Wang, Jie Zhou, Yutao Yang, Junsong Li, Kaiyin Xu, Yougen Zhou, Yihan Li, Jingyuan Zhao, Qin Chen, Ningning Zhou, Kai Chen, Liang He

Date:2026-01-05 05:26:57

To develop a reliable AI for psychological assessment, we introduce \texttt{PsychEval}, a multi-session, multi-therapy, and highly realistic benchmark designed to address three key challenges: \textbf{1) Can we train a highly realistic AI counselor?} Realistic counseling is a longitudinal task requiring sustained memory and dynamic goal tracking. We propose a multi-session benchmark (spanning 6-10 sessions across three distinct stages) that demands critical capabilities such as memory continuity, adaptive reasoning, and longitudinal planning. The dataset is annotated with extensive professional skills, comprising over 677 meta-skills and 4577 atomic skills. \textbf{2) How to train a multi-therapy AI counselor?} While existing models often focus on a single therapy, complex cases frequently require flexible strategies among various therapies. We construct a diverse dataset covering five therapeutic modalities (Psychodynamic, Behaviorism, CBT, Humanistic Existentialist, and Postmodernist) alongside an integrative therapy with a unified three-stage clinical framework across six core psychological topics. \textbf{3) How to systematically evaluate an AI counselor?} We establish a holistic evaluation framework with 18 therapy-specific and therapy-shared metrics across Client-Level and Counselor-Level dimensions. To support this, we also construct over 2,000 diverse client profiles. Extensive experimental analysis fully validates the superior quality and clinical fidelity of our dataset. Crucially, \texttt{PsychEval} transcends static benchmarking to serve as a high-fidelity reinforcement learning environment that enables the self-evolutionary training of clinically responsible and adaptive AI counselors.

AlignDrive: Aligned Lateral-Longitudinal Planning for End-to-End Autonomous Driving

Authors:Yanhao Wu, Haoyang Zhang, Fei He, Rui Wu, Congpei Qiu, Liang Gao, Wei Ke, Tong Zhang

Date:2026-01-05 03:41:20

End-to-end autonomous driving has rapidly progressed, enabling joint perception and planning in complex environments. In the planning stage, state-of-the-art (SOTA) end-to-end autonomous driving models decouple planning into parallel lateral and longitudinal predictions. While effective, this parallel design can lead to i) coordination failures between the planned path and speed, and ii) underutilization of the drive path as a prior for longitudinal planning, thus redundantly encoding static information. To address this, we propose a novel cascaded framework that explicitly conditions longitudinal planning on the drive path, enabling coordinated and collision-aware lateral and longitudinal planning. Specifically, we introduce a path-conditioned formulation that explicitly incorporates the drive path into longitudinal planning. Building on this, the model predicts longitudinal displacements along the drive path rather than full 2D trajectory waypoints. This design simplifies longitudinal reasoning and more tightly couples it with lateral planning. Additionally, we introduce a planning-oriented data augmentation strategy that simulates rare safety-critical events, such as vehicle cut-ins, by adding agents and relabeling longitudinal targets to avoid collision. Evaluated on the challenging Bench2Drive benchmark, our method sets a new SOTA, achieving a driving score of 89.07 and a success rate of 73.18%, demonstrating significantly improved coordination and safety

FALCON: Few-Shot Adversarial Learning for Cross-Domain Medical Image Segmentation

Authors:Abdur R. Fayjie, Pankhi Kashyap, Jutika Borah, Patrick Vandewalle

Date:2026-01-04 22:57:49

Precise delineation of anatomical and pathological structures within 3D medical volumes is crucial for accurate diagnosis, effective surgical planning, and longitudinal disease monitoring. Despite advancements in AI, clinically viable segmentation is often hindered by the scarcity of 3D annotations, patient-specific variability, data privacy concerns, and substantial computational overhead. In this work, we propose FALCON, a cross-domain few-shot segmentation framework that achieves high-precision 3D volume segmentation by processing data as 2D slices. The framework is first meta-trained on natural images to learn-to-learn generalizable segmentation priors, then transferred to the medical domain via adversarial fine-tuning and boundary-aware learning. Task-aware inference, conditioned on support cues, allows FALCON to adapt dynamically to patient-specific anatomical variations across slices. Experiments on four benchmarks demonstrate that FALCON consistently achieves the lowest Hausdorff Distance scores, indicating superior boundary accuracy while maintaining a Dice Similarity Coefficient comparable to the state-of-the-art models. Notably, these results are achieved with significantly less labeled data, no data augmentation, and substantially lower computational overhead.

EHRSummarizer: A Privacy-Aware, FHIR-Native Architecture for Structured Clinical Summarization of Electronic Health Records

Authors:Houman Kazemzadeh, Nima Minaifar, Kamyar Naderi, Sho Tabibzadeh

Date:2026-01-04 21:10:42

Clinicians routinely navigate fragmented electronic health record (EHR) interfaces to assemble a coherent picture of a patient's problems, medications, recent encounters, and longitudinal trends. This work describes EHRSummarizer, a privacy-aware, FHIR-native reference architecture that retrieves a targeted set of high-yield FHIR R4 resources, normalizes them into a consistent clinical context package, and produces structured summaries intended to support structured chart review. The system can be configured for data minimization, stateless processing, and flexible deployment, including local inference within an organization's trust boundary. To mitigate the risk of unsupported or unsafe behavior, the summarization stage is constrained to evidence present in the retrieved context package, is intended to indicate missing or unavailable domains where feasible, and avoids diagnostic or treatment recommendations. Prototype demonstrations on synthetic and test FHIR environments illustrate end-to-end behavior and output formats; however, this manuscript does not report clinical outcomes or controlled workflow studies. We outline an evaluation plan centered on faithfulness, omission risk, temporal correctness, usability, and operational monitoring to guide future institutional assessments.

Action-Sketcher: From Reasoning to Action via Visual Sketches for Long-Horizon Robotic Manipulation

Authors:Huajie Tan, Peterson Co, Yijie Xu, Shanyu Rong, Yuheng Ji, Cheng Chi, Xiansheng Chen, Qiongyu Zhang, Zhongxia Zhao, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang

Date:2026-01-04 17:53:42

Long-horizon robotic manipulation is increasingly important for real-world deployment, requiring spatial disambiguation in complex layouts and temporal resilience under dynamic interaction. However, existing end-to-end and hierarchical Vision-Language-Action (VLA) policies often rely on text-only cues while keeping plan intent latent, which undermines referential grounding in cluttered or underspecified scenes, impedes effective task decomposition of long-horizon goals with close-loop interaction, and limits causal explanation by obscuring the rationale behind action choices. To address these issues, we first introduce Visual Sketch, an implausible visual intermediate that renders points, boxes, arrows, and typed relations in the robot's current views to externalize spatial intent, connect language to scene geometry. Building on Visual Sketch, we present Action-Sketcher, a VLA framework that operates in a cyclic See-Think-Sketch-Act workflow coordinated by adaptive token-gated strategy for reasoning triggers, sketch revision, and action issuance, thereby supporting reactive corrections and human interaction while preserving real-time action prediction. To enable scalable training and evaluation, we curate diverse corpus with interleaved images, text, Visual Sketch supervision, and action sequences, and train Action-Sketcher with a multi-stage curriculum recipe that combines interleaved sequence alignment for modality unification, language-to-sketch consistency for precise linguistic grounding, and imitation learning augmented with sketch-to-action reinforcement for robustness. Extensive experiments on cluttered scenes and multi-object tasks, in simulation and on real-world tasks, show improved long-horizon success, stronger robustness to dynamic scene changes, and enhanced interpretability via editable sketches and step-wise plans. Project website: https://action-sketcher.github.io

HanoiWorld : A Joint Embedding Predictive Architecture BasedWorld Model for Autonomous Vehicle Controller

Authors:Tran Tien Dat, Nguyen Hai An, Nguyen Khanh Viet Dung, Nguyen Duy Duc

Date:2026-01-04 15:49:46

Current attempts of Reinforcement Learning for Autonomous Controller are data-demanding while the results are under-performed, unstable, and unable to grasp and anchor on the concept of safety, and over-concentrating on noise features due to the nature of pixel reconstruction. While current Self-Supervised Learningapproachs that learning on high-dimensional representations by leveraging the JointEmbedding Predictive Architecture (JEPA) are interesting and an effective alternative, as the idea mimics the natural ability of the human brain in acquiring new skill usingimagination and minimal samples of observations. This study introduces Hanoi-World, a JEPA-based world model that using recurrent neural network (RNN) formaking longterm horizontal planning with effective inference time. Experimentsconducted on the Highway-Env package with difference enviroment showcase the effective capability of making a driving plan while safety-awareness, with considerablecollision rate in comparison with SOTA baselines

DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving

Authors:Yang Zhou, Hao Shao, Letian Wang, Zhuofan Zong, Hongsheng Li, Steven L. Waslander

Date:2026-01-04 13:36:21

Video generation models, as one form of world models, have emerged as one of the most exciting frontiers in AI, promising agents the ability to imagine the future by modeling the temporal evolution of complex scenes. In autonomous driving, this vision gives rise to driving world models: generative simulators that imagine ego and agent futures, enabling scalable simulation, safe testing of corner cases, and rich synthetic data generation. Yet, despite fast-growing research activity, the field lacks a rigorous benchmark to measure progress and guide priorities. Existing evaluations remain limited: generic video metrics overlook safety-critical imaging factors; trajectory plausibility is rarely quantified; temporal and agent-level consistency is neglected; and controllability with respect to ego conditioning is ignored. Moreover, current datasets fail to cover the diversity of conditions required for real-world deployment. To address these gaps, we present DrivingGen, the first comprehensive benchmark for generative driving world models. DrivingGen combines a diverse evaluation dataset curated from both driving datasets and internet-scale video sources, spanning varied weather, time of day, geographic regions, and complex maneuvers, with a suite of new metrics that jointly assess visual realism, trajectory plausibility, temporal coherence, and controllability. Benchmarking 14 state-of-the-art models reveals clear trade-offs: general models look better but break physics, while driving-specific ones capture motion realistically but lag in visual quality. DrivingGen offers a unified evaluation framework to foster reliable, controllable, and deployable driving world models, enabling scalable simulation, planning, and data-driven decision-making.

DiffKD-DCIS: Predicting Upgrade of Ductal Carcinoma In Situ with Diffusion Augmentation and Knowledge Distillation

Authors:Tao Li, Qing Li, Na Li, Hui Xie

Date:2026-01-04 12:26:07

Accurately predicting the upgrade of ductal carcinoma in situ (DCIS) to invasive ductal carcinoma (IDC) is crucial for surgical planning. However, traditional deep learning methods face challenges due to limited ultrasound data and poor generalization ability. This study proposes the DiffKD-DCIS framework, integrating conditional diffusion modeling with teacher-student knowledge distillation. The framework operates in three stages: First, a conditional diffusion model generates high-fidelity ultrasound images using multimodal conditions for data augmentation. Then, a deep teacher network extracts robust features from both original and synthetic data. Finally, a compact student network learns from the teacher via knowledge distillation, balancing generalization and computational efficiency. Evaluated on a multi-center dataset of 1,435 cases, the synthetic images were of good quality. The student network had fewer parameters and faster inference. On external test sets, it outperformed partial combinations, and its accuracy was comparable to senior radiologists and superior to junior ones, showing significant clinical potential.

AirSpatialBot: A Spatially-Aware Aerial Agent for Fine-Grained Vehicle Attribute Recognization and Retrieval

Authors:Yue Zhou, Ran Ding, Xue Yang, Xue Jiang, Xingzhao Liu

Date:2026-01-04 07:38:51

Despite notable advancements in remote sensing vision-language models (VLMs), existing models often struggle with spatial understanding, limiting their effectiveness in real-world applications. To push the boundaries of VLMs in remote sensing, we specifically address vehicle imagery captured by drones and introduce a spatially-aware dataset AirSpatial, which comprises over 206K instructions and introduces two novel tasks: Spatial Grounding and Spatial Question Answering. It is also the first remote sensing grounding dataset to provide 3DBB. To effectively leverage existing image understanding of VLMs to spatial domains, we adopt a two-stage training strategy comprising Image Understanding Pre-training and Spatial Understanding Fine-tuning. Utilizing this trained spatially-aware VLM, we develop an aerial agent, AirSpatialBot, which is capable of fine-grained vehicle attribute recognition and retrieval. By dynamically integrating task planning, image understanding, spatial understanding, and task execution capabilities, AirSpatialBot adapts to diverse query requirements. Experimental results validate the effectiveness of our approach, revealing the spatial limitations of existing VLMs while providing valuable insights. The model, code, and datasets will be released at https://github.com/VisionXLab/AirSpatialBot

Adaptive Kernel Regression for Constrained Route Alignment: Theory and Iterative Data Sharpening

Authors:Shiyin Du, Yiting Chen, Wenzhi Yang, Qiong Li, Xiaoping Shi

Date:2026-01-04 03:27:49

Route alignment design in surveying and transportation engineering frequently involves fixed waypoint constraints, where a path must precisely traverse specific coordinates. While existing literature primarily relies on geometric optimization or control-theoretic spline frameworks, there is a lack of systematic statistical modeling approaches that balance global smoothness with exact point adherence. This paper proposes an Adaptive Nadaraya-Watson (ANW) kernel regression estimator designed to address the fixed waypoint problem. By incorporating waypoint-specific weight tuning parameters, the ANW estimator decouples global smoothing from local constraint satisfaction, avoiding the "jagged" artifacts common in naive local bandwidth-shrinking strategies. To further enhance estimation accuracy, we develop an iterative data sharpening algorithm that systematically reduces bias while maintaining the stability of the kernel framework. We establish the theoretical foundation for the ANW estimator by deriving its asymptotic bias and variance and proving its convergence properties under the internal constraint model. Numerical case studies in 1D and 2D trajectory planning demonstrate that the method effectively balances root mean square error (RMSE) and curvature smoothness. Finally, we validate the practical utility of the framework through empirical applications to railway and highway route planning. In sum, this work provides a stable, theoretically grounded, and computationally efficient solution for complex, constrained alignment design problems.

Automated SBOM-Driven Vulnerability Triage for IoT Firmware: A Lightweight Pipeline for Risk Prioritization

Authors:Abdurrahman Tolay

Date:2026-01-04 00:09:01

The proliferation of Internet of Things (IoT) devices has introduced significant security challenges, primarily due to the opacity of firmware components and the complexity of supply chain dependencies. IoT firmware frequently relies on outdated, third-party libraries embedded within monolithic binary blobs, making vulnerability management difficult. While Software Bill of Materials (SBOM) standards have matured, generating actionable intelligence from raw firmware dumps remains a manual and error-prone process. This paper presents a lightweight, automated pipeline designed to extract file systems from Linux-based IoT firmware, generate a comprehensive SBOM, map identified components to known vulnerabilities, and apply a multi-factor triage scoring model. The proposed system focuses on risk prioritization by integrating signals from the Common Vulnerability Scoring System (CVSS), Exploit Prediction Scoring System (EPSS), and the CISA Known Exploited Vulnerabilities (KEV) catalog. Unlike conventional scanners that produce high volumes of uncontextualized alerts, this approach emphasizes triage by calculating a localized risk score for each finding. We describe the architecture, the normalization challenges of embedded Linux, and a scoring methodology intended to reduce alert fatigue. The study outlines a planned evaluation strategy to validate the extraction success rate and triage efficacy using a dataset of public vendor firmware, offering a reproducibility framework for future research in firmware security.

SAHA: Supervised Autonomous HArvester for selective forest thinning

Authors:Fang Nan, Meher Malladi, Qingqing Li, Fan Yang, Joonas Juola, Tiziano Guadagnino, Jens Behley, Cesar Cadena, Cyrill Stachniss, Marco Hutter

Date:2026-01-03 20:48:10

Forestry plays a vital role in our society, creating significant ecological, economic, and recreational value. Efficient forest management involves labor-intensive and complex operations. One essential task for maintaining forest health and productivity is selective thinning, which requires skilled operators to remove specific trees to create optimal growing conditions for the remaining ones. In this work, we present a solution based on a small-scale robotic harvester (SAHA) designed for executing this task with supervised autonomy. We build on a 4.5-ton harvester platform and implement key hardware modifications for perception and automatic control. We implement learning- and model-based approaches for precise control of hydraulic actuators, accurate navigation through cluttered environments, robust state estimation, and reliable semantic estimation of terrain traversability. Integrating state-of-the-art techniques in perception, planning, and control, our robotic harvester can autonomously navigate forest environments and reach targeted trees for selective thinning. We present experimental results from extensive field trials over kilometer-long autonomous missions in northern European forests, demonstrating the harvester's ability to operate in real forests. We analyze the performance and provide the lessons learned for advancing robotic forest management.

LiveBo: Empowering Non-Chinese Speaking Students through AI-Driven Real-Life Scenarios in Cantonese

Authors:Ka Yan Fung, Kwong Chiu Fung, Yuxing Tao, Tze Leung Rick Lui, Kuen Fung Sin

Date:2026-01-03 16:25:17

Language learning is a multifaceted process. Insufficient vocabulary can hinder communication and lead to demotivation. For non-Chinese speaking (NCS) students, learning Traditional Chinese (Cantonese) poses distinct challenges, particularly due to the complexity of converting spoken and written forms. To address this issue, this study examines the effectiveness of real-life scenario simulations integrated with interactive social robots in enhancing NCS student engagement and language acquisition. The research employs a quasi-experimental design involving NCS students who interact with an AI-driven, robot-assisted language learning system, LiveBo. The study aims to assess the impact of this innovative approach on active participation and motivation. Data are collected through proficiency tests, questionnaires and semi-structured interviews. Findings indicate that NCS students experience positive improvements in behavioural and emotional engagement, motivation and learning outcomes, highlighting the potential of integrating novel technologies in language education. We plan to compare with the control group in the future. This study highlights the significance of interactive and immersive learning experiences in promoting motivation and enhancing language acquisition among NCS students.

Adaptive Conformal Prediction via Bayesian Uncertainty Weighting for Hierarchical Healthcare Data

Authors:Marzieh Amiri Shahbazi, Ali Baheri, Nasibeh Azadeh-Fard

Date:2026-01-03 16:06:37

Clinical decision-making demands uncertainty quantification that provides both distribution-free coverage guarantees and risk-adaptive precision, requirements that existing methods fail to jointly satisfy. We present a hybrid Bayesian-conformal framework that addresses this fundamental limitation in healthcare predictions. Our approach integrates Bayesian hierarchical random forests with group-aware conformal calibration, using posterior uncertainties to weight conformity scores while maintaining rigorous coverage validity. Evaluated on 61,538 admissions across 3,793 U.S. hospitals and 4 regions, our method achieves target coverage (94.3% vs 95% target) with adaptive precision: 21% narrower intervals for low-uncertainty cases while appropriately widening for high-risk predictions. Critically, we demonstrate that well-calibrated Bayesian uncertainties alone severely under-cover (14.1%), highlighting the necessity of our hybrid approach. This framework enables risk-stratified clinical protocols, efficient resource planning for high-confidence predictions, and conservative allocation with enhanced oversight for uncertain cases, providing uncertainty-aware decision support across diverse healthcare settings.

MentalGame: Predicting Personality-Job Fitness for Software Developers Using Multi-Genre Games and Machine Learning Approaches

Authors:Soroush Elyasi, Arya VarastehNezhad, Fattaneh Taghiyareh

Date:2026-01-03 15:09:02

Personality assessment in career guidance and personnel selection traditionally relies on self-report questionnaires, which are susceptible to response bias, fatigue, and intentional distortion. Game-based assessment offers a promising alternative by capturing implicit behavioral signals during gameplay. This study proposes a multi-genre serious-game framework combined with machine-learning techniques to predict suitability for software development roles. Developer-relevant personality and behavioral traits were identified through a systematic literature review and an empirical study of professional software engineers. A custom mobile game was designed to elicit behaviors related to problem solving, planning, adaptability, persistence, time management, and information seeking. Fine-grained gameplay event data were collected and analyzed using a two-phase modeling strategy where suitability was predicted exclusively from gameplay-derived behavioral features. Results show that our model achieved up to 97% precision and 94% accuracy. Behavioral analysis revealed that proper candidates exhibited distinct gameplay patterns, such as more wins in puzzle-based games, more side challenges, navigating menus more frequently, and exhibiting fewer pauses, retries, and surrender actions. These findings demonstrate that implicit behavioral traces captured during gameplay is promising in predicting software-development suitability without explicit personality testing, supporting serious games as a scalable, engaging, and less biased alternative for career assessment.