planning - 2026-01-15

Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning

Authors:Chi-Pin Huang, Yunze Man, Zhiding Yu, Min-Hung Chen, Jan Kautz, Yu-Chiang Frank Wang, Fu-En Yang
Date:2026-01-14 18:59:59

Vision-Language-Action (VLA) tasks require reasoning over complex visual scenes and executing adaptive actions in dynamic environments. While recent studies on reasoning VLAs show that explicit chain-of-thought (CoT) can improve generalization, they suffer from high inference latency due to lengthy reasoning traces. We propose Fast-ThinkAct, an efficient reasoning framework that achieves compact yet performant planning through verbalizable latent reasoning. Fast-ThinkAct learns to reason efficiently with latent CoTs by distilling from a teacher, driven by a preference-guided objective to align manipulation trajectories that transfers both linguistic and visual planning capabilities for embodied control. This enables reasoning-enhanced policy learning that effectively connects compact reasoning to action execution. Extensive experiments across diverse embodied manipulation and reasoning benchmarks demonstrate that Fast-ThinkAct achieves strong performance with up to 89.3\% reduced inference latency over state-of-the-art reasoning VLAs, while maintaining effective long-horizon planning, few-shot adaptation, and failure recovery.

TEMPO: A Realistic Multi-Domain Benchmark for Temporal Reasoning-Intensive Retrieval

Authors:Abdelrahman Abdallah, Mohammed Ali, Muhammad Abdul-Mageed, Adam Jatowt
Date:2026-01-14 14:45:20

Existing temporal QA benchmarks focus on simple fact-seeking queries from news corpora, while reasoning-intensive retrieval benchmarks lack temporal grounding. However, real-world information needs often require reasoning about temporal evolution and synthesizing evidence across time periods. We introduce TEMPO, the first benchmark combining temporal reasoning with reasoning-intensive retrieval across 13 domains. TEMPO features: (1) 1,730 complex queries requiring deep temporal reasoning such as tracking changes, identifying trends, or comparing cross-period evidence; (2) step-wise retrieval planning with 3,976 decomposed steps and gold documents mapped to each step for multi-hop evaluation; and (3) novel temporal metrics including Temporal Coverage@k and Temporal Precision@k measuring whether results span required time periods. Evaluation of 12 retrieval systems reveals substantial challenges: the best model (DiVeR) achieves only 32.0 NDCG@10 and 71.4\% Temporal Coverage@10, demonstrating difficulty in retrieving temporally complete evidence. We believe TEMPO provides a challenging benchmark for improving temporal reasoning in retrieval and RAG systems. Our code and data are available at https://github.com/tempo-bench/Tempo. See also our official website: https://tempo-bench.github.io/.

Terminally constrained flow-based generative models from an optimal control perspective

Authors:Weiguo Gao, Ming Li, Qianxiao Li
Date:2026-01-14 13:32:15

We address the problem of sampling from terminally constrained distributions with pre-trained flow-based generative models through an optimal control formulation. Theoretically, we characterize the value function by a Hamilton-Jacobi-Bellman equation and derive the optimal feedback control as the minimizer of the associated Hamiltonian. We show that as the control penalty increases, the controlled process recovers the reference distribution, while as the penalty vanishes, the terminal law converges to a generalized Wasserstein projection onto the constraint manifold. Algorithmically, we introduce Terminal Optimal Control with Flow-based models (TOCFlow), a geometry-aware sampling-time guidance method for pre-trained flows. Solving the control problem in a terminal co-moving frame that tracks reference trajectories yields a closed-form scalar damping factor along the Riemannian gradient, capturing second-order curvature effects without matrix inversions. TOCFlow therefore matches the geometric consistency of Gauss-Newton updates at the computational cost of standard gradient guidance. We evaluate TOCFlow on three high-dimensional scientific tasks spanning equality, inequality, and global statistical constraints, namely Darcy flow, constrained trajectory planning, and turbulence snapshot generation with Kolmogorov spectral scaling. Across all settings, TOCFlow improves constraint satisfaction over Euclidean guidance and projection baselines while preserving the reference model's generative quality.

Two-Scale Spatial Deployment for Cost-Effective Wireless Networks via Cooperative IRSs and Movable Antennas

Authors:Ying Gao, Qingqing Wu, Ziyuan Zheng, Yanze Zhu, Wen Chen, Xin Lin, Shanpu Shen
Date:2026-01-14 13:16:30

This paper proposes a two-scale spatial deployment strategy to ensure reliable coverage for multiple target areas, integrating macroscopic intelligent reflecting surfaces (IRSs) and fine-grained movable antennas (MAs). Specifically, IRSs are selectively deployed from candidate sites to shape the propagation geometry, while MAs are locally repositioned among discretized locations to exploit small-scale channel variations. The objective is to minimize the total deployment cost of MAs and IRSs by jointly optimizing the IRS site selection, MA positions, transmit precoding, and IRS phase shifts, subject to the signal-to-noise ratio (SNR) requirements for all target areas. This leads to a challenging mixed-integer non-convex optimization problem that is intractable to solve directly. To address this, we first formulate an auxiliary problem to verify the feasibility. A penalty-based double-loop algorithm integrating alternating optimization and successive convex approximation (SCA) is developed to solve this feasibility issue, which is subsequently adapted to obtain a suboptimal solution for the original cost minimization problem. Finally, based on the obtained solution, we formulate an element refinement problem to further reduce the deployment cost, which is solved by a penalty-based SCA algorithm. Simulation results demonstrate that the proposed designs consistently outperform benchmarks relying on independent area planning or full IRS deployment in terms of cost-efficiency. Moreover, for cost minimization, MA architectures are preferable in large placement apertures, whereas fully populated FPA architectures excel in compact ones; for worst-case SNR maximization, MA architectures exhibit a lower cost threshold for feasibility, while FPA architectures can attain peak SNR at a lower total cost.

Video-MSR: Benchmarking Multi-hop Spatial Reasoning Capabilities of MLLMs

Authors:Rui Zhu, Xin Shen, Shuchen Wu, Chenxi Miao, Xin Yu, Yang Li, Weikang Li, Deguo Xia, Jizhou Huang
Date:2026-01-14 12:24:47

Spatial reasoning has emerged as a critical capability for Multimodal Large Language Models (MLLMs), drawing increasing attention and rapid advancement. However, existing benchmarks primarily focus on single-step perception-to-judgment tasks, leaving scenarios requiring complex visual-spatial logical chains significantly underexplored. To bridge this gap, we introduce Video-MSR, the first benchmark specifically designed to evaluate Multi-hop Spatial Reasoning (MSR) in dynamic video scenarios. Video-MSR systematically probes MSR capabilities through four distinct tasks: Constrained Localization, Chain-based Reference Retrieval, Route Planning, and Counterfactual Physical Deduction. Our benchmark comprises 3,052 high-quality video instances with 4,993 question-answer pairs, constructed via a scalable, visually-grounded pipeline combining advanced model generation with rigorous human verification. Through a comprehensive evaluation of 20 state-of-the-art MLLMs, we uncover significant limitations, revealing that while models demonstrate proficiency in surface-level perception, they exhibit distinct performance drops in MSR tasks, frequently suffering from spatial disorientation and hallucination during multi-step deductions. To mitigate these shortcomings and empower models with stronger MSR capabilities, we further curate MSR-9K, a specialized instruction-tuning dataset, and fine-tune Qwen-VL, achieving a +7.82% absolute improvement on Video-MSR. Our results underscore the efficacy of multi-hop spatial instruction data and establish Video-MSR as a vital foundation for future research. The code and data will be available at https://github.com/ruiz-nju/Video-MSR.

Radiomics-Integrated Deep Learning with Hierarchical Loss for Osteosarcoma Histology Classification

Authors:Yaxi Chen, Zi Ye, Shaheer U. Saeed, Oliver Yu, Simin Ni, Jie Huang, Yipeng Hu
Date:2026-01-14 12:09:34

Osteosarcoma (OS) is an aggressive primary bone malignancy. Accurate histopathological assessment of viable versus non-viable tumor regions after neoadjuvant chemotherapy is critical for prognosis and treatment planning, yet manual evaluation remains labor-intensive, subjective, and prone to inter-observer variability. Recent advances in digital pathology have enabled automated necrosis quantification. Evaluating on test data, independently sampled on patient-level, revealed that the deep learning model performance dropped significantly from the tile-level generalization ability reported in previous studies. First, this work proposes the use of radiomic features as additional input in model training. We show that, despite that they are derived from the images, such a multimodal input effectively improved the classification performance, in addition to its added benefits in interpretability. Second, this work proposes to optimize two binary classification tasks with hierarchical classes (i.e. tumor-vs-non-tumor and viable-vs-non-viable), as opposed to the alternative ``flat'' three-class classification task (i.e. non-tumor, non-viable tumor, viable tumor), thereby enabling a hierarchical loss. We show that such a hierarchical loss, with trainable weightings between the two tasks, the per-class performance can be improved significantly. Using the TCIA OS Tumor Assessment dataset, we experimentally demonstrate the benefits from each of the proposed new approaches and their combination, setting a what we consider new state-of-the-art performance on this open dataset for this application. Code and trained models: https://github.com/YaxiiC/RadiomicsOS.git.

ReflexDiffusion: Reflection-Enhanced Trajectory Planning for High-lateral-acceleration Scenarios in Autonomous Driving

Authors:Xuemei Yao, Xiao Yang, Jianbin Sun, Liuwei Xie, Xuebin Shao, Xiyu Fang, Hang Su, Kewei Yang
Date:2026-01-14 11:03:29

Generating safe and reliable trajectories for autonomous vehicles in long-tail scenarios remains a significant challenge, particularly for high-lateral-acceleration maneuvers such as sharp turns, which represent critical safety situations. Existing trajectory planners exhibit systematic failures in these scenarios due to data imbalance. This results in insufficient modelling of vehicle dynamics, road geometry, and environmental constraints in high-risk situations, leading to suboptimal or unsafe trajectory prediction when vehicles operate near their physical limits. In this paper, we introduce ReflexDiffusion, a novel inference-stage framework that enhances diffusion-based trajectory planners through reflective adjustment. Our method introduces a gradient-based adjustment mechanism during the iterative denoising process: after each standard trajectory update, we compute the gradient between the conditional and unconditional noise predictions to explicitly amplify critical conditioning signals, including road curvature and lateral vehicle dynamics. This amplification enforces strict adherence to physical constraints, particularly improving stability during high-lateral-acceleration maneuvers where precise vehicle-road interaction is paramount. Evaluated on the nuPlan Test14-hard benchmark, ReflexDiffusion achieves a 14.1% improvement in driving score for high-lateral-acceleration scenarios over the state-of-the-art (SOTA) methods. This demonstrates that inference-time trajectory optimization can effectively compensate for training data sparsity by dynamically reinforcing safety-critical constraints near handling limits. The framework's architecture-agnostic design enables direct deployment to existing diffusion-based planners, offering a practical solution for improving autonomous vehicle safety in challenging driving conditions.

Field report from Collaborative Research Center 1625: Heterogeneous research data management using ontology representations

Authors:Doaa Mohamed, Samuel García Vázquez, Behnam Mardani, Victor Dudarev, Alfred Ludwig, Maribel Acosta, Markus Stricker
Date:2026-01-14 10:40:10

The goal of the Collaborative Research Center 1625 is the establishment of a scientific basis for the atomic-scale understanding and design of multifunctional compositionally complex solid solution surfaces. Next to materials synthesis in form of thin-film materials libraries, various materials characterization and simulations techniques are used to explore the materials data space of the problem. Machine learning and artificial intelligence techniques guide its exploration and navigation. The effective use of the combined heterogeneous data requires more than just a simple research data management plan. Consequently, our research data management system maps different data modalities in different formats and resolutions from different labs to the correct spatial locations on physical samples. Besides a graphical user interface, the system can also be accessed through an application programming interface for reproducible data-driven workflows. It is implemented by a combination of a custom research data management system designed around a relational database, an ontology which builds upon materials science-specific ontologies, and the construction of a Knowledge Graph. Along with the technical solutions of research data management system and lessons learned, first use cases are shown which were not possible (or at least much harder to achieve) without it.

Monte-Carlo Tree Search with Neural Network Guidance for Lane-Free Autonomous Driving

Authors:Ioannis Peridis, Dimitrios Troullinos, Georgios Chalkiadakis, Pantelis Giankoulidis, Ioannis Papamichail, Markos Papageorgiou
Date:2026-01-14 10:35:21

Lane-free traffic environments allow vehicles to better harness the lateral capacity of the road without being restricted to lane-keeping, thereby increasing the traffic flow rates. As such, we have a distinct and more challenging setting for autonomous driving. In this work, we consider a Monte-Carlo Tree Search (MCTS) planning approach for single-agent autonomous driving in lane-free traffic, where the associated Markov Decision Process we formulate is influenced from existing approaches tied to reinforcement learning frameworks. In addition, MCTS is equipped with a pre-trained neural network (NN) that guides the selection phase. This procedure incorporates the predictive capabilities of NNs for a more informed tree search process under computational constraints. In our experimental evaluation, we consider metrics that address both safety (through collision rates) and efficacy (through measured speed). Then, we examine: (a) the influence of isotropic state information for vehicles in a lane-free environment, resulting in nudging behaviour--vehicles' policy reacts due to the presence of faster tailing ones, (b) the acceleration of performance for the NN-guided variant of MCTS, and (c) the trade-off between computational resources and solution quality.

Research on Piano Timbre Transformation System Based on Diffusion Model

Authors:Chun-Chieh Hsu, Tsai-Ling Hsu, Chen-Chen Yeh, Shao-Chien Lu, Cheng-Han Wu, Bing-Ze Liu, Timothy K. Shih, Yu-Cheng Lin
Date:2026-01-14 10:10:19

We propose a timbre conversion model based on the Diffusion architecture de-signed to precisely translate music played by various instruments into piano ver-sions. The model employs a Pitch Encoder and Loudness Encoder to extract pitch and loudness features of the music, which serve as conditional inputs to the Dif-fusion Model's decoder, generating high-quality piano timbres. Case analysis re-sults show that the model performs excellently in terms of pitch accuracy and timbral similarity, maintaining stable conversion across different musical styles (classical, jazz, pop) and lengths (from short clips to full pieces). Particularly, the model maintains high sound quality and accuracy even when dealing with rapidly changing notes and complex musical structures, demonstrating good generaliza-tion capability. Additionally, the model has the potential for real-time musical conversion and is suitable for live performances and digital music creation tools. Future research will focus on enhancing the handling of loudness dynamics and incorporating additional musical features (such as timbral variations and rhythmic complexity) to improve the model's adaptability and expressiveness. We plan to explore the model's application potential in other timbre conversion tasks, such as converting vocals to instrumental sounds or integration with MIDI digital pianos, further expanding the application scope of the Diffusion-based timbre conversion model in the field of music generation.

Feedback-Based Mobile Robot Navigation in 3-D Environments Using Artificial Potential Functions Technical Report

Authors:Ro'i Lang, Elon Rimon
Date:2026-01-14 09:41:19

This technical report presents the construction and analysis of polynomial navigation functions for motion planning in 3-D workspaces populated by spherical and cylindrical obstacles. The workspace is modeled as a bounded spherical region, and obstacles are encoded using smooth polynomial implicit functions. We establish conditions under which the proposed navigation functions admit a unique non-degenerate minimum at the target while avoiding local minima, including in the presence of pairwise intersecting obstacles. Gradient and Hessian analyses are provided, and the theoretical results are validated through numerical simulations in obstacle rich 3-D environments.

PhyRPR: Training-Free Physics-Constrained Video Generation

Authors:Yibo Zhao, Hengjia Li, Xiaofei He, Boxi Wu
Date:2026-01-14 07:41:56

Recent diffusion-based video generation models can synthesize visually plausible videos, yet they often struggle to satisfy physical constraints. A key reason is that most existing approaches remain single-stage: they entangle high-level physical understanding with low-level visual synthesis, making it hard to generate content that require explicit physical reasoning. To address this limitation, we propose a training-free three-stage pipeline,\textit{PhyRPR}:\textit{Phy\uline{R}eason}--\textit{Phy\uline{P}lan}--\textit{Phy\uline{R}efine}, which decouples physical understanding from visual synthesis. Specifically, \textit{PhyReason} uses a large multimodal model for physical state reasoning and an image generator for keyframe synthesis; \textit{PhyPlan} deterministically synthesizes a controllable coarse motion scaffold; and \textit{PhyRefine} injects this scaffold into diffusion sampling via a latent fusion strategy to refine appearance while preserving the planned dynamics. This staged design enables explicit physical control during generation. Extensive experiments under physics constraints show that our method consistently improves physical plausibility and motion controllability.

Vision-Conditioned Variational Bayesian Last Layer Dynamics Models

Authors:Paul Brunzema, Thomas Lew, Ray Zhang, Takeru Shirasawa, John Subosits, Marcus Greiff
Date:2026-01-14 05:25:18

Agile control of robotic systems often requires anticipating how the environment affects system behavior. For example, a driver must perceive the road ahead to anticipate available friction and plan actions accordingly. Achieving such proactive adaptation within autonomous frameworks remains a challenge, particularly under rapidly changing conditions. Traditional modeling approaches often struggle to capture abrupt variations in system behavior, while adaptive methods are inherently reactive and may adapt too late to ensure safety. We propose a vision-conditioned variational Bayesian last-layer dynamics model that leverages visual context to anticipate changes in the environment. The model first learns nominal vehicle dynamics and is then fine-tuned with feature-wise affine transformations of latent features, enabling context-aware dynamics prediction. The resulting model is integrated into an optimal controller for vehicle racing. We validate our method on a Lexus LC500 racing through water puddles. With vision-conditioning, the system completed all 12 attempted laps under varying conditions. In contrast, all baselines without visual context consistently lost control, demonstrating the importance of proactive dynamics adaptation in high-performance applications.

SafePlanner: Testing Safety of the Automated Driving System Plan Model

Authors:Dohyun Kim, Sanggu Han, Sangmin Woo, Joonha Jang, Jaehoon Kim, Changhun Song, Yongdae Kim
Date:2026-01-14 05:14:00

In this work, we present SafePlanner, a systematic testing framework for identifying safety-critical flaws in the Plan model of Automated Driving Systems (ADS). SafePlanner targets two core challenges: generating structurally meaningful test scenarios and detecting hazardous planning behaviors. To maximize coverage, SafePlanner performs a structural analysis of the Plan model implementation - specifically, its scene-transition logic and hierarchical control flow - and uses this insight to extract feasible scene transitions from code. It then composes test scenarios by combining these transitions with non-player vehicle (NPC) behaviors. Guided fuzzing is applied to explore the behavioral space of the Plan model under these scenarios. We evaluate SafePlanner on Baidu Apollo, a production-grade level 4 ADS. It generates 20635 test cases and detects 520 hazardous behaviors, grouped into 15 root causes through manual analysis. For four of these, we applied patches based on our analysis; the issues disappeared, and no apparent side effects were observed. SafePlanner achieves 83.63 percent function and 63.22 percent decision coverage on the Plan model, outperforming baselines in both bug discovery and efficiency.

Build Code is Still Code: Finding the Antidote for Pipeline Poisoning

Authors:Brent Pappas, Paul Gazzillo
Date:2026-01-13 21:35:18

Open source C code underpins society's computing infrastructure. Decades of work has helped harden C code against attackers, but C projects do not consist of only C code. C projects also contain build system code for automating development tasks like compilation, testing, and packaging. These build systems are critcal to software supply chain security and vulnerable to being poisoned, with the XZ Utils and SolarWinds attacks being recent examples. Existing techniques try to harden software supply chains by verifying software dependencies, but such methods ignore the build system itself. Similarly, classic software security checkers only analyze and monitor program code, not build system code. Moreover, poisoned build systems can easily circumvent tools for detecting program code vulnerabilities by disabling such checks. We present development phase isolation, a novel strategy for hardening build systems against poisoning by modeling the information and behavior permissions of build automation as if it were program code. We have prototyped this approach as a tool called Foreman, which successfully detects and warns about the poisoned test files involved in the XZ Utils attack. We outline our future plans to protect against pipeline poisoning by automatically checking development phase isolation. We envision a future where build system security checkers are as prevalent as program code checkers.

Variance-Penalized MC-Dropout as a Learned Smoothing Prior for Brain Tumour Segmentation

Authors:Satyaki Roy Chowdhury, Golrokh Mirzaei
Date:2026-01-13 19:50:28

Brain tumor segmentation is essential for diagnosis and treatment planning, yet many CNN and U-Net based approaches produce noisy boundaries in regions of tumor infiltration. We introduce UAMSA-UNet, an Uncertainty-Aware Multi-Scale Attention-based Bayesian U-Net that in- stead leverages Monte Carlo Dropout to learn a data-driven smoothing prior over its predictions, while fusing multi-scale features and attention maps to capture both fine details and global context. Our smoothing-regularized loss augments binary cross-entropy with a variance penalty across stochas- tic forward passes, discouraging spurious fluctuations and yielding spatially coherent masks. On BraTS2023, UAMSA- UNet improves Dice Similarity Coefficient by up to 3.3% and mean IoU by up to 2.7% over U-Net; on BraTS2024, it delivers up to 4.5% Dice and 4.0% IoU gains over the best baseline. Remarkably, it also reduces FLOPs by 42.5% rel- ative to U-Net++ while maintaining higher accuracy. These results demonstrate that, by combining multi-scale attention with a learned smoothing prior, UAMSA-UNet achieves both better segmentation quality and computational efficiency, and provides a flexible foundation for future integration with transformer-based modules for further enhanced segmenta- tion results.

Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models

Authors:Youwei Liu, Jian Wang, Hanlin Wang, Beichen Guo, Wenjie Li
Date:2026-01-13 19:49:58

Recent advances in world models have shown promise for modeling future dynamics of environmental states, enabling agents to reason and act without accessing real environments. Current methods mainly perform single-step or fixed-horizon rollouts, leaving their potential for complex task planning under-exploited. We propose Imagine-then-Plan (\texttt{ITP}), a unified framework for agent learning via lookahead imagination, where an agent's policy model interacts with the learned world model, yielding multi-step ``imagined'' trajectories. Since the imagination horizon may vary by tasks and stages, we introduce a novel adaptive lookahead mechanism by trading off the ultimate goal and task progress. The resulting imagined trajectories provide rich signals about future consequences, such as achieved progress and potential conflicts, which are fused with current observations, formulating a partially \textit{observable} and \textit{imaginable} Markov decision process to guide policy learning. We instantiate \texttt{ITP} with both training-free and reinforcement-trained variants. Extensive experiments across representative agent benchmarks demonstrate that \texttt{ITP} significantly outperforms competitive baselines. Further analyses validate that our adaptive lookahead largely enhances agents' reasoning capability, providing valuable insights into addressing broader, complex tasks.

Expanding the High-z Supernova Frontier: "Wide-Area" JWST Discoveries from the First Two Years of COSMOS-Web

Authors:Ori D. Fox, Armin Rest, Justin D. R. Pierel, David A. Coulter, Caitlin M. Casey, Jeyhan S. Kartaltepe, Hollis B. Akins, Maximilien Franco, Mike Engesser, Conor Larison, Takashi J. Moriya, Robert M. Quimby, Marko Shuntov, Matthew R. Siebert, Christa DeCoursey, James M. DerKacy, Nicole E. Drakos, Eiichi Egami, Steven L. Finkelstein, Carter Flayhart, Seiji Fujimoto, Estefania Padilla Gonzalez, Massimo Griggio, Santosh Harish, Olivier Ilbert, Kohei Inayoshi, Anton M. Koekemoer, Vasily Kokorev, Clotilde Laigle, Erini Lambrides, Rebecca L. Larson, Daizhong Liu, Georgios E. Magdis, Jacqueline E. McCleary, Henry J. McCracken, Nicolas McMahon, Jed McKinney, Thomas Moore, Louise Paquereau, Jason Rhodes, Brant E. Robertson, David B. Sanders, Sogol Sanjaripour, Koji Shukawa, Louis-Gregory Strolger, Sune Toft, Qinan Wang, Robert E. Williams, Yossef Zenati
Date:2026-01-13 19:10:21

Transient astronomy in the early Universe (z > 2) remains largely unexplored, lying beyond the rest-frame optical spectroscopic reach of most current observatories. Yet this regime promises transformative insights, with high-redshift transients providing direct access to the early Universe and enabling studies of how stellar populations and cosmology evolve over cosmic time. JWST is uniquely equipped to probe these redshifts efficiently in the rest-frame optical and near-IR. We present results from an initial pathfinder search, covering an area of ~133 arcmin^2 (~0.037 deg^2) independently imaged by the PRIMER and COSMOS-Web (hereafter COSMOS) extragalactic surveys. Although neither program was designed for time-domain astronomy, combining their data results in difference images separated by roughly one year, leading to the discovery of 68 supernovae (SNe) with host photometric redshifts reaching z < 5. For most SNe, only a single epoch is available, but the combination of host redshift, classification, color, and magnitude enables us to prioritize candidates for detailed photometric and spectroscopic follow-up. Among the most notable sources are a relatively bright, blue CCSN at z > 3 (SN 2023aeab) and a young, normal SN Ia at z > 2 (SN 2023aeax). The sample distribution highlights the increasing likelihood that a wide-area JWST program can uncover younger, bluer, and potentially more extreme explosions. While this pathfinder effort is limited in cadence and number of filters, it demonstrates the strong potential of a dedicated, well-planned time-domain survey with JWST to obtain the sample sizes and rate measurements needed to chart SN populations deep into the early Universe.

RAGShaper: Eliciting Sophisticated Agentic RAG Skills via Automated Data Synthesis

Authors:Zhengwei Tao, Bo Li, Jialong Wu, Guochen Yan, Huanyao Zhang, Jiahao Xu, Haitao Mi, Wentao Zhang
Date:2026-01-13 16:25:07

Agentic Retrieval-Augmented Generation (RAG) empowers large language models to autonomously plan and retrieve information for complex problem-solving. However, the development of robust agents is hindered by the scarcity of high-quality training data that reflects the noise and complexity of real-world retrieval environments. Conventional manual annotation is unscalable and often fails to capture the dynamic reasoning strategies required to handle retrieval failures. To bridge this gap, we introduce RAGShaper, a novel data synthesis framework designed to automate the construction of RAG tasks and robust agent trajectories. RAGShaper incorporates an InfoCurator to build dense information trees enriched with adversarial distractors spanning Perception and Cognition levels. Furthermore, we propose a constrained navigation strategy that forces a teacher agent to confront these distractors, thereby eliciting trajectories that explicitly demonstrate error correction and noise rejection. Comprehensive experiments confirm that models trained on our synthesized corpus significantly outperform existing baselines, exhibiting superior robustness in noise-intensive and complex retrieval tasks.

VLingNav: Embodied Navigation with Adaptive Reasoning and Visual-Assisted Linguistic Memory

Authors:Shaoan Wang, Yuanfei Luo, Xingyu Chen, Aocheng Luo, Dongyue Li, Chang Liu, Sheng Chen, Yangang Zhang, Junzhi Yu
Date:2026-01-13 15:43:43

VLA models have shown promising potential in embodied navigation by unifying perception and planning while inheriting the strong generalization abilities of large VLMs. However, most existing VLA models rely on reactive mappings directly from observations to actions, lacking the explicit reasoning capabilities and persistent memory required for complex, long-horizon navigation tasks. To address these challenges, we propose VLingNav, a VLA model for embodied navigation grounded in linguistic-driven cognition. First, inspired by the dual-process theory of human cognition, we introduce an adaptive chain-of-thought mechanism, which dynamically triggers explicit reasoning only when necessary, enabling the agent to fluidly switch between fast, intuitive execution and slow, deliberate planning. Second, to handle long-horizon spatial dependencies, we develop a visual-assisted linguistic memory module that constructs a persistent, cross-modal semantic memory, enabling the agent to recall past observations to prevent repetitive exploration and infer movement trends for dynamic environments. For the training recipe, we construct Nav-AdaCoT-2.9M, the largest embodied navigation dataset with reasoning annotations to date, enriched with adaptive CoT annotations that induce a reasoning paradigm capable of adjusting both when to think and what to think about. Moreover, we incorporate an online expert-guided reinforcement learning stage, enabling the model to surpass pure imitation learning and to acquire more robust, self-explored navigation behaviors. Extensive experiments demonstrate that VLingNav achieves state-of-the-art performance across a wide range of embodied navigation benchmarks. Notably, VLingNav transfers to real-world robotic platforms in a zero-shot manner, executing various navigation tasks and demonstrating strong cross-domain and cross-task generalization.

Percentile-based probabilistic optimization for systematic and random uncertainties in radiation therapy

Authors:Albin Fredriksson, Erik Engwall, Jenneke de Jong, Johan Sundström
Date:2026-01-13 15:15:43

Geometric uncertainty can degrade treatment quality in radiation therapy. While margins and robust optimization mitigate these effects, they provide only implicit control over clinical goal fulfillment probability. We therefore develop a probabilistic planning framework using a percentile-based optimization function that targets a specified probability of clinical goal fulfillment. Systematic and random uncertainties were explicitly modeled over full treatment courses. A scenario dose approximation method based on interpolation between a fixed set of doses was used, enabling efficient simulation of treatment courses during optimization. The framework was evaluated on a prostate case treated with volumetric-modulated arc therapy (VMAT) and a brain case treated with pencil beam scanning (PBS) proton therapy. Plans were compared to conventional margin-based and worst-case robust optimization using probabilistic evaluation. For the prostate case, probabilistic optimization improved organ at risk (OAR) sparing while maintaining target coverage compared to margin-based planning, increasing average OAR goal fulfillment probability by 13.3 percentage points and reducing 90th percentile OAR doses by an average of 3.5~Gy. For the brain case, probabilistic optimization improved target minimum dose passing probabilities (e.g., 88\% vs.~22\% for $D_{95}$) and brainstem maximum dose passing probability (70\% vs.~30\%), while maintaining comparable or improved OAR sparing compared to worst-case optimization. Probabilistic optimization enables explicit and interpretable control over goal fulfillment probabilities. Combining full treatment course modeling with efficient approximate dose calculation, the proposed framework improved the trade-off between target coverage and OAR sparing compared to conventional planning approaches in both photon and proton therapy.

Cities at Play: Improving Equilibria in Urban Neighbourhood Games

Authors:Martin Gairing, Adrian Vetta, Zhanzhan Zhao
Date:2026-01-13 15:14:45

How should cities invest to improve social welfare when individuals respond strategically to local conditions? We model this question using a game-theoretic version of Schelling's bounded neighbourhood model, where agents choose neighbourhoods based on concave, non-monotonic utility functions reflecting local population. While naive improvements may worsen outcomes - analogous to Braess' paradox - we show that carefully designed, small-scale investments can reliably align individual incentives with societal goals. Specifically, modifying utilities at a total cost of at most $0.81 ε^2 \cdot \texttt{opt}$ guarantees that every resulting Nash equilibrium achieves a social welfare of at least $ε\cdot \texttt{opt}$, where $\texttt{opt}$ is the optimum social welfare. Our results formalise how targeted interventions can transform supra-negative outcomes into supra-positive returns, offering new insights into strategic urban planning and decentralised collective behaviour.

Rewriting Video: Text-Driven Reauthoring of Video Footage

Authors:Sitong Wang, Anh Truong, Lydia B. Chilton, Dingzeyu Li
Date:2026-01-13 13:49:05

Video is a powerful medium for communication and storytelling, yet reauthoring existing footage remains challenging. Even simple edits often demand expertise, time, and careful planning, constraining how creators envision and shape their narratives. Recent advances in generative AI suggest a new paradigm: what if editing a video were as straightforward as rewriting text? To investigate this, we present a tech probe and a study on text-driven video reauthoring. Our approach involves two technical contributions: (1) a generative reconstruction algorithm that reverse-engineers video into an editable text prompt, and (2) an interactive probe, Rewrite Kit, that allows creators to manipulate these prompts. A technical evaluation of the algorithm reveals a critical human-AI perceptual gap. A probe study with 12 creators surfaced novel use cases such as virtual reshooting, synthetic continuity, and aesthetic restyling. It also highlighted key tensions around coherence, control, and creative alignment in this new paradigm. Our work contributes empirical insights into the opportunities and challenges of text-driven video reauthoring, offering design implications for future co-creative video tools.

JudgeRLVR: Judge First, Generate Second for Efficient Reasoning

Authors:Jiangshan Duo, Hanyu Li, Hailin Zhang, Yudong Wang, Sujian Li, Liang Zhao
Date:2026-01-13 11:47:42

Reinforcement Learning with Verifiable Rewards (RLVR) has become a standard paradigm for reasoning in Large Language Models. However, optimizing solely for final-answer correctness often drives models into aimless, verbose exploration, where they rely on exhaustive trial-and-error tactics rather than structured planning to reach solutions. While heuristic constraints like length penalties can reduce verbosity, they often truncate essential reasoning steps, creating a difficult trade-off between efficiency and verification. In this paper, we argue that discriminative capability is a prerequisite for efficient generation: by learning to distinguish valid solutions, a model can internalize a guidance signal that prunes the search space. We propose JudgeRLVR, a two-stage judge-then-generate paradigm. In the first stage, we train the model to judge solution responses with verifiable answers. In the second stage, we fine-tune the same model with vanilla generating RLVR initialized from the judge. Compared to Vanilla RLVR using the same math-domain training data, JudgeRLVR achieves a better quality--efficiency trade-off for Qwen3-30B-A3B: on in-domain math, it delivers about +3.7 points average accuracy gain with -42\% average generation length; on out-of-domain benchmarks, it delivers about +4.5 points average accuracy improvement, demonstrating enhanced generalization.

CoMa: Contextual Massing Generation with Vision-Language Models

Authors:Evgenii Maslov, Valentin Khrulkov, Anastasia Volkova, Anton Gusarov, Andrey Kuznetsov, Ivan Oseledets
Date:2026-01-13 11:44:00

The conceptual design phase in architecture and urban planning, particularly building massing, is complex and heavily reliant on designer intuition and manual effort. To address this, we propose an automated framework for generating building massing based on functional requirements and site context. A primary obstacle to such data-driven methods has been the lack of suitable datasets. Consequently, we introduce the CoMa-20K dataset, a comprehensive collection that includes detailed massing geometries, associated economical and programmatic data, and visual representations of the development site within its existing urban context. We benchmark this dataset by formulating massing generation as a conditional task for Vision-Language Models (VLMs), evaluating both fine-tuned and large zero-shot models. Our experiments reveal the inherent complexity of the task while demonstrating the potential of VLMs to produce context-sensitive massing options. The dataset and analysis establish a foundational benchmark and highlight significant opportunities for future research in data-driven architectural design.

Effective outdoor pathloss prediction: A multi-layer segmentation approach with weighting map

Authors:Yuan Gao, Tao Wen, Wenjing Xie, Jianbo Du, Yong Zeng, Dusit Niyato, Shugong Xu
Date:2026-01-13 11:07:36

Predicting pathloss by considering the physical environment is crucial for effective wireless network planning. Traditional methods, such as ray tracing and model-based approaches, often face challenges due to high computational complexity and discrepancies between models and real-world environments. In contrast, deep learning has emerged as a promising alternative, offering accurate path loss predictions with reduced computational complexity. In our research, we introduce a ResNet-based model designed to enhance path loss prediction. We employ innovative techniques to capture key features of the environment by generating transmission (Tx) and reception (Rx) depth maps, as well as a distance map from the geographic data. Recognizing the significant attenuation caused by signal reflection and diffraction, particularly at high frequencies, we have developed a weighting map that emphasizes the areas adjacent to the direct path between Tx and Rx for path loss prediction. {Extensive simulations demonstrate that our model outperforms PPNet, RPNet, and Vision Transformer (ViT) by 1.2-3.0 dB using dataset of ITU challenge 2024 and ICASSP 2023. In addition, the floating point operations (FLOPs) of the proposed model is 60\% less than those of benchmarks.} Additionally, ablation studies confirm that the inclusion of the weighting map significantly enhances prediction performance.

Large Multimodal Models for Embodied Intelligent Driving: The Next Frontier in Self-Driving?

Authors:Long Zhang, Yuchen Xia, Bingqing Wei, Zhen Liu, Shiwen Mao, Zhu Han, Mohsen Guizani
Date:2026-01-13 11:05:12

The advent of Large Multimodal Models (LMMs) offers a promising technology to tackle the limitations of modular design in autonomous driving, which often falters in open-world scenarios requiring sustained environmental understanding and logical reasoning. Besides, embodied artificial intelligence facilitates policy optimization through closed-loop interactions to achieve the continuous learning capability, thereby advancing autonomous driving toward embodied intelligent (El) driving. However, such capability will be constrained by relying solely on LMMs to enhance EI driving without joint decision-making. This article introduces a novel semantics and policy dual-driven hybrid decision framework to tackle this challenge, ensuring continuous learning and joint decision. The framework merges LMMs for semantic understanding and cognitive representation, and deep reinforcement learning (DRL) for real-time policy optimization. We starts by introducing the foundational principles of EI driving and LMMs. Moreover, we examine the emerging opportunities this framework enables, encompassing potential benefits and representative use cases. A case study is conducted experimentally to validate the performance superiority of our framework in completing lane-change planning task. Finally, several future research directions to empower EI driving are identified to guide subsequent work.

Planet-Host Stars Across the Galaxy in the 2040s

Authors:M. Tsantaki, K. Biazzo, F. Zahra Majidi, G. Tautvaisiene, I. Busa
Date:2026-01-13 10:20:31

By the 2040s, the exoplanet field will have moved from the discovery of a few thousand planets to hundreds of thousands, thanks to Gaia DR5, TESS, PLATO, Roman, and their successors. At that stage, the key bottleneck will no longer be planet detection, but our ability to understand how planetary systems form, evolve, and diversify across different stellar and Galactic environments. To address this, we need a large-scale, high-resolution spectroscopic survey of planet-host stars, spanning a broad range of Galactic environments (thin and thick disks, bulge, halo, clusters, associations), and including a well-defined control sample of non-hosts. Such a survey must deliver homogeneous stellar parameters, detailed abundance determinations, ages, and kinematics for tens of thousands of hosts, extending to the faint magnitudes probed by future missions but are beyond the reach of existing and currently planned spectroscopic facilities.

Kantorovich Distance via Spanning Trees: Properties and Algorithms

Authors:Jérémie Bigot, Luis Fredes
Date:2026-01-13 10:02:59

We study optimal transport between probability measures supported on the same finite metric space, where the ground cost is a distance induced by a weighted connected graph. Building on recent work showing that the resulting Kantorovich distance can be expressed as a minimization problem over the set of spanning trees of this underlying graph, we investigate the implications of this reformulation on the construction of an optimal transport plan and a dual potential based on the solution of such an optimization problem. In this setting, we derive an explicit formula for the Kantorovich potential in terms of the imbalanced cumulative mass (a generalization of the cumulative distribution in R) along an optimal spanning tree solving such a minimization problem, under a weak non-degeneracy condition on the pair of measures that guarantees the uniqueness of a dual potential. Our second contribution establishes the existence of an optimal transport plan that can be computed efficiently by a dynamic programming procedure once an optimal spanning tree is known. Finally, we propose a stochastic algorithm based on simulated annealing on the space of spanning trees to compute such an optimal spanning tree. Numerical experiments illustrate the theoretical results and demonstrate the practical relevance of the proposed approach for optimal transport on finite metric spaces.

Evaluating the effectiveness of radio frequency interference removal algorithms for single pulse searches

Authors:R. S. Hombal, L. Levin, B. W. Stappers, M. Droog, A. Karastergiou, D. Lumbaa, M. B. Mickaliger, A. Naidu, K. M. Rajwade, J. Sepulveda, B. Shaw, S. Singh, T. Prabu
Date:2026-01-13 09:10:33

Radio Frequency Interference (RFI), the presence of artificial and/or terrestrial signals in astronomical data, poses a great challenge to the search for pulsars and radio transients, such as Rotating Radio Transients (RRATs) and Fast Radio Bursts (FRBs), by obscuring or distorting the signal of interest and resulting in large numbers of erroneous detections. RFI mitigation algorithms aim to remove this interference and improve the chance of detection of transients, but with the growing number of techniques, selecting the most appropriate method for a given survey can be problematic. The choice of method is particularly important in real-time searches planned for next-generation telescopes such as those of the SKAO, where there is no possibility to reprocess the data. In this paper, we explore the algorithm selection problem by injecting pulses into data which simulates several RFI environments. A set of these files is then cleaned using RFI mitigation algorithms and run through a single pulse search pipeline to analyse the recovery of the injected pulses. We examine the recovery of the injected single pulses with an emphasis on a number of cases spanning a range of pulse brightness, width and dispersion measure. The efficacy and side effects of a few popular RFI excision methods, namely IQRM, SKF, and ZDMF are evaluated.