planning - 2026-04-17

MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation

Authors:Yan Li, Zezi Zeng, Yifan Yang, Yuqing Yang, Ning Liao, Weiwei Guo, Lili Qiu, Mingxi Cheng, Qi Dai, Zhendong Wang, Zhengyuan Yang, Xue Yang, Ji Li, Lijuan Wang, Chong Luo

Date:2026-04-16 17:59:49

The rapid progress of Artificial Intelligence Generated Content (AIGC) tools enables images, videos, and visualizations to be created on demand for webpage design, offering a flexible and increasingly adopted paradigm for modern UI/UX. However, directly integrating such tools into automated webpage generation often leads to style inconsistency and poor global coherence, as elements are generated in isolation. We propose MM-WebAgent, a hierarchical agentic framework for multimodal webpage generation that coordinates AIGC-based element generation through hierarchical planning and iterative self-reflection. MM-WebAgent jointly optimizes global layout, local multimodal content, and their integration, producing coherent and visually consistent webpages. We further introduce a benchmark for multimodal webpage generation and a multi-level evaluation protocol for systematic assessment. Experiments demonstrate that MM-WebAgent outperforms code-generation and agent-based baselines, especially on multimodal element generation and integration. Code & Data: https://aka.ms/mm-webagent.

RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework

Authors:Hao Gao, Shaoyu Chen, Yifan Zhu, Yuehao Song, Wenyu Liu, Qian Zhang, Xinggang Wang

Date:2026-04-16 17:59:44

High-level autonomous driving requires motion planners capable of modeling multimodal future uncertainties while remaining robust in closed-loop interactions. Although diffusion-based planners are effective at modeling complex trajectory distributions, they often suffer from stochastic instabilities and the lack of corrective negative feedback when trained purely with imitation learning. To address these issues, we propose RAD-2, a unified generator-discriminator framework for closed-loop planning. Specifically, a diffusion-based generator is used to produce diverse trajectory candidates, while an RL-optimized discriminator reranks these candidates according to their long-term driving quality. This decoupled design avoids directly applying sparse scalar rewards to the full high-dimensional trajectory space, thereby improving optimization stability. To further enhance reinforcement learning, we introduce Temporally Consistent Group Relative Policy Optimization, which exploits temporal coherence to alleviate the credit assignment problem. In addition, we propose On-policy Generator Optimization, which converts closed-loop feedback into structured longitudinal optimization signals and progressively shifts the generator toward high-reward trajectory manifolds. To support efficient large-scale training, we introduce BEV-Warp, a high-throughput simulation environment that performs closed-loop evaluation directly in Bird's-Eye View feature space via spatial warping. RAD-2 reduces the collision rate by 56% compared with strong diffusion-based planners. Real-world deployment further demonstrates improved perceived safety and driving smoothness in complex urban traffic.

Think in Latent Thoughts: A New Paradigm for Gloss-Free Sign Language Translation

Authors:Yiyang Jiang, Li Zhang, Xiao-Yong Wei, Li Qing

Date:2026-04-16 17:57:24

Many SLT systems quietly assume that brief chunks of signing map directly to spoken-language words. That assumption breaks down because signers often create meaning on the fly using context, space, and movement. We revisit SLT and argue that it is mainly a cross-modal reasoning task, not just a straightforward video-to-text conversion. We thus introduce a reasoning-driven SLT framework that uses an ordered sequence of latent thoughts as an explicit middle layer between the video and the generated text. These latent thoughts gradually extract and organize meaning over time. On top of this, we use a plan-then-ground decoding method: the model first decides what it wants to say, and then looks back at the video to find the evidence. This separation improves coherence and faithfulness. We also built and released a new large-scale gloss-free SLT dataset with stronger context dependencies and more realistic meanings. Experiments across several benchmarks show consistent gains over existing gloss-free methods. Code and data will be released upon acceptance at https://github.com/fletcherjiang/SignThought.

Studies of the Modular COsmic Ray Detector (MCORD) using an automatic temperature control loop to maintain constant gain parameters of semiconductor SiPM photomultipliers

Authors:M. Bielewicz, M. Kiecana, A. Bancer, J. Grzyb, M. Grodzicka-Kobylka, T. Szczesniak, K. Kopanski, W. Noga, L. Kazmierczak, G. Saworska, A. Broslawski, P. Mazerewicz, E. Jaworska

Date:2026-04-16 17:26:37

The MCORD detector is a modular scintillator-based system employing silicon photomultipliers (SiPMs) and FPGA-based digital signal processing, designed for applications such as cosmic muon detection, veto systems, and detector calibration support. In this work, we investigate the influence of ambient temperature variations on detector performance, with particular emphasis on SiPM gain stability. Several automatic temperature compensation loops were implemented to stabilize the operating voltage of the sensors. Based on controlled laboratory measurements, we evaluate the effectiveness of different control strategies, including variations in temperature averaging time and threshold response criteria. The performance of each approach is compared in terms of gain stability and response dynamics. We identify the optimal temperature control configuration for planned MCORD measurements and present recent modifications to the detector electronics, including updated software for AFE control. Additionally, we describe modifications made to the detectors electronics since the previous publication, including new software developed to control the AFE electronics.

Agentic Microphysics: A Manifesto for Generative AI Safety

Authors:Federico Pierucci, Matteo Prandi, Marcantonio Bracale Syrnikov, Marcello Galisai, Piercosma Bisconti

Date:2026-04-16 17:11:28

This paper advances a methodological proposal for safety research in agentic AI. As systems acquire planning, memory, tool use, persistent identity, and sustained interaction, safety can no longer be analysed primarily at the level of the isolated model. Population-level risks arise from structured interaction among agents, through processes of communication, observation, and mutual influence that shape collective behaviour over time. As the object of analysis shifts, a methodological gap emerges. Approaches focused either on single agents or on aggregate outcomes do not identify the interaction-level mechanisms that generate collective risks or the design variables that control them. A framework is required that links local interaction structure to population-level dynamics in a causally explicit way, allowing both explanation and intervention. We introduce two linked concepts. Agentic microphysics defines the level of analysis: local interaction dynamics where one agent's output becomes another's input under specific protocol conditions. Generative safety defines the methodology: growing phenomena and elicit risks from micro-level conditions to identify sufficient mechanisms, detect thresholds, and design effective interventions.

Pricing Electric Vehicle Charging and Station Access via Copositive Duality

Authors:Nanfei Jiang, Yi Zhou, Josh A. Taylor, Mahnoosh Alizadeh

Date:2026-04-16 17:05:41

Optimized charging of electric vehicles (EVs) at public locations consists of two decisions: how much energy to deliver at what times, which is continuous, and where to plug in, which is binary. This makes optimizing EV charging a mixed-integer linear program (MILP). This discreteness undermines traditional marginal pricing methods. In this paper, we develop the first marginal-price-based mechanism for pricing EV charging with binary station access constraints. Using the result of Burer (2009), we express the EV charging as a completely positive program (CPP), whose dual is a copositive program (COP). This convex dual admits valid shadow prices even though the original allocation problem is discrete and nonconvex. By interpreting the COP dual variables as marginal prices, we construct a pricing mechanism that captures EV supply equipment (EVSE) congestion as well as charging-capacity limits. We prove that the resulting mechanism is revenue-adequate for the operator and individually rational for every EV user, in the strong sense that each user maximizes their own welfare by accepting their assigned charging plan rather than deviating to any alternative option. We further develop problem-specific inner-approximation and dimension-reduction techniques that substantially improve the computational tractability of solving the COP in our setting. Numerical experiments on both small and large scale charging instances demonstrate that our pricing mechanism captures discrete congestion effects and aligns user incentives with the system-optimal assignment, outperforming time-of-use (TOU) and convex relaxation benchmarks.

Benchmarking Classical Coverage Path Planning Heuristics on Irregular Hexagonal Grids for Maritime Coverage Scenarios

Authors:Carlos S. Sepúlveda, Gonzalo A. Ruz

Date:2026-04-16 16:28:03

Coverage path planning on irregular hexagonal grids is relevant to maritime surveillance, search and rescue and environmental monitoring, yet classical methods are often compared on small ad hoc examples or on rectangular grids. This paper presents a reproducible benchmark of deterministic single-vehicle coverage path planning heuristics on irregular hexagonal graphs derived from synthetic but maritime-motivated areas of interest. The benchmark contains 10,000 Hamiltonian-feasible instances spanning compact, elongated, and irregular morphologies, 17 heuristics from seven families, and a common evaluation protocol covering Hamiltonian success, complete-coverage success, revisits, path length, heading changes, and CPU latency. Across the released dataset, heuristics with explicit shortest-path reconnection solve the relaxed coverage task reliably but almost never produce zero-revisit tours. Exact Depth-First Search confirms that every released instance is Hamiltonian-feasible. The strongest classical Hamiltonian baseline is a Warnsdorff variant that uses an index-based tie-break together with a terminal-inclusive residual-degree policy, reaching 79.0% Hamiltonian success. The dominant design choice is not tie-breaking alone, but how the residual degree is defined when the endpoint is reserved until the final move. This shows that underreported implementation details can materially affect performance on sparse geometric graphs with bottlenecks. The benchmark is intended as a controlled testbed for heuristic analysis rather than as a claim of operational optimality at fleet scale.

Symmetry Preserving Contact Interaction Approaches: An Overview of Meson and Diquark Form Factors

Authors:L. X. Gutiérrez-Guerrero, Roger José Hernández-Pinto

Date:2026-04-16 15:10:25

We present an updated overview of the symmetry preserving Contact Interaction model in hadronic physics, developed a little over a decade ago to describe the mass spectrum and internal structure of mesons and diquarks composed of light and heavy quarks. Over the years, the Contact Interaction has evolved into a framework capable of treating both ground and excited states, providing a simple yet consistent approach to nonperturbative QCD. In this review, we examine the mass spectrum and elastic form factors of forty mesons with different spins and parities, together with their corresponding diquark partners. Importantly, we update the comparison of Contact Interaction predictions using recent results from the literature, offering a fresh perspective on the model's performance, strengths, and limitations. The analysis presented here refines previous conclusions and supports the Contact Interaction as a practical tool for hadron structure studies, with potential applications to baryons and multiquark states. We also present comparisons with other theoretical models and approaches, including lattice quantum chromodynamics, and comment on future prospects in view of ongoing and planned hadron structure experimental programs. In particular, forthcoming measurements at FAIR, together with future studies at Jefferson Lab and the Electron Ion Collider, are expected to provide key insights into hadron structure, with FAIR offering indirect constraints via hadron spectroscopy, hadronic interactions, and in-medium properties, while high-precision data on meson structure and form factors from Jefferson Lab and the Electron Ion Collider will provide valuable benchmarks to confront Contact Interaction based predictions.

Amortized Optimal Transport from Sliced Potentials

Authors:Minh-Phuc Truong, Khai Nguyen

Date:2026-04-16 15:05:59

We propose a novel amortized optimization method for predicting optimal transport (OT) plans across multiple pairs of measures by leveraging Kantorovich potentials derived from sliced OT. We introduce two amortization strategies: regression-based amortization (RA-OT) and objective-based amortization (OA-OT). In RA-OT, we formulate a functional regression model that treats Kantorovich potentials from the original OT problem as responses and those obtained from sliced OT as predictors, and estimate these models via least-squares methods. In OA-OT, we estimate the parameters of the functional model by optimizing the Kantorovich dual objective. In both approaches, the predicted OT plan is subsequently recovered from the estimated potentials. As amortized OT methods, both RA-OT and OA-OT enable efficient solutions to repeated OT problems across different measure pairs by reusing information learned from prior instances to rapidly approximate new solutions. Moreover, by exploiting the structure provided by sliced OT, the proposed models are more parsimonious, independent of specific structures of the measures, such as the number of atoms in the discrete case, while achieving high accuracy. We demonstrate the effectiveness of our approaches on tasks including MNIST digit transport, color transfer, supply-demand transportation on spherical data, and mini-batch OT conditional flow matching.

NEAT-NC: NEAT guided Navigation Cells for Robot Path Planning

Authors:Hibatallah Meliani, Khadija Slimani, Samira Khoulji

Date:2026-04-16 14:40:46

To navigate a space, the brain makes an internal representation of the environment using different cells such as place cells, grid cells, head direction cells, border cells, and speed cells. All these cells, along with sensory inputs, enable an organism to explore the space around it. Inspired by these biological principles, we developed NEATNC, a Neuro-Evolution of Augmenting Topology guided Navigation Cells. The goal of the paper is to improve NEAT algorithm performance in path planning in dynamic environments using spatial cognitive cells. This approach uses navigation cells as inputs and evolves recurrent neural networks, representing the hippocampus part of the brain. The performance of the proposed algorithm is evaluated in different static and dynamic scenarios. This study highlights NEAT's adaptability to complex and different environments, showcasing the utility of biological theories. This suggests that our approach is well-suited for real-time dynamic path planning for robotics and games.

Trajectory Planning for a Multi-UAV Rigid-Payload Cascaded Transportation System Based on Enhanced Tube-RRT*

Authors:Jianqiao Yu, Jia Li, Tianhua Gao

Date:2026-04-16 14:39:28

This paper presents a two-stage trajectory planning framework for a multi-UAV rigid-payload cascaded transportation system, aiming to address planning challenges in densely cluttered environments. In Stage I, an Enhanced Tube-RRT* algorithm is developed by integrating active hybrid sampling and an adaptive expansion strategy, enabling rapid generation of a safe and feasible virtual tube in environments with dense obstacles. Moreover, a trajectory smoothness cost is explicitly incorporated into the edge cost to reduce excessive turns and thereby mitigate cable-induced oscillations. Simulation results demonstrate that the proposed Enhanced Tube-RRT* achieves a higher success rate and effective sampling rate than mixed-sampling Tube-RRT* (STube-RRT*) and adaptive-extension Tube-RRT* (AETube-RRT*), while producing a shorter optimal path with a smaller cumulative turning angle. In Stage II, a convex quadratic program is formulated by considering payload translational and rotational dynamics, cable tension constraints, and collision-safety constraints, yielding a smooth, collision-free desired payload trajectory. Finally, a centralized geometric control scheme is applied to the cascaded system to validate the effectiveness and feasibility of the proposed planning framework, offering a practical solution for payload attitude maneuvering in densely cluttered environments.

What Is the Minimum Architecture for Prolepsis? Early Irrevocable Commitment Across Tasks in Small Transformers

Authors:Éric Jacopin

Date:2026-04-16 13:38:34

When do transformers commit to a decision, and what prevents them from correcting it? We introduce \textbf{prolepsis}: a transformer commits early, task-specific attention heads sustain the commitment, and no layer corrects it. Replicating \citeauthor{lindsey2025biology}'s (\citeyear{lindsey2025biology}) planning-site finding on open models (Gemma~2 2B, Llama~3.2 1B), we ask five questions. (Q1)~Planning is invisible to six residual-stream methods; CLTs are necessary. (Q2)~The planning-site spike replicates with identical geometry. (Q3)~Specific attention heads route the decision to the output, filling a gap flagged as invisible to attribution graphs. (Q4)~Search requires ${\leq}16$ layers; commitment requires more. (Q5)~Factual recall shows the same motif at a different network depth, with zero overlap between recurring planning heads and the factual top-10. Prolepsis is architectural: the template is shared, the routing substrates differ. All experiments run on a single consumer GPU (16\,GB VRAM).

On-Line Policy Iteration with Trajectory-Driven Policy Generation

Authors:Yuchao Li, Fei Chen, Yingke Li, Chuchu Fan, Dimitri Bertsekas

Date:2026-04-16 13:29:35

We consider deterministic finite-horizon optimal control problems with a fixed initial state. We introduce an on-line policy iteration method, which starting from a given policy, however obtained, generates a sequence of cost improving policies and corresponding trajectories. Each policy produces a trajectory, which is used in turn to generate data for training the next policy. The method is motivated by problems that are repeatedly solved starting from the same initial state, including discrete optimization and path planning for repetitive tasks. For such problems, the method is fast enough to be used on-line. Under a natural consistency condition, we show that the sequence of costs of the generated policies is monotonically improving for the given initial state (but not necessarily for other states). We illustrate our results with computational studies from combinatorial optimization and 3-dimensional path planning for drones in the presence of obstacles. We also discuss briefly a stochastic counterpart of our algorithm. Our proposed framework combines elements of rollout and policy iteration with flexible trajectory-based policy representations, and applies to problems involving a single as well as multiple decision makers. It also provides a principled way to train neural network-based policies using trajectory data, while preserving monotonic cost improvement.

Momentum-constrained Hybrid Heuristic Trajectory Optimization Framework with Residual-enhanced DRL for Visually Impaired Scenarios

Authors:Yuting Zeng, Zhiwen Zheng, Jingya Wang, You Zhou, JiaLing Xiao, Yongbin Yu, Manping Fan, Bo Gong, Liyong Ren

Date:2026-04-16 13:16:00

Safe and efficient assistive planning for visually impaired scenarios remains challenging, since existing methods struggle with multi-objective optimization, generalization, and interpretability. In response, this paper proposes a Momentum-Constrained Hybrid Heuristic Trajectory Optimization Framework (MHHTOF). To balance multiple objectives of comfort and safety, the framework designs a Heuristic Trajectory Sampling Cluster (HTSC) with a Momentum-Constrained Trajectory Optimization (MTO), which suppresses abrupt velocity and acceleration changes. In addition, a novel residual-enhanced deep reinforcement learning (DRL) module refines candidate trajectories, advancing temporal modeling and policy generalization. Finally, a dual-stage cost modeling mechanism (DCMM) is introduced to regulate optimization, where costs in the Frenet space ensure consistency, and reward-driven adaptive weights in the Cartesian space integrate user preferences for interpretability and user-centric decision-making. Experimental results show that the proposed framework converges in nearly half the iterations of baselines and achieves lower and more stable costs. In complex dynamic scenarios, MHHTOF further demonstrates stable velocity and acceleration curves with reduced risk, confirming its advantages in robustness, safety, and efficiency.

Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning

Authors:Jean-Bastien Grill, Michal Valko, Rémi Munos

Date:2026-04-16 13:07:35

You are a robot and you live in a Markov decision process (MDP) with a finite or an infinite number of transitions from state-action to next states. You got brains and so you plan before you act. Luckily, your roboparents equipped you with a generative model to do some Monte-Carlo planning. The world is waiting for you and you have no time to waste. You want your planning to be efficient. Sample-efficient. Indeed, you want to exploit the possible structure of the MDP by exploring only a subset of states reachable by following near-optimal policies. You want guarantees on sample complexity that depend on a measure of the quantity of near-optimal states. You want something, that is an extension of Monte-Carlo sampling (for estimating an expectation) to problems that alternate maximization (over actions) and expectation (over next states). But you do not want to StOP with exponential running time, you want something simple to implement and computationally efficient. You want it all and you want it now. You want TrailBlazer.

What if we have 90 minutes only to teach programming?

Authors:Attila Egri-Nagy

Date:2026-04-16 12:36:44

The advancement of automated coding tools may reduce in the future the number of people willing to learn computer programming. We assume that the skill of computational problem solving is not only for the immediate economic benefit, but an important part of our knowledge about the world. As the incentives to learn are weaker, we aim to lower the entry barrier. We ask: Can programming ideas be taught in a very short time? We describe a session plan that introduces programming and computing fundamentals for novices, assuming only basic mathematical background. It requires using a non-mainstream, functional and concatenative language reducing accidental complexity. This language, a by-product of research in category theory, allows direct access to fundamental ideas like recursion and advanced like Gödel-encoding in an entertaining puzzle-like manner.

Impact of deployment on energy efficiency of sub-THz transmission

Authors:Hardy Halbauer, Le Hang Nguyen, Thorsten Wild

Date:2026-04-16 11:25:14

Sub-THz bands are promising high bandwidth and data rates, and in the recent years the device technologies made large progress and provided a multitude of transceiver, power amplifier (PA) and phased array devices supporting the frequency bands above 100 GHz. The more painful aspect of sub-THz transmission is the increased power consumption, caused by the large data rates and the related data conversion and processing effort, and on the analog side the low achievable PA efficiency and the reduced achievable output power. When planning a deployment of sub-THz communication systems, the target coverage and throughput can be achieved with a variety of scenarios, which will be different with respect to locations and number of base stations and system architectures. Although leading to similar performance, they will differ significantly in the overall power consumption. With an accurate power consumption model, including also baseband (BB) processing functionality, and system level simulations for different hybrid beamforming and MIMO schemes the related variations in power consumption in relation to a given performance are evaluated. This paper shows the critical design aspects for energy efficient sub-THz deployments by highlighting the sub- THz specific trade-offs between different number of BS with different transmit powers but also changing number of BB units and RF chains.

An Intelligent Robotic and Bio-Digestor Framework for Smart Waste Management

Authors:Radhika Khatri, Adit Tewari, Nikhil Sharma, M. B. Srinivas

Date:2026-04-16 11:19:36

Rapid urbanization and continuous population growth have made municipal solid waste management increasingly challenging. These challenges highlight the need for smarter and automated waste management solutions. This paper presents the design and evaluation of an integrated waste management framework that combines two connected systems, a robotic waste segregation module and an optimized bio-digestor. The robotic waste segregation system uses a MyCobot 280 Jetson Nano robotic arm along with YOLOv8 object detection and robot operating system (ROS)-based path planning to identify and sort waste in real time. It classifies waste into four different categories with high precision, reducing the need for manual intervention. After segregation, the biodegradable waste is transferred to a bio-digestor system equipped with multiple sensors. These sensors continuously monitor key parameters, including temperature, pH, pressure, and motor revolutions per minute. The Particle Swarm Optimization (PSO) algorithm, combined with a regression model, is used to dynamically adjust system parameters. This intelligent optimization approach ensures stable operation and maximizes digestion efficiency under varying environmental conditions. System testing under dynamic conditions demonstrates a sorting accuracy of 98% along with highly efficient biological conversion. The proposed framework offers a scalable, intelligent, and practical solution for modern waste management, making it suitable for both residential and industrial applications.

TrigReason: Trigger-Based Collaboration between Small and Large Reasoning Models

Authors:Yi Zhao, Yajuan Peng, Cam-Tu Nguyen, Zuchao Li, Xiaoliang Wang, Xiaoming Fu, Hai Zhao

Date:2026-04-16 10:33:00

Large Reasoning Models (LRMs) achieve strong performance on complex tasks through extended chains of thought but suffer from high inference latency due to autoregressive reasoning. Recent work explores using Small Reasoning Models (SRMs) to accelerate LRM inference. In this paper, we systematically characterize the capability boundaries of SRMs and identify three common types of reasoning risks: (1) path divergence, where SRMs lack the strategic ability to construct an initial plan, causing reasoning to deviate from the most probable path; (2) cognitive overload, where SRMs fail to solve particularly difficult steps; and (3) recovery inability, where SRMs lack robust self-reflection and error correction mechanisms. To address these challenges, we propose TrigReason, a trigger-based collaborative reasoning framework that replaces continuous polling with selective intervention. TrigReason delegates most reasoning to the SRM and activates LRM intervention only when necessary-during initial strategic planning (strategic priming trigger), upon detecting extraordinary overconfidence (cognitive offload trigger), or when reasoning falls into unproductive loops (intervention request trigger). The evaluation results on AIME24, AIME25, and GPQA-D indicate that TrigReason matches the accuracy of full LRMs and SpecReason, while offloading 1.70x - 4.79x more reasoning steps to SRMs. Under edge-cloud conditions, TrigReason reduces latency by 43.9\% and API cost by 73.3\%. Our code is available at \href{https://github.com/QQQ-yi/TrigReason}{https://github.com/QQQ-yi/TrigReason}

Simplification Ad Absurdum? Revisiting Gas Flow Modeling for Integrated Energy System Planning

Authors:Thomas Klatzer, Yannick Werner, Sonja Wogrin

Date:2026-04-16 10:24:54

This paper analyzes the implications of simplified pipeline gas flow models for integrated energy system planning. A case study of an integrated power-hydrogen expansion planning problem shows that simplifying pressure-flow relationships and gas dynamics can lead to expansion plans that incur substantial regret when evaluated under a more realistic dynamic gas flow model -- due to suboptimal system expansion, operation, and non-supplied hydrogen. Numerical experiments show that planning under the highly simplified transport and transport-linepack models -- commonly used in expansion studies -- can result in regret exceeding several thousand percent and yield expansion plans that lack robustness across demand levels. Planning under steady-state conditions partially mitigates these effects, but still leaves significant cost-reduction potential untapped compared to dynamic planning due to neglected linepack flexibility. Developing efficient solution algorithms for the dynamic model is a promising direction for future research.

Differentiable Object Pose Connectivity Metrics for Regrasp Sequence Optimization

Authors:Liang Qin, Weiwei Wan, Kensuke Harada

Date:2026-04-16 07:47:57

Regrasp planning is often required when one pick-and-place cannot transfer an object from an initial pose to a goal pose while maintaining grasp feasibility. The main challenge is to reason about shared-grasp connectivity across intermediate poses, where discrete search becomes brittle. We propose an implicit multi-step regrasp planning framework based on differentiable pose sequence connectivity metrics. We model grasp feasibility under an object pose using an Energy-Based Model (EBM) and leverage energy additivity to construct a continuous energy landscape that measures pose-pair connectivity, enabling gradient-based optimization of intermediate object poses. An adaptive iterative deepening strategy is introduced to determine the minimum number of intermediate steps automatically. Experiments show that the proposed cost formulation provides smooth and informative gradients, improving planning robustness over other alternatives. They also demonstrate generalization to unseen grasp poses and cross-end-effector transfer, where a model trained with suction constraints can guide parallel gripper grasp manipulation. The multi-step planning results further highlight the effectiveness of adaptive deepening and minimum-step search.

World-Value-Action Model: Implicit Planning for Vision-Language-Action Systems

Authors:Runze Li, Hongyin Zhang, Junxi Jin, Qixin Zeng, Zifeng Zhuang, Yiqi Tang, Shangke Lyu, Donglin Wang

Date:2026-04-16 07:46:05

Vision-Language-Action (VLA) models have emerged as a promising paradigm for building embodied agents that ground perception and language into action. However, most existing approaches rely on direct action prediction, lacking the ability to reason over long-horizon trajectories and evaluate their consequences, which limits performance in complex decision-making tasks. In this work, we introduce World-Value-Action (WAV) model, a unified framework that enables implicit planning in VLA systems. Rather than performing explicit trajectory optimization, WAV model learn a structured latent representation of future trajectories conditioned on visual observations and language instructions. A learned world model predicts future states, while a trajectory value function evaluates their long-horizon utility. Action generation is then formulated as inference in this latent space, where the model progressively concentrates probability mass on high-value and dynamically feasible trajectories. We provide a theoretical perspective showing that planning directly in action space suffers from an exponential decay in the probability of feasible trajectories as the horizon increases. In contrast, latent-space inference reshapes the search distribution toward feasible regions, enabling efficient long-horizon decision making. Extensive simulations and real-world experiments demonstrate that the WAV model consistently outperforms state-of-the-art methods, achieving significant improvements in task success rate, generalization ability, and robustness, especially in long-horizon and compositional scenarios.

RELOAD: A Robust and Efficient Learned Query Optimizer for Database Systems

Authors:Seokwon Lee, Jaeyoung Sim, Sihyun Kim, Yuhsing Li, Yiwen Zhu, Kwanghyun Park

Date:2026-04-16 07:35:05

Recent advances in query optimization have shifted from traditional rule-based and cost-based techniques towards machine learning-driven approaches. Among these, reinforcement learning (RL) has attracted significant attention due to its ability to optimize long-term performance by learning policies over query planning. However, existing RL-based query optimizers often exhibit unstable performance at the level of individual queries, including severe performance regressions, and require prolonged training to reach the plan quality of expert, cost-based optimizers. These shortcomings make learned query optimizers difficult to deploy in practice and remain a major barrier to their adoption in production database systems. To address these challenges, we present RELOAD, a robust and efficient learned query optimizer for database systems. RELOAD focuses on (i) robustness, by minimizing query-level performance regressions and ensuring consistent optimization behavior across executions, and (ii) efficiency, by accelerating convergence to expert-level plan quality. Through extensive experiments on standard benchmarks, including Join Order Benchmark, TPC-DS, and Star Schema Benchmark, RELOAD demonstrates up to 2.4x higher robustness and 3.1x greater efficiency compared to state-of-the-art RL-based query optimization techniques.

DR$^{3}$-Eval: Towards Realistic and Reproducible Deep Research Evaluation

Authors:Qianqian Xie, Qingheng Xiong, He Zhu, Tiantian Xia, Xueming Han, Fanyu Meng, Jiakai Wang, Zhiqi Bai, Chengkang Jiang, Zhaohui Wang, Yubin Guo, Yuqing Wen, Jiayang Mao, Zijie Zhang, Shihao Li, Yanghai Wang, Yuxiang Ren, Junlan Feng, Jiaheng Liu

Date:2026-04-16 06:40:02

Deep Research Agents (DRAs) aim to solve complex, long-horizon research tasks involving planning, retrieval, multimodal understanding, and report generation, yet their evaluation remains challenging due to dynamic web environments and ambiguous task definitions. We propose DR$^{3}$-Eval, a realistic and reproducible benchmark for evaluating deep research agents on multimodal, multi-file report generation. DR$^{3}$-Eval is constructed from authentic user-provided materials and paired with a per-task static research sandbox corpus that simulates open-web complexity while remaining fully verifiable, containing supportive documents, distractors, and noise. Moreover, we introduce a multi-dimensional evaluation framework measuring Information Recall, Factual Accuracy, Citation Coverage, Instruction Following, and Depth Quality, and validate its alignment with human judgments. Experiments with our developed multi-agent system DR$^{3}$-Agent based on multiple state-of-the-art language models demonstrate that DR$^{3}$-Eval is highly challenging and reveals critical failure modes in retrieval robustness and hallucination control. Our code and data are publicly available.

Touching Space: Accessible Map Exploration Through Conversational Audio-Haptic Interaction

Authors:Li Liu, Jiaming Qu, Marc Jowell Bagaoisan, David T. Lee, Leilani H. Gilpin

Date:2026-04-16 05:26:09

Most existing assistive navigation tools focus on providing real-time guidance for Blind and Low-Vision (BLV) people, but few support building a holistic spatial understanding of unfamiliar environments before travel. Such cognitive map construction (e.g., knowing that a fountain is south of a tower and west of a hotel) is important for pre-travel planning, yet remains underexplored in prior work. To address this gap, we present Touching Space, an end-to-end system that retrieves map data for a target place and loads it into a frontend interface for exploration. The system combines haptic and audio feedback: users explore spatial layouts through touch and ask spoken questions to a conversational agent during exploration. Touching Space contributes a conversational interface that supports BLV users in building cognitive maps on commodity hardware.

Mind DeepResearch Technical Report

Authors:MindDR Team, Li Auto Inc

Date:2026-04-16 01:20:06

We present \textbf{Mind DeepResearch (MindDR)}, an efficient multi-agent deep research framework that achieves leading performance with only \textasciitilde30B-parameter models through a meticulously designed data synthesis and multi-stage training pipeline. The core innovation of MindDR lies in a collaborative three-agent architecture (Planning Agent, DeepSearch Agent, and Report Agent) and a four-stage agent-specialized training pipeline comprising SFT cold-start, Search-RL, Report-RL and preference alignment. With this regime, MindDR demonstrates competitive performance even with \textasciitilde30B-scale models. Specifically, MindDR achieves 45.7\% on BrowseComp-ZH, 42.8\% on BrowseComp, 46.5\% on WideSearch, 75.0\% on xbench-DS, and 52.5 on DeepResearch Bench, outperforming comparable-scale open-source agent systems and rivaling larger-scale models. MindDR has been deployed as an online product in Li Auto. Furthermore, we introduce \textbf{MindDR Bench}, a curated benchmark of 500 real-world Chinese queries from our internal product user interactions, evaluated through a comprehensive multi-dimensional rubric system rather than relying on a single RACE metric. On MindDR Bench, MindDR achieves a state-of-the-art score of 51.8.

CooperDrive: Enhancing Driving Decisions Through Cooperative Perception

Authors:Deyuan Qu, Qi Chen, Takayuki Shimizu, Onur Altintas

Date:2026-04-15 22:15:50

Autonomous vehicles equipped with robust onboard perception, localization, and planning still face limitations in occlusion and non-line-of-sight (NLOS) scenarios, where delayed reactions can increase collision risk. We propose CooperDrive, a cooperative perception framework that augments situational awareness and enables earlier, safer driving decisions. CooperDrive offers two key advantages: (i) each vehicle retains its native perception, localization, and planning stack, and (ii) a lightweight object-level sharing and fusion strategy bridges perception and planning. Specifically, CooperDrive reuses detector Bird's-Eye View (BEV) features to estimate accurate vehicle poses without additional heavy encoders, thereby reconstructing BEV representations and feeding the planner with low latency. On the planning side, CooperDrive leverages the expanded object set to anticipate potential conflicts earlier and adjust speed and trajectory proactively, thereby transforming reactive behaviors into predictive and safer driving decisions. Real-world closed-loop tests at occlusion-heavy NLOS intersections demonstrate that CooperDrive increases reaction lead time, minimum time-to-collision (TTC), and stopping margin, while requiring only 90 kbps bandwidth and maintaining an average end-to-end latency of 89 ms.

Integrated Investment and Policy Planning for Power Systems via Differentiable Scenario Generation

Authors:Robert Mieth

Date:2026-04-15 20:50:20

We formulate a method to co-optimize power system capacity planning decisions and policy investments that shape electricity load patterns. To this end, we leverage a gradient-based solution technique that enables the efficient solution of operation-aware planning models. To compute gradients with respect to the conditions that define daily electricity demand profiles, we introduce and formalize the concept of differentiable scenario generation and show that generative machine learning models satisfy the mathematical requirements needed to compute consistent gradients. We demonstrate the feasibility of the proposed approach through numerical experiments using a diffusion model-based scenario generator and a stylized generation and capacity expansion planning model.

Generating Concept Lexicalizations via Dictionary-Based Cross-Lingual Sense Projection

Authors:David Basil, Chirooth Girigowda, Bradley Hauer, Sahir Momin, Ning Shi, Grzegorz Kondrak

Date:2026-04-15 20:27:26

We study the task of automatically expanding WordNet-style lexical resources to new languages through sense generation. We generate senses by associating target-language lemmas with existing lexical concepts via semantic projection. Given a sense-tagged English corpus and its translation, our method projects English synsets onto aligned target-language tokens and assigns the corresponding lemmas to those synsets. To generate these alignments and ensure their quality, we augment a pre-trained base aligner with a bilingual dictionary, which is also used to filter out incorrect sense projections. We evaluate the method on multiple languages, comparing it to prior methods, as well as dictionary-based and large language model baselines. Results show that the proposed project-and-filter strategy improves precision while remaining interpretable and requiring few external resources. We plan to make our code, documentation, and generated sense inventories accessible.

AC-OPF Feasibility Analysis and Sensitivity-Guided Capacitor Placement in a High-PV Islanded Microgrid

Authors:Aaron Jones, Marija Ilic

Date:2026-04-15 19:40:41

This paper presents a comparative AC Optimal Power Flow study on a real world city scale islanded microgrid with high solar PV penetration, implemented within a Digital Twin framework. Four objective function cases economic dispatch, voltage stress exposure via PV power factor variation, then optimal load delivery, and capacitor enhanced economic dispatch as recovery options are evaluated over a 47 hour time series horizon on the same network under a shared loading scenario. Optimization sensitivities OSQ and OSV extracted from all cases are combined into a composite placement score used to rank candidate buses for shunt capacitor upgrades. A post processing planning optimization balances capacitor upgrade cost against avoided value-of-lost-load, enabling direct economic comparison of infrastructure investment versus reliability penalties. Results demonstrate that sensitivity guided capacitor placement restores full load service across the horizon and provides targeted reactive support at a quantifiable cost trade off against corrective load shedding.