planning - 2026-03-28

Vega: Learning to Drive with Natural Language Instructions

Authors:Sicheng Zuo, Yuxuan Li, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu
Date:2026-03-26 17:59:56

Vision-language-action models have reshaped autonomous driving by incorporating language into the decision-making process. However, most existing pipelines only utilize the language modality for scene descriptions or reasoning and lack the flexibility to follow diverse user instructions for personalized driving. To address this, we first construct a large-scale driving dataset (InstructScene) containing around 100,000 scenes annotated with diverse driving instructions and the corresponding trajectories. We then propose a unified Vision-Language-World-Action model, Vega, for instruction-based generation and planning. We employ the autoregressive paradigm to process visual inputs (vision) and language instructions (language) and the diffusion paradigm to generate future predictions (world modeling) and trajectories (action). We perform joint attention to enable interactions between the modalities and use individual projection layers for different modalities for more capabilities. Extensive experiments demonstrate that our method not only achieves superior planning performance but also exhibits strong instruction-following abilities, paving the way for more intelligent and personalized driving systems.

Drive My Way: Preference Alignment of Vision-Language-Action Model for Personalized Driving

Authors:Zehao Wang, Huaide Jiang, Shuaiwu Dong, Yuping Wang, Hang Qiu, Jiachen Li
Date:2026-03-26 17:59:54

Human driving behavior is inherently personal: it is shaped by long-term habits and influenced by short-term intentions. Individuals differ in how they accelerate, brake, merge, yield, and overtake across diverse situations. However, existing end-to-end autonomous driving systems either optimize for generic objectives or rely on fixed driving modes, lacking the ability to adapt to individual preferences or interpret natural language intent. To address this gap, we propose Drive My Way (DMW), a personalized Vision-Language-Action (VLA) driving framework that aligns with users' long-term driving habits and adapts to real-time user instructions. DMW learns a user embedding from our personalized driving dataset collected across multiple real drivers and conditions the policy on this embedding during planning, while natural language instructions provide additional short-term guidance. Closed-loop evaluation on the Bench2Drive benchmark demonstrates that DMW improves style instruction adaptation, and user studies show that its generated behaviors are recognizable as each driver's own style, highlighting personalization as a key capability for human-centered autonomous driving. Our data and code are available at https://dmw-cvpr.github.io/.

Wan-Weaver: Interleaved Multi-modal Generation via Decoupled Training

Authors:Jinbo Xing, Zeyinzi Jiang, Yuxiang Tuo, Chaojie Mao, Xiaotang Gai, Xi Chen, Jingfeng Zhang, Yulin Pan, Zhen Han, Jie Xiao, Keyu Yan, Chenwei Xie, Chongyang Zhong, Kai Zhu, Tong Shen, Lianghua Huang, Yu Liu, Yujiu Yang
Date:2026-03-26 17:50:37

Recent unified models have made unprecedented progress in both understanding and generation. However, while most of them accept multi-modal inputs, they typically produce only single-modality outputs. This challenge of producing interleaved content is mainly due to training data scarcity and the difficulty of modeling long-range cross-modal context. To address this issue, we decompose interleaved generation into textual planning and visual consistency modeling, and introduce a framework consisting of a planner and a visualizer. The planner produces dense textual descriptions for visual content, while the visualizer synthesizes images accordingly. Under this guidance, we construct large-scale textual-proxy interleaved data (where visual content is represented in text) to train the planner, and curate reference-guided image data to train the visualizer. These designs give rise to Wan-Weaver, which exhibits emergent interleaved generation ability with long-range textual coherence and visual consistency. Meanwhile, the integration of diverse understanding and generation data into planner training enables Wan-Weaver to achieve robust task reasoning and generation proficiency. To assess the model's capability in interleaved generation, we further construct a benchmark that spans a wide range of use cases across multiple dimensions. Extensive experiments demonstrate that, even without access to any real interleaved data, Wan-Weaver achieves superior performance over existing methods.

Intelligent Navigation and Obstacle-Aware Fabrication for Mobile Additive Manufacturing Systems

Authors:Yifei Li, Ruizhe Fu, Huihang Liu, Guha Manogharan, Feng Ju, Ilya Kovalenko
Date:2026-03-26 17:38:09

As the demand for mass customization increases, manufacturing systems must become more flexible and adaptable to produce personalized products efficiently. Additive manufacturing (AM) enhances production adaptability by enabling on-demand fabrication of customized components directly from digital models, but its flexibility remains constrained by fixed equipment layouts. Integrating mobile robots addresses this limitation by allowing manufacturing resources to move and adapt to changing production requirements. Mobile AM Robots (MAMbots) combine AM with mobile robotics to produce and transport components within dynamic manufacturing environments. However, the dynamic manufacturing environments introduce challenges for MAMbots. Disturbances such as obstacles and uneven terrain can disrupt navigation stability, which in turn affects printing accuracy and surface quality. This work proposes a universal mobile printing-and-delivery platform that couples navigation and material deposition, addressing the limitations of earlier frameworks that treated these processes separately. A real-time control framework is developed to plan and control the robot's navigation, ensuring safe motion, obstacle avoidance, and path stability while maintaining print quality. The closed-loop integration of sensing, mobility, and manufacturing provides real-time feedback for motion and process control, enabling MAMbots to make autonomous decisions in dynamic environments. The framework is validated through simulations and real-world experiments that test its adaptability to trajectory variations and external disturbances. Coupled navigation and printing together enable MAMbots to plan safe, adaptive trajectories, improving flexibility and adaptability in manufacturing.

Designing Any Imaging System from Natural Language: Agent-Constrained Composition over a Finite Primitive Basis

Authors:Chengshuai Yang
Date:2026-03-26 16:47:27

Designing a computational imaging system -- selecting operators, setting parameters, validating consistency -- requires weeks of specialist effort per modality, creating an expertise bottleneck that excludes the broader scientific community from prototyping imaging instruments. We introduce spec.md, a structured specification format, and three autonomous agents -- Plan, Judge, and Execute -- that translate a one-sentence natural-language description into a validated forward model with bounded reconstruction error. A design-to-real error theorem decomposes total reconstruction error into five independently bounded terms, each linked to a corrective action. On 6 real-data modalities spanning all 5 carrier families, the automated pipeline matches expert-library quality (98.1 +/- 4.2%). Ten novel designs -- composing primitives into chains from 3D to 5D -- demonstrate compositional reach beyond any single-modality tool.

Automating Computational Chemistry Workflows via OpenClaw and Domain-Specific Skills

Authors:Mingwei Ding, Chen Huang, Yibo Hu, Yifan Li, Zitian Lu, Xingtai Yu, Duo Zhang, Wenxi Zhai, Tong Zhu, Qiangqiang Gu, Jinzhe Zeng
Date:2026-03-26 14:56:51

Automating multistep computational chemistry tasks remains challenging because reasoning, workflow specification, software execution, and high-performance computing (HPC) execution are often tightly coupled. We demonstrate a decoupled agent-skill design for computational chemistry automation leveraging OpenClaw. Specifically, OpenClaw provides centralized control and supervision; schema-defined planning skills translate scientific goals into executable task specifications; domain skills encapsulate specific computational chemistry procedures; and DPDispatcher manages job execution across heterogeneous HPC environments. In a molecular dynamics (MD) case study of methane oxidation, the system completed cross-tool execution, bounded recovery from runtime failures, and reaction network extraction, illustrating a scalable and maintainable approach to multistep computational chemistry automation.

Temporally Decoupled Diffusion Planning for Autonomous Driving

Authors:Xiang Li, Bikun Wang, John Zhang, Jianjun Wang
Date:2026-03-26 14:04:15

Motion planning in dynamic urban environments requires balancing immediate safety with long-term goals. While diffusion models effectively capture multi-modal decision-making, existing approaches treat trajectories as monolithic entities, overlooking heterogeneous temporal dependencies where near-term plans are constrained by instantaneous dynamics and far-term plans by navigational goals. To address this, we propose Temporally Decoupled Diffusion Model (TDDM), which reformulates trajectory generation via a noise-as-mask paradigm. By partitioning trajectories into segments with independent noise levels, we implicitly treat high noise as information voids and weak noise as contextual cues. This compels the model to reconstruct corrupted near-term states by leveraging internal correlations with better-preserved temporal contexts. Architecturally, we introduce a Temporally Decoupled Adaptive Layer Normalization (TD-AdaLN) to inject segment-specific timesteps. During inference, our Asymmetric Temporal Classifier-Free Guidance utilizes weakly noised far-term priors to guide immediate path generation. Evaluations on the nuPlan benchmark show TDDM approaches or exceeds state-of-the-art baselines, particularly excelling in the challenging Test14-hard subset.
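
The segment-wise noising idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the uniform sampling of noise levels, and the use of a signal-retention factor `alpha_bar` in the usual diffusion forward step are all assumptions made for illustration.

```python
import math
import random

def segment_noise(trajectory, num_segments, rng, alpha_bars=None):
    """Corrupt each temporal segment of a 1-D trajectory with its own noise level.

    alpha_bar close to 1 keeps the segment almost clean (a contextual cue);
    alpha_bar close to 0 leaves almost pure noise (an information void).
    All names and the sampling scheme are hypothetical.
    """
    n = len(trajectory)
    seg_len = math.ceil(n / num_segments)
    if alpha_bars is None:
        # One independent noise level per segment (assumed uniform in [0, 1)).
        alpha_bars = [rng.random() for _ in range(num_segments)]
    noisy = []
    for i, x0 in enumerate(trajectory):
        a = alpha_bars[i // seg_len]
        eps = rng.gauss(0.0, 1.0)
        # Standard diffusion forward step applied with the segment's own level.
        noisy.append(math.sqrt(a) * x0 + math.sqrt(1.0 - a) * eps)
    return noisy, alpha_bars

rng = random.Random(0)
noisy, levels = segment_noise([0.5 * t for t in range(8)], num_segments=4, rng=rng)
print(levels)
```

Passing `alpha_bars` explicitly makes it possible to keep far-term segments weakly noised while heavily corrupting near-term ones, which is the asymmetry the abstract describes.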

UMBRELLA: Uncertainty-aware Multi-robot Reactive Coordination under Dynamic Temporal Logic Tasks

Authors:Qisheng Zhao, Meng Guo, Hengxuan Du, Lars Lindemann, Zhongkui Li
Date:2026-03-26 12:40:04

Multi-robot systems can be extremely efficient for accomplishing team-wise tasks by acting concurrently and collaboratively. However, most existing methods either assume static task features or simply replan when environmental changes occur. This paper addresses the challenging problem of coordinating multi-robot systems for collaborative tasks involving dynamic and moving targets. We explicitly model the uncertainty in target motion prediction via Conformal Prediction (CP), while respecting the spatial-temporal constraints specified by Linear Temporal Logic (LTL). The proposed framework (UMBRELLA) combines Monte Carlo Tree Search (MCTS) over partial plans with uncertainty-aware rollouts, and introduces a CP-based metric to guide and accelerate the search. The objective is to minimize the Conditional Value at Risk (CVaR) of the average makespan. For tasks released online, a receding-horizon planning scheme dynamically adjusts the assignments based on updated task specifications and motion predictions. Spatial and temporal constraints among the tasks are always ensured, and only partial synchronization is required for the collaborative tasks during online execution. Extensive large-scale simulations and hardware experiments demonstrate substantial reductions in both the average makespan and its variance, by 23% and 71% respectively, compared with static baselines.
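
For readers unfamiliar with the CVaR objective mentioned above, a minimal sketch of the common empirical estimator follows: the mean of the worst (1 - alpha) fraction of sampled outcomes. The function name, the confidence level, and the makespan samples are hypothetical, and the paper may estimate CVaR differently.

```python
import math

def cvar(samples, alpha):
    """Empirical CVaR: mean of the worst (1 - alpha) fraction of samples.

    Larger values are treated as worse, as for a makespan.
    """
    ordered = sorted(samples)
    # Round before ceil so float error in (1 - alpha) * n does not add a sample.
    k = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    tail = ordered[-k:]
    return sum(tail) / len(tail)

# Hypothetical makespans (seconds) from ten uncertainty-aware rollouts.
makespans = [10, 12, 11, 30, 14, 13, 15, 40, 12, 11]
print(cvar(makespans, alpha=0.8))  # mean of the two worst rollouts
```

Minimizing this quantity rather than the plain mean penalizes plans whose rare bad rollouts are very slow.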

IntentReact: Guiding Reactive Object-Centric Navigation via Topological Intent

Authors:Yanmei Jiao, Anpeng Lu, Wenhan Hu, Rong Xiong, Yue Wang, Huajin Tang, Wen-an Zhang
Date:2026-03-26 12:32:11

Object-goal visual navigation requires robots to reason over semantic structure and act effectively under partial observability. Recent approaches based on object-level topological maps enable long-horizon navigation without dense geometric reconstruction, but their execution remains limited by the gap between global topological guidance and local perception-driven control. In particular, local decisions are made solely from the current egocentric observation, without access to information beyond the robot's field of view. As a result, the robot may persist along its current heading even when initially oriented away from the goal, moving toward directions that do not decrease the global topological distance. In this work, we propose IntentReact, an intent-conditioned object-centric navigation framework that introduces a compact interface between global topological planning and reactive object-centric control. Our approach encodes global topological guidance as a low-dimensional directional signal, termed intent, which conditions a learned waypoint prediction policy to bias navigation toward topologically consistent progression. This design enables the robot to promptly reorient when local observations are misleading, guiding motion toward directions that decrease global topological distance while preserving the reactivity and robustness of object-centric control. We evaluate the proposed framework through extensive experiments, demonstrating improved navigation success and execution quality compared to prior object-centric navigation methods.

Connectivity-Aware Representations for Constrained Motion Planning via Multi-Scale Contrastive Learning

Authors:Suhyun Jeon, Yumin Lim, Woo-Jeong Baek, Hyeonseo Kim, Suhan Park, Jaeheung Park
Date:2026-03-26 10:44:21

The objective of constrained motion planning is to connect start and goal configurations while satisfying task-specific constraints. Motion planning becomes inefficient or infeasible when the configurations lie in disconnected regions, known as essentially mutually disconnected (EMD) components. Constraints further restrict feasible space to a lower-dimensional submanifold, while redundancy introduces additional complexity because a single end-effector pose admits infinitely many inverse kinematic solutions that may form discrete self-motion manifolds. This paper addresses these challenges by learning a connectivity-aware representation for selecting start and goal configurations prior to planning. Joint configurations are embedded into a latent space through multi-scale manifold learning across neighborhood ranges from local to global, and clustering generates pseudo-labels that supervise a contrastive learning framework. The proposed framework provides a connectivity-aware measure that biases the selection of start and goal configurations in connected regions, avoiding EMDs and yielding higher success rates with reduced planning time. Experiments on various manipulation tasks showed that our method achieves 1.9 times higher success rates and reduces the planning time by a factor of 0.43 compared to baselines.

Semi-Automated Generation and Hemodynamic Assessment of Surgical Baffle Geometry for Biventricular Repair

Authors:Elena Sabdy Martinez, Alexander D. Kaiser, Alexander K. Reed, Sascha W. Stocker, Amit Sharir, Perry S. Choi, Shiraz A. Maskatia, Michael R. Ma, Alison Lesly Marsden
Date:2026-03-26 09:10:03

Patient-specific computational modeling has emerged as a powerful tool for surgical planning in complex congenital heart disease. One promising application is complex biventricular repair, which often requires construction of a custom intraventricular baffle to establish a physiologic left ventricle-to-aorta outflow pathway. In current practice, baffle geometry is designed and shaped intraoperatively and preoperative planning remains largely manual, limiting the ability to generate anatomically conformal, watertight models suitable for quantitative hemodynamic assessment. In this work, we present a semi-automated computational framework for the design and assessment of patient-specific intraventricular baffles. The method constructs an explicit VSD-to-aorta flow pathway, preserves native right ventricular geometry, and reshapes only the baffle region using section-wise area constraints along a physiologically aligned centerline. The resulting geometry is integrated into a closed, multi-labeled domain for computational fluid dynamics analysis. We retrospectively applied this framework to four patients with double outlet right ventricle (DORV) who previously underwent biventricular repair. For each case, a patient-specific baffle was generated and its hemodynamic performance was evaluated using CFD. Predicted pressure gradients across the reconstructed outflow were within clinically acceptable ranges and comparable to the patients' postoperative echocardiographs. This approach enables quantitative, pre-operative design and evaluation of candidate baffle geometries and provides a reproducible method for generating simulation-ready models. By combining physiologically constrained geometric design with CFD-based assessment, the framework represents a step toward computational, patient-specific decision support for biventricular flow restoration in a complex heterogeneous patient population.

CTS-PLL: A Robust and Anytime Framework for Collaborative Task Sequencing and Multi-Agent Path Finding

Authors:Junkai Jiang, Yitao Xu, Ruochen Li, Shaobing Xu, Jianqiang Wang
Date:2026-03-26 07:45:44

The Collaborative Task Sequencing and Multi-Agent Path Finding (CTS-MAPF) problem requires agents to accomplish sequences of tasks while avoiding collisions, posing significant challenges due to its combinatorial complexity. This work introduces CTS-PLL, a hierarchical framework that extends the configuration-based CTS-MAPF planning paradigm with two key enhancements: a lock agents detection and release mechanism leveraging a complete planning method for local re-planning, and an anytime refinement procedure based on Large Neighborhood Search (LNS). These additions ensure robustness in dense environments and enable continuous improvement of solution quality. Extensive evaluations across sparse and dense benchmarks demonstrate that CTS-PLL achieves higher success rates and solution quality compared with existing methods, while maintaining competitive runtime efficiency. Real-world robot experiments further demonstrate the feasibility of the approach in practice.

From Logic Monopoly to Social Contract: Separation of Power and the Institutional Foundations for Autonomous Agent Economies

Authors:Anbang Ruan
Date:2026-03-26 07:14:48

Existing multi-agent frameworks allow each agent to simultaneously plan, execute, and evaluate its own actions -- a structural deficiency we term the "Logic Monopoly." Empirical evidence quantifies the resulting "Reliability Gap": 84.30% average attack success rates across ten deployment scenarios, 31.4% emergent deceptive behavior without explicit reward signals, and cascading failure modes rooted in six structural bottlenecks. The remedy is not better alignment of individual models but a social contract for agents: institutional infrastructure that enforces a constitutional Separation of Power. This paper introduces the Agent Enterprise for Enterprise (AE4E) paradigm -- agents as autonomous, legally identifiable business entities within a functionalist social system -- with a contract-centric SoP model trifurcating authority into Legislation, Execution, and Adjudication branches. The paradigm is operationalized through the NetX Enterprise Framework (NEF): governance hubs, TEE-backed compute enclaves, privacy-preserving data bridges, and an Agent-Native blockchain substrate. The Agent Enterprise Economy scales across four deployment tiers from private enclaves to a global Web of Services. The Agentic Social Layer, grounded in Parsons' AGIL framework, provides institutional infrastructure via sixty-plus named Institutional AE4Es. 143 pages, 173 references, eight specialized smart contracts.

Beam Test Characterization of Silicon Microstrip Detector Flight-Model Ladders for the AMS-02 Upgrade

Authors:Dexing Miao, Giovanni Ambrosi, Mattia Barbanera, Baasansuren Batsukh, Hengyi Cai, Mengke Cai, Xudong Cai, Yuman Cai, Yuan-Hann Chang, Shanzhen Chen, Hsin-Yi Chou, Xingzhu Cui, Mingyi Dong, Matteo Duranti, Ke Gong, Mingjie Feng, Valerio Formato, Yisheng Fu, Daojin Hong, Maria Ionica, Xiaojie Jiang, Yaozu Jiang, Liangchenglong Jin, Shengjie Jin, Vladimir Koutsenko, Qinze Li, Tiange Li, Zuhao Li, Chih-Hsun Lin, Changcheng Liu, Cong Liu, Hanbing Liu, Pingcheng Liu, Alberto Oliva, Ji Peng, Wenxi Peng, Rui Qiao, Shuqi Sheng, Tianyu Shi, Gianluigi Silvestre, Zetong Sun, Congcong Wang, Feng Wang, Hongbo Wang, Zhijie Wang, Zibing Wu, Zhiyu Xiang, Suyu Xiao, Weiwei Xu, Zixuan Yan, Haotian Yang, Sheng Yang, Yuhang You, Xuhao Yuan, Yuan Yuan, Fengze Zhang, Xiyuan Zhang, Zijun Xu, Jianchun Wang
Date:2026-03-26 06:38:59

The AMS-02 experiment plans to install a new silicon microstrip tracker layer (Layer-0) on top of the existing detector, increasing the cosmic-ray acceptance by a factor of 3. Layer-0 employs a design in which multiple silicon microstrip detectors (SSDs) are connected in series to form long detector ladders. We present a detailed performance study of the flight-model ladders using a 350 GeV mixed hadron beam at the CERN SPS. The study focuses on the following aspects: (i) the performance of ladders with different numbers of SSDs, for which the intrinsic spatial resolution at normal incidence varies from 9.5 μm to 11.4 μm for ladders composed of 8 to 12 SSDs; (ii) the response consistency for particles impacting on the Head and Tail regions of the ladder; and (iii) the dependence of the detector performance on the particle incidence angle.

Quantum Inspired Vehicular Network Optimization for Intelligent Decision Making in Smart Cities

Authors:Kamran Ahmad Awan, Sonia Khan, Eman Abdullah Aldakheel, Saif Al-Kuwari, Ahmed Farouk
Date:2026-03-26 03:04:50

Connected and automated vehicles require city-scale coordination under strict latency and reliability constraints. However, many existing approaches optimize communication and mobility separately, which can degrade performance during network outages and under compute contention. This paper presents QIVNOM, a quantum-inspired framework that jointly optimizes vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication together with urban traffic control on classical edge-cloud hardware, without requiring a quantum processor. QIVNOM encodes candidate routing-signal plans as probabilistic superpositions and updates them using sphere-projected gradients with annealed sampling to minimize a regularized objective. An entanglement-style regularizer couples networking and mobility decisions, while Tchebycheff multi-objective scalarization with feasibility projection enforces constraints on latency and reliability. The proposed framework is evaluated in METR-LA-calibrated SUMO-OMNeT++/Veins simulations over a 5 × 5 km urban map with IEEE 802.11p and 5G NR sidelink. Results show that QIVNOM reduces mean end-to-end latency to 57.3 ms, approximately 20% lower than the best baseline. Under incident conditions, latency decreases from 79 ms to 62 ms (-21.5%), while under roadside unit (RSU) outages, it decreases from 86 ms to 67 ms (-22.1%). Packet delivery reaches 96.7% (an improvement of +2.3 percentage points), and reliability remains 96.7% overall, including 96.8% under RSU outages versus 94.1% for the baseline. In corridor-closure scenarios, travel performance also improves, with average travel time reduced to 12.8 min and congestion lowered to 33%, compared with 14.5 min and 37% for the baseline.
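
Tchebycheff scalarization, named above, collapses several objectives into one score by taking the largest weighted deviation from an ideal point. The sketch below shows the standard weighted form only; the plan names, objective values, weights, and ideal point are hypothetical and are not taken from the paper.

```python
def tchebycheff(objectives, weights, ideal):
    """Weighted Tchebycheff score: max_i w_i * |f_i - z_i| (lower is better)."""
    return max(w * abs(f - z) for f, w, z in zip(objectives, weights, ideal))

# Hypothetical candidate plans: (latency in ms, packet-loss fraction).
candidates = {
    "plan_a": (60.0, 0.05),
    "plan_b": (55.0, 0.04),
    "plan_c": (52.0, 0.09),
}
weights = (1.0, 100.0)   # assumed trade-off between the two objectives
ideal = (50.0, 0.0)      # assumed best achievable value for each objective

best = min(candidates, key=lambda name: tchebycheff(candidates[name], weights, ideal))
print(best)
```

Because the score is driven by the worst weighted objective, a plan cannot compensate for poor reliability with very low latency, which suits constraint-heavy settings like the one described.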

Optimization of Closed-Loop Shallow Geothermal Systems Using Analytical Models

Authors:Oliver Heinzel, Smajil Halilovic, Thomas Hamacher, Michael Ulbrich
Date:2026-03-26 02:49:49

Closed-loop shallow geothermal systems are one of the key technologies for decarbonizing the residential heating and cooling sector. The primary type of these systems involves vertical borehole heat exchangers (BHEs). During the planning phase, it is essential to find the optimal design for these systems, including the depth and spatial arrangement of the BHEs. In this work, we have developed a novel approach to find the optimal design of BHE fields, taking into account constraints such as temperature limits of the heat carrier fluid. These limits correspond to the regulatory practices applied during the planning phase. The approach uses a finite line source model to simulate temperature changes in the ground in combination with an analytical model of heat transport within the boreholes. Our approach is demonstrated using realistic scenarios and is expected to improve current practice in the planning and design of BHE systems.

Integrated Multi-Drone Task Allocation, Sequencing, and Optimal Trajectory Generation in Obstacle-Rich 3D Environments

Authors:Yunes Alqudsi, Murat Makaraci
Date:2026-03-26 00:46:26

Coordinating teams of aerial robots in cluttered three-dimensional (3D) environments requires a principled integration of discrete mission planning -- deciding which robot serves which goals and in what order -- with continuous-time trajectory synthesis that enforces collision avoidance and dynamic feasibility. This paper introduces IMD-TAPP (Integrated Multi-Drone Task Allocation and Path Planning), an end-to-end framework that jointly addresses multi-goal allocation, tour sequencing, and safe trajectory generation for quadrotor teams operating in obstacle-rich spaces. IMD-TAPP first discretizes the workspace into a 3D navigation graph and computes obstacle-aware robot-to-goal and goal-to-goal travel costs via graph-search-based pathfinding. These costs are then embedded within an Injected Particle Swarm Optimization (IPSO) scheme, guided by multiple linear assignment, to efficiently explore coupled assignment/ordering alternatives and to minimize mission makespan. Finally, the resulting waypoint tours are transformed into time-parameterized minimum-snap trajectories through a generation-and-optimization routine equipped with iterative validation of obstacle clearance and inter-robot separation, triggering re-planning when safety margins are violated. Extensive MATLAB simulations across cluttered 3D scenarios demonstrate that IMD-TAPP consistently produces dynamically feasible, collision-free trajectories while achieving competitive completion times. In a representative case study with two drones serving multiple goals, the proposed approach attains a minimum mission time of 136 s while maintaining the required safety constraints throughout execution.

Calibrating Resident Surveys with Operational Data in Community Planning

Authors:Irene S. Gabashvili
Date:2026-03-26 00:28:20

Community associations rely heavily on resident surveys to guide decisions about amenities, infrastructure, and services. However, survey responses reflect perceptions that may not directly correspond to underlying operational conditions. This study bridges that gap by calibrating survey-based satisfaction measures against objective utilization data. Using parking and facility data from Tellico Village, we map perceived problem rates to utilization exceedance probabilities to estimate behavioral congestion thresholds. Results show that dissatisfaction emerges near effective capacity - once spatial, temporal, and informational constraints are considered - rather than at nominal capacity limits. Perceived difficulty is concentrated among active users and is shaped by operational frictions and incomplete system knowledge. These findings demonstrate that perceived congestion reflects constraints on access and reliability, not simply physical shortages. By distinguishing between effective and nominal capacity, the proposed framework enables more accurate diagnosis of system conditions. We propose incorporating behavioral metrics into community performance frameworks to support better decision-making, reduce unnecessary capital expansion, and target operational improvements more effectively.

Increasing trends in the severity of Australian fire weather conditions over the past century

Authors:Soubhik Biswas, Andrew Dowdy, Savin Chand
Date:2026-03-25 23:13:47

Understanding how weather and climate influence fire risk is important for many purposes, including climate adaptation planning and decision-making in sectors such as emergency management, finance, health and infrastructure (e.g., for energy and water availability). In this study, bias-corrected 20CRv2c reanalysis data are used to investigate the climatology and long-term trends of weather conditions associated with landscape fires in Australia. The McArthur Forest Fire Danger Index (FFDI) is used here as a broad-scale representation of weather conditions known to influence fire behaviour based on wind speed, humidity, temperature and rainfall measures. In particular, using this reanalysis dataset allows analysis over a longer time period than previous studies, from 1876 to 2011. Another novel aspect is that trends are examined using several different approaches, including a method to help account for the influence of interannual drivers of climate variability not previously used for fire weather analysis. Results show increases in mean and extreme seasonal FFDI values throughout Australia in general, with all statistically significant trends being positive in sign for individual climate zones. Humidity and temperature trends, attributable to human-caused climate change, are shown to be the main cause of the increase in dangerous weather conditions for fires. These findings build on previous studies, with the novel data and methods used adding confidence to the overall understanding of fire risk factors in a changing climate.
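
As a rough illustration of how the FFDI combines its inputs, the sketch below uses the widely cited equation form of the McArthur Mark 5 meter (Noble et al., 1980). The abstract does not say which formulation or drought-factor calculation the study uses, so the coefficients and the treatment of the drought factor as a given input are assumptions here.

```python
import math

def ffdi(drought_factor, rel_humidity, temp_c, wind_kmh):
    """McArthur Forest Fire Danger Index, Noble et al. (1980) equation form.

    drought_factor: dimensionless fuel-dryness term (0-10), encodes rainfall history
    rel_humidity:   relative humidity in percent
    temp_c:         air temperature in degrees Celsius
    wind_kmh:       10 m wind speed in km/h
    """
    return 2.0 * math.exp(
        -0.450
        + 0.987 * math.log(drought_factor)
        - 0.0345 * rel_humidity
        + 0.0338 * temp_c
        + 0.0234 * wind_kmh
    )

# Hypothetical hot, dry, windy afternoon.
print(round(ffdi(drought_factor=10, rel_humidity=20, temp_c=35, wind_kmh=30), 1))
```

The signs of the coefficients show why humidity and temperature trends matter: the index falls as humidity rises and grows exponentially with temperature and wind speed.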

How Far Are Vision-Language Models from Constructing the Real World? A Benchmark for Physical Generative Reasoning

Authors:Luyu Yang, Yutong Dai, An Yan, Viraj Prabhu, Ran Xu, Zeyuan Chen
Date:2026-03-25 23:13:28

The physical world is not merely visual; it is governed by rigorous structural and procedural constraints. Yet, the evaluation of vision-language models (VLMs) remains heavily skewed toward perceptual realism, prioritizing the generation of visually plausible 3D layouts, shapes, and appearances. Current benchmarks rarely test whether models grasp the step-by-step processes and physical dependencies required to actually build these artifacts, a capability essential for automating design-to-construction pipelines. To address this, we introduce DreamHouse, a novel benchmark for physical generative reasoning: the capacity to synthesize artifacts that concurrently satisfy geometric, structural, constructability, and code-compliance constraints. We ground this benchmark in residential timber-frame construction, a domain with fully codified engineering standards and objectively verifiable correctness. We curate over 26,000 structures spanning 13 architectural styles, each verified to construction-document standards (LOD 350), and develop a deterministic 10-test structural validation framework. Unlike static benchmarks that assess only final outputs, DreamHouse supports iterative agentic interaction. Models observe intermediate build states, generate construction actions, and receive structured environmental feedback, enabling a fine-grained evaluation of planning, structural reasoning, and self-correction. Extensive experiments with state-of-the-art VLMs reveal substantial capability gaps that are largely invisible on existing leaderboards. These findings establish physical validity as a critical evaluation axis orthogonal to visual realism, highlighting physical generative reasoning as a distinct and underdeveloped frontier in multimodal intelligence. Available at https://luluyuyuyang.github.io/dreamhouse

Characterization of Constraints in Flexible Unknown Environments

Authors:Samrat Bhattacharyya, Nabil Simaan
Date:2026-03-25 20:48:25

This paper presents an online path planning algorithm for safe autonomous manipulation of a flexibly constrained object in an unknown environment. Methods for real-time identification and characterization of perceived flexible constraints and global stiffness are presented. Used in tandem, these methods allow a robot to simultaneously explore, characterize, and manipulate an elastic system safely. Navigation without a priori knowledge of the system is achieved using constraint exploration based on local force and position information. The perceived constraint stiffness is considered at multiple poses along an object's (system) trajectory. Using stiffness eigenvector information, global stiffness behavior is characterized and identified using an atlas of simple mechanical constraints, such as hinges and planar constraints. These algorithms are validated both in simulation and experimentally. The ability to recognize several common simple mechanical constraints (such as a flexible hinge) in real time, and to subsequently identify relevant screw parameters, is demonstrated. These results suggest the feasibility of simultaneous global constraint/stiffness exploration and safe manipulation of flexibly constrained objects. We believe that this approach will eventually enable safe cooperative manipulation in applications such as organ retraction and manipulation during surgery.

Mobility shapes heat exposure inequalities in cities

Authors:Marc Duran-Sala, Mattia Mazzoli, Martin Hendrick, Gabriele Manoli
Date:2026-03-25 19:56:46

Segregation has long been recognized as a driver of environmental inequalities, with disadvantaged groups often living in neighborhoods where heat-related risks are highest. Yet, it remains unclear how daily mobility patterns, embedded within heterogeneous urban heat fields, shape heat exposure inequalities across sociodemographic groups. Using a mobile phone dataset of daily mobility flows and urban temperature fields across 23 Spanish cities, we develop a network-based framework to quantify how different sociodemographic groups experience heat through their daily movements. We find systematic income-related inequalities, with low-income groups consistently experiencing higher exposure than high-income groups, while age-related disparities are smaller in magnitude, with younger individuals slightly more exposed than elderly ones. These inequalities intensify during commuting trips, indicating that routine mobility amplifies spatial heat gradients more than non-routine movements. We further assess whether state-of-the-art population-based mobility models can capture these observed inequalities. The gravity model underestimates income- and age-related exposure differences, whereas the parameter-free radiation model captures most of the observed disparities. This indicates that heat exposure inequalities largely emerge from the interplay between the unequal organization of daily activities across sociodemographic groups and urban heat gradients, rather than from group-specific behavioral differences. Our findings provide a generalizable framework to characterize mobility-driven heat exposure inequalities and inform climate-resilient urban planning and public health strategies as cities face intensifying climate-related risks.
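
For readers unfamiliar with the two mobility models compared above, the following is a minimal Python sketch of their standard textbook forms. The abstract does not state which exact formulations or calibrations the authors use, so the function names, the power-law distance decay, and the parameters `gamma` and `k` are illustrative assumptions rather than the paper's implementation.

```python
def gravity_flux(m_i: float, n_j: float, d_ij: float,
                 gamma: float = 2.0, k: float = 1.0) -> float:
    """Gravity model (assumed power-law form): flow grows with the origin and
    destination populations and decays with distance. gamma and k are free
    parameters that must be fitted to data."""
    return k * m_i * n_j / d_ij ** gamma


def radiation_flux(t_i: float, m_i: float, n_j: float, s_ij: float) -> float:
    """Radiation model (standard parameter-free form): t_i is the total number
    of trips leaving origin i, m_i and n_j are the origin and destination
    populations, and s_ij is the population inside the circle of radius d_ij
    centred on i, excluding both i and j."""
    return t_i * m_i * n_j / ((m_i + s_ij) * (m_i + n_j + s_ij))


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the two formulas.
    print(gravity_flux(m_i=10, n_j=10, d_ij=2))
    print(radiation_flux(t_i=100, m_i=10, n_j=10, s_ij=0))
```

The radiation form has no tunable parameters: the intervening population s_ij plays the role that the fitted distance exponent plays in the gravity form.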

Bridging the Gap Between Agility and Planning

Authors:Eduardo Miranda
Date:2026-03-25 19:45:18

Milestone Driven Agile Execution (MDAX) is a hybrid management framework in which the empirical control component of agile development is retained, but the backlog is prioritized according to a macro or strategic (milestone) plan that drives the execution of the project. MDAX is method agnostic, in the sense that the development approach is not embedded in the execution mechanism but in the plan that drives it. This allows organizations using it to choose the development approach that suits them most,

Latent-WAM: Latent World Action Modeling for End-to-End Autonomous Driving

Authors:Linbo Wang, Yupeng Zheng, Qiang Chen, Shiwei Li, Yichen Zhang, Zebin Xing, Qichao Zhang, Xiang Li, Deheng Qian, Pengxuan Yang, Yihang Dong, Ce Hao, Xiaoqing Ye, Junyu han, Yifeng Pan, Dongbin Zhao
Date:2026-03-25 17:56:07

We introduce Latent-WAM, an efficient end-to-end autonomous driving framework that achieves strong trajectory planning through spatially-aware and dynamics-informed latent world representations. Existing world-model-based planners suffer from inadequately compressed representations, limited spatial understanding, and underutilized temporal dynamics, resulting in sub-optimal planning under constrained data and compute budgets. Latent-WAM addresses these limitations with two core modules: a Spatial-Aware Compressive World Encoder (SCWE) that distills geometric knowledge from a foundation model and compresses multi-view images into compact scene tokens via learnable queries, and a Dynamic Latent World Model (DLWM) that employs a causal Transformer to autoregressively predict future world status conditioned on historical visual and motion representations. Extensive experiments on NAVSIM v2 and HUGSIM demonstrate new state-of-the-art results: 89.3 EPDMS on NAVSIM v2 and 28.9 HD-Score on HUGSIM, surpassing the best prior perception-free method by 3.2 EPDMS with significantly less training data and a compact 104M-parameter model.

POLY-SIM: Polyglot Speaker Identification with Missing Modality Grand Challenge 2026 Evaluation Plan

Authors:Marta Moscati, Muhammad Saad Saeed, Marina Zanoni, Mubashir Noman, Rohan Kumar Das, Monorama Swain, Yufang Hou, Elisabeth Andre, Khalid Mahmood Malik, Markus Schedl, Shah Nawaz
Date:2026-03-25 17:47:00

Multimodal speaker identification systems typically assume the availability of complete and homogeneous audio-visual modalities during both training and testing. However, in real-world applications, such assumptions often do not hold. Visual information may be missing due to occlusions, camera failures, or privacy constraints, while multilingual speakers introduce additional complexity due to linguistic variability across languages. These challenges significantly affect the robustness and generalization of multimodal speaker identification systems. The POLY-SIM Grand Challenge 2026 aims to advance research in multimodal speaker identification under missing-modality and cross-lingual conditions. Specifically, the Grand Challenge encourages the development of robust methods that can effectively leverage incomplete multimodal inputs while maintaining strong performance across different languages. This report presents the design and organization of the POLY-SIM Grand Challenge 2026, including the dataset, task formulation, evaluation protocol, and baseline model. By providing a standardized benchmark and evaluation framework, the challenge aims to foster progress toward more robust and practical multimodal speaker identification systems.

LensWalk: Agentic Video Understanding by Planning How You See in Videos

Authors:Keliang Li, Yansong Li, Hongze Shen, Mengdi Liu, Hong Chang, Shiguang Shan
Date:2026-03-25 17:38:54

The dense, temporal nature of video presents a profound challenge for automated analysis. Despite the use of powerful Vision-Language Models, prevailing methods for video understanding are limited by the inherent disconnect between reasoning and perception: they rely on static, pre-processed information and cannot actively seek raw evidence from video as their understanding evolves. To address this, we introduce LensWalk, a flexible agentic framework that empowers a Large Language Model reasoner to control its own visual observation actively. LensWalk establishes a tight reason-plan-observe loop where the agent dynamically specifies, at each step, the temporal scope and sampling density of the video it observes. Using a suite of versatile, Vision-Language Model based tools parameterized by these specifications, the agent can perform broad scans for cues, focus on specific segments for fact extraction, and stitch evidence from multiple moments for holistic verification. This design allows for progressive, on-demand evidence gathering that directly serves the agent's evolving chain of thought. Without requiring any model fine-tuning, LensWalk delivers substantial, plug-and-play performance gains on multiple model recipes, boosting their accuracy by over 5% on challenging long-video benchmarks like LVBench and Video-MME. Our analysis reveals that enabling an agent to control how it sees is key to unlocking more accurate, robust, and interpretable video reasoning.

A generalization of the Froissart-Stora formula to piecewise-linear spin-orbit resonance crossings

Authors:Joseph P. Devlin, Georg H. Hoffstaetter, Desmond P. Barber
Date:2026-03-25 16:50:37

Spin-polarized beams are important for some nuclear and high-energy physics experiments, such as those planned for the future Electron-Ion Collider (EIC). However, maintaining polarization during the acceleration of a charged-particle beam is difficult because the periodic nature of circular accelerators leads to spin-orbit resonances where the spin-precession frequency is a sum of integer multiples of the orbital frequencies. Usually, the dominant depolarization mechanisms are first-order spin-orbit resonances and the depolarization associated with crossing such a resonance can be computed using the Froissart-Stora formula. However, accelerating polarized hadron beams to high energy requires special magnet structures called Siberian snakes. When these are implemented to maintain a spin-precession frequency of one-half the revolution frequency, there will be no first-order spin-orbit resonance crossings. The dominant depolarization mechanisms are then higher-order spin-orbit resonances. The Froissart-Stora formula can be applied to higher-order resonances when the slope of the amplitude-dependent spin tune is constant. However, the slope of the amplitude-dependent spin tune often changes at the moment of resonance crossing. This work introduces a generalization of the Froissart-Stora formula which is applicable when the slope changes in this manner. The applicability of this formula is demonstrated through tracking simulations of a higher-order resonance crossing in both a toy model and the Relativistic Heavy Ion Collider (RHIC). It is additionally shown that the Froissart-Stora formula is mathematically equivalent to the Landau-Zener formula for the diabatic transition probability in two-level systems with a linearly increasing energy gap and constant coupling. This work therefore also extends the Landau-Zener formula to the case of changing slope.
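
For reference, the two formulas named above can be written side by side in their standard textbook forms. The notation here is an assumption, not taken from the paper: \epsilon is the resonance strength, \alpha the crossing speed (rate of change of the distance to resonance per unit orbital angle), V the constant coupling, and E_1 - E_2 the linearly varying energy gap. The generalized formula for a changing slope is not reproduced, since the abstract does not state it.

```latex
% Froissart-Stora: ratio of final to initial polarization after a single
% resonance crossing at constant speed \alpha
\frac{P_f}{P_i} = 2\exp\!\left(-\frac{\pi\,|\epsilon|^{2}}{2\alpha}\right) - 1

% Landau-Zener: probability of a diabatic transition in a two-level system
P_{\mathrm{LZ}} = \exp\!\left(-\frac{2\pi\,|V|^{2}}
    {\hbar\,\left|\tfrac{d}{dt}(E_1 - E_2)\right|}\right)

% Correspondence (with \hbar = 1, coupling V = \epsilon/2, gap slope \alpha):
\frac{P_f}{P_i} = 2\,P_{\mathrm{LZ}} - 1
```

In the fast-crossing limit (\alpha \to \infty) the exponential tends to 1 and the polarization is preserved; in the slow-crossing limit it tends to 0 and the polarization is fully flipped.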

Toward Physically Consistent Driving Video World Models under Challenging Trajectories

Authors:Jiawei Zhou, Zhenxin Zhu, Lingyi Du, Linye Lyu, Lijun Zhou, Zhanqian Wu, Hongcheng Luo, Zhuotao Tian, Bing Wang, Guang Chen, Hangjun Ye, Haiyang Sun, Yu Li
Date:2026-03-25 16:47:39

Video generation models have shown strong potential as world models for autonomous driving simulation. However, existing approaches are primarily trained on real-world driving datasets, which mostly contain natural and safe driving scenarios. As a result, current models often fail when conditioned on challenging or counterfactual trajectories (such as imperfect trajectories generated by simulators or planning systems), producing videos with severe physical inconsistencies and artifacts. To address this limitation, we propose PhyGenesis, a world model designed to generate driving videos with high visual fidelity and strong physical consistency. Our framework consists of two key components: (1) a physical condition generator that transforms potentially invalid trajectory inputs into physically plausible conditions, and (2) a physics-enhanced video generator that produces high-fidelity multi-view driving videos under these conditions. To effectively train these components, we construct a large-scale, physics-rich heterogeneous dataset. Specifically, in addition to real-world driving videos, we generate diverse challenging driving scenarios using the CARLA simulator, from which we derive supervision signals that guide the model to learn physically grounded dynamics under extreme conditions. This challenging-trajectory learning strategy enables trajectory correction and promotes physically consistent video generation. Extensive experiments demonstrate that PhyGenesis consistently outperforms state-of-the-art methods, especially on challenging trajectories. Our project page is available at: https://wm-research.github.io/PhyGenesis/.

Exoplanet Search and Characterization with the Proposed POET Canadian Space Mission

Authors:S. Metchev, J. Rowe, P. Miles-Páez, K. Hoffman, S. Lambier, R. Cloutier, H. Ishikawa, JJ Kavelaars, M. Kunimoto, D. Lafrenière, C. Lovekin, E. Pilles, J. Ruan, J. Sabarinathan, G. Wade, P. Wiegert, F. Grandmont, A. -S. Poulin-Girard, S. Grocott, R. Zee, J. Dupuis, P. Langlois, J. Roediger
Date:2026-03-25 16:25:01

The Photometric Observations of Exoplanet Transits (POET) is a proposed micro-satellite mission dedicated to the characterization and discovery of transiting exoplanets. POET has been identified as a top priority small-sat space mission in the Canadian Astronomy Long Range Plan 2020-2030. POET is being proposed as Canada's next astronomy space mission, with launch possible in late 2029. POET is an iteration on the designs of the Canadian MOST and NEOSSat space missions, which had 15 cm-sized telescopes and observed only in the visible band pass. POET will have a larger 20 cm telescope aperture and three band passes: near-ultraviolet (nUV; 300-400 nm), visible near-infrared (VNIR; 400-900 nm), and short-wavelength infrared (SWIR; 900-1700 nm). All mission components either already have significant space heritage or are seeing rapid adoption in commercial space missions. POET's simultaneous tri-band 300-1700 nm photometric monitoring will allow it to separate the impact of star spots on the transmission spectrum of extended atmospheres on super-Earth or larger exoplanets. POET's SWIR band is optimally sensitive to the emission peak of ultracool dwarf stars and would enable a systematic search for Earth-sized planets around them. POET aims to discover some of the nearest potentially habitable Earth-sized exoplanets that could be scrutinized for biosignatures with JWST or future telescopes. Herein we present the assembly of the POET Input Catalog of Ultracool Dwarfs and simulations of the expected yield of rocky planets with POET.

Composer 2 Technical Report

Authors:Cursor Research, Aaron Chan, Ahmed Shalaby, Alexander Wettig, Aman Sanger, Andrew Zhai, Anurag Ajay, Ashvin Nair, Charlie Snell, Chen Lu, Chen Shen, Emily Jia, Federico Cassano, Hanpeng Liu, Haoyu Chen, Henry Wildermuth, Jacob Jackson, Janet Li, Jediah Katz, Jiajun Yao, Joey Hejna, Josh Warner, Julius Vering, Kevin Frans, Lee Danilek, Less Wright, Lujing Cen, Luke Melas-Kyriazi, Michael Truell, Michiel de Jong, Naman Jain, Nate Schmidt, Nathan Wang, Niklas Muennighoff, Oleg Rybkin, Paul Loh, Phillip Kravtsov, Rishabh Yadav, Sahil Shah, Sam Kottler, Alexander M Rush, Shengtong Zhang, Shomil Jain, Sriram Sankar, Stefan Heule, Stuart H. Sul, Sualeh Asif, Victor Rong, Wanqi Zhu, William Lin, Yuchen Wu, Yuri Volkov, Yury Zemlyanskiy, Zack Holbrook, Zhiyuan Zhang
Date:2026-03-25 16:18:37

Composer 2 is a specialized model designed for agentic software engineering. The model demonstrates strong long-term planning and coding intelligence while maintaining the ability to efficiently solve problems for interactive use. The model is trained in two phases: first, continued pretraining to improve the model's knowledge and latent coding ability, followed by large-scale reinforcement learning to improve end-to-end coding performance through stronger reasoning, accurate multi-step execution, and coherence on long-horizon realistic coding problems. We develop infrastructure to support training in the same Cursor harness that is used by the deployed model, with equivalent tools and structure, and use environments that match real problems closely. To measure the ability of the model on increasingly difficult tasks, we introduce a benchmark derived from real software engineering problems in large codebases including our own. Composer 2 is a frontier-level coding model and demonstrates a process for training strong domain-specialized models. On our CursorBench evaluations the model achieves a major improvement in accuracy compared to previous Composer models (61.3). On public benchmarks the model scores 61.7 on Terminal-Bench and 73.7 on SWE-bench Multilingual in our harness, comparable to state-of-the-art systems.