planning - 2025-04-08

Vision-Language Model Predictive Control for Manipulation Planning and Trajectory Generation

Authors:Jiaming Chen, Wentao Zhao, Ziyu Meng, Donghui Mao, Ran Song, Wei Pan, Wei Zhang

Date:2025-04-07 16:13:09

Model Predictive Control (MPC) is a widely adopted control paradigm that leverages predictive models to estimate future system states and optimize control inputs accordingly. However, while MPC excels in planning and control, it lacks the capability for environmental perception, leading to failures in complex and unstructured scenarios. To address this limitation, we introduce Vision-Language Model Predictive Control (VLMPC), a robotic manipulation planning framework that integrates the perception power of vision-language models (VLMs) with MPC. VLMPC utilizes a conditional action sampling module that takes a goal image or language instruction as input and leverages VLM to generate candidate action sequences. These candidates are fed into a video prediction model that simulates future frames based on the actions. In addition, we propose an enhanced variant, Traj-VLMPC, which replaces video prediction with motion trajectory generation to reduce computational complexity while maintaining accuracy. Traj-VLMPC estimates motion dynamics conditioned on the candidate actions, offering a more efficient alternative for long-horizon tasks and real-time applications. Both VLMPC and Traj-VLMPC select the optimal action sequence using a VLM-based hierarchical cost function that captures both pixel-level and knowledge-level consistency between the current observation and the task input. We demonstrate that both approaches outperform existing state-of-the-art methods on public benchmarks and achieve excellent performance in various real-world robotic manipulation tasks. Code is available at https://github.com/PPjmchen/VLMPC.
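
The sample, predict, and score loop described in this abstract can be illustrated with a toy sampling-based MPC step. This is a minimal sketch under loud assumptions, not the VLMPC implementation: the 1-D point dynamics in `rollout`, the squared-distance `cost`, and the uniform action sampler are hypothetical stand-ins for the VLM-conditioned sampler, the video or trajectory predictor, and the VLM-based hierarchical cost.

```python
import random

def rollout(state, actions):
    """Hypothetical predictive model: a 1-D point moved by each action."""
    for a in actions:
        state = state + a
    return state

def cost(predicted_state, goal):
    """Hypothetical cost: squared distance between prediction and goal."""
    return (predicted_state - goal) ** 2

def mpc_step(state, goal, num_candidates=64, horizon=5, rng=None):
    """Sample candidate action sequences, score each predicted outcome,
    and return the first action of the cheapest sequence with its cost."""
    rng = rng or random.Random(0)
    best_seq, best_cost = None, float("inf")
    for _ in range(num_candidates):
        # Stand-in for conditional action sampling: uniform random actions.
        seq = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        c = cost(rollout(state, seq), goal)
        if c < best_cost:
            best_seq, best_cost = seq, c
    return best_seq[0], best_cost

if __name__ == "__main__":
    action, best = mpc_step(state=0.0, goal=3.0)
    print(f"first action {action:.3f}, predicted cost {best:.3f}")
```

In a receding-horizon loop, only the returned first action would be executed before the sampling and scoring are repeated from the new observation.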

CARE: Aligning Language Models for Regional Cultural Awareness

Authors:Geyang Guo, Tarek Naous, Hiromi Wakaki, Yukiko Nishimura, Yuki Mitsufuji, Alan Ritter, Wei Xu

Date:2025-04-07 14:57:06

Existing language models (LMs) often exhibit a Western-centric bias and struggle to represent diverse cultural knowledge. Previous attempts to address this rely on synthetic data and express cultural knowledge only in English. In this work, we study whether a small amount of human-written, multilingual cultural preference data can improve LMs across various model families and sizes. We first introduce CARE, a multilingual resource of 24.1k responses with human preferences on 2,580 questions about Chinese and Arab cultures, all carefully annotated by native speakers and offering more balanced coverage. Using CARE, we demonstrate that cultural alignment improves existing LMs beyond generic resources without compromising general capabilities. Moreover, we evaluate the cultural awareness of LMs, native speakers, and retrieved web content when queried in different languages. Our experiment reveals regional disparities among LMs, which may also be reflected in the documentation gap: native speakers often take everyday cultural commonsense and social norms for granted, while non-natives are more likely to actively seek out and document them. CARE is publicly available at https://github.com/Guochry/CARE (we plan to add Japanese data in the near future).

Safe and Efficient Coexistence of Autonomous Vehicles with Human-Driven Traffic at Signalized Intersections

Authors:Filippos N. Tzortzoglou, Logan E. Beaver, Andreas A. Malikopoulos

Date:2025-04-07 14:08:33

The proliferation of connected and automated vehicles (CAVs) has positioned mixed traffic environments, which encompass both CAVs and human-driven vehicles (HDVs), as critical components of emerging mobility systems. Signalized intersections are paramount for optimizing transportation efficiency and enhancing energy economy, as they inherently induce stop-and-go traffic dynamics. In this paper, we present an integrated framework that concurrently optimizes signal timing and CAV trajectories at signalized intersections, with the dual objectives of maximizing traffic throughput and minimizing energy consumption for CAVs. We first formulate an optimal control strategy for CAVs that prioritizes trajectory planning to circumvent state constraints, while incorporating the impact of signal timing and HDV behavior. Furthermore, we introduce a traffic signal control methodology that dynamically adjusts signal phases based on vehicular density per lane, while mitigating disruption for CAVs scheduled to traverse the intersection. Acknowledging the system's inherent dynamism, we also explore event-triggered replanning mechanisms that enable CAVs to iteratively refine their planned trajectories in response to the emergence of more efficient routing options. The efficacy of our proposed framework is evaluated through comprehensive simulations in MATLAB.

CloSE: A Compact Shape- and Orientation-Agnostic Cloth State Representation

Authors:Jay Kamat, Júlia Borràs, Carme Torras

Date:2025-04-07 12:54:58

Cloth manipulation is a difficult problem mainly because of the non-rigid nature of cloth, which makes a good representation of deformation essential. We present a new representation for the deformation-state of clothes. First, we propose the dGLI disk representation, based on topological indices computed for segments on the edges of the cloth mesh border that are arranged on a circular grid. The heat-map of the dGLI disk uncovers patterns that correspond to features of the cloth state that are consistent for different shapes, sizes, or positions of the cloth, such as the corners and the fold locations. We then abstract these important features from the dGLI disk onto a circle, calling it the Cloth StatE representation (CloSE). This representation is compact, continuous, and general for different shapes. Finally, we show the strengths of this representation in two relevant applications: semantic labeling and high- and low-level planning. The code, the dataset, and the video can be accessed at https://jaykamat99.github.io/close-representation

Constrained Gaussian Process Motion Planning via Stein Variational Newton Inference

Authors:Jiayun Li, Kay Pompetzki, An Thai Le, Haolei Tong, Jan Peters, Georgia Chalvatzaki

Date:2025-04-07 11:20:11

Gaussian Process Motion Planning (GPMP) is a widely used framework for generating smooth trajectories within a limited compute time--an essential requirement in many robotic applications. However, traditional GPMP approaches often struggle with enforcing hard nonlinear constraints and rely on Maximum a Posteriori (MAP) solutions that disregard the full Bayesian posterior. This limits planning diversity and ultimately hampers decision-making. Recent efforts to integrate Stein Variational Gradient Descent (SVGD) into motion planning have shown promise in handling complex constraints. Nonetheless, these methods still face persistent challenges, such as difficulties in strictly enforcing constraints and inefficiencies when the probabilistic inference problem is poorly conditioned. To address these issues, we propose a novel constrained Stein Variational Gaussian Process Motion Planning (cSGPMP) framework, incorporating a GPMP prior specifically designed for trajectory optimization under hard constraints. Our approach improves the efficiency of particle-based inference while explicitly handling nonlinear constraints. This advancement significantly broadens the applicability of GPMP to motion planning scenarios demanding robust Bayesian inference, strict constraint adherence, and computational efficiency within a limited time. We validate our method on standard benchmarks, achieving an average success rate of 98.57% across 350 planning tasks, significantly outperforming competitive baselines. This demonstrates the ability of our method to discover and use diverse trajectory modes, enhancing flexibility and adaptability in complex environments, and delivering significant improvements over standard baselines without incurring major computational costs.

The GRINTA hard X-ray mission: an Explorer of the Transient Sky

Authors:James Rodi, Lorenzo Natalucci

Date:2025-04-07 10:47:36

The era of time domain multi-messenger (MM) astrophysics requires sensitive, large field-of-view (FoV) observatories that are able to quickly react in order to respond to alerts from gravitational wave (GW) triggers, neutrino detections, and transient sources from all parts of the electromagnetic (EM) spectrum. This is particularly true at hard X-rays and soft gamma-rays where the EM counterparts to GW triggers, gamma-ray bursts (GRBs), emit most of their flux. While the present decade has a number of instruments capable of accomplishing this task, there are no missions planned for the 2030s, when improved MM facilities will detect many more events. It is in this context that we present the GRINTA mission concept. GRINTA has a large area, large FoV detector to search for short, impulsive events in the 20 keV - 10 MeV energy range and a coded mask telescope for localizing and performing follow-up observations of sources from 5-200 keV. While GRINTA's main scientific goal is studying MM events, the instruments will observe numerous other sources to explore the sky at hard X-rays/soft gamma-rays.

Futureproof Static Memory Planning

Authors:Christos Lamprakos, Panagiotis Xanthopoulos, Manolis Katsaragakis, Sotirios Xydis, Dimitrios Soudris, Francky Catthoor

Date:2025-04-07 09:28:54

The NP-complete combinatorial optimization task of assigning offsets to a set of buffers with known sizes and lifetimes so as to minimize total memory usage is called dynamic storage allocation (DSA). Existing DSA implementations bypass the theoretical state-of-the-art algorithms in favor of either fast but wasteful heuristics, or memory-efficient approaches that do not scale beyond one thousand buffers. The "AI memory wall", combined with deep neural networks' static architecture, has reignited interest in DSA. We present idealloc, a low-fragmentation, high-performance DSA implementation designed for million-buffer instances. Evaluated on a novel suite of particularly hard benchmarks from several domains, idealloc ranks first against four production implementations in terms of a joint effectiveness/robustness criterion.
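
To make the DSA problem statement concrete, the sketch below assigns offsets with a simple greedy first-fit rule. It is an illustration of the kind of fast heuristic the abstract contrasts against, not the idealloc algorithm; the `(size, start, end)` tuple layout, the half-open lifetimes, and the largest-first ordering are assumptions made for this example.

```python
from typing import List, Tuple

# Assumed layout: (size, start, end) with a half-open lifetime [start, end).
Buffer = Tuple[int, int, int]

def lifetimes_overlap(a: Buffer, b: Buffer) -> bool:
    """Two buffers conflict only if they are alive at the same time."""
    return a[1] < b[2] and b[1] < a[2]

def first_fit(buffers: List[Buffer]) -> Tuple[List[int], int]:
    """Greedy first-fit: place buffers largest-first at the lowest offset
    that does not collide with any already placed, co-live buffer.
    Returns (offsets, peak_memory)."""
    offsets = [0] * len(buffers)
    placed: List[int] = []
    order = sorted(range(len(buffers)), key=lambda i: -buffers[i][0])
    for i in order:
        size = buffers[i][0]
        # Address ranges occupied by placed buffers whose lifetimes overlap.
        busy = sorted(
            (offsets[j], offsets[j] + buffers[j][0])
            for j in placed
            if lifetimes_overlap(buffers[i], buffers[j])
        )
        candidate = 0
        for lo, hi in busy:
            if candidate + size <= lo:
                break  # the gap below this occupied range is big enough
            candidate = max(candidate, hi)
        offsets[i] = candidate
        placed.append(i)
    peak = max((offsets[i] + buffers[i][0] for i in range(len(buffers))), default=0)
    return offsets, peak

if __name__ == "__main__":
    demo = [(4, 0, 2), (4, 2, 4), (2, 1, 3)]
    print(first_fit(demo))
```

In the demo, the two size-4 buffers never coexist and therefore share offset 0, while the size-2 buffer overlaps both in time and is stacked above them.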

Approach to optimal quantum transport via states over time

Authors:Matt Hoogsteder-Riera, John Calsamiglia, Andreas Winter

Date:2025-04-07 09:13:56

We approach the problem of constructing a quantum analogue of the immensely fruitful classical transport cost theory of Monge from a new angle. Going back to the original motivations, by which the transport is a bilinear function of a mass distribution (without loss of generality a probability density) and a transport plan (a stochastic kernel), we explore the quantum version where the mass distribution is generalised to a density matrix, and the transport plan to a completely positive and trace preserving map. These two data are naturally integrated into their Jordan product, which is called state over time (``stote''), and the transport cost is postulated to be a linear function of it. We explore the properties of this transport cost, as well as the optimal transport cost between two given states (simply the minimum cost over all suitable transport plans). After that, we analyse in considerable detail the case of unitary invariant cost, for which we can calculate many costs analytically. These findings suggest that our quantum transport cost is qualitatively different from Monge's classical transport.

WLPCM Approach for Great Lakes Regulation

Authors:Xiangyi Chen, Wenbo Huang, Jiaqi Leng

Date:2025-04-07 06:21:22

This study develops a water-level management model for the Great Lakes using a predictive control framework. Requirement 1: Historical data (pre-2019) revealed consistent monthly water-level patterns. A simulated annealing algorithm optimized flow control via the Moses-Saunders Dam and Compensating Works to align levels with multi-year benchmarks. Requirement 2: A Water Level Predictive Control Model (WLPCM) integrated delayed differential equations (DDEs) and model predictive control (MPC) to account for inflow/outflow dynamics and upstream time lags. Natural variables (e.g., precipitation) were modeled via linear regression, while dam flow rates were optimized over 6-month horizons with feedback adjustments for robustness. Requirement 3: Testing WLPCM on 2017 data successfully mitigated Ottawa River flooding, outperforming historical records. Sensitivity analysis via the Sobol method confirmed model resilience to parameter variations. Requirement 4: Ice-clogging was identified as the most impactful natural variable (via RMSE-based sensitivity tests), followed by snowpack and precipitation. Requirement 5: Stakeholder demands (e.g., flood prevention, ecological balance) were incorporated into a fitness function. Compared to Plan 2014, WLPCM reduced catastrophic high levels in Lake Ontario and excessive St. Lawrence River flows by prioritizing long-term optimization. Key innovations include DDE-based predictive regulation, real-time feedback loops, and adaptive control under extreme conditions. The framework balances hydrological dynamics, stakeholder needs, and uncertainty management, offering a scalable solution for large freshwater systems.

MedGNN: Capturing the Links Between Urban Characteristics and Medical Prescriptions

Authors:Minwei Zhao, Sanja Scepanovic, Stephen Law, Daniele Quercia, Ivica Obadic

Date:2025-04-07 05:35:16

Understanding how urban socio-demographic and environmental factors relate to health is essential for public health and urban planning. However, traditional statistical methods struggle with nonlinear effects, while machine learning models often fail to capture geographical (nearby areas being more similar) and topological (unequal connectivity between places) effects in an interpretable way. To address this, we propose MedGNN, a spatio-topologically explicit framework that constructs a 2-hop spatial graph, integrating positional and locational node embeddings with urban characteristics in a graph neural network. Applied to MEDSAT, a comprehensive dataset covering over 150 environmental and socio-demographic factors and six prescription outcomes (depression, anxiety, diabetes, hypertension, asthma, and opioids) across 4,835 Greater London neighborhoods, MedGNN improved predictions by over 25% on average compared to baseline methods. Using depression prescriptions as a case study, we analyzed graph embeddings via geographical principal component analysis, identifying findings that: align with prior research (e.g., higher antidepressant prescriptions among older and White populations), contribute to ongoing debates (e.g., greenery linked to higher and NO2 to lower prescriptions), and warrant further study (e.g., canopy evaporation correlated with fewer prescriptions). These results demonstrate the potential of MedGNN, and more broadly of carefully applied machine learning, to advance transdisciplinary public health research.

Large-Scale Mixed-Traffic and Intersection Control using Multi-agent Reinforcement Learning

Authors:Songyang Liu, Muyang Fan, Weizi Li, Jing Du, Shuai Li

Date:2025-04-07 02:52:39

Traffic congestion remains a significant challenge in modern urban networks. Autonomous driving technologies have emerged as a potential solution. Among traffic control methods, reinforcement learning has shown superior performance over traffic signals in various scenarios. However, prior research has largely focused on small-scale networks or isolated intersections, leaving large-scale mixed traffic control largely unexplored. This study presents the first attempt to use decentralized multi-agent reinforcement learning for large-scale mixed traffic control in which some intersections are managed by traffic signals and others by robot vehicles (RVs). Evaluating a real-world network in Colorado Springs, CO, USA with 14 intersections, we measure traffic efficiency via average waiting time of vehicles at intersections and the number of vehicles reaching their destinations within a time window (i.e., throughput). At 80% RV penetration rate, our method reduces waiting time from 6.17 s to 5.09 s and increases throughput from 454 vehicles per 500 seconds to 493 vehicles per 500 seconds, outperforming the baseline of fully signalized intersections. These findings suggest that integrating reinforcement learning-based control into large-scale traffic can improve overall efficiency and may inform future urban planning strategies.
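
The absolute figures quoted in this abstract can be converted to relative changes with a short calculation. The helper name `relative_change` is introduced here for illustration only; the input numbers are the ones reported above.

```python
def relative_change(before: float, after: float) -> float:
    """Signed fractional change from `before` to `after`."""
    return (after - before) / before

# Figures reported in the abstract at 80% RV penetration.
waiting = relative_change(6.17, 5.09)    # average waiting time, seconds
throughput = relative_change(454, 493)   # vehicles per 500 seconds

if __name__ == "__main__":
    print(f"waiting time: {waiting * 100:.1f}%")
    print(f"throughput:   {throughput * 100:+.1f}%")
```

That is, waiting time falls by roughly 17.5% and throughput rises by roughly 8.6% relative to the fully signalized baseline.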

HypRL: Reinforcement Learning of Control Policies for Hyperproperties

Authors:Tzu-Han Hsu, Arshia Rafieioskouei, Borzoo Bonakdarpour

Date:2025-04-07 01:58:36

We study the problem of learning control policies for complex tasks whose requirements are given by a hyperproperty. The use of hyperproperties is motivated by their significant power to formally specify requirements of multi-agent systems as well as those that need expressiveness in terms of multiple execution traces (e.g., privacy and fairness). Given a Markov decision process M with unknown transitions (representing the environment) and a HyperLTL formula $\varphi$, our approach first employs Skolemization to handle quantifier alternations in $\varphi$. We introduce quantitative robustness functions for HyperLTL to define rewards of finite traces of M with respect to $\varphi$. Finally, we utilize a suitable reinforcement learning algorithm to learn (1) a policy per trace quantifier in $\varphi$, and (2) the probability distribution of transitions of M that together maximize the expected reward and, hence, probability of satisfaction of $\varphi$ in M. We present a set of case studies on (1) safety-preserving multi-agent path planning, (2) fairness in resource allocation, and (3) the post-correspondence problem (PCP).

Nonlinear Robust Optimization for Planning and Control

Authors:Arshiya Taj Abdul, Augustinos D. Saravanos, Evangelos A. Theodorou

Date:2025-04-06 20:33:53

This paper presents a novel robust trajectory optimization method for constrained nonlinear dynamical systems subject to unknown bounded disturbances. In particular, we seek optimal control policies that remain robustly feasible with respect to all possible realizations of the disturbances within prescribed uncertainty sets. To address this problem, we introduce a bi-level optimization algorithm. The outer level employs a trust-region successive convexification approach which relies on linearizing the nonlinear dynamics and robust constraints. The inner level involves solving the resulting linearized robust optimization problems, for which we derive tractable convex reformulations and present an Augmented Lagrangian method for efficiently solving them. To further enhance the robustness of our methodology on nonlinear systems, we also illustrate that potential linearization errors can be effectively modeled as unknown disturbances as well. Simulation results verify the applicability of our approach in controlling nonlinear systems in a robust manner under unknown disturbances. The promise of effectively handling approximation errors in such successive linearization schemes from a robust optimization perspective is also highlighted.

B4P: Simultaneous Grasp and Motion Planning for Object Placement via Parallelized Bidirectional Forests and Path Repair

Authors:Benjamin H. Leebron, Kejia Ren, Yiting Chen, Kaiyu Hang

Date:2025-04-06 20:02:17

Robot pick and place systems have traditionally decoupled grasp, placement, and motion planning to build sequential optimization pipelines with the assumption that the individual components will be able to work together. However, this separation introduces sub-optimality, as grasp choices may limit or even prohibit feasible motions for a robot to reach the target placement pose, particularly in cluttered environments with narrow passages. To this end, we propose a forest-based planning framework to simultaneously find grasp configurations and feasible robot motions that explicitly satisfy downstream placement configurations paired with the selected grasps. Our proposed framework leverages a bidirectional sampling-based approach to build a start forest, rooted at the feasible grasp regions, and a goal forest, rooted at the feasible placement regions, to facilitate the search through randomly explored motions that connect valid pairs of grasp and placement trees. We demonstrate that the framework's inherent parallelism enables superlinear speedup, making it scalable for applications for redundant robot arms (e.g., 7 Degrees of Freedom) to work efficiently in highly cluttered environments. Extensive experiments in simulation demonstrate the robustness and efficiency of the proposed framework in comparison with multiple baselines under diverse scenarios.

DexTOG: Learning Task-Oriented Dexterous Grasp with Language

Authors:Jieyi Zhang, Wenqiang Xu, Zhenjun Yu, Pengfei Xie, Tutian Tang, Cewu Lu

Date:2025-04-06 18:23:10

This study introduces a novel language-guided diffusion-based learning framework, DexTOG, aimed at advancing the field of task-oriented grasping (TOG) with dexterous hands. Unlike existing methods that mainly focus on 2-finger grippers, this research addresses the complexities of dexterous manipulation, where the system must identify non-unique optimal grasp poses under specific task constraints, cater to multiple valid grasps, and search in a high degree-of-freedom configuration space in grasp planning. The proposed DexTOG includes a diffusion-based grasp pose generation model, DexDiffu, and a data engine to support DexDiffu. By leveraging DexTOG, we also propose a new dataset, DexTOG-80K, which was developed using a shadow robot hand to perform various tasks on 80 objects from 5 categories, showcasing the dexterity and multi-tasking capabilities of the robotic hand. This research not only presents a significant leap in dexterous TOG but also provides a comprehensive dataset and simulation validation, setting a new benchmark in robotic manipulation research.

Memetic Search for Green Vehicle Routing Problem with Private Capacitated Refueling Stations

Authors:Rui Xu, Xing Fan, Shengcai Liu, Wenjie Chen, Ke Tang

Date:2025-04-06 15:52:49

The green vehicle routing problem with private capacitated alternative fuel stations (GVRP-PCAFS) extends the traditional green vehicle routing problem by considering the limited capacity of refueling stations, where only a limited number of vehicles can refuel simultaneously and additional vehicles must wait. This feature presents new challenges for route planning, as waiting times at stations must be managed while keeping route durations within limits and reducing total travel distance. This article presents METS, a novel memetic algorithm (MA) with separate constraint-based tour segmentation (SCTS) and efficient local search (ELS) for solving GVRP-PCAFS. METS combines global and local search effectively through three novelties. For global search, the SCTS strategy splits giant tours to generate diverse solutions, and the search process is guided by a comprehensive fitness evaluation function to dynamically control feasibility and diversity to produce solutions that are both diverse and near-feasible. For local search, ELS incorporates tailored move operators with constant-time move evaluation mechanisms, enabling efficient exploration of large solution neighborhoods. Experimental results demonstrate that METS discovers 31 new best-known solutions out of 40 instances in existing benchmark sets, achieving substantial improvements over current state-of-the-art methods. Additionally, a new large-scale benchmark set based on real-world logistics data is introduced to facilitate future research.

A Classification View on Meta Learning Bandits

Authors:Mirco Mutti, Jeongyeol Kwon, Shie Mannor, Aviv Tamar

Date:2025-04-06 14:25:21

Contextual multi-armed bandits are a popular choice to model sequential decision-making. For example, in a healthcare application we may perform various tests to assess a patient's condition (exploration) and then decide on the best treatment to give (exploitation). When humans design strategies, they aim for the exploration to be fast, since the patient's health is at stake, and easy to interpret for a physician overseeing the process. However, common bandit algorithms are nothing like that: The regret caused by exploration scales with $\sqrt{H}$ over $H$ rounds and decision strategies are based on opaque statistical considerations. In this paper, we use an original classification view to meta learn interpretable and fast exploration plans for a fixed collection of bandits $\mathbb{M}$. The plan is prescribed by an interpretable decision tree probing decisions' payoff to classify the test bandit. The test regret of the plan in the stochastic and contextual setting scales with $O (\lambda^{-2} C_{\lambda} (\mathbb{M}) \log^2 (MH))$, where $M$ is the size of $\mathbb{M}$, $\lambda$ is a separation parameter over the bandits, and $C_\lambda (\mathbb{M})$ is a novel classification-coefficient that fundamentally links meta learning bandits with classification. Through a nearly matching lower bound, we show that $C_\lambda (\mathbb{M})$ inherently captures the complexity of the setting.

Binary Weight Allocation for Multi-Objective Path Optimization: Efficient Earliest and Latest Path Discovery in Network Systems

Authors:Wei-Chang Yeh

Date:2025-04-06 14:18:52

This paper proposes earliest and latest path algorithms based on binary weight allocation, assigning weights of $2^{i-1}$ and $2^{m-i}$ to the i-th arc in a network. While traditional shortest path algorithms optimize only distance, our approach leverages Binary-Addition-Tree ordering to efficiently identify lexicographically smallest and largest paths that establish connectivity. These paths partition the solution space into three regions: guaranteed disconnection, transitional connectivity, and guaranteed no simple paths. Our weight allocation enables implicit encoding of multiple objectives directly in binary representations, maintaining the O((|V|+|E|)log|V|) complexity of Dijkstra's algorithm while allowing simultaneous optimization of competing factors like reliability and cost. Experimental validation demonstrates significant computational time reduction compared to traditional multi-objective methods. Applications span telecommunications, transportation networks, and supply chain management, providing efficient tools for network planning and reliability analysis under multiple constraints.
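
Reading the arc weights as powers of two, with the i-th of m arcs weighted 2^(i-1), the total weight of a simple path is a binary code whose set bits are exactly the arcs used, so minimizing the weight selects the arc set with the smallest code. The sketch below runs plain Dijkstra with those weights on an undirected toy graph. It is an illustration of the encoding idea under that reading, not the paper's Binary-Addition-Tree procedure; the function name and graph are made up for the example.

```python
import heapq

def min_code_path(num_nodes, arcs, source, target):
    """Dijkstra where arc i (1-based) has weight 2**(i-1).
    Returns (code, sorted arc indices), or (None, []) if disconnected."""
    adj = [[] for _ in range(num_nodes)]
    for idx, (u, v) in enumerate(arcs, start=1):
        w = 1 << (idx - 1)  # binary weight of the idx-th arc
        adj[u].append((v, w, idx))
        adj[v].append((u, w, idx))
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u == target:
            break
        for v, w, idx in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = (u, idx)
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, []
    used, node = [], target
    while node != source:
        node, idx = prev[node]  # step back and record the arc taken
        used.append(idx)
    return dist[target], sorted(used)

if __name__ == "__main__":
    # Arcs a1..a5 carry weights 1, 2, 4, 8, 16.
    arcs = [(0, 1), (0, 2), (1, 3), (2, 3), (1, 2)]
    code, used = min_code_path(4, arcs, source=0, target=3)
    print(code, bin(code), used)
```

In the toy graph the result is code 5 (binary 101), which decodes to arcs 1 and 3. Swapping the weight to 2^(m-i) would reverse the priority of the arcs.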

AI2STOW: End-to-End Deep Reinforcement Learning to Construct Master Stowage Plans under Demand Uncertainty

Authors:Jaike Van Twiller, Djordje Grbic, Rune Møller Jensen

Date:2025-04-06 12:45:25

The worldwide economy and environmental sustainability depend on efficient and reliable supply chains, in which container shipping plays a crucial role as an environmentally friendly mode of transport. Liner shipping companies seek to improve operational efficiency by solving the stowage planning problem. Due to many complex combinatorial aspects, stowage planning is challenging and often decomposed into two NP-hard subproblems: master and slot planning. This article proposes AI2STOW, an end-to-end deep reinforcement learning model with feasibility projection and an action mask to create master plans under demand uncertainty with global objectives and constraints, including paired block stowage patterns. Our experimental results demonstrate that AI2STOW outperforms baseline methods from reinforcement learning and stochastic programming in objective performance and computational efficiency, based on simulated instances reflecting the scale of realistic vessels and operational planning horizons.

Deliberate Planning of 3D Bin Packing on Packing Configuration Trees

Authors:Hang Zhao, Juzhan Xu, Kexiong Yu, Ruizhen Hu, Chenyang Zhu, Kai Xu

Date:2025-04-06 09:07:10

Online 3D Bin Packing Problem (3D-BPP) has widespread applications in industrial automation. Existing methods usually solve the problem with limited resolution of spatial discretization, and/or cannot deal with complex practical constraints well. We propose to enhance the practical applicability of online 3D-BPP via learning on a novel hierarchical representation, packing configuration tree (PCT). PCT is a full-fledged description of the state and action space of bin packing which can support packing policy learning based on deep reinforcement learning (DRL). The size of the packing action space is proportional to the number of leaf nodes, making the DRL model easy to train and well-performing even with continuous solution space. We further discover the potential of PCT as tree-based planners in deliberately solving packing problems of industrial significance, including large-scale packing and different variations of BPP setting. A recursive packing method is proposed to decompose large-scale packing into smaller sub-trees while a spatial ensemble mechanism integrates local solutions into a global one. For different BPP variations with additional decision variables, such as lookahead, buffering, and offline packing, we propose a unified planning framework enabling out-of-the-box problem solving. Extensive evaluations demonstrate that our method outperforms existing online BPP baselines and is versatile in incorporating various practical constraints. The planning process excels across large-scale problems and diverse problem variations. We develop a real-world packing robot for industrial warehousing, with careful designs accounting for constrained placement and transportation stability. Our packing robot operates reliably and efficiently on unprotected pallets at 10 seconds per box. It achieves an average of 19 boxes per pallet with 57.4% space utilization for relatively large-size boxes.

Solving Sokoban using Hierarchical Reinforcement Learning with Landmarks

Authors:Sergey Pastukhov

Date:2025-04-06 05:30:21

We introduce a novel hierarchical reinforcement learning (HRL) framework that performs top-down recursive planning via learned subgoals, successfully applied to the complex combinatorial puzzle game Sokoban. Our approach constructs a six-level policy hierarchy, where each higher-level policy generates subgoals for the level below. All subgoals and policies are learned end-to-end from scratch, without any domain knowledge. Our results show that the agent can generate long action sequences from a single high-level call. While prior work has explored 2-3 level hierarchies and subgoal-based planning heuristics, we demonstrate that deep recursive goal decomposition can emerge purely from learning, and that such hierarchies can scale effectively to hard puzzle domains.

Data Scaling Laws for End-to-End Autonomous Driving

Authors:Alexander Naumann, Xunjiang Gu, Tolga Dimlioglu, Mariusz Bojarski, Alperen Degirmenci, Alexander Popov, Devansh Bisla, Marco Pavone, Urs Müller, Boris Ivanovic

Date:2025-04-06 03:23:48

Autonomous vehicle (AV) stacks have traditionally relied on decomposed approaches, with separate modules handling perception, prediction, and planning. However, this design introduces information loss during inter-module communication, increases computational overhead, and can lead to compounding errors. To address these challenges, recent works have proposed architectures that integrate all components into an end-to-end differentiable model, enabling holistic system optimization. This shift emphasizes data engineering over software integration, offering the potential to enhance system performance by simply scaling up training resources. In this work, we evaluate the performance of a simple end-to-end driving architecture on internal driving datasets ranging in size from 16 to 8192 hours with both open-loop metrics and closed-loop simulations. Specifically, we investigate how much additional training data is needed to achieve a target performance gain, e.g., a 5% improvement in motion prediction accuracy. By understanding the relationship between model performance and training dataset size, we aim to provide insights for data-driven decision-making in autonomous driving development.
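
The question of how much extra data buys a target gain can be illustrated by assuming, purely for the sake of example, that error follows a power law in dataset size, error = a * hours^(-b). The abstract does not state this functional form, and the data points below are synthetic, generated from made-up constants rather than taken from the paper.

```python
import math

def fit_power_law(hours, errors):
    """Least-squares line through (log hours, log error).
    Returns (a, b) for the assumed model error = a * hours ** (-b)."""
    xs = [math.log(h) for h in hours]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return math.exp(my - slope * mx), -slope

def data_multiplier(b, improvement=0.05):
    """Factor by which the dataset must grow to cut error by `improvement`
    under the assumed power law with exponent b."""
    return (1.0 - improvement) ** (-1.0 / b)

# Synthetic illustration only: constants 2.0 and 0.25 are invented.
HOURS = [16, 64, 256, 1024, 4096, 8192]
ERRORS = [2.0 * h ** -0.25 for h in HOURS]

if __name__ == "__main__":
    a, b = fit_power_law(HOURS, ERRORS)
    print(f"a={a:.3f}, b={b:.3f}, x{data_multiplier(b):.2f} data for a 5% gain")
```

With the invented exponent of 0.25, a 5% error reduction would need about 1.23 times as much data; a real analysis would fit measured metrics and check whether a power law holds at all.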

Learning Flatness-Preserving Residuals for Pure-Feedback Systems

Authors:Fengjun Yang, Jake Welde, Nikolai Matni

Date:2025-04-06 01:50:43

We study residual dynamics learning for differentially flat systems, where a nominal model is augmented with a learned correction term from data. A key challenge is that generic residual parameterizations may destroy flatness, limiting the applicability of flatness-based planning and control methods. To address this, we propose a framework for learning flatness-preserving residual dynamics in systems whose nominal model admits a pure-feedback form. We show that residuals with a lower-triangular structure preserve both the flatness of the system and the original flat outputs. Moreover, we provide a constructive procedure to recover the flatness diffeomorphism of the augmented system from that of the nominal model. We then introduce a learning algorithm that fits such residuals from trajectory data using smooth function approximators. Our approach is validated in simulation on a 2D quadrotor subject to unmodeled aerodynamic effects. We demonstrate that the resulting learned flat model enables tracking performance comparable to nonlinear model predictive control ($5\times$ lower tracking error than the nominal flat model) while also achieving over a $20\times$ speedup in computation.
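
The structural claim can be illustrated with a short sketch. The notation here (states $x_1,\dots,x_n$, input $u$, nominal maps $f_i$, residuals $\Delta_i$) is assumed for illustration, and the paper's exact assumptions may differ.

```latex
\begin{aligned}
\dot{x}_i &= f_i(x_1,\dots,x_{i+1}) + \Delta_i(x_1,\dots,x_i), \qquad i = 1,\dots,n-1,\\
\dot{x}_n &= f_n(x_1,\dots,x_n,u) + \Delta_n(x_1,\dots,x_n),\\
\frac{\partial (f_i + \Delta_i)}{\partial x_{i+1}} &= \frac{\partial f_i}{\partial x_{i+1}} \neq 0 .
\end{aligned}
```

Because each $\Delta_i$ does not depend on $x_{i+1}$ (or on $u$ for $i = n$), each equation stays locally invertible in its last argument. In this sketch, $x_{i+1}$ can therefore still be recovered from $x_1,\dots,x_i$ and $\dot{x}_i$, so $y = x_1$ remains a flat output of the augmented system.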

A Self-Supervised Learning Approach with Differentiable Optimization for UAV Trajectory Planning

Authors:Yufei Jiang, Yuanzhu Zhan, Harsh Vardhan Gupta, Chinmay Borde, Junyi Geng

Date:2025-04-05 22:09:13

While Unmanned Aerial Vehicles (UAVs) have gained significant traction across various fields, path planning in 3D environments remains a critical challenge, particularly under size, weight, and power (SWAP) constraints. Traditional modular planning systems often introduce latency and suboptimal performance due to limited information sharing and local minima issues. End-to-end learning approaches streamline the pipeline by mapping sensory observations directly to actions but require large-scale datasets, face significant sim-to-real gaps, or lack dynamical feasibility. In this paper, we propose a self-supervised UAV trajectory planning pipeline that integrates learning-based depth perception with differentiable trajectory optimization. A 3D cost map guides UAV behavior without expert demonstrations or human labels. Additionally, we incorporate a neural network-based time allocation strategy to improve efficiency and optimality. The system thus combines robust learning-based perception with reliable physics-based optimization for improved generalizability and interpretability. Both simulation and real-world experiments validate our approach across various environments, demonstrating its effectiveness and robustness. Our method achieves a 31.33% improvement in position tracking error and a 49.37% reduction in control effort compared to the state-of-the-art.

From Automation to Autonomy in Smart Manufacturing: A Bayesian Optimization Framework for Modeling Multi-Objective Experimentation and Sequential Decision Making

Authors:Avijit Saha Asru, Hamed Khosravi, Imtiaz Ahmed, Abdullahil Azeem

Date:2025-04-05 18:21:20

Discovering novel materials with desired properties is essential for driving innovation. Industry 4.0 and smart manufacturing have promised transformative advances in this area through real-time data integration and automated production planning and control. However, the reliance on automation alone has often fallen short, lacking the flexibility needed for complex processes. To fully unlock the potential of smart manufacturing, we must evolve from automation to autonomous systems that go beyond rigid programming and can dynamically optimize the search for solutions. Current discovery approaches are often slow, requiring numerous trials to find optimal combinations, and costly, particularly when optimizing multiple properties simultaneously. This paper proposes a Bayesian multi-objective sequential decision-making (BMSDM) framework that can intelligently select experiments as manufacturing progresses, guiding us toward the discovery of optimal designs faster and more efficiently. The framework leverages sequential learning through Bayesian Optimization, which iteratively refines a statistical model representing the underlying manufacturing process. This statistical model acts as a surrogate, allowing for efficient exploration and optimization without requiring numerous real-world experiments. This approach can significantly reduce the time and cost of data collection required by traditional experimental designs. The proposed framework is compared with traditional design of experiments (DoE) methods and two other multi-objective optimization methods. Using a manufacturing dataset, we evaluate and compare the performance of these approaches across five evaluation metrics. BMSDM comprehensively outperforms the competing methods in multi-objective decision-making scenarios. Our proposed approach represents a significant leap forward in creating an intelligent autonomous platform capable of novel material discovery.

Nuclear Physics and the European Particle Physics Strategy Update 2026

Authors:L. M. Fraile, J. J. Gaardhøje, U. van Kolck, H. Moutarde, N. Patronis, M. T. Peña, L. Popescu, V. Wagner, E. Widmann

Date:2025-04-05 08:22:34

This document provides input to the update of the European Strategy for Particle Physics in fields that are related to Nuclear Physics as described in the NuPECC Long Range Plan 2024 (arXiv:2503.15575).

Energy Efficient Planning for Repetitive Heterogeneous Tasks in Precision Agriculture

Authors:Shuangyu Xie, Ken Goldberg, Dezhen Song

Date:2025-04-04 21:09:17

Robotic weed removal in precision agriculture introduces a repetitive heterogeneous task planning (RHTP) challenge for a mobile manipulator. RHTP has two unique characteristics: 1) an observe-first-and-manipulate-later (OFML) temporal constraint that forces a unique ordering of two different tasks for each target and 2) energy savings from efficient task collocation to minimize unnecessary movements. RHTP can be framed as a stochastic renewal process. According to the Renewal Reward Theorem, the expected energy usage per task cycle is the long-run average. Traditional task and motion planning focuses on feasibility rather than optimality because object and obstacle positions are unknown prior to execution. However, the known target/obstacle distribution in precision agriculture allows minimizing the expected energy usage. For each instance in this renewal process, we first compute a task space partition, a novel data structure that computes all possibilities of task multiplexing and their probabilities with robot reachability. Then we propose a region-based set-coverage problem to formulate RHTP as a mixed-integer nonlinear program. We have implemented and solved RHTP using a branch-and-bound solver. Compared to a baseline in simulations based on real field data, the results suggest a significant improvement in path length, number of robot stops, overall energy usage, and number of replans.
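
The renewal-reward argument can be checked numerically. The Renewal Reward Theorem states that the long-run reward per unit time converges to E[reward per cycle] / E[cycle length]. The sketch below uses hypothetical cycle-duration and energy distributions chosen only for illustration; they are not taken from the paper.

```python
import random

def simulate_long_run_rate(num_cycles, seed=0):
    """Simulate i.i.d. task cycles and return total energy / total time."""
    rng = random.Random(seed)
    total_energy = 0.0
    total_time = 0.0
    for _ in range(num_cycles):
        # Hypothetical cycle duration: uniform on [1, 3], so E[T] = 2.
        duration = rng.uniform(1.0, 3.0)
        # Hypothetical energy: 5 per unit time plus a per-cycle overhead
        # uniform on [0, 2], so E[R] = 5 * 2 + 1 = 11.
        energy = 5.0 * duration + rng.uniform(0.0, 2.0)
        total_energy += energy
        total_time += duration
    return total_energy / total_time

expected_rate = 11.0 / 2.0                        # E[R] / E[T] from the theorem
empirical_rate = simulate_long_run_rate(200_000)  # long-run average from simulation
```

With enough cycles the empirical rate settles near the ratio of expectations, which is what lets a planner minimize expected per-cycle energy instead of reasoning about the whole field traversal at once.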

Control Map Distribution using Map Query Bank for Online Map Generation

Authors:Ziming Liu, Leichen Wang, Ge Yang, Xinrun Li, Xingtao Hu, Hao Sun, Guangyu Gao

Date:2025-04-04 18:47:42

Reliable autonomous driving systems require high-definition (HD) maps that contain detailed map information for planning and navigation. However, pre-built HD maps are costly to produce. Visual-based Online Map Generation (OMG) has become a low-cost alternative for building a local HD map. The query-based BEV Transformer has been a base model for this task. This model learns HD map predictions from an initial map query distribution, which is obtained by offline optimization on the training set. Besides the quality of the BEV feature, the performance of this model also relies heavily on the capacity of the initial map query distribution. However, this distribution is limited because of the limited number of queries. To make map predictions optimal on each test sample, it is essential to generate a suitable initial distribution for each specific scenario. This paper proposes to decompose the whole HD map distribution into a set of point representations, namely the map query bank (MQBank). To build scenario-specific initial map query distributions, low-cost standard-definition map (SD map) data is introduced as prior knowledge. Moreover, each layer of the map decoder network learns instance-level map query features, which lose the detailed information of each point. However, the BEV feature map is a point-level dense feature, so it is important to keep point-level information in map queries when they interact with the BEV feature map. This can also be addressed with the map query bank method. Experiments offer new insight into the SD map prior and set a new record on the OpenLaneV2 benchmark, with 40.5% and 45.7% mAP on vehicle lane and pedestrian area, respectively.

Multi-encoder nnU-Net outperforms Transformer models with self-supervised pretraining

Authors:Seyedeh Sahar Taheri Otaghsara, Reza Rahmanzadeh

Date:2025-04-04 14:31:06

This study addresses the essential task of medical image segmentation, which involves the automatic identification and delineation of anatomical structures and pathological regions in medical images. Accurate segmentation is crucial in radiology, as it aids in the precise localization of abnormalities such as tumors, thereby enabling effective diagnosis, treatment planning, and monitoring of disease progression. Specifically, the size, shape, and location of tumors can significantly influence clinical decision-making and therapeutic strategies, making accurate segmentation a key component of radiological workflows. However, challenges posed by variations in MRI modalities, image artifacts, and the scarcity of labeled data complicate the segmentation task and impact the performance of traditional models. To overcome these limitations, we propose a novel self-supervised learning Multi-encoder nnU-Net architecture designed to process multiple MRI modalities independently through separate encoders. This approach allows the model to capture modality-specific features before fusing them for the final segmentation, thus improving accuracy. Our Multi-encoder nnU-Net demonstrates exceptional performance, achieving a Dice Similarity Coefficient (DSC) of 93.72%, which surpasses that of other models such as vanilla nnU-Net, SegResNet, and Swin UNETR. By leveraging the unique information provided by each modality, the model enhances segmentation tasks, particularly in scenarios with limited annotated data. Evaluations highlight the effectiveness of this architecture in improving tumor segmentation outcomes.
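
For reference, the Dice Similarity Coefficient reported above is DSC = 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a ground-truth mask B. The sketch below computes it on flattened binary masks. The function name, the toy masks, and the empty-mask convention are illustrative assumptions and are not taken from the paper's evaluation code.

```python
def dice_coefficient(pred, target):
    """Dice Similarity Coefficient for two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    # |A ∩ B|: positions where both masks are foreground.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # |A| + |B|: foreground counts of each mask.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Both masks empty: treated as perfect agreement (a convention, assumed here).
        return 1.0
    return 2.0 * intersection / total

# Toy masks: 2 overlapping foreground voxels, 3 foreground voxels in each mask.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
score = dice_coefficient(pred, target)  # 2 * 2 / (3 + 3)
```

In practice the score is usually computed per class and per case, then averaged, so a single headline number depends on that aggregation choice.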

The AI Cosmologist I: An Agentic System for Automated Data Analysis

Authors:Adam Moss

Date:2025-04-04 13:12:08

We present the AI Cosmologist, an agentic system designed to automate cosmological/astronomical data analysis and machine learning research workflows. It implements a complete pipeline from idea generation to experimental evaluation and research dissemination, mimicking the scientific process typically performed by human researchers. The system employs specialized agents for planning, coding, execution, analysis, and synthesis that work together to develop novel approaches. Unlike traditional auto machine-learning systems, the AI Cosmologist generates diverse implementation strategies, writes complete code, handles execution errors, analyzes results, and synthesizes new approaches based on experimental outcomes. We demonstrate the AI Cosmologist's capabilities across several machine learning tasks, showing how it can successfully explore solution spaces, iterate based on experimental results, and combine successful elements from different approaches. Our results indicate that agentic systems can automate portions of the research process, potentially accelerating scientific discovery. The code and experimental data used in this paper are available on GitHub at https://github.com/adammoss/aicosmologist. Example papers included in the appendix demonstrate the system's capability to autonomously produce complete scientific publications, starting from only the dataset and task description.