The increasing complexity of underwater robotic systems has led to a surge in simulation platforms designed to support perception, planning, and control tasks in marine environments. However, selecting the most appropriate underwater robotic simulator (URS) remains a challenge due to wide variations in fidelity, extensibility, and task suitability. This paper presents a comprehensive review and comparative analysis of five state-of-the-art, ROS-compatible, open-source URSs: Stonefish, DAVE, HoloOcean, MARUS, and UNav-Sim. We evaluate each simulator across multiple criteria, including architectural design, sensor fidelity and physics modeling, environmental realism, sim-to-real capabilities, task capabilities, and research impact. Additionally, we discuss ongoing challenges in sim-to-real transfer and highlight the need for standardization and benchmarking in the field. Our findings aim to guide practitioners in selecting effective simulation environments and inform future development of more robust and transferable URSs.
Path planning algorithms, such as the search-based A*, are a critical component of autonomous mobile robotics, enabling robots to navigate from a starting point to a destination efficiently and safely. We investigated the resilience of the A* algorithm in the face of potential adversarial interventions known as obstacle attacks. The adversary's goal is to delay the robot's timely arrival at its destination by introducing obstacles along its original path. We developed malicious software to execute the attacks and conducted experiments to assess their impact, both in simulation using TurtleBot in Gazebo and in real-world deployment with the Unitree Go1 robot. In simulation, the attacks resulted in an average delay of 36%, with the most significant delays occurring in scenarios where the robot was forced to take substantially longer alternative paths. In real-world experiments, the delays were even more pronounced, with all attacks successfully rerouting the robot and causing measurable disruptions. These results highlight that the algorithm's robustness is not solely an attribute of its design but is significantly influenced by the operational environment. For example, in constrained environments like tunnels, the delays were maximized due to the limited availability of alternative routes.
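For readers who want to reproduce the core idea, the sketch below simulates an obstacle attack against grid-based A*: the adversary drops a wall cell onto the robot's current shortest path and the delay is measured as the relative increase in path length. The corridor-like toy map mirrors the tunnel observation above, where blocking one gap forces a long detour. All names and the map are illustrative, not the authors' actual attack software.

```python
import heapq

def astar(grid, start, goal):
    """Plain 4-connected A* on a boolean occupancy grid; returns a path or None."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    frontier, came, cost = [(h(start), start)], {start: None}, {start: 0}
    while frontier:
        _, cur = heapq.heappop(frontier)
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = came[cur]
            return path[::-1]
        for d in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + d[0], cur[1] + d[1])
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and not grid[nxt[0]][nxt[1]]):
                g = cost[cur] + 1
                if g < cost.get(nxt, float("inf")):
                    cost[nxt], came[nxt] = g, cur
                    heapq.heappush(frontier, (g + h(nxt), nxt))
    return None

# A corridor map: a wall at column 3 with gaps only in the top and bottom rows.
grid = [[False] * 7 for _ in range(7)]
for r in range(1, 6):
    grid[r][3] = True
start, goal = (0, 0), (0, 6)

baseline = astar(grid, start, goal)
blocked = baseline[len(baseline) // 2]        # adversary drops an obstacle mid-path
grid[blocked[0]][blocked[1]] = True
rerouted = astar(grid, start, goal)
delay = (len(rerouted) - len(baseline)) / (len(baseline) - 1)
print(f"{len(baseline) - 1} steps -> {len(rerouted) - 1} steps, delay {delay:.0%}")
```

Here the single blocked gap forces the detour through the far gap, inflating the path from 6 to 18 steps, a 200% delay, which is the qualitative effect the tunnel experiments report.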
Bauplan is a FaaS-based lakehouse specifically built for data pipelines: its execution engine uses Apache Arrow for data passing between the nodes in the DAG. While Arrow is known as the "zero-copy format", in practice, limited Linux kernel support for shared memory makes it difficult to avoid copying entirely. In this work, we introduce several new techniques to eliminate nearly all copying from pipelines: in particular, we implement a new kernel module that performs de-anonymization, thus eliminating a copy of intermediate data. We conclude by sharing our preliminary evaluation on different workload types, as well as discussing our plan for future improvements.
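As a concrete illustration of the zero-copy tension the abstract describes, the sketch below writes an Arrow table to an IPC file and reads it back through a memory map, so the reader borrows pages from the OS page cache instead of copying buffers into process-private memory. This shows only Arrow's standard memory-mapping path, not Bauplan's kernel module; the file name is arbitrary.

```python
import pyarrow as pa

# Build a small table and persist it in the Arrow IPC file format.
table = pa.table({"id": pa.array(range(5), pa.int64()),
                  "score": pa.array([0.1, 0.5, 0.9, 0.3, 0.7])})
with pa.OSFile("batch.arrow", "wb") as sink:
    with pa.ipc.new_file(sink, table.schema) as writer:
        writer.write_table(table)

# Memory-map the file: record batches reference the mapped pages directly,
# so no per-buffer copy into the Python process is made on read.
with pa.memory_map("batch.arrow", "r") as source:
    loaded = pa.ipc.open_file(source).read_all()
print(loaded.column("score"))
```

The limitation the paper targets is that this trick works for files, while anonymous shared memory between pipeline stages is exactly where the kernel offers less help.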
Recent advancements in Multimodal Large Language Models (MLLMs) have led to significant improvements across various multimodal benchmarks. However, as evaluations shift from static datasets to open-world, dynamic environments, current game-based benchmarks remain inadequate because they lack visual-centric tasks and fail to assess the diverse reasoning skills required for real-world decision-making. To address this, we introduce Visual-centric Multiple Abilities Game Evaluation (V-MAGE), a game-based evaluation framework designed to assess visual reasoning capabilities of MLLMs. V-MAGE features five diverse games with 30+ handcrafted levels, testing models on core visual skills such as positioning, trajectory tracking, timing, and visual memory, alongside higher-level reasoning like long-term planning and deliberation. We use V-MAGE to evaluate leading MLLMs, revealing significant challenges in their visual perception and reasoning. In all game environments, the top-performing MLLMs, as determined by Elo rating comparisons, exhibit a substantial performance gap compared to humans. Our findings highlight critical limitations, including various types of perceptual errors made by the models, and suggest potential avenues for improvement from an agent-centric perspective, such as refining agent strategies and addressing perceptual inaccuracies. Code is available at https://github.com/CSU-JPG/V-MAGE.
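Since the ranking in V-MAGE is produced by pairwise Elo comparisons, it may help to recall the update rule; the sketch below is the standard Elo formula, not necessarily V-MAGE's exact configuration (its K-factor and scale are assumptions here).

```python
def elo_update(r_a, r_b, score_a, k=32.0):
    """Standard Elo: score_a is 1 for a win by A, 0.5 for a draw, 0 for a loss."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

# Model A (1200) beats model B (1300) on a level: A gains what B loses.
print(elo_update(1200.0, 1300.0, 1.0))  # -> (~1220.5, ~1279.5)
```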
Vehicle Routing Problems (VRP) are an extension of the Traveling Salesperson Problem and are a fundamental NP-hard challenge in combinatorial optimization. Solving VRP in real-time at large scale has become critical in numerous applications, from growing markets like last-mile delivery to emerging use-cases like interactive logistics planning. Such applications involve solving similar problem instances repeatedly, yet current state-of-the-art solvers treat each instance on its own without leveraging previous examples. We introduce a novel optimization framework that uses a reinforcement learning agent - trained on prior instances - to quickly generate initial solutions, which are then further optimized by genetic algorithms. Our framework, Evolutionary Algorithm with Reinforcement Learning Initialization (EARLI), consistently outperforms current state-of-the-art solvers across various time scales. For example, EARLI handles vehicle routing with 500 locations within 1s, 10x faster than current solvers for the same solution quality, enabling applications like real-time and interactive routing. EARLI can generalize to new data, as demonstrated on real e-commerce delivery data of a previously unseen city. Our hybrid framework presents a new way to combine reinforcement learning and genetic algorithms, paving the way for closer interdisciplinary collaboration between AI and optimization communities towards real-time optimization in diverse domains.
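The hybrid scheme can be pictured with a stand-in for the learned initializer: below, a greedy nearest-neighbor constructor plays the role of the RL policy's fast initial solution, which a small evolutionary loop (2-opt style mutations with survivor selection) then refines. Everything here, names included, is an illustrative sketch on a TSP-like instance, not EARLI itself.

```python
import math, random

def tour_length(tour, pts):
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def greedy_init(pts):
    """Stand-in for the RL policy: fast greedy nearest-neighbor construction."""
    todo, tour = set(range(1, len(pts))), [0]
    while todo:
        nxt = min(todo, key=lambda j: math.dist(pts[tour[-1]], pts[j]))
        todo.remove(nxt)
        tour.append(nxt)
    return tour

def evolve(tour, pts, generations=2000):
    """Refine the seed with 2-opt style segment reversals, keeping improvements."""
    best, best_len = tour[:], tour_length(tour, pts)
    for _ in range(generations):
        i, j = sorted(random.sample(range(len(best)), 2))
        cand = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
        cand_len = tour_length(cand, pts)
        if cand_len < best_len:
            best, best_len = cand, cand_len
    return best, best_len

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(60)]
seed = greedy_init(pts)
refined, length = evolve(seed, pts)
print(f"seed {tour_length(seed, pts):.3f} -> refined {length:.3f}")
```

The point of the design is visible even in this toy: a good seed lets the evolutionary stage spend its whole budget improving rather than discovering a feasible structure from scratch.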
Safety is critical during human-robot interaction. But because people are inherently unpredictable, it is often difficult for robots to plan safe behaviors. Instead of relying on our ability to anticipate humans, here we identify robot policies that are robust to unexpected human decisions. We achieve this by formulating human-robot interaction as a zero-sum game, where (in the worst case) the human's actions directly conflict with the robot's objective. Solving for the Nash Equilibrium of this game provides robot policies that maximize safety and performance across a wide range of human actions. Existing approaches attempt to find these optimal policies by leveraging Hamilton-Jacobi analysis (which is intractable) or linear-quadratic approximations (which are inexact). By contrast, in this work we propose a computationally efficient and theoretically justified method that converges towards the Nash Equilibrium policy. Our approach (which we call MCLQ) leverages linear-quadratic games to obtain an initial guess at safe robot behavior, and then iteratively refines that guess with a Monte Carlo search. Not only does MCLQ provide real-time safety adjustments, but it also enables the designer to tune how conservative the robot is, preventing the system from focusing on unrealistic human behaviors. Our simulations and user study suggest that this approach advances safety in terms of both computation time and expected performance. See videos of our experiments here: https://youtu.be/KJuHeiWVuWY.
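To make the two-stage idea concrete, here is a toy version on scalar dynamics x' = x + u + d: an LQ-style gain chosen while ignoring the adversary provides the initial guess, and a Monte Carlo search then perturbs that gain and keeps whichever candidate has the best worst-case cost over sampled adversary behaviors. This is a didactic sketch of the LQ-seed-plus-Monte-Carlo-refinement pattern, not the authors' MCLQ; all constants are assumptions.

```python
import random

def worst_case_cost(k_robot, adversary_gains, x0=1.0, horizon=20):
    """Evaluate a robot gain against each sampled adversary; return the worst cost."""
    worst = 0.0
    for k_adv in adversary_gains:
        x, cost = x0, 0.0
        for _ in range(horizon):
            u, d = -k_robot * x, k_adv * x     # robot stabilizes, adversary pushes away
            cost += x * x + 0.1 * u * u
            x = x + u + d
        worst = max(worst, cost)
    return worst

random.seed(1)
adversaries = [random.uniform(0.0, 0.4) for _ in range(32)]  # sampled human behaviors
k = 0.5                              # LQ-style initial guess (adversary ignored)
best = worst_case_cost(k, adversaries)
for _ in range(500):                 # Monte Carlo refinement around the current guess
    cand = k + random.gauss(0.0, 0.1)
    c = worst_case_cost(cand, adversaries)
    if c < best:
        k, best = cand, c
print(f"refined gain {k:.3f}, worst-case cost {best:.3f}")
```

Tuning how adversarial the sampled behaviors are (here, the range of `k_adv`) is the knob the abstract describes for controlling how conservative the robot ends up being.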
Inverse dynamics computation is a critical component in robot control, planning, and simulation, enabling the calculation of joint torques required to achieve a desired motion. This paper presents a ROS2-based software library designed to solve the inverse dynamics problem for robotic systems. The library is built around an abstract class with three concrete implementations: one for simulated robots and two for real UR10 and Franka robots. This contribution aims to provide a flexible, extensible, robot-agnostic solution to inverse dynamics, suitable for both simulation and real-world scenarios involving planning and control applications. The related software is available at https://github.com/ros2-gbp/ros2-gbp-github-org/issues/732.
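The abstract-class design can be sketched in a few lines; the class and method names below are hypothetical stand-ins, not the library's actual ROS2 interfaces, and the simulated model is a deliberately trivial placeholder.

```python
from abc import ABC, abstractmethod
from typing import Sequence

class InverseDynamicsSolver(ABC):
    """Robot-agnostic interface: joint state in, joint torques out."""

    @abstractmethod
    def compute_torques(self, q: Sequence[float], qd: Sequence[float],
                        qdd: Sequence[float]) -> list:
        """Return torques realizing acceleration qdd at state (q, qd)."""

class SimulatedRobotSolver(InverseDynamicsSolver):
    def compute_torques(self, q, qd, qdd):
        # Toy stand-in model: unit inertia plus viscous friction per joint.
        return [a + 0.1 * v for v, a in zip(qd, qdd)]

class UR10Solver(InverseDynamicsSolver):
    def compute_torques(self, q, qd, qdd):
        raise NotImplementedError("would call the UR10 dynamics model / driver")

solver: InverseDynamicsSolver = SimulatedRobotSolver()
print(solver.compute_torques([0.0], [0.2], [1.0]))  # -> [1.02]
```

The value of the pattern is that planners and controllers depend only on the abstract interface, so swapping the simulated solver for a real-robot one requires no caller changes.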
The vast majority of Multi-Agent Path Finding (MAPF) methods with completeness guarantees require planning full horizon paths. However, planning full horizon paths can take too long and be impractical in real-world applications. Instead, real-time planning and execution, which only allows the planner a finite amount of time before executing and replanning, is more practical for real-world multi-agent systems. Several methods utilize real-time planning schemes, but none are provably complete, which can lead to livelock or deadlock. Our main contribution is to present the first Real-Time MAPF method with provable completeness guarantees. We do this by leveraging LaCAM (Okumura 2023) in an incremental fashion. Our results show how we can iteratively plan for congested environments with a cutoff time of milliseconds while still maintaining the same success rate as full horizon LaCAM. We also show how it can be used with a single-step learned MAPF policy. The proposed Real-Time LaCAM also provides a general mechanism for using iterative constraints to achieve completeness in future real-time MAPF algorithms.
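The real-time regime the paper targets can be summarized by a generic loop: plan under a wall-clock cutoff, execute one step of whatever partial plan is available, then replan from the new state. The single-agent 1D toy below illustrates only this setting, not the incremental LaCAM machinery.

```python
import time

def step_toward(state, goal):
    return 1 if goal > state else -1

def plan_with_cutoff(state, goal, cutoff_s):
    """Toy anytime planner on a 1D line: extends a partial plan until time runs out."""
    deadline, plan, cur = time.monotonic() + cutoff_s, [], state
    while time.monotonic() < deadline and cur != goal:
        a = step_toward(cur, goal)
        plan.append(a)
        cur += a
    return plan                        # possibly partial: it may not reach the goal

state, goal = 0, 25
while state != goal:
    plan = plan_with_cutoff(state, goal, cutoff_s=0.001)  # millisecond budget
    state += plan[0]                   # execute one step, then replan
print("reached", state)
```

The completeness question the paper answers is precisely what this loop leaves open: whether repeatedly executing prefixes of partial multi-agent plans can be guaranteed to terminate rather than livelock.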
Monte-Carlo tree search (MCTS) has driven many recent breakthroughs in deep reinforcement learning (RL). However, scaling MCTS to parallel compute has proven challenging in practice, which has motivated alternative planners like sequential Monte-Carlo (SMC). Many of these SMC methods adopt particle filters for smoothing through a reformulation of RL as a policy inference problem. Yet, design choices persisting in these particle filters often conflict with the aim of online planning in RL, which is to obtain a policy improvement at the start of planning. Drawing inspiration from MCTS, we tailor SMC planners specifically for RL by improving data generation within the planner through constrained action sampling and explicit terminal state handling, as well as improving policy and value target estimation. This leads to our Trust-Region Twisted SMC (TRT-SMC), which shows improved runtime and sample-efficiency over baseline MCTS and SMC methods in both discrete and continuous domains.
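At its core, an SMC planner maintains a set of rollout particles, weights them by return, and resamples; the root-action distribution of the surviving particles is the improved policy. The toy below does this for a 1D chain with a sparse goal, purely as an illustration of the particle-filter planning pattern, not of TRT-SMC's twisted targets or trust region.

```python
import math, random

N, DEPTH, TEMP = 256, 10, 1.0
random.seed(0)

def reward(state):
    return 1.0 if state == 5 else 0.0        # sparse goal at x = 5 on an integer line

# Each particle is (state, root action, cumulative return).
particles = [(a0, a0, reward(a0)) for a0 in
             (random.choice((-1, 1)) for _ in range(N))]

for _ in range(DEPTH - 1):
    # Propagate each particle with an action from the (uniform) prior policy.
    moved = []
    for s, root, ret in particles:
        a = random.choice((-1, 1))
        moved.append((s + a, root, ret + reward(s + a)))
    # Reweight by exponentiated return, then resample (multinomial).
    weights = [math.exp(ret / TEMP) for _, _, ret in moved]
    particles = random.choices(moved, weights=weights, k=N)

# The improved root policy is the empirical root-action distribution.
p_right = sum(1 for _, root, _ in particles if root == 1) / N
print(f"P(first action = +1) ~ {p_right:.2f}")
```

The conflict the abstract mentions is visible here: classic smoothing filters care about the whole trajectory posterior, whereas the planner only needs a sharp estimate at the root.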
Pre-trained models (PTMs) are becoming a common component in open-source software (OSS) development, yet their roles, maintenance practices, and lifecycle challenges remain underexplored. This report presents a plan for an exploratory study to investigate how PTMs are utilized, maintained, and tested in OSS projects, focusing on models hosted on platforms like Hugging Face and PyTorch Hub. We plan to explore how PTMs are used in open-source software projects and their related maintenance practices by mining software repositories that use PTMs and analyzing their code-base, historical data, and reported issues. This study aims to provide actionable insights into improving the use and sustainability of PTMs in open-source projects, and to take a step toward a foundation for advancing software engineering practices in the context of model dependencies.
Automatic view positioning is crucial for cardiac computed tomography (CT) examinations, including disease diagnosis and surgical planning. However, it is highly challenging due to individual variability and the large 3D search space. Existing work needs labor-intensive and time-consuming manual annotations to train view-specific models, which are limited to predicting only a fixed set of planes. However, in real clinical scenarios, the challenge of positioning semantic 2D slices with any orientation into the varying coordinate space of an arbitrary 3D volume remains unsolved. We thus introduce a novel framework, AVP-AP, the first to use Atlas Prompting for self-supervised Automatic View Positioning in the 3D CT volume. Specifically, this paper first proposes an atlas prompting method, which generates a 3D canonical atlas and trains a network to map slices into their corresponding positions in the atlas space in a self-supervised manner. Then, guided by atlas prompts corresponding to the given query images in a reference CT, we identify the coarse positions of slices in the target CT volume using a rigid transformation between the 3D atlas and the target CT volume, effectively reducing the search space. Finally, we refine the coarse positions by maximizing the similarity between the predicted slices and the query images in the feature space of a given foundation model. Our framework is flexible and efficient compared to existing methods, outperforming them by 19.8% in average structural similarity (SSIM) for arbitrary view positioning and achieving a 9% SSIM gain in the two-chamber view relative to four radiologists. Meanwhile, experiments on a public dataset validate our framework's generalizability.
In this study, we present a simple and intuitive method for accelerating optimal Reeds-Shepp path computation. Our approach uses geometrical reasoning to analyze the behavior of optimal paths, resulting in a new partitioning of the state space and a further reduction in the minimal set of viable paths. We revisit and reimplement classic methodologies from the literature, which lack contemporary open-source implementations, to serve as benchmarks for evaluating our method. Additionally, we address the under-specified Reeds-Shepp planning problem where the final orientation is unspecified. We perform exhaustive experiments to validate our solutions. Compared to the modern C++ implementation of the original Reeds-Shepp solution in the Open Motion Planning Library, our method demonstrates a 15x speedup, while classic methods achieve a 5.79x speedup. Both approaches exhibit machine-precision differences in path lengths compared to the original solution. We release our proposed C++ implementations for both the accelerated and under-specified Reeds-Shepp problems as open-source code.
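For reference, the OMPL baseline the authors benchmark against can be queried directly. The sketch below uses OMPL's Python bindings; it assumes the bindings are installed, and the exact accessor API may differ between OMPL versions, so treat it as a starting point rather than a guaranteed recipe.

```python
from ompl import base as ob

space = ob.ReedsSheppStateSpace(1.0)   # minimum turning radius = 1.0
start, goal = ob.State(space), ob.State(space)
start().setX(0.0); start().setY(0.0); start().setYaw(0.0)
goal().setX(3.0);  goal().setY(4.0);  goal().setYaw(1.57)

# distance() returns the optimal Reeds-Shepp path length between the two poses.
print(space.distance(start(), goal()))
```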
This study analyzes the financial resilience of agricultural and food production companies in Spain amid the Ukraine-Russia war using cluster analysis based on financial ratios. This research utilizes centered log-ratios to transform financial ratios for compositional data analysis. The dataset comprises financial information from 1197 firms in Spain's agricultural and food sectors over the period 2021-2023. The analysis reveals distinct clusters of firms with varying financial performance, characterized by metrics of solvency and profitability. The results highlight an increase in resilient firms by 2023, underscoring sectoral adaptation to the conflict's economic challenges. These findings together provide insights for stakeholders and policymakers to improve sectoral stability and strategic planning.
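The centered log-ratio (clr) transform used before clustering is standard in compositional data analysis: each part is log-divided by the geometric mean of its composition. A minimal sketch with scikit-learn follows, assuming strictly positive components (which clr requires); the random ratios stand in for the paper's solvency and profitability metrics.

```python
import numpy as np
from sklearn.cluster import KMeans

def clr(x):
    """Centered log-ratio: log of each part minus the row-wise mean of the logs."""
    logs = np.log(x)
    return logs - logs.mean(axis=1, keepdims=True)

rng = np.random.default_rng(7)
ratios = rng.uniform(0.05, 1.0, size=(1197, 4))   # stand-in financial ratio panel
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(clr(ratios))
print(np.bincount(labels))                        # firms per cluster
```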
Effective demand forecasting is critical for inventory management, production planning, and decision making across industries. Selecting the appropriate model and suitable features to efficiently capture patterns in the data is one of the main challenges in demand forecasting. In reality, this becomes even more complicated when the recorded sales have zeroes, which can happen naturally or due to some anomalies, such as stockouts and recording errors. Mistreating the zeroes can lead to the application of inappropriate forecasting methods, thus resulting in poor decision making. Furthermore, the demand itself can have different fundamental characteristics, and being able to distinguish one type from another might bring substantial benefits in terms of accuracy and thus decision making. We propose a two-stage model-based classification framework that, in the first step, identifies artificially occurring zeroes, and then classifies demand into one of the possible types: regular/intermittent, intermittent smooth/lumpy, fractional/count. The framework utilises statistical modelling and information criteria to detect anomalous zeroes and then classify demand into those categories. We then argue that different types of demand need different features, and show empirically that such tailored features tend to increase the accuracy of the forecasting methods compared to those applied directly to the dataset without the generated features and the two-stage framework. Our general practical recommendation based on that is to use the mixture approach for intermittent demand, capturing the demand sizes and demand probability separately, as it seems to improve the accuracy of different forecasting approaches.
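The first stage, flagging series whose zeroes look artificial, can be illustrated by an information-criterion comparison between a plain count model and a zero-inflated one. The hedged sketch below uses statsmodels on synthetic data; the paper's actual model set and criteria may differ.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(3)
demand = rng.poisson(4.0, size=200)
demand[rng.random(200) < 0.3] = 0          # inject "artificial" zeroes (e.g., stockouts)

exog = np.ones((len(demand), 1))           # intercept-only models for illustration
aic_pois = sm.Poisson(demand, exog).fit(disp=0).aic
aic_zip = ZeroInflatedPoisson(demand, exog, exog_infl=exog).fit(disp=0).aic
print(f"Poisson AIC {aic_pois:.1f} vs ZIP AIC {aic_zip:.1f}")
if aic_zip < aic_pois:
    print("zero inflation supported: treat the excess zeroes as anomalous")
```

A lower AIC for the zero-inflated model is evidence that the zeroes are not explained by ordinary demand variation, which is the trigger for routing the series to a different feature set and method.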
This work proposes a jointly optimized trajectory generation and camera control approach, enabling an autonomous agent, such as an unmanned aerial vehicle (UAV) operating in 3D environments, to plan and execute coverage trajectories that maximally cover the surface area of a 3D object of interest. Specifically, the UAV's kinematic and camera control inputs are jointly optimized over a rolling planning horizon to achieve complete 3D coverage of the object. The proposed controller incorporates ray-tracing into the planning process to simulate the propagation of light rays, thereby determining the visible parts of the object through the UAV's camera. This integration enables the generation of precise look-ahead coverage trajectories. The coverage planning problem is formulated as a rolling finite-horizon optimal control problem and solved using mixed-integer programming techniques. Extensive real-world and synthetic experiments validate the performance of the proposed approach.
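The visibility test at the heart of such ray-traced coverage planning reduces to ray-triangle intersection against the object mesh; a standard Möller-Trumbore implementation is shown below as an illustration of that building block, not the authors' planner.

```python
import numpy as np

def ray_hits_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore: return the hit distance t, or None if the ray misses."""
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(direction, e2)
    det = e1.dot(p)
    if abs(det) < eps:                      # ray parallel to the triangle plane
        return None
    inv, s = 1.0 / det, origin - v0
    u = s.dot(p) * inv
    if not 0.0 <= u <= 1.0:
        return None
    q = np.cross(s, e1)
    v = direction.dot(q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = e2.dot(q) * inv
    return t if t > eps else None           # hit must lie in front of the camera

# A face is visible from the UAV if the first hit along the view ray is that face.
cam = np.array([0.0, 0.0, 5.0])
ray = np.array([0.0, 0.0, -1.0])
tri = [np.array(p) for p in ([-1.0, -1.0, 0.0], [1.0, -1.0, 0.0], [0.0, 1.0, 0.0])]
print(ray_hits_triangle(cam, ray, *tri))    # -> 5.0
```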
This work proposes a coverage controller that enables an aerial team of distributed autonomous agents to collaboratively generate non-myopic coverage plans over a rolling finite horizon, aiming to cover specific points on the surface area of a 3D object of interest. The collaborative coverage problem, formulated as a distributed model predictive control problem, optimizes the agents' motion and camera control inputs, while considering inter-agent constraints aimed at reducing work redundancy. The proposed coverage controller integrates constraints based on light-path propagation techniques to predict the parts of the object's surface that are visible given the agents' anticipated future states. This work also demonstrates how complex, non-linear visibility assessment constraints can be converted into logical expressions that are embedded as binary constraints into a mixed-integer optimization framework. The proposed approach has been demonstrated through simulations and practical applications for inspecting buildings with unmanned aerial vehicles (UAVs).
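The binary embedding of visibility logic can be seen in miniature as a set-cover MIP: precomputed visibility flags link binary viewpoint choices to surface-point coverage. The PuLP sketch below uses toy data; the paper's constraints additionally couple visibility to the agents' predicted states, which this deliberately omits.

```python
import pulp

# visible[i][j] = 1 if surface point j is visible from candidate viewpoint i.
visible = [[1, 1, 0, 0],
           [0, 1, 1, 0],
           [0, 0, 1, 1]]
n_views, n_points = len(visible), len(visible[0])

prob = pulp.LpProblem("coverage", pulp.LpMinimize)
x = [pulp.LpVariable(f"x{i}", cat="Binary") for i in range(n_views)]  # pick view i?
prob += pulp.lpSum(x)                                  # minimize chosen viewpoints
for j in range(n_points):                              # every point must be covered
    prob += pulp.lpSum(visible[i][j] * x[i] for i in range(n_views)) >= 1

prob.solve(pulp.PULP_CBC_CMD(msg=0))
print([int(v.value()) for v in x])   # -> [1, 0, 1]: the two end viewpoints suffice
```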
This study presents residual U-Net (U-ResNet), a deep learning surrogate model for predicting hemodynamic fields in two-dimensional asymmetric stenotic channels at Reynolds numbers ranging from 200 to 800. By integrating residual connections with multi-scale feature extraction, U-ResNet achieves exceptional accuracy while significantly reducing computational costs compared to computational fluid dynamics (CFD) approaches. Comprehensive evaluation against U-Net, Fourier Neural Operator (FNO), and U-Net-enhanced Fourier Neural Operator demonstrates U-ResNet's superior performance in capturing sharp hemodynamic gradients and complex flow features. Notably, U-ResNet demonstrates robust generalization to interpolated Reynolds numbers without retraining, a capability rarely achieved in existing models. The model's non-dimensional formulation ensures scalability across vessel sizes and anatomical locations, enhancing its applicability to diverse clinical scenarios. Statistical analysis of prediction errors reveals that U-ResNet maintains substantially narrower error distributions compared to spectral methods, confirming its reliability for critical hemodynamic assessment. These advances position U-ResNet as a promising tool for real-time clinical decision support, treatment planning, and medical device optimization, with future work focusing on extension to three-dimensional geometries and integration with patient-specific data.
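The architectural ingredient named in the title, a residual connection around convolutional layers, is compact in PyTorch. This is a generic block with channel counts and normalization chosen for illustration; U-ResNet's exact configuration may differ.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions whose output is added back to the input (skip path)."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(x + self.body(x))   # residual addition eases gradient flow

block = ResidualBlock(32)
print(block(torch.randn(1, 32, 64, 64)).shape)  # -> torch.Size([1, 32, 64, 64])
```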
Autonomous driving systems must operate safely in human-populated indoor environments, where challenges such as limited perception and occlusion sensitivity arise when relying solely on onboard sensors. These factors generate difficulties in the accurate recognition of human intentions and the generation of comfortable, socially aware trajectories. To address these issues, we propose SAP-CoPE, a social-aware planning framework that integrates cooperative infrastructure with a novel 3D human pose estimation method and a model predictive control-based controller. This real-time framework formulates an optimization problem that accounts for uncertainty propagation in the camera projection matrix while ensuring human joint coherence. The proposed method is adaptable to single- or multi-camera configurations and can incorporate sparse LiDAR point-cloud data. To enhance safety and comfort in human environments, we integrate a human personal space field based on human pose into a model predictive controller, enabling the system to navigate while avoiding discomfort zones. Extensive evaluations in both simulated and real-world settings demonstrate the effectiveness of our approach in generating socially aware trajectories for autonomous systems.
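One common way to encode a personal-space field for an MPC cost is an anisotropic Gaussian around each person, stretched in the direction the person faces. The sketch below shows this generic form; the parameter values are assumptions and it is not necessarily SAP-CoPE's exact field.

```python
import numpy as np

def personal_space_cost(robot_xy, person_xy, person_yaw,
                        sigma_front=1.2, sigma_side=0.6):
    """Anisotropic Gaussian discomfort: elongated along the person's heading."""
    d = np.asarray(robot_xy) - np.asarray(person_xy)
    c, s = np.cos(person_yaw), np.sin(person_yaw)
    along = c * d[0] + s * d[1]          # component in front of / behind the person
    side = -s * d[0] + c * d[1]          # lateral component
    return np.exp(-0.5 * ((along / sigma_front) ** 2 + (side / sigma_side) ** 2))

# Standing 1 m in front of the person costs more than 1 m beside them.
print(personal_space_cost([1.0, 0.0], [0.0, 0.0], 0.0))  # ~0.71, in front
print(personal_space_cost([0.0, 1.0], [0.0, 0.0], 0.0))  # ~0.25, to the side
```

Added as a stage cost in the MPC objective, this field makes trajectories that cut in front of a person strictly more expensive than trajectories that pass behind or beside them.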
Chronic kidney disease (CKD) is a growing global health concern, necessitating precise and efficient image analysis to aid diagnosis and treatment planning. Automated segmentation of kidney pathology images plays a central role in facilitating clinical workflows, yet conventional segmentation models often require delicate threshold tuning. This paper proposes a novel Cascaded Threshold-Integrated U-Net (CTI-Unet) to overcome the limitations of single-threshold segmentation. By sequentially integrating multiple thresholded outputs, our approach can reconcile noise suppression with the preservation of finer structural details. Experiments on the challenging KPIs2024 dataset demonstrate that CTI-Unet outperforms state-of-the-art architectures such as nnU-Net, Swin-Unet, and CE-Net, offering a robust and flexible framework for kidney pathology image segmentation.
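The cascaded-threshold idea can be pictured as fusing masks produced at several cutoffs, letting a strict threshold suppress noise while a looser one preserves thin structures. The hysteresis-style fusion below is a simplified illustration of that trade-off, not the paper's exact integration scheme.

```python
import numpy as np
from scipy import ndimage

def cascaded_threshold(prob_map, strict=0.7, loose=0.3):
    """Keep loose-threshold regions only if they contain a strict-threshold seed."""
    loose_mask = prob_map >= loose
    strict_mask = prob_map >= strict
    labels, _ = ndimage.label(loose_mask)        # connected components of loose mask
    keep = np.unique(labels[strict_mask])        # components holding a strict seed
    return np.isin(labels, keep[keep > 0])

rng = np.random.default_rng(0)
prob_map = rng.random((6, 6))     # stand-in for the network's probability output
print(cascaded_threshold(prob_map).astype(int))
```

Isolated low-confidence blobs are dropped (noise suppression), while low-confidence pixels attached to a confident core survive (detail preservation), which is the balance the abstract describes.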
The last few years have witnessed rapid growth in the on-demand delivery market, with many start-ups entering the field. However, not all of these start-ups have succeeded, for various reasons, among them the inability to establish a large enough customer base. In this paper, we address a problem that many on-demand transportation start-ups face: how to establish themselves in a new market. When starting, such companies often have limited fleet resources to serve demand across a city. Depending on the use of the fleet, varying service quality is observed in different areas of the city, and in turn, the service quality impacts the respective growth of demand in each area. Thus, operational fulfillment decisions drive the longer-term demand development. To integrate strategic demand development into real-time fulfillment operations, we propose a two-step approach. First, we derive analytical insights into optimal allocation decisions for a stylized problem. Second, we use these insights to shape the training data of a reinforcement learning strategy for operational real-time fulfillment. Our experiments demonstrate that combining operational efficiency with long-term strategic planning is highly advantageous. Further, we show that the careful shaping of training data is essential for the successful development of demand.
We introduce Lazy-DaSH, an improvement over the recent state-of-the-art multi-robot task and motion planning method DaSH, which scales to more than double the number of robots and objects compared to the original method and achieves an order of magnitude faster planning time when applied to a multi-manipulator object rearrangement problem. We achieve this improvement through a hierarchical approach, where a high-level task planning layer identifies planning spaces required for task completion, and motion feasibility is validated lazily only within these spaces. In contrast, DaSH precomputes the motion feasibility of all possible actions, resulting in higher costs for constructing state space representations. Lazy-DaSH maintains efficient query performance by utilizing a constraint feedback mechanism within its hierarchical structure, ensuring that motion feasibility is effectively conveyed to the query process. By maintaining smaller state space representations, our method significantly reduces both representation construction time and query time. We evaluate Lazy-DaSH in four distinct scenarios, demonstrating its scalability to increasing numbers of robots and objects, as well as its adaptability in resolving conflicts through the constraint feedback mechanism.
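The lazy pattern, validating motion feasibility only when a candidate task plan needs it and feeding discovered infeasibilities back as constraints, can be shown generically. Below, a task graph is searched optimistically, edges are checked only when they appear in the incumbent plan, and failures prune the graph before replanning; this is a schematic of lazy validation with constraint feedback, not Lazy-DaSH itself.

```python
import heapq

def shortest_plan(edges, start, goal):
    """Optimistic Dijkstra over the task graph, ignoring motion feasibility."""
    pq, seen = [(0, start, [start])], set()
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in edges.get(node, []):
            heapq.heappush(pq, (cost + w, nxt, path + [nxt]))
    return None

def motion_feasible(a, b):
    return (a, b) != ("A", "C")          # stand-in for an expensive motion check

edges = {"A": [("C", 1), ("B", 2)], "B": [("C", 1)], "C": [("G", 1)]}
while (plan := shortest_plan(edges, "A", "G")) is not None:
    bad = next(((u, v) for u, v in zip(plan, plan[1:])
                if not motion_feasible(u, v)), None)
    if bad is None:
        print("feasible plan:", plan)    # every edge was validated lazily
        break
    edges[bad[0]] = [e for e in edges[bad[0]] if e[0] != bad[1]]  # constraint feedback
```

Only the edges the search actually proposes ever get the expensive check, which is the source of the construction-time savings the abstract reports.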
One approach to using prior experience in robot motion planning is to store solutions to previously seen problems in a database of paths. Methods that use such databases are characterized by how they query for a path and how they use queries given a new problem. In this work we present a new method, Path Database Guidance (PDG), which innovates on existing work in two ways. First, we use the database to compute a heuristic for determining which nodes of a search tree to expand, in contrast to prior work, which generally pastes the (possibly transformed) queried path or uses it to bias a sampling distribution. We demonstrate that this makes our method more easily composable with other search methods by dynamically interleaving exploration according to a baseline algorithm with exploitation of the database guidance. Second, in contrast to other methods that treat the database as a single fixed prior, our database (and thus our queried heuristic) updates as we search the implicitly defined robot configuration space. We experimentally demonstrate the effectiveness of PDG in a variety of explicitly defined environment distributions in simulation.
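The database-as-heuristic idea can be miniaturized: estimate a node's cost-to-go as the distance to the nearest waypoint on any stored path plus that waypoint's remaining length along its path. The 2D Euclidean sketch below illustrates this querying scheme only; PDG's actual heuristic and its online database updates are not reproduced here.

```python
import numpy as np

def database_heuristic(node, paths):
    """Min over stored paths of: dist to a waypoint + that waypoint's cost-to-go."""
    best = np.inf
    for path in paths:                              # path: (n, 2) array of waypoints
        seg = np.linalg.norm(np.diff(path, axis=0), axis=1)
        cost_to_go = np.concatenate([np.cumsum(seg[::-1])[::-1], [0.0]])
        d = np.linalg.norm(path - node, axis=1)
        best = min(best, float(np.min(d + cost_to_go)))
    return best

paths = [np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [2.0, 1.0]])]
print(database_heuristic(np.array([1.1, 0.2]), paths))  # ~1.20: near the path's end
```

A tree search would expand the frontier node minimizing g + this heuristic, which is what lets stored experience steer exploration without pasting any stored path verbatim.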
Context: Smart contract vulnerabilities pose significant security risks for the Ethereum ecosystem, driving the development of automated tools for detection and mitigation. Smart contracts are written in Solidity, a programming language that is rapidly evolving to add features and improvements to enhance smart contract security. New versions of Solidity change the compilation process, potentially affecting how tools interpret and analyze smart contract code. Objective: In such a continuously evolving landscape, we aim to investigate the compatibility of detection tools with Solidity versions. More specifically, we present a plan to study detection tools by empirically assessing (i) their compatibility with the Solidity pragma directives, (ii) their detection effectiveness, and (iii) their execution time across different versions of Solidity. Method: We will conduct an exploratory study by running several tools and collecting a large number of real-world smart contracts to create a balanced dataset. We will track and analyze the tool execution through SmartBugs, a framework that facilitates the tool execution and allows the integration of new tools.
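The compatibility dimension of the planned study hinges on each contract's pragma directive; extracting and bucketing these is straightforward, as in this sketch (the regex is simplified to common pragma forms and the sample sources are invented).

```python
import re
from collections import Counter

PRAGMA = re.compile(r"pragma\s+solidity\s+([^;]+);")

sources = [
    "pragma solidity ^0.8.0; contract A {}",
    "pragma solidity >=0.4.22 <0.6.0; contract B {}",
    "pragma solidity 0.5.17; contract C {}",
]

versions = Counter()
for src in sources:
    m = PRAGMA.search(src)
    if m:
        versions[m.group(1).strip()] += 1   # bucket contracts by version constraint
print(versions)  # drives which Solidity compilers each tool must be tested against
```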
Zero Trust is the new cybersecurity model that challenges the traditional one by promoting continuous verification of users, devices, and applications, regardless of their location or origin. This model is critical for reducing the attack surface and preventing lateral movement without relying on implicit trust. Adopting the zero trust principle in Intelligent Transportation Systems (ITS), especially in the context of connected vehicles (CVs), presents an adequate solution in the face of increasing cyber threats, thereby strengthening the ITS environment. This paper offers an understanding of Zero Trust security through a comprehensive review of existing literature, principles, and challenges. It specifically examines its applications in emerging technologies, particularly within connected vehicles, addressing potential issues and cyber threats faced by CVs. Inclusion/exclusion criteria for the systematic literature review were planned alongside a bibliometric analysis. Moreover, a keyword co-occurrence analysis was performed, indicating trends and general themes for the Zero Trust model, Zero Trust implementation, and Zero Trust application. Furthermore, the paper explores various ZT models proposed in the literature for connected vehicles, shedding light on the challenges associated with their integration into CV systems. Future directions of this research will focus on incorporating Zero Trust principles within Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I) communication paradigms. This initiative intends to enhance the security posture and safety protocols within interconnected vehicular networks. The proposed research seeks to address the unique cybersecurity vulnerabilities inherent in the highly dynamic nature of vehicular communication systems.
As generative AI (GenAI) models are increasingly explored for educational applications, understanding educator preferences for AI-generated lesson plans is critical for their effective integration into K-12 instruction. This exploratory study compares lesson plans authored by human curriculum designers, a fine-tuned LLaMA-2-13b model trained on K-12 content, and a customized GPT-4 model to evaluate their pedagogical quality across multiple instructional measures: warm-up activities, main tasks, cool-down activities, and overall quality. Using a large-scale preference study with K-12 math educators, we examine how preferences vary across grade levels and instructional components. We employ both qualitative and quantitative analyses. The raw preference results indicate that human-authored lesson plans are generally favored, particularly for elementary education, where educators emphasize student engagement, scaffolding, and collaborative learning. However, AI-generated models demonstrate increasing competitiveness in cool-down tasks and structured learning activities, particularly in high school settings. Beyond quantitative results, we conduct thematic analysis using LDA and manual coding to identify key factors influencing educator preferences. Educators value human-authored plans for their nuanced differentiation, real-world contextualization, and student discourse facilitation. Meanwhile, AI-generated lesson plans are often praised for their structure and adaptability for specific instructional tasks. Findings suggest a human-AI collaborative approach to lesson planning, where GenAI can serve as an assistive tool rather than a replacement for educator expertise. This study contributes to the growing discourse on responsible AI integration in education, highlighting both opportunities and challenges in leveraging GenAI for curriculum development.
Model Predictive Control (MPC) is a widely adopted control paradigm that leverages predictive models to estimate future system states and optimize control inputs accordingly. However, while MPC excels in planning and control, it lacks the capability for environmental perception, leading to failures in complex and unstructured scenarios. To address this limitation, we introduce Vision-Language Model Predictive Control (VLMPC), a robotic manipulation planning framework that integrates the perception power of vision-language models (VLMs) with MPC. VLMPC utilizes a conditional action sampling module that takes a goal image or language instruction as input and leverages VLM to generate candidate action sequences. These candidates are fed into a video prediction model that simulates future frames based on the actions. In addition, we propose an enhanced variant, Traj-VLMPC, which replaces video prediction with motion trajectory generation to reduce computational complexity while maintaining accuracy. Traj-VLMPC estimates motion dynamics conditioned on the candidate actions, offering a more efficient alternative for long-horizon tasks and real-time applications. Both VLMPC and Traj-VLMPC select the optimal action sequence using a VLM-based hierarchical cost function that captures both pixel-level and knowledge-level consistency between the current observation and the task input. We demonstrate that both approaches outperform existing state-of-the-art methods on public benchmarks and achieve excellent performance in various real-world robotic manipulation tasks. Code is available at https://github.com/PPjmchen/VLMPC.
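Stripped of the VLM and the video model, the control loop is sampling-based MPC: draw candidate action sequences, score each with a cost function, and execute only the first action of the best sequence before replanning. The stubs below stand in for the VLM-conditioned sampler and the hierarchical cost, which are the paper's actual contributions; everything here is a toy 2D reaching task.

```python
import numpy as np

rng = np.random.default_rng(0)
GOAL = np.array([2.0, 1.0])

def sample_candidates(n=64, horizon=5):
    """Stand-in for VLM-conditioned sampling: random 2D displacement sequences."""
    return rng.normal(0.0, 0.3, size=(n, horizon, 2))

def cost(state, actions):
    """Stand-in for the hierarchical VLM cost: predicted endpoint distance to goal."""
    return float(np.linalg.norm(state + actions.sum(axis=0) - GOAL))

state = np.zeros(2)
for _ in range(20):
    candidates = sample_candidates()
    best = min(candidates, key=lambda a: cost(state, a))
    state = state + best[0]              # execute the first action, then replan
print("final state:", np.round(state, 2), "goal:", GOAL)
```

Traj-VLMPC's efficiency gain corresponds to making `cost` cheap: scoring trajectories directly instead of rendering predicted video frames for every candidate.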
Existing language models (LMs) often exhibit a Western-centric bias and struggle to represent diverse cultural knowledge. Previous attempts to address this rely on synthetic data and express cultural knowledge only in English. In this work, we study whether a small amount of human-written, multilingual cultural preference data can improve LMs across various model families and sizes. We first introduce CARE, a multilingual resource of 24.1k responses with human preferences on 2,580 questions about Chinese and Arab cultures, all carefully annotated by native speakers and offering more balanced coverage. Using CARE, we demonstrate that cultural alignment improves existing LMs beyond generic resources without compromising general capabilities. Moreover, we evaluate the cultural awareness of LMs, native speakers, and retrieved web content when queried in different languages. Our experiment reveals regional disparities among LMs, which may also be reflected in the documentation gap: native speakers often take everyday cultural commonsense and social norms for granted, while non-natives are more likely to actively seek out and document them. CARE is publicly available at https://github.com/Guochry/CARE (we plan to add Japanese data in the near future).
The proliferation of connected and automated vehicles (CAVs) has positioned mixed traffic environments, which encompass both CAVs and human-driven vehicles (HDVs), as critical components of emerging mobility systems. Signalized intersections are paramount for optimizing transportation efficiency and enhancing energy economy, as they inherently induce stop-and-go traffic dynamics. In this paper, we present an integrated framework that concurrently optimizes signal timing and CAV trajectories at signalized intersections, with the dual objectives of maximizing traffic throughput and minimizing energy consumption for CAVs. We first formulate an optimal control strategy for CAVs that prioritizes trajectory planning to circumvent state constraints, while incorporating the impact of signal timing and HDV behavior. Furthermore, we introduce a traffic signal control methodology that dynamically adjusts signal phases based on vehicular density per lane, while mitigating disruption for CAVs scheduled to traverse the intersection. Acknowledging the system's inherent dynamism, we also explore event-triggered replanning mechanisms that enable CAVs to iteratively refine their planned trajectories in response to the emergence of more efficient routing options. The efficacy of our proposed framework is evaluated through comprehensive simulations in MATLAB.
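The signal-control ingredient can be reduced to its core rule: switch phases when the density on the red-served lanes sufficiently exceeds that on the green-served ones and a minimum green time has elapsed. The thresholds and names below are illustrative assumptions, not the paper's calibrated controller.

```python
def should_switch(density_green, density_red, elapsed_green_s,
                  min_green_s=10.0, ratio=1.5):
    """Density-based actuation with a minimum-green guard against rapid toggling."""
    if elapsed_green_s < min_green_s:
        return False
    return density_red > ratio * max(density_green, 1e-6)

print(should_switch(density_green=0.1, density_red=0.4, elapsed_green_s=12.0))  # True
print(should_switch(density_green=0.1, density_red=0.4, elapsed_green_s=5.0))   # False
```

In the paper's framework, a rule of this kind is additionally constrained so as not to strand CAVs already committed to crossing, which is the coupling between the two optimization layers.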
Cloth manipulation is a difficult problem mainly because of the non-rigid nature of cloth, which makes a good representation of deformation essential. We present a new representation for the deformation state of clothes. First, we propose the dGLI disk representation, based on topological indices computed for segments on the edges of the cloth mesh border that are arranged on a circular grid. The heat-map of the dGLI disk uncovers patterns that correspond to features of the cloth state that are consistent for different shapes, sizes, and positions of the cloth, like the corners and the fold locations. We then abstract these important features from the dGLI disk onto a circle, calling it the Cloth StatE representation (CloSE). This representation is compact, continuous, and general for different shapes. Finally, we show the strengths of this representation in two relevant applications: semantic labeling and high- and low-level planning. The code, the dataset and the video can be accessed at: https://jaykamat99.github.io/close-representation
Gaussian Process Motion Planning (GPMP) is a widely used framework for generating smooth trajectories within a limited compute time, an essential requirement in many robotic applications. However, traditional GPMP approaches often struggle with enforcing hard nonlinear constraints and rely on Maximum a Posteriori (MAP) solutions that disregard the full Bayesian posterior. This limits planning diversity and ultimately hampers decision-making. Recent efforts to integrate Stein Variational Gradient Descent (SVGD) into motion planning have shown promise in handling complex constraints. Nonetheless, these methods still face persistent challenges, such as difficulties in strictly enforcing constraints and inefficiencies when the probabilistic inference problem is poorly conditioned. To address these issues, we propose a novel constrained Stein Variational Gaussian Process Motion Planning (cSGPMP) framework, incorporating a GPMP prior specifically designed for trajectory optimization under hard constraints. Our approach improves the efficiency of particle-based inference while explicitly handling nonlinear constraints. This advancement significantly broadens the applicability of GPMP to motion planning scenarios demanding robust Bayesian inference, strict constraint adherence, and computational efficiency within a limited time. We validate our method on standard benchmarks, achieving an average success rate of 98.57% across 350 planning tasks, significantly outperforming competitive baselines. This demonstrates the ability of our method to discover and use diverse trajectory modes, enhancing flexibility and adaptability in complex environments, and delivering significant improvements over standard baselines without incurring major computational costs.
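For readers unfamiliar with the inference engine, SVGD moves a set of particles along a kernelized gradient flow: an attraction term follows the log-density gradient while a repulsion term keeps particles diverse, which is what yields multiple trajectory modes. Below is a minimal NumPy implementation on a 2D Gaussian target, not the constrained GP trajectory posterior of cSGPMP.

```python
import numpy as np

def grad_log_p(x):
    """Score of the target, here a standard 2D Gaussian: grad log N(0, I) = -x."""
    return -x

def svgd_step(x, step=0.1):
    """One SVGD update: kernel-weighted attraction plus repulsive kernel gradients."""
    diff = x[:, None, :] - x[None, :, :]          # pairwise differences, (n, n, d)
    sq = (diff ** 2).sum(-1)
    h = np.median(sq) / np.log(len(x) + 1.0)      # median-trick bandwidth
    k = np.exp(-sq / h)                           # RBF kernel matrix, (n, n)
    repulsion = (2.0 / h) * (k[:, :, None] * diff).sum(axis=1)
    phi = (k @ grad_log_p(x) + repulsion) / len(x)
    return x + step * phi

rng = np.random.default_rng(0)
particles = rng.normal(3.0, 1.0, size=(100, 2))   # start away from the target mode
for _ in range(1000):
    particles = svgd_step(particles)
print("mean ~", particles.mean(axis=0), "std ~", particles.std(axis=0))
```

With the repulsion term removed, every particle would collapse onto the MAP solution; keeping it is precisely what the abstract means by preserving the full posterior's diversity.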