We present Auspex - a threat modeling system built using a specialized collection of generative artificial intelligence-based methods that capture threat modeling tradecraft. This new approach, called tradecraft prompting, centers on encoding the on-the-ground knowledge of threat modelers within the prompts that drive a generative AI-based threat modeling system. Auspex employs tradecraft prompts in two processing stages. The first stage centers on ingesting and processing system architecture information using prompts that encode threat modeling tradecraft knowledge pertaining to system decomposition and description. The second stage centers on chaining the resulting system analysis through a collection of prompts that encode tradecraft knowledge on threat identification, classification, and mitigation. The two-stage process yields a threat matrix for a system that specifies threat scenarios, threat types, information security categorizations and potential mitigations. Auspex produces formalized threat model output in minutes, relative to the weeks or months a manual process takes. More broadly, the focus on bespoke tradecraft prompting, as opposed to fine-tuning or agent-based add-ons, makes Auspex a lightweight, flexible, modular, and extensible foundational system capable of addressing the complexity, resource, and standardization limitations of both existing manual and automated threat modeling processes. In this connection, we establish the baseline value of Auspex to threat modelers through an evaluation procedure based on feedback collected from cybersecurity subject matter experts measuring the quality and utility of threat models generated by Auspex on real banking systems. We conclude with a discussion of system performance and plans for enhancements to Auspex.
In this paper, we revisit the concept of goal commitment from early planners in the presence of current forward chaining heuristic planners. We present a compilation that extends the original planning task with commit actions that enforce the persistence of specific goals once achieved, thereby committing to them in the search sub-tree. This approach imposes a specific goal achievement order in parts of the search tree, potentially introducing dead-end states. This can reduce search effort if the goal achievement order is correct. Otherwise, the search algorithm can expand nodes in the open list where goals do not persist. Experimental results demonstrate that the reformulated tasks suit state-of-the-art agile planners, enabling them to find better
Safe autonomous exploration of unknown environments is an essential skill for mobile robots to effectively and adaptively perform environmental mapping for diverse critical tasks. Due to its simplicity, most existing exploration methods rely on the standard frontier-based exploration strategy, which directs a robot to the boundary between the known safe and the unknown unexplored spaces to acquire new information about the environment. This typically follows a recurrent persistent planning strategy, first selecting an informative frontier viewpoint, then moving the robot toward the selected viewpoint until reaching it, and repeating these steps until termination. However, exploration with persistent planning may lack adaptivity to continuously updated maps, whereas highly adaptive exploration with online planning often suffers from high computational costs and potential issues with livelocks. In this paper, as an alternative to less-adaptive persistent planning and costly online planning, we introduce a new proactive preventive replanning strategy for effective exploration using the immediately available actionable information at a viewpoint to avoid redundant, uninformative last-mile exploration motion. We also use the actionable information of a viewpoint as a systematic termination criterion for exploration. To close the gap between perception and action, we perform safe and informative path planning that minimizes the risk of collision with detected obstacles and the distance to unexplored regions, and we apply action-aware viewpoint selection with maximal information utility per total navigation cost. We demonstrate the effectiveness of our action-aware proactive exploration method in numerical simulations and hardware experiments.
Image-guided surgery demands adaptive, real-time decision support, yet static AI models struggle with structured task planning and providing interactive guidance. Large vision-language models (VLMs) offer a promising solution by enabling dynamic task planning and predictive decision support. We introduce SurgicalVLM-Agent, an AI co-pilot for image-guided pituitary surgery, capable of conversation, planning, and task execution. The agent dynamically processes surgeon queries and plans the tasks such as MRI tumor segmentation, endoscope anatomy segmentation, overlaying preoperative imaging with intraoperative views, instrument tracking, and surgical visual question answering (VQA). To enable structured task planning, we develop the PitAgent dataset, a surgical context-aware dataset covering segmentation, overlaying, instrument localization, tool tracking, tool-tissue interactions, phase identification, and surgical activity recognition. Additionally, we propose FFT-GaLore, a fast Fourier transform (FFT)-based gradient projection technique for efficient low-rank adaptation, optimizing fine-tuning for LLaMA 3.2 in surgical environments. We validate SurgicalVLM-Agent by assessing task planning and prompt generation on our PitAgent dataset and evaluating zero-shot VQA using a public pituitary dataset. Results demonstrate state-of-the-art performance in task planning and query interpretation, with highly semantically meaningful VQA responses, advancing AI-driven surgical assistance.
In this paper we address the speed planning problem for a vehicle along a predefined path. A weighted average of two (conflicting) terms, energy consumption and travel time, is minimized. After deriving a non-convex mathematical model of the problem, we introduce a convex relaxation of the model and show that, after the application of a suitable feasibility-based bound tightening procedure, the convex relaxation shares the same optimal value and solution of the non-convex problem. We also establish that the feasible region of the non-convex problem is a lattice and, through that, a necessary and sufficient condition for the non-emptiness of the feasible region.
Image restoration (IR) is challenging due to the complexity of real-world degradations. While many specialized and all-in-one IR models have been developed, they fail to effectively handle complex, mixed degradations. Recent agentic methods RestoreAgent and AgenticIR leverage intelligent, autonomous workflows to alleviate this issue, yet they suffer from suboptimal results and inefficiency due to their resource-intensive finetunings, and ineffective searches and tool execution trials for satisfactory outputs. In this paper, we propose MAIR, a novel Multi-Agent approach for complex IR problems. We introduce a real-world degradation prior, categorizing degradations into three types: (1) scene, (2) imaging, and (3) compression, which are observed to occur sequentially in real world, and reverse them in the opposite order. Built upon this three-stage restoration framework, MAIR emulates a team of collaborative human specialists, including a "scheduler" for overall planning and multiple "experts" dedicated to specific degradations. This design minimizes search space and trial efforts, improving image quality while reducing inference costs. In addition, a registry mechanism is introduced to enable easy integration of new tools. Experiments on both synthetic and real-world datasets show that proposed MAIR achieves competitive performance and improved efficiency over the previous agentic IR system. Code and models will be made available.
As the artificial intelligence community advances into the era of large models with billions of parameters, distributed training and inference have become essential. While various parallelism strategies-data, model, sequence, and pipeline-have been successfully implemented for popular neural networks on main-stream hardware, optimizing the distributed deployment schedule requires extensive expertise and manual effort. Further more, while existing frameworks with most simple chain-like structures, they struggle with complex non-linear architectures. Mixture-of-experts and multi-modal models feature intricate MIMO and branch-rich topologies that require fine-grained operator-level parallelization beyond the capabilities of existing frameworks. We propose formulating parallelism planning as a scheduling optimization problem using mixed-integer programming. We propose a bi-level solution framework balancing optimality with computational efficiency, automatically generating effective distributed plans that capture both the heterogeneous structure of modern neural networks and the underlying hardware constraints. In experiments comparing against expert-designed strategies like DeepSeek's DualPipe, our framework achieves comparable or superior performance, reducing computational bubbles by half under the same memory constraints. The framework's versatility extends beyond throughput optimization to incorporate hardware utilization maximization, memory capacity constraints, and other considerations or potential strategies. Such capabilities position our solution as both a valuable research tool for exploring optimal parallelization strategies and a practical industrial solution for large-scale AI deployment.
Modular Aerial Robotic Systems (MARS) consist of multiple drone units that can self-reconfigure to adapt to various mission requirements and fault conditions. However, existing fault-tolerant control methods exhibit significant oscillations during docking and separation, impacting system stability. To address this issue, we propose a novel fault-tolerant control reallocation method that adapts to arbitrary number of modular robots and their assembly formations. The algorithm redistributes the expected collective force and torque required for MARS to individual unit according to their moment arm relative to the center of MARS mass. Furthermore, We propose an agile trajectory planning method for MARS of arbitrary configurations, which is collision-avoiding and dynamically feasible. Our work represents the first comprehensive approach to enable fault-tolerant and collision avoidance flight for MARS. We validate our method through extensive simulations, demonstrating improved fault tolerance, enhanced trajectory tracking accuracy, and greater robustness in cluttered environments. The videos and source code of this work are available at https://github.com/RuiHuangNUS/MARS-FTCC/
Computer agents powered by vision-language models (VLMs) have significantly advanced human-computer interaction, enabling users to perform complex tasks through natural language instructions. However, these agents are vulnerable to context deception attacks, an emerging threat where adversaries embed misleading content into the agent's operational environment, such as a pop-up window containing deceptive instructions. Existing defenses, such as instructing agents to ignore deceptive elements, have proven largely ineffective. As the first systematic study on protecting computer agents, we introduce textbf{in-context defense}, leveraging in-context learning and chain-of-thought (CoT) reasoning to counter such attacks. Our approach involves augmenting the agent's context with a small set of carefully curated exemplars containing both malicious environments and corresponding defensive responses. These exemplars guide the agent to first perform explicit defensive reasoning before action planning, reducing susceptibility to deceptive attacks. Experiments demonstrate the effectiveness of our method, reducing attack success rates by 91.2% on pop-up window attacks, 74.6% on average on environment injection attacks, while achieving 100% successful defenses against distracting advertisements. Our findings highlight that (1) defensive reasoning must precede action planning for optimal performance, and (2) a minimal number of exemplars (fewer than three) is sufficient to induce an agent's defensive behavior.
We attempt to take a comprehensive look at the challenges of representing the spatio-temporal structures and dynamic processes defining a city's overall characteristics. For the task of urban planning and urban operation, we take the stance that even if the necessary representations of these structures and processes can be achieved, the most important representation of the relevant mindsets of the citizens are, unfortunately, mostly neglected. After a review of major "traditional" urban models of structures behind urban scale, form, and dynamics, we turn to major recent modeling approaches triggered by recent advances in AI that enable multi-modal generative models. Some of these models can create representations of geometries, networks and images, and reason flexibly at a human-compatible semantic level. They provide huge amounts of knowledge extracted from Terabytes of text and image documents and cover the required rich representation spectrum including geographic knowledge by different knowledge sources, degrees of granularity and scales. We then discuss what these new opportunities mean for the modeling challenges posed by cities, in particular with regard to the role and impact of citizens and their interactions within the city infrastructure. We propose to integrate these possibilities with existing approaches, such as agent-based models, which opens up new modeling spaces including rich citizen models which are able to also represent social interactions. Finally, we put forward some thoughts about a vision of a "social AI in a city ecosystem" that adds relevant citizen models to state-of-the-art structural and process models. This extended city representation will enable urban planners to establish citizen-oriented planning of city infrastructures for human culture, city resilience and sustainability.
Advanced end-to-end autonomous driving systems predict other vehicles' motions and plan ego vehicle's trajectory. The world model that can foresee the outcome of the trajectory has been used to evaluate the end-to-end autonomous driving system. However, existing world models predominantly emphasize the trajectory of the ego vehicle and leave other vehicles uncontrollable. This limitation hinders their ability to realistically simulate the interaction between the ego vehicle and the driving scenario. In addition, it remains a challenge to match multiple trajectories with each vehicle in the video to control the video generation. To address above issues, a driving \textbf{W}orld \textbf{M}odel named EOT-WM is proposed in this paper, unifying \textbf{E}go-\textbf{O}ther vehicle \textbf{T}rajectories in videos. Specifically, we first project ego and other vehicle trajectories in the BEV space into the image coordinate to match each trajectory with its corresponding vehicle in the video. Then, trajectory videos are encoded by the Spatial-Temporal Variational Auto Encoder to align with driving video latents spatially and temporally in the unified visual space. A trajectory-injected diffusion Transformer is further designed to denoise the noisy video latents for video generation with the guidance of ego-other vehicle trajectories. In addition, we propose a metric based on control latent similarity to evaluate the controllability of trajectories. Extensive experiments are conducted on the nuScenes dataset, and the proposed model outperforms the state-of-the-art method by 30\% in FID and 55\% in FVD. The model can also predict unseen driving scenes with self-produced trajectories.
Long-term planning for robots operating in domestic environments poses unique challenges due to the interactions between humans, objects, and spaces. Recent advancements in trajectory planning have leveraged vision-language models (VLMs) to extract contextual information for robots operating in real-world environments. While these methods achieve satisfying performance, they do not explicitly model human activities. Such activities influence surrounding objects and reshape spatial constraints. This paper presents a novel approach to trajectory planning that integrates human preferences, activities, and spatial context through an enriched 3D scene graph (3DSG) representation. By incorporating activity-based relationships, our method captures the spatial impact of human actions, leading to more context-sensitive trajectory adaptation. Preliminary results demonstrate that our approach effectively assigns costs to spaces influenced by human activities, ensuring that the robot trajectory remains contextually appropriate and sensitive to the ongoing environment. This balance between task efficiency and social appropriateness enhances context-aware human-robot interactions in domestic settings. Future work includes implementing a full planning pipeline and conducting user studies to evaluate trajectory acceptability.
In future high-energy physics experiments, the electromagnetic calorimeter (ECAL) will operate in exceptionally high-luminosity. An ECAL featuring layered readout in the longitudinal direction and precise time-stamped information offers a multi-dimensional view, enriching our comprehension of the showering process of electromagnetic particles in high-luminosity environments. And it is taken as the baseline design for several new experiments, including the planned upgrades of the current running experiments. Reconstructing and matching the multi-dimensional information across different layers poses new challenges in utilizing layered data effectively. This work introduces a novel layered reconstruction framework for the ECAL with a layered readout information structure and develops the layered clustering algorithm. It expands the concept of clusters from planes to multiple layers. Additionally, this work presents the corresponding layered cluster correction methods, investigates the transverse shower profile, which is utilized for overlapping clusters splitting, and develops the layered merged $\pi^0$ reconstruction algorithm based on this framework. By incorporating energy and time information in 3-dimension, this framework provides a suitable software platform for the preliminary research of longitudinal segmented ECAL and new perspectives in physics analysis.
IoT-based devices and wearable sensors are now common in daily life, with smartwatches, smartphones, and other digital tools tracking physical activity and health data. This lifelogging process provides valuable insights into people's lives. This paper analyzes a publicly available lifelog dataset of 14 individuals to explore how exercise affects mood and, in turn, executive function. Results show that moderate physical activity significantly improves mood, reduces stress, and enhances cognitive functions like decision-making and focus. Improved mood not only boosts exercise performance but also strengthens executive function, suggesting exercise benefits both emotional and cognitive well-being. This opens the door for personalized exercise plans tailored to emotional states to optimize brain function.
MOBA (Multiplayer Online Battle Arena) games require a delicate interplay of strategic planning and real-time decision-making, particularly in professional esports, where players exhibit varying levels of skill and strategic insight. While team strategies have been widely studied, analyzing inconsistencies in professional matches remains a significant challenge. The complexity lies in defining and quantifying the difference between real-time and preferred professional strategies, as well as understanding the disparities between them. Establishing direct causal links between specific strategic decisions and game outcomes also demands a comprehensive analysis of the entire match progression. To tackle these challenges, we present the StratIncon Detector, a visual analytics system designed to assist professional players and coaches in efficiently identifying strategic inconsistencies. The system detects real-time strategies, predicts preferred professional strategies, extracts relevant human factors, and uncovers their impact on subsequent game phases. Findings from a case study, a user study with 24 participants, and expert interviews suggest that, compared to traditional methods, the StratIncon Detector enables users to more comprehensively and efficiently identify inconsistencies, infer their causes, evaluate their effects on subsequent game outcomes, and gain deeper insights into team collaboration-ultimately enhancing future teamwork.
This paper introduces and tests a framework integrating traffic regulation compliance into automated driving systems (ADS). The framework enables ADS to follow traffic laws and make informed decisions based on the driving environment. Using RGB camera inputs and a vision-language model (VLM), the system generates descriptive text to support a regulation-aware decision-making process, ensuring legal and safe driving practices. This information is combined with a machine-readable ADS regulation database to guide future driving plans within legal constraints. Key features include: 1) a regulation database supporting ADS decision-making, 2) an automated process using sensor input for regulation-aware path planning, and 3) validation in both simulated and real-world environments. Particularly, the real-world vehicle tests not only assess the framework's performance but also evaluate the potential and challenges of VLMs to solve complex driving problems by integrating detection, reasoning, and planning. This work enhances the legality, safety, and public trust in ADS, representing a significant step forward in the field.
The integration of Agentic AI into scientific discovery marks a new frontier in research automation. These AI systems, capable of reasoning, planning, and autonomous decision-making, are transforming how scientists perform literature review, generate hypotheses, conduct experiments, and analyze results. This survey provides a comprehensive overview of Agentic AI for scientific discovery, categorizing existing systems and tools, and highlighting recent progress across fields such as chemistry, biology, and materials science. We discuss key evaluation metrics, implementation frameworks, and commonly used datasets to offer a detailed understanding of the current state of the field. Finally, we address critical challenges, such as literature review automation, system reliability, and ethical concerns, while outlining future research directions that emphasize human-AI collaboration and enhanced system calibration.
It is often difficult to write code that you can ensure will be executed in the right order when programing for parallel compute tasks. Due to the way that today's parallel compute hardware, primarily Graphical Processing Units (GPUs), allows you to write code. It is easy to write code that may result in one thread reading or modifying data before it should, thus resulting in a data race. It would be useful to have a tool that could verify that the code will execute as expected. However, most static analysis done at the language level has to be completely retooled to work on a different languages. Therefore, it would be of great use to be able to perform verification and analysis on the Memory Model of a parallel compute code, in a lower level intermediary representations that most languages pass through on their way to something that the GPU hardware can understand. This body of work aims to deal with the question of if there is still enough of the information in the intermediary representations to be able to perform memory model verification to check for data races. To determine this we plan to analyze as a case study the GeSpMM Sparse Matrix Multiplication Algorithm, implemented in CUDA C++ with the LLVM compiler and Julia with CUDA.jl.
Mutual adaptation can significantly enhance overall task performance in human-robot co-transportation by integrating both the robot's and human's understanding of the environment. While human modeling helps capture humans' subjective preferences, two challenges persist: (i) the uncertainty of human preference parameters and (ii) the need to balance adaptation strategies that benefit both humans and robots. In this paper, we propose a unified framework to address these challenges and improve task performance through mutual adaptation. First, instead of relying on fixed parameters, we model a probability distribution of human choices by incorporating a range of uncertain human parameters. Next, we introduce a time-varying stubbornness measure and a coordination mode transition model, which allows either the robot to lead the team's trajectory or, if a human's preferred path conflicts with the robot's plan and their stubbornness exceeds a threshold, the robot to transition to following the human. Finally, we introduce a pose optimization strategy to mitigate the uncertain human behaviors when they are leading. To validate the framework, we design and perform experiments with real human feedback. We then demonstrate, through simulations, the effectiveness of our models in enhancing task performance with mutual adaptation and pose optimization.
To navigate crowds without collisions, robots must interact with humans by forecasting their future motion and reacting accordingly. While learning-based prediction models have shown success in generating likely human trajectory predictions, integrating these stochastic models into a robot controller presents several challenges. The controller needs to account for interactive coupling between planned robot motion and human predictions while ensuring both predictions and robot actions are safe (i.e. collision-free). To address these challenges, we present a receding horizon crowd navigation method for single-robot multi-human environments. We first propose a diffusion model to generate joint trajectory predictions for all humans in the scene. We then incorporate these multi-modal predictions into a SICNav Bilevel MPC problem that simultaneously solves for a robot plan (upper-level) and acts as a safety filter to refine the predictions for non-collision (lower-level). Combining planning and prediction refinement into one bilevel problem ensures that the robot plan and human predictions are coupled. We validate the open-loop trajectory prediction performance of our diffusion model on the commonly used ETH/UCY benchmark and evaluate the closed-loop performance of our robot navigation method in simulation and extensive real-robot experiments demonstrating safe, efficient, and reactive robot motion.
We present the design of the Compact Network of Detectors with Orbital Range (CONDOR), a proposed high-altitude gamma-ray and cosmic-ray (CR) observatory set to become the highest of its kind. Planned for installation at Cerro Toco in the Atacama Desert, Chile, at 5300 meters above sea level (m.a.s.l.), CONDOR is optimized to operate in the 100 GeV to 1 TeV range using the extensive air-shower technique. The design prioritizes simplicity, modularity, and robustness to ensure reliable performance in a harsh environment. The CONDOR array has a full coverage factor of 90 and consists of 6000 plastic scintillator panels, each approximately 1 m^2, read by wavelength-shifting fibers and SiPMs. The readout electronics are based on fast ADCs, with White Rabbit technology ensuring time synchronization. We present an analysis of angular resolution and effective area by variation of the CORSIKA design to meet the developing GeV threshold, complementing other ground-based observatories in gamma-ray and proton CR measurements. CONDOR has the potential to support an extensive research program in astroparticle physics and multimessenger astronomy from the Southern Hemisphere, operating in all-sky mode 24 hours per day, year-round, with satellite data ranges.
This manuscript focuses on the environmental, social, and individual sustainability dimensions within the modern software development lifecycle, aiming to establish a holistic approach termed Sustainable DevOps (SusDevOps). Moving beyond the already well-researched economic and technical aspects, our approach to SusDevOps emphasizes the importance of minimizing environmental impacts, fostering social inclusion, and supporting individual well-being in software engineering practices. We highlight some key challenges in incorporating these dimensions, such as reducing ecological footprints, promoting workforce inclusion, and addressing the individual well-being of developers. We plan to adopt a structured approach incorporating systematic literature reviews, surveys, and interviews to deepen our understanding, identify gaps, and evolve actionable, sustainable practices within the DevOps community. Collectively, these initiatives can contribute to a more sustainable software engineering ecosystem.
Motivated by the Grothendieck construction, we study the functorialities of the comma construction for strict $\omega$-categories. To state the most general functorialities, we use the language of Gray $\omega$-categories, that is, categories enriched in the category of strict $\omega$-categories endowed with the oplax Gray tensor product. Our main result is that the comma construction of strict $\omega$-categories defines a Gray $\omega$-functor, that is, a morphism of Gray $\omega$-categories. To makes sense of this statement, we prove that slices of Gray $\omega$-categories exist. Coming back to the Grothendieck construction, we propose a definition in terms of the comma construction and, as a consequence, we get that the Grothendieck construction of strict $\omega$-categories defines a Gray $\omega$-functor. Finally, as a by-product, we get a notion of Grothendieck construction for Gray $\omega$-functors, which we plan to investigate in future work.
Salps are marine animals consisting of chains of jellyfish-like units. Their capacity for effective underwater undulatory locomotion through coordinating multi-jet propulsion has aroused significant interest in the field of robotics and inspired extensive research including design, modeling, and control. In this paper, we conduct a comprehensive analysis of the locomotion of salp-like systems using the robotic platform "LandSalp" based on geometric mechanics, including mechanism design, dynamic modeling, system identification, and motion planning and control. Our work takes a step toward a better understanding of salps' underwater locomotion and provides a clear path for extending these insights to more complex and capable underwater robotic systems. Furthermore, this study illustrates the effectiveness of geometric mechanics in bio-inspired robots for efficient data-driven locomotion modeling, demonstrated by learning the dynamics of LandSalp from only 3 minutes of experimental data. Lastly, we extend the geometric mechanics principles to multi-jet propulsion systems with stability considerations and validate the theory through experiments on the LandSalp hardware.
We propose a stochastic Model Predictive Control (MPC) framework that ensures closed-loop chance constraint satisfaction for linear systems with general sub-Gaussian process and measurement noise. By considering sub-Gaussian noise, we can provide guarantees for a large class of distributions, including time-varying distributions. Specifically, we first provide a new characterization of sub-Gaussian random vectors using matrix variance proxies, which can more accurately represent the predicted state distribution. We then derive tail bounds under linear propagation for the new characterization, enabling tractable computation of probabilistic reachable sets of linear systems. Lastly, we utilize these probabilistic reachable sets to formulate a stochastic MPC scheme that provides closed-loop guarantees for general sub-Gaussian noise. We further demonstrate our approach in simulations, including a challenging task of surgical planning from image observations.
This paper proposes a task-oriented co-design framework that integrates communication, computing, and control to address the key challenges of bandwidth limitations, noise interference, and latency in mission-critical industrial Cyber-Physical Systems (CPS). To improve communication efficiency and robustness, we design a task-oriented Joint Source-Channel Coding (JSCC) using Information Bottleneck (IB) to enhance data transmission efficiency by prioritizing task-specific information. To mitigate the perceived End-to-End (E2E) delays, we develop a Delay-Aware Trajectory-Guided Control Prediction (DTCP) strategy that integrates trajectory planning with control prediction, predicting commands based on E2E delay. Moreover, the DTCP is co-designed with task-oriented JSCC, focusing on transmitting task-specific information for timely and reliable autonomous driving. Experimental results in the CARLA simulator demonstrate that, under an E2E delay of 1 second (20 time slots), the proposed framework achieves a driving score of 48.12, which is 31.59 points higher than using Better Portable Graphics (BPG) while reducing bandwidth usage by 99.19%.
The sharing of knowledge about software architecture is crucial in software development, particularly during the onboarding of new developers. However, existing documentation often falls short due to issues like incompleteness and ambiguity. Consequently, oral explanations are used for knowledge transfer. This study investigates what constitutes a good explanation of software architecture through an empirical study. It aims to explore how software architecture explanations are conducted, identify the main challenges, and suggest improvements. It addresses five key areas: relevant architectural concerns, explanation plans, supporting artefacts, typical questions, and expectations. An exploratory field study was conducted using semi-structured interviews with 17 software professionals, including 9 architecture explainers and 8 explainees. The study discovers that an explanation must balance both problem and technical domains while considering the explainee's role, experience, and the goal of the explanation. The concept of the explanation window, which adjusts the level of detail and scope, is introduced to address these variables. We also extend the Twin Peaks model to guide the interplay between problem and solution domains during architectural explanations by adding an emphasis to the context surrounding both domains. Future research should focus on developing better tools and processes to support architecture explanations.
Cross-embodiment robotic manipulation synthesis for complicated tasks is challenging, partially due to the scarcity of paired cross-embodiment datasets and the impediment of designing intricate controllers. Inspired by robotic learning via guided human expert demonstration, we here propose a novel cross-embodiment robotic manipulation algorithm via CycleVAE and human behavior transformer. First, we utilize unsupervised CycleVAE together with a bidirectional subspace alignment algorithm to align latent motion sequences between cross-embodiments. Second, we propose a casual human behavior transformer design to learn the intrinsic motion dynamics of human expert demonstrations. During the test case, we leverage the proposed transformer for the human expert demonstration generation, which will be aligned using CycleVAE for the final human-robotic manipulation synthesis. We validated our proposed algorithm through extensive experiments using a dexterous robotic manipulator with the robotic hand. Our results successfully generate smooth trajectories across intricate tasks, outperforming prior learning-based robotic motion planning algorithms. These results have implications for performing unsupervised cross-embodiment alignment and future autonomous robotics design. Complete video demonstrations of our experiments can be found in https://sites.google.com/view/humanrobots/home.
Although end-to-end autonomous driving (E2E-AD) technologies have made significant progress in recent years, there remains an unsatisfactory performance on closed-loop evaluation. The potential of leveraging planning in query design and interaction has not yet been fully explored. In this paper, we introduce a multi-granularity planning query representation that integrates heterogeneous waypoints, including spatial, temporal, and driving-style waypoints across various sampling patterns. It provides additional supervision for trajectory prediction, enhancing precise closed-loop control for the ego vehicle. Additionally, we explicitly utilize the geometric properties of planning trajectories to effectively retrieve relevant image features based on physical locations using deformable attention. By combining these strategies, we propose a novel end-to-end autonomous driving framework, termed HiP-AD, which simultaneously performs perception, prediction, and planning within a unified decoder. HiP-AD enables comprehensive interaction by allowing planning queries to iteratively interact with perception queries in the BEV space while dynamically extracting image features from perspective views. Experiments demonstrate that HiP-AD outperforms all existing end-to-end autonomous driving methods on the closed-loop benchmark Bench2Drive and achieves competitive performance on the real-world dataset nuScenes.
Manipulation of deformable linear objects (DLOs) in constrained environments is a challenging task. This paper describes a two-layered approach for placing DLOs on a flat surface using a single robot hand. The high-level layer is a novel DLO surface placement method based on Euler's elastica solutions. During this process one DLO endpoint is manipulated by the robot gripper while a variable interior point of the DLO serves as the start point of the portion aligned with the placement surface. The low-level layer forms a pipeline controller. The controller estimates the DLO current shape using a Residual Neural Network (ResNet) and uses low-level feedback to ensure task execution in the presence of modeling and placement errors. The resulting DLO placement approach can recover from states where the high-level manipulation planner has failed as required by practical robot manipulation systems. The DLO placement approach is demonstrated with simulations and experiments that use silicon mock-up objects prepared for fresh food applications.