AI co-scientists are emerging as a tool to assist human researchers in achieving their research goals. A crucial feature of these AI co-scientists is the ability to generate a research plan given a set of aims and constraints. The plan may be used by researchers for brainstorming, or may even be implemented after further refinement. However, language models currently struggle to generate research plans that follow all constraints and implicit requirements. In this work, we study how to leverage the vast corpus of existing research papers to train language models that generate better research plans. We build a scalable, diverse training corpus by automatically extracting research goals and goal-specific grading rubrics from papers across several domains. We then train models for research plan generation via reinforcement learning with self-grading. A frozen copy of the initial policy acts as the grader during training, with the rubrics creating a generator-verifier gap that enables improvements without external human supervision. To validate this approach, we conduct a study with human experts for machine learning research goals, spanning 225 hours. The experts prefer plans generated by our finetuned Qwen3-30B-A3B model over the initial model for 70% of research goals, and approve 84% of the automatically extracted goal-specific grading rubrics. To assess generality, we also extend our approach to research goals from medical papers, and new arXiv preprints, evaluating with a jury of frontier models. Our finetuning yields 12-22% relative improvements and significant cross-domain generalization, proving effective even in problem settings like medical research where execution feedback is infeasible. Together, these findings demonstrate the potential of a scalable, automated training recipe as a step towards improving general AI co-scientists.
OpenPBR is a physically based, standardized uber-shader developed for interoperable material authoring and rendering across VFX, animation, and design visualization workflows. This document serves as a companion to the official specification, offering deeper insight into the model's development and more detailed implementation guidance, including code examples and mathematical derivations. We begin with a description of the model's formal structure and theoretical foundations - covering slab-based layering, statistical mixing, and microfacet theory - before turning to its physical components. These include metallic, dielectric, subsurface, and glossy-diffuse base substrates, followed by thin-film iridescence, coat, and fuzz layers. A special-case mode for rendering thin-walled objects is also described. Additional sections explore technical topics in greater depth, such as the decoupling of specular reflectivity from transmission, the choice of parameterization for subsurface scattering, and the detailed physics of coat darkening and thin-film interference. We also discuss planned extensions, including hazy specular reflection and retroreflection.
Path planning is a fundamental component in autonomous mobile robotics, enabling a robot to navigate from its current location to a desired goal while avoiding obstacles. Among the various techniques, Artificial Potential Field (APF) methods have gained popularity due to their simplicity, real-time responsiveness, and low computational requirements. However, a major limitation of conventional APF approaches is the local minima trap problem, where the robot becomes stuck in a position with no clear direction toward the goal. This paper proposes a novel path planning technique, termed the Bulldozer, which addresses the local minima issue while preserving the inherent advantages of APF. The Bulldozer technique introduces a backfilling mechanism that systematically identifies and eliminates local minima regions by increasing their potential values, analogous to a bulldozer filling potholes in a road. Additionally, a ramp-based enhancement is incorporated to assist the robot in escaping trap areas when starting within a local minimum. The proposed technique is experimentally validated using a physical mobile robot across various maps with increasing complexity. Comparative analyses are conducted against standard APF, adaptive APF, and well-established planning algorithms such as A*, PRM, and RRT. Results demonstrate that the Bulldozer technique effectively resolves the local minima problem while achieving superior execution speed and competitive path quality. Furthermore, a kinematic tracking controller is employed to assess the smoothness and traceability of the planned paths, confirming their suitability for real-world execution.
The primary research questions of this paper center on defining the amount of context that is necessary and/or appropriate when investigating the relationship between language model probabilities and cognitive phenomena. We investigate whether whole utterances are necessary to observe probabilistic reduction and demonstrate that n-gram representations suffice as cognitive units of planning.
Omnimodal large language models have made significant strides in unifying audio and visual modalities; however, they often lack the fine-grained cross-modal understanding and have difficulty with multimodal alignment. To address these limitations, we introduce OmniAgent, a fully audio-guided active perception agent that dynamically orchestrates specialized tools to achieve more fine-grained audio-visual reasoning. Unlike previous works that rely on rigid, static workflows and dense frame-captioning, this paper demonstrates a paradigm shift from passive response generation to active multimodal inquiry. OmniAgent employs dynamic planning to autonomously orchestrate tool invocation on demand, strategically concentrating perceptual attention on task-relevant cues. Central to our approach is a novel coarse-to-fine audio-guided perception paradigm, which leverages audio cues to localize temporal events and guide subsequent reasoning. Extensive empirical evaluations on three audio-video understanding benchmarks demonstrate that OmniAgent achieves state-of-the-art performance, surpassing leading open-source and proprietary models by substantial margins of 10% - 20% accuracy.
Spatio-temporal alignment is crucial for temporal modeling of end-to-end (E2E) perception in autonomous driving (AD), providing valuable structural and textural prior information. Existing methods typically rely on the attention mechanism to align objects across frames, simplifying the motion model with a unified explicit physical model (constant velocity, etc.). These approaches prefer semantic features for implicit alignment, challenging the importance of explicit motion modeling in the traditional perception paradigm. However, variations in motion states and object features across categories and frames render this alignment suboptimal. To address this, we propose HAT, a spatio-temporal alignment module that allows each object to adaptively decode the optimal alignment proposal from multiple hypotheses without direct supervision. Specifically, HAT first utilizes multiple explicit motion models to generate spatial anchors and motion-aware feature proposals for historical instances. It then performs multi-hypothesis decoding by incorporating semantic and motion cues embedded in cached object queries, ultimately providing the optimal alignment proposal for the target frame. On nuScenes, HAT consistently improves 3D temporal detectors and trackers across diverse baselines. It achieves state-of-the-art tracking results with 46.0% AMOTA on the test set when paired with the DETR3D detector. In an object-centric E2E AD method, HAT enhances perception accuracy (+1.3% mAP, +3.1% AMOTA) and reduces the collision rate by 32%. When semantics are corrupted (nuScenes-C), the enhancement of motion modeling by HAT enables more robust perception and planning in the E2E AD.
Specifying robotic manipulation tasks in a manner that is both expressive and precise remains a central challenge. While visual goals provide a compact and unambiguous task specification, existing goal-conditioned policies often struggle with long-horizon manipulation due to their reliance on single-step action prediction without explicit modeling of task progress. We propose Act2Goal, a general goal-conditioned manipulation policy that integrates a goal-conditioned visual world model with multi-scale temporal control. Given a current observation and a target visual goal, the world model generates a plausible sequence of intermediate visual states that captures long-horizon structure. To translate this visual plan into robust execution, we introduce Multi-Scale Temporal Hashing (MSTH), which decomposes the imagined trajectory into dense proximal frames for fine-grained closed-loop control and sparse distal frames that anchor global task consistency. The policy couples these representations with motor control through end-to-end cross-attention, enabling coherent long-horizon behavior while remaining reactive to local disturbances. Act2Goal achieves strong zero-shot generalization to novel objects, spatial layouts, and environments. We further enable reward-free online adaptation through hindsight goal relabeling with LoRA-based finetuning, allowing rapid autonomous improvement without external supervision. Real-robot experiments demonstrate that Act2Goal improves success rates from 30% to 90% on challenging out-of-distribution tasks within minutes of autonomous interaction, validating that goal-conditioned world models with multi-scale temporal control provide structured guidance necessary for robust long-horizon manipulation. Project page: https://act2goal.github.io/
[Context and Motivation] Global energy consumption has been steadily increasing in recent years, with data centers emerging as major contributors. This growth is largely driven by the widespread migration of applications to the Cloud, alongside a rising number of users consuming digital content. Dynamic adaptation (or self-adaptive) approaches appear as a way to reduce, at runtime and under certain constraints, the energy consumption of software applications. [Question/Problem] Despite efforts to make energy-efficiency a primary goal in the dynamic adaptation of software applications, there is still a gap in understanding how to equip these self-adaptive software systems (SAS), which are dynamically adapted at runtime, with effective energy consumption monitoring tools that enable energy-awareness. Furthermore, the extent to which such an energy consumption monitoring tool impacts the overall energy consumption of the SAS ecosystem has not yet been thoroughly explored. [Methodology] To address this gap, we introduce the EnCoMSAS (Energy Consumption Monitoring for Self-Adaptive Systems) tool that allows to gather the energy consumed by distributed software applications deployed, for instance, in the Cloud. EnCoMSAS enables the evaluation of energy consumption of SAS variants at runtime. It allows to integrate energy-efficiency as a main goal in the analysis and execution of new adaptation plans for the SAS. In order to evaluate the effectiveness of EnCoMSAS and investigate its impact on the overall energy consumption of the SAS ecosystem, we conduct an empirical study by using the Adaptable TeaStore case study. Adaptable TeaStore is a self-adaptive extension of the TeaStore application, a microservice benchmarking application. For this study, we focus on the recommender service of Adaptable TeaStore. Regarding the experiments, we first equip Adaptable TeaStore with EnCoMSAS. Next, we execute Adaptable TeaStore by varying workload conditions that simulate users interactions. Finally, we use EnCoMSAS for gathering and assessing the energy consumption of the recommender algorithms of Adaptable TeaStore. To run these experiments, we use nodes of the Grid5000 testbed. [Results] The results show that EnCoMSAS is effective in collecting energy consumption of software applications for enabling dynamic adaptation at runtime. The observed correlation between CPU usage and energy consumption collected by EnCoMSAS provides evidence supporting the validity of the collected energy measurements. Moreover, we point out, through EnCoMSAS, that energy consumption is influenced not only by the algorithmic complexity but also by the characteristics of the deployment environment. Finally, the results show that the impact of EnCoMSAS on the overall energy consumption of the SAS ecosystem is comparatively modest with respect to the entire set of the TeaStore applications microservices.
The Adaptable TeaStore specification provides a microservice-based case study for implementing self-adaptation through a control loop. We argue that implementations of this specification should be informed by key properties of self-adaptation: system-wide consistency (coordinated adaptations across replicas), planning (executing an adaptation until appropriate conditions are met), and modularity (clean integration of adaptation logic). In this implementation discussion paper, we examine how software architectural methods, the cloud-native Operator pattern, and legacy programming language techniques can decouple self-adaptive control logic from the TeaStore application. We analyze the trade-offs that these different approaches make between fine-grained expressive adaptation and system-wide control, and highlight when reuse of adaptation strategies is most effective. Our analysis suggests that these approaches are not mutually exclusive but can be combined into a multi-tiered architecture for self-adaptive microservices.
We study the family of commutative subspaces in the trigonometric holonomy Lie algebra $t^{\mathrm{trig}}_Φ$, introduced by Toledano Laredo, for an arbitrary root system $Φ$. We call these subspaces \emph{Bethe subspaces} because they can be regarded as quadratic components of \emph{Bethe subalgebras} in the Yangian corresponding to the root system $Φ$, that are responsible for integrals of the generalized XXX Heisenberg spin chain. Bethe subspaces are naturally parametrized by the complement of the corresponding toric arrangement . We prove that this family extends regularly to the minimal wonderful model $X_Φ$ of the toric arrangement described by De Concini and Gaiffi, thus giving a compactification of the parameter space for Bethe subspaces. For classical types $A_n, B_n, C_n, D_n$, we show that this extension is faithful. As a special case, when $Φ$ is of type $A_n$, our construction agrees with the main result of Aguirre--Felder--Veselov on the closure of the set of quadratic Gaudin subalgebras. Our work is also closely related to, and refines in this root system setting, a parallel compactification result of J. Peters obtained for more general toric arrangements arising from quantum multiplication on hypertoric varieties. Next, we show that the Bethe subspaces assemble into a vector bundle over $X_Φ$, which we identify with the logarithmic tangent bundle of $X_Φ$. Finally, we formulate conjectures extending these results to the setting of Bethe subalgebras in Yangians and to the quantum cohomology rings of Springer resolutions. We plan to address this in our next papers.
Monitoring states of road surfaces provides valuable information for the planning and controlling vehicles and active vehicle control systems. Classical road monitoring methods are expensive and unsystematic because they require time for measurements. This article proposes an real time system based on weather conditional data and road surface condition data. For this purpose, we collected data with a mobile phone camera on the roads around the campus of the Karlsruhe Institute of Technology. We tested a large number of different image-based deep learning algorithms for road classification. In addition, we used road acceleration data along with road image data for training by using them as images. We compared the performances of acceleration-based and camera image-based approaches. The performances of the simple Alexnet, LeNet, VGG, and Resnet algorithms were compared as deep learning algorithms. For road condition classification, 5 classes were considered: asphalt, damaged asphalt, gravel road, damaged gravel road, pavement road and over 95% accuracy performance was achieved. It is also proposed to use the acceleration or the camera image to classify the road surface according to the weather and the time of day using fuzzy logic.
World models have become crucial for autonomous driving, as they learn how scenarios evolve over time to address the long-tail challenges of the real world. However, current approaches relegate world models to limited roles: they operate within ostensibly unified architectures that still keep world prediction and motion planning as decoupled processes. To bridge this gap, we propose DriveLaW, a novel paradigm that unifies video generation and motion planning. By directly injecting the latent representation from its video generator into the planner, DriveLaW ensures inherent consistency between high-fidelity future generation and reliable trajectory planning. Specifically, DriveLaW consists of two core components: DriveLaW-Video, our powerful world model that generates high-fidelity forecasting with expressive latent representations, and DriveLaW-Act, a diffusion planner that generates consistent and reliable trajectories from the latent of DriveLaW-Video, with both components optimized by a three-stage progressive training strategy. The power of our unified paradigm is demonstrated by new state-of-the-art results across both tasks. DriveLaW not only advances video prediction significantly, surpassing best-performing work by 33.3% in FID and 1.8% in FVD, but also achieves a new record on the NAVSIM planning benchmark.
Mixed-initiative visual analytics (VA) systems, where human and artificial intelligence (AI) agents collaborate as equal partners during analysis, represented a paradigm shift in human-computer interaction. With recent advances in AI, these systems have seen an increase in sophisticated software agents that have improved task planning, reasoning, and completion capabilities. However, while existing work characterizes agent interplay and communication strategies, there is a limited understanding of the overarching design principles for intelligent agents. Through a systematic review of 90 systems (and 207 unique agents), we propose a design space of intelligent agents comprising six dimensions that collectively characterize an agent's perception, environmental understanding, action capability, and communication strategies. We contribute a novel framework for researchers and designers to explore various design choices for new systems and to situate a system in the current landscape. We conclude with future research opportunities for intelligent agents in mixed-initiative VA systems.
We study the connection of two problems within the planning and verification community: Conformant planning and model-checking of hyperproperties. Conformant planning is the task of finding a sequential plan that achieves a given objective independent of non-deterministic action effects during the plan's execution. Hyperproperties are system properties that relate multiple execution traces of a system and, e.g., capture information-flow and fairness policies. In this paper, we show that model-checking of $\exists^*\forall^*$ hyperproperties is closely related to the problem of computing a conformant plan. Firstly, we show that we can efficiently reduce a hyperproperty model-checking instance to a conformant planning instance, and prove that our encoding is sound and complete. Secondly, we establish the converse direction: Every conformant planning problem is, itself, a hyperproperty model-checking task.
Unmanned Aerial Vehicles (UAVs) have revolutionized inspection tasks by offering a safer, more efficient, and flexible alternative to traditional methods. However, battery limitations often constrain their effectiveness, necessitating the development of optimized flight paths and data collection techniques. While existing approaches like coverage path planning (CPP) ensure comprehensive data collection, they can be inefficient, especially when inspecting multiple non connected Regions of Interest (ROIs). This paper introduces the Fast Inspection of Scattered Regions (FISR) problem and proposes a novel solution, the multi UAV Disjoint Areas Inspection (mUDAI) method. The introduced approach implements a two fold optimization procedure, for calculating the best image capturing positions and the most efficient UAV trajectories, balancing data resolution and operational time, minimizing redundant data collection and resource consumption. The mUDAI method is designed to enable rapid, efficient inspections of scattered ROIs, making it ideal for applications such as security infrastructure assessments, agricultural inspections, and emergency site evaluations. A combination of simulated evaluations and real world deployments is used to validate and quantify the method's ability to improve operational efficiency while preserving high quality data capture, demonstrating its effectiveness in real world operations. An open source Python implementation of the mUDAI method can be found on GitHub (https://github.com/soc12/mUDAI) and the collected and processed data from the real world experiments are all hosted on Zenodo (https://zenodo.org/records/13866483). Finally, this online platform (https://sites.google.com/view/mudai-platform/) allows interested readers to interact with the mUDAI method and generate their own multi UAV FISR missions.
Multimodal fusion of remote sensing images serves as a core technology for overcoming the limitations of single-source data and improving the accuracy of surface information extraction, which exhibits significant application value in fields such as environmental monitoring and urban planning. To address the deficiencies of existing methods, including the failure of fixed resolutions to balance efficiency and detail, as well as the lack of semantic hierarchy in single-scale alignment, this study proposes a Vision-language Model (VLM) framework integrated with two key innovations: the Dynamic Resolution Input Strategy (DRIS) and the Multi-scale Vision-language Alignment Mechanism (MS-VLAM).Specifically, the DRIS adopts a coarse-to-fine approach to adaptively allocate computational resources according to the complexity of image content, thereby preserving key fine-grained features while reducing redundant computational overhead. The MS-VLAM constructs a three-tier alignment mechanism covering object, local-region and global levels, which systematically captures cross-modal semantic consistency and alleviates issues of semantic misalignment and granularity imbalance.Experimental results on the RS-GPT4V dataset demonstrate that the proposed framework significantly improves the accuracy of semantic understanding and computational efficiency in tasks including image captioning and cross-modal retrieval. Compared with conventional methods, it achieves superior performance in evaluation metrics such as BLEU-4 and CIDEr for image captioning, as well as R@10 for cross-modal retrieval. This technical framework provides a novel approach for constructing efficient and robust multimodal remote sensing systems, laying a theoretical foundation and offering technical guidance for the engineering application of intelligent remote sensing interpretation.
Human-vehicle cooperative driving serves as a vital bridge to fully autonomous driving by improving driving flexibility and gradually building driver trust and acceptance of autonomous technology. To establish more natural and effective human-vehicle interaction, we propose a Human-Oriented Cooperative Driving (HOCD) approach that primarily minimizes human-machine conflict by prioritizing driver intention and state. In implementation, we take both tactical and operational levels into account to ensure seamless human-vehicle cooperation. At the tactical level, we design an intention-aware trajectory planning method, using intention consistency cost as the core metric to evaluate the trajectory and align it with driver intention. At the operational level, we develop a control authority allocation strategy based on reinforcement learning, optimizing the policy through a designed reward function to achieve consistency between driver state and authority allocation. The results of simulation and human-in-the-loop experiments demonstrate that our proposed approach not only aligns with driver intention in trajectory planning but also ensures a reasonable authority allocation. Compared to other cooperative driving approaches, the proposed HOCD approach significantly enhances driving performance and mitigates human-machine conflict.The code is available at https://github.com/i-Qin/HOCD.
While Multimodal Large Language Models (MLLMs) have exhibited remarkable general intelligence across diverse domains, their potential in low-altitude applications dominated by Unmanned Aerial Vehicles (UAVs) remains largely underexplored. Existing MLLM benchmarks rarely cover the unique challenges of low-altitude scenarios, while UAV-related evaluations mainly focus on specific tasks such as localization or navigation, without a unified evaluation of MLLMs'general intelligence. To bridge this gap, we present MM-UAVBench, a comprehensive benchmark that systematically evaluates MLLMs across three core capability dimensions-perception, cognition, and planning-in low-altitude UAV scenarios. MM-UAVBench comprises 19 sub-tasks with over 5.7K manually annotated questions, all derived from real-world UAV data collected from public datasets. Extensive experiments on 16 open-source and proprietary MLLMs reveal that current models struggle to adapt to the complex visual and cognitive demands of low-altitude scenarios. Our analyses further uncover critical bottlenecks such as spatial bias and multi-view understanding that hinder the effective deployment of MLLMs in UAV scenarios. We hope MM-UAVBench will foster future research on robust and reliable MLLMs for real-world UAV intelligence.
This paper presents a feasibility study, including simulations and prototype tests, on the autonomous operation of a multi-limbed intra-vehicular robot (mobile manipulator), shortly MLIVR, designed to assist astronauts with logistical tasks on the International Space Station (ISS). Astronauts spend significant time on tasks such as preparation, close-out, and the collection and transportation of goods, reducing the time available for critical mission activities. Our study explores the potential for a mobile manipulator to support these operations, emphasizing the need for autonomous functionality to minimize crew and ground operator effort while enabling real-time task execution. We focused on the robot's transportation capabilities, simulating its motion planning in 3D space. The actual motion execution was tested with a prototype on a 2D table to mimic a microgravity environment. The results demonstrate the feasibility of performing these tasks with minimal human intervention, offering a promising solution to enhance operational efficiency on the ISS.
Linear covariance (LinCov) techniques have gained widespread traction in the modeling of uncertainty, including in the preliminary study of spacecraft navigation performance. While LinCov methods offer improved computational efficiency compared to Monte Carlo based uncertainty analysis, they inherently rely on linearization approximations. Understanding the fidelity of these approximations and identifying when they are deficient is critically important for spacecraft navigation and mission planning, especially when dealing with highly nonlinear systems and large state uncertainties. This work presents a number of computational techniques for assessing linear covariance performance. These new LinCov fidelity measures are formulated using higher-order statistics, constrained optimization, and the unscented transform.
Robots are typically described in software by specification files (e.g., URDF, SDF, MJCF, USD) that encode only basic kinematic, dynamic, and geometric information. As a result, downstream applications such as simulation, planning, and control must repeatedly re-derive richer data, leading to redundant computations, fragmented implementations, and limited standardization. In this work, we introduce the Universal Robot Description Directory (URDD), a modular representation that organizes derived robot information into structured, easy-to-parse JSON and YAML modules. Our open-source toolkit automatically generates URDDs from URDFs, with a Rust implementation supporting Bevy-based visualization. Additionally, we provide a JavaScript/Three.js viewer for web-based inspection of URDDs. Experiments on multiple robot platforms show that URDDs can be generated efficiently, encapsulate substantially richer information than standard specification files, and directly enable the construction of core robotics subroutines. URDD provides a unified, extensible resource for reducing redundancy and establishing shared standards across robotics frameworks. We conclude with a discussion on the limitations and implications of our work.
We present PathoSyn, a unified generative framework for Magnetic Resonance Imaging (MRI) image synthesis that reformulates imaging-pathology as a disentangled additive deviation on a stable anatomical manifold. Current generative models typically operate in the global pixel domain or rely on binary masks, these paradigms often suffer from feature entanglement, leading to corrupted anatomical substrates or structural discontinuities. PathoSyn addresses these limitations by decomposing the synthesis task into deterministic anatomical reconstruction and stochastic deviation modeling. Central to our framework is a Deviation-Space Diffusion Model designed to learn the conditional distribution of pathological residuals, thereby capturing localized intensity variations while preserving global structural integrity by construction. To ensure spatial coherence, the diffusion process is coupled with a seam-aware fusion strategy and an inference-time stabilization module, which collectively suppress boundary artifacts and produce high-fidelity internal lesion heterogeneity. PathoSyn provides a mathematically principled pipeline for generating high-fidelity patient-specific synthetic datasets, facilitating the development of robust diagnostic algorithms in low-data regimes. By allowing interpretable counterfactual disease progression modeling, the framework supports precision intervention planning and provides a controlled environment for benchmarking clinical decision-support systems. Quantitative and qualitative evaluations on tumor imaging benchmarks demonstrate that PathoSyn significantly outperforms holistic diffusion and mask-conditioned baselines in both perceptual realism and anatomical fidelity. The source code of this work will be made publicly available.
The diffusion of artificial intelligence, particularly generative models, is expected to transform labor markets in uneven ways across sectors, territories, and social groups. This paper proposes a methodological framework to estimate the potential exposure of employment to AI using sector based data, addressing the limitations of occupation centered approaches in the Spanish context. By constructing an AI CNAE incidence matrix and applying it to provincial employment data for the period 2021 to 2023, we provide a territorial and gender disaggregated assessment of AI exposure across Spain. The results reveal stable structural patterns, with higher exposure in metropolitan and service oriented regions and a consistent gender gap, as female employment exhibits higher exposure in all territories. Rather than predicting job displacement, the framework offers a structural perspective on where AI is most likely to reshape work and skill demands, supporting evidence based policy and strategic planning.
Traditional flight computers -- including mechanical "whiz-wheels" (e.g. E6B, CRP series) and electronic flight calculators (e.g. ASA CX-3, Sportys E6-B) -- have long played a central role in flight planning and training within general aviation (GA). While these tools remain pedagogically valuable, their fixed form factors, constrained interaction models, and limited extensibility are increasingly misaligned with the expectations and workflows of pilots operating in modern digital environments. This paper presents E6BJA (Jamies Flight Computer), a fully featured, multi-platform, software-based flight computer designed natively for Apple iOS, Android, and Microsoft Windows devices, with a complementary web-based implementation. E6BJA reproduces the core calculations of traditional flight computers while extending them through enhanced modelling capabilities such as the 1976 International Standard Atmosphere, carburettor icing risk estimation, and aircraft-specific weight and balance calculators. Each calculator is accompanied by embedded educational monographs that explain underlying assumptions, variables, and equations. We compare E6BJA with mechanical and electronic flight computers across functional, cognitive, and technical dimensions, demonstrating improvements in accuracy, error reduction, discoverability, and educational value. We also discuss design trade-offs associated with native multi-platform development and examine how contemporary mobile computing environments can support safer and more intuitive pre-flight planning for pilots, trainees, instructors, and flight planning personnel. By combining the conceptual rigour of traditional flight planning methods with modern human-computer interaction design, E6BJA represents a meaningful evolution in pilot-facing flight tools, supporting both computation and instruction in aviation training contexts.
Direct dark matter detection experiments search for rare signals induced by hypothetical, galactic dark matter particles in low-background detectors operated deep underground. I will briefly review the direct detection principles, the expected signals and backgrounds, and the main experimental techniques. I will then discuss the status of ongoing experiments aiming to discover new particles in the keV - TeV mass range, as well as future detectors and their sensitivity.
This report provides an architecture-led analysis of two modern vision-language models (VLMs), Qwen2.5-VL-7B-Instruct and Llama-4-Scout-17B-16E-Instruct, and explains how their architectural properties map to a practical video-to-artifact pipeline implemented in the BodyLanguageDetection repository [1]. The system samples video frames, prompts a VLM to detect visible people and generate pixel-space bounding boxes with prompt-conditioned attributes (emotion by default), validates output structure using a predefined schema, and optionally renders an annotated video. We first summarize the shared multimodal foundation (visual tokenization, Transformer attention, and instruction following), then describe each architecture at a level sufficient to justify engineering choices without speculative internals. Finally, we connect model behavior to system constraints: structured outputs can be syntactically valid while semantically incorrect, schema validation is structural (not geometric correctness), person identifiers are frame-local in the current prompting contract, and interactive single-frame analysis returns free-form text rather than schema-enforced JSON. These distinctions are critical for writing defensible claims, designing robust interfaces, and planning evaluation.
Recent advances in vision, language, and multimodal learning have substantially accelerated progress in robotic foundation models, with robot manipulation remaining a central and challenging problem. This survey examines robot manipulation from an algorithmic perspective and organizes recent learning-based approaches within a unified abstraction of high-level planning and low-level control. At the high level, we extend the classical notion of task planning to include reasoning over language, code, motion, affordances, and 3D representations, emphasizing their role in structured and long-horizon decision making. At the low level, we propose a training-paradigm-oriented taxonomy for learning-based control, organizing existing methods along input modeling, latent representation learning, and policy learning. Finally, we identify open challenges and prospective research directions related to scalability, data efficiency, multimodal physical interaction, and safety. Together, these analyses aim to clarify the design space of modern foundation models for robotic manipulation.
The Alternating Current Optimal Power Flow (ACOPF) problem remains one of the most fundamental yet computationally challenging tasks in power systems operation and planning due to its nonconvex, nonlinear, and multimodal nature. This paper proposes a convex reformulation of the AC power flow problem by introducing auxiliary variables to isolate nonlinear terms, applying logarithmic transformations to exploit product-sum properties, and approximating with Bezier curves using a novel convexifying butterfly shaped function. This model is intended for assessing and operating weak power systems that face challenges with reactive power supply and overall network robustness. Its formulation closely mirrors the AC formulation, particularly regarding active and reactive power dispatch and network voltage levels. The proposed model achieves convergence on large test systems (e.g., IEEE 118 bus) in seconds and is validated against exact AC solutions. This convex formulation stands out not only for its mathematical transparency and intuitive structure but also for its ease of validation and implementation, making it an accessible and reliable tool for researchers and system operators for energy planning. The numerical analysis conducted on the IEEE 118 bus system yielded average percentage errors in the state variables specifically, the magnitudes and angles of nodal voltages of just 0.0008 percentage and 0.014 degree, respectively, when compared with the precise AC formulation. These results underscore the high accuracy and reliability of the proposed methodology.
Autonomous driving requires generating safe and reliable trajectories from complex multimodal inputs. Traditional modular pipelines separate perception, prediction, and planning, while recent end-to-end (E2E) systems learn them jointly. Vision-language models (VLMs) further enrich this paradigm by introducing cross-modal priors and commonsense reasoning, yet current VLM-based planners face three key challenges: (i) a mismatch between discrete text reasoning and continuous control, (ii) high latency from autoregressive chain-of-thought decoding, and (iii) inefficient or non-causal planners that limit real-time deployment. We propose ColaVLA, a unified vision-language-action framework that transfers reasoning from text to a unified latent space and couples it with a hierarchical, parallel trajectory decoder. The Cognitive Latent Reasoner compresses scene understanding into compact, decision-oriented meta-action embeddings through ego-adaptive selection and only two VLM forward passes. The Hierarchical Parallel Planner then generates multi-scale, causality-consistent trajectories in a single forward pass. Together, these components preserve the generalization and interpretability of VLMs while enabling efficient, accurate and safe trajectory generation. Experiments on the nuScenes benchmark show that ColaVLA achieves state-of-the-art performance in both open-loop and closed-loop settings with favorable efficiency and robustness.
The rapid spread of multimodal misinformation poses a growing challenge for automated fact-checking systems. Existing approaches, including large vision language models (LVLMs) and deep multimodal fusion methods, often fall short due to limited reasoning and shallow evidence utilization. A key bottleneck is the lack of dedicated datasets that provide complete real-world multimodal misinformation instances accompanied by annotated reasoning processes and verifiable evidence. To address this limitation, we introduce RW-Post, a high-quality and explainable dataset for real-world multimodal fact-checking. RW-Post aligns real-world multimodal claims with their original social media posts, preserving the rich contextual information in which the claims are made. In addition, the dataset includes detailed reasoning and explicitly linked evidence, which are derived from human written fact-checking articles via a large language model assisted extraction pipeline, enabling comprehensive verification and explanation. Building upon RW-Post, we propose AgentFact, an agent-based multimodal fact-checking framework designed to emulate the human verification workflow. AgentFact consists of five specialized agents that collaboratively handle key fact-checking subtasks, including strategy planning, high-quality evidence retrieval, visual analysis, reasoning, and explanation generation. These agents are orchestrated through an iterative workflow that alternates between evidence searching and task-aware evidence filtering and reasoning, facilitating strategic decision-making and systematic evidence analysis. Extensive experimental results demonstrate that the synergy between RW-Post and AgentFact substantially improves both the accuracy and interpretability of multimodal fact-checking.