planning - 2025-12-08

SIMPACT: Simulation-Enabled Action Planning using Vision-Language Models

Authors:Haowen Liu, Shaoxiong Yao, Haonan Chen, Jiawei Gao, Jiayuan Mao, Jia-Bin Huang, Yilun Du

Date:2025-12-05 18:51:03

Vision-Language Models (VLMs) exhibit remarkable common-sense and semantic reasoning capabilities. However, they lack a grounded understanding of physical dynamics. This limitation arises from training VLMs on static internet-scale visual-language data that contain no causal interactions or action-conditioned changes. Consequently, it remains challenging to leverage VLMs for fine-grained robotic manipulation tasks that require physical understanding, reasoning, and corresponding action planning. To overcome this, we present SIMPACT, a test-time, SIMulation-enabled ACTion Planning framework that equips VLMs with physical reasoning through simulation-in-the-loop world modeling, without requiring any additional training. From a single RGB-D observation, SIMPACT efficiently constructs physics simulations, enabling the VLM to propose informed actions, observe simulated rollouts, and iteratively refine its reasoning. By integrating language reasoning with physics prediction, our simulation-enabled VLM can understand contact dynamics and action outcomes in a physically grounded way. Our method demonstrates state-of-the-art performance on five challenging, real-world rigid-body and deformable manipulation tasks that require fine-grained physical reasoning, outperforming existing general-purpose robotic manipulation models. Our results demonstrate that embedding physics understanding via efficient simulation into VLM reasoning at test time offers a promising path towards generalizable embodied intelligence. Project webpage can be found at https://simpact-bot.github.io

Removing correlated noise stripes from the Nancy Grace Roman Space Telescope survey images

Authors:Katherine Laliotis, Christopher M. Hirata, Emily Macbeth, Kaili Cao

Date:2025-12-05 18:45:27

Weak gravitational lensing has emerged as a powerful tool for investigating the matter distribution in the Universe and how it has evolved over cosmic time. The Wide Field Instrument (WFI) on the Nancy Grace Roman Space Telescope (Roman) will deliver some of the highest precision measurements of weak lensing ever made. Since weak lensing is based on statistics of faint sources, it can be biased by even tiny instrument systematics, including correlated read noise. Previous works have shown the infrared detectors used in the Roman WFI show correlations in their noise fields at a level significant for weak lensing measurements, even after application of standard reference pixel corrections; of particular concern is 1/f noise, which appears as horizontal banding in the detector frame. In this paper, we present imDestripe: a new Python module utilizing the multiple roll angles in Roman's observing strategy and linear algebra techniques to remove correlated noise stripes from observed images. We test imDestripe in a hybrid simulation by combining real noise realizations (from darks taken during ground testing) with simulated images of the astronomical scene, and find that the power spectrum of the banding can be suppressed by factors of 10--30 on large scales. We briefly discuss plans for further development of imDestripe in the context of the WFI pipeline.

Speech World Model: Causal State-Action Planning with Explicit Reasoning for Speech

Authors:Xuanru Zhou, Jiachen Lian, Henry Hong, Xinyi Yang, Gopala Anumanchipalli

Date:2025-12-05 18:19:36

Current speech-language models (SLMs) typically use a cascade of speech encoder and large language model, treating speech understanding as a single black box. They analyze the content of speech well but reason weakly about other aspects, especially under sparse supervision. Thus, we argue for explicit reasoning over speech states and actions with modular and transparent decisions. Inspired by cognitive science we adopt a modular perspective and a world model view in which the system learns forward dynamics over latent states. We factorize speech understanding into four modules that communicate through a causal graph, establishing a cognitive state search space. Guided by posterior traces from this space, an instruction-tuned language model produces a concise causal analysis and a user-facing response, enabling counterfactual interventions and interpretability under partial supervision. We present the first graph based modular speech model for explicit reasoning and we will open source the model and data to promote the development of advanced speech understanding.

World Models That Know When They Don't Know: Controllable Video Generation with Calibrated Uncertainty

Authors:Zhiting Mei, Tenny Yin, Micah Baker, Ola Shorinwa, Anirudha Majumdar

Date:2025-12-05 18:06:18

Recent advances in generative video models have led to significant breakthroughs in high-fidelity video synthesis, specifically in controllable video generation where the generated video is conditioned on text and action inputs, e.g., in instruction-guided video editing and world modeling in robotics. Despite these exceptional capabilities, controllable video models often hallucinate - generating future video frames that are misaligned with physical reality - which raises serious concerns in many tasks such as robot policy evaluation and planning. However, state-of-the-art video models lack the ability to assess and express their confidence, impeding hallucination mitigation. To rigorously address this challenge, we propose C3, an uncertainty quantification (UQ) method for training continuous-scale calibrated controllable video models for dense confidence estimation at the subpatch level, precisely localizing the uncertainty in each generated video frame. Our UQ method introduces three core innovations to empower video models to estimate their uncertainty. First, our method develops a novel framework that trains video models for correctness and calibration via strictly proper scoring rules. Second, we estimate the video model's uncertainty in latent space, avoiding training instability and prohibitive training costs associated with pixel-space approaches. Third, we map the dense latent-space uncertainty to interpretable pixel-level uncertainty in the RGB space for intuitive visualization, providing high-resolution uncertainty heatmaps that identify untrustworthy regions. Through extensive experiments on large-scale robot learning datasets (Bridge and DROID) and real-world evaluations, we demonstrate that our method not only provides calibrated uncertainty estimates within the training distribution, but also enables effective out-of-distribution detection.

NICE: Neural Implicit Craniofacial Model for Orthognathic Surgery Prediction

Authors:Jiawen Yang, Yihui Cao, Xuanyu Tian, Yuyao Zhang, Hongjiang Wei

Date:2025-12-05 17:56:44

Orthognathic surgery is a crucial intervention for correcting dentofacial skeletal deformities to enhance occlusal functionality and facial aesthetics. Accurate postoperative facial appearance prediction remains challenging due to the complex nonlinear interactions between skeletal movements and facial soft tissue. Existing biomechanical, parametric models and deep-learning approaches either lack computational efficiency or fail to fully capture these intricate interactions. To address these limitations, we propose Neural Implicit Craniofacial Model (NICE) which employs implicit neural representations for accurate anatomical reconstruction and surgical outcome prediction. NICE comprises a shape module, which employs region-specific implicit Signed Distance Function (SDF) decoders to reconstruct the facial surface, maxilla, and mandible, and a surgery module, which employs region-specific deformation decoders. These deformation decoders are driven by a shared surgical latent code to effectively model the complex, nonlinear biomechanical response of the facial surface to skeletal movements, incorporating anatomical prior knowledge. The deformation decoders output point-wise displacement fields, enabling precise modeling of surgical outcomes. Extensive experiments demonstrate that NICE outperforms current state-of-the-art methods, notably improving prediction accuracy in critical facial regions such as lips and chin, while robustly preserving anatomical integrity. This work provides a clinically viable tool for enhanced surgical planning and patient consultation in orthognathic procedures.

Learning the Cosmic Web: Graph-based Classification of Simulated Galaxies by their Dark Matter Environments

Authors:Dakshesh Kololgi, Krishna Naidoo, Amelie Saintonge, Ofer Lahav

Date:2025-12-05 17:44:39

We present a novel graph-based machine learning classifier for identifying the dark matter cosmic web environments of galaxies. Large galaxy surveys offer comprehensive statistical views of how galaxy properties are shaped by large-scale structure, but this requires robust classifications of galaxies' cosmic web environments. Using stellar mass-selected IllustrisTNG-300 galaxies, we apply a three-stage, simulation-based framework to link galaxies to the total (mainly dark) underlying matter distribution. Here, we apply the following three steps: First, we assign the positions of simulated galaxies to a void, wall, filament, or cluster environment using the T-Web classification of the underlying matter distribution. Second, we construct a Delaunay triangulation of the galaxy distribution to summarise the local geometric structure with ten graph metrics for each galaxy. Third, we train a graph attention network (GAT) on each galaxy's graph metrics to predict its cosmic web environment. For galaxies with stellar mass $\mathrm{>10^9 M_{\odot}}$, our GAT+ model achieves an accuracy of $85\,\%$, outperforming graph-agnostic multilayer perceptrons and graph convolutional networks. Our results demonstrate that graph-based representations of galaxy positions provide a powerful and physically meaningful way to infer dark matter environments. We plan to apply this simulation-based graph modelling to investigate how the properties of observed galaxies from the Dark Energy Spectroscopic Instrument (DESI) survey are influenced by their dark matter environments.

Optimal Safety-Aware Scheduling for Multi-Agent Aerial 3D Printing with Utility Maximization under Dependency Constraints

Authors:Marios-Nektarios Stamatopoulos, Shridhar Velhal, Avijit Banerjee, George Nikolakopoulos

Date:2025-12-05 15:34:43

This article presents a novel coordination and task-planning framework to enable the simultaneous conflict-free collaboration of multiple unmanned aerial vehicles (UAVs) for aerial 3D printing. The proposed framework formulates an optimization problem that takes a construction mission divided into sub-tasks and a team of autonomous UAVs, along with limited volume and battery. It generates an optimal mission plan comprising task assignments and scheduling while accounting for task dependencies arising from the geometric and structural requirements of the 3D design, inter-UAV safety constraints, material usage, and total flight time of each UAV. The potential conflicts occurring during the simultaneous operation of the UAVs are addressed at a segment level by dynamically selecting the starting time and location of each task to guarantee collision-free parallel execution. An importance prioritization is proposed to accelerate the computation by guiding the solution toward more important tasks. Additionally, a utility maximization formulation is proposed to dynamically determine the optimal number of UAVs required for a given mission, balancing the trade-off between minimizing makespan and the deployment of excess agents. The proposed framework's effectiveness is evaluated through a Gazebo-based simulation setup, where agents are coordinated by a mission control module allocating the printing tasks based on the generated optimal scheduling plan while remaining within the material and battery constraints of each UAV.

Real-time Remote Tracking and Autonomous Planning for Whale Rendezvous using Robots

Authors:Sushmita Bhattacharya, Ninad Jadhav, Hammad Izhar, Karen Li, Kevin George, Robert Wood, Stephanie Gil

Date:2025-12-05 15:27:58

We introduce a system for real-time sperm whale rendezvous at sea using an autonomous uncrewed aerial vehicle. Our system employs model-based reinforcement learning that combines in situ sensor data with an empirical whale dive model to guide navigation decisions. Key challenges include (i) real-time acoustic tracking in the presence of multiple whales, (ii) distributed communication and decision-making for robot deployments, and (iii) on-board signal processing and long-range detection from fish-trackers. We evaluate our system by conducting rendezvous with sperm whales at sea in Dominica, performing hardware experiments on land, and running simulations using whale trajectories interpolated from marine biologists' surface observations.

3D Path Planning for Robot-assisted Vertebroplasty from Arbitrary Bi-plane X-ray via Differentiable Rendering

Authors:Blanca Inigo, Benjamin D. Killeen, Rebecca Choi, Michelle Song, Ali Uneri, Majid Khan, Christopher Bailey, Axel Krieger, Mathias Unberath

Date:2025-12-05 15:26:13

Robotic systems are transforming image-guided interventions by enhancing accuracy and minimizing radiation exposure. A significant challenge in robotic assistance lies in surgical path planning, which often relies on the registration of intraoperative 2D images with preoperative 3D CT scans. This requirement can be burdensome and costly, particularly in procedures like vertebroplasty, where preoperative CT scans are not routinely performed. To address this issue, we introduce a differentiable rendering-based framework for 3D transpedicular path planning utilizing bi-planar 2D X-rays. Our method integrates differentiable rendering with a vertebral atlas generated through a Statistical Shape Model (SSM) and employs a learned similarity loss to refine the SSM shape and pose dynamically, independent of fixed imaging geometries. We evaluated our framework in two stages: first, through vertebral reconstruction from orthogonal X-rays for benchmarking, and second, via clinician-in-the-loop path planning using arbitrary-view X-rays. Our results indicate that our method outperformed a normalized cross-correlation baseline in reconstruction metrics (DICE: 0.75 vs. 0.65) and achieved comparable performance to the state-of-the-art model ReVerteR (DICE: 0.77), while maintaining generalization to arbitrary views. Success rates for bipedicular planning reached 82% with synthetic data and 75% with cadaver data, exceeding the 66% and 31% rates of a 2D-to-3D baseline, respectively. In conclusion, our framework facilitates versatile, CT-free 3D path planning for robot-assisted vertebroplasty, effectively accommodating real-world imaging diversity without the need for preoperative CT scans.

Task-Specific Trust Evaluation for Multi-Hop Collaborator Selection via GNN-Aided Distributed Agentic AI

Authors:Botao Zhu, Xianbin Wang, Dusit Niyato

Date:2025-12-05 15:16:04

The success of collaborative task completion among networked devices hinges on the effective selection of trustworthy collaborators. However, accurate task-specific trust evaluation of multi-hop collaborators can be extremely complex. The reason is that their trust evaluation is determined by a combination of diverse trust-related perspectives with different characteristics, including historical collaboration reliability, volatile and sensitive conditions of available resources for collaboration, as well as continuously evolving network topologies. To address this challenge, this paper presents a graph neural network (GNN)-aided distributed agentic AI (GADAI) framework, in which different aspects of devices' task-specific trustworthiness are separately evaluated and jointly integrated to facilitate multi-hop collaborator selection. GADAI first utilizes a GNN-assisted model to infer device trust from historical collaboration data. Specifically, it employs GNN to propagate and aggregate trust information among multi-hop neighbours, resulting in more accurate device reliability evaluation. Considering the dynamic and privacy-sensitive nature of device resources, a privacy-preserving resource evaluation mechanism is implemented using agentic AI. Each device hosts a large AI model-driven agent capable of autonomously determining whether its local resources meet the requirements of a given task, ensuring both task-specific and privacy-preserving trust evaluation. By combining the outcomes of these assessments, only the trusted devices can coordinate a task-oriented multi-hop cooperation path through their agents in a distributed manner. Experimental results show that our proposed GADAI outperforms the comparison algorithms in planning multi-hop paths that maximize the value of task completion.

Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding

Authors:Ziyang Wang, Honglu Zhou, Shijie Wang, Junnan Li, Caiming Xiong, Silvio Savarese, Mohit Bansal, Michael S. Ryoo, Juan Carlos Niebles

Date:2025-12-05 15:03:48

Long video understanding (LVU) is challenging because answering real-world queries often depends on sparse, temporally dispersed cues buried in hours of mostly redundant and irrelevant content. While agentic pipelines improve video reasoning capabilities, prevailing frameworks rely on a query-agnostic captioner to perceive video information, which wastes computation on irrelevant content and blurs fine-grained temporal and spatial information. Motivated by active perception theory, we argue that LVU agents should actively decide what, when, and where to observe, and continuously assess whether the current observation is sufficient to answer the query. We present Active Video Perception (AVP), an evidence-seeking framework that treats the video as an interactive environment and acquires compact, queryrelevant evidence directly from pixels. Concretely, AVP runs an iterative plan-observe-reflect process with MLLM agents. In each round, a planner proposes targeted video interactions, an observer executes them to extract time-stamped evidence, and a reflector evaluates the sufficiency of the evidence for the query, either halting with an answer or triggering further observation. Across five LVU benchmarks, AVP achieves highest performance with significant improvements. Notably, AVP outperforms the best agentic method by 5.7% in average accuracy while only requires 18.4% inference time and 12.4% input tokens.

Teaching Language Models Mechanistic Explainability Through Arrow-Pushing

Authors:Théo A. Neukomm, Zlatko Jončev, Philippe Schwaller

Date:2025-12-05 13:57:50

Chemical reaction mechanisms provide crucial insight into synthesizability, yet current Computer-Assisted Synthesis Planning (CASP) systems lack mechanistic grounding. We introduce a computational framework for teaching language models to predict chemical reaction mechanisms through arrow pushing formalism, a century-old notation that tracks electron flow while respecting conservation laws. We developed MechSMILES, a compact textual format encoding molecular structure and electron flow, and trained language models on four mechanism prediction tasks of increasing complexity using mechanistic reaction datasets, such as mech-USPTO-31k and FlowER. Our models achieve more than 95\% top-3 accuracy on elementary step prediction and scores that surpass 73\% on mech-USPTO-31k, and 93\% on FlowER dataset for the retrieval of complete reaction mechanisms on our hardest task. This mechanistic understanding enables three key applications. First, our models serve as post-hoc validators for CASP systems, filtering chemically implausible transformations. Second, they enable holistic atom-to-atom mapping that tracks all atoms, including hydrogens. Third, they extract catalyst-aware reaction templates that distinguish recycled catalysts from spectator species. By grounding predictions in physically meaningful electron moves that ensure conservation of mass and charge, this work provides a pathway toward more explainable and chemically valid computational synthesis planning, while providing an architecture-agnostic framework for the benchmarking of mechanism prediction.

Bayesian Active Inference for Intelligent UAV Anti-Jamming and Adaptive Trajectory Planning

Authors:Ali Krayani, Seyedeh Fatemeh Sadati, Lucio Marcenaro, Carlo Regazzoni

Date:2025-12-05 13:38:52

This paper proposes a hierarchical trajectory planning framework for UAVs operating under adversarial jamming conditions. Leveraging Bayesian Active Inference, the approach combines expert-generated demonstrations with probabilistic generative modeling to encode high-level symbolic planning, low-level motion policies, and wireless signal feedback. During deployment, the UAV performs online inference to anticipate interference, localize jammers, and adapt its trajectory accordingly, without prior knowledge of jammer locations. Simulation results demonstrate that the proposed method achieves near-expert performance, significantly reducing communication interference and mission cost compared to model-free reinforcement learning baselines, while maintaining robust generalization in dynamic environments.

Scenario-aware Uncertainty Quantification for Trajectory Prediction with Statistical Guarantees

Authors:Yiming Shu, Jiahui Xu, Linghuan Kong, Fangni Zhang, Guodong Yin, Chen Sun

Date:2025-12-05 12:54:39

Reliable uncertainty quantification in trajectory prediction is crucial for safety-critical autonomous driving systems, yet existing deep learning predictors lack uncertainty-aware frameworks adaptable to heterogeneous real-world scenarios. To bridge this gap, we propose a novel scenario-aware uncertainty quantification framework to provide the predicted trajectories with prediction intervals and reliability assessment. To begin with, predicted trajectories from the trained predictor and their ground truth are projected onto the map-derived reference routes within the Frenet coordinate system. We then employ CopulaCPTS as the conformal calibration method to generate temporal prediction intervals for distinct scenarios as the uncertainty measure. Building upon this, within the proposed trajectory reliability discriminator (TRD), mean error and calibrated confidence intervals are synergistically analyzed to establish reliability models for different scenarios. Subsequently, the risk-aware discriminator leverages a joint risk model that integrates longitudinal and lateral prediction intervals within the Frenet coordinate to identify critical points. This enables segmentation of trajectories into reliable and unreliable segments, holding the advantage of informing downstream planning modules with actionable reliability results. We evaluated our framework using the real-world nuPlan dataset, demonstrating its effectiveness in scenario-aware uncertainty quantification and reliability assessment across diverse driving contexts.

On Dynamic Programming Theory for Leader-Follower Stochastic Games

Authors:Jilles Steeve Dibangoye, Thibaut Le Marre, Ocan Sankur, François Schwarzentruber

Date:2025-12-05 12:23:56

Leader-follower general-sum stochastic games (LF-GSSGs) model sequential decision-making under asymmetric commitment, where a leader commits to a policy and a follower best responds, yielding a strong Stackelberg equilibrium (SSE) with leader-favourable tie-breaking. This paper introduces a dynamic programming (DP) framework that applies Bellman recursion over credible sets-state abstractions formally representing all rational follower best responses under partial leader commitments-to compute SSEs. We first prove that any LF-GSSG admits a lossless reduction to a Markov decision process (MDP) over credible sets. We further establish that synthesising an optimal memoryless deterministic leader policy is NP-hard, motivating the development of ε-optimal DP algorithms with provable guarantees on leader exploitability. Experiments on standard mixed-motive benchmarks-including security games, resource allocation, and adversarial planning-demonstrate empirical gains in leader value and runtime scalability over state-of-the-art methods.

The Topology of Hardship: Empirical Curriculum Graphs and Structural Bottlenecks in Engineering Degrees

Authors:H. R. Paz

Date:2025-12-05 09:34:29

Engineering degrees are often perceived as "hard", yet this hardness is usually discussed in terms of content difficulty or student weaknesses rather than as a structural property of the curriculum itself. Recent work on course-prerequisite networks and curriculum graphs has shown that study plans can be modelled as complex networks with identifiable hubs and bottlenecks, but most studies rely on official syllabi rather than on how students actually progress through the system (Simon de Blas et al., 2021; Stavrinides & Zuev, 2023; Yang et al., 2024; Wang et al., 2025). This paper introduces the notion of topology of hardship: a quantitative description of curriculum complexity derived from empirical student trajectories in long-cycle engineering programmes. Building on the CAPIRE framework for multilevel trajectory modelling (Paz, 2025a, 2025b), we reconstruct degree-curriculum graphs from enrolment and completion data for 29 engineering curricula across several cohorts. For each graph we compute structural metrics (e.g., density, longest path, bottleneck centrality) and empirical hardship measures capturing blocking probability and time-to-progress. These are combined into a composite hardship index, which is then related to observed dropout rates and time to degree. Our findings show that curriculum hardness is not a vague perception but a measurable topological property: a small number of structurally dense, bottleneck-heavy curricula account for a disproportionate share of dropout and temporal desynchronisation. We discuss implications for curriculum reform, accreditation, and data-informed policy design.

Distributed scalable coupled policy algorithm for networked multi-agent reinforcement learning

Authors:Pengcheng Dai, Dongming Wang, Wenwu Yu, Wei Ren

Date:2025-12-05 05:51:14

This paper studies networked multi-agent reinforcement learning (NMARL) with interdependent rewards and coupled policies. In this setting, each agent's reward depends on its own state-action pair as well as those of its direct neighbors, and each agent's policy is parameterized by its local parameters together with those of its $κ_{p}$-hop neighbors, with $κ_{p}\geq 1$ denoting the coupled radius. The objective of the agents is to collaboratively optimize their policies to maximize the discounted average cumulative reward. To address the challenge of interdependent policies in collaborative optimization, we introduce a novel concept termed the neighbors' averaged $Q$-function and derive a new expression for the coupled policy gradient. Based on these theoretical foundations, we develop a distributed scalable coupled policy (DSCP) algorithm, where each agent relies only on the state-action pairs of its $κ_{p}$-hop neighbors and the rewards its their $(κ_{p}+1)$-hop neighbors. Specially, in the DSCP algorithm, we employ a geometric 2-horizon sampling method that does not require storing a full $Q$-table to obtain an unbiased estimate of the coupled policy gradient. Moreover, each agent interacts exclusively with its direct neighbors to obtain accurate policy parameters, while maintaining local estimates of other agents' parameters to execute its local policy and collect samples for optimization. These estimates and policy parameters are updated via a push-sum protocol, enabling distributed coordination of policy updates across the network. We prove that the joint policy produced by the proposed algorithm converges to a first-order stationary point of the objective function. Finally, the effectiveness of DSCP algorithm is demonstrated through simulations in a robot path planning environment, showing clear improvement over state-of-the-art methods.

Computing Supported Models via Transformation to Stable Models

Authors:Fang Li, Gopal Gupta

Date:2025-12-05 05:22:23

Answer Set Programming (ASP) with stable model semantics has proven highly effective for knowledge representation and reasoning. However, the minimality requirement of stable models can be restrictive for applications requiring exploration of non-minimal but logically consistent solution spaces. Supported models, introduced by Apt, Blair, and Walker in 1988, relax this minimality constraint while maintaining a support condition ensuring every true atom is justified by some rule. Despite their theoretical significance, supported models lack practical computational tools integrated with modern ASP solvers. We present a novel transformation-based method enabling computation of supported models using standard ASP infrastructure. Our approach transforms any ground logic program into an equivalent program whose stable models correspond exactly to the supported models of the original program. We implement this transformation for Clingo, providing the first practical tool for computing supported models with state-of-the-art ASP solvers. We demonstrate applications in software verification, medical diagnosis, and planning where supported models enable valuable exploratory reasoning capabilities beyond those provided by stable models. We also provide an empirical evaluation to justify the practical utility of our approach compared to established methods. Our implementation is publicly available and compatible with standard ASP syntax.

Bita: A Conversational Assistant for Fairness Testing

Authors:Keeryn Johnson, Cleyton Magalhaes, Ronnie de Souza Santos

Date:2025-12-05 05:04:59

Bias in AI systems can lead to unfair and discriminatory outcomes, especially when left untested before deployment. Although fairness testing aims to identify and mitigate such bias, existing tools are often difficult to use, requiring advanced expertise and offering limited support for real-world workflows. To address this, we introduce Bita, a conversational assistant designed to help software testers detect potential sources of bias, evaluate test plans through a fairness lens, and generate fairness-oriented exploratory testing charters. Bita integrates a large language model with retrieval-augmented generation, grounding its responses in curated fairness literature. Our validation demonstrates how Bita supports fairness testing tasks on real-world AI systems, providing structured, reproducible evidence of its utility. In summary, our work contributes a practical tool that operationalizes fairness testing in a way that is accessible, systematic, and directly applicable to industrial practice.

Entropic selection for optimal transport on the line with distance cost

Authors:Armand Ley

Date:2025-12-04 22:10:07

We study the small-regularisation limit of the entropic optimal transport problem on the line with distance cost. While convergence of entropic minimizers is well understood in the discrete setting and in the case where the cost is continuous and there is a unique optimal transport plan, the question of existence and characterization outside these settings remains largely open. We propose a natural candidate for the limiting object and establish its convergence under mutual singularity of the marginals. For arbitrary marginals, we moreover prove that every limit point of entropic minimizers obeys a structural condition known as weak multiplicativity. The construction of our candidate relies on a decomposition theorem for optimal transport plan that we believe is of independent interest. This article complements the previous work of Di Marino and Louet.

Multi-bandpass Photometry for Exoplanet Atmosphere Reconnaissance (MPEAR) with the Habitable Worlds Observatory (HWO) -- I. Differentiating Earth from Neptunes During Discovery

Authors:Eleonora Alei, Avi M. Mandell, Miles H. Currie, Aki Roberge, Christopher C. Stark, Allison Payne, Vincent Kofman, Geronimo L. Villanueva, Renyu Hu, Amber V. Young

Date:2025-12-04 22:02:12

As the architecture for the Habitable Worlds Observatory (HWO) is being developed, it is crucial to optimize the observing strategies for a survey to detect and characterize Earth-like planets around Sun-like stars. Efficient target identification and characterization will help drive mission requirements that can be matched to the planned observations. Current HWO concepts allow simultaneous multi-bandpass observations with the coronagraph instrument, critical for performing a qualitative planetary reconnaissance to optimize observing time for deriving orbital constraints and prioritize characterization of promising targets. We describe a new algorithm designed to determine the best combination of broadband photometric observations for extracting maximum information from the first visit. It identifies degeneracies in the orbital configurations, fluxes, and noise, and determines optimal secondary photometry bands to reduce these. We demonstrate its application by comparing an Earth seen at quadrature with a cold and a warm Neptune at inclined orbits and varying phases, with comparable flux in the discovery bandpass centered at 500 nm (20\% bandwidth). Using the noise and exposure time calculator that we developed for the HWO coronagraph instrument, we find that the baseline $S/N=7$ (corresponding to 3.2 hours observing time for a planet at 10pc) is only sufficient to marginally differentiate the Earth from a cold Neptune-like planet assuming two parallel bandpasses (550 nm + 850 nm). However, increasing to $S/N=15$ (7 hours observing time) and using three parallel bandpasses (360 nm + 500 nm + 1.11 micron) would differentiate the Earth from either a warm or cold Neptune.

Search at Scale: Improving Numerical Conditioning of Ergodic Coverage Optimization for Multi-Scale Domains

Authors:Yanis Lahrach, Christian Hughes, Ian Abraham

Date:2025-12-04 20:06:36

Recent methods in ergodic coverage planning have shown promise as tools that can adapt to a wide range of geometric coverage problems with general constraints, but are highly sensitive to the numerical scaling of the problem space. The underlying challenge is that the optimization formulation becomes brittle and numerically unstable with changing scales, especially under potentially nonlinear constraints that impose dynamic restrictions, due to the kernel-based formulation. This paper proposes to address this problem via the development of a scale-agnostic and adaptive ergodic coverage optimization method based on the maximum mean discrepancy metric (MMD). Our approach allows the optimizer to solve for the scale of differential constraints while annealing the hyperparameters to best suit the problem domain and ensure physical consistency. We also derive a variation of the ergodic metric in the log space, providing additional numerical conditioning without loss of performance. We compare our approach with existing coverage planning methods and demonstrate the utility of our approach on a wide range of coverage problems.

Predictions of the Nancy Grace Roman Space Telescope Galactic Exoplanet Survey. V. Detection Rates of Multiplanetary Systems in High Magnification Microlensing Events

Authors:Vito Saggese, Étienne Bachelet, Sebastiano Calchi Novati, Valerio Bozza, Giovanni Covone, Farzaneh Zohrabi, Michael D. Albrow, Jay Anderson, Charles Beichman, David P. Bennett, Aparna Bhattacharya, Christopher Brandon, Sean Carey, Jessie Christiansen, Alison Crisp, William DeRocco, B. Scott Gaudi, Jon Hulberg, Macy J. Huston, Stela Ishitani Silva, Eamonn Kerins, Somayeh Khakpash, Katarzyna Kruszyńska, Casey Lam, Jessica R. Lu, Amber Malpas, Arjun Murlidhar, Marz Newman, Greg Olmschenk, Matthew Penny, Keivan G. Stassun, Alexander P. Stephan, Rachel A. Street, Takahiro Sumi, Sean K. Terry, Himanshu Verma, Weicheng Zang

Date:2025-12-04 19:00:01

The Nancy Grace Roman Space Telescope will expand the reach of gravitational microlensing surveys by increasing the number of events monitored and the precision of their light curves. We investigate Roman's ability to detect triple-lens microlensing systems, cases where a foreground star with two bound exoplanets produces detectable anomalies in a microlensing event, using its planned high-cadence observations toward the Galactic bulge. We simulate a large set of high-magnification microlensing light curves based on Roman's expected survey characteristics. A detection criterion, based on a required $χ^2$ improvement for a two-planet model, is applied to determine whether the second planet can be reliably distinguished from a single-planet (binary-lens) model. Our simulations show that the majority of two-planet microlensing events would be detectable with Roman. Events in which both planets are relatively massive (planet-star mass ratios of order $10^{-3}$), or in which the more massive planet occupies a favorable resonant configuration, produce strong central perturbations, resulting in detection efficiencies of roughly 90\%. By contrast, systems with only low-mass planets ($q \sim 10^{-4}$) or with less favorable alignments generate much weaker signals, which often fall below the detection threshold. In general, the planetary mass ratios and the resulting caustic geometry (e.g., central caustic size in resonant versus wide/close orbits) are the dominant factors governing detectability. Taking into account the expected frequency of planetary systems and the fraction of high-magnification events, we estimate that Roman will detect a high-magnification triple-lens event in approximately 4.5\% of multi-planet microlensing events, corresponding to about 64 events over the course of the full survey.

DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation

Authors:Dongzhi Jiang, Renrui Zhang, Haodong Li, Zhuofan Zong, Ziyu Guo, Jun He, Claire Guo, Junyan Ye, Rongyao Fang, Weijia Li, Rui Liu, Hongsheng Li

Date:2025-12-04 18:59:53

Recent unified multimodal large language models (MLLMs) have shown impressive capabilities, incorporating chain-of-thought (CoT) reasoning for enhanced text-to-image generation. However, existing approaches remain limited, either treating the model merely as a standalone generator or relying on abstract textual planning. To this end, we propose Draft-as-CoT (DraCo), a novel interleaved reasoning paradigm that fully leverages both textual and visual contents in CoT for better planning and verification. Our method first generates a low-resolution draft image as preview, providing more concrete and structural visual planning and guidance. Then, we employ the model's inherent understanding capability to verify potential semantic misalignments between the draft and input prompt, and performs refinement through selective corrections with super-resolution. In this way, our approach addresses two fundamental challenges: the coarse-grained nature of textual planning and the difficulty in generating rare attribute combinations. To support training, we curate DraCo-240K, aiming to enhance three atomic capabilities spanning general correction, instance manipulation, and layout reorganization. Supported by DraCo-CFG, a specialized classifier-free guidance (CFG) strategy for interleaved reasoning, DraCo achieves a tremendous increase on GenEval (+8%), Imagine-Bench (+0.91), and GenEval++ (+3%), significantly outperforming direct generation and other generation methods empowered by CoT.

Two-Stage Camera Calibration Method for Multi-Camera Systems Using Scene Geometry

Authors:Aleksandr Abramov

Date:2025-12-04 16:47:39

Calibration of multi-camera systems is a key task for accurate object tracking. However, it remains a challenging problem in real-world conditions, where traditional methods are not applicable due to the lack of accurate floor plans, physical access to place calibration patterns, or synchronized video streams. This paper presents a novel two-stage calibration method that overcomes these limitations. In the first stage, partial calibration of individual cameras is performed based on an operator's annotation of natural geometric primitives (parallel, perpendicular, and vertical lines, or line segments of equal length). This allows estimating key parameters (roll, pitch, focal length) and projecting the camera's Effective Field of View (EFOV) onto the horizontal plane in a base 3D coordinate system. In the second stage, precise system calibration is achieved through interactive manipulation of the projected EFOV polygons. The operator adjusts their position, scale, and rotation to align them with the floor plan or, in its absence, using virtual calibration elements projected onto all cameras in the system. This determines the remaining extrinsic parameters (camera position and yaw). Calibration requires only a static image from each camera, eliminating the need for physical access or synchronized video. The method is implemented as a practical web service. Comparative analysis and demonstration videos confirm the method's applicability, accuracy, and flexibility, enabling the deployment of precise multi-camera tracking systems in scenarios previously considered infeasible.

On Disturbance-Aware Minimum-Time Trajectory Planning: Evidence from Tests on a Dynamic Driving Simulator

Authors:Matteo Masoni, Vincenzo Palermo, Marco Gabiccini, Martino Gulisano, Giorgio Previati, Massimiliano Gobbi, Francesco Comolli, Gianpiero Mastinu, Massimo Guiggiani

Date:2025-12-04 15:45:40

This work investigates how disturbance-aware, robustness-embedded reference trajectories translate into driving performance when executed by professional drivers in a dynamic simulator. Three planned reference trajectories are compared against a free-driving baseline (NOREF) to assess trade-offs between lap time (LT) and steering effort (SE): NOM, the nominal time-optimal trajectory; TLC, a track-limit-robust trajectory obtained by tightening margins to the track edges; and FLC, a friction-limit-robust trajectory obtained by tightening against axle and tire saturation. All trajectories share the same minimum lap-time objective with a small steering-smoothness regularizer and are evaluated by two professional drivers using a high-performance car on a virtual track. The trajectories derive from a disturbance-aware minimum-lap-time framework recently proposed by the authors, where worst-case disturbance growth is propagated over a finite horizon and used to tighten tire-friction and track-limit constraints, preserving performance while providing probabilistic safety margins. LT and SE are used as performance indicators, while RMS lateral deviation, speed error, and drift angle characterize driving style. Results show a Pareto-like LT-SE trade-off: NOM yields the shortest LT but highest SE; TLC minimizes SE at the cost of longer LT; FLC lies near the efficient frontier, substantially reducing SE relative to NOM with only a small LT increase. Removing trajectory guidance (NOREF) increases both LT and SE, confirming that reference trajectories improve pace and control efficiency. Overall, the findings highlight reference-based and disturbance-aware planning, especially FLC, as effective tools for training and for achieving fast yet stable trajectories.

The Stagnant Persistence Paradox: Survival Analysis and Temporal Efficiency in Exact Sciences and Engineering Education

Authors:H. R. Paz

Date:2025-12-04 14:11:28

Research on student progression in higher education has traditionally focused on vertical outcomes such as persistence and dropout, often reducing complex academic histories to binary indicators. While the structural component of horizontal mobility (major switching, plan changes, re-entries) has recently been recognised as a core feature of contemporary university systems, the temporal cost and efficiency of these pathways remain largely unquantified. Using forty years of administrative records from a large faculty of engineering and exact sciences in Argentina (N = 24,016), this study applies a dual-outcome survival analysis framework to two key outcomes: definitive dropout and first major switch. We reconstruct academic trajectories as sequences of enrolment spells and typed transitions under the CAPIRE protocol, and then deploy non-parametric Kaplan-Meier estimators to model time-to-event under right-censoring. Results uncover a critical systemic inefficiency: a global median survival time of 4.33 years prior to definitive dropout, with a pronounced long tail of extended enrolment. This pattern reveals a phenomenon of stagnant persistence, where students remain formally enrolled for long periods without commensurate curricular progression. In contrast, major switching follows an early-event regime, with a median time of 1.0 year among switchers and most switches concentrated within the first academic year. We argue that academic failure in rigid engineering curricula is not a sudden outcome but a long-tail process that generates high opportunity costs, and that institutional indicators should shift from static retention metrics towards measures of curricular velocity based on time-to-event analysis.

Neural Policy Composition from Free Energy Minimization

Authors:Francesca Rossi, Veronica Centorrino, Francesco Bullo, Giovanni Russo

Date:2025-12-04 12:31:41

The ability to compose acquired skills to plan and execute behaviors is a hallmark of natural intelligence. Yet, despite remarkable cross-disciplinary efforts, a principled account of how task structure shapes gating and how such computations could be delivered in neural circuits, remains elusive. Here we introduce GateMod, an interpretable theoretically grounded computational model linking the emergence of gating to the underlying decision-making task, and to a neural circuit architecture. We first develop GateFrame, a normative framework casting policy gating into the minimization of the free energy. This framework, relating gating rules to task, applies broadly across neuroscience, cognitive and computational sciences. We then derive GateFlow, a continuous-time energy based dynamics that provably converges to GateFrame optimal solution. Convergence, exponential and global, follows from a contractivity property that also yields robustness and other desirable properties. Finally, we derive a neural circuit from GateFlow, GateNet. This is a soft-competitive recurrent circuit whose components perform local and contextual computations consistent with known dendritic and neural processing motifs. We evaluate GateMod across two different settings: collective behaviors in multi-agent systems and human decision-making in multi-armed bandits. In all settings, GateMod provides interpretable mechanistic explanations of gating and quantitatively matches or outperforms established models. GateMod offers a unifying framework for neural policy gating, linking task objectives, dynamical computation, and circuit-level mechanisms. It provides a framework to understand gating in natural agents beyond current explanations and to equip machines with this ability.

E3AD: An Emotion-Aware Vision-Language-Action Model for Human-Centric End-to-End Autonomous Driving

Authors:Yihong Tang, Haicheng Liao, Tong Nie, Junlin He, Ao Qu, Kehua Chen, Wei Ma, Zhenning Li, Lijun Sun, Chengzhong Xu

Date:2025-12-04 12:17:25

End-to-end autonomous driving (AD) systems increasingly adopt vision-language-action (VLA) models, yet they typically ignore the passenger's emotional state, which is central to comfort and AD acceptance. We introduce Open-Domain End-to-End (OD-E2E) autonomous driving, where an autonomous vehicle (AV) must interpret free-form natural-language commands, infer the emotion, and plan a physically feasible trajectory. We propose E3AD, an emotion-aware VLA framework that augments semantic understanding with two cognitively inspired components: a continuous Valenc-Arousal-Dominance (VAD) emotion model that captures tone and urgency from language, and a dual-pathway spatial reasoning module that fuses egocentric and allocentric views for human-like spatial cognition. A consistency-oriented training scheme, combining modality pretraining with preference-based alignment, further enforces coherence between emotional intent and driving actions. Across real-world datasets, E3AD improves visual grounding and waypoint planning and achieves state-of-the-art (SOTA) VAD correlation for emotion estimation. These results show that injecting emotion into VLA-style driving yields more human-aligned grounding, planning, and human-centric feedback.

POLARIS: Is Multi-Agentic Reasoning the Next Wave in Engineering Self-Adaptive Systems?

Authors:Divyansh Pandey, Vyakhya Gupta, Prakhar Singhal, Karthik Vaidhyanathan

Date:2025-12-04 11:51:03

The growing scale, complexity, interconnectivity, and autonomy of modern software ecosystems introduce unprecedented uncertainty, challenging the foundations of traditional self-adaptation. Existing approaches, typically rule-driven controllers or isolated learning components, struggle to generalize to novel contexts or coordinate responses across distributed subsystems, leaving them ill-equipped for emergent unknown unknowns. Recent discussions on Self-Adaptation 2.0 emphasize an equal partnership between AI and adaptive systems, merging learning-driven intelligence with adaptive control for predictive and proactive behavior. Building on this foundation, we introduce POLARIS, a three-layer multi-agentic self-adaptation framework that advances beyond reactive adaptation. POLARIS integrates: (1) a low-latency Adapter layer for monitoring and safe execution, (2) a transparent Reasoning layer that generates and verifies plans using tool-aware, explainable agents, and (3) a Meta layer that records experiences and meta-learns improved adaptation policies over time. Through shared knowledge and predictive models, POLARIS handles uncertainty, learns from past actions, and evolves its strategies, enabling systems that anticipate change and maintain resilient, goal-directed behavior. Preliminary evaluation on two self-adaptive exemplars, SWIM and SWITCH, shows that POLARIS consistently outperforms state-of-the-art baselines. We argue this marks a shift toward Self-Adaptation 3.0, akin to Software 3.0: a paradigm where systems not only learn from their environment but also reason about and evolve their own adaptation processes, continuously improving to meet novel challenges.