planning - 2025-12-07

DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation

Authors:Dongzhi Jiang, Renrui Zhang, Haodong Li, Zhuofan Zong, Ziyu Guo, Jun He, Claire Guo, Junyan Ye, Rongyao Fang, Weijia Li, Rui Liu, Hongsheng Li
Date:2025-12-04 18:59:53

Recent unified multimodal large language models (MLLMs) have shown impressive capabilities, incorporating chain-of-thought (CoT) reasoning for enhanced text-to-image generation. However, existing approaches remain limited, either treating the model merely as a standalone generator or relying on abstract textual planning. To this end, we propose Draft-as-CoT (DraCo), a novel interleaved reasoning paradigm that fully leverages both textual and visual contents in CoT for better planning and verification. Our method first generates a low-resolution draft image as preview, providing more concrete and structural visual planning and guidance. Then, we employ the model's inherent understanding capability to verify potential semantic misalignments between the draft and input prompt, and performs refinement through selective corrections with super-resolution. In this way, our approach addresses two fundamental challenges: the coarse-grained nature of textual planning and the difficulty in generating rare attribute combinations. To support training, we curate DraCo-240K, aiming to enhance three atomic capabilities spanning general correction, instance manipulation, and layout reorganization. Supported by DraCo-CFG, a specialized classifier-free guidance (CFG) strategy for interleaved reasoning, DraCo achieves a tremendous increase on GenEval (+8%), Imagine-Bench (+0.91), and GenEval++ (+3%), significantly outperforming direct generation and other generation methods empowered by CoT.

On Disturbance-Aware Minimum-Time Trajectory Planning: Evidence from Tests on a Dynamic Driving Simulator

Authors:Matteo Masoni, Vincenzo Palermo, Marco Gabiccini, Martino Gulisano, Giorgio Previati, Massimiliano Gobbi, Francesco Comolli, Gianpiero Mastinu, Massimo Guiggiani
Date:2025-12-04 15:45:40

This work investigates how disturbance-aware, robustness-embedded reference trajectories translate into driving performance when executed by professional drivers in a dynamic simulator. Three planned reference trajectories are compared against a free-driving baseline (NOREF) to assess trade-offs between lap time (LT) and steering effort (SE): NOM, the nominal time-optimal trajectory; TLC, a track-limit-robust trajectory obtained by tightening margins to the track edges; and FLC, a friction-limit-robust trajectory obtained by tightening against axle and tire saturation. All trajectories share the same minimum lap-time objective with a small steering-smoothness regularizer and are evaluated by two professional drivers using a high-performance car on a virtual track. The trajectories derive from a disturbance-aware minimum-lap-time framework recently proposed by the authors, where worst-case disturbance growth is propagated over a finite horizon and used to tighten tire-friction and track-limit constraints, preserving performance while providing probabilistic safety margins. LT and SE are used as performance indicators, while RMS lateral deviation, speed error, and drift angle characterize driving style. Results show a Pareto-like LT-SE trade-off: NOM yields the shortest LT but highest SE; TLC minimizes SE at the cost of longer LT; FLC lies near the efficient frontier, substantially reducing SE relative to NOM with only a small LT increase. Removing trajectory guidance (NOREF) increases both LT and SE, confirming that reference trajectories improve pace and control efficiency. Overall, the findings highlight reference-based and disturbance-aware planning, especially FLC, as effective tools for training and for achieving fast yet stable trajectories.

The Stagnant Persistence Paradox: Survival Analysis and Temporal Efficiency in Exact Sciences and Engineering Education

Authors:H. R. Paz
Date:2025-12-04 14:11:28

Research on student progression in higher education has traditionally focused on vertical outcomes such as persistence and dropout, often reducing complex academic histories to binary indicators. While the structural component of horizontal mobility (major switching, plan changes, re-entries) has recently been recognised as a core feature of contemporary university systems, the temporal cost and efficiency of these pathways remain largely unquantified. Using forty years of administrative records from a large faculty of engineering and exact sciences in Argentina (N = 24,016), this study applies a dual-outcome survival analysis framework to two key outcomes: definitive dropout and first major switch. We reconstruct academic trajectories as sequences of enrolment spells and typed transitions under the CAPIRE protocol, and then deploy non-parametric Kaplan-Meier estimators to model time-to-event under right-censoring. Results uncover a critical systemic inefficiency: a global median survival time of 4.33 years prior to definitive dropout, with a pronounced long tail of extended enrolment. This pattern reveals a phenomenon of stagnant persistence, where students remain formally enrolled for long periods without commensurate curricular progression. In contrast, major switching follows an early-event regime, with a median time of 1.0 year among switchers and most switches concentrated within the first academic year. We argue that academic failure in rigid engineering curricula is not a sudden outcome but a long-tail process that generates high opportunity costs, and that institutional indicators should shift from static retention metrics towards measures of curricular velocity based on time-to-event analysis.

Neural Policy Composition from Free Energy Minimization

Authors:Francesca Rossi, Veronica Centorrino, Francesco Bullo, Giovanni Russo
Date:2025-12-04 12:31:41

The ability to compose acquired skills to plan and execute behaviors is a hallmark of natural intelligence. Yet, despite remarkable cross-disciplinary efforts, a principled account of how task structure shapes gating and how such computations could be delivered in neural circuits, remains elusive. Here we introduce GateMod, an interpretable theoretically grounded computational model linking the emergence of gating to the underlying decision-making task, and to a neural circuit architecture. We first develop GateFrame, a normative framework casting policy gating into the minimization of the free energy. This framework, relating gating rules to task, applies broadly across neuroscience, cognitive and computational sciences. We then derive GateFlow, a continuous-time energy based dynamics that provably converges to GateFrame optimal solution. Convergence, exponential and global, follows from a contractivity property that also yields robustness and other desirable properties. Finally, we derive a neural circuit from GateFlow, GateNet. This is a soft-competitive recurrent circuit whose components perform local and contextual computations consistent with known dendritic and neural processing motifs. We evaluate GateMod across two different settings: collective behaviors in multi-agent systems and human decision-making in multi-armed bandits. In all settings, GateMod provides interpretable mechanistic explanations of gating and quantitatively matches or outperforms established models. GateMod offers a unifying framework for neural policy gating, linking task objectives, dynamical computation, and circuit-level mechanisms. It provides a framework to understand gating in natural agents beyond current explanations and to equip machines with this ability.

E3AD: An Emotion-Aware Vision-Language-Action Model for Human-Centric End-to-End Autonomous Driving

Authors:Yihong Tang, Haicheng Liao, Tong Nie, Junlin He, Ao Qu, Kehua Chen, Wei Ma, Zhenning Li, Lijun Sun, Chengzhong Xu
Date:2025-12-04 12:17:25

End-to-end autonomous driving (AD) systems increasingly adopt vision-language-action (VLA) models, yet they typically ignore the passenger's emotional state, which is central to comfort and AD acceptance. We introduce Open-Domain End-to-End (OD-E2E) autonomous driving, where an autonomous vehicle (AV) must interpret free-form natural-language commands, infer the emotion, and plan a physically feasible trajectory. We propose E3AD, an emotion-aware VLA framework that augments semantic understanding with two cognitively inspired components: a continuous Valenc-Arousal-Dominance (VAD) emotion model that captures tone and urgency from language, and a dual-pathway spatial reasoning module that fuses egocentric and allocentric views for human-like spatial cognition. A consistency-oriented training scheme, combining modality pretraining with preference-based alignment, further enforces coherence between emotional intent and driving actions. Across real-world datasets, E3AD improves visual grounding and waypoint planning and achieves state-of-the-art (SOTA) VAD correlation for emotion estimation. These results show that injecting emotion into VLA-style driving yields more human-aligned grounding, planning, and human-centric feedback.

POLARIS: Is Multi-Agentic Reasoning the Next Wave in Engineering Self-Adaptive Systems?

Authors:Divyansh Pandey, Vyakhya Gupta, Prakhar Singhal, Karthik Vaidhyanathan
Date:2025-12-04 11:51:03

The growing scale, complexity, interconnectivity, and autonomy of modern software ecosystems introduce unprecedented uncertainty, challenging the foundations of traditional self-adaptation. Existing approaches, typically rule-driven controllers or isolated learning components, struggle to generalize to novel contexts or coordinate responses across distributed subsystems, leaving them ill-equipped for emergent unknown unknowns. Recent discussions on Self-Adaptation 2.0 emphasize an equal partnership between AI and adaptive systems, merging learning-driven intelligence with adaptive control for predictive and proactive behavior. Building on this foundation, we introduce POLARIS, a three-layer multi-agentic self-adaptation framework that advances beyond reactive adaptation. POLARIS integrates: (1) a low-latency Adapter layer for monitoring and safe execution, (2) a transparent Reasoning layer that generates and verifies plans using tool-aware, explainable agents, and (3) a Meta layer that records experiences and meta-learns improved adaptation policies over time. Through shared knowledge and predictive models, POLARIS handles uncertainty, learns from past actions, and evolves its strategies, enabling systems that anticipate change and maintain resilient, goal-directed behavior. Preliminary evaluation on two self-adaptive exemplars, SWIM and SWITCH, shows that POLARIS consistently outperforms state-of-the-art baselines. We argue this marks a shift toward Self-Adaptation 3.0, akin to Software 3.0: a paradigm where systems not only learn from their environment but also reason about and evolve their own adaptation processes, continuously improving to meet novel challenges.

Generative AI for Self-Adaptive Systems: State of the Art and Research Roadmap

Authors:Jialong Li, Mingyue Zhang, Nianyu Li, Danny Weyns, Zhi Jin, Kenji Tei
Date:2025-12-04 11:13:43

Self-adaptive systems (SASs) are designed to handle changes and uncertainties through a feedback loop with four core functionalities: monitoring, analyzing, planning, and execution. Recently, generative artificial intelligence (GenAI), especially the area of large language models, has shown impressive performance in data comprehension and logical reasoning. These capabilities are highly aligned with the functionalities required in SASs, suggesting a strong potential to employ GenAI to enhance SASs. However, the specific benefits and challenges of employing GenAI in SASs remain unclear. Yet, providing a comprehensive understanding of these benefits and challenges is complex due to several reasons: limited publications in the SAS field, the technological and application diversity within SASs, and the rapid evolution of GenAI technologies. To that end, this paper aims to provide researchers and practitioners a comprehensive snapshot that outlines the potential benefits and challenges of employing GenAI's within SAS. Specifically, we gather, filter, and analyze literature from four distinct research fields and organize them into two main categories to potential benefits: (i) enhancements to the autonomy of SASs centered around the specific functions of the MAPE-K feedback loop, and (ii) improvements in the interaction between humans and SASs within human-on-the-loop settings. From our study, we outline a research roadmap that highlights the challenges of integrating GenAI into SASs. The roadmap starts with outlining key research challenges that need to be tackled to exploit the potential for applying GenAI in the field of SAS. The roadmap concludes with a practical reflection, elaborating on current shortcomings of GenAI and proposing possible mitigation strategies.

Progress towards a microchannel plate detector with AlGaN photocathode and cross-strip anode for ultraviolet astronomy

Authors:S. Diebold, J. Barnstedt, L. Conti, H. R. Elsener, L. Hanke, M. Höltzli, C. Kalkuhl, D. Rau, D. Schaadt, T. Schanz, B. Stelzer, K. Werner
Date:2025-12-04 11:03:00

Microchannel plates (MCPs) were the driving detector technology for ultraviolet (UV) astronomy over many years, and still today MCP-based detectors are the baseline for several planned UV instruments. The development of advanced MCP detectors is ongoing and pursues the major goals of maximizing sensitivity, resolution, and lifetime, while at the same time decreasing weight, volume, and power consumption. Development efforts for an MCP-based detector system for the UV are running at IAAT at the University of Tübingen. In this publication, we present our latest results towards coating aluminum gallium nitride (AlGaN) photocathodes directly on MCPs, to improve quantum detection efficiency in the far- and extreme-UV. Furthermore, we report on the implementation of a non-iterative centroiding algorithm for our coplanar cross-strip anode directly in an FPGA.

Quantised Academic Mobility: Network and Cluster Analysis of Degree Switching, Plan Changes, and Re-entries in an Engineering Faculty (1980-2019)

Authors:H. R. Paz
Date:2025-12-04 10:26:33

This study challenges the traditional binary view of student progression (retention versus dropout) by conceptualising academic trajectories as complex, quantised pathways. Utilising a 40-year longitudinal dataset from an Argentine engineering faculty (N = 24,016), we introduce CAPIRE, an analytical framework that differentiates between degree major switches, curriculum plan changes, and same-plan re-entries. While 73.3 per cent of students follow linear trajectories (Estables), a significant 26.7 per cent exhibit complex mobility patterns. By applying Principal Component Analysis (PCA) and DBSCAN clustering, we reveal that these trajectories are not continuous but structurally quantised, occupying discrete bands of complexity. The analysis identifies six distinct student archetypes, including 'Switchers' (10.7 per cent) who reorient vocationally, and 'Stable Re-entrants' (6.9 per cent) who exhibit stop-out behaviours without changing discipline. Furthermore, network analysis highlights specific 'hub majors' - such as electronics and computing - that act as systemic attractors. These findings suggest that student flux is an organised ecosystemic feature rather than random noise, offering institutions a new lens for curriculum analytics and predictive modelling.

Reflection-Satisfaction Tradeoff: Investigating Impact of Reflection on Student Engagement with AI-Generated Programming Hints

Authors:Heeryung Choi, Tung Phung, Mengyan Wu, Adish Singla, Christopher Brooks
Date:2025-12-04 10:01:33

Generative AI tools, such as AI-generated hints, are increasingly integrated into programming education to offer timely, personalized support. However, little is known about how to effectively leverage these hints while ensuring autonomous and meaningful learning. One promising approach involves pairing AI-generated hints with reflection prompts, asking students to review and analyze their learning, when they request hints. This study investigates the interplay between AI-generated hints and different designs of reflection prompts in an online introductory programming course. We conducted a two-trial field experiment. In Trial 1, students were randomly assigned to receive prompts either before or after receiving hints, or no prompt at all. Each prompt also targeted one of three SRL phases: planning, monitoring, and evaluation. In Trial 2, we examined two types of prompt guidance: directed (offering more explicit and structured guidance) and open (offering more general and less constrained guidance). Findings show that students in the before-hint (RQ1), planning (RQ2), and directed (RQ3) prompt groups produced higher-quality reflections but reported lower satisfaction with AI-generated hints than those in other conditions. Immediate performance did not differ across conditions. This negative relationship between reflection quality and hint satisfaction aligns with previous work on student mental effort and satisfaction. Our results highlight the need to reconsider how AI models are trained and evaluated for education, as prioritizing user satisfaction can undermine deeper learning.

BioMedGPT-Mol: Multi-task Learning for Molecular Understanding and Generation

Authors:Chenyang Zuo, Siqi Fan, Zaiqing Nie
Date:2025-12-04 10:00:16

Molecules play a crucial role in biomedical research and discovery, particularly in the field of small molecule drug development. Given the rapid advancements in large language models, especially the recent emergence of reasoning models, it is natural to explore how a general-purpose language model can be efficiently adapted for molecular science applications. In this work, we introduce BioMedGPT-Mol, a molecular language model designed to support molecular understanding and generation tasks. By curating and unifying existing public instruction datasets, we have assembled a large-scale, comprehensive, and high-quality training dataset. The model is then fine-tuned through a meticulously designed multi-task learning framework. On a consolidated benchmark derived from LlaSMol, TOMG-Bench, and MuMOInstruct, BioMedGPT-Mol achieves remarkable performance. Our experimental results demonstrate that a general-purpose reasoning model can be effectively and efficiently post-trained into a professional molecular language model through a well-structured multi-task curriculum. Leveraging the power of it, we further explore retrosynthetic planning task, and the performance on RetroBench demonstrates its competitive capability of acting as an end-to-end retrosynthetic planner. We anticipate that our approach can be extended to other biomedical scientific domains.

SlideGen: Collaborative Multimodal Agents for Scientific Slide Generation

Authors:Xin Liang, Xiang Zhang, Yiwei Xu, Siqi Sun, Chenyu You
Date:2025-12-04 07:22:16

Generating academic slides from scientific papers is a challenging multimodal reasoning task that requires both long context understanding and deliberate visual planning. Existing approaches largely reduce it to text only summarization, overlooking the visual component and design intensive nature of slide creation. In this paper we introduce SlideGen, an agentic, modular, and visual in the loop framework for scientific paper to slide generation. SlideGen orchestrates a group of vision language agents that reason collaboratively over the document structure and semantics, producing editable PPTX slides with logical flow and compelling visual presentation. By integrating coordinated outlining, mapping, arrangement, note synthesis, and iterative refinement, our system consistently delivers slides of expert level quality. Across diverse benchmarks and strong baselines, SlideGen outperforms existing methods in visual quality, content faithfulness, and readability, positioning it as the new state of the art in automated slide generation. Our work establishes a foundation for design aware multimodal slide generation, demonstrating how agentic collaboration can bridge understanding and presentation in complex multimodal reasoning tasks.

Auto3R: Automated 3D Reconstruction and Scanning via Data-driven Uncertainty Quantification

Authors:Chentao Shen, Sizhe Zheng, Bingqian Wu, Yaohua Feng, Yuanchen Fei, Mingyu Mei, Hanwen Jiang, Xiangru Huang
Date:2025-12-04 07:20:51

Traditional high-quality 3D scanning and reconstruction typically relies on human labor to plan the scanning procedure. With the rapid development of embodied systems such as drones and robots, there is a growing demand of performing accurate 3D scanning and reconstruction in an fully automated manner. We introduce Auto3R, a data-driven uncertainty quantification model that is designed to automate the 3D scanning and reconstruction of scenes and objects, including objects with non-lambertian and specular materials. Specifically, in a process of iterative 3D reconstruction and scanning, Auto3R can make efficient and accurate prediction of uncertainty distribution over potential scanning viewpoints, without knowing the ground truth geometry and appearance. Through extensive experiments, Auto3R achieves superior performance that outperforms the state-of-the-art methods by a large margin. We also deploy Auto3R on a robot arm equipped with a camera and demonstrate that Auto3R can be used to effectively digitize real-world 3D objects and delivers ready-to-use and photorealistic digital assets. Our homepage: https://tomatoma00.github.io/auto3r.github.io .

A Modular Cognitive Architecture for Assisted Reasoning: The Nemosine Framework

Authors:Edervaldo Melo
Date:2025-12-04 06:09:35

This paper presents the Nemosine Framework, a modular cognitive architecture designed to support assisted reasoning, structured thinking, and systematic analysis. The model operates through functional cognitive modules ("personas") that organize tasks such as planning, evaluation, cross-checking, and narrative synthesis. The framework combines principles from metacognition, distributed cognition, and modular cognitive systems to offer an operational structure for assisted problem-solving and decision support. The architecture is documented through formal specification, internal consistency criteria, and reproducible structural components. The goal is to provide a clear conceptual basis for future computational implementations and to contribute to the study of symbolic-modular architectures for reasoning.

dVLM-AD: Enhance Diffusion Vision-Language-Model for Driving via Controllable Reasoning

Authors:Yingzi Ma, Yulong Cao, Wenhao Ding, Shuibai Zhang, Yan Wang, Boris Ivanovic, Ming Jiang, Marco Pavone, Chaowei Xiao
Date:2025-12-04 05:05:41

The autonomous driving community is increasingly focused on addressing the challenges posed by out-of-distribution (OOD) driving scenarios. A dominant research trend seeks to enhance end-to-end (E2E) driving systems by integrating vision-language models (VLMs), leveraging their rich world knowledge and reasoning abilities to improve generalization across diverse environments. However, most existing VLMs or vision-language agents (VLAs) are built upon autoregressive (AR) models. In this paper, we observe that existing AR-based VLMs -- limited by causal attention and sequential token generation -- often fail to maintain consistency and controllability between high-level reasoning and low-level planning. In contrast, recent discrete diffusion VLMs equipped with bidirectional attention exhibit superior controllability and reliability through iterative denoising. Building on these observations, we introduce dVLM-AD, a diffusion-based vision-language model that unifies perception, structured reasoning, and low-level planning for end-to-end driving. Evaluated on nuScenes and WOD-E2E, dVLM-AD yields more consistent reasoning-action pairs and achieves planning performance comparable to existing driving VLM/VLA systems despite a modest backbone, outperforming AR-based baselines with a 9 percent improvement in behavior-trajectory consistency and a 6 percent increase in RFS on long-tail WOD-E2E scenarios. These results suggest a controllable and reliable pathway for scalable end-to-end driving.

Open-Ended Goal Inference through Actions and Language for Human-Robot Collaboration

Authors:Debasmita Ghose, Oz Gitelson, Marynel Vazquez, Brian Scassellati
Date:2025-12-04 04:51:25

To collaborate with humans, robots must infer goals that are often ambiguous, difficult to articulate, or not drawn from a fixed set. Prior approaches restrict inference to a predefined goal set, rely only on observed actions, or depend exclusively on explicit instructions, making them brittle in real-world interactions. We present BALI (Bidirectional Action-Language Inference) for goal prediction, a method that integrates natural language preferences with observed human actions in a receding-horizon planning tree. BALI combines language and action cues from the human, asks clarifying questions only when the expected information gain from the answer outweighs the cost of interruption, and selects supportive actions that align with inferred goals. We evaluate the approach in collaborative cooking tasks, where goals may be novel to the robot and unbounded. Compared to baselines, BALI yields more stable goal predictions and significantly fewer mistakes.

Vision-Language-Action Models for Selective Robotic Disassembly: A Case Study on Critical Component Extraction from Desktops

Authors:Chang Liu, Sibo Tian, Sara Behdad, Xiao Liang, Minghui Zheng
Date:2025-12-04 04:36:39

Automating disassembly of critical components from end-of-life (EoL) desktops, such as high-value items like RAM modules and CPUs, as well as sensitive parts like hard disk drives, remains challenging due to the inherent variability and uncertainty of these products. Moreover, their disassembly requires sequential, precise, and dexterous operations, further increasing the complexity of automation. Current robotic disassembly processes are typically divided into several stages: perception, sequence planning, task planning, motion planning, and manipulation. Each stage requires explicit modeling, which limits generalization to unfamiliar scenarios. Recent development of vision-language-action (VLA) models has presented an end-to-end approach for general robotic manipulation tasks. Although VLAs have demonstrated promising performance on simple tasks, the feasibility of applying such models to complex disassembly remains largely unexplored. In this paper, we collected a customized dataset for robotic RAM and CPU disassembly and used it to fine-tune two well-established VLA approaches, OpenVLA and OpenVLA-OFT, as a case study. We divided the whole disassembly task into several small steps, and our preliminary experimental results indicate that the fine-tuned VLA models can faithfully complete multiple early steps but struggle with certain critical subtasks, leading to task failure. However, we observed that a simple hybrid strategy that combines VLA with a rule-based controller can successfully perform the entire disassembly operation. These findings highlight the current limitations of VLA models in handling the dexterity and precision required for robotic EoL product disassembly. By offering a detailed analysis of the observed results, this study provides insights that may inform future research to address current challenges and advance end-to-end robotic automated disassembly.

MindDrive: An All-in-One Framework Bridging World Models and Vision-Language Model for End-to-End Autonomous Driving

Authors:Bin Suna, Yaoguang Caob, Yan Wanga, Rui Wanga, Jiachen Shanga, Xiejie Fenga, Jiayi Lu, Jia Shi, Shichun Yang, Xiaoyu Yane, Ziying Song
Date:2025-12-04 04:16:10

End-to-End autonomous driving (E2E-AD) has emerged as a new paradigm, where trajectory planning plays a crucial role. Existing studies mainly follow two directions: trajectory generation oriented, which focuses on producing high-quality trajectories with simple decision mechanisms, and trajectory selection oriented, which performs multi-dimensional evaluation to select the best trajectory yet lacks sufficient generative capability. In this work, we propose MindDrive, a harmonized framework that integrates high-quality trajectory generation with comprehensive decision reasoning. It establishes a structured reasoning paradigm of "context simulation - candidate generation - multi-objective trade-off". In particular, the proposed Future-aware Trajectory Generator (FaTG), based on a World Action Model (WaM), performs ego-conditioned "what-if" simulations to predict potential future scenes and generate foresighted trajectory candidates. Building upon this, the VLM-oriented Evaluator (VLoE) leverages the reasoning capability of a large vision-language model to conduct multi-objective evaluations across safety, comfort, and efficiency dimensions, leading to reasoned and human-aligned decision making. Extensive experiments on the NAVSIM-v1 and NAVSIM-v2 benchmarks demonstrate that MindDrive achieves state-of-the-art performance across multi-dimensional driving metrics, significantly enhancing safety, compliance, and generalization. This work provides a promising path toward interpretable and cognitively guided autonomous driving.

Towards 6G Native-AI Edge Networks: A Semantic-Aware and Agentic Intelligence Paradigm

Authors:Chenyuan Feng, Anbang Zhang, Geyong Min, Yongming Huang, Tony Q. S. Quek, Xiaohu You
Date:2025-12-04 03:09:33

The evolution toward sixth-generation wireless systems positions intelligence as a native network capability, fundamentally transforming the design of radio access networks (RANs). Within this vision, Semantic-native communication and agentic intelligence are expected to play central roles. SemCom departs from bit-level fidelity and instead emphasizes task-oriented meaning exchange, enabling compact SC and introducing new performance measures such as semantic fidelity and task success rate. Agentic intelligence endows distributed RAN entities with goal-driven autonomy, reasoning, planning, and multi-agent collaboration, increasingly supported by foundation models and knowledge graphs. In this work, we first introduce the conceptual foundations of SemCom and agentic networking, and discuss why existing AI-driven O-RAN solutions remain largely bit-centric and task-siloed. We then present a unified taxonomy that organizes recent research along three axes: i) semantic abstraction level (symbol/feature/intent/knowledge), ii) agent autonomy and coordination granularity (single-, multi-, and hierarchical-agent), and iii) RAN control placement across PHY/MAC, near-real-time RIC, and non-real-time RIC. Based on this taxonomy, we systematically introduce enabling technologies including task-oriented semantic encoders/decoders, multi-agent reinforcement learning, foundation-model-assisted RAN agents, and knowledge-graph-based reasoning for cross-layer awareness. Representative 6G use cases, such as immersive XR, vehicular V2X, and industrial digital twins, are analyzed to illustrate the semantic-agentic convergence in practice. Finally, we identify open challenges in semantic representation standardization, scalable trustworthy agent coordination, O-RAN interoperability, and energy-efficient AI deployment, and outline research directions toward operational semantic-agentic AI-RAN.

Bridging Probabilistic Inference and Behavior Trees: An Interactive Framework for Adaptive Multi-Robot Cooperation

Authors:Chaoran Wang, Jingyuan Sun, Yanhui Zhang, Changju Wu
Date:2025-12-04 03:08:55

This paper proposes an Interactive Inference Behavior Tree (IIBT) framework that integrates behavior trees (BTs) with active inference under the free energy principle for distributed multi-robot decision-making. The proposed IIBT node extends conventional BTs with probabilistic reasoning, enabling online joint planning and execution across multiple robots. It remains fully compatible with standard BT architectures, allowing seamless integration into existing multi-robot control systems. Within this framework, multi-robot cooperation is formulated as a free-energy minimization process, where each robot dynamically updates its preference matrix based on perceptual inputs and peer intentions, thereby achieving adaptive coordination in partially observable and dynamic environments. The proposed approach is validated through both simulation and real-world experiments, including a multi-robot maze navigation and a collaborative manipulation task, compared against traditional BTs(https://youtu.be/KX_oT3IDTf4). Experimental results demonstrate that the IIBT framework reduces BT node complexity by over 70%, while maintaining robust, interpretable, and adaptive cooperative behavior under environmental uncertainty.

Long-Horizon Model-Based Offline Reinforcement Learning Without Conservatism

Authors:Tianwei Ni, Esther Derman, Vineet Jain, Vincent Taboga, Siamak Ravanbakhsh, Pierre-Luc Bacon
Date:2025-12-04 00:07:08

Popular offline reinforcement learning (RL) methods rely on conservatism, either by penalizing out-of-dataset actions or by restricting planning horizons. In this work, we question the universality of this principle and instead revisit a complementary one: a Bayesian perspective. Rather than enforcing conservatism, the Bayesian approach tackles epistemic uncertainty in offline data by modeling a posterior distribution over plausible world models and training a history-dependent agent to maximize expected rewards, enabling test-time generalization. We first illustrate, in a bandit setting, that Bayesianism excels on low-quality datasets where conservatism fails. We then scale the principle to realistic tasks, identifying key design choices, such as layer normalization in the world model and adaptive long-horizon planning, that mitigate compounding error and value overestimation. These yield our practical algorithm, Neubay, grounded in the neutral Bayesian principle. On D4RL and NeoRL benchmarks, Neubay generally matches or surpasses leading conservative algorithms, achieving new state-of-the-art on 7 datasets. Notably, it succeeds with planning horizons of several hundred steps, challenging common belief. Finally, we characterize when Neubay is preferable to conservatism, laying the foundation for a new direction in offline and model-based RL.

ResponsibleRobotBench: Benchmarking Responsible Robot Manipulation using Multi-modal Large Language Models

Authors:Lei Zhang, Ju Dong, Kaixin Bai, Minheng Ni, Zoltan-Csaba Marton, Zhaopeng Chen, Jianwei Zhang
Date:2025-12-03 22:54:05

Recent advances in large multimodal models have enabled new opportunities in embodied AI, particularly in robotic manipulation. These models have shown strong potential in generalization and reasoning, but achieving reliable and responsible robotic behavior in real-world settings remains an open challenge. In high-stakes environments, robotic agents must go beyond basic task execution to perform risk-aware reasoning, moral decision-making, and physically grounded planning. We introduce ResponsibleRobotBench, a systematic benchmark designed to evaluate and accelerate progress in responsible robotic manipulation from simulation to real world. This benchmark consists of 23 multi-stage tasks spanning diverse risk types, including electrical, chemical, and human-related hazards, and varying levels of physical and planning complexity. These tasks require agents to detect and mitigate risks, reason about safety, plan sequences of actions, and engage human assistance when necessary. Our benchmark includes a general-purpose evaluation framework that supports multimodal model-based agents with various action representation modalities. The framework integrates visual perception, context learning, prompt construction, hazard detection, reasoning and planning, and physical execution. It also provides a rich multimodal dataset, supports reproducible experiments, and includes standardized metrics such as success rate, safety rate, and safe success rate. Through extensive experimental setups, ResponsibleRobotBench enables analysis across risk categories, task types, and agent configurations. By emphasizing physical reliability, generalization, and safety in decision-making, this benchmark provides a foundation for advancing the development of trustworthy, real-world responsible dexterous robotic systems. https://sites.google.com/view/responsible-robotbench

Gamma-from-Mono: Road-Relative, Metric, Self-Supervised Monocular Geometry for Vehicular Applications

Authors:Gasser Elazab, Maximilian Jansen, Michael Unterreiner, Olaf Hellwich
Date:2025-12-03 22:37:38

Accurate perception of the vehicle's 3D surroundings, including fine-scale road geometry, such as bumps, slopes, and surface irregularities, is essential for safe and comfortable vehicle control. However, conventional monocular depth estimation often oversmooths these features, losing critical information for motion planning and stability. To address this, we introduce Gamma-from-Mono (GfM), a lightweight monocular geometry estimation method that resolves the projective ambiguity in single-camera reconstruction by decoupling global and local structure. GfM predicts a dominant road surface plane together with residual variations expressed by gamma, a dimensionless measure of vertical deviation from the plane, defined as the ratio of a point's height above it to its depth from the camera, and grounded in established planar parallax geometry. With only the camera's height above ground, this representation deterministically recovers metric depth via a closed form, avoiding full extrinsic calibration and naturally prioritizing near-road detail. Its physically interpretable formulation makes it well suited for self-supervised learning, eliminating the need for large annotated datasets. Evaluated on KITTI and the Road Surface Reconstruction Dataset (RSRD), GfM achieves state-of-the-art near-field accuracy in both depth and gamma estimation while maintaining competitive global depth performance. Our lightweight 8.88M-parameter model adapts robustly across diverse camera setups and, to our knowledge, is the first self-supervised monocular approach evaluated on RSRD.

The Geometry of Benchmarks: A New Path Toward AGI

Authors:Przemyslaw Chojecki
Date:2025-12-03 21:34:09

Benchmarks are the primary tool for assessing progress in artificial intelligence (AI), yet current practice evaluates models on isolated test suites and provides little guidance for reasoning about generality or autonomous self-improvement. Here we introduce a geometric framework in which all psychometric batteries for AI agents are treated as points in a structured moduli space, and agent performance is described by capability functionals over this space. First, we define an Autonomous AI (AAI) Scale, a Kardashev-style hierarchy of autonomy grounded in measurable performance on batteries spanning families of tasks (for example reasoning, planning, tool use and long-horizon control). Second, we construct a moduli space of batteries, identifying equivalence classes of benchmarks that are indistinguishable at the level of agent orderings and capability inferences. This geometry yields determinacy results: dense families of batteries suffice to certify performance on entire regions of task space. Third, we introduce a general Generator-Verifier-Updater (GVU) operator that subsumes reinforcement learning, self-play, debate and verifier-based fine-tuning as special cases, and we define a self-improvement coefficient $κ$ as the Lie derivative of a capability functional along the induced flow. A variance inequality on the combined noise of generation and verification provides sufficient conditions for $κ> 0$. Our results suggest that progress toward artificial general intelligence (AGI) is best understood as a flow on moduli of benchmarks, driven by GVU dynamics rather than by scores on individual leaderboards.

The $^{18}$O$(p,n)^{18}$F Reaction as a Quasi-stellar Neutron Source

Authors:Sarah Agus-Bina
Date:2025-12-03 19:08:55

The s-process in AGB stars produces elements with atomic mass numbers $A\gtrsim60$ through successive neutron captures and beta decays. In stellar environments where the s-process occurs, neutrons quickly thermalize, adopting a Maxwell-Boltzmann energy distribution determined by the local temperature independent of their production via ($α$,n) reactions. Laboratory experiments reproduce this Maxwell-Boltzmann neutron energy spectrum using (p, n) reactions. Specifically, the 7Li(p,n)7Be reaction is commonly employed to measure s-process cross-sections at kT$\approx$25 keV. Expanding the range of reactions used for measuring neutron-induced s-process cross-sections can offer valuable insights into the s-process in AGB stars. One such reaction is 18O(p,n)18F, which can be used to mimic s-process cross-sections, as its neutron energy distribution is similar to the thermal flux distribution at the stellar environment where the 13C($α$,n)16O reaction (kT = 8 keV) takes place. Heil et al. showed that the neutron energy spectrum emitted from the 18O(p,n)18F reaction at proton energy of Ep = 2582 keV, close to the reaction threshold of 2574 keV, closely resembles a Maxwellian flux with kT$\approx$5 keV. This experiment was repeated at PTB and a computational tool, OxyGen, was created to calculate the expected neutron energy spectrum and angular distribution at a planned liquid water-based 18O target at SARAF. OxyGen was incorporated into Geant4 transport simulations, which were compared to experimental results. Overall, the findings of this thesis demonstrate that the OxyGen simulation provides a reliable prediction of the neutron energy spectrum for the 18O(p,n)18F reaction and can be effectively used to plan and analyze future experiments at SARAF for different proton energies.

Driving is a Game: Combining Planning and Prediction with Bayesian Iterative Best Response

Authors:Aron Distelzweig, Yiwei Wang, Faris Janjoš, Marcel Hallgarten, Mihai Dobre, Alexander Langmann, Joschka Boedecker, Johannes Betz
Date:2025-12-03 16:33:25

Autonomous driving planning systems perform nearly perfectly in routine scenarios using lightweight, rule-based methods but still struggle in dense urban traffic, where lane changes and merges require anticipating and influencing other agents. Modern motion predictors offer highly accurate forecasts, yet their integration into planning is mostly rudimental: discarding unsafe plans. Similarly, end-to-end models offer a one-way integration that avoids the challenges of joint prediction and planning modeling under uncertainty. In contrast, game-theoretic formulations offer a principled alternative but have seen limited adoption in autonomous driving. We present Bayesian Iterative Best Response (BIBeR), a framework that unifies motion prediction and game-theoretic planning into a single interaction-aware process. BIBeR is the first to integrate a state-of-the-art predictor into an Iterative Best Response (IBR) loop, repeatedly refining the strategies of the ego vehicle and surrounding agents. This repeated best-response process approximates a Nash equilibrium, enabling bidirectional adaptation where the ego both reacts to and shapes the behavior of others. In addition, our proposed Bayesian confidence estimation quantifies prediction reliability and modulates update strength, more conservative under low confidence and more decisive under high confidence. BIBeR is compatible with modern predictors and planners, combining the transparency of structured planning with the flexibility of learned models. Experiments show that BIBeR achieves an 11% improvement over state-of-the-art planners on highly interactive interPlan lane-change scenarios, while also outperforming existing approaches on standard nuPlan benchmarks.

Autonomous Agents and Policy Compliance: A Framework for Reasoning About Penalties

Authors:Vineel Tummala, Daniela Inclezan
Date:2025-12-03 16:29:09

This paper presents a logic programming-based framework for policy-aware autonomous agents that can reason about potential penalties for non-compliance and act accordingly. While prior work has primarily focused on ensuring compliance, our approach considers scenarios where deviating from policies may be necessary to achieve high-stakes goals. Additionally, modeling non-compliant behavior can assist policymakers by simulating realistic human decision-making. Our framework extends Gelfond and Lobo's Authorization and Obligation Policy Language (AOPL) to incorporate penalties and integrates Answer Set Programming (ASP) for reasoning. Compared to previous approaches, our method ensures well-formed policies, accounts for policy priorities, and enhances explainability by explicitly identifying rule violations and their consequences. Building on the work of Harders and Inclezan, we introduce penalty-based reasoning to distinguish between non-compliant plans, prioritizing those with minimal repercussions. To support this, we develop an automated translation from the extended AOPL into ASP and refine ASP-based planning algorithms to account for incurred penalties. Experiments in two domains demonstrate that our framework generates higher-quality plans that avoid harmful actions while, in some cases, also improving computational efficiency. These findings underscore its potential for enhancing autonomous decision-making and informing policy refinement. Under consideration in Theory and Practice of Logic Programming (TPLP).

Hierarchical Vision Language Action Model Using Success and Failure Demonstrations

Authors:Jeongeun Park, Jihwan Yoon, Byungwoo Jeon, Juhan Park, Jinwoo Shin, Namhoon Cho, Kyungjae Lee, Sangdoo Yun, Sungjoon Choi
Date:2025-12-03 15:58:38

Prior Vision-Language-Action (VLA) models are typically trained on teleoperated successful demonstrations, while discarding numerous failed attempts that occur naturally during data collection. However, these failures encode where and how policies can be fragile, information that can be exploited to improve robustness. We address this problem by leveraging mixed-quality datasets to learn failure-aware reasoning at planning time. We introduce VINE, a hierarchical vision-language-action model that separates high-level reasoning (System 2) from low-level control (System 1) under a hierarchical reinforcement learning formalism, making failures usable as a structured learning signal rather than noisy supervision. System 2 performs feasibility-guided tree search over a 2D scene-graph abstraction: it proposes subgoal transitions, predicts success probabilities from both successes and failures, and prunes brittle branches before execution, effectively casting plan evaluation as feasibility scoring. The selected subgoal sequence is then passed to System 1, which executes low-level actions without modifying the agent's core skills. Trained entirely from offline teleoperation data, VINE integrates negative experience directly into the decision loop. Across challenging manipulation tasks, this approach consistently improves success rates and robustness, demonstrating that failure data is an essential resource for converting the broad competence of VLAs into robust execution.

A Modular Architecture Design for Autonomous Driving Racing in Controlled Environments

Authors:Brais Fontan-Costas, M. Diaz-Cacho, Ruben Fernandez-Boullon, Manuel Alonso-Carracedo, Javier Perez-Robles
Date:2025-12-03 15:36:46

This paper presents an Autonomous System (AS) architecture for vehicles in a closed circuit. The AS performs precision tasks including computer vision for environment perception, positioning and mapping for accurate localization, path planning for optimal trajectory generation, and control for precise vehicle actuation. Each subsystem operates independently while connecting data through a cohesive pipeline architecture. The system implements a modular design that combines state-of-the-art technologies for real-time autonomous navigation in controlled environments.

Deep Reinforcement Learning for Dynamic Algorithm Configuration: A Case Study on Optimizing OneMax with the (1+($λ$,$λ$))-GA

Authors:Tai Nguyen, Phong Le, André Biedenkapp, Carola Doerr, Nguyen Dang
Date:2025-12-03 13:54:41

Dynamic Algorithm Configuration (DAC) studies the efficient identification of control policies for parameterized optimization algorithms. Numerous studies have leveraged the robustness of decision-making in Reinforcement Learning (RL) to address the optimization challenges in algorithm configuration. However, applying RL to DAC is challenging and often requires extensive domain expertise. We conduct a comprehensive study of deep-RL algorithms in DAC through a systematic analysis of controlling the population size parameter of the (1+($λ$,$λ$))-GA on OneMax instances. Our investigation of DDQN and PPO reveals two fundamental challenges that limit their effectiveness in DAC: scalability degradation and learning instability. We trace these issues to two primary causes: under-exploration and planning horizon coverage, each of which can be effectively addressed through targeted solutions. To address under-exploration, we introduce an adaptive reward shifting mechanism that leverages reward distribution statistics to enhance DDQN agent exploration, eliminating the need for instance-specific hyperparameter tuning and ensuring consistent effectiveness across different problem scales. In dealing with the planning horizon coverage problem, we demonstrate that undiscounted learning effectively resolves it in DDQN, while PPO faces fundamental variance issues that necessitate alternative algorithmic designs. We further analyze the hyperparameter dependencies of PPO, showing that while hyperparameter optimization enhances learning stability, it consistently falls short in identifying effective policies across various configurations. Finally, we demonstrate that DDQN equipped with our adaptive reward shifting strategy achieves performance comparable to theoretically derived policies with vastly improved sample efficiency, outperforming prior DAC approaches by several orders of magnitude.