The Personalization Paradox: Semantic Loss Vs. Reasoning Gains in Agentic AI Q & A
DOI: https://doi.org/10.37256/ccds.7220269566
Keywords: personalization, knowledge access systems, retrieval augmented generation, large language models, semantic similarity, agentic Artificial Intelligence (AI), academic advising
Abstract
This study examines how personalization in agentic retrieval-augmented Artificial Intelligence (AI) systems influences the quality of answers delivered in institutional knowledge-access settings such as academic advising. Prior advising and knowledge-access systems typically assume that personalization is universally beneficial, yet little empirical evidence evaluates how it alters information quality. This paper addresses this gap by treating personalization as an independent factor within a Retrieval-Augmented Generation Large Language Model (RAG LLM) used for student advising. The study evaluates ten system configurations across personalized and non-personalized conditions using twelve authentic advising questions intentionally designed for lexical strictness. Performance was assessed using lexical metrics (Bilingual Evaluation Understudy (BLEU) and Recall-Oriented Understudy for Gisting Evaluation (ROUGE)-L), semantic similarity measures (Metric for Evaluation of Translation with Explicit ORdering (METEOR) and BERTScore), and reasoning and grounding metrics from the Retrieval Augmented Generation Assessment (RAGAs) framework. A Linear Mixed-effects Model (LMM) was used to quantify main effects and interactions. Personalization in agentic AI does not yield uniform gains; instead, it creates a critical trade-off in which factors that significantly improve reasoning quality also incur a statistically significant penalty on semantic similarity. Specifically, personalized configurations produced a statistically significant decrease in BERTScore (0.841 vs. 0.848, p < 0.0001) alongside a significant improvement in METEOR (0.361 vs. 0.251, Δ = +0.110, p < 0.0001), together showing that the trade-off is metric-dependent. Grounding and reasoning metrics improved at the same time, with Faithfulness rising from 0.655 to 0.711 (p = 0.0135), further supporting the conclusion that personalization enhances answer quality even as semantic similarity metrics penalize it.
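Reference-based metrics score a candidate answer against a fixed, generic reference, so correct user-specific additions can lower the score. The minimal sketch below illustrates this mechanism with ROUGE-L, one of the lexical metrics the study reports, used here as a simple stand-in because BERTScore needs a pretrained encoder. The helper names (`lcs_length`, `rouge_l_f1`), the lowercase whitespace tokenization, the F1 combination (β = 1), and the example sentences are all hypothetical assumptions, not the study's evaluation code.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    # dp[i][j] holds the LCS length of a[:i] and b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, tok_a in enumerate(a, start=1):
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 over lowercase whitespace tokens (beta = 1 is an assumption)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens that match the reference order
    recall = lcs / len(ref)      # share of reference tokens covered by the candidate
    return 2 * precision * recall / (precision + recall)


# Hypothetical advising example: one generic reference and two candidate answers.
reference = "you must complete 120 credits to graduate"
generic = "you must complete 120 credits to graduate"
personalized = ("as a transfer student with 45 credits you must complete "
                "120 credits to graduate")

print(round(rouge_l_f1(generic, reference), 3))       # prints 1.0
print(round(rouge_l_f1(personalized, reference), 3))  # prints 0.667
```

The personalized answer keeps every reference token, yet it scores lower because the extra profile-specific tokens halve its precision. BERTScore likewise combines precision and recall over token matches (using contextual-embedding similarity instead of exact matches), which is consistent with the abstract's argument that generic similarity metrics penalize user-specific content.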
Personalization decreases semantic similarity scores not because answer quality declines but because generic semantic metrics penalize beneficial user-specific deviations. The configuration that applied personalization redundantly across all three stages (setting the AI’s role, guiding document retrieval, and conditioning final response generation) achieved the best overall results, confirming that fully integrated user-specific adaptation yields the most effective balance between reasoning gains and semantic penalties.
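The three personalization stages described above can be pictured as three points where a user profile enters a RAG pipeline. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the `StudentProfile` fields, the prompt wording, the toy word-overlap retriever, and every function name are hypothetical, and the actual LLM call is omitted.

```python
import re
from dataclasses import dataclass


@dataclass
class StudentProfile:
    """Hypothetical user profile; the fields are illustrative assumptions."""
    name: str
    program: str
    year: int


def tokens(text: str) -> set[str]:
    """Lowercase word tokens with punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))


def build_role_prompt(profile: StudentProfile) -> str:
    """Stage 1 (role): personalize the system prompt that sets the agent's role."""
    return (f"You are an academic advisor for {profile.name}, "
            f"a year-{profile.year} student in {profile.program}.")


def build_retrieval_query(question: str, profile: StudentProfile) -> str:
    """Stage 2 (retrieval): expand the search query with profile terms."""
    return f"{question} {profile.program}"


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Toy retriever ranking documents by word overlap (a stand-in for a vector store)."""
    query_terms = tokens(query)
    ranked = sorted(documents,
                    key=lambda doc: len(query_terms & tokens(doc)),
                    reverse=True)
    return ranked[:k]


def build_generation_prompt(question: str, context: list[str],
                            profile: StudentProfile) -> str:
    """Stage 3 (generation): condition the answer on the profile and retrieved context."""
    context_block = "\n".join(f"- {doc}" for doc in context)
    return (f"Context:\n{context_block}\n\n"
            f"Answer for a {profile.program} student in year {profile.year}: {question}")


def personalized_pipeline(question: str, documents: list[str],
                          profile: StudentProfile) -> tuple[str, str]:
    """Apply the profile at all three stages; return (system prompt, user prompt)."""
    system_prompt = build_role_prompt(profile)
    context = retrieve(build_retrieval_query(question, profile), documents)
    return system_prompt, build_generation_prompt(question, context, profile)


# Hypothetical advising corpus and profile.
docs = [
    "Computer Science majors must complete a senior capstone project.",
    "Biology majors must complete two laboratory electives.",
    "All students must complete 120 credits to graduate.",
]
profile = StudentProfile(name="Alex", program="Computer Science", year=3)
system_prompt, user_prompt = personalized_pipeline(
    "What must I complete to graduate?", docs, profile)
print(system_prompt)
print(user_prompt)
```

Because each stage is a separate function, the profile can be switched on or off per stage, which is one possible way to build ablation configurations like those the study compares; the abstract reports that applying personalization at all three stages performed best.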
Copyright (c) 2026 Satyajit Movidi, Stephen Russell

This work is licensed under a Creative Commons Attribution 4.0 International License.
