Artificial Intelligence Chatbots in Peritoneal Dialysis Education: A Cross-Sectional Comparative Study of Quality, Readability, and Reliability


Onan E., Bozaci İ., Deligöz Bildacı Y., Karakaya S. P. Y., Kozanoglu R., Kazancıoğlu R.

Journal of Clinical Medicine, vol. 15, no. 2, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 15 Issue: 2
  • Publication Date: 2026
  • DOI: 10.3390/jcm15020692
  • Journal Name: Journal of Clinical Medicine
  • Indexed in: Science Citation Index Expanded (SCI-EXPANDED), Scopus, EMBASE
  • Keywords: artificial intelligence chatbots, patient education, peritoneal dialysis, readability and reliability
  • Affiliated with Bezmiâlem Vakıf Üniversitesi: Yes

Abstract

Background: Peritoneal dialysis (PD) remains underutilized worldwide, partly due to limited patient education, misconceptions, and barriers to accessing reliable health information. Artificial intelligence (AI)-based chatbots have emerged as promising tools for improving health literacy, supporting shared decision-making, and enhancing patient engagement. However, concerns about content quality, reliability, and readability persist, and no study to date has systematically evaluated AI-generated content in the context of PD. This study therefore aimed to systematically evaluate the quality, reliability, and readability of AI-generated educational content on PD using multiple large language model-based chatbots.

Methods: A total of 45 frequently asked questions about PD were developed by nephrology experts and categorized into three domains: general information (n = 15), technical and clinical issues (n = 21), and myths/misconceptions (n = 9). Three AI-based chatbots (Gemini Pro 2.5, ChatGPT-5, and LLaMA Maverick 4) were prompted to generate responses to all questions. Each response was independently evaluated by two blinded reviewers for textual characteristics, readability using the Flesch Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL), and content quality/reliability using the Ensuring Quality Information for Patients (EQIP) tool and the Modified DISCERN instrument.

Results: Significant differences were observed among the chatbots across all domains. Gemini Pro 2.5 achieved higher FRES scores (32.6 ± 10.5) than ChatGPT-5 (24.2 ± 11.7) and LLaMA Maverick 4 (16.2 ± 7.5; p < 0.001), as well as higher EQIP scores (75.4% vs. 59.4% and 61.5%, respectively; p < 0.001) and Modified DISCERN scores (4.0 [4.0–4.5] vs. 3.0 [3.0–3.5] and 3.0 [2.5–3.5]; p < 0.001). ChatGPT-5 showed intermediate performance, while LLaMA Maverick 4 scored lowest across the evaluated metrics.
Conclusions: These findings demonstrate differences among AI-based chatbots in readability, content quality, and reliability when responding to identical peritoneal dialysis–related questions. While AI chatbots may support health literacy and complement clinical decision-making, their outputs should be interpreted with caution and under appropriate clinical oversight. Future research should focus on multilingual, multicenter, and outcome-based studies to ensure the safe integration of AI into PD patient education.
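The FRES and FKGL metrics used in the Methods are the standard Flesch formulas, which depend only on sentence, word, and syllable counts. The sketch below illustrates those formulas; the `count_syllables` heuristic is an assumption for illustration (published readability tools typically use dictionary-based syllabification, and the study does not specify its implementation).

```python
import re


def count_syllables(word: str) -> int:
    # Heuristic: count vowel groups, drop a trailing silent "e".
    # This is an approximation, not the method used in the study.
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)


def readability(text: str) -> tuple[float, float]:
    """Return (FRES, FKGL) using the standard Flesch formulas."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(len(words), 1)
    n_syllables = sum(count_syllables(w) for w in words)

    words_per_sentence = n_words / sentences
    syllables_per_word = n_syllables / n_words

    # Flesch Reading Ease: higher = easier (scores below ~30 read as "very difficult")
    fres = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    # Flesch-Kincaid Grade Level: approximate US school grade needed
    fkgl = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return fres, fkgl
```

On this scale, the reported means (FRES 16–33 across the three chatbots) all fall in the "difficult" to "very difficult" band, which is why the authors frame readability as a persistent concern.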