Accuracy, Readability, Understandability, and Actionability of ChatGPT Responses to Patient Questions on Systemic Isotretinoin Treatment


Küçük R., Güngör C. B., Topal İ. O., Küçük R. B., Karadağ A. S.

Indian Dermatology Online Journal, vol.17, no.1, pp.27-32, 2026 (ESCI, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 17 Issue: 1
  • Publication Date: 2026
  • DOI: 10.4103/idoj.idoj_1243_24
  • Journal Name: Indian Dermatology Online Journal
  • Indexed In: Emerging Sources Citation Index (ESCI), Scopus
  • Pages: pp.27-32
  • Keywords: Artificial intelligence, ChatGPT, dermatology, isotretinoin, patient queries
  • Affiliated with Bezmiâlem Vakıf Üniversitesi: Yes

Abstract

Objective: This study evaluated the accuracy, readability, understandability, and actionability of ChatGPT-3.5 responses to common patient questions about systemic isotretinoin therapy.

Materials and Methods: Thirty questions were developed in five categories (drug information, side effects, pregnancy, daily life, and course of treatment) based on resources from the British Association of Dermatologists and the Turkish Dermatology Association. The questions were presented to ChatGPT-3.5, and the responses were evaluated using a four-point Likert scale for accuracy, the Flesch–Kincaid Grade Level (FKGL) and Flesch Reading Ease (FRE) scores for readability, and the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P) for understandability and actionability.

Results: Of the 90 evaluations, 44.4% of responses were comprehensive and correct, 18.8% were correct but insufficient, 32.2% contained a mix of accurate and outdated information, and 4.4% were completely incorrect. The mean FKGL was 13.28 ± 2.38 and the mean FRE score was 29.34 ± 10.4, indicating a college-graduate reading level. Mean PEMAT-P scores for understandability and actionability were 48.1% and 35.06%, respectively, falling below the 70% threshold. The “daily life” category had the highest scores for both metrics, while “pregnancy and contraception” scored the lowest.

Limitations: The study was limited to ChatGPT-3.5, conducted in English, and based on training data available only up to 2021, which may affect the generalizability and currency of the results.

Conclusion: Although ChatGPT-3.5 shows potential as a patient education tool, it struggles to provide accurate, readable, and actionable information on systemic isotretinoin therapy. Its use requires supervision, and further refinement of artificial intelligence tools is needed to improve their utility in healthcare settings.
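For reference, the abstract does not reproduce the readability formulas themselves; the standard published Flesch Reading Ease and Flesch–Kincaid Grade Level formulas (not taken from the article, given here only to aid interpretation of the reported scores) are:

\text{FRE} = 206.835 - 1.015\,\frac{\text{total words}}{\text{total sentences}} - 84.6\,\frac{\text{total syllables}}{\text{total words}}

\text{FKGL} = 0.39\,\frac{\text{total words}}{\text{total sentences}} + 11.8\,\frac{\text{total syllables}}{\text{total words}} - 15.59

On these scales, an FRE around 29 falls in the “very difficult” band conventionally associated with college-graduate readers, and an FKGL around 13 corresponds roughly to first-year college text, consistent with the reading level reported above.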