Evaluating the accuracy and readability of ChatGPT in providing parental guidance for adenoidectomy, tonsillectomy, and ventilation tube insertion surgery


Polat E., Polat Y. B., Şentürk E., Doğan R., Yenigün A., Tugrul S., et al.

International Journal of Pediatric Otorhinolaryngology, vol.181, 2024 (SCI-Expanded, Scopus)

  • Publication Type: Article
  • Volume: 181
  • Publication Date: 2024
  • DOI: 10.1016/j.ijporl.2024.111998
  • Journal Name: International Journal of Pediatric Otorhinolaryngology
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, CAB Abstracts, EMBASE, Veterinary Science Database
  • Keywords: Accuracy, ChatGPT, Comprehensiveness, Parental education, Pediatric otorhinolaryngology, Readability
  • Bezmialem Vakıf University Affiliated: Yes

Abstract

Objectives: This study examined the potential of ChatGPT as an accurate and readable source of information for parents seeking guidance on adenoidectomy, tonsillectomy, and ventilation tube insertion surgeries (ATVtis).

Methods: ChatGPT was tasked with identifying the top 15 questions most frequently asked by parents on internet search engines for each of the three surgical procedures. We removed repeated questions from the initial set of 45 and then asked ChatGPT to generate answers to the remaining 33 questions. Seven highly experienced otolaryngologists independently assessed the accuracy of the responses on a four-level grading scale ranging from completely incorrect to comprehensive. Readability was measured with the Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) scores. The questions were categorized into four groups: Diagnosis and Preparation Process, Surgical Information, Risks and Complications, and Postoperative Process. Responses were then compared across groups by accuracy grade, FRE, and FKGL scores.

Results: Seven evaluators each assessed 33 AI-generated responses, yielding 231 evaluations in total. Of these, 167 (72.3 %) were classified as 'comprehensive,' 62 (26.8 %) as 'correct but inadequate,' and 2 (0.9 %) as 'some correct, some incorrect.' No response was judged 'completely incorrect' by any assessor. The mean FRE and FKGL scores were 57.15 (±10.73) and 9.95 (±1.91), respectively. Only 3 responses (9.1 %) were at or below the sixth-grade reading level recommended by the American Medical Association (AMA). No significant differences were found between the groups in readability or accuracy scores (p > 0.05).

Conclusions: ChatGPT can provide accurate answers to questions on various topics related to ATVtis. However, its answers may be too complex for some readers, as they are generally written at a high school level, above the sixth-grade reading level the AMA recommends for patient information. In our study, more than three-quarters of the AI-generated responses were at or above the 10th-grade reading level, raising concerns about the readability of ChatGPT-generated text.
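The FRE and FKGL metrics used in the study are simple functions of average sentence length and average syllables per word. As a minimal sketch of how such scores are computed (the study does not describe its tooling; the naive regex-based syllable counter below is an illustrative assumption, whereas validated tools use dictionary-backed syllabification):

```python
import re

def count_syllables(word: str) -> int:
    """Rough English syllable estimate: count vowel groups, trim a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1 and not word.endswith(("le", "ee")):
        count -= 1
    return max(count, 1)

def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for `text`."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)          # average words per sentence
    spw = syllables / len(words)               # average syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw   # higher = easier to read
    fkgl = 0.39 * wps + 11.8 * spw - 15.59     # approximate US school grade level
    return fre, fkgl
```

On this scale, the study's mean FRE of 57.15 falls in the "fairly difficult" band, and the mean FKGL of 9.95 corresponds to roughly a 10th-grade reading level, consistent with the conclusion above.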