Evaluating the accuracy and readability of ChatGPT in providing parental guidance for adenoidectomy, tonsillectomy, and ventilation tube insertion surgery


Polat E., Polat Y. B., Şentürk E., Doğan R., Yenigün A., Tugrul S., et al.

International Journal of Pediatric Otorhinolaryngology, vol.181, 2024 (SCI-Expanded, Scopus)

  • Publication Type: Article
  • Volume: 181
  • Publication Date: 2024
  • DOI: 10.1016/j.ijporl.2024.111998
  • Journal Name: International Journal of Pediatric Otorhinolaryngology
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, CAB Abstracts, EMBASE, Veterinary Science Database
  • Keywords: Accuracy, ChatGPT, Comprehensiveness, Parental education, Pediatric otorhinolaryngology, Readability
  • Bezmialem Vakıf University Affiliated: Yes

Abstract

Objectives: This study examined the potential of ChatGPT as an accurate and readable source of information for parents seeking guidance on adenoidectomy, tonsillectomy, and ventilation tube insertion surgeries (ATVtis).

Methods: ChatGPT was tasked with identifying the top 15 questions most frequently asked by parents on internet search engines for each of the three surgical procedures. We removed repeated questions from the initial set of 45 and then asked ChatGPT to generate answers to the remaining 33 questions. Seven highly experienced otolaryngologists independently assessed the accuracy of the responses on a four-level grading scale ranging from completely incorrect to comprehensive. Readability was measured with the Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) scores. The questions were categorized into four groups: Diagnosis and Preparation Process, Surgical Information, Risks and Complications, and Postoperative Process. Responses were then compared across groups by accuracy grade, FRE, and FKGL scores.

Results: Seven evaluators each assessed 33 AI-generated responses, yielding 231 evaluations in total. Of these, 167 (72.3 %) were classified as 'comprehensive,' 62 (26.8 %) as 'correct but inadequate,' and 2 (0.9 %) as 'some correct, some incorrect.' No response was judged 'completely incorrect' by any assessor. The mean FRE and FKGL scores were 57.15 (±10.73) and 9.95 (±1.91), respectively. Only 3 responses (9.1 %) were at or below the sixth-grade reading level recommended by the American Medical Association (AMA). No significant differences were found between the groups in readability or accuracy scores (p > 0.05).

Conclusions: ChatGPT can provide accurate answers to questions on various topics related to ATVtis. However, its answers may be too complex for some readers, as they are generally written at a high school level, above the sixth-grade reading level the AMA recommends for patient information. In our study, more than three-quarters of the AI-generated responses were at or above the 10th-grade reading level, raising concerns about the readability of ChatGPT-generated text.
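The FRE and FKGL metrics used in the study are simple functions of average sentence length and average syllables per word. As a minimal sketch of how such scores are computed (the study does not describe its tooling; the naive regex-based syllable counter below is an illustrative assumption, whereas validated tools use dictionary-backed syllabification):

```python
import re

def count_syllables(word: str) -> int:
    """Rough English syllable estimate: count vowel groups, trim a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1 and not word.endswith(("le", "ee")):
        count -= 1
    return max(count, 1)

def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for `text`."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)          # average words per sentence
    spw = syllables / len(words)               # average syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw   # higher = easier to read
    fkgl = 0.39 * wps + 11.8 * spw - 15.59     # approximate US school grade level
    return fre, fkgl
```

On this scale, the study's mean FRE of 57.15 falls in the "fairly difficult" band, and the mean FKGL of 9.95 corresponds to roughly a 10th-grade reading level, consistent with the conclusion above.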