Assessing the Accuracy of ChatGPT’s Answers to Basic Questions on Uterine and Cervical Cancers


Malik A., Suleiman R., Bala S., Ibrahim S., Al Kalbani M., Al-Busaidi H., et al.

Oman Medical Journal, vol.41, no.1, 2026 (Scopus)

  • Publication Type: Article / Full Article
  • Volume: 41 Issue: 1
  • Publication Date: 2026
  • DOI: 10.5001/omj.2025.106
  • Journal Name: Oman Medical Journal
  • Journal Indexes: Scopus, CINAHL, EMBASE, Directory of Open Access Journals
  • Keywords: Artificial Intelligence, Cervix Uteri, ChatGPT, Gynecologic Neoplasms, Large Language Models, Uterus
  • Affiliated with Bezmiâlem Vakıf University: Yes

Abstract

Objectives: Artificial intelligence (AI) platforms based on large language models, such as ChatGPT, are increasingly used by both the public and medical professionals to obtain medical information. This rapid growth in reliance makes it essential to systematically evaluate the accuracy and clinical reliability of AI-generated medical content. The objective of this study was to evaluate the accuracy of responses provided by ChatGPT regarding the prevention, screening, treatment, and risk factors of common gynecological cancers. The assessment focused primarily on the use of ChatGPT by primary care providers and members of the public with limited subject-specific knowledge.

Methods: We evaluated the reliability of ChatGPT (version 3.5) in answering questions about two of the most common gynecological cancers. ChatGPT was posed a total of 40 questions on the prevention, screening, and treatment of endometrial cancer (20 questions) and cervical cancer (20 questions). Responses were independently reviewed and categorized as accurate, inadequate, or inaccurate by five physicians with a mean of 18 ± 3 years of experience in gynecological oncology. Reviewers provided reasons for deeming responses inadequate or inaccurate.

Results: Overall, 20 of the 40 (50%) responses by ChatGPT 3.5 were rated as either inaccurate or inadequate. Most of the deficient responses concerned the treatment of the two cancers, whereas responses to questions about prevention were mostly accurate.

Conclusions: ChatGPT may provide accurate information about the prevention of gynecological cancers, but the public and health professionals should not rely on its responses to make medical decisions, as many responses in this domain were inadequate or inaccurate. Consultation with qualified physicians or specialists is essential for individualized decision-making. Medical information sourced from AI tools such as ChatGPT should be integrated with clinician oversight to improve reliability.