Assessing the validity of ChatGPT-4o and Google Gemini Advanced when responding to frequently asked questions in endodontics

Authors

  • Nicolás Dufey-Portilla Universidad Andres Bello, Department of Endodontics, School of Dentistry, Viña del Mar https://orcid.org/0000-0001-5922-2757
  • Ana Billik Frisman Independent researcher, Viña del Mar
  • Maximiliano Gallardo Robles Independent researcher, Viña del Mar
  • Fernando Peña-Bengoa Universidad Andres Bello, Department of Endodontics, School of Dentistry, Viña del Mar
  • Consuelo Cabrera Ávila Universidad Andres Bello, Department of Endodontics, School of Dentistry, Viña del Mar
  • Venkateshbabu Nagendrababu University of Sharjah, College of Dental Medicine, Department of Restorative Dentistry, Sharjah
  • Paul M. H. Dummer Cardiff University, College of Biomedical and Life Sciences, School of Dentistry, Cardiff https://orcid.org/0000-0002-0726-7467
  • Marc Garcia-Font Universitat Internacional de Catalunya, School of Dentistry, Department of Endodontics, Sant Cugat del Vallès, Barcelona https://orcid.org/0000-0002-1280-9611
  • Francesc Abella Sans Universitat Internacional de Catalunya, School of Dentistry, Department of Endodontics, Sant Cugat del Vallès, Barcelona https://orcid.org/0000-0002-3500-3039

DOI:

https://doi.org/10.1590/

Keywords:

Artificial intelligence, ChatGPT, Endodontics, Google Gemini, Large language models

Abstract

Artificial intelligence (AI) is transforming access to dental information through large language models (LLMs) such as ChatGPT and Google Gemini, both of which are increasingly used by patients as sources of information in endodontics. Therefore, as developers release new versions, the validity of their responses must be continuously compared against professional consultation. Objective: This study aimed to evaluate the validity of the responses provided by two of the most advanced LLMs [Google Gemini Advanced (GGA) and ChatGPT-4o] to frequently asked questions (FAQs) in endodontics. Methodology: A cross-sectional analytical study was conducted in five phases. The top 20 endodontic FAQs submitted by users to chatbots were compiled from Google Trends. Nine academically certified endodontic specialists with educational roles scored the GGA and ChatGPT-4o responses to the FAQs on a five-point Likert scale. Validity was determined using a high (mean score 4.5-5) and a low (mean score ≥4) threshold, and Fisher's exact test was used for comparative analysis. Results: At the low threshold, both models obtained 95% validity (95% CI: 75.1%-99.9%; p=.05). At the high threshold, ChatGPT-4o achieved 35% validity (95% CI: 15.4%-59.2%) and GGA 40% (95% CI: 19.1%-63.9%) (p=1). Conclusions: The responses of ChatGPT-4o and GGA showed high validity under the lenient criterion but markedly lower validity under the stricter threshold, which limits their reliability as stand-alone sources of information in endodontics. Although AI chatbots show promise for improving patient education in endodontics, their limited validity under rigorous evaluation highlights the need for careful professional oversight.




Published

2025-09-25

Issue

Section

Original Articles

How to Cite

Dufey-Portilla, N., Frisman, A. B., Gallardo Robles, M., Peña-Bengoa, F., Cabrera Ávila, C., Nagendrababu, V., Dummer, P. M. H., Garcia-Font, M., & Abella Sans, F. (2025). Assessing the validity of ChatGPT-4o and Google Gemini Advanced when responding to frequently asked questions in endodontics. Journal of Applied Oral Science, 33, e20250321. https://doi.org/10.1590/