The popular AI chatbot ChatGPT can pass the United States Medical Licensing Exam (USMLE), producing answers that are coherent, internally consistent, and frequently insightful, a study by AnsibleHealth has found.
According to the study, ChatGPT performed at or near the roughly 60% passing threshold of the US medical licensing exam. The study, written by Tiffany Kung, Victor Tseng, and colleagues at AnsibleHealth, was published on February 9, 2023 in the open access journal PLOS Digital Health.
ChatGPT is designed to generate text that resembles human writing by predicting future sequences of words. Unlike most chatbots, however, the software cannot perform online searches. Instead, it creates text based on word relationships predicted by its internal processes.
Kung and colleagues tested ChatGPT’s performance on the USMLE, a highly standardized and regulated series of three exams (Step 1, Step 2CK, and Step 3) required for medical licensure in the United States. The exams assess knowledge spanning most medical disciplines, from biochemistry to diagnostic reasoning to bioethics.
After screening to remove image-based questions, the authors tested the software on 350 of the 376 public questions available from the June 2022 edition of the USMLE. ChatGPT scored between 52.4% and 75.0% across the three exams. The passing threshold, while varying slightly by year, is around 60%, the study said.
In addition, ChatGPT demonstrated 94.6% agreement across all of its responses and generated at least one significant insight (something novel, non-obvious, and clinically valid) in 88.9% of its responses. Notably, ChatGPT outperformed PubMedGPT, a counterpart model trained exclusively on biomedical literature, which achieved 50.8% on an older dataset of USMLE-style questions.
While the relatively small input size limited the depth and scope of the analyses, the authors note that their findings offer insight into ChatGPT’s potential to improve medical education and, ultimately, clinical practice. For example, doctors at AnsibleHealth already use ChatGPT to rewrite jargon-heavy messages so they are easier for patients to understand, they added. “Achieving a score to pass this notoriously difficult professional exam, without any human augmentation, marks a significant milestone in the clinical maturation of AI,” the authors said.
Author Tiffany Kung added that ChatGPT’s role in this research went beyond being the subject of the study: “ChatGPT made a significant contribution to the writing of [our] manuscript… We interacted with ChatGPT much like a colleague, asking it to synthesize, simplify, and offer counterpoints to ongoing drafts… All co-authors appreciated ChatGPT’s contribution.”