Artificial Intelligence (AI) models may not be immune to aging after all. A new study published in the BMJ suggests that leading large language models (LLMs) and chatbots display signs of cognitive decline as they “age,” much like humans. The research comes amid growing reliance on AI for medical advice, sparking fresh concerns about the technology’s reliability in critical areas.
Researchers assessed the cognitive abilities of several prominent AI models (ChatGPT versions 4 and 4o from OpenAI, Claude 3.5 "Sonnet" from Anthropic, and Gemini versions 1 and 1.5 from Alphabet) using the Montreal Cognitive Assessment (MoCA) test. Traditionally used to detect early dementia and cognitive impairment in humans, the MoCA test evaluates skills such as attention, memory, language, spatial reasoning, and executive function.
In humans, a score of 26 or higher out of 30 on the MoCA is considered normal. Among the AI models tested, however, only ChatGPT 4o reached that threshold, scoring exactly 26 points. ChatGPT 4 and Claude fell slightly short at 25 points each. The lowest performer was Gemini 1.0, which managed just 16 points.
The study found that all of the chatbots struggled with visuospatial skills and executive tasks. For instance, they performed poorly on the trail-making task (connecting numbers and letters in ascending order) and the clock-drawing test (drawing a clock face showing a specific time). The Gemini models performed particularly poorly on the delayed recall task, failing to remember a five-word sequence.
Interestingly, the patterns of impairment resembled those seen in human patients with posterior cortical atrophy, a rare variant of Alzheimer’s disease. This has prompted researchers to caution against the assumption that AI will soon replace human doctors in medical diagnostics.
“These findings challenge the perception that artificial intelligence will surpass human doctors any time soon,” the study stated. “The cognitive impairments shown by these leading AI models could undermine their reliability in providing medical advice and affect patient confidence.”
The researchers also speculated that neurologists may not have to worry about being replaced by LLMs, but may instead find themselves treating a new kind of "virtual patient": the AI models themselves.
The findings come at a time when users are increasingly turning to AI tools for medical advice, drawn by their ability to simplify complex jargon. With evidence of these cognitive limitations now emerging, it remains to be seen how developers and the healthcare sector will address the challenge of "aging" AI models.