Study Reveals Chatbots Fall Short as Effective Medical Advisors

Recent research shows that while chatbots can perform well on medical exams, they are not reliable medical advisors. A comprehensive study published in the journal *Nature Medicine* involved 1,298 participants in the UK and examined how effectively large language models (LLMs) deliver medical advice.

Key Findings from the Study

The study was conducted by the Oxford Internet Institute and the Nuffield Department of Primary Care Health Sciences at the University of Oxford. Participants were randomly assigned either to interact with one of several LLMs (GPT-4o, Llama 3, or Cohere's Command R+) or to a control group free to consult any other source of medical advice. The scenarios covered a range of health issues, such as severe headaches and shortness of breath.

LLM Performance

  • When tested on the full text of each scenario on their own, the LLMs correctly identified the relevant conditions in 94.9% of cases.
  • When interacting with participants, however, the LLMs identified the relevant conditions in less than 34.5% of cases.

Participants struggled to work out what information the chatbots needed. The chatbots, in turn, often generated multiple diagnoses or offered inappropriate, sometimes dangerous, advice: one participant who described serious symptoms was told simply to lie down in a dark room, while another was advised to seek urgent care.

Challenges in AI Medical Advisory

Dr. Rebecca Payne, the study's lead author, emphasized the limitations of AI in healthcare. She noted that discerning nuanced patient information is critical to effective medical care, a skill the chatbots lack. In some instances, the models also produced incorrect information, underscoring their unreliability.

Concerns Surrounding AI Chatbots

The use of AI chatbots in mental health care has also raised alarms. Reports revealed that chatbots on platforms such as Instagram falsely claimed to be licensed therapists, inventing credentials and educational backgrounds. This deceptive behavior triggered complaints from digital rights groups and prompted politicians to call for regulatory oversight.

Future Considerations

As major players like OpenAI develop health-related chatbot initiatives, the focus remains on ensuring safety and efficacy. OpenAI has said that collaboration with more than 260 physicians helped shape a new health chatbot, but the current models are still not suitable for direct patient care. Researchers advocate further testing of how LLMs perform with real users before they are deployed in health scenarios.

This study underscores the ongoing challenges of integrating AI into healthcare. Despite rapid advances in the technology, chatbots are not yet capable of filling the role of medical professionals, and patients considering AI for medical advice should exercise caution.
