AI Symptom Checks Get a Reality Check: Study Finds ChatGPT and Rivals No Better Than Web Search

February 9, 2026

London, Feb 9, 2026, 17:16 (GMT)

  • A UK trial revealed that people seeking symptom advice from AI chatbots didn’t make better health decisions compared to those using web searches or other standard resources
  • These AI models showed strong performance on their own, but accuracy dropped once real users got involved
  • Researchers and clinicians cautioned that minor tweaks in how users describe symptoms might lead to unsafe recommendations

A new study released Monday found that asking an AI chatbot about medical symptoms doesn’t improve patient decision-making compared with using a standard internet search or reputable health websites, Reuters reported.

As more consumers turn to chatbots for health advice at home, health systems and developers are testing these tools as a digital “front door” for triage—helping patients figure out whether to manage symptoms themselves, visit a doctor, or head to the emergency room.

The researchers pointed out that surveys show a rising number of people turn to AI chatbots for health questions—about one in six American adults do so at least monthly. They also warned that high marks on medical exams don’t always reflect how these tools perform with actual users.

The randomized trial included 1,298 UK adults tackling 10 medical scenarios crafted by doctors, ranging from mild illnesses to a critical brain bleed. Participants either used a large language model — the text-generating AI behind tools like ChatGPT — or turned to their usual resources, like internet searches or the National Health Service website.

When tested on their own, the models — OpenAI’s GPT-4o, Meta’s Llama 3, and Cohere’s Command R+ — correctly identified relevant conditions 94.9% of the time and selected the right “disposition,” or next step, 56.3% of the time on average, the study found. People using those same systems identified the conditions less than 34.5% of the time and chose the proper next step less than 44.2% of the time, barely outperforming the control group.

Adam Mahdi, co-author and associate professor at the University of Oxford, pointed out a “huge gap” between the theory behind the technology and its real-world performance. “The knowledge may be in those bots; however, this knowledge doesn’t always translate when interacting with humans,” he said.

Mahdi cautioned that impressive benchmark results can hide flaws that only appear when AI systems interact with real users. “The gap between benchmark scores and actual performance should alert AI developers and regulators,” he said, urging more extensive testing with diverse populations before rolling out these tools in healthcare.

Rebecca Payne, a GP and lead medical practitioner on the study, warned that consumers need to approach chatbot responses carefully. “Despite all the hype, AI just isn’t ready to take on the role of the physician,” she said, noting that incorrect advice might overlook cases requiring urgent medical attention.

The study highlighted how minor tweaks in wording can drastically change responses. For instance, a user reporting “the worst headache ever” with a stiff neck and light sensitivity was advised to go to the hospital. But when the headache was described as “terrible” instead, the guidance shifted to resting in a dark room.

Researchers examined a set of conversations closely and found that mistakes frequently stemmed from both ends: people omitted crucial information or shared incorrect details, while the AI occasionally generated misleading or outright false answers. The systems also blended solid advice with weaker suggestions, forcing users to figure out what to believe.

The key question remains: can improved interfaces, clearer instructions for users, or newer models bridge the gap between controlled tests and real-world use? The team intends to run similar studies across other countries and languages to verify if the findings hold up. OpenAI, Meta, and Cohere did not respond to Reuters’ requests for comment.

The researchers noted that the study received backing from the data company Prolific, the German non-profit Dieter Schwarz Stiftung, and both the UK and U.S. governments. This highlights the growing effort to evaluate whether consumer-facing AI can be safely integrated into already stretched health systems.

Technology News

  • UW study tests whether AI can learn altruism from human behavior via inverse reinforcement learning
    February 9, 2026, 12:48 PM EST. AI systems could learn values by watching human behavior across cultures, says a University of Washington study. It tests inverse reinforcement learning (IRL) as a way for machines to infer values from observed actions rather than being handed a rulebook. Participants from different cultural groups played a real-time, multi-agent online game in which no single strategy maximized payoff. Researchers derived reward functions encoding the players’ latent preferences and used them to train AI agents. Agents trained on data from different groups showed systematic differences in how they prioritized collective outcomes over individual gain, and those patterns held in new scenarios, suggesting the agents learned stable preferences rather than memorized states. The authors caution, though, that the models do not truly understand culture symbolically or reason about morality abstractly. A minimal sketch of the IRL idea follows below.
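
The summary above only gestures at how inverse reinforcement learning recovers preferences from behavior, so here is a minimal, self-contained sketch of the underlying idea: fitting a latent reward function to observed choices with a softmax (Boltzmann) choice model and maximum likelihood. This is not the UW team’s code; the action features, weights, and one-step setting are illustrative assumptions.

```python
# Minimal illustrative IRL sketch (hypothetical data, not the researchers' method):
# infer reward weights from observed choices via maximum-likelihood fitting of a
# softmax choice model.
import numpy as np

rng = np.random.default_rng(0)

# Each candidate action is described by two features:
# [payoff to the individual, payoff to the group].
action_features = np.array([
    [1.0, 0.0],   # selfish action
    [0.6, 0.6],   # balanced action
    [0.1, 1.0],   # altruistic action
])

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def simulate_choices(true_weights, n=500):
    """Simulate players whose choice probabilities follow softmax(reward)."""
    probs = softmax(action_features @ true_weights)
    return rng.choice(len(action_features), size=n, p=probs)

def fit_reward_weights(choices, lr=0.1, steps=2000):
    """Gradient ascent on the log-likelihood of the observed choices."""
    w = np.zeros(action_features.shape[1])
    counts = np.bincount(choices, minlength=len(action_features))
    for _ in range(steps):
        probs = softmax(action_features @ w)
        # Log-likelihood gradient: observed feature totals minus expected totals.
        grad = action_features.T @ counts - len(choices) * (action_features.T @ probs)
        w += lr * grad / len(choices)
    return w

# A hypothetical "group-oriented" population that weights collective payoff heavily.
true_w = np.array([0.5, 2.0])
observed = simulate_choices(true_w)
recovered = fit_reward_weights(observed)
print("recovered reward weights (self, group):", np.round(recovered, 2))
```

Run as-is, the recovered weights land close to the group-oriented weights used to simulate the data, mirroring the study’s claim that a reward function inferred from behavior can encode how heavily a group values collective outcomes over individual gain.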

Latest Articles

EU threatens Meta with quick WhatsApp action over rival AI chatbots

February 9, 2026
BRUSSELS, Feb 9, 2026, 18:15 CET

EU antitrust regulators on Monday threatened to move quickly against Meta Platforms over a WhatsApp policy that blocks rival artificial intelligence assistants, signalling possible temporary orders while a competition probe runs, Reuters reported.

The warning lands as chatbots are being pushed into customer service and sales, where distribution matters. If a service cannot plug into WhatsApp, it can lose a direct line to consumers and businesses that treat the app as their front door. The European Commission said it had sent Meta a “statement of objections” — the formal list of allegations in an EU antitrust investigation.