San Francisco, April 9, 2026, 08:12 PDT
Roughly 10% of Google’s AI Overviews, the AI-generated summaries stacked above standard search listings, got the facts wrong, according to a new analysis commissioned by The New York Times, even after Google shifted the feature to its newer Gemini 3 model. The AI startup Oumi ran 4,326 queries and found accuracy climbed from 85% in October to 91% by February.
The finding matters because Google is weaving AI-generated answers right into core search, not leaving them as a side feature. In January, Gemini 3 became the default model for AI Overviews, and the company added a follow-up question function. By late 2025, executives said the feature had reached more than 2 billion users.
The expansion comes as questions swirl around AI search’s impact on the web. According to Pew Research, when an AI summary was shown, users clicked on a standard search result just 8% of the time, compared with 15% when no summary appeared. Google, for its part, maintains that overall outbound clicks from Search are holding steady.
Oumi tested AI Overviews with SimpleQA, an OpenAI benchmark of concise, factual questions that each have a single checkable answer. The latest results are mixed: accuracy ticked up, but grounding took a hit. In February, 56% of responses marked as correct were actually “ungrounded,” meaning the supporting links fell short of backing the answer, up from 37% in October.
Most of the errors weren’t glaring. When prompted about the year Bob Marley’s old house turned into a museum, Google put it at 1987, missing the actual 1986 opening. Elsewhere, AI Overviews pointed straight to the Classical Music Hall of Fame’s official page, yet still claimed cellist Yo-Yo Ma wasn’t among the inductees.
Okahu chief executive Pratik Verma described Google’s system as about as accurate as rival AI models, but cautioned users: “never trust one source.” Oumi chief executive Manos Koukoumidis pointed to verification as the tougher challenge, saying that even correct answers get shaky if the links next to them aren’t clear support.
Google pushed back. Company spokesperson Ned Adriance called out “serious holes” in the study, arguing it doesn’t capture real search behavior. He also pointed to Google’s own help pages, which already caution users that AI Overviews can make errors. On a separate track, Google DeepMind researchers rolled out a revised benchmark dubbed SimpleQA Verified, aiming to clean up what they described as noisy labels and other issues that plagued the original version.
That still leaves a big question hanging over the headline numbers. Google says it processes more than 5 trillion searches per year, and Pew’s March 2025 analysis found AI summaries appearing in 18% of searches. So estimating how Oumi’s error rate scales worldwide by the hour requires guessing how often AI Overviews are shown and which search queries actually trigger them.
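To see what those guesses imply, here is a rough back-of-envelope sketch using only the figures reported above. The big assumption, flagged in the comments, is that Pew’s 18% appearance rate and Oumi’s February error rate generalize to all searches, which is exactly what the article says is uncertain.

```python
# Back-of-envelope estimate of erroneous AI Overviews per hour.
# Assumption: Pew's 18% appearance rate and Oumi's ~9% error rate
# (91% accuracy in February) apply uniformly to all searches;
# the article notes both are uncertain extrapolations.

SEARCHES_PER_YEAR = 5e12      # Google's claim: over 5 trillion searches/year
OVERVIEW_RATE = 0.18          # Pew: AI summaries shown in 18% of searches
ERROR_RATE = 1 - 0.91         # Oumi: 91% accuracy -> ~9% wrong

HOURS_PER_YEAR = 365 * 24     # 8,760 hours

overviews_per_hour = SEARCHES_PER_YEAR * OVERVIEW_RATE / HOURS_PER_YEAR
errors_per_hour = overviews_per_hour * ERROR_RATE

print(f"AI Overviews shown per hour: {overviews_per_hour:,.0f}")
print(f"Estimated erroneous ones:    {errors_per_hour:,.0f}")
```

Under those assumptions the sketch lands at roughly a hundred million Overviews and several million errors per hour, which is why the appearance-rate guess dominates the final number.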
Google is pushing more AI-generated answers into its search results just as competitors step up their own offerings. In February 2025, OpenAI rolled out ChatGPT search for all users, while Perplexity followed in July 2025 with its Comet browser, which brings AI-driven search and productivity features designed to take on Chrome.
Google’s calculus is shifting. Search continues to connect users to the web, but errors or unsupported citations now land front and center, right where everyone sees them.