San Francisco, April 9, 2026, 08:12 PDT
Google’s AI Overviews, the AI-written summaries that appear above ordinary search results, returned wrong answers roughly one time in 10, according to a new analysis commissioned by The New York Times, even after Google moved the feature to its newer Gemini 3 model. The review by AI startup Oumi tested 4,326 searches and found accuracy rose to 91% in February from 85% in October. [1]
That matters now because Google is pushing AI answers deeper into its core search product rather than confining them to the margins. In January, the company made Gemini 3 the default model for AI Overviews and added follow-up questions, and by late 2025 executives said the product had scaled to more than 2 billion users. [2]
It also lands amid a debate over what AI search is doing to the web. Pew Research Center found that users clicked a traditional search result on 8% of visits when an AI summary appeared, versus 15% when one did not, while Google has argued that outbound clicks from Search remain broadly stable. [3]
Oumi ran the test on SimpleQA, a benchmark built around short, fact-seeking questions with a single verifiable answer. The results improved on one measure and worsened on another: more than half of the February responses marked correct were also “ungrounded,” meaning the cited links did not fully back the claim, a share that rose to 56% from 37% in October. [4]
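Oumi’s grading pipeline is not described in detail in the report, but the two measures can be separated schematically: each response is scored once for whether its answer matches the reference and once for whether its citations support the claim. The Python sketch below is a minimal illustration of that tally; the GradedResponse fields and the sample counts are assumptions shaped to echo the February figures, not Oumi’s data or code.

```python
from dataclasses import dataclass

@dataclass
class GradedResponse:
    answer_correct: bool  # matches the benchmark's reference answer
    grounded: bool        # cited links fully support the claim

def tally(responses: list[GradedResponse]) -> dict[str, float]:
    """Aggregate the two measures the study reports: accuracy, and the
    share of correct answers whose citations do not back them up."""
    correct = [r for r in responses if r.answer_correct]
    ungrounded_correct = [r for r in correct if not r.grounded]
    return {
        "accuracy": len(correct) / len(responses),
        "ungrounded_share_of_correct": len(ungrounded_correct) / max(len(correct), 1),
    }

# Illustrative counts shaped to echo the February figures (91% accuracy,
# ~56% of correct answers ungrounded); these are not Oumi's data.
sample = (
    [GradedResponse(True, False)] * 51    # right answer, citations don't support it
    + [GradedResponse(True, True)] * 40   # right answer, citations support it
    + [GradedResponse(False, False)] * 9  # wrong answer
)
print(tally(sample))  # {'accuracy': 0.91, 'ungrounded_share_of_correct': 0.56...}
```

Run on that sample, the tally returns 91% accuracy with 56% of correct answers ungrounded, which is how a system can become more accurate and less verifiable at the same time.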
The misses were often subtle rather than bizarre. Asked when Bob Marley’s former home became a museum, Google’s summary said 1987 although the museum opened in 1986; in another case, AI Overviews linked to the Classical Music Hall of Fame’s own site and still said there was no record of cellist Yo-Yo Ma being inducted. [1]
Pratik Verma, chief executive of Okahu, said Google’s system looked roughly as accurate as other leading AI models, but warned users to “never trust one source.” Manos Koukoumidis, Oumi’s chief executive, argued the harder problem is verification: even a true answer becomes harder to trust when the links beside it do not clearly support it. [1]
Google pushed back. Spokesperson Ned Adriance said the study had “serious holes” and did not reflect what people actually search for; Google’s own help pages, meanwhile, already warn that AI Overviews may include mistakes. Separately, Google DeepMind researchers published a revised benchmark, SimpleQA Verified, that sought to fix what they described as noisy labels and other flaws in the original test. [1]
That leaves a key uncertainty around the splashiest math in the wider coverage. Google says it handles more than 5 trillion searches a year, and Pew found AI summaries on 18% of Google searches in its March 2025 sample, so turning Oumi’s error rate into an hourly global total requires assumptions about how often AI Overviews appear and which queries trigger them. [5]
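To see why those assumptions dominate the headline number, consider a hedged back-of-envelope calculation. The sketch below simply multiplies the three public figures; the appearance share comes from Pew’s sample rather than any confirmed global rate, so the result is illustrative arithmetic, not a measurement.

```python
# Back-of-envelope only: every input is an assumption layered on the
# public figures above, not a measured global rate.
SEARCHES_PER_YEAR = 5e12   # Google's "more than 5 trillion" claim
OVERVIEW_SHARE = 0.18      # Pew's March 2025 sample; the true trigger rate is unknown
ERROR_RATE = 0.09          # Oumi's February result (~91% accuracy)
HOURS_PER_YEAR = 365 * 24

errors_per_hour = SEARCHES_PER_YEAR * OVERVIEW_SHARE * ERROR_RATE / HOURS_PER_YEAR
print(f"~{errors_per_hour / 1e6:.1f} million wrong AI answers per hour")
```

The output, roughly nine million wrong answers an hour, moves almost linearly with the appearance share: halve Pew’s 18% and the total halves too, which is precisely the assumption Google disputes.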
The findings arrive as Google races to make AI answers a bigger part of search while rivals roll out similar products. OpenAI made ChatGPT search available to everyone in February 2025, and Perplexity launched its Comet browser in July 2025 with AI-powered search and task tools aimed at challenging Chrome. [6]
For Google, the trade-off is getting sharper. Search still links out to the web, but when the answer is wrong, or the citation does not back it up, the mistake now sits in the most prominent spot on the page. [7]