Astronomy News 21 January 2026

Technology News

  • AI agents fail Apex-Agents benchmark, raising doubts about workplace readiness
    January 22, 2026, 8:34 PM EST. Mercor's Apex-Agents benchmark subjects leading AI models to real-world white-collar tasks drawn from consulting, investment banking and law. Early results show that every model tested fails: even the best models answer only about a quarter of questions correctly, and most responses are wrong or missing. The findings highlight a bottleneck in multi-domain reasoning: professionals repeatedly pull context from Slack, Google Drive and other tools, an environment current agents struggle to navigate. The scenarios come from Mercor's expert marketplace and are posted on Hugging Face to set a public standard. If AI agents can reliably tackle these tasks, they could reshape roles for lawyers and other knowledge workers. The research aims to explain why progress toward workplace-ready AI remains uneven.