New study shows AI isn’t ready for office work
|
By
Moinak Pal Published January 24, 2026 |
It has been nearly two years since Microsoft CEO Satya Nadella predicted that generative AI would take over knowledge work, but if you look around a typical law firm or investment bank today, the human workforce is still very much in charge. Despite all the hype about “reasoning” and “planning,” a new study from training-data company Mercor explains exactly why the robot revolution is stalled: AI just can’t handle the messiness of real work.
Mercor released a new benchmark called APEX-Agents, and it is brutal. Unlike the usual tests that ask AI to write a poem or solve a math problem, this one uses actual queries from lawyers, consultants, and bankers. It asks the models to do complete, multi-step tasks that require jumping between different types of information.
The results? Even the absolute best models on the market—we are talking about Gemini 3 Flash and GPT-5.2—couldn’t crack a 25% accuracy rate. Gemini led the pack at 24%, with GPT-5.2 right behind it at 23%. Most others were stuck in the teens.
Mercor CEO Brendan Foody points out that the issue isn’t raw intelligence; it’s context. In the real world, answers aren’t served up on a silver platter. A lawyer has to check a Slack thread, read a PDF policy, look at a spreadsheet, and then synthesize all that to answer a question about GDPR compliance.
Humans do this context-switching naturally. AI, it turns out, is terrible at it. When you force these models to hunt for information across “scattered” sources, they either get confused, give the wrong answer, or just give up entirely.
For anyone worried about their job security, this is a bit of a relief. The study suggests that right now, AI functions less like a seasoned professional and more like an unreliable intern who gets things right about a quarter of the time.
That said, the progress is terrifyingly fast. Foody noted that just a year ago, these models were scoring between 5% and 10%. Now they are hitting 24%. So, while they aren’t ready to take the wheel yet, they are learning to drive much faster than we expected. For now, though, the “knowledge work” revolution is on hold until the bots learn how to multitask p
Related Posts
Acer reveals Veriton compact PC to tackle the Mac mini with AMD Ryzen and plenty of AI mojo
Acer is making a direct play in that space with the Veriton RA110 AI Mini Workstation, a compact desktop that runs on AMD's Ryzen AI Max+ 395 processor, aimed at the same desk-bound professional who wants power without the tower.
Acer’s Swift Air 14 is a peppy MacBook Neo rival with some cool upgrades and a $699 ask
At a time when even mainstream laptops are creeping toward four-figure price tags, Acer’s latest machine feels refreshingly straightforward. It’s aimed at students, remote workers, and anyone who wants a laptop that looks and feels expensive without draining their bank account. The Swift Air 14 is powered by Intel’s new Core Series 3 processors and delivers up to 19 hours of battery life. That’s the sort of endurance that could realistically get many users through a full workday and beyond without scrambling for a charger.
Google Drive can now batch-scan your documents and spare you a few other frustrations, too
Well, Google Drive's new document scanner redesign fixes all three problems at once. Announced by Sameer Samat, the President of Android Ecosystem at Google, the feature is now rolling out for Android users.