The Illusion of AI Productivity Gains
Why your AI tools aren't delivering the ROI you were promised — and what to do about it
Hey folks 👋
The early evidence on AI’s impact on human productivity was extraordinary. In 2023, a landmark Harvard Business School field experiment found that consultants using GPT-4 completed 12% more tasks, 25% faster, with over 40% higher quality. In a separate study, customer support agents using AI resolved 15% more issues per hour. In both studies, below-average performers gained the most, improving by 43% in the consulting experiment. On paper, this was a once-in-a-generation productivity story.
Three years on, the headlines tell a different story. Last year, an MIT report concluded that 95% of corporate AI investments have produced zero return. Earlier this year, Goldman Sachs declared AI’s contribution to US GDP growth in 2025 to be “basically zero.”
The most fascinating thing here is that both pictures are true. In some organisations, AI is driving unprecedented increases in productivity and, with them, employee growth and satisfaction. In others, AI is cited as the primary cause of reduced productivity and, with it, growing employee 'brain drain', frustration and attrition.
In this week's post, I explain why this is happening, and how each group — the winners and the strugglers — is using AI differently.
Let’s dive in!
Defining & Measuring Productivity
Before we go further, two quick definitions.
When I say productivity, I mean the volume and quality of work an organisation produces per hour of human effort. Not output volume alone, and not hours saved alone: the two together.
When I say productivity gains, I don’t mean the metric most L&D teams are reporting on right now: time saved. That’s the wrong number, and it’s the reason most AI ROI stories aren’t holding up to scrutiny.
“Time saved” measures only the front end of AI use — how much faster you got to a draft, an answer, a deck. It ignores the back end: how much of that time was lost to rework, verification, and fixing flawed output. Workday’s research shows nearly 40% of time saved by AI is lost to rework. Goldman Sachs has shown that 90% of S&P 500 companies can’t quantify AI’s impact, and MIT found that 95% of enterprise AI deployments have produced no measurable return.
The metrics that actually matter are output quality and employee satisfaction — the two things AI genuinely delivers when it’s used well, and the two things L&D has always been measured on.
That’s what the rest of this post is about: how to use AI in ways that improve both.
The Numbers Don’t Add Up
When we look at data on AI’s impact on productivity in the workplace, the macro picture is grim, but the on-the-ground picture is downright strange.
Workday’s January 2026 study of 3,200 business leaders found that 85% of employees are saving one to seven hours a week with AI — and nearly 40% of those gains are immediately lost to rework. If a worker saves six hours a week, more than two are lost to corrections and rewrites. Only 14% of workers consistently report a net-positive outcome.
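The rework arithmetic is simple enough to sketch. The snippet below is a minimal illustration, assuming rework claws back a fixed share of the gross time saved; the function name `net_hours_saved` is hypothetical, and the 0.4 share is simply Workday’s “nearly 40%” rounded up.

```python
def net_hours_saved(gross_hours_saved: float, rework_share: float) -> float:
    """Hours a worker actually keeps once rework claws back part of the saving.

    Assumption: rework_share is the fraction of the gross saving lost to
    corrections and rewrites (0.4 approximates Workday's "nearly 40%").
    """
    if not 0.0 <= rework_share <= 1.0:
        raise ValueError("rework_share must be between 0 and 1")
    return gross_hours_saved * (1.0 - rework_share)


# Illustrative only: the six-hour example, with 40% lost to rework.
gross = 6.0
net = net_hours_saved(gross, 0.4)
lost = gross - net
print(f"Gross: {gross:.1f} h/week, lost to rework: {lost:.1f} h, net: {net:.1f} h")
```

The point is the shape of the calculation: a “time saved” dashboard reports the gross figure, while the business only banks the net one.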
BCG’s recent survey of 1,488 full-time workers is even sharper: productivity goes up when people use three or fewer AI tools — and falls off a cliff once they hit four or more. Among workers reporting “AI brain fry,” 34% intend to quit (BCG, 2026).
And in March, Stanford and BetterUp researchers named the phenomenon eating the gains: workslop, AI-generated content that looks polished but lacks substance. Some 40% of US workers received it from a colleague in the past month. Early estimates put each incident at around two to three and a half hours of rework, costing somewhere between $8 and $9 million a year in lost productivity for a 10,000-person organisation (Hancock et al., 2026).
So workers feel busier. Outputs look polished. AI tools are everywhere. And yet ROI is missing, productivity gains are evaporating into rework, and the once-in-a-generation story is stalling.
This isn’t a tools problem. It’s a use problem. And until L&D solves it, the productivity story isn’t going to land.
The Confidence Trap
To understand why the use problem persists, we need to look at a finding from March that almost no one is talking about.
In a Harvard Business School working paper, Randazzo et al. analysed GPT-4 logs from over 70 BCG consultants attempting to validate AI outputs. When professionals pushed back — fact-checking, pointing out errors, pressing the AI to reconsider — the AI typically didn’t admit its limitations: it escalated its persuasion. It apologised, then restated its original position with more supporting data, deploying structured reasoning to make its flawed recommendation appear analytically grounded.

I’m calling this the confidence trap. It’s what usually happens when you ask AI to verify itself. You won’t get the truth — you’ll get the same answer, dressed better and defended more convincingly.
A head of L&D at a major financial services firm told me last month that her team had quietly stopped trusting their own AI-generated reports. They’d started building informal verification rituals — sending drafts to each other before sending them anywhere else, adding 30 minutes of rework to every deliverable. They didn’t have a name for what they were doing. They were defending against workslop and confidence traps without knowing it.
Most workers are walking into confidence traps every day. They’re producing workslop without knowing it. And most AI training is teaching them to prompt better, which is exactly the wrong skill. The skill they need is verifying AI outputs without relying on AI to do it.
This is one of the biggest AI opportunities for L&D, hiding in plain sight.
The AI Productivity Map
Here’s a framework that makes sense of all this.
When workers delegate to AI, they free up cognitive capacity. What they do with that freed capacity determines whether the productivity gain is real, evaporates into rework, or actively damages their thinking. There are three zones — and 90% of workers aren’t in the right one.
Zone 1: The Sidelines (50% of the workforce)
Avoids or minimally uses AI. Continues working as before.
👉 Employee impact: capacity-constrained, cognition unchanged, falling behind on AI fluency.
👉 Business impact: slower delivery, missed productivity gains.
This is half the US workforce: Gallup’s Q1 2026 survey shows 50% of US workers either don’t use AI at all or use it so rarely it makes no real difference. They’re not damaged by AI, but they’re not benefiting from it either, and the AI-fluency gap is widening every quarter.

Zone 2: The Trap (40% of the workforce)
Uses AI without verification or improvement cycles. Polished output, hollow substance. This is where most AI users are stuck — and it has two faces.
👉 Employee impact: cognitive debt, mental fatigue, errors, intent to quit (AI Brain Fry).
👉 Business impact: rework, lost time, eroded trust between colleagues (Workslop).
The Trap is the failure mode the research has been circling for the last six months from two different angles.
From the business side, Stanford and BetterUp called it workslop — AI-generated content that looks finished but lacks substance, requiring downstream rework that erases the time savings. From the employee side, BCG called it AI Brain Fry — the cognitive overload that comes from over-monitoring uncritical AI output, producing 14% more mental effort, 12% more fatigue, and 19% more information overload.
Same root cause. Two surface symptoms. Workslop is what your business notices. AI Brain Fry is what your employee feels. Both are what happens when AI use lacks structured verification.
There’s also a deeper cognitive cost. MIT’s recent EEG study found heavy AI users showed lower neural engagement, weaker recall of their own work, and what the researchers called “cognitive debt” (Kosmyna et al., 2025). Baldeo’s APA study (n=1,923) found passive AI users reported the lowest confidence in their own reasoning. Output looks fine. Then someone asks a hard question — and there’s nothing underneath.
Zone 3: The Sweet Spot (10% of the workforce)
This small but growing group uses AI heavily within structured verification and improvement cycles. High engagement, high quality — but not necessarily time saved.
👉 Employee impact: sharper thinking, increased confidence, judgement, and engagement.
👉 Business impact: sustained productivity and output quality gains.
This is the zone the early research was capturing. The Dell’Acqua and customer support gains I opened with (the 40% quality lift, the 25% speed boost, the 43% improvement among lower-skilled workers) didn’t disappear. They just stayed inside the lab. The 10% of workers who actually replicate them in real workplaces share one thing: structured verification and improvement loops. The Cybernetic Teammate study at Procter & Gamble found the same pattern: AI-enabled individuals matched the performance of two-person teams working without AI (n=776, 2025).
BetterUp calls these workers “Pilots” — the high-agency minority who use AI deliberately rather than reactively. They’re roughly 3.6× more productive than their peers. And critically, the cognitive evidence runs the other way too: when AI delegation is paired with verification and judgement, thinking gets sharper, not duller.
A note worth sitting with: AI helps lower-skilled workers more than top performers, and can even hurt experts’ performance if they over-delegate. The metric that matters isn’t “did everyone get faster?” but “did your team reinvest the freed capacity in the right work for them?”
Six Moves to Get to the Sweet Spot
These aren’t tricks. They’re a methodology. Each is supported by published research. Each one targets a specific way the productivity gains leak.
🔥 Check AI’s work in a brand new chat.
Generate in one chat. Open a fresh, empty chat. Paste in the output. Ask: “You’re a sceptical reviewer. Where does this fall short?” A clean chat with no memory is harder on the work than the original ever will be.
Madaan et al.’s Self-Refine research showed that separating generation from critique and refinement improves task performance by about 20% (absolute) on average across diverse tasks, even when GPT-4 is the underlying model. Self-Refine uses the same model as both generator and critic; using a fresh chat as a neutral reviewer is a practical variant of the same principle.
🔥 Verify one specific claim — not with AI.
Before any AI output leaves your hands, pick one claim. A statistic, a name, a date. Open Google. Check it. Five minutes. This is the minimum viable defence against the confidence trap.
🔥 Decide your mode before you start.
Are you a centaur (clean division — AI does X, you do Y) or a cyborg (constant alternation)? Either mode can deliver the gains documented in the BCG study if used deliberately. The mode you don’t choose is the one you fall into — and that’s The Trap.
🔥 Ask AI to think out loud before it answers.
End every prompt with: “Walk me through your reasoning step by step before giving me the answer.” It slows AI down (good). It makes the answer auditable, so you can spot where it went wrong before you forward it. Step-by-step reasoning meaningfully improves AI accuracy on complex, multi-step tasks (Wei et al., 2022, chain-of-thought research).
🔥 Draft your version first — even if it’s bad.
Two minutes of bullet points before you open AI. Three benefits: you catch your own thinking before AI’s framing colonises it; you spot where AI’s version differs from yours (which is where the real choices are); you maintain the cognitive practice that keeps your judgement sharp.
Dell’Acqua’s earlier work on recruiters found that those given high-quality AI made less accurate assessments than those given lower-quality AI, because they leaned on it and stopped exercising their own judgement — what he called “falling asleep at the wheel.”
🔥 Map the workflow before adding the tool.
For every AI tool, ask: what specific task does this replace, and how will I know the output is good enough? If you can’t answer both, cut it. Stack fewer tools and use them properly. MIT’s research on the 95% of failed corporate AI deployments identified a recurring pattern: a learning gap between the tools and enterprise workflows (Challapally, MIT NANDA, 2025).
Here’s the TLDR cheat sheet:
Concluding Thoughts
The 10% of workers in The Sweet Spot aren’t there because they have access to better tools. They’re there because they’re using AI inside structured verification cycles, and someone is measuring the right thing.
The implications for L&D are significant. More intentional AI use in the workplace doesn’t just deliver higher productivity for the business; it delivers improved cognitive function, better-quality work, and higher satisfaction for the employee. These are the wins L&D leaders have been chasing for decades.
If you take two things from this post, take these:
👉 Start making the six AI moves above. Generate in one chat, evaluate in another. Verify one claim externally. Decide centaur or cyborg upfront. Ask AI to reason before answering. Draft your version first. Map the workflow before adding the tool.
👉 Stop measuring time saved. Start measuring quality of output and employee satisfaction. Time saved is the metric that's failed your CEO for years. Output quality and employee satisfaction are the metrics that capture what AI actually delivers when it's used well — and the ones the 10% in The Sweet Spot are quietly winning on.
Happy innovating!
Phil 👋
PS: Ready to move your team from The Trap to The Sweet Spot? Join me on the AI Bootcamp for L&D.



