From Crutch to Coach?
What a new research study tells us about AI's impact on human learning & skills development
Hey Folks 👋
As we all know well, generative AI has made it dramatically easier to get things done. The big open question for us educators is: does that actually make us more capable, or just faster at outsourcing our thinking?
A new paper, “Coach not crutch: AI assistance can enhance rather than hinder skill development” (Lira et al., 2025), gives one of the clearest answers yet – and it’s both more optimistic and more nuanced than previous research has suggested.

Spoiler: In a series of large-scale controlled experiments involving thousands of adult participants, an AI assistant didn’t just boost immediate task performance — it also improved people’s skills in the longer term, once the AI was taken away.
This research provides a critical counter-narrative to a growing body of literature that warns of the dangers of so-called “cognitive offloading” and the risk of skills decline when working and learning with the assistance of AI (Bastani et al., 2024; Zhai et al., 2024; Kulal, 2025; Gerlich, 2025; Tian, 2025).
The findings of Coach not Crutch don’t so much contradict those warnings as qualify them. They suggest that under the right conditions — specifically, controlled practice, scaffolded instruction, and exposure to well-designed, high-quality examples — AI can operate as a powerful learning tool, strengthening rather than weakening human capability.
Let’s dive in. 🚀
What Researchers Actually Tested
Lira et al. ran five large-scale experiments to test the impact of AI on adult learners’ skills, specifically their writing skills. The experiments focused on one very practical use case: writing cover letters.
Across the studies, the basic structure was this:
Pre-test – Participants were given a deliberately poor cover letter and asked to rewrite it as if they were genuinely applying for the role.
Mini-lesson – Everyone then got a short micro-lesson on five evidence-based principles of effective professional writing (from Writing for Busy Readers): being concise, making reading easy, designing for easy navigation, using enough (but not too much) formatting, and making it easy for the reader to respond.
Practice phase – This is where the conditions varied. Depending on their allocated group, participants then did one of five things:
Practised with an AI coach – They drafted or revised a new cover letter, pasted it into an AI writing tool and received an instant rewrite that applied the five principles. Their own version and the AI version were shown side by side, and they could edit, accept or reject AI suggestions before submitting.
Practised without AI – They rewrote the same kind of cover letter from scratch or by editing the bad draft, using only the earlier lesson as guidance and no external tools.
Got feedback from professional human editors – After writing their letter, they received a rewritten version plus comments from experienced editors, then used that feedback to improve their own draft.
Used Google to search for tips and examples – They were explicitly told to search for cover letter advice and examples on the open web, then come back and revise their letter based on what they’d found.
Simply saw an AI-generated “ideal” version – In the “example-only” condition, instead of practising, they were just shown a high-quality AI rewrite of the letter (not labelled as AI) and asked to study it; they didn’t type or edit anything during practice.
Test phase (no AI) – Crucially, learning was then measured via a fresh writing task in which everyone had to improve a new, badly written cover letter without any AI, search or editor support. This is the core outcome and measure of success: what could participants do alone, as a result of this practice?
Follow-up – In some studies, participants came back about a day later for another no-AI writing task, so the authors could see whether any gains from practice (with or without AI) actually stuck beyond the immediate session.

How did the researchers measure “learning gains”?
In the study, “learning” is defined as improvement in independent writing performance over time — specifically, from pre-test to later no-AI test(s). Instead of relying on a single human marker, the team used two complementary measures:
AI-based scoring. GPT-4o rated each cover letter against the five writing principles and averaged these into a single writing-quality score. These AI scores showed strong alignment with the ratings of human research assistants who scored a subsample of letters on the same rubric.
A human-facing outcome. A separate group of participants, naïve to the study, compared pairs of letters and chose which writer was more likely to get a job interview. Letters preferred in these head-to-head comparisons also tended to have higher GPT-4o quality scores, providing a behavioural, job-relevant validation of the model-based metric.
Across studies, “learning gains” were then quantified as the extent to which the different practice conditions (with AI, without AI, with human editors, with Google, or example-only) led to higher scores on these no-AI tests — immediately and one day later — typically reported as standardised effect sizes.
In other words, the study didn’t just ask “does the AI like its own writing?”, but triangulated between model ratings, human judgments and performance over time to capture genuine skill development.
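For the technically curious, here’s a rough sketch of what this kind of LLM-based rubric scoring could look like. To be clear, this is not the authors’ actual pipeline: the model name, prompt wording and 1–7 scale below are my own illustrative assumptions.

```python
# A minimal sketch of LLM-based rubric scoring, assuming an OpenAI-style chat API.
# NOT the authors' actual pipeline: model name, prompt wording and 1-7 scale are assumptions.
from openai import OpenAI

client = OpenAI()

PRINCIPLES = [
    "Is concise (less is more)",
    "Is easy to read (plain, familiar language)",
    "Is easy to navigate (clear structure and ordering)",
    "Uses enough, but not too much, formatting",
    "Makes it easy for the reader to respond",
]

def score_letter(letter: str) -> float:
    """Rate a cover letter 1-7 on each principle, then average into one quality score."""
    rubric = "\n".join(f"{i + 1}. {p}" for i, p in enumerate(PRINCIPLES))
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        messages=[{
            "role": "user",
            "content": (
                "Rate the following cover letter from 1 (poor) to 7 (excellent) "
                f"on each of these principles:\n{rubric}\n\n"
                f"Letter:\n{letter}\n\n"
                "Reply with five numbers separated by commas and nothing else."
            ),
        }],
    )
    scores = [float(s) for s in response.choices[0].message.content.split(",")]
    return sum(scores) / len(scores)
```

In the study, scores like these were then checked against human raters and against real hiring-style judgments, which is what makes the metric credible.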
Key Findings
1. Practising with AI improved future performance without AI
In the experiment, participants who practised with the AI “writing coach”:
Learned more than those who practised without AI
Learned more than those who didn’t practise at all
And in a later study, learned more than those who received personalised feedback from professional human editors
All of this is despite the fact that they:
Spent less time “learning”
Typed fewer keystrokes
And reported lower subjective effort
This finding challenges two common and widely held assumptions about human learning:
“If humans don’t put in the effort, they don’t learn.”
“If AI does the work, we don’t learn.”
Under some conditions, at least, that’s not true.
2. AI Outperformed both Google and Human Editors as a Practice Partner
In one of the bigger studies, participants first wrote a practice cover letter and then were randomly assigned to one of three “coaches”:
An AI writing assistant
A team of expert human editors (25+ years’ experience on average)
Google search (find your own tips & examples)
They then revised their letter with their assigned resource, and the next day wrote a new cover letter without any help.

Results:
During practice, letters improved most with AI-generated feedback, followed by human editors, then Google.
On the writing quality scale, the AI group’s drafts were much stronger – roughly three-quarters of a standard deviation better than those revised with professional editors (d ≈ 0.76) and about a full standard deviation better than those revised using Google alone (d ≈ 1.0).
On the next-day, “did you actually learn?” test (no AI used), the gaps shrank but didn’t disappear. People who had practised with AI still scored around half a standard deviation higher than the Google group (d ≈ 0.46) and about a fifth of a standard deviation higher than the editor group (d ≈ 0.20). In other words: the AI edge wasn’t just a one-off boost – some of it stuck.
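A quick aside for anyone who doesn’t read effect sizes every day: the d values above are standardised mean differences (Cohen’s d), i.e. the gap between two groups’ average scores expressed in units of the pooled standard deviation. As a rough sketch:

```latex
% Cohen's d: the difference between two group means, in pooled-standard-deviation units
d = \frac{\bar{x}_{\text{group 1}} - \bar{x}_{\text{group 2}}}{s_{\text{pooled}}},
\qquad
s_{\text{pooled}} = \sqrt{\frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}}
```

So d ≈ 0.46 simply means the AI-practice group’s average on the next-day test sat just under half a standard deviation above the Google group’s.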
3. Simply Seeing an AI-Generated Example Often Matched Active AI practice
Perhaps the most surprising result comes from the final study. Alongside the “Practise with AI” group and “Practise without AI” group, the researchers added a third condition where participants didn’t write anything at all during practice – they just looked at a high-quality, AI-generated rewrite of the cover letter.
On the “did you learn anything?” test the next day, the results were striking:
The “example-only” group did just as well as the “practise with AI” group
Both AI-exposed groups significantly outperformed those who practised without AI
The example-only group did all this with almost no effort: no extra typing, less time, and lower reported effort
If you work in learning design, this should make your ears prick up. What seems to be doing a lot of the heavy lifting towards learning here is one simple thing: high-quality, personalised, worked examples at the point of need.
This is beautifully consistent with decades of research on the worked example effect in maths and problem-solving, where seeing an expert solution and then imitating/adapting it often produces better learning than struggling in the dark (Gog et al., 2011; Booth et al., 2015; Barbieri et al., 2024).
This doesn’t mean that practice never matters, but it does suggest that, in some domains, upgrading examples alone can achieve large learning gains.
Implications for Learning in the Flow of Work
Many of us have been talking about a future where AI delivers learning in the flow of work: not as a separate course, but as “always on” micro interventions embedded in the tasks people are already doing.
The Coach not Crutch findings give us a very concrete blueprint for what designing AI as coach not crutch could look like in practice.
In practice, this kind of AI “coach” seems to require at least three things:
Examples aligned to explicit performance principles
A reason for learners to care about later AI-free performance
Light-touch prompts that keep reflection alive without killing flow
Imagine you’re:
A sales manager writing a tricky email to a key client
A people leader drafting a performance review
A policy analyst summarising a complex regulatory update for colleagues
In most organisations today, an AI tool is positioned as a productivity assistant:
“Write this email for me.”
“Draft this performance review.”
“Summarise this regulation.”
A pedagogy-first, flow-of-work version would look more like this (there’s a rough code sketch of the whole flow after the list):
You write a first draft yourself: The system nudges you to get something down, however rough.
AI generates a high-quality rewrite of your draft, using explicit principles:
For writing, these might be the same five used in the study: brevity, clarity, navigation, formatting, ease of response.
For other tasks, they might be domain-specific performance standards (e.g. good diagnosis notes, strong risk assessments).
You see your draft and the AI version side by side: This is the key “worked example” moment:
Where did the AI cut fluff?
How did it structure the argument?
How did it make next steps clearer for the recipient?
Optional: guided reflection prompts: Tiny questions, asked frequently, can protect against the metacognitive laziness we saw in the earlier studies, for example:
“What’s one change you want to copy into your version, and why?”
“How did the AI make it easier for the reader to respond?”
“Which principle do you see in action here (e.g. ‘less is more’)?”
You choose which edits to adopt: You stay in control; the AI acts as a coach, not an autopilot.
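To make the pattern concrete, here’s a minimal sketch of that loop in code. It’s purely illustrative: the model name, prompts and console-style interaction are my own assumptions, not a description of the study’s tool or any existing product.

```python
# A rough sketch of a "coach, not crutch" flow for writing help in the flow of work.
# Illustrative only: model name, prompts and console interaction are assumptions.
from openai import OpenAI

client = OpenAI()

PRINCIPLES = "brevity, clarity, easy navigation, restrained formatting, ease of response"

def coaching_loop(draft: str) -> str:
    # 1. The learner writes the first draft themselves (passed in as `draft`).

    # 2. The AI produces a high-quality rewrite that applies explicit principles.
    rewrite = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                f"Rewrite this draft so it exemplifies these principles: {PRINCIPLES}. "
                f"Keep the writer's meaning and voice.\n\n{draft}"
            ),
        }],
    ).choices[0].message.content

    # 3. Draft and rewrite are shown side by side: the "worked example" moment.
    print("YOUR DRAFT:\n" + draft + "\n\nAI REWRITE:\n" + rewrite)

    # 4. A light-touch reflection prompt keeps thinking in the loop without killing flow.
    input("What's one change you want to copy into your version, and why? > ")

    # 5. The learner stays in control of which edits to adopt.
    return input("Paste your final version (adopt whichever edits you choose): > ")
```

The point isn’t the code itself; it’s that every step maps onto a design decision (explicit principles, visible comparison, light reflection, learner control) rather than onto “just generate the output”.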
Over time, as you repeat this pattern across dozens of real-world tasks, you’re not just “getting help” – you’re quietly building a mental model of what good looks like in your role: emails, reports, slide decks, proposals, reviews.
This is learning in the flow of work as it has always been imagined but never been substantively achieved:
No “stop and learn”
No login to a separate platform
Just learning delivered via a kind of “apprenticeship”: better versions of the work you were doing anyway, with tiny, high-quality examples baked in.
The Coach not Crutch research is just a starting point, but it indicates that if we design “in flow” experiences carefully, we can reasonably expect real skill gains, not just efficiency bumps followed by skills erosion.
So, Does AI Harm or Enhance Human Learning?
At first glance, this research looks like a direct contradiction of the growing body of literature that warns of the dangers of so-called “cognitive offloading” and the risk of skills decline when working and learning with the assistance of AI (Bastani et al., 2024; Zhai et al., 2024; Kulal, 2025; Gerlich, 2025; Tian, 2025).
Previous studies (especially in maths, programming and self-regulated learning, or SRL) showed that generic AI tended to undermine deep learning, particularly when learners used it to short-circuit thinking.
This paper suggests that AI can boost learning and skills development, even when it appears to reduce effort. So, what’s going on?
Different domains, different affordances
In professional writing, the output itself is incredibly rich as a learning object. In a good letter, you can see the structure, formatting, tone, length and clarity.
In maths or programming, the final answer rarely shows the full reasoning process; a correct solution can hide a lot of conceptual shortcuts.
So in writing, a high-quality AI example is closer to a full worked example; in other domains, “just the answer” is much less pedagogically useful. That helps explain why earlier maths and programming studies saw reduced independent problem-solving when novices leaned on LLMs.
Different task framings and incentives
In the Coach not Crutch experiments, participants knew they would later be tested without AI and paid based on performance.
That’s very different from real life, where the incentive is almost always “get the thing done as fast as possible.”
In other words, these participants had a strong reason to treat AI as a coach, not a shortcut. That’s not always the case in authentic settings – as the SRL and metacognition research on “metacognitive laziness” makes painfully clear.
Design, not just availability, is what matters
The AI “coach” used in these experiments was explicitly designed around evidence-based writing principles. Participants got a mini-lesson first, so they had a conceptual scaffold to interpret the AI’s examples.
Contrast that with dropping a generic ChatGPT window into an LMS and hoping for the best. The earlier research shows what happens in that world: lots of cognitive offloading, little genuine learning, and reduced engagement in SRL processes like reflection and self-evaluation.
So, overall: this paper doesn’t “disprove” concerns about AI harming learning. What it does is sharpen and add nuance to our understanding:
AI harms learning when it replaces thinking.
AI can enhance learning when it exposes learners to clear, high-quality demonstrations that they have reason to study and imitate.
That’s a crucial distinction for anyone designing and delivering learning with AI.
What this Means for AI-powered Learning Design
If we synthesise this new paper with the earlier, more pessimistic studies, a few design principles start to emerge:
1. Move from “answer engines” to “example engines”
Instead of designing AI tools whose primary job is to produce outputs, design them to:
Generate clear, high-quality examples that embody evidence-based principles
Show those examples alongside the learner’s own work
Make the gap between “current” and “desired” performance visible and legible
In other words: treat AI as a worked-example factory, not a magic typewriter. This directly mirrors what drove the gains in the Coach not Crutch experiments.
2. Couple AI examples with at least minimal thinking
The Coach not Crutch paper shows that even passive exposure to examples can help. But the broader SRL and metacognition research still strongly suggests that reflection amplifies learning.
So in a pedagogy-first design, I’d be inclined to:
Keep the high-quality example
Add lightweight reflection prompts that don’t break the flow of work but still nudge the learner to:
Notice one change they want to adopt
Name the principle they see in action
Predict the impact on the reader
Tiny questions, asked frequently, can counter the metacognitive laziness, cognitive offloading and inflated confidence we saw in the earlier studies.
3. Make “test without AI” part of the journey
One clever aspect of the Lira et al. design is the no-AI test phase. People knew they would have to perform independently later, which likely:
Increased their motivation to actually learn from the AI
Reduced the temptation to treat the tool as a pure shortcut
In workplace settings, we can approximate this by:
Building in AI-off moments in training or certification tasks
Designing assessment rubrics that explicitly reward independent performance, not just polished outputs
Being transparent with learners: “This tool will help you perform now and build skill for later – and we will occasionally ask you to show what you can do without it.”
Conclusion
This paper is an exciting proof of what’s possible when AI is used as a coach rather than a crutch – but we need to be careful not to over-claim. All five experiments sit in one domain (professional writing), over a short time horizon (one day), with participants who were paid for performance and explicitly told they’d be tested without AI.
We don’t yet know if these effects hold in maths, programming, clinical decisions, or complex problem-solving under time pressure, nor what happens over months or years in real workplaces that reward speed over learning. And the promise of democratised coaching will only be realised if tools are genuinely accessible, interfaces inclusive, and organisations invest in AI literacy and psychological safety.
Taken together with earlier studies, the emerging picture looks something like this:
generic, answer-first AI tends to boost productivity at the cost of cognitive engagement;
pedagogy-first, example-centred AI can deliver both productivity and stronger learning – especially when we make “what good looks like” visible, embed tiny moments of reflection, and integrate support into real work rather than bolting it on afterwards.
So the core question hasn’t really changed: are we building tools that are smarter, or tools that help us create smarter, more capable humans? The Coach not Crutch paper gives us good reason to believe the latter is possible – but it won’t happen by accident.
Happy experimenting,
Phil 👋
PS: If you want to go deeper into how we actually design these kinds of AI-powered, pedagogy-aligned, flow-of-work learning experiences, that’s exactly what we explore – hands-on – in my AI & Learning Design Bootcamp.


