From Asking to Orchestrating

Aka, the agentic AI workflow in L&D

Jul 02, 2026

Hey folks 👋

If you look beyond L&D — to software, to law and customer service — something is shifting in how people use AI. It’s a bigger change than another model release or a new feature drop: it’s a change in the shape of how we work with AI.

The short version is that we’re moving from asking AI to do things to handing work over to it. Put another way, the way we interact with AI is shifting from conversational co-creation to delegation.

In this week’s post I ask: is this also happening in L&D and, if so, what does it mean for our roles and skills?

Let’s dive in!

The Frustrations & Limitations of Conversational AI

For the last few years, the way most of us have used AI in L&D has been conversational. You ask AI something, it answers, you check what it said, you ask again. Ethan Mollick calls this “co-intelligence” — we steer an AI model through a task, one turn at a time, keeping a hand on the wheel throughout.

Some of us are getting very good at this. As an industry we’ve built thousands upon thousands of prompt libraries and custom GPTs to speed up the work we were already doing — drafting learning objectives, generating quiz questions, summarising an SME interview, turning a transcript into a storyboard, rewriting content for tone.

This version of AI promised it would make us both faster and better, but in practice it’s delivered neither. Most of us don’t feel faster — we feel busier: more content to review, more drafts to wrangle, more tools to keep on top of, and the quiet sense that we’re treading water. Working harder, moving quicker, but somehow not getting any further ahead.

The reason is baked into how conversational AI works. However well we prompt it, we’re still working with the equivalent of a keen junior apprentice — one that produces fast, produces a lot, and needs our oversight, our instruction, and, critically, our QA on every single thing it hands back.

AI didn’t make us faster or better: it handed us a new job — managing an apprentice — bolted on top of the one we already had. And it arrived with a catch: leadership now expects the speed gains the tool promised, whether or not the tool actually delivers them.

From Co-Creation to Delegation

Over the last year — call it early 2025 to now — and partly in response to exactly those frustrations, other industries have moved on: from prompts to agents, from chatting with AI to delegating work to it.

Instead of prompting a model turn by turn, people are handing whole multi-step tasks to AI agents: systems that don’t just answer, but go away, do the work, use tools, and come back with something finished. The shift is big enough to already have its own vocabulary: “agentic engineering.” For the human, the job moves from doing the work to three new things:

→ Deciding what to delegate — judging which tasks to hand over and which to keep; defining where the AI’s capability starts and ends within our specialism.

→ Setting the brief — defining the goal, the constraints, and what “good” looks like clearly enough that an agent can run with it largely unsupervised.

→ Spot-checking the output — checking, correcting and owning what comes back, because accountability is never delegated.

The shift isn’t hypothetical. Here are three real examples from the last twelve months:

→ Software Engineering. Engineers increasingly describe their day as managing a handful of coding agents rather than writing code line by line — assigning a task, letting it run, reviewing what comes back. At Morgan Stanley, engineers pointed one agent at 9 million lines of ageing legacy code and had it do the slow, unglamorous work humans usually dread: reading it and translating it into plain-English specs developers could actually build from. The result: a reported 280,000 developer hours saved, and engineers freed from deciphering old code to do the work only they could do.
→ Law. Contract review — a slow, structured process that once cost days of billable hours — is being handed to agents that draft, red-line and return a marked-up contract in minutes. Salesforce’s in-house legal team reported cutting millions in outside-counsel costs after putting an agent on contract drafting and analysis.
→ Customer Service. Klarna’s agents now handle around two-thirds of customer chats — the high-volume, routine tier — resolving issues in under two minutes where it used to take eleven. The humans have moved up to what the AI can’t do well: the complex, emotional, high-stakes cases.

The pattern underneath all three is the same: the human stops being the one at the keyboard doing the task, and becomes the one who sets the brief, steers the work, and makes the calls.

The State of Play in July 2026

How widespread is the shift from conversational co-creation to delegation in July 2026? The clearest picture we have comes from a large study of OpenAI’s Codex agent, conducted by researchers from OpenAI, Columbia, Wharton & Duke. Here’s the TL;DR:

→ The heaviest AI users don’t chat more — they run teams of agents. More than 10% have three or more running at once in a given week; at the frontier inside OpenAI, one person directs around 71 hours of agent work in a single day. Not one person working 71 hours — one person orchestrating it.

→ Agentic use is climbing fast — weekly active users grew more than fivefold in the first half of 2026.

→ The tasks are getting bigger. The share of users delegating something that would take an experienced human 8+ hours jumped from 2.1% to 25.6% in six months. More than 70% now hand over at least an hour’s work in a single go.

→ It isn’t just coders. The fastest growth is among non-developers — drafting documents, running analysis, producing artefacts. The work we’d recognise.

This change is still early and uneven. The same research is candid that most agentic workflows still lean heavily on human expertise — both to design them and to run them day to day — and plenty of pilots never reach production, because (just like conversational AI) the technology can simply shift the workload around, or even add to it.

But the direction is unmistakable. The frontier has already turned: away from us doing the work faster with an assistant at our elbow, and towards directing agents that do the work for us. It’s early, it’s uneven, and it’s far from solved — but it’s coming.

The big question is what it means for us…

The Current State of Agentic AI in L&D

The question I’ve been exploring recently is: where are L&D teams on the AI continuum that stretches between prompting generic AI models and building autonomous AI “team mates”?

The answer: we’re not as far along the continuum as the hype suggests, but some are further than you might think.

Some of the initial pieces of “Agentic L&D” are already being built and tested, including on my bootcamp. A few real examples from bootcamp participants:

→ A feedback-to-improvement agent that reads post-course surveys, spots the recurring problems, and sends the designer an improvement brief before the next run — with a human signing off the AI’s read before anything changes.

→ A policy-monitoring agent that watches for research or regulatory changes and flags exactly which training needs updating when something changes.

→ A manager-intake agent that runs the diagnostic conversation for a training request — asking the hard questions, pushing back on weak framing in real time, and gathering the evidence before anyone agrees to build.

What's yet to emerge in L&D is the joined-up version: a holistic and strategic review of what the technology can and can’t do in our domain, and the creation of a connected system of agents, each owning a stage of the work and feeding the next — with humans directing the whole cycle rather than running each piece by hand.

What might this Agentic L&D system look like in practice?

The Agentic L&D Team: a Hypothesis

I’ve been working on this question for a while now — across my own practice, my client work, and every cohort of the bootcamp — trying to sketch what a genuinely joined-up agentic L&D system might actually look like.

What follows is my hypothesis: a picture of where I think this is heading, based on what I know about L&D and what’s already happening elsewhere, offered to be pulled apart rather than adopted wholesale.

👩🏽‍🦰 The Human L&D sets the intent & goal. An L&D pro surfaces or receives a problem statement and turns it into a goal: “we need X to do Y so we can Z”.

↓

🤖 The Diagnosis Agent takes the problem statement and rapidly runs problem-definition cycles to pin down the root cause and the 2-5 most promising solutions. This might include the verdict that it’s not a learning problem but, for example, a comms problem.

↓

👩🏽‍🦰 The Human L&D chooses if & what to build based on the data. This is the first real judgement call: what’s the problem, and what’s the optimal solution? Should we build and test a single solution, or multiple? The human decides and approves the spec(s) to build, e.g. a job aid and/or an AI coach.

↓

🤖 The Blueprint Agent builds a v1. Using the spec from the human L&D, which includes the problem statement and any available contextual information (e.g. an existing deck on the topic), the Blueprint Agent builds a prototype and populates as many placeholders as possible.

↓

👩🏽‍🏫 👩🏽‍🦰 The Human L&D + SME review & complete the blueprint. Experts are far better at reacting to something concrete than answering questions into thin air. And this is where the truth-check lives: is what we have accurate & current?

↓

🤖 The QA Agent break-tests the blueprint. Does it map to the problem statement? Does it align with in-house quality standards?

↓

👩🏽‍🦰 L&D makes the release call. Ship it, or send it back. A go / no-go decision which sits squarely with the human.

↓

🤖 👩🏽‍🦰 The Monitor Agent runs after launch — nudges, completions, and the real prize: measuring what actually happened against the intent, then feeding that back into the next cycle. That’s the arrow that closes the loop. The learning system starts learning about itself.
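For the technically curious, here is one way the shape of that loop could be sketched in Python. It is a minimal sketch, not a real implementation: every name in it (`run_cycle`, `Spec`, `CycleResult`, the agent functions) is hypothetical, and the "agents" are stubs standing in for whatever models or tools would actually do the work. The Monitor Agent is left out to keep it short. What it is meant to show is the pattern: agent steps alternate with human gates, and the cycle stops the moment a human says no.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Spec:
    """What the human approves for building (hypothetical structure)."""
    goal: str
    solution: str


@dataclass
class CycleResult:
    shipped: bool
    log: List[str] = field(default_factory=list)


# Stub agents: placeholders for real model or tool calls (assumptions).
def diagnosis_agent(goal: str) -> List[str]:
    # A real agent would analyse the problem; this returns fixed options.
    return ["job aid", "AI coach", "not a learning problem: fix comms"]


def blueprint_agent(spec: Spec) -> str:
    return f"v1 blueprint: {spec.solution} for '{spec.goal}'"


def qa_agent(blueprint: str, spec: Spec) -> bool:
    # Toy check: does the blueprint still refer to the original goal?
    return spec.goal in blueprint


def run_cycle(
    goal: str,
    choose: Callable[[List[str]], Optional[str]],  # human: if & what to build
    review: Callable[[str], str],                  # human + SME: complete the draft
    release: Callable[[str, bool], bool],          # human: go / no-go
) -> CycleResult:
    result = CycleResult(shipped=False)

    options = diagnosis_agent(goal)
    result.log.append(f"diagnosis: {len(options)} options")

    solution = choose(options)
    if solution is None:  # the human decides nothing should be built
        result.log.append("human: nothing to build")
        return result

    spec = Spec(goal=goal, solution=solution)
    blueprint = review(blueprint_agent(spec))
    result.log.append("human + SME: blueprint reviewed")

    passed = qa_agent(blueprint, spec)
    result.log.append(f"qa: {'pass' if passed else 'fail'}")

    result.shipped = release(blueprint, passed)
    result.log.append("human: ship" if result.shipped else "human: send back")
    return result


# Example run, with simple lambdas standing in for the human decisions.
outcome = run_cycle(
    goal="new managers give feedback weekly",
    choose=lambda options: options[0],
    review=lambda draft: draft + " (SME-checked)",
    release=lambda blueprint, qa_passed: qa_passed,
)
print(outcome.shipped, outcome.log)
```

Notice that the three human decisions are passed in as functions rather than hard-coded: the agents can be swapped out or upgraded, but the judgement calls stay with a person.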

In practice, the agentic model has the potential to help solve three of the most wicked problems in L&D:

Our Work Could Get Faster — because the work that eats up our time stops eating up so much time. In this model, agents do the heavy lifting in parallel while we’re doing something more important, like analysing data. Meanwhile, the SME - instead of being cornered for a blank-page briefing - reacts to a real draft in a fraction of the time. We’re not shaving minutes off discrete tasks the way conversational AI did; we’re handing whole stages over and refocusing our efforts on what matters most.

Our Work Could Get Better — because the two things that most affect quality finally get proper attention. Right now diagnosis gets rushed (there’s a course to build) and evaluation gets skipped (there’s the next course to build). The agentic loop protects both: nothing gets built until the Diagnosis Agent and a human have agreed it’s a real learning problem, and nothing is called “done” until the Monitor Agent measures whether it actually worked. Add the SME reviewing a concrete draft rather than answering cold, and QA break-testing before a human ever looks — and the quality bar moves up at exactly the points where we usually let it slip.

We Could Bring Joy & Value Back to L&D Work — because everything the machines take on is everything that was never really the point of our jobs anyway. We didn’t get into this work to format SCORM packages or wrangle version six of a storyboard. We got into it to work out what people actually need, to design something that changes what they do, and to know whether it worked. That’s diagnosis, judgement and impact — the parts of the craft we’re trained for and rarely have room to practise. Strip out the production, and what’s left isn’t a smaller job - it’s the job most of us always wanted.

Closing Thoughts: Agentic AI & the Professionalisation of L&D?

So where does all this leave us? Three closing thoughts from me:

First — we are a long way off. Let’s be honest about it. Almost nobody in L&D is running the full loop today; most of us are doing one or two steps with AI and the rest by hand. The joined-up system I’ve sketched is a hypothesis about the destination, not a description of the road. Even the industries furthest ahead haven’t fully closed their own loops. So if none of this is your reality yet — good, you’re in the overwhelming majority.

But — it’s a clear signal of the direction of travel. This is the thread running through the whole post: the shift from conversation to delegation, from doing the work with AI at our elbow to directing agents that do it for us. It’s already reshaping software, law and customer service, and there’s no good reason to think L&D is exempt. We don’t have to be at the destination to start walking towards it — and the first steps (using AI to diagnose, not just produce; to monitor impact, not just ship) are available today.

And, far from disempowering us, agentic AI has the potential to professionalise us. I hear this from L&D folks a lot: if agents do the building, what’s left for us? But look at what gets handed over versus what stays. What gets automated and delegated is the low-value work that was never really the point: formatting, wrangling, chasing, producing. What stays with the human is the part that was always the actual craft: working out whether there’s a real problem, deciding what’s worth building, judging whether it’s any good, and knowing whether it worked.

And here’s the evidence that this isn’t wishful thinking. Across the studies, the thing that predicts good output from AI isn’t technical skill; it’s depth of domain expertise. In the BCG/Harvard study of 758 consultants, people working on tasks inside AI’s “jagged frontier” finished around 25% faster and produced markedly higher-quality work; on a task outside it, where it took real judgement to tell good from good-enough, those using AI were 19 percentage points less likely to reach a correct answer than people using no AI at all. Used by someone who can’t judge the output, AI doesn’t speed you up. It just produces confident, polished, well-formatted rubbish at scale.

All of which means the more capable these agents get, the more - not less - valuable deep L&D expertise becomes. When anyone can generate a course in an afternoon, the scarce thing isn’t production — it’s the judgement to know which course is worth making, whether it’s any good, and whether it changed anything. That judgement has always been the real craft. For years we’ve been buried in production, pulled away from it. The quiet promise of this shift is that it hands the production to the machines and hands the craft back to us.

Happy orchestrating!
Phil 👋

PS: Want to learn how to build your own team of L&D agents? Apply for a place on my AI Bootcamp for L&D.

Dr Phil's Newsletter
