Which AI Video Tool Is Most Powerful for L&D Teams?
Evaluating four popular AI video generation platforms through a learning-science lens
Hey Folks,
Happy new year! One of the biggest L&D stories of 2025 was the rise of AI video generation tools among L&D teams. As we head into 2026, platforms like Colossyan, Synthesia, HeyGen, and NotebookLM’s video creation feature are firmly embedded in many L&D tech stacks. These tools promise rapid production and multi-language output at significantly reduced cost, and they deliver on much of that promise.
But something has been playing on my mind: we rarely evaluate these tools on what matters most for learning design, namely whether they help us build instructional content that actually enables learning.
So, I spent some time over the holiday digging into this question: do the AI video tools we use most in L&D create content that supports substantive learning?
To answer it, I took two decades of learning science research and translated it into a scoring rubric. Then I scored the four AI video generation platforms most popular among L&D professionals against that rubric.

What I found reveals some major differences in how well these tools support learning—differences that matter far more than avatar realism or rendering speed.
Let’s dive in!
Using Video for Learning 101
Let’s start by clearing up a myth. Ask learning professionals how to use video content, and many will say: “People can’t pay attention for more than six to eight minutes.” But that claim didn’t come from controlled learning research; it came from a Microsoft Canada (2015) marketing report about web browsing behaviour (specifically, how quickly people click away from pages).
The real insight is this: the brain doesn’t have a “natural six-minute shut-off switch.” As cognitive psychologist Gemma Briggs from The Open University put it, “Attention depends on the task and context, not some biological timer” (Briggs, 2017).
What does exist is a behavioural pattern: drop-off. When researchers analysed 6.9 million MOOC video sessions, they found steady attrition: roughly 100% watching in the first three minutes, about 50% by six to nine minutes, and around 20% by nine to twelve minutes (Guo et al., 2014).
Similar patterns show up in medical education (Kirn, 2019) and engineering courses (Manasrah et al., 2021). Seidel’s experiment also showed that segmented videos can outperform continuous long-form ones on both memory and transfer, even when total duration is higher (Seidel, 2024).
But—and this is crucial—people stopping the video doesn’t necessarily mean they stopped learning. Navarrete et al.’s review of 257 studies found that well-designed interactive videos of 10–15 minutes produced learning equal to or better than shorter videos (Navarrete et al., 2025). Two strands of research help explain why:
Szpunar et al. showed that quizzes interpolated between video segments reduce mind-wandering and boost final test scores (Szpunar et al., 2013). Geri and colleagues found that adding interactivity can expand learners’ effective engagement beyond the so-called “six-minute limit” (Geri et al., 2017).
The takeaway: video length is a crude proxy for impact. What matters is whether the design reduces unnecessary cognitive load and supports productive processing (Sweller, 1988; Mayer, 2022).
This reframes how we build and evaluate video platforms, shifting us from questions like “Does this tool help me make short videos?” to more substantive questions like, “Does this tool help me create videos that actually produce learning?”
Evidence-Based Principles for Video-Based Learning
The next big question, of course, is: how do we design videos that actually produce learning? Looking at the research published over the last couple of decades, six principles consistently emerge:
1. Intentional Segmentation
Breaking content into meaningful chunks improves learning. Segmented videos beat continuous videos on memory and transfer (Seidel, 2024). Well-structured interactive videos of 10–15 minutes can perform as well as or better than shorter videos (Navarrete et al., 2025).
Why it works: Segmentation prevents working memory overload and creates pause points for processing (Mayer, 2022). When learners view continuous, information-dense video, they must simultaneously hold information in working memory while new material arrives—leading to cognitive overload that inhibits learning (Thompson et al., 2021).
Research using lightning formation videos found that participants who viewed presentations in short segments performed significantly better on transfer tests than those viewing continuously (Mayer & Chandler, 2001). A study on vocabulary learning found that high segmentation (self-controlled navigation across 10 slides) reduced cognitive load and improved retention compared with continuous presentation (Liu, 2024).
2. Embedded Retrieval Practice
Quizzes inserted between segments reduce mind-wandering and boost performance (Szpunar et al., 2013). Retrieval within video content strengthens memory and interrupts passive viewing (Mayer, 2022).
Why it works: The testing effect (also called retrieval practice) shows that material recalled during learning is retained far better than material studied for an equivalent time (Roediger & Karpicke, 2006). In the landmark study, students who studied material once and then took three recall tests remembered 50% more one week later than students who studied it four times with no testing.
Retrieval produces both direct effects (the act of retrieving itself strengthens memory) and indirect effects (retrieval outcomes guide learners to adjust study strategies) (Yong et al., 2016). This effect extends specifically to video-based learning: research using Coursera lectures found that repeated testing produced better long-term retention than repeated studying, even when immediate recall favoured studying.
3. Strategic Signalling
Well-used visual and verbal cues in video content improve recall, but over-signalling can backfire (She et al., 2024).
Why it works: The signalling principle states that learning improves when cues highlight essential information—but only when used selectively (Mayer, 2022). Signalling guides attention to what matters, helping learners distinguish signal from noise. However, excessive or multi-coloured highlighting increases cognitive load without improving learning—the brain must process both the cues and the content, splitting attention (She et al., 2024).
Effective signalling physically and temporally integrates text with visuals, using 3–4 selective cues per segment rather than cluttering screens with competing emphases (Mayer, 2005).
4. Instructional Presence
Social cues trigger cognitive and emotional processes that support learning. Seeing a presenter (human or hyper-realistic AI) increases motivation, trust, and transfer (Garcia & Yousef, 2023).
Why it works: Social agency theory proposes that visible instructors providing social cues (gestures, facial expressions, eye contact) create a sense of social presence and interpersonal interaction, even when the instructor isn’t physically present (Fiorella & Mayer, 2021). This perceived social presence motivates learners to increase cognitive effort to understand content correctly, resulting in deeper processing and better learning outcomes.
A meta-analysis found that instructor presence increased perceived social presence (effect size g = 0.35) and affective/motivational ratings, though effects on learning outcomes were moderated by video pacing (learner-controlled vs. system-paced) and the quality of social cues provided (Beege et al., 2023).
A 2022 systematic review confirmed that learners express positive feelings toward instructor-present videos and that social cues (not just presence alone) foster learning, though the instructor must be positioned strategically near key content rather than distractingly overlaying it (Polat et al., 2022).
Parasocial interaction, the affective and cognitive connection learners form with on-screen instructors, can positively impact learning when instructors display humanlike gestures coordinated with instructional content (Polat et al., 2022).
5. Learner Control
Autonomy, the internal sense that one has agency and volition, is a core psychological need in Self-Determination Theory that, when satisfied, increases motivation, engagement, and positive learning outcomes (Gagné et al., 2022). Some industry research suggests that online learners retain 25–60% more information than in traditional classrooms when they can learn at their own pace and revisit challenging content without time pressure.
The segmenting principle specifically states that people learn better when multimedia is presented in learner-paced segments rather than continuous units, allowing learners to manage cognitive load by pausing when needed (Mayer, 2022).
6. Distributed Practice (Spacing)
Revisiting content across multiple sessions dramatically improves long-term retention (Cepeda et al., 2006; Kang, 2016). Forgetting + re-learning builds stronger memory than cramming. Treat video-based learning as a sequence with follow-ups days/weeks later.
Why it works: The spacing effect is one of the most robust findings in learning science. Cepeda et al.’s (2006) meta-analysis of 317 experiments (839 assessments) found that distributed practice consistently beats massed practice, with effect sizes that increase as retention intervals lengthen (Cepeda et al., 2006).
Spacing allows some forgetting between sessions, and the effortful retrieval required during re-learning strengthens memory traces more than continuous exposure. Each retrieval modifies and enhances the ability to reconstruct that knowledge in the future (Roediger & Karpicke, 2006). In practice, this means a single video session—no matter how well-designed—will produce far weaker long-term retention than a sequence of shorter sessions spaced days or weeks apart.
Together, these six principles reduce cognitive overload, strengthen memory, activate motivation, respect autonomy, and build durable retention. Video length doesn't predict learning—instructional design does.
Assessing the Impact of Video-Generation Tools on Learning
An easy way to assess how well any video generation platform supports learning is to turn the six evidence-based principles into a scoring rubric that maps back to the research, plus two practical criteria covering measurement and iteration. So, I did just that. Here’s the TL;DR:
Segmentation Support: How well does the platform make it easy to chunk content into scenes or chapters?
Retrieval Practice: Can you embed quizzes directly in the video, at any point?
Signalling & Guidance: Are there tools for text highlighting and emphasis that don’t overwhelm?
Learner Control: Can learners navigate by chapter, adjust playback speed, and replay sections?
Spacing Workflow: Does the platform help you create and schedule follow-up content for spaced practice?
Instructor Presence: Do avatars/presenters show realistic facial expressions and gestures?
Measurement Quality: Does the platform measure learning outcomes (quiz scores) or only engagement (completion)? Can you track results through SCORM or xAPI?
Iteration Stability: Can you edit videos without starting over while keeping instructional structure intact?
You can access my full scoring rubric and try it for yourself here.
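To make the rubric concrete, here’s a minimal sketch of how it could be expressed as a simple scoring structure in Python. The criterion names mirror the list above; the equal weighting and the example ratings are illustrative assumptions, not values from the published rubric.

```python
# Minimal sketch: the rubric as a data structure with 1-5 ratings.
# Equal weighting and the example scores are illustrative assumptions.

RUBRIC_CRITERIA = [
    "Segmentation Support",
    "Retrieval Practice",
    "Signalling & Guidance",
    "Learner Control",
    "Spacing Workflow",
    "Instructor Presence",
    "Measurement Quality",
    "Iteration Stability",
]

def score_platform(ratings: dict) -> float:
    """Average a platform's 1-5 ratings across all eight criteria."""
    missing = [c for c in RUBRIC_CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"Missing ratings for: {missing}")
    return sum(ratings[c] for c in RUBRIC_CRITERIA) / len(RUBRIC_CRITERIA)

# Hypothetical example: a platform strong on retrieval, average elsewhere.
example_ratings = {c: 3 for c in RUBRIC_CRITERIA}
example_ratings["Retrieval Practice"] = 5
print(f"Overall score: {score_platform(example_ratings):.2f} / 5")
```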
Findings From My Initial Tests
To assess the status quo, I asked: right now, how well does each tool support the instructional mechanisms research says matter most?
Here’s a summary of how each platform scored against my rubric:

Overall, I took three key takeaways from the tests:
🥡 An Emerging Divide
The most striking split for me is the fact that only Colossyan and Synthesia score ≥4 (Strong) on embedded retrieval practice—what research identifies as one of the strongest predictors of retention (Szpunar et al., 2013). Both platforms offer built-in quizzes with manual designer placement, SCORM/xAPI export, pass-rate tracking, and immediate feedback.
HeyGen and NotebookLM score very poorly: they have no native quiz capability, no way to embed retrieval prompts, and no score tracking. They can scale content production efficiently, but they don’t easily support the mechanism that turns watching into durable learning.
This matters because L&D teams often compare tools based on avatar quality and production speed—criteria that don’t predict learning outcomes. Colossyan and Synthesia also support learning-relevant measurement: capturing quiz performance data, pass rates, and learner-level results—the kind of feedback loop L&D teams rely on to improve instruction (Szpunar et al., 2013). If you can’t measure learning, you can’t iterate with confidence.
🥡 Mind the Spacing Gap
One critical gap that’s consistent across all four tools is that none automates or optimises distributed practice workflows. All four score between 1 and 2 out of 5 on Spacing Workflow.
Despite decades of evidence that spacing is essential for durable learning, it’s not built into the default workflow of any platform I reviewed. Until spacing automation arrives, L&D teams must manually schedule follow-up quizzes and application tasks 2–3 days and 1–2 weeks after initial video viewing—outside the video platform entirely.
🥡 Scaffolding vs. Control
Among the two tools that best support learning (Colossyan and Synthesia), features and approaches diverge—and these differences matter for when and how we use each tool.
Colossyan: learning-first architecture & scaffolding
Colossyan stands out for embedding pedagogical frameworks directly into authoring—problem-centered tasks, Gagné’s Nine Events, Bloom’s taxonomy, cognitive load management, and Universal Design for Learning. Its branching scenarios, conversational interactions, quiz analytics, and scene drop-off tracking support a “measure-iterate-improve” loop aligned with how L&D teams actually work.
Synthesia: learner experience & precision control
Synthesia stands out for learner-facing control and polished viewing experience. It’s the only platform scoring 5/5 on both Segmentation Support (full chapter timeline visible in player) and Learner Control (speed adjustment 0.75–1.5×, keyboard shortcuts, jump-to-chapter navigation)—fully matching the research standard (Seidel, 2024; Navarrete et al., 2025).
Tips for Platform Builders
If those who build AI video platforms want to optimise for substantive learning—not just production costs and timeframes—they need to prioritise five things:
#1: Embed & automate spacing workflows
Not one of the most popular AI video tools used by L&D teams automates distributed practice, despite strong evidence it improves retention (Cepeda et al., 2006; Kang, 2016).
Action for product builders:
Auto-generate short “booster” videos 3 days and 1 week after viewing, resurfacing key concepts with new examples.
Trigger quiz reminders or reflection prompts at research-backed intervals (48 hours, 7 days, 30 days).
Integrate with LMS calendars so spacing becomes part of the authoring flow.
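For illustration, here’s a minimal sketch of what interval-based scheduling could look like, assuming the research-backed intervals named above (48 hours, 7 days, 30 days). The function and variable names are hypothetical, not any platform’s actual API.

```python
# Sketch: compute booster dates at spaced intervals after first viewing.
# Intervals follow the 48-hour / 7-day / 30-day pattern named above;
# everything else here is an illustrative assumption.

from datetime import date, timedelta

SPACING_INTERVALS = [
    timedelta(days=2),   # 48 hours
    timedelta(days=7),
    timedelta(days=30),
]

def booster_schedule(first_viewing: date) -> list:
    """Return dates on which booster videos or quiz reminders should fire."""
    return [first_viewing + interval for interval in SPACING_INTERVALS]

for due_date in booster_schedule(date(2026, 1, 5)):
    print(f"Trigger booster on {due_date.isoformat()}")
```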
#2: Add safeguards against over-signalling and cognitive overload
Platforms provide overlays, animated text, and highlights, but none intentionally prevents overuse. Too many cues increase cognitive load without improving learning (She et al., 2024).
Action for product builders:
Implement “cue budgets” (e.g., “You’ve added 8 highlights in this 2-minute scene—research suggests 3–4 is optimal”).
Default to single-colour cues (and require an explicit choice to add multi-colour emphasis).
Offer real-time cognitive load estimates based on screen density, text length, and narration speed.
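Here’s a minimal sketch of what a “cue budget” check could look like. The 3–4-cues-per-segment ceiling comes from the research cited above; the data shape and function name are illustrative assumptions.

```python
# Sketch: warn a designer when signalling density exceeds the research-
# backed budget of ~4 cues per 2-minute segment. Data shape is assumed.

def check_cue_budget(cue_timestamps_sec, scene_length_sec, max_per_window=4):
    """Return a warning string if cue density exceeds the budget, else None."""
    windows = max(scene_length_sec / 120, 1)  # number of 2-minute windows
    cues_per_window = len(cue_timestamps_sec) / windows
    if cues_per_window > max_per_window:
        return (f"You've added {len(cue_timestamps_sec)} cues in a "
                f"{scene_length_sec:.0f}-second scene; research suggests "
                f"3-4 per 2-minute segment is optimal.")
    return None

warning = check_cue_budget([5, 12, 20, 31, 44, 58, 70, 85], scene_length_sec=120)
if warning:
    print(warning)
```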
#3: Make retrieval practice more granular and more generative
The research standard is interpolated prompts every 3–5 minutes, ideally generative (explain/apply) rather than recognition-only (Szpunar et al., 2013; Geri et al., 2017).
Action for product builders:
Add AI-generated retrieval prompts at concept boundaries (e.g., “Before we move on, explain why X matters in your role”).
Support open-text responses (reviewed by instructors or scored via AI rubrics).
Auto-place quiz markers every 3–5 minutes by default, with designer override.
Track item-level analytics so teams can fix the exact points where learners struggle.
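As a sketch of the default-placement idea: markers land roughly every four minutes, snapping to a nearby concept boundary when one exists. The data shapes and the 60-second snap window are illustrative assumptions, not any platform’s actual behaviour.

```python
# Sketch: auto-place quiz markers every ~4 minutes, snapped to concept
# boundaries where possible. Snap window and names are assumptions.

def place_quiz_markers(duration_sec, concept_boundaries_sec, gap_sec=240):
    """Return marker timestamps (seconds), preferring concept boundaries."""
    markers = []
    next_target = gap_sec
    while next_target < duration_sec - 60:  # skip markers in the final minute
        nearby = [b for b in concept_boundaries_sec if abs(b - next_target) <= 60]
        if nearby:
            markers.append(min(nearby, key=lambda b: abs(b - next_target)))
        else:
            markers.append(next_target)
        next_target = markers[-1] + gap_sec
    return markers

# 15-minute video with concept boundaries at ~4, ~8, and ~11 minutes:
print(place_quiz_markers(900, concept_boundaries_sec=[230, 470, 650]))
# -> [230, 470, 650]
```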
#4: Build instructor-presence features based on social research, not just avatar realism
Research on learning emphasises the importance of positioning, eye contact, and gestures coordinated with visuals (Garcia & Yousef, 2023), not just “realistic faces and movements.”
Action for product builders:
Auto-position avatars next to key content (not on top of it).
Add gesture triggers (point, circle, emphasise) synced to on-screen elements.
Support “talking head + annotation” modes where the instructor marks up slides while visible.
#5: Turn measurement into learning intelligence (not just reporting)
Completion data is great, but to enable data-informed iteration, dashboards should surface instructional diagnoses, e.g. “65% missed Question 3: consider adding a worked example before this concept.”
Action for product builders: build analytics that translate learner performance into specific revision recommendations.
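As a sketch, assuming the platform already captures item-level miss rates, turning them into revision prompts can be as simple as a thresholded report. The 50% threshold and the message template are illustrative assumptions.

```python
# Sketch: translate item-level quiz data into revision recommendations,
# echoing the "65% missed Question 3" example above. Threshold assumed.

def revision_recommendations(miss_rates, fail_threshold=0.5):
    """miss_rates maps question labels to the share of learners who missed them."""
    recs = []
    for question, rate in sorted(miss_rates.items(), key=lambda kv: -kv[1]):
        if rate >= fail_threshold:
            recs.append(f"{rate:.0%} missed {question}: consider adding a "
                        f"worked example before the concept it tests.")
    return recs

for rec in revision_recommendations({"Question 1": 0.10, "Question 3": 0.65}):
    print(rec)
```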
How to Get the Most from Any AI Video Generator
In the meantime, what can you, as an L&D pro, do to mitigate the risks and optimise the benefits of AI video generation platforms? Here are my two top tips:
Step 1: Evaluate your tool using my rubric
Score your platform using my rubric. You’ll instantly see what the platform supports—and what your design process must compensate for.
Step 2: Intentionally fill AI’s pedagogical gaps
No segmentation? Break scripts into logical 5–10 minute concept chunks before uploading. Add chapter titles or timestamps in descriptions.
No retrieval practice? Pause every 3–5 minutes and embed quiz slides or reflection prompts in your LMS flow between segments.
Weak instructor presence (e.g. voice-only)? Add a brief human intro/outro to create social connection at key moments.
No spacing support? Manually schedule follow-up quizzes and application tasks 2–3 days and 1–2 weeks later (LMS or email).
No impact measurement? Add pre/post checks and delayed retention tests (1 week, 1 month) to assess whether learning stuck.
Weak learner control? Provide transcripts, chapter breaks, and downloadable resources so learners can navigate and review.
No signalling safeguards? Audit for visual clutter; aim for ~3–4 cues per 2-minute segment, using single colours.
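To tie Steps 1 and 2 together, here’s a minimal sketch that maps low rubric scores to the compensations above. The criterion names match the rubric; the score threshold and the phrasing are illustrative assumptions.

```python
# Sketch: map low rubric scores to the design compensations listed above.
# Threshold and wording are illustrative assumptions.

MITIGATIONS = {
    "Segmentation Support": "Chunk scripts into 5-10 minute concepts; add chapter titles.",
    "Retrieval Practice": "Embed quiz slides or reflection prompts every 3-5 minutes in your LMS flow.",
    "Signalling & Guidance": "Audit for clutter; aim for ~3-4 single-colour cues per 2-minute segment.",
    "Learner Control": "Provide transcripts, chapter breaks, and downloadable resources.",
    "Spacing Workflow": "Schedule follow-ups 2-3 days and 1-2 weeks later via LMS or email.",
    "Instructor Presence": "Add a brief human intro/outro at key moments.",
    "Measurement Quality": "Add pre/post checks plus 1-week and 1-month retention tests.",
}

def compensation_plan(scores, threshold=3):
    """List design compensations for every criterion scoring below threshold."""
    return [f"{criterion} ({score}/5): {MITIGATIONS[criterion]}"
            for criterion, score in scores.items()
            if score < threshold and criterion in MITIGATIONS]

for action in compensation_plan({"Spacing Workflow": 1,
                                 "Retrieval Practice": 2,
                                 "Learner Control": 5}):
    print(action)
```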
Conclusion: The Current & Future State of Video Generation in L&D
The current generation of AI video tools has made huge strides in production speed and avatar realism. But only two platforms—Colossyan and Synthesia—support evidence-based instructional design in ways that matter most: embedded retrieval practice, learner control, and learning-relevant measurement (Szpunar et al., 2013; Seidel, 2024; Navarrete et al., 2025).
Even then, critical gaps remain: none automate spacing, none actively prevent cognitive overload from over-signalling, and none make frequent generative retrieval practice the default.
The research is clear: segmentation reduces wasted mental effort (Seidel, 2024). Retrieval practice strengthens memory (Szpunar et al., 2013). Spacing creates durable retention (Cepeda et al., 2006; Kang, 2016). Measurement enables improvement. These aren’t “nice to haves”—they’re the mechanisms that determine whether learning sticks.
For AI video generation to truly transform L&D—not just accelerate it—vendors need to build instructional design workflows that make bad instructional choices hard and good ones easy.
The future of video-based learning isn’t faster production or more convincing avatars—it’s tools that operationalise learning science by default, so that creating effective instruction becomes the path of least resistance. In this world, AI video platforms don’t just make video—they manage learning sequences, track performance over weeks and diagnose where learning falls down.
In practice, this means the “AI video tool” of the future might look less like a content generation tool and more like a holistic learning platform.
One thing is clear: the AI video generation platforms most likely to win in the L&D space won’t be those that focus only on building the best avatars or the most engaging visuals; they’ll be the platforms that enable and orchestrate effective learning at pace and scale.
Until then, once again, the headline is this: AI needs you. Specifically, it needs your domain expertise to bridge the gap between what tools enable and what substantive learning requires.
Happy experimenting!
Phil 👋
PS: Want to learn more about how to get the most from AI as an instructional designer? Apply for a place on my AI & Learning Design Bootcamp — the world’s most popular AI course for educators.