Your Synthetic Educator Looks Perfect, But Can It Actually Teach?
How to script effective AI tutors with the PERSONA framework—and the ethical questions we can't ignore
Hey folks!
Earlier this week I watched a group of learners work through a training video presented by an AI avatar. The avatar was flawlessly rendered, the voice crystal clear and its movements eerily realistic, but in the post-session debrief we concluded that what the synthetic tutor said was both distracting and forgettable.
In the last 2-3 years, AI avatars have evolved from futuristic novelty to an increasingly mainstream feature of corporate training. With platforms like Synthesia making avatar-based video creation accessible to every L&D team, we're already seeing a dramatic shift in how organisations deliver training at scale with AI avatars (literally) front and centre. This shift represents more than just a technological upgrade—it's fundamentally changing how we think about the relationship between instructor and learner.
The business case is compelling: avatar-based training can save many hours of production time and, at scale, potentially millions of dollars each year by eliminating the need for human presenters, reducing reshoot costs and enabling rapid content updates. Organisations can create consistent training experiences across global teams, work around scheduling constraints with subject matter experts, and scale personalised instruction in ways that were previously impossible.
But what does this transformation mean for the learning process itself? Thankfully, the last 18 months have seen a surge in peer-reviewed research on avatar-based learning, and the findings paint a fascinating picture.
Recent studies show that while AI technology promises revolutionary change, the reality is far more nuanced. In practice, whether AI-avatar trainers succeed or fail depends not on the sophistication of the technology but on how we write the scripts that we paste into it.
This distinction is crucial because it shifts our focus from the dazzling technical capabilities of avatar platforms to the fundamental principles of effective communication and learning design. The most photorealistic avatar in the world becomes a barrier to learning if it delivers content in ways that violate basic principles of human cognition and motivation.
Let's dive into what the research reveals about scripting for synthetic educators—both how to get it right, and the more difficult question of what it means if and when we do…
Part I: What the Research Tells Us About the Risks of Synthetic Tutors
How we script AI video content isn't just important—it's potentially transformational. When we get the scripting wrong, we use sophisticated AI to amplify the very problems that make so much e-learning ineffective:
lifeless delivery that builds zero connection and erodes trust (Lind, 2024)
rule-focused messaging that kills motivation (Ng, Hao & Zhang, 2025)
cognitive overload that ensures nothing sticks (Mayer, Fiorella & Stull, 2020; Slemmons et al., 2018)
Understanding these risks is essential because avatar technology doesn't just replicate traditional training problems—it can actually amplify them. The sophistication of the visual presentation creates expectations for human-like interaction that poorly written scripts completely fail to meet.
So, what does peer-reviewed research suggest are the biggest risks when working with AI-avatar video content?
Risk #1: The Uncanny Valley of Corporate Communication
Lind's (2024) groundbreaking experiment compared onboarding videos narrated by a photo-realistic Synthesia avatar with identical content delivered by a human spokesperson. The results revealed a troubling pattern that challenges our assumptions about when and how to use avatar technology.
While objective knowledge retention remained stable across both groups—suggesting that the information was successfully transmitted—learners who detected that the presenter was AI reported noticeably lower levels of trust, engagement and overall satisfaction. This wasn't simply a matter of preference; it represented a fundamental breakdown in the learning relationship.
However, recent research by Baake, Schmitt & Metag (2025) challenges the universality of the uncanny valley effect. Their study of 491 participants found no uncanny valley effect in science communication contexts, with more realistic avatars actually being perceived as more trustworthy across all dimensions. This suggests the uncanny valley may be context-dependent and perhaps less pronounced in educational settings than previously assumed.

Either way, the deeper issue isn't visual realism alone; it is the mismatch between appearance, language and behaviour that creates what researchers call "authenticity dissonance." When an avatar looks human but sounds like a corporate memo, learners can experience a dissonance that pushes them away from the content, even when that content is objectively sound.
Perhaps most concerning, the Baake study revealed persistent gender bias: male avatars were consistently rated as more competent than female ones, reflecting broader patterns identified by the World Economic Forum's research showing that women hold just 12.2% of STEM-related C-suite jobs and make up less than one-third of AI-skilled professionals.
Risk #2: The Cognitive Overload Trap
Despite decades of clear research guidelines about cognitive load management, many avatar implementations violate basic principles that we know are essential for effective learning. This represents a fundamental misunderstanding of how human memory and attention work.
Industry surveys claim that microlearning can improve retention by up to 80% compared with traditional learning methods. The underlying mechanism is well established: the human brain can only process a limited amount of information at once, and when we exceed those limits, learning effectiveness drops sharply.
Slemmons et al. (2018) found that 2–4 minute segments with natural break points improve focus and reduce cognitive overload. That can be the difference between content that sticks and content that's forgotten almost immediately. Other recent studies likewise report that microlearning groups score significantly higher than control groups across multiple learning domains.
Yet many corporate avatar implementations use longer segments with faster delivery speeds. This approach violates established principles about optimal learning conditions.
Research on speech rate and comprehension broadly supports the 125–150 words per minute (WPM) recommendation; one study found that medium rates of around 154 WPM, just above the top of that window, were most preferred for synchronised reading and listening. When avatars speak much faster than this, learners struggle to process both the visual and auditory information, leading to cognitive overload and reduced retention.
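To make the 125–150 WPM window concrete, here is a minimal Python sketch that computes the delivery rate of a script segment from its word count and its measured duration (for example, the length of the rendered avatar clip) and checks whether it falls inside the window. The function names, the whitespace-based word count and the default thresholds are assumptions for illustration, not features of any avatar platform.

```python
def words_per_minute(script: str, duration_seconds: float) -> float:
    """Return the delivery rate of a script segment in words per minute."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # Naive word count: split on whitespace (an assumption; real tools may count differently).
    word_count = len(script.split())
    return word_count / (duration_seconds / 60)


def in_target_window(wpm: float, low: float = 125, high: float = 150) -> bool:
    """Check whether a rate falls inside the 125-150 WPM window."""
    return low <= wpm <= high


# Example: a 300-word segment rendered as a 2-minute clip.
segment = "word " * 300
rate = words_per_minute(segment, 120)
print(rate, in_target_window(rate))  # 150.0 True
```

A 300-word segment that renders at exactly two minutes sits at the top of the window; if the same script renders in 1 minute 40 seconds, it is running at roughly 180 WPM, well above it.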
The irony is striking: organisations invest in expensive avatar technology to improve learning outcomes, then use scripts that actually make learning less effective than traditional approaches. The technology enables us to control pacing with unprecedented precision, but only if we deliberately design for human cognitive capacity rather than trying to maximise information density.
Risk #3: The Emotional Disconnect
Research on emotional design in multimedia learning shows that emotional elements can induce positive emotions and significantly improve learning outcomes. In one recent study, learners in the emotional design group reported significantly higher positive emotions than the comparison group (M = 3.59 vs. M = 2.75), with a large effect size (Cohen's d = 1.15).
Yet many avatar implementations default to neutral, "professional" delivery that systematically strips emotion from learning experiences. This approach works against decades of cognitive-science insights into memory formation and knowledge transfer.
The problem is particularly acute with avatar technology because visual sophistication creates expectations of human-like interaction. The Community of Inquiry framework (Garrison, Anderson & Archer, 2000) describes social presence as having three components: affective (emotional) expression, open communication and group cohesion. When learners see a realistic human avatar, they are likely to bring the same social expectations they would bring to a human instructor, including an expectation of emotional authenticity.
When authenticity isn't present in the script, the disconnect becomes a barrier to learning rather than a bridge to understanding.
Risk #4: Cultural Assumptions and Accessibility Gaps
While avatar technology holds remarkable promise for inclusive learning, current implementations often embed cultural assumptions that limit effectiveness and can actually increase inequity rather than reducing it.
Research emphasises that cultural adaptation is crucial for optimising learning outcomes and requires more than just translating content—it demands understanding and respecting diverse cultural contexts. AI algorithms can be biased and discriminatory, often focusing on "norm" data that excludes marginalised perspectives.
Most avatar personas reflect dominant cultural communication styles—direct eye contact, specific gesture patterns, individualistic language frameworks—potentially alienating learners from cultures that prioritise different communication norms. As AI systems become more embedded in education, the need for AI development teams to understand and navigate cultural nuances has become paramount.

The World Economic Forum's research warns that AI could entrench existing gender disparities unless organisations actively embed parity into their technological transitions. This creates accessibility challenges that extend beyond traditional considerations like captions and audio descriptions to include cultural responsiveness and bias mitigation.
Part II: How to Get the Most from Synthetic Educators
Understanding the risks is crucial, but the research also reveals where avatar technology can genuinely transform learning for the better. Rather than dismissing avatars wholesale or embracing them uncritically, the evidence points toward strategic implementation—leveraging AI's strengths while maintaining human insight where it genuinely excels.
These opportunities matter because avatar technology, when implemented thoughtfully, can address persistent challenges in traditional e-learning. Research shows that digital humans can be customised in terms of appearance, language, personality, script and gestures, enabling tailored training that conventional online formats cannot offer (Aguilar et al., 2024).
The Incarnate Persona Revolution
One of the most significant findings in recent avatar research comes from Ng, Hao & Zhang's (2025) study, which revealed that avatars designed with an "incarnate" persona—presented as internal, named former employees rather than external professionals—are perceived as dramatically more authentic and trustworthy.
This finding challenges conventional wisdom about avatar design. Many organizations assume that using obviously artificial or generically professional avatars is safer and more appropriate for corporate contexts. The research suggests the opposite: learners respond more positively to avatars that have clear identities, backgrounds, and personal stakes in the content.
Enhanced authenticity, particularly when combined with altruistic messaging and careful attention to voice selection, leads to measurably higher engagement and more positive training outcomes. This isn't just about learner satisfaction—it translates into improved knowledge retention and behavioural change.
The concept of incarnate personas goes beyond simply giving an avatar a name. It involves creating a coherent identity that includes professional background, personal motivation for presenting the content, and authentic connection to the organizational context. When learners understand why this particular avatar is qualified to teach this particular content, trust and engagement increase significantly.
The Conversation Advantage
Mayer's extensive and well-known work on multimedia learning (2021) demonstrates a fundamental principle that transforms how we should approach avatar scriptwriting: conversational language—using "you," "we," and direct address—increases attention and cognitive processing through what researchers call the personalisation principle.
This approach aligns with research emphasising that online environments must intentionally create opportunities for social presence to foster transformative learning. A review of transformative learning in online environments (Jackson & Chakraborty, 2014) concludes that adult learners change mindsets through critical reflection in safe environments that encourage dialogue, build a learning community, promote critical thinking and integrate new technologies.
Mayer, Fiorella & Stull (2020) describe this phenomenon as "deeper cognitive engagement." Learners don't just passively receive information; they actively process it as if participating in meaningful conversation. This shift from consumption to participation represents a fundamental change in the learning dynamic.
The practical implications are significant. Instead of writing scripts that sound like policy documents or presentation slides, we should be writing scripts that sound like explanations from a knowledgeable colleague. This means using personal pronouns, acknowledging shared challenges, and framing content in terms of mutual problem-solving rather than information transfer.
The Optimal Pacing Revolution
Slemmons et al. (2018) found that 2–4 minute micro-segments reduce cognitive overload, and industry statistics report that microlearning modules are completed 60% faster than traditional e-learning courses while boosting learner motivation by 55%.
When paired with the 125–150 words per minute speech-rate window that research validates as optimal for comprehension, avatars can deliver substantial content without overwhelming learners—but only if instructional designers script for human cognitive capacity rather than trying to cram in maximum information.
Vendor data from VirtualSpeech (2024) points in the same direction for practice-based formats: the company reports that learners improve skills up to 4x faster with hands-on practice exercises, and that 95% say practising in VR helps them prepare better for real-world situations.
The technology enables unprecedented control over pacing. Unlike human presenters, who may speed up when nervous or slow down when tired, avatars can maintain consistent, optimal delivery speeds throughout entire programs. We can build in strategic pauses for reflection, ensure that complex concepts are given adequate processing time, and create natural break points that respect human attention spans.
Part III: The PERSONA AI Avatar Scriptwriting Framework
Based on the convergent evidence from recent studies, I’ve put together a first draft of a more systematic approach to avatar scriptwriting that leverages what we know about effective learning while addressing the unique opportunities and challenges of synthetic educators:
P - Purpose-Driven Messaging: Connect individual actions to broader impact
E - Emotional Authenticity: Include appropriate concern, encouragement and realism
R - Relational Language: Use "we," "you," and conversational tone throughout
S - Structured Pacing: 2–4 minute segments with natural break points
O - Optimal Timing: 125–150 words per minute with strategic pauses
N - Narrative Structure: Setup → challenge → resolution → reflection
A - Altruistic Framing: Emphasise helping others and shared purpose
PERSONA Framework in Action
Purpose-Driven Messaging
This transforms compliance-focused content into mission-connected learning. Instead of leading with rules and consequences, effective avatar scripts connect individual behaviours to meaningful outcomes for colleagues, customers, and communities.
❌ Poor example: "You must complete cybersecurity training to comply with company policy."
✅ Good example: "When you follow these cybersecurity practices, you're protecting not just our company data, but also our clients' personal information and your colleagues' livelihoods. Every secure password you create helps safeguard the trust our customers place in us."
Emotional Authenticity
This principle acknowledges the real feelings that learners experience when encountering challenging content, while offering encouragement and a realistic perspective on implementation challenges. Research on emotional design suggests this approach can significantly increase positive emotions (one recent study reported a large effect size), which can in turn improve learning outcomes.
❌ Poor example: "Workplace harassment is prohibited and will result in disciplinary action."
✅ Good example: "I know talking about harassment can feel uncomfortable—and honestly, it should. These are real situations that affect real people. But here's what gives me hope: when we all understand how to recognise and respond to these issues, we create a workplace where everyone can do their best work. You're already here learning, which tells me you care about your colleagues."
Relational Language
Relational language, meaning wording that builds a relationship between presenter and learner, creates partnership rather than hierarchy by using inclusive pronouns and acknowledging shared challenges and experiences. This approach is supported by research on both managing cognitive load and enabling “social presence” through open and clear communication.
❌ Poor example: "Personnel are required to follow established procedures when handling customer complaints."
✅ Good example: "Let's talk about what happens when you receive a frustrated customer call. We've all been there—the customer is upset, you want to help, but you're not sure what you can actually do. That's exactly why we have these guidelines—they're your toolkit for turning a bad day into a great customer experience."
Structured Pacing
Structured pacing respects human cognitive limits by breaking complex content into digestible segments with natural transition points and reflection opportunities. Research suggests that 2–4 minute segments work best, with microlearning showing significantly better retention than traditional methods.
❌ Poor example: One 8-minute segment covering the entire compliance framework without breaks.
✅ Good example:
Segment 1 (3 mins): "What is workplace respect?"
[Natural break with reflection prompt]
Segment 2 (2.5 mins): "Recognising disrespectful behaviour"
[Break with scenario consideration]
Segment 3 (3 mins): "Your role in creating respectful culture"
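One way to plan segments like these before rendering is to estimate each block of script at a target speaking rate and group consecutive blocks so that no segment exceeds the 4-minute ceiling. The sketch below assumes a 135 WPM target, treats each paragraph as an indivisible block and uses hypothetical helpers (`estimate_minutes`, `plan_segments`); it is a planning aid under those assumptions, not a feature of any avatar platform.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    paragraphs: list[str] = field(default_factory=list)
    minutes: float = 0.0


def estimate_minutes(text: str, wpm: float = 135) -> float:
    """Estimated speaking time, in minutes, for a block of text at the target rate."""
    return len(text.split()) / wpm


def plan_segments(paragraphs: list[str], max_minutes: float = 4.0,
                  wpm: float = 135) -> list[Segment]:
    """Greedily group consecutive paragraphs into segments of at most max_minutes.

    A single paragraph longer than max_minutes still becomes its own
    segment, flagging it for a rewrite rather than silently splitting it.
    """
    segments: list[Segment] = []
    current = Segment()
    for para in paragraphs:
        minutes = estimate_minutes(para, wpm)
        # Close the current segment if this paragraph would push it over the limit.
        if current.paragraphs and current.minutes + minutes > max_minutes:
            segments.append(current)
            current = Segment()
        current.paragraphs.append(para)
        current.minutes += minutes
    if current.paragraphs:
        segments.append(current)
    return segments


# Three paragraphs of 270 words each: 2 minutes apiece at 135 WPM.
paras = ["word " * 270] * 3
for i, seg in enumerate(plan_segments(paras), start=1):
    print(f"Segment {i}: {len(seg.paragraphs)} paragraph(s), {seg.minutes:.1f} min")
# Segment 1: 2 paragraph(s), 4.0 min
# Segment 2: 1 paragraph(s), 2.0 min
```

Greedy grouping keeps paragraphs in their original order and leaves the choice of break points (and the reflection prompts between them) to the designer. The sketch only enforces the upper bound, so a final segment may come in under two minutes.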
Optimal Timing
Optimal timing means maintaining a 125-150 words per minute range that research identifies as ideal for comprehension, with strategic pauses that allow for processing and reflection.
❌ Poor example (180+ WPM): "Our company values include integrity honesty respect and collaboration and these values guide everything we do from how we interact with customers to how we make business decisions and it's important that everyone understands these values..."
✅ Good example (135 WPM): "Our company has four core values: integrity, honesty, respect, and collaboration. [1-second pause] These aren't just words on a poster—they guide every decision we make. [pause] From how you handle a customer complaint, to how you collaborate with your team, these values shape who we are as an organisation."
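Because scripts like this mix spoken words with stage directions, a rough timing check has to separate the two. The sketch below assumes that directions are written in square brackets, as in the example, reads an explicit duration such as "[1-second pause]" where one is given, and assumes a default of 1 second for an unlabelled "[pause]". The bracket convention, the default and the `estimate_timing` helper are all assumptions, not a platform standard.

```python
import re

# Bracketed stage directions, e.g. "[1-second pause]" or "[pause]".
DIRECTION = re.compile(r"\[([^\]]*)\]")
# An explicit duration inside a direction, e.g. "1-second" or "2 seconds".
SECONDS = re.compile(r"(\d+(?:\.\d+)?)[- ]?second")


def estimate_timing(script: str, wpm: float = 135,
                    default_pause: float = 1.0) -> tuple[int, float]:
    """Return (spoken word count, estimated total seconds) for a script.

    Speaking time is the word count at the target rate; pause time is added
    for each bracketed direction that mentions a pause.
    """
    pause_seconds = 0.0
    for direction in DIRECTION.findall(script):
        if "pause" not in direction.lower():
            continue  # ignore non-pause directions such as gestures
        match = SECONDS.search(direction)
        pause_seconds += float(match.group(1)) if match else default_pause
    spoken = DIRECTION.sub(" ", script)  # drop directions before counting words
    words = len(spoken.split())
    return words, words / wpm * 60 + pause_seconds


demo = ("Our company has four core values. [1-second pause] "
        "These aren't just words on a poster. [pause] They guide every decision.")
words, seconds = estimate_timing(demo)
print(words, round(seconds, 1))  # 17 9.6
```

If the rendered clip comes out much shorter than this estimate, the voice settings are probably delivering faster than the target rate.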
Narrative Structure
This principle follows the proven setup-challenge-resolution-reflection pattern that mirrors how humans naturally process and remember experiences. This approach supports transformative learning by encouraging critical reflection and meaning-making.
❌ Poor example: Direct instruction without story structure.
✅ Good example:
Setup: "Meet Sarah, a project manager who's been with us for two years."
Challenge: "Last month, she noticed her team lead making inappropriate comments about a colleague's appearance during meetings."
Resolution: "Sarah used our speak-up process to address the situation. She spoke privately with her manager first, who then worked with HR to provide coaching for the team lead."
Reflection: "Now pause and think—if you were in Sarah's position, what would have helped you feel confident about speaking up?"
Altruistic Framing
Altruistic framing emphasises how individual actions contribute to collective success and the wellbeing of others, tapping into intrinsic motivation rather than external compliance pressure.
❌ Poor example: "Complete this training to avoid disciplinary action and protect yourself from liability."
✅ Good example: "When you master these safety protocols, you're not just following rules—you're becoming someone your colleagues can count on. Your attention to these details means everyone goes home safely to their families. You're part of a team where everyone looks out for each other, and that starts with the choices you make every single day."
🚀 Phil's Top Tip: Notice how, overall, the effective examples feel like advice from a trusted colleague rather than instructions from corporate headquarters. That's the psychological shift that makes avatar training more effective—learners engage differently when they feel supported rather than managed.
Part IV: Concrete Examples
To demonstrate this framework and these principles in action, let's examine two approaches to workplace ethics training based on PwC New Zealand's Code of Conduct. The contrast illustrates how the same essential content can be transformed from compliance “noise” into a more optimised learning experience.
Example A: The Compliance Video Script
Avatar: Unnamed corporate presenter
Duration: ~52 seconds
Tone: Formal, directive
Purpose: Introduce Code of Conduct requirements
"Welcome to the PwC Code of Conduct training module. This session will cover your mandatory obligations under our ethical framework.
The Code of Conduct applies to all PwC personnel globally. You are required to comply with all provisions outlined in this document, which consists of four main areas: how we do business, relationships with colleagues, community responsibilities, and information handling.
Speaking up is a mandatory requirement. If you observe misconduct, you must report it through designated channels. The PwC Ethics Helpline is available for anonymous reporting where permitted by local laws.
Failure to report violations may result in disciplinary action. Retaliation is prohibited and will be subject to appropriate consequences as outlined in section 8 of the Code.
You will now proceed to complete the mandatory knowledge assessment. Please click 'Continue' to begin."
Why this approach is likely to disengage learners and limit learning effectiveness:
Anonymous delivery creates no personal connection or trust
Rule-focused messaging emphasises compliance over purpose and values
Passive information delivery provides no engagement or emotional connection
Threat-based motivation uses fear of consequences rather than inspiring positive action
Abstract policies are presented without relatable context or real-world application
Example B: The Research-Informed Video Script
Avatar: Aroha Māhina, Ethics & Culture Lead, 5 years at PwC
Duration: ~1 min 13 seconds
Tone: Warm, authentic, purposeful
Visual Context: Outside the PwC office in Auckland, with subtle Māori design elements
AROHA (warm, genuine smile):
"Kia ora! I'm Aroha Māhina, and I lead our Ethics and Culture team here at PwC. I've been part of our whānau for five years, and there's something really special I want to share with you."
[Gentle gesture toward korowai imagery on screen]
"See this beautiful korowai? It represents our Code of Conduct. Each fiber is one of us—when we all work together, we create something extraordinary that protects and strengthens our entire community."
[Lean in slightly, more personal tone]
"I know 'Code of Conduct' might sound formal, but it's really about something much more important—it's about how we care for each other and the trust our clients place in us."
[On-screen prompt: "Think about a time when someone had your back..."]
[2-second pause]
"Speaking up isn't about pointing fingers or getting anyone in trouble. It's about protecting what we've built together. Whether it's a concern about a project, or something that just doesn't feel right—your voice matters."
[Reassuring tone]
"And here's what I want you to know: when you speak up, you're not alone. We have support systems—the Ethics Helpline, your managers, our team—because we believe everyone deserves to feel safe and heard."
[Smile returns, gesture forward]
"So let's explore together how our Code helps us build trust and solve important problems—not just for ourselves, but for everyone we serve."
[Button: "Let's Build Trust Together →"]
Why this approach is more likely to engage learners and enable effective learning:
Authentic cultural connection through Aroha's persona and respectful use of Māori concepts
Purpose-driven messaging that connects compliance requirements to deeper values of care and trust
Emotional engagement that acknowledges potential discomfort while building psychological safety
Visual storytelling using the korowai metaphor to make abstract concepts tangible and meaningful
Inclusive language that creates partnership ("our whānau," "we") rather than hierarchy
Supportive framing that emphasises community and mutual support rather than rules and consequences
The contrast between these approaches illustrates how the PERSONA framework transforms not just the style of communication, but the fundamental relationship between learner and content.
The research-informed approach leverages every principle we discussed—incarnate persona, conversational language, emotional authenticity, altruistic framing, and narrative structure—to create an experience that feels genuinely helpful rather than bureaucratically required.
However, despite improvements in the second example, several factors could still limit student engagement and substantive learning, for example:
Accent and Cultural Authenticity
If the avatar’s accent does not match the cultural context of the script (e.g., an American accent delivering Māori greetings such as "Kia ora"), it can feel artificial or disingenuous to learners, undermining the intended cultural connection. This disconnect may cause learners to disengage (Lind, 2024).
Lack of Representation in Avatars
Avatars often reflect dominant cultural or demographic norms, which can leave diverse learners under-represented. Learners who do not see themselves reflected in the avatar’s appearance, voice or communication style may feel less included and find the content less relevant (Baake, Schmitt & Metag, 2025).
Persistent Gender and Cultural Bias
Research shows that male avatars are often rated as more competent than female avatars, and that avatars may unintentionally reinforce existing gender or cultural biases if not carefully designed. This can affect trust and engagement, especially among under-represented groups (Baake, Schmitt & Metag, 2025; World Economic Forum, 2025).
Authenticity Dissonance
When there is a mismatch between the avatar’s appearance, language and behaviour (such as using culturally significant language without a genuine connection to it), learners may experience "authenticity dissonance." This dissonance can erode trust and make the learning experience feel less credible (Lind, 2024; Baake, Schmitt & Metag, 2025).
Limited Personalisation and Local Relevance
Scripts and avatars that are not adapted to local contexts or learner backgrounds may fail to resonate. Without customisation for regional language, accents or cultural references, the training may feel generic and less engaging (Ng, Hao & Zhang, 2025).
Conclusion: The Future of Synthetic Educators
Zooming out, the rise of AI avatars in learning has revealed two troubling patterns that should give our profession pause.
Like magpies drawn to shiny objects, we (and the developers building avatar platforms) have become fixated on visual realism at the expense of what matters for learning. We obsess over facial expressions, gesture accuracy and movement fluidity while paying too little attention to the elements that actually determine learning effectiveness, such as tone of voice, pacing and pronunciation.
This misplaced focus is why so many avatar implementations in L&D are likely to fail: we're perfecting the wrong variables.
This brings us to a second, more uncomfortable ethical question: if, as Lind’s research suggests, avatars are most effective when they convince learners that they are interacting with “real” humans, are we now building educational technology that requires deception in order to function optimally?
If transparency about AI undermines learning effectiveness, what does this tell us about the fundamental viability of avatar-based education?
In a world where KPIs related to speed and cost are far more common than those related to quality and impact, the future of learning will undoubtedly include both AI and human educators. For those who lean into AI, success won't come from perfecting digital deception. It will come from understanding that whether delivered through pixels or presence, effective learning requires genuine respect for how humans process information, build trust and make meaning.
The avatars are getting more realistic every day. The question is: are we getting better at the things that actually matter for learning?
Happy experimenting,
Phil 👋
PS: Want to explore avatar scripting techniques through hands-on application? Apply for a place on my AI & Learning Design Bootcamp, where we cover these approaches and 30+ other practical AI applications for instructional design.
PPS: If you want to stay current with the latest research on AI and instructional design, subscribe to my Learning Research Digest—a monthly summary of the most important peer-reviewed studies in our field.
References
Aguilar, K. N., Smith, M. L., Payne, S. C., Zhao, H., & Benden, M. (2024). Digital human ergonomics training for remote office workers: Comparing a novel method to a traditional online format. Applied Ergonomics, 118, 104164.
Baake, J., Schmitt, J. B., & Metag, J. (2025). Balancing realism and trust: AI avatars in science communication. Journal of Science Communication, 24(2), A03.
Chakraborty, M. (2017). Learner engagement strategies in online class [Doctoral dissertation, Texas A&M University].
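Garrison, D. R., Anderson, T., & Archer, W. (2000). Critical inquiry in a text-based environment: Computer conferencing in higher education. The Internet and Higher Education, 2(2–3), 87–105.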
Jackson, P., & Chakraborty, M. (2014). Transformative learning in the online learning environment: A literature review. Texas A&M University.
Lind, S. J. (2024). Can AI-powered avatars replace human trainers? An empirical test of synthetic human-like spokesperson applications. Journal of Workplace Learning. https://doi.org/10.1108/JWL-04-2024-0075
Mayer, R. E. (2021). Multimedia Learning (3rd ed.). Cambridge University Press.
Mayer, R. E., Fiorella, L., & Stull, A. (2020). Five ways to reduce cognitive load in multimedia learning. Educational Psychologist, 55(3), 165–180.
Ng, J., Hao, M., & Zhang, Y. (2025). Avatar for hotels' green training: Exploring the impact of persona, gender, and value orientation on learning and environmental behaviour. International Journal of Hospitality Management, 126, 104068.
PMC. (2019). Effect of text-to-speech rate on reading comprehension by adults with aphasia. PMC, 7231913.
PMC. (2025). The impact of emotional design on multimedia learning outcomes. PMC, 11939454.
Slemmons, K., Smetana, L., Dittman, E., & Potter, S. (2018). The impact of video length on student engagement in online learning environments. Journal of Science Education and Technology, 27(5), 469–479.
VirtualSpeech. (2024). How VirtualSpeech's AI avatars are transforming learning experiences. Retrieved from https://virtualspeech.com/blog/ai-avatars-are-transforming-learning-experiences
World Economic Forum. (2025). Gender parity in the intelligent age. White paper developed in partnership with LinkedIn.
ZipDo. (2025). Microlearning statistics: Education reports 2025. Retrieved from https://zipdo.co/microlearning-statistics/