How the Engine Works
A complete overview of the video generation pipeline
1. High-Level Data Flow
2. Dual-Agent Script System
Agent 1: Writer
Creates a 20-beat story outline following the "Competence Arc" structure.
• Focuses on narrative flow
• Writes engaging narrator scripts
• Designs character struggles & victories
Agent 2: Director
Converts the story into structured JSON with visual & audio metadata.
• Generates background-only visuals
• Selects theme & layout modes
• Creates the character's visual description
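The handoff between the two agents can be sketched roughly as follows. This is a minimal illustration, not the engine's actual code: the `Beat` and `Slide` shapes, the function names `write_outline` and `direct`, and the values `"chalkboard"` and `"full_bleed"` are all hypothetical, and the stand-in functions return canned text where a real system would call a language model.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Beat:
    """One story beat from the Writer stage (hypothetical shape)."""
    index: int
    narration: str


@dataclass
class Slide:
    """One slide from the Director stage (hypothetical shape)."""
    index: int
    narration: str
    background_prompt: str  # describes the scene only, never the character
    layout: str


def write_outline(topic: str, beats: int = 20) -> list[Beat]:
    # Stand-in for the Writer agent; a real system would call a model here.
    return [Beat(i, f"Beat {i} of a story about {topic}.") for i in range(1, beats + 1)]


def direct(outline: list[Beat], theme: str) -> str:
    # Stand-in for the Director agent: attach visual metadata, emit JSON.
    slides = [
        Slide(b.index, b.narration, f"Empty scene for beat {b.index}, no characters", "full_bleed")
        for b in outline
    ]
    return json.dumps({"theme": theme, "slides": [asdict(s) for s in slides]}, indent=2)


script_json = direct(write_outline("fractions"), theme="chalkboard")
```

Keeping the Writer's output free of visual fields means the narrative can be reviewed or regenerated without touching the layout decisions made in the second stage.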
3. Story Structure: The Competence Arc
Every video follows a proven educational narrative structure:
Introduce the character and environment, and establish a clear goal.
The character fails by ignoring a basic concept, then learns the fundamental rule through struggle. Includes 1 CONCEPT_CARD.
A harder challenge appears, and the basic rule isn't enough. The character learns an advanced concept. Includes 1-2 CONCEPT_CARDs.
Pause the story to show how the topic applies to real careers (Engineering, Medicine, Tech, etc.).
End with a happy shot of the character succeeding.
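The CONCEPT_CARD counts above lend themselves to an automated check. The sketch below numbers the five stages 1 to 5 in the order listed and assumes each slide carries hypothetical `phase` and `type` fields; neither the field names nor the function `check_concept_cards` come from the engine itself. Only the stages with a stated card count are constrained.

```python
from collections import Counter

# Stage number -> (min, max) CONCEPT_CARD slides, taken from the arc description.
CARD_LIMITS = {2: (1, 1), 3: (1, 2)}


def check_concept_cards(slides):
    """Return a list of problems; an empty list means the counts match the arc."""
    counts = Counter(s["phase"] for s in slides if s["type"] == "CONCEPT_CARD")
    problems = []
    for phase, (lo, hi) in CARD_LIMITS.items():
        n = counts.get(phase, 0)
        if not lo <= n <= hi:
            problems.append(
                f"phase {phase}: expected {lo}-{hi} CONCEPT_CARD slides, found {n}"
            )
    return problems
```

A check like this could run on the Director's JSON before any assets are generated, so a malformed script fails early rather than after the more expensive image and audio steps.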
4. Asset Generation Pipeline
Character
Runware AI
- Generate 4 emotion poses
- Remove background
- Upload to R2 CDN
Backgrounds
Runware AI
- Generate per-slide visuals
- Character-free scenes
- Upload to R2 CDN
Voiceovers
ElevenLabs
- Generate per-slide audio
- Measure duration
- Upload to R2 CDN
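The three asset tracks do not depend on one another, so one plausible arrangement is to run them concurrently. The sketch below assumes that arrangement (the description above does not state it) and uses placeholder coroutines in place of the Runware, ElevenLabs, and R2 calls. All function names, paths, pose labels, and the duration estimate are hypothetical.

```python
import asyncio


async def generate_character(description: str) -> dict:
    # Placeholder for: generate 4 emotion poses, remove backgrounds, upload.
    await asyncio.sleep(0)
    return {pose: f"character/{pose}.png" for pose in ("pose_1", "pose_2", "pose_3", "pose_4")}


async def generate_background(slide_index: int, prompt: str) -> str:
    # Placeholder for: generate a character-free scene and upload it.
    await asyncio.sleep(0)
    return f"backgrounds/{slide_index}.png"


async def generate_voiceover(slide_index: int, narration: str) -> dict:
    # Placeholder for: synthesize audio, measure it, upload it.
    await asyncio.sleep(0)
    # A real implementation would measure the audio file; this estimate is made up.
    duration = round(len(narration.split()) / 2.5, 2)
    return {"url": f"audio/{slide_index}.mp3", "duration_s": duration}


async def build_assets(character_description: str, slides: list[dict]) -> dict:
    # Run the character, background, and voiceover tracks side by side.
    character, backgrounds, voiceovers = await asyncio.gather(
        generate_character(character_description),
        asyncio.gather(*(generate_background(s["index"], s["background_prompt"]) for s in slides)),
        asyncio.gather(*(generate_voiceover(s["index"], s["narration"]) for s in slides)),
    )
    return {"character": character, "backgrounds": backgrounds, "voiceovers": voiceovers}
```

Calling `asyncio.run(build_assets(description, slides))` returns one dictionary with the character pose URLs plus per-slide background and voiceover entries in slide order.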
5. Dynamic Theme System (15 Themes)
The Director AI selects the best-fitting theme based on subject matter.
6. Final Output
Video File
- Rendered via Remotion (H.264)
- Uploaded to Cloudflare Stream
- Signed URLs for secure playback
- Average generation time: 3-5 minutes
Metadata (script_json)
- Full slide data with timings
- Character & background URLs
- Audio URLs & durations
- Ready for RAG vector chunking
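To illustrate how slide timings and retrieval chunks might be derived from the metadata, the sketch below computes each slide's start time and frame count from its audio duration, then emits one text chunk per slide. The field names (`narration`, `audio_duration_s`, `start_s`, `duration_frames`), the 30 fps frame rate, and the one-chunk-per-slide strategy are assumptions for illustration, not the documented `script_json` schema.

```python
import math

FPS = 30  # assumed frame rate


def add_timings(slides, fps=FPS):
    """Return copies of the slides with start times and frame counts added."""
    timed, start = [], 0.0
    for slide in slides:
        # Round up so the visuals never end before the audio does.
        frames = math.ceil(slide["audio_duration_s"] * fps)
        timed.append({**slide, "start_s": round(start, 3), "duration_frames": frames})
        start += slide["audio_duration_s"]
    return timed


def to_chunks(timed_slides):
    """One retrieval chunk per slide: narration text plus a timestamp to link back to."""
    return [
        {"text": s["narration"], "start_s": s["start_s"], "slide": s["index"]}
        for s in timed_slides
    ]
```

Storing the start time alongside each chunk would let a retrieval result point back to the exact moment in the video where the narration occurs.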