Text on B-Roll

Create scroll-stopping short-form videos by compositing bold on-screen text over AI-generated b-roll footage. The full pipeline: topic in, rendered MP4s out.

Prerequisite skills (loaded automatically):

video-generator - Veo 3.1 API for b-roll generation
video-caption-creation - Hook frameworks and text patterns
one-liners - 12 core one-liner patterns
nano-banana-image-generator - Optional: generate a matching static thumbnail

When to Use This Skill

Promoting a blog post, article, or newsletter on Reels/TikTok/Shorts
Creating text-forward social videos from a topic or theme
Building a batch of short-form content around one visual concept
Any time you want bold text + cinematic b-roll without filming anything

This skill is NOT for:

Podcast clips with audio (use youtube-clip-extractor)
Videos with voiceover or dialogue (use video-generator directly)
Static image posts (use nano-banana-image-generator)

Workflow Overview

Understand the topic - What's the article/theme about?
Generate b-roll - One photorealistic Veo clip that matches the topic
Write 8-12 text options - Framework-fitted to proven patterns
User selects favorites - Pick 2-4 lines
Scaffold Remotion project - From template
Preview in Studio - User approves look
Render to MP4 - One video per text option

Step 1: B-Roll Generation

Generate one strong b-roll clip using video-generator. The visual should be:

Photorealistic - not illustrated or stylized
Extreme close-up or tight shot - intense, tactile, sensory
One simple action - hands doing something, a single motion
Ambient audio only - no dialogue, no music
9:16 vertical, 1080p, 8 seconds

B-Roll Prompt Template

Extreme close-up, shallow depth of field. [Subject performing tactile action].
[Specific sensory details - textures, colors, materials]. [Paper/surface reacts to the action].
The sound of [specific ambient audio]. Photorealistic, handheld micro-movement,
warm natural light from a nearby window.

B-Roll Quality Checklist

Photorealistic (not AI-looking)
Strong tactile/sensory quality
Simple composition (one focal point)
No text or symbols in the footage
Vertical 9:16
Motion is slow and deliberate (not chaotic)

Running the Generation

cd "/Users/charliedeist/Desktop/New Root Docs/OpenEd Vault"
export GEMINI_API_KEY=$(grep GEMINI_API_KEY .env | cut -d'=' -f2) && \
python3 ".claude/skills/video-generator/scripts/generate_video.py" \
  "Your prompt here" \
  --aspect 9:16 --resolution 1080p --duration 8 \
  --name "descriptive-name" \
  --output "Studio/SEO Content Production/" \
  --negative "text overlays, watermarks, blurry, cartoon, illustration, anime"

Veo takes 1-6 minutes. If the first result has artifacts, regenerate - don't rework.

Step 2: Write Text Options

Generate 8-12 on-screen text options using proven patterns. The text must work WITHOUT audio context - it stands alone over silent b-roll.

Pattern Library (Use These)

Pattern	Format	Example
Stop + Complaint	"Stop [universal frustration]"	"Stop grading kids on how fast they fill in bubbles."
Everyday Observation	"[Simple truth no one says]"	"Nobody remembers their test scores."
Existential Question	"[Question that points out absurdity]"	"Who decided 30 bubbles could measure a kid?"
Normalize	"Normalize [thing people feel guilty about]"	"Normalize letting kids destroy the answer sheet."
Values / Core Belief	"[Worldview in one line]"	"Curiosity doesn't fit in a bubble."
Mock Instruction	"[Absurdly specific command]"	"Instructions unclear. Used the Scantron as a canvas."
Polarizing Statement	"[Challenge common assumption]"	"Scantrons don't measure intelligence. They measure obedience."
Full Quote	"[Long conversational text that fills screen]"	"We hand kids four choices and call it thinking. Then we wonder why they let AI think for them."
Two-Part Reveal	Setup → hard cut → punchline	"They said my kid can't think critically." → "She was finger painting while your kid was memorizing answers."
Aspirational	"This is your sign to [permission]"	"This is your sign to let your kid color outside the bubbles."

Text Quality Rules

McDonald's Test: If a truck driver wouldn't get it, simplify
First 3 words do 80% of the work. Front-load the punch
Mix lengths: Include punchy (2-5 words), statement (5-8), narrative (9-15), and full-quote (15+)
No AI-isms: No "delve," "comprehensive," "crucial," "landscape"
No correlatives: Never "X isn't just Y - it's Z"
Conversational: Would you text this to a friend?
Specific > vague: "Stop acting like worksheets are learning" beats "Education should be better"

Text Sizing Guide

Text Length	Font Size	Notes
2-5 words	78-90px	Big, dominant, centered
5-8 words	66-78px	Still bold, may wrap to 2 lines
9-15 words	56-66px	Will wrap to 3-4 lines, still readable
15+ words	48-56px	Fills more screen, reads like a paragraph
Two-part reveal	60-66px each	Part 1 slightly larger than part 2

Step 3: Remotion Composition

Template Location

Reusable template files are at: .claude/skills/text-on-broll/template/

Quick Setup

# Create project directory
mkdir -p "Studio/[project-name]/text-on-broll"
cd "Studio/[project-name]/text-on-broll"

# Init and install
npm init -y
npm install --save remotion @remotion/cli @remotion/player react react-dom
npm install --save-dev typescript @types/react

# Create directories
mkdir -p src public

# Copy b-roll video
cp path/to/generated-video.mp4 public/bg-video.mp4

Then create three files from the templates below.

tsconfig.json

{
  "compilerOptions": {
    "target": "ESNext",
    "module": "commonjs",
    "jsx": "react-jsx",
    "strict": true,
    "esModuleInterop": true,
    "forceConsistentCasingInFileNames": true,
    "skipLibCheck": true
  },
  "include": ["src"]
}

src/index.ts

import { registerRoot } from "remotion";
import { RemotionRoot } from "./Root";

registerRoot(RemotionRoot);

src/Root.tsx

import React from "react";
import { Composition } from "remotion";
// Import your compositions from TextOverlay.tsx

export const RemotionRoot: React.FC = () => {
  return (
    <>
      <Composition
        id="variant-name"
        component={YourComponent}
        durationInFrames={240}  // 8s at 30fps
        fps={30}
        width={1080}
        height={1920}  // 9:16 vertical
      />
    </>
  );
};

src/TextOverlay.tsx - The Core Component

import React from "react";
import {
  AbsoluteFill,
  OffthreadVideo,
  staticFile,
  useCurrentFrame,
} from "remotion";

const IGText: React.FC<{
  text: string;
  fontSize?: number;
  top?: string;
}> = ({ text, fontSize = 72, top = "38%" }) => {
  return (
    <div
      style={{
        position: "absolute",
        top,
        left: "50%",
        transform: "translateX(-50%)",
        width: "85%",
        textAlign: "center",
        zIndex: 10,
      }}
    >
      <span
        style={{
          fontFamily: "'Helvetica Neue', 'Inter', Arial, sans-serif",
          fontWeight: 800,
          fontSize,
          lineHeight: 1.15,
          color: "white",
          WebkitTextStroke: "2.5px rgba(0,0,0,0.9)",
          textShadow: "0 3px 12px rgba(0,0,0,0.5)",
          letterSpacing: "-0.02em",
        }}
      >
        {text}
      </span>
    </div>
  );
};

The IGText Style (Locked In)

This is the approved IG-native text style. Do not change without explicit request:

Property	Value	Why
Font family	Helvetica Neue / Inter	Clean, IG-native feel
Weight	800 (extra bold)	Readable over video
Color	White	Maximum contrast
Stroke	2.5px rgba(0,0,0,0.9)	Thin black outline, readable on any background
Shadow	0 3px 12px rgba(0,0,0,0.5)	Subtle depth, not cheesy
Letter spacing	-0.02em	Tighter, more editorial
Line height	1.15	Compact multi-line
Width	85% of frame	Safe margins for IG
Animation	None	Text appears immediately, no fade/spring

Composition Patterns

Single line (most common):

export const MyComposition: React.FC = () => (
  <AbsoluteFill>
    <OffthreadVideo src={staticFile("bg-video.mp4")} />
    <IGText text="Your text here." fontSize={68} top="32%" />
  </AbsoluteFill>
);

Two-part reveal (hard cut at midpoint):

export const TwoPartReveal: React.FC = () => {
  const frame = useCurrentFrame();
  const midpoint = 108; // ~3.6s at 30fps

  return (
    <AbsoluteFill>
      <OffthreadVideo src={staticFile("bg-video.mp4")} />
      {frame < midpoint && (
        <IGText text="Setup line." fontSize={66} top="30%" />
      )}
      {frame >= midpoint && (
        <IGText text="Punchline." fontSize={60} top="28%" />
      )}
    </AbsoluteFill>
  );
};

Step 4: Preview and Render

Preview

npx remotion studio --port 3123

Opens at http://localhost:3123. Check all compositions. Look for:

Text readable over the video at all moments
No text clipping at edges
Font size appropriate for text length
Two-part reveal timing feels natural

Render All Compositions

# Render a single composition
npx remotion render src/index.ts [composition-id] out/[name].mp4

# Render all compositions in batch
for comp in comp1 comp2 comp3; do
  npx remotion render src/index.ts $comp out/$comp.mp4
done

Render Settings

Setting	Value
Codec	H.264 (default)
Frame rate	30fps
Resolution	1080x1920
CRF	18 (high quality)

Output Routing

Platform	Specs	Notes
Instagram Reels	9:16, < 90s, < 4GB	Upload directly
TikTok	9:16, < 10 min, < 287MB	Upload directly
YouTube Shorts	9:16, < 60s	Upload directly
Stories	9:16, < 15s per story	May need to trim to 8s

Batch Production Workflow

For creating multiple videos from one topic (e.g., promoting an article):

Generate 1 b-roll clip that captures the topic's visual metaphor
Write 10 text options across all pattern types
User picks 3-4 favorites
Scaffold 1 Remotion project with all selected as separate compositions
Preview all in Studio, adjust sizes/positions
Batch render all compositions
Pair each video with a platform-specific caption (from content-repurposer)

This gives you 3-4 unique videos from one generation session, each with different text but the same compelling visual.

Skill Chain

This skill works best as part of a content production chain:

Source Content (article, newsletter, podcast)
    ↓
text-on-broll (this skill) → 3-4 vertical videos
    ↓
content-repurposer → platform captions for each video
    ↓
Post to IG Reels, TikTok, Shorts

Related skills:

video-generator - B-roll generation (called by this skill)
video-caption-creation - Hook frameworks (referenced by this skill)
nano-banana-image-generator - Static thumbnail for the same content
content-repurposer - Platform captions to pair with each video
one-liners - Text pattern library
short-form-video - Broader short-form strategy and philosophy

Examples

Example: Education Article Promotion

Topic: Article about standardized testing and creative agency

B-roll prompt: "Extreme close-up, shallow depth of field. A child's small paint-covered fingers slowly drag through thick finger paint across a standardized test bubble sheet. Paint smears vibrant orange and blue across neat rows of bubbles. The sound of wet paint dragging across paper. Photorealistic, handheld micro-movement, warm natural light."

Text options selected:

"Stop grading kids on how fast they fill in bubbles." (Stop + Complaint)
"Curiosity doesn't fit in a bubble." (Values)
"They said my kid can't think critically." → "She was finger painting while your kid was memorizing answers." (Two-Part Reveal)

Result: 3 vertical videos, same b-roll, different text - each targets a different scroll-stop trigger.