multimedia-accessibility

Multimedia Accessibility

Design multimedia content that can be consumed by people who can't hear the audio, can't see the video, or can't process both at once.

Core Principle

Every piece of information conveyed through one sense must also be available through another. Audio content needs a visual equivalent. Visual content needs an audio or text equivalent.

Captions

What Captions Include

  • All spoken dialogue, attributed to the speaker
  • Relevant sound effects: [door slams], [phone rings], [laughter]
  • Music that conveys mood or meaning: [upbeat music], [tense score]
  • Off-screen sounds that matter: [footsteps approaching]

Caption Quality

  • Synchronised with speech (within 1 second)
  • Minimum 1 second display time, maximum 2 lines
  • No more than 32 characters per line for readability
  • Proper punctuation and grammar; never publish auto-generated captions without review
  • Speaker identification when multiple speakers are present
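
The measurable limits above (display time, line count, line length) can be checked automatically. The following is a minimal sketch, not a definitive validator: the `Cue` class and `check_cue` function are hypothetical names introduced here, and it assumes cue times are given in seconds. Synchronisation with speech and grammar still need a human reviewer.

```python
from dataclasses import dataclass

# Limits taken from the Caption Quality list above.
MAX_LINES = 2
MAX_CHARS_PER_LINE = 32
MIN_DISPLAY_SECONDS = 1.0


@dataclass
class Cue:
    """One caption cue (hypothetical structure); times are seconds from media start."""
    start: float
    end: float
    text: str  # caption lines separated by "\n"


def check_cue(cue: Cue) -> list[str]:
    """Return a list of problems found; an empty list means the cue passes."""
    problems = []
    lines = cue.text.split("\n")
    if cue.end - cue.start < MIN_DISPLAY_SECONDS:
        problems.append("displayed for less than 1 second")
    if len(lines) > MAX_LINES:
        problems.append(f"{len(lines)} lines (maximum is {MAX_LINES})")
    for line in lines:
        if len(line) > MAX_CHARS_PER_LINE:
            problems.append(
                f"line has {len(line)} characters (maximum is {MAX_CHARS_PER_LINE})"
            )
    return problems


# Usage: a cue with a speaker label and a sound effect, on two short lines.
good = Cue(start=0.0, end=2.5, text="MAYA: Did you hear that?\n[door slams]")
print(check_cue(good))
```

Running a check like this over every cue before publishing catches layout problems early, so reviewer time goes to accuracy and speaker attribution.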

Auto-Captions Are Not Enough

  • Auto-generated captions typically have 80–85% accuracy
  • That means roughly 1 in every 5 to 7 words is wrong
  • Always review and correct auto-captions
  • Names, technical terms, and accented speech fail most often
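
To see what an accuracy percentage means in practice, convert it into an error count. This is a small arithmetic sketch; the function names are hypothetical, and the 1,500-word example assumes a 10-minute video spoken at about 150 words per minute, which is an illustrative figure rather than a measurement.

```python
def words_per_error(accuracy_percent: float) -> float:
    """Average number of words per error at the given word accuracy."""
    if not 0 <= accuracy_percent < 100:
        raise ValueError("accuracy_percent must be at least 0 and below 100")
    return 100 / (100 - accuracy_percent)


def expected_errors(word_count: int, accuracy_percent: float) -> float:
    """Expected number of wrong words in a passage of word_count words."""
    return word_count * (100 - accuracy_percent) / 100


print(words_per_error(80))        # 5.0
print(expected_errors(1500, 85))  # 225.0 (assumed 1,500-word video)
```

Even at the upper end of the range, a short video carries hundreds of errors, which is why review is not optional.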

Transcripts

When to Provide

  • Every audio-only file (podcast, voice memo, audio article)
  • Every video (as a complement to captions)
  • Transcripts serve people who are deaf, hard of hearing, or who prefer reading to watching/listening

What Transcripts Include

  • All spoken content with speaker labels
  • Descriptions of relevant visual content (for video transcripts)
  • Relevant sound effects and music cues
  • Timestamps for longer content (helps navigation)
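
One possible way to combine these elements is a plain-text transcript with a timestamp on every entry, speaker labels on speech, and square brackets around sounds and visual descriptions. The sketch below assumes that layout; `Entry`, `timestamp`, and `render` are hypothetical names, and the sample content is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    seconds: int  # offset from the start of the recording
    speaker: str  # empty string for sounds and visual descriptions
    text: str


def timestamp(seconds: int) -> str:
    """Format a second count as HH:MM:SS."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render(entries: list[Entry]) -> str:
    """Render entries as one timestamped line each."""
    lines = []
    for entry in entries:
        if entry.speaker:
            body = f"{entry.speaker}: {entry.text}"
        else:
            body = f"[{entry.text}]"  # non-speech content goes in brackets
        lines.append(f"[{timestamp(entry.seconds)}] {body}")
    return "\n".join(lines)


sample = [
    Entry(0, "", "upbeat music"),
    Entry(4, "HOST", "Welcome back."),
    Entry(75, "", "slide shows a bar chart of monthly sign-ups"),
]
print(render(sample))
```

Plain text output like this stays searchable and works with screen readers and braille displays without extra markup.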

Transcript Placement

  • Link directly below or beside the media player
  • Label clearly: "Read the full transcript"
  • Don't hide the transcript behind multiple clicks
  • Make transcripts searchable

Audio Descriptions

When to Provide

  • When video content conveys important visual information that isn't in the dialogue
  • Presentations where speakers reference slides or visuals
  • Tutorials that demonstrate visual actions

What to Describe

  • On-screen text not read aloud
  • Actions and gestures that carry meaning
  • Scene changes and visual context
  • Charts, graphs, and visual data shown on screen

How to Describe

  • Fit descriptions into natural pauses in dialogue
  • Be concise — describe what matters, not everything visible
  • For dense visual content: provide an extended audio description version in which the video pauses to make room for the description
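
Natural pauses can be located from caption timing. The sketch below assumes speech intervals are available as `(start, end)` pairs in seconds, sorted by start time; the function name and the 2-second minimum gap are assumptions chosen for illustration, not a standard. Silence after the last interval is not reported because the media duration is not known here.

```python
def find_description_gaps(speech, min_gap=2.0):
    """Return (gap_start, gap_end) pairs of silence at least min_gap seconds long.

    speech: list of (start, end) tuples in seconds, sorted by start time.
    """
    gaps = []
    previous_end = 0.0
    for start, end in speech:
        if start - previous_end >= min_gap:
            gaps.append((previous_end, start))
        # max() handles overlapping speech from two speakers
        previous_end = max(previous_end, end)
    return gaps


speech = [(0.0, 4.0), (4.5, 9.0), (12.0, 15.0)]
print(find_description_gaps(speech))  # [(9.0, 12.0)]
```

If a video yields few or no usable gaps, that is a signal to consider the extended version with pauses.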

Media Player Requirements

  • Keyboard-accessible play, pause, stop, volume, and seek controls
  • Visible captions toggle
  • Speed controls (0.5x to 2x) — essential for cognitive accessibility
  • Volume control independent of system volume
  • No autoplay — let the user choose when to start
  • Visible progress bar with time display

Assessment Questions

  1. Does every video have accurate, synchronised captions?
  2. Does every audio file have a transcript?
  3. Are auto-captions reviewed and corrected?
  4. Is important visual information audio-described?
  5. Is the media player fully keyboard accessible?
  6. Is autoplay disabled?
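
For teams that track these questions per release, the checklist can be kept as data. This is a minimal sketch; the dictionary keys and the `failed_checks` function are hypothetical names, and an unanswered question is treated as a failure by assumption.

```python
QUESTIONS = {
    "captions": "Does every video have accurate, synchronised captions?",
    "transcripts": "Does every audio file have a transcript?",
    "auto_captions_reviewed": "Are auto-captions reviewed and corrected?",
    "audio_description": "Is important visual information audio-described?",
    "keyboard_player": "Is the media player fully keyboard accessible?",
    "autoplay_disabled": "Is autoplay disabled?",
}


def failed_checks(answers: dict[str, bool]) -> list[str]:
    """Return questions answered False or not answered, in checklist order."""
    return [
        question
        for key, question in QUESTIONS.items()
        if not answers.get(key, False)
    ]


answers = {key: True for key in QUESTIONS}
answers["autoplay_disabled"] = False
print(failed_checks(answers))  # ['Is autoplay disabled?']
```
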
First Seen
Mar 19, 2026