Transcription Automation

Comprehensive skill for automating audio/video transcription and content processing.

Core Workflows

1. Transcription Pipeline

TRANSCRIPTION FLOW:
┌─────────────────┐
│  Audio/Video    │
│     Input       │
└────────┬────────┘
         ▼
┌─────────────────┐
│  Pre-Processing │
│  - Convert      │
│  - Enhance      │
│  - Split        │
└────────┬────────┘
         ▼
┌─────────────────┐
│  Transcription  │
│  - STT Engine   │
│  - Diarization  │
└────────┬────────┘
         ▼
┌─────────────────┐
│ Post-Processing │
│  - Format       │
│  - Timestamps   │
│  - Speakers     │
└────────┬────────┘
         ▼
┌─────────────────┐
│     Output      │
│  - Text/SRT/VTT │
│  - Summary      │
└─────────────────┘

2. Transcription Configuration

transcription_config:
  engine: whisper  # whisper, assembly_ai, deepgram
  
  audio_settings:
    sample_rate: 16000
    channels: mono
    format: wav
    
  transcription:
    language: auto  # or specific: en, zh, es
    model: large  # tiny, base, small, medium, large
    task: transcribe  # transcribe or translate
    
  features:
    speaker_diarization: true
    word_timestamps: true
    punctuation: true
    profanity_filter: false
    
  output:
    formats:
      - txt
      - srt
      - vtt
      - json
    include_confidence: true
    include_timestamps: true
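
The audio_settings above (16 kHz, mono, WAV) describe a conversion step that is commonly done with ffmpeg. The following is a minimal sketch, assuming ffmpeg is installed and on PATH; the function name `build_ffmpeg_command` and the file names are hypothetical and not part of this skill.

```python
from typing import List


def build_ffmpeg_command(src: str, dst: str, sample_rate: int = 16000, channels: int = 1) -> List[str]:
    """Build an ffmpeg argument list that converts `src` to PCM WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-i", src,                # input audio or video file
        "-vn",                    # drop any video stream
        "-ac", str(channels),     # number of audio channels (1 = mono)
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-c:a", "pcm_s16le",      # 16-bit PCM, the usual WAV encoding
        dst,
    ]


cmd = build_ffmpeg_command("meeting.mp4", "meeting.wav")
print(" ".join(cmd))
# To run the conversion: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.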

Meeting Transcription

Meeting Notes Template

meeting_transcript:
  metadata:
    title: "{{meeting_title}}"
    date: "{{date}}"
    duration: "{{duration}}"
    attendees: "{{speakers}}"
    
  output_template: |
    # {{title}}
    
    **Date:** {{date}}
    **Duration:** {{duration}}
    **Attendees:** {{attendees}}
    
    ## Summary
    {{ai_summary}}
    
    ## Key Points
    {{#each key_points}}
    - {{this}}
    {{/each}}
    
    ## Action Items
    {{#each action_items}}
    - [ ] {{task}} - @{{assignee}} - Due: {{due_date}}
    {{/each}}
    
    ## Full Transcript
    {{#each segments}}
    **[{{timestamp}}] {{speaker}}:** {{text}}
    
    {{/each}}
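
The template above uses Handlebars-style placeholders. As a rough illustration of what rendering produces, the sketch below builds the same Markdown with plain Python string formatting. The function name `render_meeting_notes` and all sample values are hypothetical.

```python
def render_meeting_notes(meeting: dict) -> str:
    """Render meeting data into the Markdown layout of the template."""
    lines = [
        f"# {meeting['title']}",
        "",
        f"**Date:** {meeting['date']}",
        f"**Duration:** {meeting['duration']}",
        f"**Attendees:** {', '.join(meeting['attendees'])}",
        "",
        "## Summary",
        meeting["ai_summary"],
        "",
        "## Key Points",
    ]
    lines += [f"- {point}" for point in meeting["key_points"]]
    lines += ["", "## Action Items"]
    lines += [
        f"- [ ] {item['task']} - @{item['assignee']} - Due: {item['due_date']}"
        for item in meeting["action_items"]
    ]
    lines += ["", "## Full Transcript"]
    for seg in meeting["segments"]:
        lines += [f"**[{seg['timestamp']}] {seg['speaker']}:** {seg['text']}", ""]
    return "\n".join(lines)


# Made-up sample data for illustration only
notes = render_meeting_notes({
    "title": "Weekly Sync",
    "date": "2024-01-15",
    "duration": "45:32",
    "attendees": ["alice", "bob"],
    "ai_summary": "The team reviewed the report schedule.",
    "key_points": ["Report is on track"],
    "action_items": [{"task": "Send the report", "assignee": "alice", "due_date": "2024-01-22"}],
    "segments": [{"timestamp": "00:00:05", "speaker": "SPEAKER_1", "text": "Welcome everyone."}],
})
print(notes)
```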

Speaker Diarization

diarization_config:
  min_speakers: 2
  max_speakers: 10
  
  speaker_labels:
    - name: "Speaker 1"
      voice_sample: "sample_1.wav"  # Optional
    - name: "Speaker 2"
      voice_sample: "sample_2.wav"
      
  output_format:
    speaker_prefix: true
    speaker_timestamps: true
    
  example_output: |
    [00:00:05] SPEAKER_1: Welcome everyone to today's meeting.
    [00:00:12] SPEAKER_2: Thanks for having us.
    [00:00:18] SPEAKER_1: Let's start with the agenda.
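
One common way to produce output like the example above is to run the STT engine and the diarizer separately, then give each transcript segment the speaker whose turn overlaps it the most. The sketch below assumes both tools return start and end times in seconds; the function names and data shapes are hypothetical.

```python
def assign_speakers(segments, turns):
    """Label each transcript segment with the speaker turn that overlaps it most."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn in turns:
            # Overlap is negative when the two intervals do not intersect
            overlap = min(seg["end"], turn["end"]) - max(seg["start"], turn["start"])
            if overlap > best_overlap:
                best_speaker, best_overlap = turn["speaker"], overlap
        labeled.append({**seg, "speaker": best_speaker})
    return labeled


def format_clock(seconds: float) -> str:
    """Format seconds as HH:MM:SS."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


segments = [
    {"start": 5.0, "end": 11.0, "text": "Welcome everyone to today's meeting."},
    {"start": 12.0, "end": 17.0, "text": "Thanks for having us."},
]
turns = [
    {"start": 4.5, "end": 11.5, "speaker": "SPEAKER_1"},
    {"start": 11.5, "end": 17.5, "speaker": "SPEAKER_2"},
]

labeled = assign_speakers(segments, turns)
for seg in labeled:
    print(f"[{format_clock(seg['start'])}] {seg['speaker']}: {seg['text']}")
```

Segments that overlap no turn keep the `UNKNOWN` label, which makes them easy to flag for review.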

Subtitle Generation

SRT Format

subtitle_config:
  format: srt
  
  timing:
    max_duration: 7  # seconds per subtitle
    min_gap: 0.1     # seconds between subtitles
    chars_per_line: 42
    max_lines: 2
    
  style:
    case: sentence  # sentence, upper, lower
    numbers: words  # words, digits
    
  example_output: |
    1
    00:00:05,000 --> 00:00:08,500
    Welcome to today's presentation
    about transcription automation.
    
    2
    00:00:09,000 --> 00:00:12,000
    Let me start by explaining
    the basic concepts.
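
The SRT layout shown above can be generated from timed cues with a few lines of code. This is a sketch under the assumption that cues arrive as dictionaries with `start`, `end` (in seconds), and `text`; the function names are hypothetical. It wraps text at `chars_per_line` but does not enforce `max_lines` or `max_duration`, which would require splitting long cues.

```python
import textwrap


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues, chars_per_line: int = 42) -> str:
    """Render cues as numbered SRT blocks separated by blank lines."""
    blocks = []
    for number, cue in enumerate(cues, start=1):
        lines = textwrap.wrap(cue["text"], width=chars_per_line)
        timing = f"{srt_timestamp(cue['start'])} --> {srt_timestamp(cue['end'])}"
        blocks.append("\n".join([str(number), timing, *lines]))
    return "\n\n".join(blocks) + "\n"


cues = [
    {"start": 5.0, "end": 8.5, "text": "Welcome to today's presentation about transcription automation."},
    {"start": 9.0, "end": 12.0, "text": "Let me start by explaining the basic concepts."},
]
srt_text = to_srt(cues)
print(srt_text)
```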

VTT Format

vtt_config:
  format: vtt
  
  features:
    cue_settings: true
    styling: true
    
  example_output: |
    WEBVTT
    
    00:00:05.000 --> 00:00:08.500 align:center
    Welcome to today's presentation
    about transcription automation.
    
    00:00:09.000 --> 00:00:12.000 align:center
    <v Speaker 1>Let me start by explaining
    the basic concepts.
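
WebVTT differs from SRT mainly in its `WEBVTT` header and in using a period instead of a comma before the milliseconds. A minimal conversion sketch follows; the function name `srt_to_vtt` is hypothetical, and cue settings such as `align:center` or `<v>` voice tags are not added. The SRT cue numbers are kept, since WebVTT permits cue identifiers.

```python
import re

# Matches the HH:MM:SS,mmm timestamps used in SRT timing lines
SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT by adding the header and fixing timestamps."""
    converted = []
    for line in srt_text.strip().splitlines():
        if "-->" in line:
            # Only timing lines are rewritten, so commas in the cue text are kept
            line = SRT_TIME.sub(r"\1.\2", line)
        converted.append(line)
    return "WEBVTT\n\n" + "\n".join(converted) + "\n"


srt_example = """1
00:00:05,000 --> 00:00:08,500
Welcome to today's presentation,
about transcription automation.
"""
vtt_text = srt_to_vtt(srt_example)
print(vtt_text)
```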

Integration Workflows

Zoom Integration

zoom_transcription:
  trigger:
    event: recording_completed
    
  workflow:
    - step: download_recording
      source: zoom_cloud
      
    - step: transcribe
      engine: whisper
      language: auto
      
    - step: diarize
      identify_speakers: true
      
    - step: generate_notes
      template: meeting_notes
      include_summary: true
      extract_action_items: true
      
    - step: distribute
      destinations:
        - notion_page
        - slack_channel
        - email_attendees
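
A workflow definition like the one above can be executed by a small dispatcher that looks up a handler for each `step` name and threads a shared context through the steps. The sketch below is illustrative only: `run_workflow` and the stub handlers are hypothetical, and real handlers would call the Zoom API, an STT engine, and so on.

```python
from typing import Callable, Dict, List

# A handler takes the shared context and the step's options and returns the updated context.
Handler = Callable[[dict, dict], dict]


def run_workflow(steps: List[dict], handlers: Dict[str, Handler]) -> dict:
    """Run each step in order, passing a shared context dict from step to step."""
    context: dict = {}
    for step in steps:
        name = step["step"]
        if name not in handlers:
            raise KeyError(f"No handler registered for step '{name}'")
        options = {key: value for key, value in step.items() if key != "step"}
        context = handlers[name](context, options)
    return context


# Hypothetical stub handlers that only record what a real step would produce
def download_recording(context: dict, options: dict) -> dict:
    return {**context, "audio_path": f"{options['source']}/recording.m4a"}


def transcribe(context: dict, options: dict) -> dict:
    return {**context, "engine": options["engine"], "text": "(transcript placeholder)"}


steps = [
    {"step": "download_recording", "source": "zoom_cloud"},
    {"step": "transcribe", "engine": "whisper", "language": "auto"},
]
result = run_workflow(steps, {"download_recording": download_recording, "transcribe": transcribe})
print(result)
```

Failing fast on an unregistered step name catches typos in the workflow file before any recording is downloaded.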

YouTube Integration

youtube_subtitles:
  trigger:
    event: video_uploaded
    
  workflow:
    - step: download_audio
      source: youtube_video
      
    - step: transcribe
      engine: whisper
      task: transcribe
      
    - step: generate_subtitles
      formats: [srt, vtt]
      
    - step: translate
      target_languages: [es, zh, ja, de, fr]
      
    - step: upload_subtitles
      destination: youtube
      as_cc: true

Podcast Processing

podcast_workflow:
  input:
    source: rss_feed
    format: audio/mpeg  # MIME type for MP3
    
  processing:
    - transcribe:
        engine: whisper
        model: large
        
    - generate_chapters:
        detect_topics: true
        min_duration: 60  # seconds
        
    - create_show_notes:
        summarize: true
        extract_links: true
        highlight_quotes: true
        
    - create_searchable_index:
        full_text: true
        timestamps: true
        
  output:
    - transcript_txt
    - chapters_json
    - show_notes_md
    - search_index
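
The `create_searchable_index` step with `full_text` and `timestamps` enabled could be backed by a simple inverted index that maps each word to the start times of the segments containing it. This is a sketch, assuming segments are dictionaries with `start` (seconds) and `text`; `build_search_index` is a hypothetical name.

```python
import re
from collections import defaultdict


def build_search_index(segments):
    """Map each lowercased word to the start times of segments that contain it."""
    index = defaultdict(list)
    for seg in segments:
        # A set ensures a word repeated within one segment is recorded once
        for word in set(re.findall(r"[a-z0-9']+", seg["text"].lower())):
            index[word].append(seg["start"])
    return dict(index)


episode_segments = [
    {"start": 0.0, "text": "Welcome to the show about Whisper."},
    {"start": 95.0, "text": "Whisper handles noisy audio well."},
]
search_index = build_search_index(episode_segments)
print(search_index["whisper"])
```

The tokenizer here only handles ASCII letters and digits; other languages would need a different word pattern.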

Language Support

Multi-Language Transcription

multilingual:
  auto_detect: true
  
  supported_languages:
    - code: en
      name: English
      model: large
      
    - code: zh
      name: Chinese
      model: large
      
    - code: es
      name: Spanish
      model: large
      
    - code: ja
      name: Japanese
      model: medium
      
  translation:
    enabled: true
    target: en
    preserve_original: true

Code-Switching

code_switching:
  enabled: true
  primary_language: en
  secondary_languages: [zh, es]
  
  output: |
    [00:01:23] The next topic is about 人工智能,
    which has been muy importante in recent years.
    
  handling:
    detect_language_per_segment: true
    tag_language_switches: true
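
As a rough illustration of `tag_language_switches`, the sketch below marks runs of Chinese characters by Unicode range. This is a script-based heuristic, not language identification: it cannot tell Spanish from English, since both use Latin script, and a real system would need a language-ID model for that. The `<zh>` tag format and the function name are hypothetical.

```python
import re

# CJK Unified Ideographs block; covers most common Chinese characters
HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")


def tag_han_runs(text: str, tag: str = "zh") -> str:
    """Wrap each run of Han characters in a language tag."""
    return HAN_RUN.sub(lambda match: f"<{tag}>{match.group(0)}</{tag}>", text)


tagged = tag_han_runs(
    "The next topic is about 人工智能, which has been muy importante in recent years."
)
print(tagged)
```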

Quality Enhancement

Post-Processing

post_processing:
  text_cleanup:
    - remove_filler_words: ["um", "uh", "like"]
    - fix_common_errors: true
    - normalize_numbers: true
    
  formatting:
    - add_punctuation: true
    - capitalize_sentences: true
    - paragraph_breaks: true
    
  speaker_attribution:
    - merge_short_segments: true
    - min_segment_duration: 1.0
    
  output_enhancement:
    - add_timestamps: true
    - highlight_keywords: true
    - generate_summary: true
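
A few of the options above can be sketched with regular expressions and a single pass over the segments. The function names are hypothetical. Note that the default filler list here leaves out "like", because removing it blindly would also damage sentences such as "I like it".

```python
import re


def remove_fillers(text: str, fillers=("um", "uh")) -> str:
    """Delete standalone filler words (and a trailing comma) from the text."""
    pattern = r"\b(?:" + "|".join(map(re.escape, fillers)) + r")\b,?\s*"
    cleaned = re.sub(pattern, "", text, flags=re.IGNORECASE)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


def capitalize_sentences(text: str) -> str:
    """Uppercase the first letter of the text and of each sentence."""
    return re.sub(r"(^|[.!?]\s+)([a-z])", lambda m: m.group(1) + m.group(2).upper(), text)


def merge_short_segments(segments, min_duration: float = 1.0):
    """Fold a short segment into the previous one when the speaker is the same."""
    merged = []
    for seg in segments:
        previous = merged[-1] if merged else None
        is_short = (seg["end"] - seg["start"]) < min_duration
        if previous and previous["speaker"] == seg["speaker"] and is_short:
            previous["end"] = seg["end"]
            previous["text"] += " " + seg["text"]
        else:
            merged.append(dict(seg))
    return merged


raw = "um so we should uh ship it on friday. uh that is all."
clean = capitalize_sentences(remove_fillers(raw))
print(clean)

merged = merge_short_segments([
    {"speaker": "A", "start": 0.0, "end": 3.0, "text": "Let's start."},
    {"speaker": "A", "start": 3.0, "end": 3.4, "text": "Okay."},
    {"speaker": "B", "start": 3.5, "end": 6.0, "text": "Sure."},
])
print(merged)
```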

Accuracy Metrics

TRANSCRIPTION QUALITY REPORT
═══════════════════════════════════════

File: meeting_2024_01_15.mp3
Duration: 45:32
Engine: Whisper Large

METRICS:
Word Error Rate (WER):  4.2%
Character Error Rate:   2.8%
Confidence Score:       0.94

SPEAKER DIARIZATION:
Speakers Detected: 4
Diarization Accuracy: 91%

PROCESSING TIME:
Total: 8m 23s
Real-time Factor: 0.18x

DETECTED ISSUES:
• Low confidence at 12:34 (background noise)
• Overlapping speech at 23:45
• Unknown speaker at 34:12
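
Word Error Rate is the word-level edit distance (substitutions, insertions, and deletions) between a reference transcript and the engine output, divided by the number of reference words. The real-time factor is processing time divided by audio duration. The sketch below computes both; the function names are hypothetical, and the text is compared as-is, so case and punctuation should be normalized beforehand.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # previous[j] holds the distance between the processed reference words and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time relative to audio length; below 1.0 is faster than real time."""
    return processing_seconds / audio_seconds


wer = word_error_rate("welcome to the meeting", "welcome the meeting")
rtf = real_time_factor(8 * 60 + 23, 45 * 60 + 32)  # figures from the report above
print(f"WER: {wer:.1%}, RTF: {rtf:.2f}x")
```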

API Examples

OpenAI Whisper

from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable
client = OpenAI()

# Transcribe audio with word- and segment-level timestamps
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["word", "segment"],
    )

# Access results
for segment in transcript.segments:
    print(f"[{segment.start:.2f}] {segment.text}")

AssemblyAI

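import assemblyai as aai

# Set your API key before creating a transcriber
aai.settings.api_key = "YOUR_API_KEY"

transcriber = aai.Transcriber()

# Enable speaker labels, automatic chapters, and entity detection
config = aai.TranscriptionConfig(
    speaker_labels=True,
    auto_chapters=True,
    entity_detection=True
)

transcript = transcriber.transcribe(
    "https://example.com/meeting.mp3",
    config=config
)

# Stop early if the transcription failed
if transcript.status == aai.TranscriptStatus.error:
    raise RuntimeError(transcript.error)

# Print each utterance with its speaker label
for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")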

Best Practices

  1. Quality Audio: Clean input yields better output
  2. Choose the Right Model: Balance speed against accuracy
  3. Use Diarization: Identify speakers clearly
  4. Post-Process: Clean up automated output
  5. Verify Critical Content: Have a human review important passages
  6. Consider Privacy: Handle sensitive content with care
  7. Store Efficiently: Compress and index transcripts
  8. Provide Context: Vocabulary hints improve recognition