meeting-transcription
Meeting Transcription
This skill enables an AI agent to process meeting audio recordings into structured, actionable documents. The agent handles the full pipeline from raw audio input through speaker diarization, transcription, and intelligent summarization. The output includes a timestamped transcript with speaker labels, a concise summary of key discussion points, a list of decisions made, and clearly assigned action items with owners and deadlines.
Workflow
- Ingest and validate the audio input. Accept the meeting audio file and verify it is in a supported format: MP3, WAV, M4A, FLAC, OGG, or WebM. Check the file size, duration, and channel count (mono vs. stereo). If the audio is in a non-standard format, convert it to 16 kHz mono WAV using FFmpeg or a similar preprocessing tool. Log the file metadata (duration, sample rate, codec) for downstream reference.
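The validation and conversion step above can be sketched as follows. This is a minimal illustration, not a specific library's API: the function names are hypothetical, the supported-format set mirrors the list above, and the FFmpeg flags (`-ar 16000 -ac 1`) are the standard options for resampling to 16 kHz mono.

```python
import os

# Formats accepted by the ingest step (mirrors the list above).
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".webm"}

def is_supported(path):
    """Return True if the file extension is in the supported set."""
    return os.path.splitext(path)[1].lower() in SUPPORTED_EXTENSIONS

def ffmpeg_convert_command(src, dst="converted.wav"):
    """Build an FFmpeg command that converts any input to 16 kHz mono WAV.

    -ar 16000 sets the sample rate; -ac 1 downmixes to a single channel.
    """
    return ["ffmpeg", "-i", src, "-ar", "16000", "-ac", "1", dst]
```

Running the returned command via a subprocess (and logging the probed metadata alongside it) keeps the preprocessing reproducible.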
- Preprocess the audio for quality. Apply noise reduction to suppress background hum, keyboard clicks, and room echo. Normalize audio levels across the recording so that quiet speakers are boosted and loud segments are attenuated. If the recording has multiple channels and each channel maps to a known speaker (e.g., a stereo podcast), split the channels and process them separately. Flag sections with a very low signal-to-noise ratio as potentially unreliable.
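The level-normalization part of this step amounts to measuring each segment's RMS level in dBFS and computing the gain that brings it to a target level. A minimal sketch, assuming float samples in the range -1..1 and a hypothetical target of -20 dBFS (both assumptions, not values specified by the skill):

```python
import math

def rms_dbfs(samples):
    """RMS level of float samples (range -1..1), expressed in dBFS."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")

def normalization_gain(samples, target_dbfs=-20.0):
    """Linear gain that brings the segment's RMS level to the target.

    Segments above the target get a gain < 1 (attenuated); quiet
    segments get a gain > 1 (boosted), matching the step above.
    """
    level = rms_dbfs(samples)
    if level == float("-inf"):
        return 1.0  # silent segment: leave untouched, flag it instead
    return 10 ** ((target_dbfs - level) / 20)
```

A segment already at -20 dBFS gets a gain of 1.0; one at -40 dBFS gets a 10x boost.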
- Perform speaker diarization. Identify and label distinct speakers throughout the recording. Use voiceprint clustering to distinguish speakers even when they interrupt each other or speak in quick succession. Assign temporary labels (Speaker 1, Speaker 2, etc.) by default, and allow the user to provide a name mapping either before or after processing. Handle overlapping speech by attributing the segment to the dominant speaker and noting the overlap.
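The clustering idea behind diarization can be sketched with a deliberately simple greedy scheme: each segment's voiceprint embedding joins the first existing speaker whose reference embedding it resembles (by cosine similarity), otherwise it starts a new speaker. Real diarization pipelines use more robust clustering; the threshold of 0.8 and the embeddings themselves are illustrative assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def label_speakers(embeddings, threshold=0.8):
    """Greedy clustering: a segment joins the first matching speaker,
    otherwise opens a new one. Returns 'Speaker N' labels per segment."""
    references, labels = [], []
    for emb in embeddings:
        for i, ref in enumerate(references):
            if cosine(emb, ref) >= threshold:
                labels.append(f"Speaker {i + 1}")
                break
        else:
            references.append(emb)
            labels.append(f"Speaker {len(references)}")
    return labels
```

The temporary "Speaker N" labels produced here are exactly what the optional name mapping later replaces.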
- Transcribe the audio to text. Run the preprocessed, diarized audio through a speech-to-text engine (e.g., Whisper, Deepgram, Google Speech-to-Text). Produce a word-level or segment-level transcript with timestamps. Apply punctuation restoration and capitalization correction. For multi-language meetings, detect language switches and transcribe each segment in its original language, optionally providing inline translations.
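Whatever engine is used, its output reduces to segments with start times, speaker labels, and text. A small sketch of turning those into the timestamped transcript lines this skill delivers (the segment tuple shape and line format are assumptions, not any engine's native output):

```python
def fmt_ts(seconds):
    """Render a time offset in seconds as HH:MM:SS."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"

def transcript_lines(segments):
    """segments: iterable of (start_seconds, speaker_label, text) tuples.

    Produces one '[HH:MM:SS] Speaker: text' line per segment.
    """
    return [f"[{fmt_ts(start)}] {speaker}: {text}"
            for start, speaker, text in segments]
```

The same segment structure carries through to the summary step, so timestamps survive into the final document.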
- Generate the structured summary. Analyze the full transcript to extract key discussion topics, decisions made, open questions, and action items. Group related discussion segments into thematic sections. For each action item, identify the owner (by speaker label or name), the task description, and any mentioned deadline. Produce a summary document with clearly delineated sections: Overview, Key Discussion Points, Decisions, Action Items, and Follow-ups.
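In practice an LLM extracts the action items, but the owner/task/deadline structure can be illustrated with a naive pattern match over transcript lines of the form "Speaker: … will &lt;task&gt; by &lt;deadline&gt;". This regex heuristic is purely a sketch of the target data shape, not the extraction method itself:

```python
import re

# Naive heuristic: "<owner>: ... will <task> [by <deadline>]."
ACTION_RE = re.compile(
    r"^(?P<owner>[^:]+):\s.*?\bwill\s+"
    r"(?P<task>.+?)(?:\s+by\s+(?P<deadline>\w+))?[.!]?$"
)

def extract_action_item(line):
    """Return {'owner', 'task', 'deadline'} for a matching line, else None.

    'deadline' is None when the line mentions no "by <when>" clause.
    """
    m = ACTION_RE.match(line)
    if not m:
        return None
    return {"owner": m.group("owner").strip(),
            "task": m.group("task").strip(),
            "deadline": m.group("deadline")}
```

The extracted dicts map directly onto the Action Items section of the summary document.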
- Format and deliver the output. Produce the final output in the requested format: Markdown, JSON, or plain text. Include both the full timestamped transcript and the structured summary as separate sections or files. If calendar integration is enabled, cross-reference the meeting with calendar event data to auto-populate the meeting title, attendee list, and agenda in the output header.
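For the JSON output format, delivery is just assembling the pieces from the earlier steps into one document. A minimal sketch with an assumed schema (the key names here are illustrative, not prescribed by the skill):

```python
import json

def build_output(title, attendees, summary, transcript):
    """Assemble the deliverable as a single JSON document.

    title/attendees would come from calendar integration when enabled;
    summary is a dict of the summary sections; transcript is the list
    of timestamped lines.
    """
    doc = {
        "meeting": {"title": title, "attendees": attendees},
        "summary": summary,
        "transcript": transcript,
    }
    return json.dumps(doc, indent=2)
```

A Markdown renderer would walk the same dict, so the structured form is worth building even when plain text is requested.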