transcript-pipeline
Transcript Pipeline Skill
Run a deterministic, auditable transcript-to-tutorial workflow with optional resource enrichment.
Purpose
Use this skill to convert raw class captions into high-quality study notes while preserving accountability through the segment ledger and validation artifacts.
Use scripts for deterministic work. Use chat/stage prompts for language-heavy transformation.
Core Contract
- Keep stage order: ingest -> refine -> synthesize -> enhance -> validate -> publish.
- Run deterministic gates with scripts, never with LLM self-certification.
- Preserve traceability in .pipeline/* artifacts.
- Keep learner-facing notes readable and sanitized.
- Treat the validation status as the source of truth for PASS/FAIL.
Scripts
Use these scripts from scripts/:
- ingest_zoom_captions.py - deterministic ingestion and segment ledger creation
- run_chat_pipeline.py - guided orchestration for stage handoffs and validation
- validate_coverage.py - hard-gate coverage validation
- publish_tutorial_notes.py - learner-facing file naming and sanitization
- merge_chunks.py - merge chunk outputs for large transcripts
- run_colab_notebook_pipeline.py - AI/ML Colab appendix and code explainer pipeline
- update_ai_notes_with_resources_and_colab.py - AI/ML notes enrichment utility
- resource_enrichment.py - authenticated enrichment for Notion/Canva/Drive resources
Stage Workflow
Stage 0: Ingest (Deterministic)
Run:
python scripts/ingest_zoom_captions.py "<transcript_or_session_path>"
Required outputs:
- .pipeline/segment_ledger.jsonl
- .pipeline/segment_manifest.jsonl
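Both outputs are JSON Lines files (one JSON object per line). A minimal sketch of loading the ledger and sanity-checking it before Stage 1, assuming each record carries a unique "segment_id" field; that field name is hypothetical, and the real schema is whatever ingest_zoom_captions.py writes.

```python
import json
from pathlib import Path


def read_segment_ledger(path):
    """Load a JSON Lines ledger: one JSON object per non-blank line.

    Assumes (hypothetically) that every record has a unique "segment_id".
    Raises ValueError on the first malformed record.
    """
    records = []
    seen = set()
    with Path(path).open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines, e.g. a trailing newline
            record = json.loads(line)
            if not isinstance(record, dict):
                raise ValueError(f"{path}:{lineno}: expected a JSON object")
            seg_id = record.get("segment_id")
            if seg_id is None:
                raise ValueError(f"{path}:{lineno}: missing segment_id")
            if seg_id in seen:
                raise ValueError(f"{path}:{lineno}: duplicate segment_id {seg_id!r}")
            seen.add(seg_id)
            records.append(record)
    return records
```

Failing on the first bad record keeps a broken ingest from silently reaching the chat stages.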
Stage 1: Refine (Chat Stage)
Load references/stage1-refine.md.
Produce:
- .pipeline/refined_transcript.md
- .pipeline/topic_inventory.json
- .pipeline/corrections_log.csv
- .pipeline/uncertainty_report.json
Stage 2: Synthesize (Chat Stage)
Load references/stage2-synthesize.md.
Produce:
- .pipeline/structured_notes.md
- .pipeline/coverage_matrix.json
Stage 3: Enhance (Chat Stage)
Load:
- references/stage3-enhance.md
- references/tutorial-tech-bar-raiser.md
Produce:
- .pipeline/enhanced_notes.md
- final_notes.md
- bootcamp_index.md
Stage 4: Validate (Deterministic)
Run:
python scripts/validate_coverage.py --pipeline-dir .pipeline
Validation guidance: references/stage4-validate.md.
Hard gates:
- Segment coverage accountability
- Uncertainty retention
- No orphan claims
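The first and third gates can be illustrated on in-memory data. This is not validate_coverage.py: the shapes it assumes (a coverage matrix mapping each topic to segment ids, and a claim map from claim text to supporting segment ids) are hypothetical, and it treats any segment that no topic references as a failure, whereas the real validator may accept segments that are explicitly excluded. Uncertainty retention is not modelled.

```python
def check_gates(ledger_ids, coverage_matrix, claims):
    """Evaluate two hard gates on hypothetical in-memory structures.

    ledger_ids: segment ids from segment_ledger.jsonl
    coverage_matrix: topic -> list of segment ids that topic covers
    claims: claim text -> list of supporting segment ids
    Returns gate name -> sorted list of offending items (empty = pass).
    """
    ledger = set(ledger_ids)
    # Every segment referenced by at least one topic counts as accounted for.
    covered = {seg for segs in coverage_matrix.values() for seg in segs}
    uncovered = sorted(ledger - covered)
    # A claim is an orphan if it cites no segment or cites one not in the ledger.
    orphans = sorted(
        claim
        for claim, segs in claims.items()
        if not segs or not set(segs) <= ledger
    )
    return {"uncovered_segments": uncovered, "orphan_claims": orphans}
```

A run passes both gates only when every list in the result is empty, for example passed = not any(result.values()).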
Stage 5: Publish
Run:
python scripts/publish_tutorial_notes.py --root "<sessions_root>" --session-dir "<session_dir>"
Result:
- Published tutorial filename in canonical format
- Learner-safe note without noisy source tags
- Updated course index links
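The canonical learner-facing filename appears in the outputs checklist as <Domain> Class <NN> [DD-MM-YYYY] - <Topic>.md. A sketch of building it, assuming NN is a zero-padded class number and that characters illegal in Windows filenames are replaced with spaces; both are assumptions, not documented behaviour of publish_tutorial_notes.py.

```python
import re
from datetime import date

# Characters illegal in Windows filenames. Replacing them with spaces is an
# assumption for illustration, not necessarily what the publish script does.
_ILLEGAL = re.compile(r'[\\/:*?"<>|]+')


def canonical_note_name(domain: str, class_no: int, when: date, topic: str) -> str:
    """Build '<Domain> Class <NN> [DD-MM-YYYY] - <Topic>.md'."""
    clean_topic = _ILLEGAL.sub(" ", topic)
    clean_topic = re.sub(r"\s+", " ", clean_topic).strip()  # collapse runs of spaces
    return f"{domain} Class {class_no:02d} [{when:%d-%m-%Y}] - {clean_topic}.md"
```

For example, canonical_note_name("WebDev", 7, date(2024, 3, 5), "Closures: Scope / Lifetime") returns "WebDev Class 07 [05-03-2024] - Closures Scope Lifetime.md".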
One-Command Guided Mode
Use the guided runner for chat-window workflows:
python scripts/run_chat_pipeline.py run "<transcript_or_session_path>" --deep-pass
This enforces the required stage handoffs and, with --deep-pass, the deep quality gates.
Optional Resource Enrichment Stage
Run when class notes include external links (Notion/Canva/Drive):
python scripts/resource_enrichment.py --all-sessions
For a single session:
python scripts/resource_enrichment.py --session-dir "<session_dir>"
Auth options:
- Notion: NOTION_TOKEN_V2, NOTION_ACTIVE_USER
- Canva: RESOURCE_PLAYWRIGHT_STORAGE_STATE
Reference: references/resource-enrichment-authenticated-flow.md.
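Before a batch run it can help to see which credentials are absent. This pre-flight sketch checks only the environment variables named in the auth options; Drive is omitted because its credentials are not listed, and the idea that a provider with missing variables will end as blocked or fallback is an assumption.

```python
import os

# Variables named in the auth options above. Drive is left out because
# no credentials are listed for it.
AUTH_ENV = {
    "notion": ("NOTION_TOKEN_V2", "NOTION_ACTIVE_USER"),
    "canva": ("RESOURCE_PLAYWRIGHT_STORAGE_STATE",),
}


def missing_auth(env=None):
    """Return provider -> list of unset or empty variables (empty list = configured)."""
    env = os.environ if env is None else env
    return {
        provider: [name for name in names if not env.get(name)]
        for provider, names in AUTH_ENV.items()
    }
```

An empty list only means the variables are set; it does not prove the token or storage state is still valid.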
Optional AI/ML Colab Enrichment
Run for Colab-backed AI/ML classes:
python scripts/run_colab_notebook_pipeline.py
Reference: references/colab-notebook-explainer-pipeline.md.
Large Transcript Handling
If the input is too large to process comfortably in a single context window:
- Run Stage 1 by chunks.
- Merge chunk artifacts:
python scripts/merge_chunks.py --chunk-dirs "<chunkA/.pipeline>" "<chunkB/.pipeline>" --output-dir "<session/.pipeline>"
- Continue Stage 2 onward on merged artifacts.
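The exact merge rules live in merge_chunks.py. As an illustration of the idea for JSON Lines artifacts, this sketch concatenates one file per chunk in the order given and skips records whose segment_id (a hypothetical field) already appeared in an earlier chunk, assuming adjacent chunks may overlap at their edges.

```python
import json
from pathlib import Path


def merge_jsonl(chunk_dirs, output_dir, name="segment_ledger.jsonl"):
    """Concatenate <chunk>/<name> files in order into <output_dir>/<name>.

    Records whose hypothetical "segment_id" was already seen in an earlier
    chunk are skipped. Returns the number of records written.
    """
    seen = set()
    merged = []
    for chunk in chunk_dirs:
        with (Path(chunk) / name).open(encoding="utf-8") as fh:
            for line in fh:
                if not line.strip():
                    continue
                record = json.loads(line)
                if record["segment_id"] in seen:
                    continue  # assumed overlap between adjacent chunks
                seen.add(record["segment_id"])
                merged.append(record)
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with (out_dir / name).open("w", encoding="utf-8") as fh:
        for record in merged:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(merged)
```

Markdown artifacts such as refined_transcript.md would need a different strategy; this only covers line-oriented JSON files.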
Required Outputs Checklist
Learner-facing:
- final_notes.md
- <Domain> Class <NN> [DD-MM-YYYY] - <Topic>.md
- bootcamp_index.md
Pipeline/audit:
- .pipeline/segment_ledger.jsonl
- .pipeline/segment_manifest.jsonl
- .pipeline/refined_transcript.md
- .pipeline/topic_inventory.json
- .pipeline/corrections_log.csv
- .pipeline/uncertainty_report.json
- .pipeline/structured_notes.md
- .pipeline/coverage_matrix.json
- .pipeline/enhanced_notes.md
- .pipeline/validation_report.md
- .pipeline/exceptions.json (on failure)
Quality gates:
- .pipeline/deep_pass_report.md (with --deep-pass)
- .pipeline/deep_pass_exceptions.json (with --deep-pass)
Resource enrichment (optional):
- .resources/resource_enrichment_report.json
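A sketch that lists missing pipeline/audit artifacts by relative path, assuming .pipeline/ sits inside the session directory. The learner-facing files are left out because their locations are not specified, and exceptions.json is left out because it only appears on failure.

```python
from pathlib import Path

# Pipeline/audit artifacts from the checklist, relative to the session dir.
REQUIRED_AUDIT = [
    ".pipeline/segment_ledger.jsonl",
    ".pipeline/segment_manifest.jsonl",
    ".pipeline/refined_transcript.md",
    ".pipeline/topic_inventory.json",
    ".pipeline/corrections_log.csv",
    ".pipeline/uncertainty_report.json",
    ".pipeline/structured_notes.md",
    ".pipeline/coverage_matrix.json",
    ".pipeline/enhanced_notes.md",
    ".pipeline/validation_report.md",
]
# Quality-gate artifacts expected only when --deep-pass is used.
DEEP_PASS_GATES = [
    ".pipeline/deep_pass_report.md",
    ".pipeline/deep_pass_exceptions.json",
]


def missing_outputs(session_dir, deep_pass=False):
    """Return the relative paths of expected artifacts that are not regular files."""
    root = Path(session_dir)
    expected = REQUIRED_AUDIT + (DEEP_PASS_GATES if deep_pass else [])
    return [rel for rel in expected if not (root / rel).is_file()]
```

Printing each returned path and exiting nonzero when the list is non-empty is one way to follow the fail-fast and report-by-path rules.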
Execution Rules
- Fail fast on missing required artifacts.
- Report missing outputs explicitly by file path.
- Retry only from the earliest failing stage.
- Keep resource extraction status explicit (success/fallback/blocked).
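The retry rule follows from the stage order in the Core Contract: later stages build on earlier artifacts, so once a stage fails, it and every stage after it must run again. A minimal sketch, where the stage names come from the Core Contract and the results dictionary (stage name to pass/fail) is a hypothetical input format.

```python
# Stage order from the Core Contract.
STAGES = ["ingest", "refine", "synthesize", "enhance", "validate", "publish"]


def rerun_plan(results):
    """Given stage -> passed (bool) for stages that ran, return the stages to rerun.

    Everything from the earliest failing stage onward is rerun.
    Returns [] when no stage failed.
    """
    unknown = set(results) - set(STAGES)
    if unknown:
        raise ValueError(f"unknown stages: {sorted(unknown)}")
    for i, stage in enumerate(STAGES):
        if results.get(stage) is False:
            return STAGES[i:]
    return []
```

Rerunning from the earliest failure, rather than only the failed stage, avoids validating or publishing stale downstream artifacts.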