Supadata Skill

One API for YouTube transcripts, search, channel ingestion, structured extraction, and metadata across YouTube + social video platforms.

Base URL: https://api.supadata.ai/v1
Auth header: x-api-key: $SUPADATA_API_KEY
Env var: SUPADATA_API_KEY


When to Use Which Endpoint

| Goal | Endpoint | Cost |
| --- | --- | --- |
| Get transcript from a YouTube/social URL | /transcript or /youtube/transcript | 1 credit (native) / 2 credits/min (AI) |
| Transcribe many videos at once | /youtube/transcript-batch | 1 credit/video (native) |
| Search YouTube by keyword | /youtube/search | 1 credit/page |
| List all video IDs from a channel | /youtube/channel-videos | 1 credit |
| Get video/channel/playlist metadata | /youtube/video, /youtube/channel, /metadata | 1 credit |
| Extract structured data from a tutorial (visual content) | /extract | varies (AI vision) |
| Scrape a web page to Markdown | /web/scrape | 1 credit |

Key decision: Transcript vs Extract

  • Use Transcript when content is mostly spoken / narrated. Cheaper, faster.
  • Use Extract when the video is a tutorial/demo where important content is shown on screen but NOT spoken aloud (e.g. Midjourney prompts typed into UI, ComfyUI node graphs, on-screen settings panels, code shown without narration). Extract runs a vision model on the video frames.

1. YouTube Transcript

Single video (YouTube-specific, most common)

curl -X GET "https://api.supadata.ai/v1/youtube/transcript?url=https://youtu.be/VIDEO_ID&text=true&lang=en" \
  -H "x-api-key: $SUPADATA_API_KEY"

Parameters:

| Param | Values | Notes |
| --- | --- | --- |
| url | YouTube URL | Required |
| text | true / false | true = plain string, false = timestamped chunks |
| lang | ISO 639-1 (e.g. en) | Optional, defaults to first available |

Response (text=true):

{
  "content": "Full transcript as plain text...",
  "lang": "en",
  "availableLangs": ["en", "es"]
}

Response (text=false):

{
  "content": [
    { "text": "Hello everyone", "offset": 0, "duration": 2500, "lang": "en" }
  ],
  "lang": "en",
  "availableLangs": ["en"]
}
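
With text=false, each chunk carries its own offset and duration. The sketch below turns such a response into readable timestamped lines. It assumes offset is in milliseconds, which the sample values suggest but the source does not state. format_chunks is a hypothetical helper, and the second chunk in the sample is made up for illustration.

```python
def format_chunks(response: dict) -> list[str]:
    """Render a text=false transcript response as '[MM:SS] text' lines."""
    lines = []
    for chunk in response["content"]:
        # Assumption: offset is expressed in milliseconds.
        total_seconds = chunk["offset"] // 1000
        minutes, seconds = divmod(total_seconds, 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {chunk['text']}")
    return lines


# Illustrative data shaped like the response above.
sample = {
    "content": [
        {"text": "Hello everyone", "offset": 0, "duration": 2500, "lang": "en"},
        {"text": "Welcome back", "offset": 62500, "duration": 1800, "lang": "en"},
    ],
    "lang": "en",
    "availableLangs": ["en"],
}

print("\n".join(format_chunks(sample)))
```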

Cross-platform transcript (YouTube, TikTok, Instagram, X, Facebook, file URL)

curl -X GET "https://api.supadata.ai/v1/transcript?url=URL_HERE&text=true&mode=auto" \
  -H "x-api-key: $SUPADATA_API_KEY"

mode values:

  • native — fetch existing captions only (cheapest, 1 credit, no AI). Use this first.
  • auto — try native, fall back to AI speech-to-text if no captions exist (default)
  • generate — always AI speech-to-text (2 credits/min); use only when the content has no captions

Async handling: Large videos return HTTP 202 with a jobId. Poll with:

curl "https://api.supadata.ai/v1/transcript/JOB_ID" -H "x-api-key: $SUPADATA_API_KEY"
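
A polling loop for the 202 case could look like the sketch below. The source does not show the job response, so the helper takes the fetch call and the completion check as arguments instead of hard-coding field names. poll_job, the "status" field, and its values are all assumptions made for this example.

```python
import time
from typing import Any, Callable


def poll_job(
    fetch: Callable[[], dict[str, Any]],
    is_done: Callable[[dict[str, Any]], bool],
    interval_s: float = 2.0,
    max_attempts: int = 30,
) -> dict[str, Any]:
    """Call fetch() until is_done(result) is true or attempts run out."""
    for attempt in range(max_attempts):
        result = fetch()
        if is_done(result):
            return result
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            time.sleep(interval_s)
    raise TimeoutError(f"job not finished after {max_attempts} polls")


# Stand-in for GET /transcript/JOB_ID; the "status" values are assumed.
fake_responses = iter([
    {"status": "queued"},
    {"status": "active"},
    {"status": "completed", "content": "..."},
])
final = poll_job(
    fetch=lambda: next(fake_responses),
    is_done=lambda r: r.get("status") == "completed",
    interval_s=0,
)
print(final["status"])
```

In real use, fetch would wrap the GET request shown above and is_done would inspect whatever completion field the API actually returns.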

2. Batch Transcript (multiple videos)

Use for bulk channel ingestion or a list of URLs.

curl -X POST "https://api.supadata.ai/v1/youtube/transcript-batch" \
  -H "x-api-key: $SUPADATA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://youtu.be/VIDEO_ID_1",
      "https://youtu.be/VIDEO_ID_2"
    ],
    "text": true,
    "lang": "en"
  }'

Returns a batchId. Poll results:

curl "https://api.supadata.ai/v1/youtube/batch/BATCH_ID" \
  -H "x-api-key: $SUPADATA_API_KEY"

3. YouTube Search

Search YouTube and get structured results. For programmatic use this is cleaner than SerpApi: native sort/filter params, ISO dates, and integer view counts.

curl -X GET "https://api.supadata.ai/v1/youtube/search?query=AI+image+prompts&type=video&sortBy=views&uploadDate=month&duration=medium&limit=20" \
  -H "x-api-key: $SUPADATA_API_KEY"

Parameters:

| Param | Values | Notes |
| --- | --- | --- |
| query | string | Required |
| type | video, channel, playlist, movie, all | Default: all |
| sortBy | relevance, rating, date, views | Default: relevance |
| uploadDate | hour, today, week, month, year, all | Default: all |
| duration | short (<4min), medium (4–20min), long (>20min), all | Default: all |
| features | array: hd, subtitles, 4k, live, creative-commons, 360, hdr | Optional |
| limit | 1–5000 | Auto-paginates. Each page ~20 results = 1 credit. |
| nextPageToken | string | Manual pagination token from previous response |

Response:

{
  "query": "AI image prompts",
  "results": [
    {
      "type": "video",
      "id": "VIDEO_ID",
      "title": "Best Midjourney Prompts 2024",
      "description": "...",
      "thumbnail": "https://i.ytimg.com/vi/VIDEO_ID/hqdefault.jpg",
      "duration": 847,
      "viewCount": 234567,
      "uploadDate": "2024-11-15T00:00:00.000Z",
      "channel": {
        "id": "CHANNEL_ID",
        "name": "AI Creator Hub",
        "thumbnail": "https://..."
      }
    }
  ],
  "nextPageToken": "eyJ..."
}

Pagination cost note: limit=100 consumes about 5 credits (100 results at ~20 per page = 5 pages). Set limit deliberately for bulk research.
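
The same arithmetic as a small helper, useful for checking a limit before sending a request. This is a rough estimate only: the page size of 20 is the approximate figure quoted above, and estimate_search_credits is a hypothetical name.

```python
import math

RESULTS_PER_PAGE = 20  # approximate page size quoted in the parameter notes


def estimate_search_credits(limit: int, per_page: int = RESULTS_PER_PAGE) -> int:
    """Estimate credits for a search: one credit per page of results."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return math.ceil(limit / per_page)


for limit in (20, 100, 250, 5000):
    print(limit, estimate_search_credits(limit))
```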


4. Channel Video List

Get all video IDs from a YouTube channel for bulk ingestion.

curl -X GET "https://api.supadata.ai/v1/youtube/channel-videos?url=https://youtube.com/@CHANNEL_HANDLE" \
  -H "x-api-key: $SUPADATA_API_KEY"

Returns an array of video IDs. Feed them into the batch transcript endpoint.
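
One way to bridge the two endpoints is to turn the returned IDs into the request body used in the batch example. The sketch below assumes the channel response is a flat list of ID strings and reuses the urls, text, and lang fields from that example. build_batch_payload is a hypothetical helper.

```python
import json


def build_batch_payload(video_ids: list[str], lang: str = "en", text: bool = True) -> dict:
    """Build a batch-transcript request body from a list of video IDs."""
    # Drop duplicate IDs while keeping the original order.
    unique_ids = list(dict.fromkeys(video_ids))
    return {
        "urls": [f"https://youtu.be/{vid}" for vid in unique_ids],
        "text": text,
        "lang": lang,
    }


payload = build_batch_payload(["VIDEO_ID_1", "VIDEO_ID_2", "VIDEO_ID_1"])
print(json.dumps(payload, indent=2))
```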


5. Extract — Structured Data from Video (Vision + Audio)

Use when video content is visual — prompts shown on screen, UI demos, workflow screenshots, settings panels not narrated aloud.

curl -X POST "https://api.supadata.ai/v1/extract" \
  -H "x-api-key: $SUPADATA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.youtube.com/watch?v=VIDEO_ID",
    "prompt": "Extract all AI image prompts shown on screen. Include the exact text of each prompt, the tool or platform visible (Midjourney, Stable Diffusion, etc), and any parameter settings shown (aspect ratio, model, steps, etc).",
    "schema": {
      "type": "object",
      "properties": {
        "prompts": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "timestamp": { "type": "string" },
              "tool": { "type": "string" },
              "promptText": { "type": "string" },
              "parameters": { "type": "string" }
            },
            "required": ["promptText"]
          }
        }
      },
      "required": ["prompts"]
    }
  }'

Always returns an async jobId (HTTP 202). Poll with:

curl "https://api.supadata.ai/v1/extract/JOB_ID" -H "x-api-key: $SUPADATA_API_KEY"

Schema strategy:

  • Run with prompt only first → API auto-generates schema → reuse returned schema for consistency across future videos of same type
  • Provide both prompt + schema for maximum control

Pre-built schema examples for our use cases:

AI Image Prompt Extractor

{
  "type": "object",
  "properties": {
    "prompts": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "timestamp": { "type": "string" },
          "tool": { "type": "string", "description": "Midjourney, FLUX, Stable Diffusion, etc" },
          "promptText": { "type": "string" },
          "parameters": { "type": "string", "description": "e.g. --ar 16:9 --stylize 750" },
          "resultVisible": { "type": "boolean", "description": "Is the generated image shown?" }
        },
        "required": ["promptText"]
      }
    }
  },
  "required": ["prompts"]
}

Key Takeaways Extractor

{
  "type": "object",
  "properties": {
    "topic": { "type": "string" },
    "summary": { "type": "string" },
    "keyTakeaways": { "type": "array", "items": { "type": "string" } },
    "actionItems": { "type": "array", "items": { "type": "string" } }
  },
  "required": ["topic", "summary", "keyTakeaways"]
}

Video Chapters

{
  "type": "object",
  "properties": {
    "chapters": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "title": { "type": "string" },
          "startTime": { "type": "string" },
          "summary": { "type": "string" }
        },
        "required": ["title", "startTime"]
      }
    }
  },
  "required": ["chapters"]
}
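
Before storing an extraction result, it can help to confirm that the fields a schema marks as required actually came back. The sketch below walks only the object, array, properties, items, and required keywords used in the schemas above. It is not a full JSON Schema validator, and missing_required and the sample data are illustrative.

```python
def missing_required(data, schema: dict, path: str = "$") -> list[str]:
    """Return the paths of required keys that are absent from data."""
    missing = []
    if schema.get("type") == "object" and isinstance(data, dict):
        for key in schema.get("required", []):
            if key not in data:
                missing.append(f"{path}.{key}")
        # Recurse into the properties that are present.
        for key, sub_schema in schema.get("properties", {}).items():
            if key in data:
                missing += missing_required(data[key], sub_schema, f"{path}.{key}")
    elif schema.get("type") == "array" and isinstance(data, list):
        for i, item in enumerate(data):
            missing += missing_required(item, schema.get("items", {}), f"{path}[{i}]")
    return missing


chapters_schema = {
    "type": "object",
    "properties": {
        "chapters": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "startTime": {"type": "string"},
                    "summary": {"type": "string"},
                },
                "required": ["title", "startTime"],
            },
        }
    },
    "required": ["chapters"],
}

# Made-up result: the second chapter lacks startTime.
result = {"chapters": [{"title": "Intro", "startTime": "0:00"}, {"title": "Setup"}]}
print(missing_required(result, chapters_schema))
```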

6. Video / Channel Metadata

# Single video metadata
curl "https://api.supadata.ai/v1/youtube/video?url=https://youtu.be/VIDEO_ID" \
  -H "x-api-key: $SUPADATA_API_KEY"

# Channel metadata
curl "https://api.supadata.ai/v1/youtube/channel?url=https://youtube.com/@HANDLE" \
  -H "x-api-key: $SUPADATA_API_KEY"

# Cross-platform (YouTube, TikTok, Instagram, X, Facebook)
curl "https://api.supadata.ai/v1/metadata?url=URL" \
  -H "x-api-key: $SUPADATA_API_KEY"

7. Web Scrape (bonus — same key)

Extract any web page to clean Markdown.

curl "https://api.supadata.ai/v1/web/scrape?url=https://example.com" \
  -H "x-api-key: $SUPADATA_API_KEY"

Python Usage (SDK)

import os

from supadata import Supadata

# Read the key from the environment rather than hard-coding it
supadata = Supadata(api_key=os.environ["SUPADATA_API_KEY"])

# Transcript
transcript = supadata.youtube.transcript(url="https://youtu.be/VIDEO_ID", text=True, lang="en")
print(transcript.content)

# Search
results = supadata.youtube.search(query="AI image prompts", type="video", sort_by="views", upload_date="month", limit=20)
for r in results.results:
    print(r.title, r.view_count)

# Extract (async)
job = supadata.extract(url="https://youtu.be/VIDEO_ID", prompt="Extract all prompts shown on screen")
result = supadata.extract.get_results(job.job_id)
print(result.data)

Install SDK: pip install supadata


Content Pipeline Pattern

Discover → Filter → Ingest → Extract → Store

1. search(query, sortBy=views, uploadDate=month)  → get ranked video list
2. Filter by viewCount > threshold, duration = medium/long
3. batch_transcript(urls)                          → pull all transcripts
4. If tutorial/demo video → extract(url, schema)   → get visual prompts
5. Feed to agent → classify → store in laniameda-kb
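
Step 2 can be sketched as a pure function over the results array from the search response. It assumes duration is in seconds (the sample value 847 suggests this, but the source does not say), and the 10,000-view threshold is an arbitrary placeholder. filter_results and the sample entries are hypothetical.

```python
def filter_results(results: list[dict], min_views: int = 10_000, min_duration_s: int = 240) -> list[str]:
    """Keep video results with enough views and at least medium length."""
    kept = []
    for r in results:
        if r.get("type") != "video":
            continue
        if r.get("viewCount", 0) < min_views:
            continue
        # Assumption: duration is in seconds; 240 s = 4 min, the lower
        # bound of the "medium" bucket in the search parameters.
        if r.get("duration", 0) < min_duration_s:
            continue
        kept.append(f"https://youtu.be/{r['id']}")
    return kept


# Illustrative entries shaped like the search response.
results = [
    {"type": "video", "id": "VIDEO_ID_1", "duration": 847, "viewCount": 234567},
    {"type": "video", "id": "VIDEO_ID_2", "duration": 95, "viewCount": 900000},
    {"type": "video", "id": "VIDEO_ID_3", "duration": 1500, "viewCount": 1200},
    {"type": "channel", "id": "CHANNEL_ID"},
]

print(filter_results(results))
```

The returned URLs can go straight into the batch transcript request in step 3.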

Pricing Reference

| Action | Credits |
| --- | --- |
| Native transcript (captions exist) | 1 |
| AI-generated transcript | 2 per minute of video |
| Search (per page ~20 results) | 1 |
| Channel video list | 1 |
| Video/channel/playlist metadata | 1 |
| Web scrape | 1 |
| Extract (AI vision) | varies |

Default to mode=native for transcripts. Only use generate when captions don't exist.
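
To see what the mode choice costs, the pricing above can be turned into a rough estimator. Two things here are assumptions not confirmed by the source: partial minutes are rounded up, and native mode on a video without captions is treated as a failed request. estimate_transcript_credits is a hypothetical helper.

```python
import math


def estimate_transcript_credits(duration_s: float, has_captions: bool, mode: str = "native") -> int:
    """Estimate transcript credits from video length, captions, and mode."""
    # Assumption: AI transcription bills 2 credits per started minute.
    ai_cost = 2 * math.ceil(duration_s / 60)
    if mode == "generate":
        return ai_cost
    if mode == "auto":
        return 1 if has_captions else ai_cost
    if mode == "native":
        if not has_captions:
            # Assumption: the request fails when no captions exist.
            raise ValueError("native mode needs existing captions")
        return 1
    raise ValueError(f"unknown mode: {mode}")


# A 847-second video (about 14 minutes), with and without captions.
print(estimate_transcript_credits(847, has_captions=True, mode="native"))
print(estimate_transcript_credits(847, has_captions=False, mode="auto"))
```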


Error Handling

| Code | Meaning |
| --- | --- |
| 400 | Invalid request / missing params |
| 401 | Bad API key |
| 404 | Video not found / no transcript available |
| 429 | Rate limit hit |
| 202 | Async job started; poll with jobId |
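
A client can map these codes to a next step. The source only lists what each code means, so the sketch below adds a suggestion of its own: retry 429 responses with exponential backoff. next_action and backoff_delays are hypothetical helpers.

```python
# Codes from the table above that a retry will not fix.
FATAL = {
    400: "invalid request or missing params",
    401: "bad API key",
    404: "video not found or no transcript available",
}


def next_action(status: int) -> str:
    """Map an HTTP status code to 'ok', 'poll', 'retry', or 'fail'."""
    if status == 202:
        return "poll"   # async job started; poll with the jobId
    if status == 429:
        return "retry"  # rate limited; wait before trying again
    if status in FATAL:
        return "fail"
    if 200 <= status < 300:
        return "ok"
    return "fail"


def backoff_delays(retries: int, base_s: float = 1.0) -> list[float]:
    """Exponential backoff schedule: base, 2x base, 4x base, ..."""
    return [base_s * 2**i for i in range(retries)]


for code in (200, 202, 401, 429):
    print(code, next_action(code))
print(backoff_delays(4))
```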