Dump Telegram Channel History

Archive a complete Telegram channel/group/chat to NDJSON + downloaded media files.

Self-Evolving Skill: This skill improves through use. If instructions are wrong, parameters drifted, or a workaround was needed — fix this file immediately, don't defer. Only update for real, reproducible issues.

Preflight

Session must exist: ~/.local/share/telethon/<profile>.session
- If missing, run /tlg:setup first
User must be subscribed to (or a member of) the target channel/chat

Usage

/usr/bin/env bash << 'EOF'
SCRIPT="${CLAUDE_PLUGIN_ROOT:-$HOME/.claude/plugins/marketplaces/cc-skills/plugins/tlg}/scripts/tg-cli.py"

# Full dump: NDJSON + all media (photos, videos, documents)
uv run --python 3.13 "$SCRIPT" dump @ChannelName ./output/ChannelName

# NDJSON only (skip media downloads — much faster)
uv run --python 3.13 "$SCRIPT" dump @ChannelName ./output/ChannelName --no-media

# Dump by numeric chat ID
uv run --python 3.13 "$SCRIPT" dump -1001234567890 ./output/MyChannel

# Use a different profile
uv run --python 3.13 "$SCRIPT" -p missterryli dump @ChannelName ./output/ChannelName
EOF

Parameters

Parameter	Type	Description
chat	string/int	Channel username (@name) or numeric chat ID
output	path	Output directory (messages.ndjson + media/ created inside)
`--no-media`	flag	Skip media downloads, produce NDJSON only

Output Structure

output/ChannelName/
├── messages.ndjson   ← one JSON object per line, chronological (oldest first)
└── media/
    ├── 6.jpg         ← named by message ID for cross-referencing
    ├── 12.png
    ├── 45.mp4
    └── ...

NDJSON Record Schema

Each line is a JSON object with these fields:

Field	Type	Description
`id`	int	Telegram message ID
`date`	string	ISO 8601 timestamp with timezone
`text`	string/null	Full message text (no truncation)
`has_media`	bool	Whether message contains media
`media_type`	string/null	Telethon class name (MessageMediaPhoto, etc.)
`media_file`	string/null	Filename in media/ dir (e.g., "6.jpg")
`views`	int/null	View count (channels only)
`forwards`	int/null	Forward count
`reply_to_msg_id`	int/null	Parent message ID if reply
`grouped_id`	int/null	Album group ID (shared across album messages)
`edit_date`	string/null	ISO 8601 timestamp of last edit
`sender.id`	int	Sender's Telegram user/channel ID
`sender.name`	string	Display name (channel title or user first name)
`sender.username`	string/null	@username if set

Resume Support

Re-running the same command skips already-downloaded media files (checks dest.exists()). The NDJSON is fully rewritten each run. This makes it safe to resume interrupted downloads.

Querying the Output

# jq: find all GOLD BUY signals with chart screenshots
jq 'select(.text != null and (.text | test("GOLD.*BUY")) and .media_file != null)' messages.ndjson

# DuckDB: aggregate by date
duckdb -c "SELECT date::DATE as day, count(*) FROM read_ndjson('messages.ndjson') GROUP BY day ORDER BY day"

# Python/Polars
import polars as pl
df = pl.read_ndjson("messages.ndjson")

Performance Notes

~3000 messages + 1700 media files takes ~3-5 minutes
Telegram may briefly disconnect mid-download (Server closed the connection) — Telethon auto-reconnects
For very large channels (10k+ messages), expect 10-15 minutes with media

Recommended Storage Pattern

For git-tracked projects, gitignore the media folder:

# data/telegram/.gitignore
*/media/

This keeps the NDJSON (metadata) in version control while keeping large media files local-only.

Anti-Patterns

Don't dump channels you're not subscribed to — Telethon needs access via your account
Don't run multiple dumps concurrently on the same profile — session file contention

Post-Execution Reflection

After this skill completes, check before closing:

Did the command succeed? — If not, fix the instruction or error table that caused the failure.
Did parameters or output change? — If tg-cli.py's interface drifted, update Usage examples and Parameters table to match.
Was a workaround needed? — If you had to improvise (different flags, extra steps), update this SKILL.md so the next invocation doesn't need the same workaround.

Only update if the issue is real and reproducible — not speculative.

dump-channel