gdelt-gkg-fetch
SKILL.md
GDELT GKG Fetch
Core Goal
- Fetch GDELT 2.0
GKGtable exports (*.gkg.csv.zip) from official public endpoints. - Resolve latest available snapshot via
lastupdate.txt. - Resolve historical snapshots in a UTC range via
masterfilelist.txt. - Persist downloaded files and return machine-readable JSON manifest.
- Keep runtime observable with structured logs and optional log file.
Required Environment
- Configure runtime by environment variables (see
references/env.md). - Start from
assets/config.example.env. - Load env values before running commands:
set -a
source assets/config.example.env
set +a
Workflow
- Validate effective configuration.
python3 scripts/gdelt_gkg_fetch.py check-config --pretty
- Inspect the latest available GKG snapshot.
python3 scripts/gdelt_gkg_fetch.py resolve-latest --pretty
- Dry-run a historical range selection before downloading.
python3 scripts/gdelt_gkg_fetch.py fetch \
--mode range \
--start-datetime 20260301000000 \
--end-datetime 20260301120000 \
--max-files 3 \
--dry-run \
--pretty
- Fetch files with transport and structure validation.
python3 scripts/gdelt_gkg_fetch.py fetch \
--mode latest \
--max-files 1 \
--output-dir ./data/gdelt-gkg \
--preview-lines 2 \
--validate-structure \
--expected-columns 27 \
--quarantine-dir ./data/gdelt-gkg-quarantine \
--log-level INFO \
--log-file ./logs/gdelt-gkg-fetch.log \
--pretty
Built-in Robustness
- Apply retry with exponential backoff on transient HTTP/network failures.
- Respect
Retry-Afterwhen present on retriable responses. - Throttle request frequency with a minimum interval between requests.
- Enforce
--max-filessafety cap (GDELT_MAX_FILES_PER_RUN) to prevent accidental bulk pulls. - Validate datetime format and range boundaries before remote calls.
- Validate transport and structure after download:
- ZIP CRC/integrity check
- UTF-8 strict decoding check
- Tab column-count check (default 27)
- Optional bad-line issue quarantine (
--quarantine-dir)
- Emit JSON results while writing operational logs to stderr and optional log file.
Scope Decision
- Keep one concrete file-table fetch implementation:
GKG(*.gkg.csv.zip). - Keep atomic operations only; do not add internal scheduler/polling loops.
References
references/gdelt-data-sources.mdreferences/gdelt-limitations.mdreferences/gdelt-schema.mdreferences/env.mdreferences/openclaw-chaining-templates.md
Script
scripts/gdelt_gkg_fetch.py
OpenClaw Invocation Compatibility
- Keep skill trigger metadata in
name,description, andagents/openai.yaml. - Invoke in prompts with
$gdelt-gkg-fetch. - Keep the skill atomic: only resolve/fetch on demand.
- Use script parameters for fetch conditions (
--mode range --start-datetime --end-datetime). - If you need polling, let OpenClaw orchestrate repeated invocations externally, not inside this skill.
OpenClaw Prompt Templates
Use these templates directly in OpenClaw and only replace bracketed placeholders.
- Recon (latest availability)
Use $gdelt-gkg-fetch.
Run:
python3 scripts/gdelt_gkg_fetch.py resolve-latest --pretty
Return only the JSON result.
- Fetch (historical window, dry-run first)
Use $gdelt-gkg-fetch.
Run:
python3 scripts/gdelt_gkg_fetch.py fetch \
--mode range \
--start-datetime [YYYYMMDDHHMMSS] \
--end-datetime [YYYYMMDDHHMMSS] \
--max-files [N] \
--dry-run \
--pretty
Then run without --dry-run using:
--output-dir [OUTPUT_DIR]
--validate-structure
--expected-columns 27
--quarantine-dir [QUARANTINE_DIR]
Return only the JSON result.
- Validate (download quality gate)
Use $gdelt-gkg-fetch.
Run:
python3 scripts/gdelt_gkg_fetch.py fetch \
--mode latest \
--max-files 1 \
--output-dir [OUTPUT_DIR] \
--validate-structure \
--expected-columns 27 \
--quarantine-dir [QUARANTINE_DIR] \
--pretty
Check validation.issue_count, decode_error_count, column_mismatch_count.
Return JSON plus one-line pass/fail verdict.
Weekly Installs
1
Repository
tiangong-ai/skillsGitHub Stars
4
First Seen
1 day ago
Security Audits
Installed on
amp1
cline1
openclaw1
opencode1
cursor1
kimi-cli1