Fetch Abstract to KB

Core Goal

  • Reuse the same PostgreSQL connection environment variables as fetch-meta-to-kb.
  • Select rows whose abstract is empty and order by newest created_at first.
  • Open https://doi.org/<doi> in OpenClaw Browser and extract abstract text.
  • Write back only when the row is still empty at update time.
  • Default to dry run; require explicit --apply to write.
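
The URL step above can be sketched as a small helper. This is an illustration only: the function name `doi_url`, the prefix list, and the decision to strip prefixes or percent-encode at all are assumptions, not behavior confirmed for scripts/fetch_abstract_to_kb.py.

```python
from urllib.parse import quote

# Prefixes that stored DOIs sometimes carry (assumed list, not from the script).
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_url(raw):
    """Build the https://doi.org/<doi> URL for a stored DOI value."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in _PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi:
        raise ValueError("empty DOI")
    # Keep characters that are common in DOIs; percent-encode the rest.
    return "https://doi.org/" + quote(doi, safe="/:;()")
```

Stripping a known prefix first avoids building URLs such as https://doi.org/https://doi.org/..., which can happen when a table mixes bare DOIs with full links.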

Required Environment

  • KB_DB_HOST
  • KB_DB_PORT
  • KB_DB_NAME
  • KB_DB_USER
  • KB_DB_PASSWORD
  • KB_LOG_DIR (required; directory for run logs)
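
A minimal sketch of how these variables might be validated before any work starts. The variable names come from the list above; the helper name `load_kb_env`, the error wording, and the integer conversion of the port are assumptions made for illustration.

```python
import os

REQUIRED_VARS = (
    "KB_DB_HOST",
    "KB_DB_PORT",
    "KB_DB_NAME",
    "KB_DB_USER",
    "KB_DB_PASSWORD",
    "KB_LOG_DIR",
)


def load_kb_env(environ=None):
    """Return the required settings, or raise naming every missing variable."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError(
            "missing required environment variables: " + ", ".join(missing)
        )
    settings = {name: env[name].strip() for name in REQUIRED_VARS}
    # Assumed convenience: fail early with ValueError if the port is not numeric.
    settings["KB_DB_PORT"] = int(settings["KB_DB_PORT"])
    return settings
```

Reporting all missing names at once saves a rerun per variable when a fresh shell has none of them set.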

Workflow

  1. Run the local self-test first (no DB/browser required):
     python3 scripts/fetch_abstract_to_kb.py --self-test
  2. Dry run next (default mode; no DB write):
     python3 scripts/fetch_abstract_to_kb.py --limit 100
  3. Apply updates after review:
     python3 scripts/fetch_abstract_to_kb.py --limit 100 --apply
  4. Override table/column names when needed (created_at is fixed and required):
     python3 scripts/fetch_abstract_to_kb.py \
       --table journals \
       --doi-column doi \
       --abstract-column abstract \
       --limit 100 \
       --apply
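
The flags used in the commands above could be declared roughly as follows. The flag names appear in this document; every default value, the help text, and the helper `describe_mode` are assumptions for illustration, not the script's actual parser.

```python
import argparse


def build_parser():
    """Declare the CLI flags; defaults here are assumed, not taken from the script."""
    parser = argparse.ArgumentParser(prog="fetch_abstract_to_kb.py")
    parser.add_argument("--self-test", action="store_true",
                        help="run local checks without DB or browser")
    parser.add_argument("--limit", type=int, default=100,
                        help="maximum number of rows to select")
    # store_true means the flag is False unless given, so dry run is the default.
    parser.add_argument("--apply", action="store_true",
                        help="write abstracts back to the database")
    parser.add_argument("--table", default="journals")
    parser.add_argument("--doi-column", default="doi")
    parser.add_argument("--abstract-column", default="abstract")
    parser.add_argument("--max-errors", type=int, default=10,
                        help="stop early once errors exceed this count")
    return parser


def describe_mode(args):
    """Return the run mode implied by the parsed arguments."""
    return "apply" if args.apply else "dry-run"
```

Making --apply an opt-in boolean means a forgotten flag can only produce a harmless dry run, never an unintended write.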

Safety Contract

  • Selection filter:
    • DOI not empty
    • abstract empty (NULL or blank)
  • Selection order:
    • newest created_at first (ORDER BY created_at DESC NULLS LAST LIMIT n)
  • Update filter (second guard):
    • WHERE doi = ? AND abstract is still empty
  • Run summary:
    • emit RUN_SUMMARY_JSON=<json> for the current run only.
  • Abort behavior:
    • stop early when errors exceed --max-errors.
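
The contract above can be sketched as two SQL builders plus a processing loop. This is a sketch under stated assumptions: the `%s` placeholders follow the psycopg convention, `btrim` is PostgreSQL's trim function, and the helper names (`ident`, `select_sql`, `update_sql`, `run`) and the summary keys are hypothetical rather than taken from the script.

```python
import json
import re

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def ident(name):
    """Accept only plain identifiers; table/column names cannot be bound as parameters."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe SQL identifier: {name!r}")
    return name


def select_sql(table, doi_col, abstract_col):
    """Selection filter and order: DOI present, abstract empty, newest first."""
    t, d, a = ident(table), ident(doi_col), ident(abstract_col)
    return (
        f"SELECT {d} FROM {t} "
        f"WHERE {d} IS NOT NULL AND btrim({d}) <> '' "
        f"AND ({a} IS NULL OR btrim({a}) = '') "
        f"ORDER BY created_at DESC NULLS LAST LIMIT %s"
    )


def update_sql(table, doi_col, abstract_col):
    """Second guard: the UPDATE matches only if the abstract is still empty."""
    t, d, a = ident(table), ident(doi_col), ident(abstract_col)
    return (
        f"UPDATE {t} SET {a} = %s "
        f"WHERE {d} = %s AND ({a} IS NULL OR btrim({a}) = '')"
    )


def run(dois, handle, max_errors):
    """Call handle(doi) per row; abort once errors exceed max_errors."""
    summary = {"processed": 0, "updated": 0, "errors": 0, "aborted": False}
    for doi in dois:
        summary["processed"] += 1
        try:
            if handle(doi):  # handle returns True when a row was updated
                summary["updated"] += 1
        except Exception:
            summary["errors"] += 1
            if summary["errors"] > max_errors:
                summary["aborted"] = True
                break
    # One line per run, covering only this run's counters.
    print("RUN_SUMMARY_JSON=" + json.dumps(summary, sort_keys=True))
    return summary
```

Repeating the emptiness check in the UPDATE matters because fetching an abstract through a browser is slow: another process may fill the row between selection and write, and the guarded statement then updates zero rows instead of overwriting.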

Browser Requirement

  • openclaw CLI must be installed.
  • The script checks openclaw browser status; if the browser is not running, it tries openclaw browser start.
  • If the start fails (for example, because the extension tab is not attached), attach an OpenClaw browser session first, then rerun.
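
The check-then-start sequence above might look like the following sketch. The two subcommands are the ones named in this section; treating a zero exit code as "running" or "started" is an assumption (the real script may inspect the command output instead), and `ensure_browser` with its injectable `run` and `which` parameters is a hypothetical helper.

```python
import shutil
import subprocess


def _run_cli(args):
    """Run a command and return its exit code."""
    return subprocess.run(args, capture_output=True, text=True).returncode


def ensure_browser(run=_run_cli, which=shutil.which):
    """Make sure the OpenClaw browser is up, starting it if needed."""
    if which("openclaw") is None:
        raise RuntimeError("openclaw CLI is not installed or not on PATH")
    # Assumption: exit code 0 from status means the browser is running.
    if run(["openclaw", "browser", "status"]) == 0:
        return "already-running"
    # Assumption: exit code 0 from start means the browser came up.
    if run(["openclaw", "browser", "start"]) == 0:
        return "started"
    raise RuntimeError(
        "openclaw browser start failed; attach an OpenClaw browser session, then rerun"
    )
```

Passing `run` and `which` as parameters lets the logic be exercised in a self-test without the CLI or a browser being present.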

Script

  • scripts/fetch_abstract_to_kb.py