Sustainability RSS Summary

Core Goal

Use this skill when asked to build or run an RSS-to-summary pipeline for sustainability/science papers.
Use it when RSS items carry a DOI/title/link and abstracts must be retrieved legally and with low maintenance.
Orchestrate lookups OpenAlex-first, with Semantic Scholar as the fallback.
Retry papers that are too new to be indexed yet instead of dropping them.
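
OpenAlex does not return abstracts as plain text; its work records expose an `abstract_inverted_index` field that maps each word to the positions where it occurs. A minimal sketch of turning that field back into a readable abstract follows. The function name `rebuild_abstract` is hypothetical and is not taken from `scripts/abstract_pipeline.py`.

```python
def rebuild_abstract(inverted_index):
    """Rebuild plain text from an OpenAlex-style inverted index.

    inverted_index maps word -> list of integer positions.
    Returns None when the index is missing or empty, which a caller
    could treat as a miss and fall through to the next source.
    """
    if not inverted_index:
        return None
    positioned = []
    for word, positions in inverted_index.items():
        for position in positions:
            positioned.append((position, word))
    # Sorting by position restores the original word order.
    positioned.sort()
    return " ".join(word for _, word in positioned)
```

Returning None for an empty index keeps "record exists but has no abstract" on the same path as "record not found", so both can fall back to Semantic Scholar.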

Use queue-run on a schedule (for example daily).
If both APIs miss, increment retry_count, set next_retry_at, and keep status as new.
Mark the record as failed once retry_count >= max_retries.
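
The miss-handling rules above can be sketched as a small state update. This is an illustration under assumptions, not the script's actual code: the record is modeled as a dict, the function names are hypothetical, and the mapping from retry_count to a backoff entry (the Nth miss waits the Nth value, reusing the last value if the list is short) is a guess that the real script may not share.

```python
from datetime import datetime, timedelta, timezone


def parse_backoff(spec):
    """Turn a string such as "24,24,48" into [24, 24, 48]."""
    return [int(part) for part in spec.split(",") if part.strip()]


def record_miss(record, backoff_hours, now):
    """Return an updated copy of a queue record after both APIs missed."""
    if not backoff_hours:
        raise ValueError("backoff_hours must contain at least one value")
    updated = dict(record)
    updated["retry_count"] = record["retry_count"] + 1
    if updated["retry_count"] >= record["max_retries"]:
        # Retries are exhausted: stop scheduling this DOI.
        updated["status"] = "failed"
        updated["next_retry_at"] = None
        return updated
    # Assumption: the Nth miss uses the Nth backoff value (last one reused).
    index = min(updated["retry_count"] - 1, len(backoff_hours) - 1)
    updated["status"] = "new"
    updated["next_retry_at"] = now + timedelta(hours=backoff_hours[index])
    return updated
```

Passing `now` in explicitly, for example `datetime.now(timezone.utc)`, keeps the function deterministic and easy to test.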

Fetch one abstract by DOI:

python3 scripts/abstract_pipeline.py fetch \
  --doi "10.1177/014920639101700108" \
  --openalex-email "you@example.com" \
  --pretty

Add RSS events to the queue:

python3 scripts/abstract_pipeline.py queue-add \
  --db sustainability-rss-summary.db \
  --jsonl assets/rss-events.example.jsonl \
  --max-retries 3

Process due queue records:

python3 scripts/abstract_pipeline.py queue-run \
  --db sustainability-rss-summary.db \
  --backoff-hours "24,24,48"

Set credentials via environment variables (recommended):

export OPENALEX_EMAIL="you@example.com"
export S2_API_KEY="optional-semantic-scholar-key"
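
A minimal sketch of how these variables might be resolved and attached to requests. The function names are hypothetical, and the precedence (an explicit `--openalex-email` value wins over `OPENALEX_EMAIL`) is an assumption. The `mailto` query parameter and the `x-api-key` header are the documented ways to pass an email to OpenAlex and a key to Semantic Scholar.

```python
import os


def resolve_credentials(cli_email=None, env=None):
    """Pick up credentials from a CLI value first, then the environment."""
    env = os.environ if env is None else env
    # Assumption: a value given on the command line overrides the variable.
    email = cli_email or env.get("OPENALEX_EMAIL") or None
    s2_key = env.get("S2_API_KEY") or None
    return {"openalex_email": email, "s2_api_key": s2_key}


def openalex_params(email):
    """Query parameters that identify the caller to OpenAlex."""
    return {"mailto": email} if email else {}


def s2_headers(api_key):
    """Request headers for Semantic Scholar; empty when no key is set."""
    return {"x-api-key": api_key} if api_key else {}
```

Both helpers return an empty dict when the credential is absent, so anonymous requests still work when neither variable is exported.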

Manual emergency rerun (ignore next_retry_at):

python3 scripts/abstract_pipeline.py queue-run \
  --db sustainability-rss-summary.db \
  --force
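
The difference between normal due-mode and `--force` can be pictured as one extra filter on the queue query. The sketch below assumes the `.db` file is SQLite and invents a table named `queue` with `doi`, `status`, and `next_retry_at` columns; the real schema is not shown in this document. It also assumes that `--force` still skips records already marked failed.

```python
import sqlite3


def select_due(conn, now_iso, force=False):
    """Return the DOIs a run would process.

    now_iso is an ISO 8601 timestamp string. Comparing it as text only
    works if every stored next_retry_at uses the same format and offset.
    """
    sql = "SELECT doi FROM queue WHERE status = 'new'"
    params = ()
    if not force:
        # Due-mode: never-tried records plus those whose wait has elapsed.
        sql += " AND (next_retry_at IS NULL OR next_retry_at <= ?)"
        params = (now_iso,)
    sql += " ORDER BY doi"
    return [row[0] for row in conn.execute(sql, params)]
```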

List queue records:

python3 scripts/abstract_pipeline.py queue-list \
  --db sustainability-rss-summary.db \
  --pretty

Query OpenAlex first for cost and openness; use Semantic Scholar only as a fallback.
Never scrape publisher webpages for abstract extraction in this skill.
Persist every miss to the queue; never drop a DOI task silently.
Always carry the exact DOI in output metadata for traceability.
Pass --openalex-email (or set OPENALEX_EMAIL) so requests go through OpenAlex's polite pool, which gives faster and more consistent responses.
Treat API throttling and network errors as transient and retry them; they are not permanent failures.
Use --force only for manual backfill or debugging; keep scheduled jobs in normal due-mode.
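
Several of the rules above (source order, transient errors, DOI traceability) can be sketched in one small orchestration function. This is an illustration under assumptions: the fetchers are injected callables rather than real HTTP clients, and `TransientError`, `fetch_abstract`, and the result keys are hypothetical names, not the script's own.

```python
class TransientError(Exception):
    """Throttling or network trouble; worth retrying later."""


def fetch_abstract(doi, sources):
    """Try each (name, fetcher) pair in order and report what happened.

    Each fetcher takes a DOI and returns an abstract string or None.
    The DOI is copied into the result unchanged for traceability.
    """
    saw_transient = False
    for name, fetcher in sources:
        try:
            abstract = fetcher(doi)
        except TransientError:
            # Do not give up on the paper; note the error and try the next source.
            saw_transient = True
            continue
        if abstract:
            return {"doi": doi, "outcome": "hit", "source": name, "abstract": abstract}
    outcome = "transient" if saw_transient else "miss"
    return {"doi": doi, "outcome": outcome, "source": None, "abstract": None}
```

A caller would pass the sources as `[("openalex", ...), ("semantic_scholar", ...)]` and queue the DOI for retry on either a "miss" or a "transient" outcome, so that nothing is dropped silently.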