# Sustainability RSS Summary
## Core Goal
- Fetch abstracts without webpage crawling.
- Use DOI as the only lookup key.
- Handle indexing lag with delayed retries.
- Output ready abstracts for downstream LLM summarization.
## Triggering Conditions
- Receive a request to build or run an RSS-to-summary pipeline for sustainability/science papers.
- Receive RSS items with DOI/title/link and need legal, low-maintenance abstract retrieval.
- Need OpenAlex-first, Semantic Scholar-fallback orchestration.
- Need retry handling for papers that are too new to be indexed immediately.
## Workflow
- Ingest DOI events from official RSS feeds.
  - Keep at least `doi`; include `title`, `link`, and `source_feed` when available.
  - Use `assets/rss-events.example.jsonl` as the input format reference.
- Add events into the pending queue.
  - Run `scripts/abstract_pipeline.py queue-add` to upsert DOI tasks into SQLite.
  - Preserve existing `ready` records; reset non-ready records to the retryable `new` status.
- Fetch abstracts via the dual-tower APIs.
  - Primary: OpenAlex (reconstructed from `abstract_inverted_index`).
  - Fallback: Semantic Scholar (`abstract` text field).
  - Use `fetch` for immediate single-DOI checks and debugging.
- Retry delayed-index papers.
  - Run `queue-run` on a schedule (for example, daily).
  - If both APIs miss, increment `retry_count`, set `next_retry_at`, and keep the status `new`.
  - Mark a record `failed` when `retry_count >= max_retries`.
- Hand off ready abstracts to the LLM.
  - Pull rows with `status=ready` via `queue-list`.
  - Send `title + abstract_text + metadata` to the summarization workflow.
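OpenAlex does not return abstract text directly; it returns an `abstract_inverted_index` mapping each word to the positions where it occurs, which the pipeline must flatten back into prose. A minimal sketch of that reconstruction step (the function name here is illustrative, not the script's actual API):

```python
def reconstruct_abstract(inverted_index: dict) -> str:
    """Rebuild plain abstract text from an OpenAlex abstract_inverted_index.

    OpenAlex maps each word to the list of positions at which it occurs;
    placing every word back at its positions and joining them in order
    recovers the original text.
    """
    positions = {}
    for word, indices in inverted_index.items():
        for i in indices:
            positions[i] = word
    return " ".join(word for _, word in sorted(positions.items()))
```

For example, `reconstruct_abstract({"Hello": [0], "world": [1]})` returns `"Hello world"`.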
## Commands
### Single DOI fetch

```bash
python3 scripts/abstract_pipeline.py fetch \
  --doi "10.1177/014920639101700108" \
  --openalex-email "you@example.com" \
  --pretty
```
### Queue ingest from JSONL

```bash
python3 scripts/abstract_pipeline.py queue-add \
  --db sustainability-rss-summary.db \
  --jsonl assets/rss-events.example.jsonl \
  --max-retries 3
```
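The `queue-add` command reads one JSON object per line. A minimal event line might look like the following (field names are taken from the workflow notes above; the bundled `assets/rss-events.example.jsonl` is the authoritative format reference):

```json
{"doi": "10.1177/014920639101700108", "title": "Example article", "link": "https://doi.org/10.1177/014920639101700108", "source_feed": "example-journal-rss"}
```

Only `doi` is required; `title`, `link`, and `source_feed` are optional metadata.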
### Queue run (with retries)

```bash
python3 scripts/abstract_pipeline.py queue-run \
  --db sustainability-rss-summary.db \
  --backoff-hours "24,24,48"
```
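One plausible reading of `--backoff-hours "24,24,48"` is a per-attempt wait schedule: the nth miss waits the nth entry's hours, and the last entry is reused once the schedule runs out. A sketch under that assumption (the helper name is hypothetical, not the script's actual API):

```python
from datetime import datetime, timedelta, timezone


def next_retry_at(retry_count: int, backoff_hours: str = "24,24,48") -> datetime:
    """Compute when a missed DOI becomes due again.

    retry_count is the number of attempts already made; when it runs past
    the end of the comma-separated schedule, the last entry is reused.
    """
    schedule = [int(h) for h in backoff_hours.split(",")]
    hours = schedule[min(retry_count, len(schedule) - 1)]
    return datetime.now(timezone.utc) + timedelta(hours=hours)
```

With the default schedule, the first two misses wait 24 hours each and every later miss waits 48 hours.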
Set credentials via environment variables (recommended):

```bash
export OPENALEX_EMAIL="you@example.com"
export S2_API_KEY="optional-semantic-scholar-key"
```
Manual emergency rerun (ignores `next_retry_at`):

```bash
python3 scripts/abstract_pipeline.py queue-run \
  --db sustainability-rss-summary.db \
  --force
```
### Inspect queue state

```bash
python3 scripts/abstract_pipeline.py queue-list \
  --db sustainability-rss-summary.db \
  --pretty
```
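Downstream consumers can also read the ready rows from SQLite directly instead of parsing `queue-list` output. A sketch, assuming the column names mentioned in the workflow (`doi`, `title`, `abstract_text`, `abstract_source`, `status`) and a hypothetical table name `queue` (the real schema lives in `scripts/abstract_pipeline.py`):

```python
import sqlite3


def ready_rows(db_path: str) -> list[dict]:
    """Return rows ready for summarization as plain dicts.

    Assumes a table named `queue` with the columns listed in the workflow
    notes; check scripts/abstract_pipeline.py for the actual schema.
    """
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # lets us address columns by name
    rows = conn.execute(
        "SELECT doi, title, abstract_text, abstract_source "
        "FROM queue WHERE status = 'ready'"
    ).fetchall()
    conn.close()
    return [dict(r) for r in rows]
```

Keeping SQL out of the LLM handoff step and passing plain dicts around makes the summarization workflow easier to test in isolation.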
## Queue Status Model
- `new`: waiting for the first attempt or a delayed retry.
- `ready`: abstract available (`abstract_source` and `abstract_text` set).
- `failed`: retry budget exhausted; manual follow-up needed.
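The three statuses imply a simple transition rule after each fetch attempt, which can be sketched as a pure function (the function name is illustrative, not the script's actual API):

```python
def next_status(abstract_found: bool, retry_count: int, max_retries: int = 3) -> str:
    """Decide the queue status after one fetch attempt.

    Success moves the record to `ready`; a miss keeps it retryable as `new`
    until the retry budget (max_retries) is exhausted, then marks it `failed`.
    """
    if abstract_found:
        return "ready"
    if retry_count >= max_retries:
        return "failed"
    return "new"
```

Keeping this decision in a side-effect-free function makes the retry policy trivial to unit-test, independent of the SQLite layer.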
## Operational Rules
- Prefer OpenAlex first for cost and openness; use Semantic Scholar only as a fallback.
- Never scrape publisher webpages for abstract extraction in this skill.
- Persist every miss into the queue; never drop DOI tasks silently.
- Always carry the exact DOI in output metadata for traceability.
- Use `--openalex-email` (or `OPENALEX_EMAIL`) for polite, faster OpenAlex routing.
- Treat API throttling and network errors as transient retries, not permanent failures.
- Use `--force` only for manual backfills or debugging; keep scheduled jobs in normal due-mode.
## References

- `references/architecture.md`
- `references/testing.md`
## Assets

- `assets/rss-events.example.jsonl`
Repository: `tiangong-ai/skills`