sustainability-rss-fetch

Originally from tiangong-ai/skills

Sustainability RSS Fetch

Core Goal

  • Ingest all RSS/Atom items into SQLite before topic filtering.
  • Use doi as the primary key in entries.
  • Keep RSS metadata isolated in its own DB file.
  • After semantic screening, keep relevant rows and prune non-relevant rows to DOI-only.

Triggering Conditions

  • Receive a request to import sustainability feeds and persist all fetched records first.
  • Receive a request to do prompt-based topic screening after DB ingestion.
  • Receive a request to convert irrelevant rows into lightweight DOI-only records.
  • Need stable DOI-keyed storage for downstream API/fulltext/summarization.

Mandatory Workflow

  1. Prepare runtime and RSS metadata DB path.
python3 -m pip install feedparser
export SUSTAIN_RSS_DB_PATH="/absolute/path/to/workspace-rss-bot/sustainability_rss.db"
python3 scripts/rss_subscribe.py init-db --db "$SUSTAIN_RSS_DB_PATH"
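Once init-db has run, a quick sanity check can confirm the DB file contains the expected tables. A minimal sketch, assuming the table names feeds and entries from the Data Contract section:

```python
import sqlite3

def list_tables(db_path: str) -> list[str]:
    """Return table names in the RSS metadata DB (assumed: feeds, entries)."""
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```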
  2. Collect RSS window and ingest all fetched items first.
python3 scripts/rss_subscribe.py collect-window \
  --db "$SUSTAIN_RSS_DB_PATH" \
  --opml assets/journal.opml \
  --start 2026-02-01 \
  --end 2026-02-10 \
  --max-items-per-feed 150 \
  --topic-prompt "Select articles related to sustainability topics: life cycle assessment, material flow analysis, green supply chains, green electricity, green design, pollution and carbon reduction" \
  --output /tmp/sustainability-candidates.json \
  --pretty
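The collect-window step writes its candidates to the JSON file named by --output. A minimal inspection sketch before screening; the candidate_id and title field names are assumptions based on the workflow text, not confirmed by the script:

```python
import json

def summarize_candidates(path: str) -> list[tuple[int, str]]:
    """Load the candidates file and pair each candidate_id with its title."""
    with open(path, encoding="utf-8") as fh:
        candidates = json.load(fh)
    return [(c["candidate_id"], c.get("title", "")) for c in candidates]
```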
  3. Screen candidates in agent context (semantic, not regex-only).
  • Use topic_prompt + user instructions.
  • Produce selected candidate_id list.
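Semantic screening happens in agent context, but a cheap keyword pre-filter can shortlist candidates before the semantic pass (it only narrows the list; the skill still requires semantic judgment, not regex alone). A sketch assuming candidates expose candidate_id, title, and summary fields; the topic terms below are illustrative only:

```python
# Illustrative topic terms; real screening follows topic_prompt + user instructions.
TOPIC_TERMS = ("life cycle assessment", "material flow", "green supply",
               "green electricity", "green design", "carbon reduction")

def shortlist(candidates: list[dict]) -> list[int]:
    """Return candidate_ids whose title/summary mention any topic term."""
    hits = []
    for c in candidates:
        text = f"{c.get('title', '')} {c.get('summary', '')}".lower()
        if any(term in text for term in TOPIC_TERMS):
            hits.append(c["candidate_id"])
    return hits
```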
  4. Mark selected rows as relevant and prune unselected rows.
python3 scripts/rss_subscribe.py insert-selected \
  --db "$SUSTAIN_RSS_DB_PATH" \
  --candidates /tmp/sustainability-candidates.json \
  --selected-ids 3,7,12,21
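Before insert-selected writes labels, the selected IDs can be checked against the candidates file, matching the fail-fast behavior described under error handling. A sketch assuming the candidate_id field name:

```python
import json

def validate_selected(candidates_path: str, selected_ids: list[int]) -> None:
    """Raise before any label/prune write if an ID is not a known candidate."""
    with open(candidates_path, encoding="utf-8") as fh:
        known = {c["candidate_id"] for c in json.load(fh)}
    unknown = sorted(set(selected_ids) - known)
    if unknown:
        raise ValueError(f"unknown candidate ids: {unknown}")
```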

Result:

  • selected candidates: is_relevant=1, keep metadata.
  • unselected candidates: clear metadata fields, keep DOI-only row (is_relevant=0).
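The prune semantics above can be sketched as a single UPDATE, assuming the metadata column names listed in the Data Contract section (the real script may clear more fields):

```python
import sqlite3

def prune_non_relevant(db_path: str) -> int:
    """Clear metadata on rows labeled is_relevant=0, keeping the DOI key."""
    with sqlite3.connect(db_path) as conn:
        cur = conn.execute(
            "UPDATE entries SET title=NULL, url=NULL, summary=NULL, "
            "categories=NULL WHERE is_relevant=0"
        )
        return cur.rowcount
```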

Optional Maintenance Sync

python3 scripts/rss_subscribe.py sync --db "$SUSTAIN_RSS_DB_PATH" --max-feeds 20 --max-items-per-feed 100

Source Management

python3 scripts/rss_subscribe.py add-feed --db "$SUSTAIN_RSS_DB_PATH" --url "https://example.com/feed.xml"
python3 scripts/rss_subscribe.py import-opml --db "$SUSTAIN_RSS_DB_PATH" --opml assets/journal.opml

Query Data

python3 scripts/rss_subscribe.py list-feeds --db "$SUSTAIN_RSS_DB_PATH" --limit 50
python3 scripts/rss_subscribe.py list-entries --db "$SUSTAIN_RSS_DB_PATH" --limit 100

Data Contract

  • feeds table: subscription and fetch state.
  • entries table (doi PK):
    • metadata fields (title/url/summary/categories/...)
    • doi_is_surrogate (when no DOI is present in source)
    • is_relevant (1 relevant, 0 pruned non-relevant, NULL not labeled yet)
  • Non-relevant rows are pruned to DOI-only payload for storage efficiency.
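A minimal schema sketch consistent with this contract; the real script may use different column names or additional fetch-state fields:

```python
import sqlite3

# Assumed schema based on the Data Contract section, not the script itself.
SCHEMA = """
CREATE TABLE IF NOT EXISTS feeds (
    url TEXT PRIMARY KEY,
    last_fetched_at TEXT,
    last_error TEXT
);
CREATE TABLE IF NOT EXISTS entries (
    doi TEXT PRIMARY KEY,
    title TEXT, url TEXT, summary TEXT, categories TEXT,
    doi_is_surrogate INTEGER DEFAULT 0,   -- 1 when no DOI was in the source
    is_relevant INTEGER                   -- 1 relevant, 0 pruned, NULL unlabeled
);
"""

def init_schema(db_path: str) -> None:
    with sqlite3.connect(db_path) as conn:
        conn.executescript(SCHEMA)
```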

Configurable Parameters

  • --db
  • SUSTAIN_RSS_DB_PATH
  • --opml
  • --feed-url
  • --use-subscribed-feeds
  • --topic-prompt
  • --start/--end
  • --max-feeds
  • --max-items-per-feed
  • --user-agent
  • --cleanup-ttl-days

Error and Boundary Handling

  • Feed/network failure: continue with the remaining feeds and record the error in the feed's fetch state.
  • Missing feedparser: return install guidance.
  • Missing DOI in RSS item: create deterministic surrogate DOI key to keep full-ingestion guarantee.
  • Invalid selected IDs: fail fast before label/prune write.
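The surrogate-DOI rule can be sketched as a deterministic hash of a stable item attribute such as the entry URL; the "surrogate:" prefix and 16-hex-character length here are assumptions, not the script's actual key format:

```python
import hashlib

def surrogate_doi(entry_url: str) -> str:
    """Derive a deterministic surrogate key so re-ingestion stays idempotent."""
    digest = hashlib.sha256(entry_url.encode("utf-8")).hexdigest()[:16]
    return f"surrogate:{digest}"
```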

References

  • references/input-model.md
  • references/output-rules.md
  • references/time-range-rules.md

Assets

  • assets/journal.opml
  • assets/config.example.json

Scripts

  • scripts/rss_subscribe.py
Repository: fadeloo/skills
First Seen: Feb 24, 2026