# Sustainability RSS Fetch

## Core Goal
- Ingest all RSS/Atom items into SQLite before topic filtering.
- Use `doi` as the primary key in `entries`.
- Keep RSS metadata isolated in its own DB file.
- After semantic screening, keep relevant rows and prune non-relevant rows to DOI-only records.
## Triggering Conditions
- Receive a request to import sustainability feeds and persist all fetched records first.
- Receive a request to do prompt-based topic screening after DB ingestion.
- Receive a request to convert irrelevant rows into lightweight DOI-only records.
- Need stable DOI-keyed storage for downstream API/fulltext/summarization.
## Mandatory Workflow

- Prepare runtime and RSS metadata DB path.

```bash
python3 -m pip install feedparser
export SUSTAIN_RSS_DB_PATH="/absolute/path/to/workspace-rss-bot/sustainability_rss.db"
python3 scripts/rss_subscribe.py init-db --db "$SUSTAIN_RSS_DB_PATH"
```
- Collect RSS window and ingest all fetched items first.

```bash
python3 scripts/rss_subscribe.py collect-window \
  --db "$SUSTAIN_RSS_DB_PATH" \
  --opml assets/journal.opml \
  --start 2026-02-01 \
  --end 2026-02-10 \
  --max-items-per-feed 150 \
  --topic-prompt "筛选与可持续主题相关的文章:生命周期评价、物质流分析、绿色供应链、绿电、绿色设计、减污降碳" \
  --output /tmp/sustainability-candidates.json \
  --pretty
```
- Screen candidates in agent context (semantic, not regex-only).
  - Use `topic_prompt` plus user instructions.
  - Produce a selected `candidate_id` list.
- Mark selected rows as relevant and prune unselected rows.

```bash
python3 scripts/rss_subscribe.py insert-selected \
  --db "$SUSTAIN_RSS_DB_PATH" \
  --candidates /tmp/sustainability-candidates.json \
  --selected-ids 3,7,12,21
```
Result:
- Selected candidates: `is_relevant=1`, keep metadata.
- Unselected candidates: clear metadata fields, keep a DOI-only row (`is_relevant=0`).
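The screening step above happens in agent context, not inside the script. A minimal sketch of what the agent does with the candidates file is shown below; the exact JSON layout (a `topic_prompt` string plus a `candidates` list of `{candidate_id, title, summary}` objects) is an assumption for illustration, and the lambda stands in for the semantic judgment — the real decision comes from `topic_prompt` plus user instructions, not a keyword match.

```python
import json

def select_candidates(payload, is_relevant):
    """Return the candidate_id list for rows judged relevant.

    `is_relevant` stands in for the agent's semantic judgment;
    it is NOT meant to be a regex/keyword filter in practice.
    """
    return [c["candidate_id"] for c in payload["candidates"] if is_relevant(c)]

# In the real workflow this payload is loaded from
# /tmp/sustainability-candidates.json; inlined here for a self-contained demo.
payload = {
    "topic_prompt": "sustainability topics: LCA, MFA, green supply chain",
    "candidates": [
        {"candidate_id": 3, "title": "Life cycle assessment of EV batteries", "summary": "..."},
        {"candidate_id": 4, "title": "Quarterly earnings call transcript", "summary": "..."},
    ],
}
# Placeholder judgment for the demo only.
selected = select_candidates(payload, lambda c: "life cycle" in c["title"].lower())
print(",".join(str(i) for i in selected))  # value passed to --selected-ids
```

The joined ID string is what gets handed to `insert-selected --selected-ids`.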
## Optional Maintenance Sync

```bash
python3 scripts/rss_subscribe.py sync --db "$SUSTAIN_RSS_DB_PATH" --max-feeds 20 --max-items-per-feed 100
```
## Source Management

```bash
python3 scripts/rss_subscribe.py add-feed --db "$SUSTAIN_RSS_DB_PATH" --url "https://example.com/feed.xml"
python3 scripts/rss_subscribe.py import-opml --db "$SUSTAIN_RSS_DB_PATH" --opml assets/journal.opml
```
## Query Data

```bash
python3 scripts/rss_subscribe.py list-feeds --db "$SUSTAIN_RSS_DB_PATH" --limit 50
python3 scripts/rss_subscribe.py list-entries --db "$SUSTAIN_RSS_DB_PATH" --limit 100
```
## Data Contract

- `feeds` table: subscription and fetch state.
- `entries` table (`doi` PK):
  - metadata fields (`title`/`url`/`summary`/`categories`/...)
  - `doi_is_surrogate` (set when no DOI is present in the source)
  - `is_relevant` (`1` relevant, `0` pruned non-relevant, `NULL` not labeled yet)
- Non-relevant rows are pruned to a DOI-only payload for storage efficiency.
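The contract above can be sketched with Python's built-in `sqlite3`. Column names beyond `doi`, `doi_is_surrogate`, and `is_relevant` are assumptions for illustration; the real script's schema may differ. The demo also shows the label/prune semantics: one row gets `is_relevant=1`, the other is reduced to a DOI-only row with `is_relevant=0`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE feeds (
    url TEXT PRIMARY KEY,
    last_fetch_at TEXT,      -- fetch state (assumed fields)
    last_error TEXT
);
CREATE TABLE entries (
    doi TEXT PRIMARY KEY,    -- DOI (or surrogate) keys every row
    title TEXT, url TEXT, summary TEXT, categories TEXT,
    doi_is_surrogate INTEGER DEFAULT 0,
    is_relevant INTEGER      -- 1 relevant, 0 pruned, NULL not labeled yet
);
""")
con.execute("INSERT INTO entries (doi, title, url, summary) VALUES (?,?,?,?)",
            ("10.1000/demo1", "LCA study", "https://example.com/a", "..."))
con.execute("INSERT INTO entries (doi, title, url, summary) VALUES (?,?,?,?)",
            ("10.1000/demo2", "Off-topic", "https://example.com/b", "..."))

# Label the selected row, then prune every still-unlabeled row to DOI-only.
con.execute("UPDATE entries SET is_relevant=1 WHERE doi=?", ("10.1000/demo1",))
con.execute("""UPDATE entries
               SET is_relevant=0, title=NULL, url=NULL, summary=NULL, categories=NULL
               WHERE is_relevant IS NULL""")
row = con.execute(
    "SELECT title, is_relevant FROM entries WHERE doi='10.1000/demo2'").fetchone()
print(row)  # (None, 0): metadata cleared, DOI-only row kept
```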
## Configurable Parameters

- `--db` (or `SUSTAIN_RSS_DB_PATH`)
- `--opml`
- `--feed-url`
- `--use-subscribed-feeds`
- `--topic-prompt`
- `--start` / `--end`
- `--max-feeds`
- `--max-items-per-feed`
- `--user-agent`
- `--cleanup-ttl-days`
## Error and Boundary Handling

- Feed/network failure: continue with other feeds and record errors in feed state.
- Missing `feedparser`: return install guidance.
- Missing DOI in an RSS item: create a deterministic surrogate DOI key to preserve the full-ingestion guarantee.
- Invalid selected IDs: fail fast before the label/prune write.
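One way to build such a deterministic surrogate key is to hash stable fields of the item, so re-fetching the same entry always yields the same primary key. The `surrogate:` prefix and the choice of hashed fields (feed URL + entry link + title) are assumptions for this sketch; the real script may derive its key differently.

```python
import hashlib

def surrogate_doi(feed_url, entry_link, entry_title):
    """Deterministic stand-in key for entries that carry no DOI."""
    raw = "\n".join([feed_url, entry_link or "", entry_title or ""])
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]
    return f"surrogate:{digest}"

a = surrogate_doi("https://example.com/feed.xml", "https://example.com/post-1", "Title")
b = surrogate_doi("https://example.com/feed.xml", "https://example.com/post-1", "Title")
print(a == b)  # same inputs always yield the same key
```

Determinism is what keeps the full-ingestion guarantee: a later sync of the same item updates the existing row instead of inserting a duplicate.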
## References

- references/input-model.md
- references/output-rules.md
- references/time-range-rules.md
## Assets

- assets/journal.opml
- assets/config.example.json
## Scripts

- scripts/rss_subscribe.py