Semantic Scholar Library & Feed

Overview

Use this skill to work against Semantic Scholar's authenticated Library and Research Feed surfaces without browser-driven login flows. Prefer the bundled CLI for Cookie Store inspection, browser-curl import, SSR probing, feed pagination, folder inspection, folder writes, and Graph paper/batch lookups.

Quick Start

Check whether the Cookie Store already exists at its fixed location:

python3 scripts/semantic_scholar_cli.py cookie-summary

If the Cookie Store is missing or stale, ask the user to copy an authenticated Semantic Scholar request from the browser DevTools Network panel with Copy as cURL, save it to a file, then import it:

python3 scripts/semantic_scholar_cli.py import-curl \
  --curl-file /tmp/semantic-scholar-request.sh

Use import-header instead when the user has already extracted only the raw Cookie header.
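For reference, here is a minimal Python sketch of what pulling cookies out of a copied curl command involves. It assumes the cookies appear either in a -H 'cookie: ...' header or in a -b '...' argument, and that a -b value is an inline cookie string rather than a cookie-jar file name. The function names are hypothetical; this is not the bundled CLI's import-curl code.

```python
import shlex
from typing import Dict, Optional


def cookie_header_from_curl(curl_text: str) -> Optional[str]:
    """Return the Cookie header value from a browser "Copy as cURL" command, if any."""
    # Join backslash-continued lines first so shlex does not emit stray newline tokens.
    normalized = curl_text.replace("\\\r\n", " ").replace("\\\n", " ")
    tokens = shlex.split(normalized)
    # Walk (flag, value) pairs of adjacent tokens.
    for flag, value in zip(tokens, tokens[1:]):
        if flag in ("-H", "--header"):
            name, sep, rest = value.partition(":")
            if sep and name.strip().lower() == "cookie":
                return rest.strip()
        elif flag in ("-b", "--cookie"):
            # Assumption: the argument is an inline cookie string, not a cookie-jar file.
            return value.strip()
    return None


def parse_cookie_header(header: str) -> Dict[str, str]:
    """Split 'a=1; b=2' into {'a': '1', 'b': '2'}; values may themselves contain '='."""
    cookies: Dict[str, str] = {}
    for part in header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies
```

The sid and s2 entries of the parsed dict are the ones the Cookie Store section treats as mandatory.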

Re-run cookie-summary to confirm the import and check cookie health:

python3 scripts/semantic_scholar_cli.py cookie-summary

Choose the task module below.

Cookie Store

Save Semantic Scholar auth state under:

~/.auth/semantic-scholar.cookies.json
~/.auth/semantic-scholar.cookie-header.txt

Treat sid and s2 as the minimum required cookies for private Library and Feed access. If either is missing, ask the user for a fresh browser-copied curl and re-import it before touching private endpoints.
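A minimal Python sketch of the sid/s2 check, for cases where you want to inspect the Cookie Store outside the CLI. The JSON layout of semantic-scholar.cookies.json is not documented here, so the parser accepts two assumed layouts (a flat name-to-value object, or a list of objects with name and value keys); the header format written by to_cookie_header is likewise an assumption. All function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List

# Path and required cookie names come from the Cookie Store section.
COOKIE_JSON = Path.home() / ".auth" / "semantic-scholar.cookies.json"
REQUIRED_COOKIES = ("sid", "s2")


def parse_cookie_store(text: str) -> Dict[str, str]:
    """Parse the cookie JSON into name -> value (both layouts are assumptions)."""
    data = json.loads(text)
    if isinstance(data, dict):
        # Assumed layout A: {"sid": "...", "s2": "..."}
        return {str(name): str(value) for name, value in data.items()}
    # Assumed layout B: [{"name": "sid", "value": "..."}, ...]
    return {str(c["name"]): str(c["value"]) for c in data}


def load_cookie_store(path: Path = COOKIE_JSON) -> Dict[str, str]:
    return parse_cookie_store(path.read_text(encoding="utf-8"))


def missing_required(cookies: Dict[str, str]) -> List[str]:
    """Names of required cookies that are absent or empty."""
    return [name for name in REQUIRED_COOKIES if not cookies.get(name)]


def to_cookie_header(cookies: Dict[str, str]) -> str:
    """Render a single-line 'name=value; name=value' Cookie header."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())
```

If missing_required returns anything, stop and ask for a fresh browser-copied curl instead of calling private endpoints.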

Read references/auth-and-cookies.md when you need the Cookie Store workflow or the curl import format.

Task Modules

Research Feed

Use this path when the user wants feed export, history crawl, or local coarse filtering.

Probe the server-side-rendered (SSR) page if you need to inspect its var DATA payload:

python3 scripts/semantic_scholar_cli.py ssr-dump --list-names
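ssr-dump is the supported path. If you need to poke at a saved page by hand, the following minimal Python sketch covers only the simplest case: it assumes the page assigns a JSON literal, possibly a JSON-encoded string, to var DATA. The authoritative decode order lives in references/research-feed.md, and extract_var_data is a hypothetical name.

```python
import json
import re
from typing import Any

# Assumption: the page contains "var DATA = <JSON value>" inside a script tag.
_DATA_ASSIGNMENT = re.compile(r"\bvar\s+DATA\s*=\s*")


def extract_var_data(html: str) -> Any:
    match = _DATA_ASSIGNMENT.search(html)
    if match is None:
        raise ValueError("no 'var DATA =' assignment found")
    # raw_decode parses one JSON value starting at the given index and ignores
    # whatever follows it (a trailing ';' or '</script>').
    value, _end = json.JSONDecoder().raw_decode(html, match.end())
    # If the payload was itself a JSON-encoded string, try one more decode layer.
    if isinstance(value, str):
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            pass
    return value
```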

Crawl the feed through /api/1/library/folders/recommendations:

python3 scripts/semantic_scholar_cli.py feed-crawl \
  --output /tmp/research-feed.json

Persist output after every window. Do not wait for the entire crawl to finish.
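A minimal Python sketch of that persist-as-you-go loop, combined with the feed-history stop rules from Operating Rules (empty day, missing nextWindowUTC, repeated window). fetch_window and the "papers" item key are hypothetical stand-ins for the real request to /api/1/library/folders/recommendations, whose response shape is covered in references/research-feed.md rather than here.

```python
import json
import os
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional

Window = Dict[str, Any]


def crawl_feed(
    fetch_window: Callable[[Optional[str]], Dict[str, Any]],
    persist: Callable[[List[Window]], None],
    items_key: str = "papers",  # hypothetical: the real item key may differ
) -> List[Window]:
    """Walk feed windows, persisting after each one, until a stop rule fires."""
    windows: List[Window] = []
    seen = set()
    cursor: Optional[str] = None  # None means "start from the first window"
    while True:
        page = fetch_window(cursor)
        items = page.get(items_key) or []
        if not items:
            break  # stop rule: empty day
        windows.append({"window": cursor, "items": items})
        persist(windows)  # save now; never wait for the whole crawl
        next_cursor = page.get("nextWindowUTC")
        if not next_cursor:
            break  # stop rule: missing nextWindowUTC
        if next_cursor in seen:
            break  # stop rule: repeated window
        seen.add(next_cursor)
        cursor = next_cursor
    return windows


def json_file_persister(output: Path) -> Callable[[List[Window]], None]:
    """Rewrite the whole output file after each window via write-then-rename."""
    def persist(windows: List[Window]) -> None:
        tmp = output.with_name(output.name + ".tmp")
        tmp.write_text(json.dumps(windows, indent=2), encoding="utf-8")
        os.replace(tmp, output)  # atomic on the same filesystem: no half-written file
    return persist
```

Rewriting the full file each window costs more I/O on long crawls, but an interrupted run always leaves the last complete snapshot as valid JSON.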

Read references/research-feed.md when you need the SSR decode order, the real API path, or the pagination stop rules.

Library Folder

Use this path when the user wants folder contents, folder diffs, or bulk add operations.

Export a folder:

python3 scripts/semantic_scholar_cli.py folder-entries \
  --folder-id 13895811 \
  --all-pages \
  --output /tmp/folder.json

Add a paper to a folder:

python3 scripts/semantic_scholar_cli.py folder-add \
  --paper-id 25f612200a3821c71b99819cd671f2e60df5b470 \
  --paper-title 'AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent' \
  --folder-ids 13895811

Read references/library.md when you need endpoint behavior, pageSize limits, or the verified entries/bulk request shape.

Paper ID Resolution

Use the Graph API paper/batch endpoint when BibTeX entries already contain stable identifiers such as arXiv IDs:

python3 scripts/semantic_scholar_cli.py graph-batch \
  --ids ARXIV:2602.08234,ARXIV:2602.12670

Prefer this over search-page scraping when possible.
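A minimal Python sketch of the BibTeX-to-batch path. It pulls new-style arXiv IDs out of common BibTeX spellings (eprint fields, "arXiv:" prefixes, arxiv.org URLs) and builds, without sending, a POST to the public Graph API paper/batch endpoint with a JSON body of {"ids": [...]}. Old-style IDs such as hep-th/9901001 are not matched, and the helper names and default fields value are illustrative choices.

```python
import json
import re
import urllib.parse
import urllib.request
from typing import Iterable, List

GRAPH_BATCH_URL = "https://api.semanticscholar.org/graph/v1/paper/batch"

# New-style arXiv IDs in common BibTeX spellings:
# eprint = {2602.08234}, arXiv:2602.08234, arxiv.org/abs/2602.08234v2
_ARXIV_ID = re.compile(
    r'(?:arXiv:|arxiv\.org/(?:abs|pdf)/|eprint\s*=\s*[{"])\s*(\d{4}\.\d{4,5})',
    re.IGNORECASE,
)


def arxiv_ids_from_bibtex(bibtex: str) -> List[str]:
    """Return unique ARXIV:-prefixed IDs in first-seen order (version suffixes dropped)."""
    ids: List[str] = []
    for match in _ARXIV_ID.finditer(bibtex):
        paper_id = "ARXIV:" + match.group(1)
        if paper_id not in ids:
            ids.append(paper_id)
    return ids


def build_batch_request(ids: Iterable[str], fields: str = "paperId,title") -> urllib.request.Request:
    """Build (but do not send) a paper/batch POST request."""
    url = GRAPH_BATCH_URL + "?" + urllib.parse.urlencode({"fields": fields})
    body = json.dumps({"ids": list(ids)}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
```

",".join(arxiv_ids_from_bibtex(text)) produces a value in the same form as the --ids argument shown above; pass the request to urllib.request.urlopen only if you are calling the endpoint directly instead of through graph-batch.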

Operating Rules

Prefer direct HTTP after cookies exist.
Do not use Playwright for login. If the Cookie Store is missing, ask the user to copy an authenticated browser request as curl and import it.
Use Playwright only when rendered-page behavior or network inspection is needed to discover hidden interfaces.
Treat browser clicking as a reconnaissance step, not the main extraction path.
For feed history, stop on empty days, missing nextWindowUTC, or repeated windows.
For folder sync, resolve each paperId first, diff against the existing folder entries, then call folder-add only for the papers that are missing.
If a task depends on a private page and the request returns 401, ask for a fresh browser-copied curl and refresh the Cookie Store before debugging the endpoint.
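The folder-sync rule can be sketched in Python using only the folder-add flags documented above. It assumes you already have a paperId-to-title mapping (for example from graph-batch) and the paperIds already in the folder; extracting those from folder-entries output depends on its JSON shape, which is not shown here. The function names are hypothetical, and the sketch defaults to a dry run that only returns the commands.

```python
import subprocess
import sys
from typing import Dict, Iterable, List, Tuple

CLI = "scripts/semantic_scholar_cli.py"


def papers_to_add(resolved: Dict[str, str], existing_ids: Iterable[str]) -> List[Tuple[str, str]]:
    """(paperId, title) pairs not yet in the folder, in the order they were resolved."""
    existing = set(existing_ids)
    return [(pid, title) for pid, title in resolved.items() if pid not in existing]


def folder_add_argv(paper_id: str, title: str, folder_id: int) -> List[str]:
    """One folder-add invocation, using only the documented flags."""
    return [
        sys.executable, CLI, "folder-add",
        "--paper-id", paper_id,
        "--paper-title", title,
        "--folder-ids", str(folder_id),
    ]


def sync_folder(
    resolved: Dict[str, str], existing_ids: Iterable[str], folder_id: int, dry_run: bool = True
) -> List[List[str]]:
    commands = [folder_add_argv(pid, title, folder_id) for pid, title in papers_to_add(resolved, existing_ids)]
    if not dry_run:
        for argv in commands:
            subprocess.run(argv, check=True)  # stop at the first failing add
    return commands
```

Reviewing the dry-run output before passing dry_run=False keeps an accidental bulk write from touching the folder.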

semantic-scholar-library-feed