dedupe-rank


Dedupe + Rank

Turns a raw candidate pool into a deduped pool and a stable core set.

Input

  • papers/papers_raw.jsonl

Outputs

  • papers/papers_dedup.jsonl
  • papers/core_set.csv

Script boundary

scripts/run.py should own only:

  • title/year deduplication
  • deterministic ranking
  • stable paper_id generation

Use shared domain packs or pipeline contract metadata for topic-specific or product-specific behavior.
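The three responsibilities listed above can be sketched as follows. This is a minimal illustration, not the actual contents of scripts/run.py: the record fields `title` and `year`, the helper names, the `P` + hash ID format, and the newest-first ranking criterion are all assumptions made for the example.

```python
import hashlib
import re


def normalize_title(title: str) -> str:
    """Lowercase and collapse non-alphanumeric runs so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def make_paper_id(title: str, year) -> str:
    """Derive the ID from content only, so it does not depend on input order."""
    key = f"{normalize_title(title)}|{year}"
    return "P" + hashlib.sha1(key.encode("utf-8")).hexdigest()[:10]


def dedupe(records):
    """Keep the first record seen for each (normalized title, year) key."""
    seen = {}
    for rec in records:
        title, year = rec.get("title", ""), rec.get("year")
        key = (normalize_title(title), year)
        if key not in seen:
            seen[key] = {**rec, "paper_id": make_paper_id(title, year)}
    return list(seen.values())


def rank(records):
    """Sort newest first (assumed criterion); title and paper_id break ties
    so the order is total and identical on every run."""
    return sorted(
        records,
        key=lambda r: (
            -(r.get("year") or 0),
            normalize_title(r.get("title", "")),
            r["paper_id"],
        ),
    )


# Hypothetical records, for illustration only.
raw = [
    {"title": "Example Paper B", "year": 2021},
    {"title": "example paper b!", "year": 2021},
    {"title": "Example Paper A", "year": 2023},
]
ranked = rank(dedupe(raw))
```

Because the ID is hashed from the normalized title and year rather than from a running counter, shuffling the input does not change which ID a paper receives. The sort key ends in `paper_id` so that no two records ever compare equal.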

Contract-driven behavior

The script should prefer pipeline contract metadata over profile-name branching.

The contract field that currently matters:

  • quality_contract.candidate_pool_policy.keep_full_deduped_pool

If true, the script keeps the full deduped pool in papers/core_set.csv unless the user explicitly overrides core size.
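One way to express this rule is a small resolver that reads the contract field and lets an explicit user value win. This is a sketch under stated assumptions: the contract is taken to be an already-loaded dict, and `resolve_core_size`, `user_core_size`, and the fallback `default_core_size=50` are hypothetical names and values, not part of the documented script.

```python
def resolve_core_size(contract: dict, pool_size: int,
                      user_core_size=None, default_core_size: int = 50) -> int:
    """Decide how many deduped papers go into papers/core_set.csv."""
    # An explicit user override always takes precedence over the contract.
    if user_core_size is not None:
        return min(user_core_size, pool_size)

    # Read the policy from contract metadata instead of branching on a profile name.
    policy = contract.get("quality_contract", {}).get("candidate_pool_policy", {})
    if policy.get("keep_full_deduped_pool") is True:
        return pool_size

    # Hypothetical fallback when the contract does not ask for the full pool.
    return min(default_core_size, pool_size)


contract = {
    "quality_contract": {
        "candidate_pool_policy": {"keep_full_deduped_pool": True}
    }
}
full_pool = resolve_core_size(contract, pool_size=120)
overridden = resolve_core_size(contract, pool_size=120, user_core_size=30)
```

Keeping the decision in one function means that adding a new pipeline profile requires only a contract change, not a new branch in the script.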

Acceptance

  • deduped JSONL exists at papers/papers_dedup.jsonl
  • core-set CSV exists at papers/core_set.csv
  • reruns on the same inputs produce the same paper_ids and the same ranking
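These checks could be automated roughly as below. The sketch treats "stable" as byte-identical output files, which is a stricter reading than the source states, and `fingerprint` is a hypothetical helper rather than something the skill provides.

```python
import hashlib
from pathlib import Path

# Output paths as listed in the Outputs section.
OUTPUTS = ("papers/papers_dedup.jsonl", "papers/core_set.csv")


def fingerprint(root="."):
    """Map each expected output to its SHA-256 digest; raise if one is missing."""
    digests = {}
    for rel in OUTPUTS:
        path = Path(root) / rel
        if not path.is_file():
            raise FileNotFoundError(f"missing output: {rel}")
        digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests
```

A typical use would be to run the script, call `fingerprint()`, run the script again on the same papers/papers_raw.jsonl, and compare the two results for equality.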

Non-goals

  • retrieval
  • screening
  • manual topic authoring inside the script

First seen: Jan 23, 2026