dedupe-rank
Dedupe + Rank
Turns a raw candidate pool into a deduped pool and a stable core set.
Input
papers/papers_raw.jsonl
Outputs
papers/papers_dedup.jsonl
papers/core_set.csv
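A minimal Python sketch of reading the input and writing both outputs with deterministic serialization. It assumes each JSONL line holds one JSON object. The helper names and the CSV column list (CORE_COLUMNS with paper_id, title, year) are hypothetical and are not taken from scripts/run.py. Sorted JSON keys and a fixed CSV column order are one way to make reruns produce byte-identical files.

```python
import csv
import json
from pathlib import Path
from typing import Iterable

# Hypothetical column order for papers/core_set.csv; the real schema may differ.
CORE_COLUMNS = ["paper_id", "title", "year"]


def read_jsonl(path: Path) -> list[dict]:
    """Read one JSON object per non-blank line (e.g. papers/papers_raw.jsonl)."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def write_jsonl(path: Path, records: Iterable[dict]) -> None:
    """Write records with sorted keys so identical inputs give identical bytes."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8", newline="\n") as fh:
        for rec in records:
            fh.write(json.dumps(rec, sort_keys=True, ensure_ascii=False) + "\n")


def write_core_csv(path: Path, records: Iterable[dict]) -> None:
    """Write the core set with a fixed column order; extra fields are ignored."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8", newline="") as fh:
        writer = csv.DictWriter(
            fh, fieldnames=CORE_COLUMNS, extrasaction="ignore", lineterminator="\n"
        )
        writer.writeheader()
        for rec in records:
            writer.writerow(rec)
```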
Script boundary
scripts/run.py should own only:
- title/year deduplication
- deterministic ranking
- stable paper_id generation
Topic-specific or product-specific behavior belongs in shared domain packs or pipeline contract metadata, not in the script.
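The three responsibilities in the script boundary (title/year deduplication, deterministic ranking, stable paper_id generation) could look roughly like the Python sketch below. Everything here is an assumption for illustration: the title normalization, the hypothetical citation_count field used for ranking, the tie-break order, and the "p_" plus 12-hex-digit SHA-1 id format are not documented by this skill.

```python
import hashlib
import json
import re
import unicodedata


def title_key(title: str) -> str:
    """Normalize a title for matching: strip accents, casefold, drop punctuation."""
    text = unicodedata.normalize("NFKD", title or "").casefold()
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[\W_]+", " ", text)  # punctuation and underscores become spaces
    return " ".join(text.split())


def year_str(record: dict) -> str:
    """Year as a trimmed string, so 2015 and "2015" compare equal."""
    year = record.get("year")
    return "" if year is None else str(year).strip()


def as_int(value) -> int:
    """Best-effort int conversion; missing or unparseable values count as 0."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0


def stable_paper_id(record: dict) -> str:
    """Hypothetical id scheme: hash of normalized title + year, so reruns agree."""
    key = f"{title_key(record.get('title', ''))}|{year_str(record)}"
    return "p_" + hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]


def rank_key(record: dict) -> tuple:
    """Hypothetical ordering: more citations, then newer, then title, then id."""
    return (
        -as_int(record.get("citation_count")),
        -as_int(year_str(record)),
        title_key(record.get("title", "")),
        record["paper_id"],
    )


def dedupe_and_rank(records: list[dict]) -> list[dict]:
    """Collapse (title, year) duplicates, keep the best-ranked copy, sort the result."""
    best: dict[tuple[str, str], dict] = {}
    for raw in records:
        rec = dict(raw, paper_id=stable_paper_id(raw))
        key = (title_key(rec.get("title", "")), year_str(rec))
        # Canonical JSON breaks exact rank ties so the kept copy never depends on input order.
        candidate = (rank_key(rec), json.dumps(rec, sort_keys=True, default=str))
        current = best.get(key)
        if current is None or candidate < (
            rank_key(current),
            json.dumps(current, sort_keys=True, default=str),
        ):
            best[key] = rec
    return sorted(best.values(), key=rank_key)
```

Deriving paper_id from the same normalized title/year key used for deduplication means every duplicate maps to the same id, which is one simple way to keep ids stable across reruns.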
Contract-driven behavior
The script should prefer pipeline contract metadata over branching on profile names.
Currently relevant field:
quality_contract.candidate_pool_policy.keep_full_deduped_pool
If true, the script writes the full deduped pool to papers/core_set.csv unless the user explicitly overrides the core size.
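A hedged Python sketch of how the core set might be chosen from an already ranked, deduped pool. It assumes the contract is loaded as nested dictionaries matching the dotted field path above. DEFAULT_CORE_SIZE and the function names are hypothetical; the skill does not say what core size applies when the flag is false.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical fallback used only when the contract does not ask for the full pool.
DEFAULT_CORE_SIZE = 50


def keep_full_pool(contract: dict) -> bool:
    """Read quality_contract.candidate_pool_policy.keep_full_deduped_pool, defaulting to False."""
    quality = contract.get("quality_contract") or {}
    policy = quality.get("candidate_pool_policy") or {}
    return bool(policy.get("keep_full_deduped_pool", False))


def select_core(
    ranked: Sequence[T], contract: dict, core_size_override: Optional[int] = None
) -> list[T]:
    """Pick core-set rows from a ranked, deduped pool."""
    if core_size_override is not None:  # an explicit user override always wins
        if core_size_override < 0:
            raise ValueError("core size override must be non-negative")
        return list(ranked[:core_size_override])
    if keep_full_pool(contract):
        return list(ranked)
    return list(ranked[:DEFAULT_CORE_SIZE])
```

Checking the override first mirrors the rule above: the contract flag decides the default behavior, but an explicit user choice of core size takes precedence.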
Acceptance
- deduped JSONL exists
- core-set CSV exists
- reruns with the same inputs produce the same outputs
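The acceptance criteria can be checked mechanically. The Python sketch below runs a pipeline callable twice, confirms both outputs exist, and compares SHA-256 digests of the files between runs. It assumes "stable" means byte-identical output; check_acceptance and file_digests are hypothetical helper names.

```python
import hashlib
from pathlib import Path
from typing import Callable, Sequence

# Output paths named in this skill, relative to the working directory.
OUTPUTS = (Path("papers/papers_dedup.jsonl"), Path("papers/core_set.csv"))


def file_digests(paths: Sequence[Path]) -> dict[str, str]:
    """SHA-256 of each output file, keyed by path."""
    return {str(p): hashlib.sha256(p.read_bytes()).hexdigest() for p in paths}


def check_acceptance(run: Callable[[], object], paths: Sequence[Path] = OUTPUTS) -> list[str]:
    """Run twice; report missing outputs or byte-level differences between runs."""
    run()
    missing = [p for p in paths if not p.is_file()]
    if missing:
        return [f"missing output: {p}" for p in missing]
    first = file_digests(paths)
    run()  # same inputs, so a stable script should reproduce identical bytes
    if file_digests(paths) != first:
        return ["rerun changed outputs"]
    return []
```

In practice, run could be a small wrapper that invokes scripts/run.py through subprocess.run(..., check=True); the script's actual command-line arguments are not documented here. An empty list means all three criteria passed.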
Non-goals
- retrieval
- screening
- manual topic authoring inside the script
Related skills