dedupe-rank

SKILL.md

Dedupe + Rank

Turns a raw candidate pool into a deduped pool and a stable core set.

Input

papers/papers_raw.jsonl

Outputs

papers/papers_dedup.jsonl
papers/core_set.csv
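A minimal sketch of the file I/O around these paths. It assumes each line of papers/papers_raw.jsonl is one JSON object and that core_set.csv uses a caller-chosen column list. The helper names and columns are hypothetical and are not taken from scripts/run.py. Sorting keys during serialization keeps the JSONL output byte-identical across reruns.

```python
import csv
import json
from pathlib import Path


def read_jsonl(path: Path) -> list[dict]:
    """Read one JSON object per non-blank line (assumed input layout)."""
    records = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def write_jsonl(path: Path, records: list[dict]) -> None:
    """Write one record per line; sort_keys keeps the bytes stable across runs."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec, ensure_ascii=False, sort_keys=True) + "\n")


def write_core_csv(path: Path, records: list[dict], columns: list[str]) -> None:
    """Write the core set with a fixed (hypothetical) column order; missing fields become empty cells."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=columns)
        writer.writeheader()
        for rec in records:
            writer.writerow({col: rec.get(col, "") for col in columns})
```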

Script boundary

scripts/run.py should own only:

title/year deduplication
deterministic ranking
stable paper_id generation
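The three responsibilities above could look roughly like this. This is a sketch and not the skill's actual code. It assumes records carry title and year plus a hypothetical citations field used as the ranking signal. The title normalization, tie-breaks, and P-prefixed SHA-1 id format are all illustrative choices.

```python
import hashlib
import json
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Casefold, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.casefold())
    return " ".join(text.split())


def dedup_key(rec: dict) -> tuple[str, str]:
    """Title/year key: two records with the same key are treated as one paper."""
    return normalize_title(rec.get("title", "")), str(rec.get("year", ""))


def make_paper_id(rec: dict) -> str:
    """Content-derived id (hypothetical format): same title/year gives the same id on every run."""
    title, year = dedup_key(rec)
    digest = hashlib.sha1(f"{title}|{year}".encode("utf-8")).hexdigest()
    return f"P{digest[:10]}"


def _preference(rec: dict) -> tuple[int, str]:
    # Higher (hypothetical) citations wins; canonical JSON breaks ties
    # so the surviving duplicate does not depend on input order.
    return int(rec.get("citations", 0) or 0), json.dumps(rec, sort_keys=True)


def dedupe(records: list[dict]) -> list[dict]:
    """Keep one record per title/year key and attach its stable paper_id."""
    best: dict[tuple[str, str], dict] = {}
    for rec in records:
        key = dedup_key(rec)
        if key not in best or _preference(rec) > _preference(best[key]):
            best[key] = rec
    return [dict(rec, paper_id=make_paper_id(rec)) for rec in best.values()]


def rank(records: list[dict]) -> list[dict]:
    """Deterministic total order: citations desc, year desc, then title and paper_id."""
    return sorted(
        records,
        key=lambda r: (
            -int(r.get("citations", 0) or 0),
            -int(r.get("year", 0) or 0),  # assumes numeric years
            normalize_title(r.get("title", "")),
            r["paper_id"],
        ),
    )
```

Because the id is derived from the normalized title and year rather than from input position, reordering papers_raw.jsonl changes neither the ids nor the ranking.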

Topic-specific or product-specific behavior belongs in shared domain packs or pipeline contract metadata, not in the script.

Contract-driven behavior

The script should prefer pipeline contract metadata over profile-name branching.

The field that currently matters most:

quality_contract.candidate_pool_policy.keep_full_deduped_pool

If this field is true, the script writes the full deduped pool to papers/core_set.csv unless the user explicitly overrides the core size.
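One way to honor this flag, sketched under assumptions: the contract has already been loaded into a nested dict, and the behavior when the flag is false (truncating to a hypothetical default_core_size of 50) is illustrative only. An explicit user override always wins, as described above.

```python
from typing import Any, Optional


def keep_full_pool(contract: dict[str, Any]) -> bool:
    """Read quality_contract.candidate_pool_policy.keep_full_deduped_pool, defaulting to False."""
    quality = contract.get("quality_contract") or {}
    policy = quality.get("candidate_pool_policy") or {}
    return bool(policy.get("keep_full_deduped_pool", False))


def resolve_core_size(
    contract: dict[str, Any],
    pool_size: int,
    user_core_size: Optional[int] = None,
    default_core_size: int = 50,  # hypothetical fallback, not from the skill
) -> int:
    """Decide how many ranked papers go into core_set.csv."""
    if user_core_size is not None:  # explicit user override wins
        return min(user_core_size, pool_size)
    if keep_full_pool(contract):  # contract asks for the whole deduped pool
        return pool_size
    return min(default_core_size, pool_size)
```

Reading the flag from the contract, rather than branching on a profile name, keeps the script generic: a new pipeline profile only needs different contract metadata.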

Acceptance

deduped JSONL exists
core-set CSV exists
reruns are stable for the same inputs
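A simple way to check these criteria is to run the script twice on the same inputs and compare output hashes. In this sketch, run is a hypothetical callable standing in for however scripts/run.py is invoked. A missing output file raises FileNotFoundError, which also covers the two existence checks.

```python
import hashlib
from pathlib import Path
from typing import Callable

OUTPUTS = ("papers/papers_dedup.jsonl", "papers/core_set.csv")


def digest_outputs(workdir: Path, outputs: tuple[str, ...] = OUTPUTS) -> dict[str, str]:
    """SHA-256 of each output file; raises FileNotFoundError if one is missing."""
    return {rel: hashlib.sha256((workdir / rel).read_bytes()).hexdigest() for rel in outputs}


def check_rerun_stable(run: Callable[[Path], None], workdir: Path) -> bool:
    """Run twice (run is a hypothetical stand-in for scripts/run.py) and compare output bytes."""
    run(workdir)
    first = digest_outputs(workdir)
    run(workdir)
    return first == digest_outputs(workdir)
```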

Non-goals

retrieval
screening
manual topic authoring inside the script

Related skills

More from willoscar/research-units-pipeline-skills

Installs

33

Repository

willoscar/resea…e-skills

GitHub Stars

429

First Seen

Jan 23, 2026

Security Audits

Gen Agent Trust Hub: Pass