Add to Golden Dataset

Multi-agent curation workflow with quality score explanations, bias detection, and version tracking.

Quick Start

/add-golden https://example.com/article
/add-golden https://arxiv.org/abs/2312.xxxxx

Task Management (CC 2.1.16)

# Create main curation task
TaskCreate(
  subject="Add to golden dataset: {url}",
  description="Multi-agent curation with quality explanation",
  activeForm="Curating document"
)

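# Create subtasks for the 9-phase process.
# Each entry pairs a subject with an explicit activeForm; deriving it as
# f"{phase}ing" would produce labels such as "Fetch contenting".
phases = [
    ("Fetch content", "Fetching content"),
    ("Run quality analysis", "Running quality analysis"),
    ("Explain scores", "Explaining scores"),
    ("Check bias", "Checking bias"),
    ("Check diversity", "Checking diversity"),
    ("Validate", "Validating"),
    ("Get approval", "Getting approval"),
    ("Write to dataset", "Writing to dataset"),
    ("Update version", "Updating version"),
]
for subject, active_form in phases:
    TaskCreate(subject=subject, activeForm=active_form)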

Workflow Overview

| Phase | Activities | Output |
| --- | --- | --- |
| 1. Input Collection | Get URL, detect content type | Document metadata |
| 2. Fetch and Extract | Parse document structure | Structured content |
| 3. Quality Analysis | 4 parallel agents evaluate | Raw scores |
| 4. Quality Explanation | Explain WHY each score | Score rationale |
| 5. Bias Detection | Check for bias in content | Bias report |
| 6. Diversity Check | Assess dataset balance | Diversity metrics |
| 7. Validation | Schema, duplicates, gates | Validation status |
| 8. Silver-to-Gold | Promote or mark as silver | Classification |
| 9. Version Tracking | Track changes, rollback | Version entry |

Phase 1-2: Input and Extraction

Detect content type: article, tutorial, documentation, research_paper.

Extract: title, sections, code blocks, key terms, metadata (author, date).


Phase 3: Parallel Quality Analysis (4 Agents)

Launch ALL agents in ONE message with run_in_background=True.

| Agent | Focus | Output |
| --- | --- | --- |
| code-quality-reviewer | Accuracy, coherence, depth, relevance | Quality scores |
| workflow-architect | Keyword directness, paraphrase, reasoning | Difficulty level |
| data-pipeline-engineer | Primary/secondary domains, skill level | Tags |
| test-generator | Direct, paraphrased, multi-hop queries | Test queries |

See Quality Scoring for detailed criteria.


Phase 4: Quality Explanation

Each dimension gets an explanation of why it received its score:

### Accuracy: [N.NN]/1.0
**Why this score:**
- [Specific reason with evidence]
**What would improve it:**
- [Specific improvement]

Phase 5: Bias Detection

See Bias Detection Guide for patterns.

Check for:

  • Technology bias (favors specific tools)
  • Recency bias (ignores LTS versions)
  • Complexity bias (assumed knowledge)
  • Vendor bias (promotes products)
  • Geographic/cultural bias

| Bias Score | Action |
| --- | --- |
| 0-2 | Proceed normally |
| 3-5 | Add disclaimer |
| 6-8 | Require user review |
| 9-10 | Recommend against |
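
The score bands above can be read as a simple lookup. A minimal sketch, assuming the bias score is an integer from 0 to 10; the function name and the returned labels are illustrative, not part of the skill:

```python
def bias_action(score: int) -> str:
    """Map an integer bias score (0-10) to the action for its band."""
    if not 0 <= score <= 10:
        raise ValueError(f"bias score must be between 0 and 10, got {score}")
    if score <= 2:
        return "proceed"
    if score <= 5:
        return "add_disclaimer"
    if score <= 8:
        return "require_user_review"
    return "recommend_against"


if __name__ == "__main__":
    for s in (1, 4, 7, 10):
        print(s, bias_action(s))
```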

Phase 6: Diversity Dashboard

Track dataset balance across:

  • Domain distribution (AI/ML, Backend, Frontend, DevOps, Security)
  • Difficulty distribution (trivial, easy, medium, hard, adversarial)

Impact assessment: does the new document improve or worsen dataset diversity?
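
One way to make the impact assessment concrete is to compare how evenly documents are spread across domains before and after the addition. The sketch below uses normalized Shannon entropy as the balance measure. This is an assumption for illustration (the skill does not name a metric), and the function names are hypothetical:

```python
import math

# Domain categories listed in the diversity dashboard.
DOMAINS = ("AI/ML", "Backend", "Frontend", "DevOps", "Security")


def normalized_entropy(counts: dict[str, int], categories=DOMAINS) -> float:
    """Return Shannon entropy divided by log(len(categories)).

    1.0 means documents are spread evenly; values near 0 mean one
    category dominates.
    """
    total = sum(counts.get(c, 0) for c in categories)
    if total == 0:
        return 0.0
    probs = [counts.get(c, 0) / total for c in categories if counts.get(c, 0) > 0]
    return -sum(p * math.log(p) for p in probs) / math.log(len(categories))


def diversity_impact(counts: dict[str, int], new_label: str, categories=DOMAINS) -> float:
    """Change in normalized entropy if one document with new_label is added.

    Positive values mean the addition improves balance.
    """
    if new_label not in categories:
        raise ValueError(f"unknown category: {new_label}")
    after = dict(counts)
    after[new_label] = after.get(new_label, 0) + 1
    return normalized_entropy(after, categories) - normalized_entropy(counts, categories)
```

With a dataset dominated by AI/ML documents, adding a Security document yields a positive impact and adding another AI/ML document yields a negative one. The same functions work for the difficulty distribution by passing a different `categories` tuple.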


Phase 7: Validation

  • URL validation (no placeholders)
  • Schema validation (required fields)
  • Duplicate check (>80% similarity)
  • Quality gates (min sections, content length)
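
The first three checks could be sketched as follows. The placeholder markers, the required field names, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions for illustration; the skill specifies only the 80% threshold:

```python
from difflib import SequenceMatcher

# Hypothetical markers that suggest a URL is a placeholder, not a real source.
PLACEHOLDER_MARKERS = ("example.com", "xxxxx", "localhost")

# Hypothetical required fields; the real schema may differ.
REQUIRED_FIELDS = ("url", "title", "sections")


def is_placeholder_url(url: str) -> bool:
    """Return True if the URL contains any known placeholder marker."""
    lowered = url.lower()
    return any(marker in lowered for marker in PLACEHOLDER_MARKERS)


def missing_fields(doc: dict) -> list[str]:
    """Return required fields that are absent or empty in the document."""
    return [field for field in REQUIRED_FIELDS if not doc.get(field)]


def is_duplicate(candidate: str, existing: list[str], threshold: float = 0.80) -> bool:
    """Return True if the candidate is more than `threshold` similar to any existing text."""
    return any(
        SequenceMatcher(None, candidate, text).ratio() > threshold
        for text in existing
    )
```

`SequenceMatcher` compares raw character sequences and is slow on long documents, so a production check would more likely rely on hashing or embeddings.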

Phase 8: Silver-to-Gold Workflow

See Silver-Gold Promotion for criteria.

| Status | Criteria | Action |
| --- | --- | --- |
| GOLD | Score >= 0.75, no bias | Add to main dataset |
| SILVER | Score 0.55-0.74 | Add to silver, track |
| REJECT | Score < 0.55 | Do not add |

Promotion criteria: 7+ days in silver, quality >= 0.75, no negative feedback.
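
A minimal sketch of the classification and promotion rules. Two cases are not covered by the table and are handled here by assumption: a score of 0.75 or higher with detected bias is held in silver, and scores between 0.74 and 0.75 are treated as silver. Function and parameter names are illustrative:

```python
from datetime import date

GOLD_THRESHOLD = 0.75
SILVER_THRESHOLD = 0.55
MIN_DAYS_IN_SILVER = 7


def classify(quality_score: float, has_bias: bool) -> str:
    """Return GOLD, SILVER, or REJECT for a newly scored document."""
    if quality_score < SILVER_THRESHOLD:
        return "REJECT"
    if quality_score >= GOLD_THRESHOLD and not has_bias:
        return "GOLD"
    # Assumption: anything at or above the silver threshold that is not
    # gold (including high scores with bias) lands in silver.
    return "SILVER"


def eligible_for_promotion(
    added_on: date, today: date, quality_score: float, negative_feedback: int
) -> bool:
    """Check the three promotion criteria for a silver document."""
    days_in_silver = (today - added_on).days
    return (
        days_in_silver >= MIN_DAYS_IN_SILVER
        and quality_score >= GOLD_THRESHOLD
        and negative_feedback == 0
    )
```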


Phase 9: Version Tracking

{
  "version": "1.2.3",
  "change_type": "ADD|UPDATE|REMOVE|PROMOTE",
  "document_id": "doc-123",
  "quality_score": 0.82,
  "rollback_available": true
}

| Update Type | Version Bump |
| --- | --- |
| Add/Update document | Patch (0.0.X) |
| Remove document | Minor (0.X.0) |
| Schema change | Major (X.0.0) |
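
The bump rules can be sketched as a small helper. The `"SCHEMA"` label is hypothetical, since the `change_type` values shown above do not include one, and `PROMOTE` is rejected because the table does not say which bump it gets:

```python
def bump_version(version: str, change_type: str) -> str:
    """Return the next dataset version for a given change type."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change_type in ("ADD", "UPDATE"):
        return f"{major}.{minor}.{patch + 1}"  # patch bump
    if change_type == "REMOVE":
        return f"{major}.{minor + 1}.0"  # minor bump resets patch
    if change_type == "SCHEMA":  # hypothetical label for a schema change
        return f"{major + 1}.0.0"  # major bump resets minor and patch
    raise ValueError(f"no bump rule defined for change type: {change_type}")


if __name__ == "__main__":
    print(bump_version("1.2.3", "ADD"))
```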

Quality Scoring

| Dimension | Weight |
| --- | --- |
| Accuracy | 0.25 |
| Coherence | 0.20 |
| Depth | 0.25 |
| Relevance | 0.30 |

Formula: quality_score = accuracy*0.25 + coherence*0.20 + depth*0.25 + relevance*0.30
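
As a worked example of the formula, the sketch below computes the weighted score from per-dimension scores in the range 0 to 1. The input values are made up for illustration, and rounding to four decimals is an assumption made to avoid floating-point noise:

```python
WEIGHTS = {"accuracy": 0.25, "coherence": 0.20, "depth": 0.25, "relevance": 0.30}


def quality_score(scores: dict[str, float]) -> float:
    """Weighted sum of the four dimension scores."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    total = sum(scores[dim] * weight for dim, weight in WEIGHTS.items())
    return round(total, 4)


if __name__ == "__main__":
    example = {"accuracy": 0.9, "coherence": 0.8, "depth": 0.7, "relevance": 0.85}
    # 0.9*0.25 + 0.8*0.20 + 0.7*0.25 + 0.85*0.30 = 0.815
    print(quality_score(example))
```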


Key Decisions

| Decision | Choice | Rationale |
| --- | --- | --- |
| Score explanation | Required | Transparency, actionable feedback |
| Bias detection | Dedicated agent | Prevent dataset contamination |
| Two-tier system | Silver + Gold | Allow docs time to mature |
| Version tracking | Semantic versioning | Clear history, safe rollbacks |

Related Skills

  • golden-dataset-validation - Validate existing datasets
  • llm-evaluation - LLM output evaluation patterns
  • test-data-management - Test data strategies

Version: 2.0.0 (January 2026)
