add-golden
Add to Golden Dataset
Multi-agent curation workflow with quality score explanations, bias detection, and version tracking.
Quick Start
```
/add-golden https://example.com/article
/add-golden https://arxiv.org/abs/2312.xxxxx
```
Task Management (CC 2.1.16)
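```python
# Create the main curation task
TaskCreate(
    subject="Add to golden dataset: {url}",
    description="Multi-agent curation with quality explanation",
    activeForm="Curating document",
)

# Create one subtask per phase of the 9-phase process. activeForm is spelled out
# per phase because f"{phase}ing" produces labels like "Fetch contenting".
phases = [
    ("Fetch content", "Fetching content"),
    ("Run quality analysis", "Running quality analysis"),
    ("Explain scores", "Explaining scores"),
    ("Check bias", "Checking bias"),
    ("Check diversity", "Checking diversity"),
    ("Validate", "Validating"),
    ("Get approval", "Getting approval"),
    ("Write to dataset", "Writing to dataset"),
    ("Update version", "Updating version"),
]
for subject, active_form in phases:
    TaskCreate(subject=subject, activeForm=active_form)
```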
Workflow Overview
| Phase | Activities | Output |
|---|---|---|
| 1. Input Collection | Get URL, detect content type | Document metadata |
| 2. Fetch and Extract | Parse document structure | Structured content |
| 3. Quality Analysis | 4 parallel agents evaluate | Raw scores |
| 4. Quality Explanation | Explain WHY each score | Score rationale |
| 5. Bias Detection | Check for bias in content | Bias report |
| 6. Diversity Check | Assess dataset balance | Diversity metrics |
| 7. Validation | Schema, duplicates, gates | Validation status |
| 8. Silver-to-Gold | Promote or mark as silver | Classification |
| 9. Version Tracking | Track changes, rollback | Version entry |
Phase 1-2: Input and Extraction
Detect content type: article, tutorial, documentation, research_paper.
Extract: title, sections, code blocks, key terms, metadata (author, date).
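The skill does not say how the content type is detected. As a minimal sketch, assuming simple URL and title heuristics, detection could look like this; `detect_content_type` and its rules are hypothetical, not part of the skill.

```python
from urllib.parse import urlparse

# Hypothetical heuristics: the skill names the four content types but not
# the detection rules, so these URL/title checks are illustrative only.
CONTENT_TYPES = ("article", "tutorial", "documentation", "research_paper")


def detect_content_type(url: str, title: str = "") -> str:
    """Guess one of the four content types from the URL and title."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    path = parsed.path.lower()
    text = title.lower()
    if host.endswith("arxiv.org") or path.endswith(".pdf"):
        return "research_paper"
    if host.startswith("docs.") or "/docs/" in path or "/reference/" in path:
        return "documentation"
    if "tutorial" in path or "how-to" in path or "tutorial" in text:
        return "tutorial"
    return "article"  # fallback when no rule matches
```

In practice the extracted structure (abstract, references, step-by-step sections) would be a stronger signal than the URL alone.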
Phase 3: Parallel Quality Analysis (4 Agents)
Launch ALL agents in ONE message with run_in_background=True.
| Agent | Focus | Output |
|---|---|---|
| code-quality-reviewer | Accuracy, coherence, depth, relevance | Quality scores |
| workflow-architect | Keyword directness, paraphrase, reasoning | Difficulty level |
| data-pipeline-engineer | Primary/secondary domains, skill level | Tags |
| test-generator | Direct, paraphrased, multi-hop queries | Test queries |
See Quality Scoring for detailed criteria.
Phase 4: Quality Explanation
Each dimension gets an explanation of WHY it received its score, using this template:
```markdown
### Accuracy: [N.NN]/1.0
**Why this score:**
- [Specific reason with evidence]
**What would improve it:**
- [Specific improvement]
```
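A minimal sketch of filling this template programmatically, assuming each dimension's reasons and improvements have already been collected as strings; `DimensionExplanation` and `render_explanation` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DimensionExplanation:
    """Hypothetical container for one dimension's score and rationale."""
    name: str
    score: float
    reasons: List[str] = field(default_factory=list)
    improvements: List[str] = field(default_factory=list)


def render_explanation(exp: DimensionExplanation) -> str:
    """Fill the Phase 4 template for one quality dimension."""
    lines = [f"### {exp.name}: {exp.score:.2f}/1.0", "**Why this score:**"]
    lines += [f"- {reason}" for reason in exp.reasons]
    lines.append("**What would improve it:**")
    lines += [f"- {item}" for item in exp.improvements]
    return "\n".join(lines)
```

Formatting the score with two decimals matches the `[N.NN]` placeholder in the template.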
Phase 5: Bias Detection
See Bias Detection Guide for patterns.
Check for:
- Technology bias (favors specific tools)
- Recency bias (ignores LTS versions)
- Complexity bias (assumes unstated prior knowledge)
- Vendor bias (promotes products)
- Geographic/cultural bias
| Bias Score | Action |
|---|---|
| 0-2 | Proceed normally |
| 3-5 | Add disclaimer |
| 6-8 | Require user review |
| 9-10 | Recommend against |
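The table above maps directly to a threshold function. A minimal sketch, assuming bias scores are integers from 0 to 10; `bias_action` is a hypothetical helper name.

```python
def bias_action(bias_score: int) -> str:
    """Map a 0-10 bias score to the action from the bias table."""
    if not 0 <= bias_score <= 10:
        raise ValueError("bias score must be between 0 and 10")
    if bias_score <= 2:
        return "Proceed normally"
    if bias_score <= 5:
        return "Add disclaimer"
    if bias_score <= 8:
        return "Require user review"
    return "Recommend against"
```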
Phase 6: Diversity Dashboard
Track dataset balance across:
- Domain distribution (AI/ML, Backend, Frontend, DevOps, Security)
- Difficulty distribution (trivial, easy, medium, hard, adversarial)
Impact assessment: does the new document improve or worsen the dataset's diversity?
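The skill does not name a diversity metric. One common choice, swapped in here as an assumption, is normalized Shannon entropy: 1.0 for a perfectly even spread across categories, 0.0 when everything falls in one. The impact of a new document is then the change in entropy it causes. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List

DOMAINS = ["AI/ML", "Backend", "Frontend", "DevOps", "Security"]


def normalized_entropy(labels: Iterable[str], num_categories: int) -> float:
    """Shannon entropy of the label distribution, scaled to [0, 1]."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0 or num_categories < 2:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(num_categories)


def diversity_impact(existing: List[str], new_domain: str) -> float:
    """Positive: the new document improves domain balance; negative: worsens it."""
    before = normalized_entropy(existing, len(DOMAINS))
    after = normalized_entropy(existing + [new_domain], len(DOMAINS))
    return after - before
```

The same functions apply unchanged to the difficulty distribution by passing difficulty labels and the number of difficulty levels.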
Phase 7: Validation
- URL validation (no placeholders)
- Schema validation (required fields)
- Duplicate check (flag documents more than 80% similar to an existing entry)
- Quality gates (min sections, content length)
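A minimal sketch of the first and third checks. The placeholder patterns are assumptions (the skill only says "no placeholders"), and `difflib.SequenceMatcher` stands in for whatever similarity measure the real pipeline uses.

```python
import re
from difflib import SequenceMatcher
from typing import List

# Assumed placeholder patterns: runs of x's, example domains, {template} slots.
PLACEHOLDER_PATTERNS = [r"x{3,}", r"example\.(com|org)", r"\{[^}]*\}"]


def is_placeholder_url(url: str) -> bool:
    """True if the URL still contains an obvious placeholder."""
    return any(re.search(p, url, re.IGNORECASE) for p in PLACEHOLDER_PATTERNS)


def find_near_duplicates(
    content: str, existing: List[str], threshold: float = 0.8
) -> List[int]:
    """Indices of existing documents more than `threshold` similar to `content`."""
    return [
        i
        for i, other in enumerate(existing)
        if SequenceMatcher(None, content, other).ratio() > threshold
    ]
```

`SequenceMatcher` compares characters pairwise and gets slow on long documents; embedding similarity or MinHash would scale better for a large dataset.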
Phase 8: Silver-to-Gold Workflow
See Silver-Gold Promotion for criteria.
| Status | Criteria | Action |
|---|---|---|
| GOLD | Score >= 0.75, no bias | Add to main dataset |
| SILVER | Score 0.55-0.74 | Add to silver, track |
| REJECT | Score < 0.55 | Do not add |
Promotion criteria: a silver document is promoted to gold after 7+ days in silver, with quality >= 0.75 and no negative feedback.
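A minimal sketch of the classification table and the promotion rule. Two assumptions are labeled in the code: "no bias" is passed in as a boolean, and a document scoring >= 0.75 that does show bias goes to silver, a case the table does not cover. All names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


def classify(quality_score: float, has_bias: bool) -> str:
    """Apply the GOLD / SILVER / REJECT thresholds."""
    if quality_score < 0.55:
        return "REJECT"
    if quality_score >= 0.75 and not has_bias:
        return "GOLD"
    # Assumption: high-scoring but biased documents are routed to silver.
    return "SILVER"


@dataclass
class SilverEntry:
    added_on: date
    quality_score: float
    negative_feedback: int = 0


def eligible_for_promotion(entry: SilverEntry, today: date) -> bool:
    """7+ days in silver, quality >= 0.75, and no negative feedback."""
    return (
        today - entry.added_on >= timedelta(days=7)
        and entry.quality_score >= 0.75
        and entry.negative_feedback == 0
    )
```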
Phase 9: Version Tracking
```json
{
  "version": "1.2.3",
  "change_type": "ADD|UPDATE|REMOVE|PROMOTE",
  "document_id": "doc-123",
  "quality_score": 0.82,
  "rollback_available": true
}
```
| Update Type | Version Bump |
|---|---|
| Add/Update document | Patch (0.0.X) |
| Remove document | Minor (0.X.0) |
| Schema change | Major (X.0.0) |
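The bump rules above can be sketched as a small function. `"SCHEMA"` is a hypothetical change label (it is not one of the `change_type` values), and `PROMOTE` has no rule in the table, so the sketch raises for it rather than guessing.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next semantic version for a dataset change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change in ("ADD", "UPDATE"):
        return f"{major}.{minor}.{patch + 1}"  # patch bump
    if change == "REMOVE":
        return f"{major}.{minor + 1}.0"  # minor bump resets patch
    if change == "SCHEMA":  # hypothetical label for a schema change
        return f"{major + 1}.0.0"  # major bump resets minor and patch
    raise ValueError(f"no bump rule for change type: {change}")
```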
Quality Scoring
| Dimension | Weight |
|---|---|
| Accuracy | 0.25 |
| Coherence | 0.20 |
| Depth | 0.25 |
| Relevance | 0.30 |
Formula: quality_score = accuracy*0.25 + coherence*0.20 + depth*0.25 + relevance*0.30
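The formula is a weighted sum whose weights total 1.0, so the result stays in [0, 1] when each dimension does. A minimal sketch, assuming dimension scores arrive as a dict keyed by lowercase dimension name:

```python
WEIGHTS = {"accuracy": 0.25, "coherence": 0.20, "depth": 0.25, "relevance": 0.30}


def quality_score(scores: dict) -> float:
    """Weighted sum of the four dimension scores (each expected in [0, 1])."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(scores[dim] * weight for dim, weight in WEIGHTS.items())
```

For example, accuracy 0.9, coherence 0.8, depth 0.7, and relevance 0.85 give 0.225 + 0.16 + 0.175 + 0.255 = 0.815.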
Key Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Score explanation | Required | Transparency, actionable feedback |
| Bias detection | Dedicated agent | Prevent dataset contamination |
| Two-tier system | Silver + Gold | Allow docs time to mature |
| Version tracking | Semantic versioning | Clear history, safe rollbacks |
Related Skills
- golden-dataset-validation - Validate existing datasets
- llm-evaluation - LLM output evaluation patterns
- test-data-management - Test data strategies
Version: 2.0.0 (January 2026)