skills/toyworks/agent-skills/t3-hardware-scoring

t3-hardware-scoring

SKILL.md

MantaBase T3 Hardware Audit System — v2.1

Objectives

  • Crawl product information → Brand Blind → Triple-Auditor independent scoring → Eagle Eye Validation → Final Judge synthesis
  • Output: objective Tool / Toy / Trash classification with Eagle Eye safety veto
  • Core principles: evidence-first, parallel audits, zero hallucination, transparent math

Architecture Overview (v2.1)

Main Agent                             Subagents
───────────────────────────────────    ──────────────────────────────
Step 1: Data Collection                
Step 2: Organize + Brand Blind         
  ↓ write 02-brand-blinded.md          
                                       Step 3: ×3 parallel (Tool/Toy/Trash)
                                         · read file paths, print JSON to stdout
                                         · main agent writes 03-*.json files
  ↓ receive JSON, write files           
  ↓ auto-build auditor_reports.json    
                                       Step 4: ×1 Eagle Eye Validator (conditional)
                                         · AI subagent, not Python script
                                         · outputs adjustments[] diff
  ↓ apply diffs, rebuild JSON           
Step 5: synthesize_results.py           
Step 6: 99-audit-report.md             

Key v2.1 changes:

  • Step 4 replaces 3-way Peer Review with single AI-driven Eagle Eye Validator
  • Subagents print JSON to stdout; main agent writes all files
  • New Eagle Eye pattern: Architectural Implausibility
  • Mandatory auto-rebuild of auditor_reports.json from 03-*.json files
  • Token guardrails on verbatim_evidence and prompt size

Data Collection Strategy (Step 1-2)

Primary: crawl_product_info.py

python3 scripts/crawl_product_info.py --url https://example.com/product --pretty

Uses Jina AI Reader first, then direct HTTP fallback. Exit code 2 = content insufficient → trigger Exa fallback.

Fallback: Exa MCP (use when crawl fails or is insufficient)

mcporter call 'exa.web_search_exa(query: "{product name} specs review", numResults: 5)'
mcporter call 'exa.company_research_exa(companyName: "{brand}", numResults: 3)'

Exa is the preferred fallback — always available, no API key required. Collect at least 3 independent sources before proceeding.

Additional sources (check in order)

  1. Official product page (specs, pricing, marketing claims)
  2. Third-party reviews (TechAdvisor, The Verge, Tom's Guide, TrustedReviews)
  3. Community discussion (Reddit, forums) for real-user pain points
  4. Investigation/news coverage (privacy issues, lawsuits, controversies)

Minimum data threshold

Proceed only when you have:

  • At least one official source with tech specs + price
  • At least one third-party review with hands-on findings
  • Marketing claims verbatim (needed for Eagle Eye Honest check)

Pre-launch data flag

If the product meets ALL of the following, append (Pre-launch) to the case_id and skip Step 4:

  • Zero independent hands-on reviews
  • ≥3 critical specs undisclosed (battery, weight, connectivity, regulatory)
  • Pre-order only / not shipping

Procedure

Step 1: Collect Raw Data

  • Run crawl_product_info.py on official product URL
  • If exit code 2, run Exa MCP queries
  • Save raw extracts to 01-level0-extracts.md with source tags [S1], [S2], etc.

Step 2: Organize + Brand Blind (combined step)

  • Read references/organize-guide.md
  • Read references/defluff-guide.md
  • Merge all sources into one structured document
  • Apply Brand Blinding inline (replace brand names with [BRAND], [PRODUCT], [FEATURE])
  • Preserve ALL adjectives and marketing claims verbatim — needed by downstream auditors
  • Brand-blinded text should be ≤ 1200 words — deduplicate aggressively but keep all unique claims
  • Save to 02-brand-blinded.md

Step 3: Triple Auditor Scoring (independent, parallel)

Each Auditor sees only the Brand-Blinded text from Step 2.

Subagent execution rules (mandatory):

  • Pass file path to 02-brand-blinded.md — do NOT embed full text in prompt
  • Each subagent reads the file, reads their rubric file, and prints JSON to stdout
  • Main agent writes files — subagents never call write (prevents timeout file loss)
  • verbatim_evidence limits:
    • score 0–1: max 1 quote (or [] for score 0)
    • score 2–3: max 2 quotes
  • Recommended timeouts: Tool=150s, Toy=150s, Trash=180s

🟢 Tool Auditor

🟡 Toy Auditor

🔴 Trash Auditor

Key principles:

  • Each auditor works from Brand-Blinded text only
  • No cross-talk between auditors during scoring
  • Evidence FIRST, then score — no score without a verbatim quote
  • If a feature isn't in the text, it scores 0

Step 3.5: Auto-rebuild auditor_reports.json (mandatory)

After all three auditors return, the main agent MUST:

  1. Read each 03-tool-auditor.json, 03-toy-auditor.json, 03-trash-auditor.json
  2. Rebuild auditor_reports.json programmatically:
    merged = {
      "tool":  { "total_score": tool["total_score"], "litmus_gate": tool["litmus_gate"], "extract_for_report": tool["extract_for_report"] },
      "toy":   { "total_score": toy["total_score"],  "litmus_gate": toy["litmus_gate"],  "extract_for_report": toy["extract_for_report"] },
      "trash": { "total_score": trash["total_score"], "litmus_gate": trash["litmus_gate"], "critical_issues": trash["critical_issues"], "extract_for_report": trash["extract_for_report"] }
    }
    
  3. Sanity check before proceeding:
    • merged.trash.total_score == sum of all 14 items in 03-trash-auditor.json
    • len(merged.trash.critical_issues) == count of Eagle Eye triggers in 03-trash-auditor.json
    • If mismatch: re-read the source file (never hand-edit scores)

🚨 Never manually write auditor_reports.json from memory or summaries — always rebuild from 03-*.json files.

Step 4: Eagle Eye Validator (conditional, AI-driven)

Read references/eagle-eye-validator.md

Trigger conditions — run Step 4 if ANY is true:

  • Trash report critical_issues is non-empty
  • Any auditor has any item scored 3
  • Composite score is in Gray Zone (-10 to +10)

Otherwise: skip directly to Step 5.

Execution:

  • Spawn 1 subagent (timeout: 120s)
  • Subagent reads 02-brand-blinded.md + only the flagged items from 03-*.json
  • Performs three checks:
    1. Eagle Eye trigger validity — do both conflicting quotes exist verbatim?
    2. Architectural Plausibility — are on-device/local claims technically feasible for the form factor? (NEW in v2.1)
    3. Score=3 evidence check — does evidence actually support the 3-level rubric?
  • Outputs adjustments[] diff (not a full re-score)
  • Main agent applies diffs to auditor_reports.json and re-saves

Step 5: Final Judge Synthesis

python3 scripts/synthesize_results.py --input auditor_reports.json --text

Or manually per references/t3-classification.md:

  1. Normalize scores: Tool=(raw/33)×100, Toy=(raw/33)×100, Trash=(raw/42)×100
  2. Composite = max(NormTool, NormToy) - NormTrash
  3. Check primary conditions (need 2+ of 4 conditions per category)
  4. Check secondary conditions
  5. Apply Eagle Eye Veto (if Trash critical_issues non-empty → force Trash into label)

Step 6: Generate Audit Report

Output file: 99-audit-report.md Directory: tmp/reports/t3-{YYYY-MM-DD}-{slug}/

Follow YAML schema in references/report-schema.md.


Output Format for Messaging Surfaces (WhatsApp / Telegram)

When delivering results in a chat interface, do not paste the full file. Use this condensed format instead:

🏷️ T3 Audit — {Product Name}
Price: ${price} | Date: {YYYY-MM-DD}

📊 Scores
🟢 Tool:  {raw}/{33} → {norm}/100
🟡 Toy:   {raw}/{33} → {norm}/100
🔴 Trash: {raw}/{42} → {norm}/100
Composite: {composite:+.1f}

🏆 Classification: {final_label}
Confidence: {confidence}

{If Eagle Eye:}
🚨 Eagle Eye: {trigger name}
"{verbatim quote A}"
vs.
"{verbatim quote B}"

🟢 Tool highlights (3 bullets)
🟡 Toy highlights (3 bullets)
🔴 Trash issues (3 bullets)

💬 One-line verdict

Rules for messaging output:

  • No markdown tables (WhatsApp doesn't render them)
  • Use bold for emphasis, bullet lists for details
  • Keep total message under 600 words
  • If Eagle Eye triggered, always show the conflicting quotes

Important Constraints

🚨 Information Isolation (critical)

  • Brand-Blinded text is the ONLY input for auditors
  • Auditors must never see original brand names, marketing copy, or each other's reports during scoring
  • Final Judge reads only the three auditor JSON outputs, not original product text

📊 Objective data completeness

  • Collect specs, performance, reliability, market, and cost data
  • Brand Blinding must retain all specs and numbers
  • Every score must link to extracted verbatim evidence

Eagle Eye enforcement

  • Trash Auditor MUST check every item against references/trash-red-flags.md
  • Eagle Eye trigger → automatic score 3 → populate critical_issues[]
  • Eagle Eye Validator (Step 4) re-checks triggers including Architectural Plausibility
  • Final Judge must respect Eagle Eye Veto regardless of composite math

synthesize_results.py input format

The script expects a JSON file like:

{
  "tool":  { "total_score": <int 0-33>, "litmus_gate": "Yes|No", ... },
  "toy":   { "total_score": <int 0-33>, "litmus_gate": "Yes|No", ... },
  "trash": { "total_score": <int 0-42>, "litmus_gate": "Yes|No",
             "critical_issues": ["<eagle eye trigger text>", ...], ... }
}

Resource Index

Scripts

  • scripts/crawl_product_info.py — Multi-strategy product crawler (Jina → direct HTTP fallback)
  • scripts/synthesize_results.py — Final Judge math (normalization + Eagle Eye + Litmus Gate)

References (v2.1)

File Used In Notes
organize-guide.md Step 2
defluff-guide.md Step 2
tool-auditor.md Step 3
tool-auditor-template.md Step 3
toy-auditor.md Step 3
toy-auditor-template.md Step 3
trash-auditor.md Step 3
trash-red-flags.md Step 3 (Eagle Eye) v2.1: added Architectural Implausibility
eagle-eye-validator.md Step 4 v2.1: AI-driven, replaces peer review
t3-classification.md Step 5
report-schema.md Step 6
peer-review-guide.md Step 4 Deprecated in v2.1 — kept for reference only
Weekly Installs
15
GitHub Stars
2
First Seen
Feb 24, 2026
Installed on
opencode15
gemini-cli15
github-copilot15
codex15
amp15
kimi-cli15