# Tech Article Reproducibility
Measure the quality of a technical article from a single angle: can a reader reproduce the same thing on their machine? This axis is independent of prose-style evaluation (`mizchi-blog-style`) and logical evaluation. The premise: the most important property of a technical article is whether a reader can reproduce it on their own machine.
## When to use
- Final pre-publication check on a technical article draft
- Hands-on articles / tutorial articles
- Tool introduction articles / setup articles
- Verifying an article that claims "it worked"
## When not to use
- Conceptual explainer articles (nothing to reproduce)
- Poems / opinion pieces
- Self-contained small tidbits
## Reproducibility check axes (10 axes)
Score each axis on a 0–2 scale for a 20-point total, then convert to a 10-point scale (a scoring sketch follows the table).
| # | Axis | 0 (NG) | 1 (partial) | 2 (OK) |
|---|---|---|---|---|
| 1 | Environment prerequisites stated | No OS / version / required tools listed | Partially listed | Everything listed (OS, lang version, CLI tools) |
| 2 | Code completeness | Fragments only, imports/setup omitted | Only the main part | Full, copy-pasteable form that runs |
| 3 | Command accuracy | Placeholders left as-is (`<your-token>` etc. without explanation) | Some placeholders remain | Runnable as-is |
| 4 | Version dependency stated | No mention | Partial | Explicit, e.g. "works on v3.x", "v2 or earlier behaves as X" |
| 5 | Full config files included | Excerpts only | Main keys only | Full minimal working config |
| 6 | Expected output shown | None | Explained in prose | Actual output / screenshot |
| 7 | Handling of errors | Not mentioned | One case touched on | Several major errors + how to handle them |
| 8 | Project prerequisites stated | Author-environment assumptions are implicit | Partially stated | Paths / repo structure / existing config all stated |
| 9 | Link health | Links broken or require auth | Some require auth | All accessible publicly |
| 10 | Author-specific knowledge stated | Helpers / dotfiles assumed implicitly | Partially stated | Fully stated or not required |
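The conversion is trivial arithmetic; here is a minimal sketch in TypeScript, purely illustrative (the axis order follows the table above; nothing here is part of the skill's interface):

```typescript
// One 0-2 score per axis, in table order (1-10)
type AxisScore = 0 | 1 | 2;

function reproducibilityScore(axes: AxisScore[]): { raw: number; outOf10: number } {
  if (axes.length !== 10) throw new Error("expected scores for exactly 10 axes");
  const raw = axes.reduce((sum, s) => sum + s, 0); // max 20
  return { raw, outOf10: raw / 2 };                // 20-point total -> 10-point scale
}

// Example: strong on environment and output, weak on error handling
const { raw, outOf10 } = reproducibilityScore([2, 2, 1, 1, 2, 2, 0, 1, 2, 1]);
console.log(`${raw}/20 -> ${outOf10}/10`); // "14/20 -> 7/10"
```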
## Evaluation workflow

For evaluating technical articles, use the same subagent dispatch as `empirical-prompt-tuning`. The difference is that the subagent plays the role of "a first-time reader trying to reproduce the work" rather than "an executor." A sketch of the loop follows the list.

1. Pin down the target article
2. Dispatch a subagent (template below)
3. Extract "reproduction sticking points" from the returned evaluation
4. Add or fix text in the article to address those sticking points
5. If needed, re-evaluate with a fresh subagent
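A hedged sketch of this loop in TypeScript. `dispatchSubagent` and `buildReaderPrompt` are hypothetical stand-ins, not real APIs; they represent whatever mechanism your harness uses to spawn a fresh reader-role subagent (e.g. Claude Code's Task tool) with the dispatch template below:

```typescript
interface Evaluation {
  score: number;            // 0-20 total from the rubric
  stickingPoints: string[]; // top reproduction blockers, with line references
}

// Hypothetical: spawns a fresh reader-role subagent and returns its evaluation
declare function dispatchSubagent(prompt: string): Promise<Evaluation>;
// Hypothetical: fills the dispatch template below with the article path
declare function buildReaderPrompt(articlePath: string): string;

async function improveUntilReproducible(articlePath: string, target = 18): Promise<Evaluation> {
  let evaluation = await dispatchSubagent(buildReaderPrompt(articlePath));
  for (let round = 0; round < 3 && evaluation.score < target; round++) {
    // Revise the article here to address evaluation.stickingPoints, then
    // re-evaluate with a fresh subagent so earlier context doesn't leak in
    evaluation = await dispatchSubagent(buildReaderPrompt(articlePath));
  }
  return evaluation;
}
```

Using a fresh subagent each round matters: a subagent that has already read the article once can no longer play a first-time reader.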
## Subagent dispatch template

```
You are a reader interested in <the article's subject area> but new to <the tech stack>.
You are going to read this article and try to reproduce the same thing in your local environment.
## Target article
<path to the article file>
## Evaluation axes (10 reproducibility axes)
Score each axis 0–2. Refer to the rubric in the `tech-article-reproducibility` skill:
/Users/mz/.claude/skills/tech-article-reproducibility/SKILL.md
1. Environment prerequisites stated
2. Code completeness
3. Command accuracy
4. Version dependency stated
5. Full config files included
6. Expected output shown
7. Handling of errors
8. Project prerequisites stated
9. Link health (actually verify with WebFetch)
10. Author-specific knowledge stated
## Tasks
1. While reading the article, imagine "where would I get stuck if I reproduced this on my own machine?"
2. Score each axis 0–2 with quoted evidence
3. List the top 5 sticking points with line numbers
## Report structure
- Reproducibility score: X/20 (breakdown table)
- Top 5 sticking points: <line number> <quote> → <why it sticks>
- Missing information: list of things that should be added to the article
- Overall verdict: your subjective estimate, as a percentage, of the chance you could reproduce this after reading the article
```

## How to read the score
- 18–20: Publishable as a hands-on piece; almost no additional information needed
- 14–17: Some googling required, but reproducible; okay to publish
- 10–13: Information outside the article is required to reproduce; revisions recommended
- 9 or below: Hard to reproduce; rethink the article's premise or reposition it as something other than a hands-on piece
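As a sketch, the same guidance as a mapping function (thresholds taken verbatim from the list above):

```typescript
// Maps the /20 rubric total onto the publication guidance above
function verdict(score: number): string {
  if (score >= 18) return "publishable as a hands-on piece";
  if (score >= 14) return "reproducible with some googling; okay to publish";
  if (score >= 10) return "needs information outside the article; revisions recommended";
  return "hard to reproduce; reposition or rethink the premise";
}
```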
## Pitfalls
- The evaluator's background knowledge is too high: if you don't explicitly tell the subagent to play a "beginner role," it will judge "enough information" from an expert's viewpoint. Emphasize "first-time reader" in the prompt
- Ignoring link health: links that are alive at publication time can break a year later. Separately check whether reproduction is possible using only live links
- Inlining all sample code: reproducibility goes up, but the article bloats. A hybrid approach that combines inline code with a link to the repository is realistic
- Reproducibility ≠ prose quality: an article can be highly reproducible yet hard to read. Combine with `mizchi-blog-style` and similar skills to measure both axes
## Related

- `empirical-prompt-tuning`: meta-skill for subagent dispatch + iterative improvement
- `mizchi-blog-style`: evaluation on the prose-style axis (independent from this skill)