minimal-run-and-audit
When to apply
- After a reproduction target and setup plan exist.
- When the main skill needs execution evidence and normalized outputs.
- When a smoke test, documented inference run, documented evaluation run, or other short non-training verification is appropriate.
- When the user already knows what command should be attempted and wants execution plus reporting only.
When not to apply
- During initial repo scanning.
- When the environment or assets are still too undefined for execution to be meaningful.
- When the task is a literature lookup rather than repository execution.
- When the user is still deciding which reproduction target should count as the main run.
Clear boundaries
- This skill owns normalized reporting for an attempted command.
- It may receive execution evidence from the main skill or a thin helper.
- It does not choose the overall target on its own.
- It does not perform broad paper analysis.
- It does not own training startup, resume, or long-running training state.
- It should not normalize risky code edits into acceptable practice.
Input expectations
- selected reproduction goal
- runnable commands or smoke commands
- environment and asset assumptions
- optional patch metadata
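As a rough illustration, the four inputs above could be gathered into one small record before anything is executed. This is a minimal sketch, not the skill's actual interface: the `RunRequest` class, its field names, and the `missing_inputs` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RunRequest:
    """Hypothetical container for the inputs listed above (not part of the skill)."""

    goal: str                                              # selected reproduction goal
    commands: List[str]                                    # runnable commands or smoke commands
    assumptions: List[str] = field(default_factory=list)   # environment and asset assumptions
    patch_metadata: Optional[Dict[str, str]] = None        # optional patch metadata

    def missing_inputs(self) -> List[str]:
        """Return the names of required inputs that are empty."""
        missing = []
        if not self.goal.strip():
            missing.append("goal")
        if not self.commands:
            missing.append("commands")
        return missing
```

A request with an empty goal or no commands would be reported as incomplete rather than executed, which matches the "When not to apply" conditions.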
Output expectations
- execution result summary
- standardized `repro_outputs/` files
- clear distinction between verified, partial, and blocked states
- `PATCHES.md` when repo files changed
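The distinction between verified, partial, and blocked states can be sketched as a small classification step after an attempted command. The mapping below is an assumption made for illustration only; the authoritative rules live in references/reporting-policy.md, and the `attempt` and `classify` helpers are hypothetical names, not the contents of scripts/run_command.py.

```python
import subprocess
import sys
from typing import List, Optional


def attempt(command: List[str], timeout_s: float = 60.0) -> Optional[int]:
    """Run a short command; return its exit code, or None if it never completed."""
    try:
        result = subprocess.run(command, capture_output=True, timeout=timeout_s)
        return result.returncode
    except (OSError, subprocess.TimeoutExpired):
        # The command could not be started, or it exceeded the time budget.
        return None


def classify(returncode: Optional[int], expected_outputs_found: bool) -> str:
    """Map an attempted command to a reporting state (assumed rule, not the real policy)."""
    if returncode is None or returncode != 0:
        return "blocked"    # never completed, or exited with an error
    if expected_outputs_found:
        return "verified"   # clean exit and the expected evidence is present
    return "partial"        # clean exit, but the expected evidence is missing


if __name__ == "__main__":
    code = attempt([sys.executable, "-c", "print('smoke test')"])
    print(classify(code, expected_outputs_found=True))
```

Keeping the classification separate from the execution step means evidence handed over by the main skill or a thin helper can be classified the same way as a command run locally.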
Notes
Use `references/reporting-policy.md`, `scripts/run_command.py`, and `scripts/write_outputs.py`.