explore-run
When to apply
- When the researcher explicitly authorizes exploratory runs.
- When the task is a small-subset validation, short-cycle training probe, batch sweep, idle-GPU search, or quick transfer-learning trial.
- When the output should rank candidate runs rather than certify trusted success.
When not to apply
- When the user wants trusted training execution or conservative verification.
- When there is no explicit exploratory authorization.
- When the task is repository setup, intake, or debugging.
Clear boundaries
- This skill owns exploratory execution planning and summary only.
- Use `ai-research-explore` instead when the task spans both current_research coordination and exploratory code changes.
- It may hand off actual command execution to `minimal-run-and-audit` or `run-train`.
- It should keep experiment state isolated from the trusted baseline.
- It should prefer small-subset and short-cycle checks before heavier exploratory runs.
Ranking Semantics
- Pre-execution candidate selection uses three factors: `cost`, `success_rate`, and `expected_gain`.
- Default weights should stay conservative unless the researcher explicitly provides `selection_weights`.
- Budget pruning still applies after scoring through `max_variants` and `max_short_cycle_runs`.
- If runs are executed later, downstream ranking should switch to real execution evidence, not stay purely heuristic.
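The scoring and pruning described above can be sketched roughly as follows. This is a minimal illustration, not the logic in `scripts/plan_variants.py`: the `Candidate` class, the `score` and `select` helpers, the default weight values, and the choice to subtract `cost` from the score are all assumptions made for the example. Only the factor names and `max_variants` come from the text.

```python
from dataclasses import dataclass

# Hypothetical conservative defaults; the skill's real defaults are not documented here.
DEFAULT_WEIGHTS = {"cost": 0.4, "success_rate": 0.4, "expected_gain": 0.2}


@dataclass(frozen=True)
class Candidate:
    name: str
    cost: float           # assumed normalized to [0, 1]; higher is more expensive
    success_rate: float   # assumed estimate that the run completes cleanly
    expected_gain: float  # assumed normalized estimate of metric improvement


def score(candidate, weights=None):
    """Weighted heuristic score; cost counts against the candidate (assumption)."""
    w = {**DEFAULT_WEIGHTS, **(weights or {})}
    return (
        w["success_rate"] * candidate.success_rate
        + w["expected_gain"] * candidate.expected_gain
        - w["cost"] * candidate.cost
    )


def select(candidates, max_variants, weights=None):
    """Rank by heuristic score first, then apply the budget cap."""
    ranked = sorted(candidates, key=lambda c: score(c, weights), reverse=True)
    return ranked[:max_variants]


# Made-up candidates, for illustration only.
candidates = [
    Candidate("lr-sweep-small", cost=0.2, success_rate=0.9, expected_gain=0.3),
    Candidate("bigger-backbone", cost=0.9, success_rate=0.5, expected_gain=0.8),
    Candidate("lora-rank-8", cost=0.3, success_rate=0.8, expected_gain=0.5),
]
selected = select(candidates, max_variants=2)
print([c.name for c in selected])
```

With the conservative defaults, the cheap and reliable candidates survive pruning. Passing researcher-supplied `selection_weights` that favor `expected_gain` would move the expensive candidate to the top instead.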
Variant Spec Hints
- Use `variant_axes` to define the candidate dimension grid.
- Use `subset_sizes` and `short_run_steps` to express exploratory run scale.
- Use `selection_weights` to rebalance `cost`, `success_rate`, and `expected_gain`.
- Use `primary_metric` and `metric_goal` so downstream ranking can order executed candidates consistently.
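As a rough sketch of how these hints might fit together, the example below expands a spec into a candidate grid and orders executed runs by the primary metric. The dictionary layout, the axis values, the `"maximize"` goal string, and the `expand_variants` and `rank_executed` helpers are assumptions for illustration. The authoritative format is whatever `explore-variant-spec.md` defines.

```python
from itertools import product

# Hypothetical spec: key names follow the hints above, values are made up.
spec = {
    "variant_axes": {"lr": [1e-4, 3e-4], "batch_size": [16, 32]},
    "subset_sizes": [500],
    "short_run_steps": [100],
    "primary_metric": "val_accuracy",
    "metric_goal": "maximize",  # assumed vocabulary: "maximize" or "minimize"
}


def expand_variants(spec):
    """Cross every axis value with each subset size and step count."""
    axes = spec["variant_axes"]
    names = list(axes)
    variants = []
    for values in product(*(axes[n] for n in names)):
        for subset, steps in product(spec["subset_sizes"], spec["short_run_steps"]):
            variant = dict(zip(names, values))
            variant.update({"subset_size": subset, "steps": steps})
            variants.append(variant)
    return variants


def rank_executed(results, spec):
    """Order executed runs by the primary metric, respecting the goal."""
    metric = spec["primary_metric"]
    descending = spec["metric_goal"] == "maximize"
    return sorted(results, key=lambda r: r[metric], reverse=descending)


variants = expand_variants(spec)
print(len(variants))
```

The grid grows multiplicatively with each axis, which is why budget caps such as `max_variants` matter even for small specs.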
Output expectations
- `explore_outputs/CHANGESET.md`
- `explore_outputs/TOP_RUNS.md`
- `explore_outputs/status.json`
Notes
Use references/execution-policy.md, ../../references/explore-variant-spec.md, scripts/plan_variants.py, and scripts/write_outputs.py.