skills/ghostscientist/skills/implement-paper-from-scratch

implement-paper-from-scratch

SKILL.md

Implement Paper From Scratch

The best way to truly understand a paper is to implement it. This skill guides you through that process methodically.

Philosophy

  • No copy-pasting from reference implementations - We build understanding, not just working code
  • Checkpoint questions verify understanding - You should be able to answer "why" at each step
  • Minimal dependencies - Use NumPy/PyTorch fundamentals, not high-level wrappers
  • Deliberate debugging - Bugs are learning opportunities, not obstacles

Process

Phase 1: Pre-Implementation Analysis

Before writing any code:

  1. Identify the core algorithm - Strip away ablations, extensions, bells and whistles. What's the minimal version?

  2. List the components - Break into modules:

    • Data pipeline
    • Model architecture
    • Loss function(s)
    • Training loop
    • Evaluation metrics
  3. Find the tricky parts - What's non-obvious?

    • Custom layers or operations
    • Numerical stability concerns
    • Hyperparameter sensitivity
    • Implementation details buried in appendices
  4. Gather reference numbers - What should we expect?

    • Training loss trajectory
    • Validation metrics at convergence
    • Compute requirements (if stated)

Phase 2: Scaffolded Implementation

Build up the implementation in this order:

Step 1: Data

# Start with synthetic/toy data
# Verify shapes and types before touching real data

Checkpoint: Can you describe what each tensor represents and its expected shape?

Step 2: Model Architecture

# Build layer by layer
# Print shapes at each stage
# Verify parameter counts match paper

Checkpoint: If you randomly initialize and do a forward pass, do the output shapes match what the paper describes?

Step 3: Loss Function

# Implement exactly as described
# Test with known inputs/outputs
# Check gradient flow

Checkpoint: Can you explain each term in the loss and why it's there?

Step 4: Training Loop

# Minimal loop first (no logging, checkpointing, etc.)
# Verify loss decreases on tiny overfit test
# Then add bells and whistles

Checkpoint: Can you overfit a single batch? If not, something is broken.

Step 5: Evaluation

# Implement paper's exact metrics
# Compare against reported numbers

Checkpoint: On the same data split, how close are you to paper's numbers?

Phase 3: The Debugging Gauntlet

When it doesn't work (and it won't at first):

  1. The Overfit Test

    • Can you memorize 1 example? 10? 100?
    • If not, architecture or gradient bug
  2. The Gradient Check

    • Are gradients flowing to all parameters?
    • Any NaN or exploding gradients?
  3. The Initialization Check

    • Match paper's initialization exactly
    • This matters more than people think
  4. The Learning Rate Sweep

    • Log scale: 1e-5 to 1e-1
    • Loss should decrease for some range
  5. The Ablation Debug

    • Remove components until it works
    • Add back one at a time

Phase 4: Checkpoint Questions

At each stage, you should be able to answer:

Understanding:

  • Why does this component exist?
  • What would happen without it?
  • What alternatives were considered?

Implementation:

  • Why this specific implementation choice?
  • Where could numerical issues arise?
  • What's the computational complexity?

Debugging:

  • What would it look like if this was broken?
  • How would you test this in isolation?
  • What are the most likely bugs?

Output Format

For each implementation session, provide:

## Today's Implementation Goal
[Specific component we're building]

## Prerequisites Check
- [ ] Previous components working
- [ ] Understand what we're building
- [ ] Know expected behavior

## Implementation

### Code
[Code blocks with extensive comments]

### Checkpoint Questions
1. [Question]
   <details><summary>Answer</summary>[Answer]</details>

2. [Question]
   <details><summary>Answer</summary>[Answer]</details>

### Verification Steps
- [ ] Test 1: [What to check]
- [ ] Test 2: [What to check]

### Common Bugs at This Stage
1. [Bug pattern]: [How to identify and fix]

## What's Next
[Preview of next component and how it connects]

Tips for Specific Paper Types

Transformer-based

  • Attention mask shapes are the #1 bug source
  • Verify positional encoding is applied correctly
  • Check layer norm placement (pre vs post)

RL/Policy Gradient

  • Sign errors in policy gradient are silent killers
  • Advantage normalization matters
  • Verify discount factor handling

Generative Models

  • KL term balancing is finicky
  • Check latent space distribution
  • Verify reconstruction looks reasonable before training

Computer Vision

  • Normalization (ImageNet stats, batch norm) is crucial
  • Data augmentation can make or break results
  • Verify input preprocessing matches paper exactly

Success Criteria

You're done when:

  1. Numbers match - Within reasonable variance of paper's results
  2. Understanding is deep - You can explain every line of code
  3. You found the gotchas - You know what breaks and why
  4. You could modify it - Confident to try your own variations

Anti-Patterns to Avoid

  • ❌ Copying code you don't understand
  • ❌ Skipping checkpoint questions
  • ❌ Using pre-built components for core algorithm
  • ❌ Ignoring discrepancies with paper
  • ❌ Moving on before current step works
Weekly Installs
4
GitHub Stars
2
First Seen
Jan 28, 2026
Installed on
opencode4
gemini-cli3
claude-code3
github-copilot3
codex3
kimi-cli3