# Implement Paper From Scratch
The best way to truly understand a paper is to implement it. This skill guides you through that process methodically.
## Philosophy

- **No copy-pasting from reference implementations**: we build understanding, not just working code
- **Checkpoint questions verify understanding**: you should be able to answer "why" at each step
- **Minimal dependencies**: use NumPy/PyTorch fundamentals, not high-level wrappers
- **Deliberate debugging**: bugs are learning opportunities, not obstacles
## Process

### Phase 1: Pre-Implementation Analysis

Before writing any code:
1. **Identify the core algorithm.** Strip away ablations, extensions, bells and whistles. What's the minimal version?

2. **List the components.** Break into modules:
   - Data pipeline
   - Model architecture
   - Loss function(s)
   - Training loop
   - Evaluation metrics

3. **Find the tricky parts.** What's non-obvious?
   - Custom layers or operations
   - Numerical stability concerns
   - Hyperparameter sensitivity
   - Implementation details buried in appendices

4. **Gather reference numbers.** What should we expect?
   - Training loss trajectory
   - Validation metrics at convergence
   - Compute requirements (if stated)
### Phase 2: Scaffolded Implementation

Build up the implementation in this order:

#### Step 1: Data

```python
# Start with synthetic/toy data
# Verify shapes and types before touching real data
```

**Checkpoint:** Can you describe what each tensor represents and its expected shape?
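As a concrete sketch of this step (dataset sizes and names are hypothetical), a NumPy toy dataset with the shape and type checks built in:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dataset: 64 examples, 8 features, 3 classes.
X = rng.normal(size=(64, 8)).astype(np.float32)  # inputs: (batch, features)
y = rng.integers(0, 3, size=64)                  # class labels: (batch,)

# Verify shapes and types before touching real data.
assert X.shape == (64, 8) and X.dtype == np.float32
assert y.shape == (64,) and 0 <= y.min() and y.max() < 3
```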
#### Step 2: Model Architecture

```python
# Build layer by layer
# Print shapes at each stage
# Verify parameter counts match paper
```

**Checkpoint:** If you randomly initialize and do a forward pass, do the output shapes match what the paper describes?
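A minimal version of this checkpoint, assuming a hypothetical two-layer MLP (layer sizes invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer MLP: 8 -> 16 -> 3, built layer by layer.
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 3)), np.zeros(3)

def forward(x):
    h = np.maximum(x @ W1 + b1, 0.0)  # inspect h.shape at each stage
    return h @ W2 + b2

# Random init + forward pass: do output shapes match the paper?
out = forward(rng.normal(size=(4, 8)))
assert out.shape == (4, 3)

# Parameter count should match the paper's reported total.
n_params = sum(p.size for p in (W1, b1, W2, b2))
assert n_params == 8 * 16 + 16 + 16 * 3 + 3
```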
#### Step 3: Loss Function

```python
# Implement exactly as described
# Test with known inputs/outputs
# Check gradient flow
```

**Checkpoint:** Can you explain each term in the loss and why it's there?
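For example, if the paper's loss were a standard cross-entropy (substitute the paper's actual loss), it can be tested against a known input/output pair:

```python
import numpy as np

def cross_entropy(logits, labels):
    # Stable log-softmax: subtract the row max before exponentiating.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Known case: uniform logits over 3 classes must give loss == log(3),
# regardless of which labels are chosen.
logits = np.zeros((2, 3))
labels = np.array([0, 2])
assert np.isclose(cross_entropy(logits, labels), np.log(3))
```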
#### Step 4: Training Loop

```python
# Minimal loop first (no logging, checkpointing, etc.)
# Verify loss decreases on tiny overfit test
# Then add bells and whistles
```

**Checkpoint:** Can you overfit a single batch? If not, something is broken.
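The single-batch overfit test can be sketched with a hypothetical linear softmax model (data and sizes invented):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single batch: 8 examples, 4 features, 2 classes.
X = rng.normal(size=(8, 4))
y = rng.integers(0, 2, size=8)
W = np.zeros((4, 2))

def loss_and_grad(W):
    z = X @ W
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    loss = -np.log(p[np.arange(8), y]).mean()
    p[np.arange(8), y] -= 1.0             # dL/dz for softmax + CE
    return loss, X.T @ p / 8

# Minimal loop: no logging, no checkpointing, just "does the loss go down?"
first_loss, _ = loss_and_grad(W)
for _ in range(500):
    loss, grad = loss_and_grad(W)
    W -= 0.1 * grad
assert loss < first_loss  # if it can't fit one batch, something is broken
```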
#### Step 5: Evaluation

```python
# Implement paper's exact metrics
# Compare against reported numbers
```

**Checkpoint:** On the same data split, how close are you to the paper's numbers?
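As an example, a hand-rolled top-1 accuracy (swap in the paper's exact metric) verified on inputs with a known answer:

```python
import numpy as np

# Implement the paper's metric yourself; a hypothetical top-1 accuracy:
def top1_accuracy(logits, labels):
    return (logits.argmax(axis=1) == labels).mean()

# One correct prediction out of two -> accuracy 0.5.
logits = np.array([[2.0, 0.1, 0.1], [0.1, 0.1, 2.0]])
labels = np.array([0, 1])
assert top1_accuracy(logits, labels) == 0.5
```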
### Phase 3: The Debugging Gauntlet

When it doesn't work (and it won't at first):

1. **The Overfit Test**
   - Can you memorize 1 example? 10? 100?
   - If not, suspect an architecture or gradient bug
2. **The Gradient Check**
   - Are gradients flowing to all parameters?
   - Any NaN or exploding gradients?
3. **The Initialization Check**
   - Match the paper's initialization exactly
   - This matters more than people think
4. **The Learning Rate Sweep**
   - Log scale: 1e-5 to 1e-1
   - Loss should decrease for some range
5. **The Ablation Debug**
   - Remove components until it works
   - Add back one at a time
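The gradient check in step 2 can be sketched as a finite-difference comparison (the function and test values here are hypothetical):

```python
import numpy as np

def numerical_grad(f, w, eps=1e-5):
    # Central finite differences, one coordinate at a time.
    g = np.zeros_like(w)
    for i in range(w.size):
        w[i] += eps
        hi = f(w)
        w[i] -= 2 * eps
        lo = f(w)
        w[i] += eps  # restore original value
        g[i] = (hi - lo) / (2 * eps)
    return g

# Toy check: f(w) = sum(w^2) has analytic gradient 2w.
f = lambda w: (w ** 2).sum()
w = np.array([1.0, -2.0, 3.0])
g = numerical_grad(f, w)
assert np.allclose(g, 2 * w, atol=1e-4)
assert np.isfinite(g).all()  # no NaN or exploding entries
```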
### Phase 4: Checkpoint Questions

At each stage, you should be able to answer:

**Understanding:**

- Why does this component exist?
- What would happen without it?
- What alternatives were considered?

**Implementation:**

- Why this specific implementation choice?
- Where could numerical issues arise?
- What's the computational complexity?

**Debugging:**

- What would it look like if this were broken?
- How would you test this in isolation?
- What are the most likely bugs?
## Output Format

For each implementation session, provide:

```markdown
## Today's Implementation Goal
[Specific component we're building]

## Prerequisites Check
- [ ] Previous components working
- [ ] Understand what we're building
- [ ] Know expected behavior

## Implementation

### Code
[Code blocks with extensive comments]

### Checkpoint Questions
1. [Question]
   <details><summary>Answer</summary>[Answer]</details>
2. [Question]
   <details><summary>Answer</summary>[Answer]</details>

### Verification Steps
- [ ] Test 1: [What to check]
- [ ] Test 2: [What to check]

### Common Bugs at This Stage
1. [Bug pattern]: [How to identify and fix]

## What's Next
[Preview of next component and how it connects]
```
## Tips for Specific Paper Types

### Transformer-based

- Attention mask shapes are the #1 bug source
- Verify positional encoding is applied correctly
- Check layer norm placement (pre vs post)
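A sketch of the mask-shape check, assuming attention scores of shape `(batch, heads, seq, seq)` and a causal mask (all sizes hypothetical):

```python
import numpy as np

seq = 4
# True above the diagonal = positions each token must not attend to.
mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)

# Dummy attention scores: (batch, heads, seq, seq).
scores = np.zeros((2, 3, seq, seq))
scores[..., mask] = -np.inf  # mask broadcasts over batch and heads

# Future positions are masked; past positions are untouched.
assert np.isneginf(scores[0, 0, 0, 1])
assert scores[1, 2, 3, 0] == 0.0
```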
### RL / Policy Gradient

- Sign errors in the policy gradient are silent killers
- Advantage normalization matters
- Verify discount factor handling
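Discount handling is where off-by-one and sign bugs hide; a sketch with hypothetical rewards:

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    # Work backwards so each return includes all future rewards.
    out = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out

# Hand-checked case: rewards [1, 0, 1] with gamma = 0.99.
r = [1.0, 0.0, 1.0]
expected = [1.0 + 0.99 ** 2, 0.99, 1.0]
assert np.allclose(discounted_returns(r), expected)
```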
### Generative Models

- KL term balancing is finicky
- Check latent space distribution
- Verify reconstruction looks reasonable before training
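For a diagonal-Gaussian posterior against a standard-normal prior (the common VAE-style setup; adapt to the paper's actual model), the KL term can be sketched and sanity-checked:

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    # KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dims.
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)

# Sanity check: a posterior equal to the prior has zero KL.
mu = np.zeros((2, 4))
log_var = np.zeros((2, 4))
assert np.allclose(kl_to_standard_normal(mu, log_var), 0.0)

# Any other posterior has strictly positive KL.
assert (kl_to_standard_normal(np.ones((2, 4)), log_var) > 0).all()
```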
### Computer Vision

- Normalization (ImageNet stats, batch norm) is crucial
- Data augmentation can make or break results
- Verify input preprocessing matches the paper exactly
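A sketch of channel-wise normalization using the widely used ImageNet statistics (verify the exact values and channel ordering against the paper's preprocessing):

```python
import numpy as np

# Widely used ImageNet channel statistics (RGB order).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def normalize(img):
    # img: (H, W, 3) float array scaled to [0, 1].
    return (img - IMAGENET_MEAN) / IMAGENET_STD

img = np.full((2, 2, 3), 0.5)
out = normalize(img)
assert out.shape == (2, 2, 3)
assert np.allclose(out[0, 0], (0.5 - IMAGENET_MEAN) / IMAGENET_STD)
```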
## Success Criteria

You're done when:

- **Numbers match**: within reasonable variance of the paper's results
- **Understanding is deep**: you can explain every line of code
- **You found the gotchas**: you know what breaks and why
- **You could modify it**: you're confident enough to try your own variations
## Anti-Patterns to Avoid

- ❌ Copying code you don't understand
- ❌ Skipping checkpoint questions
- ❌ Using pre-built components for the core algorithm
- ❌ Ignoring discrepancies with the paper
- ❌ Moving on before the current step works