Karpathy Guidelines
A preventive thinking discipline for code implementation. Activates before and during code writing to block the most common mistakes LLMs make when generating code.
This is not about performance (that's rob-pike) or debugging (that's systematic-debugging). This is about the act of writing code itself — reading before writing, changing only what's asked, verifying instead of assuming, and defining what "done" means before starting.
Hard Gates
These rules have no exceptions.
- Read before you write. Do not modify a file you haven't read. Do not modify a function without understanding the callers. Do not modify a module without understanding its role.
- Scope to the request. Change what was asked. Nothing more. No "while I'm here" improvements, no opportunistic refactoring, no adding features that weren't requested.
- Verify, don't assume. If you think a function does X, read it. If you think a type has field Y, check it. If you think a test covers scenario Z, find it. Assumptions are the primary source of LLM coding errors.
- Define success before starting. Before writing any code, state what "done" looks like in concrete, verifiable terms. If you can't define it, you don't understand the task yet.
When To Use
- Before implementing any feature or change
- When modifying existing code
- During code review (as a mental checklist)
- When you catch yourself generating code without having read the surrounding context
When NOT To Use
- Greenfield projects with no existing code to read (the read-before-write and verify gates still apply to dependencies)
- Pure documentation changes
- Performance optimization (use rob-pike instead)
The Five Rules
Rule 1: Make Surgical Changes
Every change should be the minimum edit that achieves the goal.
Before writing, ask:
- What is the smallest change that solves this?
- Am I touching files that don't need to change?
- Am I adding code that wasn't requested?
Prohibited additions unless explicitly requested:
- Type annotations on code you didn't change
- Docstrings on functions you didn't change
- Comments on logic you didn't change
- Error handling for scenarios that aren't part of the task
- Refactoring of surrounding code
- "Improvements" you noticed along the way
One task, one change. If you discover something else that needs fixing, note it — don't fix it now.
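A minimal sketch of the difference, using a hypothetical off-by-one fix in a pagination helper (getOffset, page, and pageSize are invented names for illustration):

```typescript
// Task: fix the off-by-one bug in the page offset calculation.

// Surgical: the one-line fix that was asked for.
function getOffset(page: number, pageSize: number): number {
  return (page - 1) * pageSize; // was: page * pageSize
}

// Scope creep: the same fix buried in additions nobody requested.
function getOffsetWithCreep(page: number, pageSize: number): number {
  if (page < 1) throw new Error("page must be >= 1"); // unrequested guard
  console.debug(`computing offset for page ${page}`); // unrequested logging
  return (page - 1) * pageSize;
}
```

Both functions fix the bug. Only the first one is reviewable at a glance.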
Rule 2: Read The Existing Code
LLMs generate code based on patterns. Codebases have their own patterns. These often conflict.
Before modifying any file:
- Read the file
- Identify the conventions it uses (naming, error handling, patterns, structure)
- Match those conventions exactly in your changes
Before modifying any function:
- Find all callers
- Understand the contract (what goes in, what comes out, what side effects)
- Ensure your change doesn't break the contract
Before adding a new file:
- Check if similar functionality exists elsewhere
- Follow the project's file organization pattern
- Use the same naming conventions as neighboring files
Do not invent new patterns. Follow the ones that exist.
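A sketch of what convention-matching looks like, assuming a hypothetical codebase that returns a Result object instead of throwing (Result, User, Order, findUser, and findOrder are all invented names):

```typescript
// Convention already present in the file: return a Result, never throw.
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

interface User { id: string; email: string }
interface Order { id: string; userId: string }

// Existing function in the file.
function findUser(id: string): Result<User> {
  return id === "u1"
    ? { ok: true, value: { id, email: "a@example.com" } }
    : { ok: false, error: `user ${id} not found` };
}

// New function: matches the existing contract exactly, rather than
// introducing a thrown exception or a nullable return.
function findOrder(id: string): Result<Order> {
  return id === "o1"
    ? { ok: true, value: { id, userId: "u1" } }
    : { ok: false, error: `order ${id} not found` };
}
```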
Rule 3: Verify Assumptions
Every assumption is a potential bug. The most dangerous assumptions are the ones that feel obvious.
Common assumptions that cause failures:
| Assumption | Verification |
|---|---|
| "This function returns X" | Read the function |
| "This field is always present" | Check the type definition and upstream producers |
| "This test covers that case" | Read the test |
| "This import path is correct" | Check the file exists at that path |
| "This API accepts these parameters" | Read the API definition or documentation |
| "This library works this way" | Check the version and docs |
| "This config value is set" | Check the actual config |
When in doubt, grep. When confident, grep anyway.
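A small illustration of the habit, assuming a hypothetical Product type that you verified by reading the actual definition instead of trusting your memory:

```typescript
// Assumption: "the response always has a brand field."
// Verification: read the type. The (hypothetical) definition disagrees:
interface Product {
  id: string;
  name: string;
  brand?: string; // optional, so the assumption was wrong
}

// Code written after verifying rather than assuming:
function brandLabel(product: Product): string {
  return product.brand ?? "unbranded";
}
```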
Rule 4: Define Success Criteria
Before writing code, state what "done" means.
Format:
Done when:
- [ ] <specific, verifiable condition>
- [ ] <specific, verifiable condition>
- [ ] <specific, verifiable condition>
Bad criteria:
- "The feature works" (not verifiable)
- "Code is clean" (subjective)
- "Tests pass" (which tests? what do they verify?)
Good criteria:
- "POST /api/users returns 201 with valid payload and 400 with missing email"
- "Existing tests in user.test.ts still pass"
- "New test covers the null-brand edge case from issue #42"
If you can't write specific criteria, you don't understand the task. Go back and clarify.
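Criteria this concrete translate directly into tests. A minimal sketch using supertest and vitest, assuming a hypothetical Express app exported from ./app:

```typescript
import request from "supertest";
import { describe, it } from "vitest";
import { app } from "./app"; // hypothetical app under test

describe("POST /api/users", () => {
  it("returns 201 with a valid payload", async () => {
    await request(app)
      .post("/api/users")
      .send({ email: "new@example.com", name: "New User" })
      .expect(201);
  });

  it("returns 400 when email is missing", async () => {
    await request(app)
      .post("/api/users")
      .send({ name: "No Email" })
      .expect(400);
  });
});
```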
Rule 5: Don't Solve Problems That Don't Exist
LLMs love to anticipate future needs. This produces code that is more complex than necessary.
Block these impulses:
- "What if someone calls this with null?" — Is that possible in the current code? If not, don't guard against it.
- "This should be configurable" — Is configuration needed now? If not, hardcode it.
- "We might need to support multiple backends" — Do we have multiple backends? If not, don't abstract.
- "This could be a generic utility" — Is it used in more than one place? If not, keep it specific.
- "Let me add a feature flag" — Was a feature flag requested? If not, just change the code.
Build for what is needed today. Tomorrow's problems will have tomorrow's context.
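A sketch of the difference, assuming a hypothetical sendEmail helper already exists in the codebase:

```typescript
// Assume an existing helper with this (hypothetical) signature:
declare function sendEmail(to: string, subject: string, body: string): Promise<void>;

// Build for today: the one concrete behavior that was requested.
async function sendWelcomeEmail(to: string): Promise<void> {
  await sendEmail(to, "Welcome!", "Thanks for signing up.");
}

// The impulse to resist: a configurable, pluggable notification framework
// with backends, feature flags, and generics that nothing needs yet.
```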
Anti-Patterns
| Impulse | Rule Violated | Response |
|---|---|---|
| "Let me quickly refactor this while I'm here" | Rule 1 | One task, one change. Note it for later. |
| "I know how this works, I'll just write the fix" | Rule 2 | Read first. Your mental model may be wrong. |
| "This probably takes a string" | Rule 3 | Check the type. "Probably" means you don't know. |
| "I'll know it's done when it works" | Rule 4 | Define concrete criteria before starting. |
| "Let me make this extensible for future use" | Rule 5 | Build for now. Extensibility is a future task. |
| "The code around this is messy, let me clean it" | Rule 1 | Not your task. File a separate issue. |
| "I'll add some helpful logging" | Rule 1 | Was logging requested? If not, don't add it. |
Red Flags
Stop and re-read the rules if you catch yourself thinking:
- "This is obvious, I don't need to read the code"
- "I'll just add a few extra things while I'm at it"
- "This should probably handle edge case X" (without checking if X can occur)
- "Let me improve the type safety here too"
- "I know what this function does"
- "This needs better error handling" (without evidence of errors occurring)
- "The naming is inconsistent, let me fix it across the file"
Minimal Checklist
During implementation, verify against this list:
- I read the files I'm modifying before changing them
- My changes are scoped to what was requested
- I verified my assumptions about types, APIs, and behavior
- I defined concrete success criteria before starting
- I'm not solving hypothetical future problems
- I'm following existing project conventions, not inventing new ones
- Every new line of code is necessary for the task
Completion Standard
Implementation is disciplined when:
- All changes are within the requested scope
- No assumptions were made without verification
- Success criteria were defined and met
- No speculative code was added
- Existing conventions were followed
If any of these are not met, the implementation needs revision.
Transition
After implementation is complete:
- If AI-generated code smells remain → use clean-ai-slop to run a corrective pass
- If a bug is discovered → use systematic-debugging to investigate
- If performance is a concern → use rob-pike before optimizing