error-design
Error Design Review Lens
When invoked with $ARGUMENTS, focus the analysis on the specified file or module. Read the target code first, then apply the checks below.
Each exception a module throws is an interface element. The best way to deal with exceptions is to not have them.
"Exception handling code rarely executes. Bugs can go undetected for a long time, and when the exception handling code is finally needed, there's a good chance that it won't work." — John Ousterhout, A Philosophy of Software Design
The "Too Many Exceptions" Anti-Pattern
Programmers are taught that "the more errors detected, the better," but this leads to an over-defensive style that throws exceptions for anything suspicious. Throwing exceptions is easy. Handling them is hard.
"Classes with lots of exceptions have complex interfaces, and they are shallower than classes with fewer exceptions." — John Ousterhout, A Philosophy of Software Design
When to Apply
- Reviewing error handling code or exception hierarchies
- When a function has many error cases or throws many exception types
- When callers are burdened with handling errors that rarely occur
- When error handling code is longer than the happy path
Core Principles
The Decision Tree
The four techniques below have no canonical ordering. This tree sequences them by preference for practical use.
For every error condition:
1. Can the error be defined out of existence?
Change the interface so the condition isn't an error. If yes: do this. Always the best option.
2. Can the error be masked?
Handle internally without propagating. If yes: mask if handling is safe and complete.
3. Can the error be aggregated?
Replace many specific exceptions with one general mechanism. If yes: aggregate to reduce interface surface.
4. Must the caller handle it?
Propagate only if the caller genuinely must decide. If the caller can't do anything meaningful: crash.
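The tree above can be sketched as a small function. This is only an illustration: the ErrorCondition fields and the Strategy names are hypothetical stand-ins for the reviewer's own judgment on each question, not something a tool can compute.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    DEFINE_OUT = "define out of existence"
    MASK = "mask internally"
    AGGREGATE = "aggregate"
    PROPAGATE = "propagate to caller"
    CRASH = "crash"


@dataclass
class ErrorCondition:
    # Hypothetical fields: each one records the reviewer's answer to a step.
    can_define_out: bool        # step 1: interface can change so this isn't an error
    can_mask: bool              # step 2: internal handling is safe and complete
    can_aggregate: bool         # step 3: one general mechanism can cover it
    caller_must_decide: bool    # step 4: the caller genuinely has a decision to make


def choose_strategy(err: ErrorCondition) -> Strategy:
    """Walk the questions in preference order and return the first that applies."""
    if err.can_define_out:
        return Strategy.DEFINE_OUT
    if err.can_mask:
        return Strategy.MASK
    if err.can_aggregate:
        return Strategy.AGGREGATE
    if err.caller_must_decide:
        return Strategy.PROPAGATE
    # Nothing above applies and the caller can't do anything meaningful.
    return Strategy.CRASH
```

The early returns encode the preference order: a condition that could be both masked and aggregated is masked, because masking comes first in the tree.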
Define Errors Out of Existence
Error conditions follow from how an operation is specified. Change the specification, and the error disappears.
The general move: instead of "do X" (fails if preconditions aren't met), write "ensure state S" (trivially satisfied if state already holds).
- Unset variable? "Delete this variable" (fails if absent) → "ensure this variable no longer exists" (always succeeds)
- File still open on delete? Unix unlink doesn't "delete a file." It removes a directory entry, and it returns success even if processes have the file open.
- Substring index out of range? Python slicing clamps out-of-range indices (no exception, no defensive code). Java's substring throws IndexOutOfBoundsException, forcing bounds-clamping around a one-line call.
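A minimal sketch of the "ensure state S" move for the unset-variable case, followed by the slicing behavior mentioned above. The function names unset_strict and unset are hypothetical, chosen only for this illustration.

```python
def unset_strict(variables: dict[str, str], name: str) -> None:
    # "Delete this variable": raises KeyError when the variable is absent,
    # so every caller has to guard the call or catch the exception.
    del variables[name]


def unset(variables: dict[str, str], name: str) -> None:
    # "Ensure this variable no longer exists": already true when the
    # variable is absent, so there is no error case to report.
    variables.pop(name, None)


env = {"PATH": "/usr/bin"}
unset(env, "PATH")
unset(env, "PATH")  # second call is a no-op, not an error

text = "hello"
clamped = text[2:100]  # end index past the string is clamped: "llo"
empty = text[10:20]    # slice entirely out of range: ""
```

The second definition has one fewer element in its interface: callers no longer need to know that a missing variable is possible.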
Defining errors out of existence is like a spice: a small amount improves the result but too much ruins the dish. The technique only works when the exception information is genuinely not needed outside the module. A networking module that masked all network exceptions left callers with no way to detect lost messages or failed peers. Those errors needed to be exposed because callers depended on them to build reliable applications.
Exception Masking
Handle internally without exposing to callers. Valid when:
- The module can recover completely
- Recovery doesn't lose important information
- The masking behavior is part of the module's specification
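TCP masks packet loss this way: it retransmits lost packets internally, so applications reading the stream never see the loss. Before masking, ask whether a developer debugging the system would want to know the error happened. If yes, log it. If masking would hide something irreversible and important, don't mask. Propagate.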
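One way masking can look in application code is a retry wrapper that absorbs transient failures and logs each one. This is a sketch under assumptions: TransientError and call_with_retry are hypothetical names, and it assumes a retry fully repairs the failure (the "recover completely" condition above).

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("masking_sketch")


class TransientError(Exception):
    """Hypothetical error type for failures that a retry can fully repair."""


def call_with_retry(operation: Callable[[], T], attempts: int = 3,
                    delay_s: float = 0.0) -> T:
    """Mask TransientError by retrying, logging each masked failure."""
    # All attempts except the last one are masked.
    for attempt in range(1, attempts):
        try:
            return operation()
        except TransientError as exc:
            # Masked, but still visible to someone debugging the system.
            log.warning("masked transient failure (attempt %d of %d): %s",
                        attempt, attempts, exc)
            time.sleep(delay_s)
    # Last attempt: recovery is no longer complete, so a failure propagates.
    return operation()


calls = {"count": 0}


def flaky() -> str:
    # Simulated operation that fails twice, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("simulated packet loss")
    return "payload"


result = call_with_retry(flaky)
```

The caller of call_with_retry sees only the payload. The two failures appear in the log, not in the interface.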
Exception Aggregation
Replace many specific exceptions with fewer general ones handled in one place. Masking absorbs errors low in the stack and aggregation catches them high in the stack. Together they produce an hourglass where middle layers have no exception handling at all.
Web Server Pattern
Rather than having each request handler catch its own missing-parameter errors, let all NoSuchParameter exceptions propagate to the top-level dispatcher, where a single handler generates the error response. New handlers automatically work with the system. The same applies to any request-processing loop: catch in one place near the top, abort the current request, clean up and continue.
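A minimal sketch of the dispatcher pattern, assuming a toy server where requests are a path plus a dict of parameters. Everything except the NoSuchParameter name (get_parameter, the handlers, dispatch, the status codes) is hypothetical.

```python
class NoSuchParameter(Exception):
    """Raised when a request lacks a required parameter."""


def get_parameter(params: dict[str, str], name: str) -> str:
    try:
        return params[name]
    except KeyError:
        raise NoSuchParameter(name) from None


# Handlers contain no exception handling at all.
def handle_greet(params: dict[str, str]) -> str:
    return f"Hello, {get_parameter(params, 'name')}!"


def handle_echo(params: dict[str, str]) -> str:
    return get_parameter(params, "text")


HANDLERS = {"/greet": handle_greet, "/echo": handle_echo}


def dispatch(path: str, params: dict[str, str]) -> tuple[int, str]:
    """Top-level dispatcher: the only place NoSuchParameter is caught."""
    handler = HANDLERS.get(path)
    if handler is None:
        return 404, f"no handler for {path}"
    try:
        return 200, handler(params)
    except NoSuchParameter as exc:
        # One handler builds the error response for every route.
        return 400, f"missing required parameter: {exc}"


ok = dispatch("/greet", {"name": "Ada"})
bad = dispatch("/greet", {})
```

Adding a route means adding one function to HANDLERS. The new handler gets missing-parameter responses without writing any error handling of its own.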
Aggregation Through Promotion
Rather than building separate recovery for each failure type, promote smaller failures into a single crash-recovery mechanism. Fewer code paths, more frequently exercised (which surfaces bugs in recovery sooner). Trade-off: promotion increases recovery cost per incident, so it only makes sense when the promoted errors are rare.
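As a rough illustration of promotion, consider a toy cache where every small failure (a missing entry, a corrupted entry) is promoted to the same coarse recovery: rebuild the whole cache from its source. The Cache class and the checksum scheme are hypothetical, and the sketch assumes every requested key exists in the source.

```python
import zlib


class Cache:
    """Toy cache over a source of truth (a plain dict in this sketch)."""

    def __init__(self, source: dict[str, bytes]):
        self._source = source
        self._entries: dict[str, tuple[bytes, int]] = {}
        self.rebuilds = 0

    def _rebuild(self) -> None:
        # The single recovery path: discard everything, reload from source.
        self._entries = {k: (v, zlib.crc32(v)) for k, v in self._source.items()}
        self.rebuilds += 1

    def get(self, key: str) -> bytes:
        entry = self._entries.get(key)
        if entry is None or zlib.crc32(entry[0]) != entry[1]:
            # Missing entry or checksum mismatch: no per-entry repair code,
            # both cases are promoted to a full rebuild.
            self._rebuild()
            entry = self._entries[key]  # assumes the key exists in source
        return entry[0]


cache = Cache({"a": b"1", "b": b"2"})
first = cache.get("a")  # cold start goes through the same recovery path

# Simulate corruption of one entry: the data no longer matches its checksum.
cache._entries["b"] = (b"X", cache._entries["b"][1])
repaired = cache.get("b")
```

Because a cold start also goes through _rebuild, the recovery code runs on every startup rather than only after a rare fault. The cost is that one bad entry triggers a full reload, which is acceptable only if corruption is rare.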
Just Crash
When an error is difficult or impossible to handle and occurs infrequently, the simplest response is to print diagnostic information and abort. Out-of-memory errors fit this pattern because there's not much an application can do and the handler itself may need to allocate memory. The same principle applies anywhere: wrap the operation so it aborts on failure, eliminating exception handling at every call site.
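The "wrap the operation so it aborts on failure" idea might look like this in Python. The names die and must_read are hypothetical, and the sketch assumes the file is one the program cannot run without.

```python
import sys
from typing import NoReturn


def die(message: str) -> NoReturn:
    """Print a diagnostic to stderr and abort with a non-zero exit status."""
    print(f"fatal: {message}", file=sys.stderr)
    sys.exit(1)


def must_read(path: str) -> str:
    """Read a required file, or abort the process if it cannot be read."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except OSError as exc:
        die(f"cannot read required file {path!r}: {exc}")
```

Call sites then contain no error handling: a call such as must_read("app.conf") either returns the contents or ends the process. Note that sys.exit raises SystemExit, which outer code can still catch. If a hard abort is wanted, os.abort() is an alternative.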
Appropriate When
The error is infrequent, recovery is impractical, and the caller can't do anything meaningful.
Not Appropriate When
The system's value depends on handling that failure (e.g., a replicated storage system must handle I/O errors, not crash on them).
Review Process
- Inventory exceptions: List every error case, exception throw, and error return.
- Apply the decision tree: Can each one be defined out? Masked? Aggregated?
- Check depth impact: How many exception types are in the module's interface?
- Audit catch blocks: Are callers doing meaningful work, or just logging and re-throwing?
- Evaluate safety: For any proposed masking, verify nothing important is lost.
- Recommend simplification: Propose specific reductions in error surface.
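For Python targets, the inventory step can be partly automated. The sketch below uses the standard ast module to count raise statements by exception name. The function name inventory_raises is hypothetical, and the scan covers only explicit raise statements, not error return values or exceptions raised by called libraries.

```python
import ast
from collections import Counter


def inventory_raises(source: str) -> Counter:
    """Count `raise` statements in Python source, keyed by exception name."""
    counts: Counter = Counter()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Raise):
            continue
        exc = node.exc
        if exc is None:
            # Bare `raise` inside an except block.
            counts["<re-raise>"] += 1
            continue
        if isinstance(exc, ast.Call):
            # `raise ValueError("msg")`: keep only the callee name.
            exc = exc.func
        counts[ast.unparse(exc)] += 1
    return counts


sample = '''
def f(x):
    if x < 0:
        raise ValueError("negative")
    try:
        g(x)
    except KeyError:
        raise
    raise errors.Timeout
'''

found = inventory_raises(sample)
```

A high count of distinct names in one module is a prompt to walk each one through the decision tree, not a verdict on its own.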
Red flag signals for error design are cataloged in red-flags (Catch-and-Ignore, Overexposure, Shallow Module).