hammertime
HammerTime — Behavioral Rule System
Rules live at ~/.claude/hammertime/rules.json. The hook registers as a Stop hook and fires on every assistant turn.
Two rule types:
- Content rules — scored detection against response text (three-layer scoring + optional Haiku verification)
- Timer rules — time-based blocking until a deadline passes (no scoring, no Haiku)
When to Create a Rule
Recognize rule intent even when the user doesn't say "hammertime":
Imperative patterns:
- "Always fix lint errors before stopping"
- "Never say 'everything looks good' when there are warnings"
- "Stop summarizing what you did at the end"
- "From now on, run tests before committing"
- "I want you to use the visual planner for architecture decisions"
- "Don't dismiss issues as pre-existing"
Implicit patterns (user describes frustration or expectation):
- "You keep forgetting to run the linter" → rule to always run linter
- "I've told you three times to check types" → rule to verify types
- "Why do you always skip the tests?" → rule to never skip tests
Rule vs one-time request: "Fix this lint error" is a task. "Always fix lint errors" is a rule. Future behavioral expectations → rule.
When NOT to create a rule:
- One-time instructions for the current task
- Preferences already handled by CLAUDE.md or settings
- Rules that would fire on nearly every response (too broad)
Timer Rules
Timer rules block all stop attempts until a deadline. No keywords, no scoring — purely time-based. Use them for deep focus sessions where you want the agent to keep iterating.
/hammertime 30m deep focus on this refactoring
/hammertime 1h thorough security review of the codebase
/hammertime 45m finish this feature completely
Timer rule schema:
{
"name": "timer-1742054400",
"rule": "Deep focus on this refactoring task.",
"enabled": true,
"deadline": "2026-03-15T14:30:00",
"keywords": [],
"max_iterations": 0
}
Key behaviors:
- Timer rules bypass the
stop_hook_activeguard (safe because deadline provides termination) max_iterations: 0means unlimited blocks until deadline (the deadline IS the guard)- When deadline passes, the rule is auto-deleted from
rules.json - Block message includes remaining time and motivational prompt
- Timer rules are evaluated before content rules — a timer block prevents content rule evaluation
Content Rule Schema
Content rules are JSON objects in an array. Required and optional fields:
{
"name": "kebab-case-name",
"rule": "Natural language description of what the agent should or shouldn't do",
"enabled": true,
"keywords": ["trigger", "words", "4-8", "terms"],
"intent_patterns": ["regex\\s+patterns?\\s+for\\s+structural\\s+matching"],
"dismissal_verbs": "\\b(?:skip|ignore|dismiss)\\b",
"qualifiers": "\\b(?:pre-?existing|scope|unrelated)\\b",
"confidence_threshold": 5,
"skill": null,
"evaluate_full_turn": true,
"max_iterations": 3
}
Required fields:
name— Unique kebab-case identifierrule— Natural language rule text (used in Haiku verification and the block message)enabled— Boolean togglekeywords— Array of 4-8 trigger strings for Layer 1 scoring
Optional fields (content rules):
intent_patterns— Array of regex strings for Layer 2 structural matching (compiled at load time)dismissal_verbs— Single regex string matching refusal/avoidance verbs for Layer 3qualifiers— Single regex string matching attribution/deflection terms for Layer 3confidence_threshold— Score at which to block without Haiku (default: 5)skill— Fully-qualified skill ID to invoke when rule fires (e.g.,gemskills:visual-planner)evaluate_full_turn— Boolean. Whentrue, scores ALL assistant messages since the user's last message (reads session transcript). Whenfalseor omitted, scores only the final assistant message. Default:false.max_iterations— Maximum times this rule can block per session before auto-allowing exit (default: 3, set 0 for unlimited). Prevents infinite loops when a rule is too broad or can't be satisfied.check_git_state— Boolean. Whentrue, the hook runsgit status --porcelain,git log @{u}..HEAD, andgit ls-files --others --exclude-standardbefore blocking. If the working tree is clean and all commits are pushed, the rule is skipped entirely (both the direct-block and Haiku phase paths). Useful for rules about pushing or committing work, so they don't fire when there is genuinely nothing to push. Default:false.
Timer-specific field:
deadline— ISO 8601 datetime string (e.g.,"2026-03-15T14:30:00"). When present, the rule becomes a timer rule — it blocks all stop attempts until the deadline passes, bypassing content scoring entirely. Auto-deleted fromrules.jsonwhen expired.
Three-Layer Scoring
Every assistant response is scored against all enabled rules:
| Layer | What it matches | Score | Purpose |
|---|---|---|---|
| 1 — Keywords | Case-insensitive substring match | +1 each | Broad signal detection |
| 2 — Intent Patterns | Regex patterns on full text | +2 each | Structural paraphrase catching |
| 3 — Co-occurrence | Dismissal verb + qualifier in same sentence | +3 | Highest-confidence signal |
Score thresholds:
- 0 — No match. Exit immediately.
- 1–4 — Ambiguous. Haiku verification decides (costs ~500ms, ~$0.001).
- 5+ — Obvious violation. Block directly, skip Haiku.
The confidence_threshold field controls the direct-block cutoff (default: 5). Lower it for stricter enforcement; raise it to rely more on Haiku verification.
Creating a Rule — Workflow
Step 1: Derive the name
Use a short kebab-case name: fix-lint-errors, no-trailing-summaries, use-visual-planner.
Step 2: Write the rule text
The rule field is natural language that describes the expected behavior. It's shown to Haiku for verification and included in the block message. Be specific and actionable.
Good: "Fix all lint errors before stopping. Do not report them without fixing." Bad: "Be better at linting."
Step 3: Extract keywords (4-8)
Pick words/phrases that would appear in a violating response. Focus on terms the model would use when breaking the rule, not terms describing the rule itself.
For "fix lint errors": ["lint warning", "lint error", "linting issue", "eslint", "biome", "I'll leave", "you can fix", "not critical"]
For "run tests before committing": ["skipping tests", "without testing", "skip the tests", "no tests needed", "tests aren't necessary", "commit without running"]
Guideline: 4-8 keywords. Fewer = more false negatives. More = more false positives. Intent patterns are better for catching paraphrases than adding more keywords.
Important: Keywords match as substrings, so "lint error" matches "there are 3 lint errors remaining". Multi-word keywords are strongly preferred over single words to reduce false positive rates.
Step 4: Write intent patterns (2-4 regex)
Regex patterns that catch structural dismissal even when exact keywords aren't used. These match on the full response text with re.IGNORECASE.
# Catches: "these lint issues aren't critical", "the warnings don't need fixing"
r"(?:lint|warning|error)s?\s+(?:aren't|are\s+not|don't|do\s+not)\s+(?:critical|important|blocking)"
# Catches: "I'll leave these for you to fix", "you may want to address these"
r"(?:leave|defer|skip)\s+(?:these|them|this)\s+(?:for|to)\s+(?:you|the\s+user)"
Use \s+ for flexible whitespace, (?:...|...) for alternatives, .*? for gaps. See references/rule-design.md for comprehensive regex guidance.
Step 5: Add co-occurrence terms (optional)
For Layer 3, define dismissal_verbs and qualifiers as regex strings. These fire only when both appear in the same sentence — the highest-confidence signal.
"dismissal_verbs": "\\b(?:skip|ignore|leave|defer|won't\\s+fix)\\b",
"qualifiers": "\\b(?:non-?critical|cosmetic|minor|style|optional)\\b"
Step 6: Set confidence threshold
Default is 5 (conservative). For rules where false positives are costly, raise to 7. For rules where catching violations matters more, lower to 3.
Step 7: Resolve skill names (if applicable)
If the rule should invoke a skill when it fires, resolve the informal name to a fully-qualified ID:
- Use
Skill(find-skills)to search: "visual planner" →gemskills:visual-planner - Use
Skill(simplify)→bopen-tools:simplify(prefix with plugin name)
Set the skill field to the resolved ID, or null if no skill is needed.
Step 8: Write the rules file
Read the existing ~/.claude/hammertime/rules.json (or start with [] if it doesn't exist). Append the new rule and write the file. Create the ~/.claude/hammertime/ directory if needed.
mkdir -p ~/.claude/hammertime
Step 9: Remind about restart
Rules are loaded at hook registration time. Tell the user: "Restart Claude Code for the new rule to take effect."
Skill Invocation in Rules
The skill field triggers automatic skill invocation when a rule fires. The block message appends Invoke Skill(<id>) to address this.
| Rule detects | Skill invoked | Effect |
|---|---|---|
| Model skips tests | superpowers:test-driven-development |
Redirects to TDD workflow |
| Model ignores lint | bopen-tools:simplify |
Runs code simplification |
| Model skips architecture planning | gemskills:visual-planner |
Forces visual planning step |
| Model writes insecure code | bopen-tools:code-audit-scripts |
Runs security audit |
Resolve informal skill names with Skill(find-skills) before setting the skill field.
Mode Inference
The hook infers whether to auto-fix or ask the user based on the rule text:
- Fix mode: Rule contains "fix all", "always fix", "fix any", "fix every" → block message says "Fix these issues NOW"
- Ask mode: Everything else → block message says "Ask the user whether to fix"
Write rule text accordingly to control the behavior.
Full-Turn vs Last-Message Evaluation
By default, HammerTime scores only the final assistant message. Set "evaluate_full_turn": true to score ALL assistant text since the user's last message — necessary for violations that occur in intermediate messages (dismissing an error in message 3, then finishing with "Done." in message 5).
Use full-turn for rules about dismissing work, skipping process steps, or any violation that happens during execution rather than in the final summary. Last-message scoring is sufficient for format rules and output style rules.
Debug Logging
Set the environment variable to enable score breakdowns:
export HAMMERTIME_DEBUG=~/.claude/hammertime/debug.log
[ 2ms] SCORE: rule 'fix-lint-errors' score=4 (kw=2, intent=1, cluster=0)
[ 3ms] PHASE2: score 4 < 5, verifying with Haiku
[ 487ms] PHASE2: rule 'fix-lint-errors' violated=true
Key Principles
- Intent patterns > more keywords. Two good patterns catch more than ten mediocre keywords. Patterns match structure; keywords match vocabulary.
- Co-occurrence is highest signal. A dismissal verb + qualifier in the same sentence is almost always a violation. Use Layer 3 for your most confident detection.
- Don't over-keyword. Too many keywords cause false positives on innocent responses. If you find yourself adding 10+ keywords, the rule is probably too broad — split it or use patterns instead.
- Default threshold of 5 is conservative. Most rules work well at 5. Only adjust if you see specific problems in the debug log.
- Test with real examples. Before writing patterns, look at actual model responses that violate the rule. Write keywords and patterns that match those real responses, not hypothetical ones.
- Rules stack. Multiple rules can match the same response. The first confirmed violation triggers a block.
- Minimal viable rule. Not every rule needs all three layers. A simple rule with good keywords and 2 intent patterns is often sufficient. Add co-occurrence only when you need high-confidence same-sentence detection.
- Rule text matters for Haiku. When a score falls in the 1-4 range, Haiku reads the
ruletext to decide. A clear, specific rule text helps Haiku make better decisions. Include the "why" in the rule text, not just the "what".
Built-in Rules
The hook ships with one hardcoded rule:
- project-owner — "Fix all errors instead of dismissing them as pre-existing. The assistant has no session history and cannot know what is pre-existing."
This rule can be overridden by adding a user rule with "name": "project-owner" to the JSON file.
Corpus-Driven Rule Creation
Always prefer keywords and patterns derived from real session logs over synthetic guesses. Real dismissal language ("this appears to be pre-existing") rarely matches what a human would guess ("pre-existing issue").
The workflow
- Extract 2-3 key terms from the user's behavior description.
- Call the remind search script with those terms and
--no-recencyto search the full corpus:python3 "${CLAUDE_PLUGIN_ROOT}/skills/remind/scripts/search.py" "TERMS" --limit 10 --no-recency - Scan returned assistant messages. Classify each as:
- True positive — message demonstrating the bad behavior. Extract the exact phrases used.
- True negative — message using similar vocabulary but NOT violating. These prevent keyword overfitting.
- Derive keywords and intent_patterns from the true positive phrases. Use true negative phrases to rule out overly broad matches.
- Present extracted test cases to the user as a table: message excerpt, should_trigger, category, source note.
Fallback
If the remind script is absent, the Scribe DB is unavailable, or the search returns zero results, fall back to generating keywords and patterns from the description alone (the pre-corpus workflow). Note the fallback in the output so the user knows the rule is synthetic and may need tuning.
Reference
references/rule-design.md— Deep guide on designing each scoring layer with annotated examplesexamples/rules.json— Complete example rules showing full v2 schema