scoring
NLPM Quality Scoring Rubric
100-point quality scale for all NL programming artifacts. Apply penalties deterministically. Use calibration examples to anchor judgment on borderline cases.
Scoring Formula
base_score = 100
adjustments = sum of all applicable penalties (all penalties are negative)
final_score = max(0, min(100, base_score + adjustments))
Penalties stack. The floor is 0; the ceiling is 100. No bonuses — the default assumption is that an artifact is well-formed, and quality is measured by what is missing or wrong.
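The formula above can be sketched in a few lines of Python. This is only an illustration, not the scorer's actual implementation: the function name `final_score` and the choice to reject positive adjustments are assumptions made here to reflect the "no bonuses" statement.

```python
from typing import Iterable

BASE_SCORE = 100


def final_score(penalties: Iterable[int]) -> int:
    """Add the (negative) penalties to the base score and clamp to 0..100."""
    penalties = list(penalties)
    # Assumption: the rubric has no bonuses, so a positive adjustment is an error.
    if any(p > 0 for p in penalties):
        raise ValueError("penalties must be zero or negative")
    return max(0, min(100, BASE_SCORE + sum(penalties)))


# Penalties listed in calibration Example 2 (Rewrite Agent)
example_2 = [-15, -15, -5, -10, -4, -10]
print(final_score(example_2))  # 41
```

Because penalties stack, a heavily flawed artifact can sum below zero; the `max(0, ...)` clamp is what keeps the reported score at the floor.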
Penalty Tables
Skills
| Rule | Check | Condition | Penalty |
|---|---|---|---|
| -- | `name` present | Missing | -25 |
| R04 | `description` present | Missing | -25 |
| R04 | Trigger quality | Description is generic (≤1 specific phrase) | -15 |
| R05 | Body length | 400–500 lines | -5 |
| R05 | Body length | >500 lines | -10 |
| R06 | Code examples | Complex concepts with no examples | -5 |
| R06 | Code examples | No examples at all in a technical skill | -10 |
| R07 | Scope note | No scope note / cross-references | -3 |
Agents
| Rule | Check | Condition | Penalty |
|---|---|---|---|
| R09 | `description` present | Missing | -25 |
| R09 | `<example>` blocks | Exactly 1 example | -5 |
| R09 | `<example>` blocks | Zero examples | -15 |
| R10 | `model` declared | Not declared | -5 |
| R10 | `model` appropriate | Wrong tier for task (e.g. opus for parsing) | -5 |
| R11 | `tools` declared | Not declared | -5 |
| R11 | Unused tools | Each tool declared but not used in body | -3 each |
| R12 | Output format | No output format spec in body | -10 |
| R11 | Write on read-only | Audit/review/scan agent declares Write or Edit | -10 |
Commands
| Rule | Check | Condition | Penalty |
|---|---|---|---|
| -- | `description` present | Missing | -25 |
| R18 | `argument-hint` present | Command takes input but no hint | -5 |
| R14 | Steps numbered | Multi-step body with no numbered steps | -10 |
| R15 | Empty input handling | No handling for empty/missing input | -10 |
| R16 | Output format | No output format defined | -10 |
| R17 | Error paths | No error handling for missing files or bad data | -5 |
Shared Partials
| Rule | Check | Condition | Penalty |
|---|---|---|---|
| R19 | `user-invocable: false` | Missing or set to true | -25 |
| R20 | Purpose clear | Description doesn't state it's a partial | -10 |
Rules
| Rule | Check | Condition | Penalty |
|---|---|---|---|
| R21 | `description` present | Missing frontmatter description | -10 |
| R21 | Format: bold imperative | No bold imperative opening | -5 |
| R21 | Format: rationale | No rationale following the imperative | -10 |
| R22 | Enforceability | Rule is not specific/testable | -10 |
| R23 | Budget | Rule file over 500 lines | -15 |
| R26 | Conflicts with other rules | Direct contradiction with another rule in same set | -20 |
| R24 | Duplicates tooling | Re-states what eslint/ruff/clippy already catches | -10 |
Hooks
| Rule | Check | Condition | Penalty |
|---|---|---|---|
| -- | Valid JSON | hooks.json fails JSON parse | -25 |
| R27 | Event names valid | Uses unrecognized event name | -15 |
| R27 | Case correct | Event name has wrong case (e.g. `pretooluse`) | -10 |
| R29 | Scripts exist | Referenced script file does not exist | -20 |
| -- | Command safety | Hook command contains dangerous patterns (rm -rf, git push --force, DROP TABLE) | -15 |
| -- | Matcher regex valid | Matcher pattern doesn't compile as valid regex | -10 |
| -- | Timeout reasonable | Hook specifies timeout > 30s (likely hangs) | -5 |
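Three of the hook checks above are mechanical enough to sketch directly. The helper names below are hypothetical, and using Python's `re` module to validate a matcher is an assumption: the regex flavor actually used for hook matchers may differ. The timeout is taken to be in seconds, as in the table.

```python
import json
import re


def json_parse_penalty(raw: str) -> int:
    """-25 if the hooks file content does not parse as JSON, else 0."""
    try:
        json.loads(raw)
    except json.JSONDecodeError:
        return -25
    return 0


def matcher_penalty(pattern: str) -> int:
    """-10 if the matcher does not compile (assumes Python regex syntax)."""
    try:
        re.compile(pattern)
    except re.error:
        return -10
    return 0


def timeout_penalty(seconds: float) -> int:
    """-5 if the hook timeout exceeds 30 seconds, else 0."""
    return -5 if seconds > 30 else 0
```

The remaining hook checks (event names, script existence, command safety) depend on the hook schema and the file system, so they are left out of this sketch.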
plugin.json
| Check | Condition | Penalty |
|---|---|---|
| `name` present | Missing | -25 |
| `version` is semver | Present but not valid semver | -10 |
| `description` present | Missing | -5 |
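As a rough sketch, the three plugin.json checks could be applied to an already-parsed manifest as follows. The function name and finding labels are hypothetical, the semver pattern is a simplified approximation rather than the full grammar from the SemVer specification, and treating an empty string as "missing" is an assumption.

```python
import re
from typing import Any, List, Mapping, Tuple

# Simplified semver: MAJOR.MINOR.PATCH with optional pre-release and build parts.
# Assumption: looser than the official grammar (e.g. it allows leading zeros).
SEMVER_RE = re.compile(r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$")


def plugin_json_penalties(manifest: Mapping[str, Any]) -> List[Tuple[str, int]]:
    """Return (finding, penalty) pairs for a parsed plugin.json."""
    findings: List[Tuple[str, int]] = []
    if not manifest.get("name"):
        findings.append(("name missing", -25))
    version = manifest.get("version")
    # Only penalize a version that is present but malformed.
    if version is not None and not (
        isinstance(version, str) and SEMVER_RE.match(version)
    ):
        findings.append(("version is not valid semver", -10))
    if not manifest.get("description"):
        findings.append(("description missing", -5))
    return findings


print(plugin_json_penalties({"name": "my-plugin", "version": "1.0"}))
```

Note that an absent `version` produces no finding here, matching the "Present but not valid semver" condition in the table.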
.mcp.json
| Check | Condition | Penalty |
|---|---|---|
| Valid JSON | File fails JSON parse | -25 |
| Server `command` present | MCP server entry missing `command` field | -15 |
Settings Files (.claude/settings.json, .claude/settings.local.json)
| Check | Condition | Penalty |
|---|---|---|
| Valid JSON | File fails JSON parse | -25 |
| No hardcoded secrets | Contains API keys, tokens, or passwords | -25 |
| Permission mode sanity | `bypassPermissions` enabled in a shared project settings file (not `.local`) | -15 |
| Recognized keys | Contains unknown top-level keys not in Claude Code schema | -5 each, cap -15 |
| Hook definitions valid | `hooks` key present: check that event names are valid and case-correct | -10 per invalid |
CLAUDE.md
| Rule | Check | Condition | Penalty |
|---|---|---|---|
| R49 | File exists | No CLAUDE.md in plugin root | -10 |
| -- | Under 200 lines | CLAUDE.md exceeds 200 lines | -5 |
| R38 | Actionable content | CLAUDE.md has no actionable guidance (just filler) | -10 |
| R33 | Build/run command | No instructions for how to build or run the project | -10 |
| R34 | Test command | No instructions for how to run tests | -5 |
| R35 | Architecture overview | No structure/component description (what lives where) | -5 |
| R36 | Valid `@` imports | Contains `@` import syntax referencing a file that doesn't exist | -10 |
| R37 | No stale file references | Mentions files or functions that no longer exist in the repo | -10 |
| R38 | Actionability ratio | >60% of content is description rather than instructions | -5 |
| -- | Prerequisites section | No section covering required tools, versions, or setup steps | -5 |
| R39 | No rule conflicts | CLAUDE.md says X while a `.claude/rules/` file says not-X | -15 |
Memory Files
Applies to .md files located in ~/.claude/projects/*/memory/ directories.
| Rule | Check | Pass (+0) | Penalty |
|---|---|---|---|
| -- | Has YAML frontmatter | Present | -15 |
| -- | `name` in frontmatter | Present | -10 |
| -- | `description` in frontmatter | Present | -10 |
| -- | `type` in frontmatter | Present (user/feedback/project/reference) | -5 |
| -- | Content matches declared type | Yes | -10 |
| -- | Referenced in MEMORY.md index | Yes | -5 (orphaned memory) |
| R37 | Stale content | No references to removed files or functions | -10 |
All Artifact Types: Vague Quantifiers
| Rule | Check | Condition | Penalty |
|---|---|---|---|
| R01 | Vague quantifier | Each occurrence of: "appropriate", "relevant", "as needed", "sufficient", "adequate", "reasonable", "properly", "correctly", "some", "several", "various" without measurable criteria | -2 each |
| R01 | Vague quantifier cap | Total vague quantifier penalty | max -20 |
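A minimal sketch of the per-occurrence penalty and its cap, assuming whole-word, case-insensitive matching of the listed terms. The function name is hypothetical. This version does not match inflected forms such as "appropriately", and it cannot tell whether a term is accompanied by measurable criteria, so a reviewer still has to discount those hits by hand.

```python
import re

VAGUE_TERMS = [
    "appropriate", "relevant", "as needed", "sufficient", "adequate",
    "reasonable", "properly", "correctly", "some", "several", "various",
]

# One alternation over all terms, anchored on word boundaries.
_VAGUE_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in VAGUE_TERMS) + r")\b",
    re.IGNORECASE,
)

PENALTY_PER_HIT = -2
PENALTY_CAP = -20


def vague_quantifier_penalty(text: str) -> int:
    """-2 per matched term, never lower than the -20 cap."""
    hits = len(_VAGUE_RE.findall(text))
    return max(PENALTY_CAP, PENALTY_PER_HIT * hits)


sample = "Handle errors as needed and log some relevant details."
print(vague_quantifier_penalty(sample))  # -6
```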
Cross-Component (--plugin flag)
Applied when linting an entire plugin rather than individual files.
| Check | Condition | Penalty |
|---|---|---|
| Broken partial refs | Command references `commands/shared/X.md` that doesn't exist | -20 |
| Broken skill refs | Agent references `plugin:skill` that isn't installed | -20 |
| Missing scripts | Hook references script that doesn't exist | -20 |
| Orphaned files | Agent/command/skill file not referenced by anything | -5 per file |
| Contradictions | Two rules/instructions in same plugin directly contradict each other | -15 per pair |
Score Bands
| Range | Label | Meaning |
|---|---|---|
| 90–100 | Excellent | Production-ready; minor or no issues |
| 80–89 | Good | Solid; one or two non-critical gaps |
| 70–79 | Adequate | Meets threshold; noticeable gaps to address |
| 60–69 | Weak | Below threshold; significant issues |
| <60 | Rewrite | Fundamental problems; recommend rewriting from scratch |
Default pass threshold: 70. Configurable in .claude/nlpm.local.md.
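The band lookup and pass check can be sketched as below. The names `band` and `passes` are illustrative only, and how the threshold is read from `.claude/nlpm.local.md` is not covered here, so it is passed in as a plain argument.

```python
# (lower bound, label) pairs, highest band first
BANDS = [(90, "Excellent"), (80, "Good"), (70, "Adequate"), (60, "Weak"), (0, "Rewrite")]
DEFAULT_PASS_THRESHOLD = 70


def band(score: int) -> str:
    """Map a 0..100 score to its band label."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    raise AssertionError("unreachable: the last band starts at 0")


def passes(score: int, threshold: int = DEFAULT_PASS_THRESHOLD) -> bool:
    """True when the score meets the pass threshold."""
    return score >= threshold


print(band(97), passes(97))  # Excellent True
print(band(41), passes(41))  # Rewrite False
```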
Calibration Examples
Example 1: Excellent Agent (97/100)
Artifact:
---
name: dependency-auditor
description: |
Audits project dependencies for security vulnerabilities, outdated packages,
and license compliance issues. Use this agent when checking npm/pip/cargo
dependencies, reviewing package.json or requirements.txt, or running a
security audit before release.
<example>
Context: Developer preparing for production release
user: check if any of our dependencies have known CVEs
assistant: I'll audit your dependencies for security vulnerabilities using
the package manifest files...
</example>
<example>
Context: CI pipeline running pre-merge checks
user: /audit-deps
assistant: Running dependency audit. Scanning package.json and
package-lock.json for vulnerabilities and license issues...
</example>
model: sonnet
color: yellow
tools: ["Read", "Glob", "Bash"]
skills: ["nlpm:conventions"]
---
You are a dependency security auditor. Read all package manifests in the
project. For each dependency, check version ranges against known vulnerability
patterns. Report findings in the format below.
## Output Format
### Summary
Total dependencies: N | Vulnerable: N | Outdated: N | License issues: N
### Findings
| Package | Version | Issue | Severity |
|---------|---------|-------|----------|
Score breakdown:
- Base: 100
- `description` present and has 3+ specific phrases: 0
- 2 `<example>` blocks: 0
- `model: sonnet` declared and appropriate for security analysis: 0
- `tools` declared: 0
- Output format defined: 0
- Read-only agent (no Write/Edit): 0
- `Bash` tool declared but body doesn't explicitly call it (one unused tool): -3
- Vague quantifiers: none: 0
- No scope note in body: 0 (the scope-note check, R07, is in the Skills table, not the Agents table)

Final: 97/100 (Excellent). One unused tool (`Bash` declared but body text doesn't explicitly invoke it) costs 3 points.
(For calibration: removing the unused `Bash` declaration would bring this to 100. The range 90-100 is Excellent regardless of the exact number.)
Example 2: Rewrite Agent (41/100)
Artifact:
---
name: code-helper
description: "Helps with code tasks in an appropriate and relevant way as needed."
model: opus
tools: ["Read", "Write", "Edit", "Bash", "Glob", "WebSearch", "WebFetch"]
---
You are a helpful coding assistant. Analyze the code and make appropriate
improvements. Handle edge cases as needed and ensure the output is relevant
to the user's requirements.
Score breakdown:
- Base: 100
- Zero `<example>` blocks: -15
- Description is generic (1 vague phrase, 0 specific phrases): -15
- `opus` declared for a routine code-help task (haiku/sonnet appropriate): -5
- `tools` declared but too many unused (WebSearch, WebFetch, Glob all declared without body justification): -10 (judged as 3–4 unused, rounded)
- "appropriate" + "relevant" + "as needed" (vague quantifiers, 2 instances): -4
- No output format defined: -10
Total penalties: -59
Final: max(0, 100 - 59) = 41/100 — Rewrite.
(For calibration: the exact number of unused tools and vague quantifier hits can vary by reviewer. The important thing is that this artifact scores well below 60 — multiple fundamental issues.)
Example 3: Excellent Rule (92/100)
Artifact:
---
description: "Always use ${CLAUDE_PLUGIN_ROOT} for intra-plugin file references in hooks and scripts"
paths: ["**/.claude/hooks.json", "**/scripts/*.sh"]
---
**Use `${CLAUDE_PLUGIN_ROOT}` for all file paths within a plugin.**
Because plugins are installed at different locations for different users and
environments, hardcoded absolute paths (e.g. `/Users/alice/.claude/plugins/...`)
break when the plugin is installed by anyone other than the original author.
Using `${CLAUDE_PLUGIN_ROOT}` ensures paths resolve correctly regardless of
install location.
Correct:
```json
"command": "${CLAUDE_PLUGIN_ROOT}/scripts/check.sh"
```
Incorrect:
```json
"command": "/Users/alice/.claude/plugins/cache/my-plugin/1.0.0/scripts/check.sh"
```
Score breakdown:
- Base: 100
- `description` present: 0
- Bold imperative opening: 0
- Rationale follows: 0
- Specific and testable (can grep for `/Users/` in hooks.json): 0
- `paths` scoped: 0
- Under 500 lines: 0
- Does not duplicate a linter rule: 0
- Minor: no reference to a related rule about portability in other contexts: -8 (judgment call; not a formal penalty table entry)
- Vague quantifiers: none: 0

Final: 92/100 (Excellent). Loses 8 points for not covering related portability contexts (env vars in MCP configs, etc.), a judgment call reflected in the score rather than a formal penalty table entry.
Example 4: Rewrite Rule (41/100)
Artifact:
```markdown
Don't write bad code. Code should be clean and well-organized. Avoid using
outdated patterns. Make sure to handle errors appropriately.
```
Score breakdown:
- Base: 100
- Missing frontmatter (no `description` field): -10
- No bold imperative opening: -5
- No rationale: -10
- Not specific or testable ("bad code", "clean", "well-organized" are unmeasurable): -10
- "appropriately" (vague quantifier): -2
- "well-organized" (vague): -2
- Duplicates what every linter/formatter already enforces ("clean code"): -10
- Rule is not enforceable by NLPM or any automated tool: -10 (enforceability)
Total penalties: -59
Final: max(0, 100 - 59) = 41/100 — Rewrite.
(For calibration: the exact value depends on whether "well-organized" is counted as a vague quantifier.)
Scope Note
This skill covers the NLPM scoring formula, penalty tables, score bands, and calibration examples. It does NOT cover:
- Artifact schemas and valid field values → see `nlpm:conventions`
- Patterns and anti-patterns catalog → see `nlpm:patterns`
- How to run the score command → see `commands/score.md`
Known False Positive Patterns
The following findings have historically been reported by the scorer despite having no backing in this rubric. They MUST NOT be penalized:
| Invalid finding | Why it is invalid |
|---|---|
| Missing `namespace:` on skill | Not in the skill schema; conventions §5 does not list it |
| Missing inline `hooks:`/`skills:` registration blocks in plugin.json | conventions §1 defines these as optional path strings |
| `AskUserQuestion` / `Task` / `WebFetch` flagged as undocumented tool | Built-in per conventions §14 |
| Agent missing `skills:` when omission is documented in CLAUDE.md | Intentional architectural choice |
| plugin.json missing `engines:` / `minClaudeVersion:` / `main:` | All optional per conventions §1 |
| plugin.json description shorter than sibling marketplace.json description | Desynchronization ≠ defect; only penalize if required field is absent |
When in doubt: if a finding cannot be cited to a specific row in the penalty tables above, drop it.