scoring

Installation

SKILL.md

NLPM Quality Scoring Rubric

100-point quality scale for all NL programming artifacts. Apply penalties deterministically. Use calibration examples to anchor judgment on borderline cases.

Scoring Formula

base_score = 100
adjustments = sum of all applicable penalties (all penalties are negative)
final_score = max(0, min(100, base_score + adjustments))

Penalties stack. The floor is 0; the ceiling is 100. No bonuses — the default assumption is that an artifact is well-formed, and quality is measured by what is missing or wrong.

Penalty Tables

Skills

Rule	Check	Condition	Penalty
--	`name` present	Missing	-25
R04	`description` present	Missing	-25
R04	Trigger quality	Description is generic (≤1 specific phrase)	-15
R05	Body length	400–500 lines	-5
R05	Body length	>500 lines	-10
R06	Code examples	Complex concepts with no examples	-5
R06	Code examples	No examples at all in a technical skill	-10
R07	Scope note	No scope note / cross-references	-3

Agents

Rule	Check	Condition	Penalty
R09	`description` present	Missing	-25
R09	`<example>` blocks	Exactly 1 example	-5
R09	`<example>` blocks	Zero examples	-15
R10	`model` declared	Not declared	-5
R10	`model` appropriate	Wrong tier for task (e.g. opus for parsing)	-5
R11	`tools` declared	Not declared	-5
R11	Unused tools	Each tool declared but not used in body	-3 each
R12	Output format	No output format spec in body	-10
R11	Write on read-only	Audit/review/scan agent declares Write or Edit	-10

Commands

Rule	Check	Condition	Penalty
--	`description` present	Missing	-25
R18	`argument-hint` present	Command takes input but no hint	-5
R14	Steps numbered	Multi-step body with no numbered steps	-10
R15	Empty input handling	No handling for empty/missing input	-10
R16	Output format	No output format defined	-10
R17	Error paths	No error handling for missing files or bad data	-5

Shared Partials

Rule	Check	Condition	Penalty
R19	`user-invocable: false`	Missing or set to true	-25
R20	Purpose clear	Description doesn't state it's a partial	-10

Rules

Rule	Check	Condition	Penalty
R21	`description` present	Missing frontmatter description	-10
R21	Format: bold imperative	No bold imperative opening	-5
R21	Format: rationale	No rationale following the imperative	-10
R22	Enforceability	Rule is not specific/testable	-10
R23	Budget	Rule file over 500 lines	-15
R26	Conflicts with other rules	Direct contradiction with another rule in same set	-20
R24	Duplicates tooling	Re-states what eslint/ruff/clippy already catches	-10

Hooks

Rule	Check	Condition	Penalty
--	Valid JSON	hooks.json fails JSON parse	-25
R27	Event names valid	Uses unrecognized event name	-15
R27	Case correct	Event name has wrong case (e.g. `pretooluse`)	-10
R29	Scripts exist	Referenced script file does not exist	-20
--	Command safety	Hook command contains dangerous patterns (rm -rf, git push --force, DROP TABLE)	-15
--	Matcher regex valid	Matcher pattern doesn't compile as valid regex	-10
--	Timeout reasonable	Hook specifies timeout > 30s (likely hangs)	-5

plugin.json

Check	Condition	Penalty
`name` present	Missing	-25
`version` is semver	Present but not valid semver	-10
`description` present	Missing	-5

.mcp.json

Check	Condition	Penalty
Valid JSON	File fails JSON parse	-25
Server `command` present	MCP server entry missing `command` field	-15

Settings Files (.claude/settings.json, .claude/settings.local.json)

Check	Condition	Penalty
Valid JSON	File fails JSON parse	-25
No hardcoded secrets	Contains API keys, tokens, or passwords	-25
Permission mode sanity	`bypassPermissions` enabled in a shared project settings file (not .local)	-15
Recognized keys	Contains unknown top-level keys not in Claude Code schema	-5 each, cap -15
Hook definitions valid	`hooks` key present — check event names valid and case-correct	-10 per invalid

CLAUDE.md

Rule	Check	Condition	Penalty
R49	File exists	No CLAUDE.md in plugin root	-10
--	Under 200 lines	CLAUDE.md exceeds 200 lines	-5
R38	Actionable content	CLAUDE.md has no actionable guidance (just filler)	-10
R33	Build/run command	No instructions for how to build or run the project	-10
R34	Test command	No instructions for how to run tests	-5
R35	Architecture overview	No structure/component description (what lives where)	-5
R36	Valid `@` imports	Contains `@` import syntax referencing a file that doesn't exist	-10
R37	No stale file references	Mentions files or functions that no longer exist in the repo	-10
R38	Actionability ratio	>60% of content is description rather than instructions	-5
--	Prerequisites section	No section covering required tools, versions, or setup steps	-5
R39	No rule conflicts	CLAUDE.md says X while a `.claude/rules/` file says not-X	-15

Memory Files

Applies to .md files located in ~/.claude/projects/*/memory/ directories.

Rule	Check	Pass (+0)	Penalty
--	Has YAML frontmatter	Present	-15
--	`name` in frontmatter	Present	-10
--	`description` in frontmatter	Present	-10
--	`type` in frontmatter	Present (`user`/`feedback`/`project`/`reference`)	-5
--	Content matches declared type	Yes	-10
--	Referenced in MEMORY.md index	Yes	-5 (orphaned memory)
R37	Stale content	No references to removed files or functions	-10

All Artifact Types: Vague Quantifiers

Rule	Check	Condition	Penalty
R01	Vague quantifier	Each occurrence of: "appropriate", "relevant", "as needed", "sufficient", "adequate", "reasonable", "properly", "correctly", "some", "several", "various" without measurable criteria	-2 each
R01	Vague quantifier cap	Total vague quantifier penalty	max -20

Cross-Component (--plugin flag)

Applied when linting an entire plugin rather than individual files.

Check	Condition	Penalty
Broken partial refs	Command references `commands/shared/X.md` that doesn't exist	-20
Broken skill refs	Agent references `plugin:skill` that isn't installed	-20
Missing scripts	Hook references script that doesn't exist	-20
Orphaned files	Agent/command/skill file not referenced by anything	-5 per file
Contradictions	Two rules/instructions in same plugin directly contradict each other	-15 per pair

Score Bands

Range	Label	Meaning
90–100	Excellent	Production-ready; minor or no issues
80–89	Good	Solid; one or two non-critical gaps
70–79	Adequate	Meets threshold; noticeable gaps to address
60–69	Weak	Below threshold; significant issues
<60	Rewrite	Fundamental problems; recommend rewriting from scratch

Default pass threshold: 70. Configurable in .claude/nlpm.local.md.

Calibration Examples

Example 1: Excellent Agent (95/100)

Artifact:

---
name: dependency-auditor
description: |
  Audits project dependencies for security vulnerabilities, outdated packages,
  and license compliance issues. Use this agent when checking npm/pip/cargo
  dependencies, reviewing package.json or requirements.txt, or running a
  security audit before release.

  <example>
  Context: Developer preparing for production release
  user: check if any of our dependencies have known CVEs
  assistant: I'll audit your dependencies for security vulnerabilities using
  the package manifest files...
  </example>

  <example>
  Context: CI pipeline running pre-merge checks
  user: /audit-deps
  assistant: Running dependency audit. Scanning package.json and
  package-lock.json for vulnerabilities and license issues...
  </example>
model: sonnet
color: yellow
tools: ["Read", "Glob", "Bash"]
skills: ["nlpm:conventions"]
---

You are a dependency security auditor. Read all package manifests in the
project. For each dependency, check version ranges against known vulnerability
patterns. Report findings in the format below.

## Output Format
### Summary
Total dependencies: N | Vulnerable: N | Outdated: N | License issues: N

### Findings
| Package | Version | Issue | Severity |
|---------|---------|-------|----------|

Score breakdown:

Base: 100
description present and has 3+ specific phrases: 0
2 <example> blocks: 0
model: sonnet declared and appropriate for security analysis: 0
tools declared with only tools used: 0
Output format defined: 0
Read-only agent (no Write/Edit): 0
Minor: Bash tool declared but body doesn't explicitly call it — one unused tool: -3
No scope note in body: -2 (vague: none)

Final: 97/100 — Excellent. One unused tool (Bash declared but body text doesn't explicitly invoke it) costs 3 points.

(For calibration: a 95 example would have zero unused tools and a scope note. The range 90-100 is Excellent regardless of the exact number.)

Example 2: Rewrite Agent (41/100)

Artifact:

---
name: code-helper
description: "Helps with code tasks in an appropriate and relevant way as needed."
model: opus
tools: ["Read", "Write", "Edit", "Bash", "Glob", "WebSearch", "WebFetch"]
---

You are a helpful coding assistant. Analyze the code and make appropriate
improvements. Handle edge cases as needed and ensure the output is relevant
to the user's requirements.

Score breakdown:

Base: 100
Zero <example> blocks: -15
Description is generic (1 vague phrase, 0 specific phrases): -15
opus declared for a routine code-help task (haiku/sonnet appropriate): -5
tools declared but too many unused (WebSearch, WebFetch, Glob all declared without body justification): -10 (judged as 3–4 unused, rounded)
"appropriate" + "relevant" + "as needed" (vague quantifiers, 2 instances): -4
No output format defined: -10

Total penalties: -59

Final: max(0, 100 - 59) = 41/100 — Rewrite.

(For calibration: the exact number of unused tools and vague quantifier hits can vary by reviewer. The important thing is that this artifact scores well below 60 — multiple fundamental issues.)

Example 3: Excellent Rule (92/100)

Artifact:

---
description: "Always use ${CLAUDE_PLUGIN_ROOT} for intra-plugin file references in hooks and scripts"
paths: ["**/.claude/hooks.json", "**/scripts/*.sh"]
---

**Use `${CLAUDE_PLUGIN_ROOT}` for all file paths within a plugin.**

Because plugins are installed at different locations for different users and
environments, hardcoded absolute paths (e.g. `/Users/alice/.claude/plugins/...`)
break when the plugin is installed by anyone other than the original author.
Using `${CLAUDE_PLUGIN_ROOT}` ensures paths resolve correctly regardless of
install location.

Correct:
```json
"command": "${CLAUDE_PLUGIN_ROOT}/scripts/check.sh"

Incorrect:

"command": "/Users/alice/.claude/plugins/cache/my-plugin/1.0.0/scripts/check.sh"


**Score breakdown:**
- Base: 100
- `description` present: 0
- Bold imperative opening: 0
- Rationale follows: 0
- Specific and testable (can grep for `/Users/` in hooks.json): 0
- `paths` scoped: 0
- Under 500 lines: 0
- Does not duplicate a linter rule: 0
- Minor: no reference to a related rule about portability in other contexts: **-3** (judgment call; not a formal penalty — scoring at 92 reflects excellent but not perfect)
- Vague quantifiers: none: 0

**Final: 92/100** — Excellent. Loses ~8 points for not covering related portability contexts (env vars in MCP configs, etc.) — a judgment call reflected in the score rather than a formal penalty table entry.

---

### Example 4: Weak Rule (40/100)

**Artifact:**
```markdown
Don't write bad code. Code should be clean and well-organized. Avoid using
outdated patterns. Make sure to handle errors appropriately.

Score breakdown:

Base: 100
Missing frontmatter (no description field): -10
No bold imperative opening: -5
No rationale: -10
Not specific or testable ("bad code", "clean", "well-organized" are unmeasurable): -10
"appropriately" (vague quantifier): -2
"well-organized" (vague): -2
Duplicates what every linter/formatter already enforces ("clean code"): -10
Rule is not enforceable by NLPM or any automated tool: -10 (enforceability)

Total penalties: -59

Final: max(0, 100 - 59) = 41/100 — Rewrite.

(Calibrated near 40 as specified. The exact value depends on judgment on "well-organized" as a vague quantifier.)

Scope Note

This skill covers the NLPM scoring formula, penalty tables, score bands, and calibration examples. It does NOT cover:

Artifact schemas and valid field values → see nlpm:conventions
Patterns and anti-patterns catalog → see nlpm:patterns
How to run the score command → see commands/score.md

Known False Positive Patterns

The following findings have historically been reported by the scorer despite having no backing in this rubric. They MUST NOT be penalized:

Invalid finding	Why it is invalid
Missing `namespace:` on skill	Not in the skill schema; `conventions` §5 does not list it
Missing inline `hooks:`/`skills:` registration blocks in plugin.json	`conventions` §1 defines these as optional path strings
`AskUserQuestion` / `Task` / `WebFetch` flagged as undocumented tool	Built-in per `conventions` §14
Agent missing `skills:` when omission is documented in CLAUDE.md	Intentional architectural choice
plugin.json missing `engines:` / `minClaudeVersion:` / `main:`	All optional per `conventions` §1
plugin.json description shorter than sibling marketplace.json description	Desynchronization ≠ defect; only penalize if required field is absent

When in doubt: if a finding cannot be cited to a specific row in the penalty tables above, drop it.

Related skills

More from xiaolai/nlpm-for-claude

Installs

Repository

xiaolai/nlpm-for-claude

GitHub Stars

First Seen

2 days ago

Security Audits

Gen Agent Trust HubPass

SocketPass

SnykPass

scoring

NLPM Quality Scoring Rubric

Scoring Formula

Penalty Tables

Skills

Agents

Commands

Shared Partials

Rules

Hooks

plugin.json

.mcp.json

Settings Files (.claude/settings.json, .claude/settings.local.json)

CLAUDE.md

Memory Files

All Artifact Types: Vague Quantifiers

Cross-Component (--plugin flag)

Score Bands

Calibration Examples

Example 1: Excellent Agent (95/100)

Example 2: Rewrite Agent (41/100)

Example 3: Excellent Rule (92/100)

Scope Note

Known False Positive Patterns

More from xiaolai/nlpm-for-claude

patterns

conventions

writing-prompts

security

testing

rules