Evaluation Methodology

This document is the authoritative reference for how PluginEval measures plugin and skill quality. It covers the three evaluation layers, all ten scoring dimensions, the composite formula, badge thresholds, anti-pattern flags, Elo ranking, and actionable improvement tips.

Related: Full rubric anchors

The Three Evaluation Layers

PluginEval stacks three complementary layers. Each layer produces a score between 0.0 and 1.0 for each applicable dimension, and later layers override or blend with earlier ones according to per-dimension blend weights.

Layer 1 — Static Analysis

Speed: < 2 seconds. No LLM calls. Deterministic.

Installs

4.3K

Repository

wshobson/agents

GitHub Stars

37.5K

First Seen

Mar 27, 2026

Security Audits

Gen Agent Trust HubPass

SocketPass

SnykPass