# Workflow: Quality Audit
A 7-step pipeline for auditing any collection of items against a set of standards. Domain-agnostic by design — the caller supplies a config block that plugs in the item source, standards document, evidence capture method, evaluation template, and reporting target.
## When to Use
- After completing a body of work (design wave, sprint, migration) that needs QA sign-off
- Before a release to verify quality across a surface area
- For structured reviews: code, visual, accessibility, compliance, performance
- Any time you have N items and need a per-item verdict against defined criteria
## Config Schema
Supply a YAML config block when invoking. Any field may be omitted at invocation time; the pipeline prompts for whatever required fields are missing.
```yaml
# --- Required ---
domain: <string>              # Human label (e.g., "kingly-design-system", "api-endpoints")
items:
  source: <enum>              # bd-children | file-glob | manual-list | api
  parent: <string>            # bd parent bead ID (if source=bd-children)
  glob: <string>              # File glob pattern (if source=file-glob)
  list: [<string>, ...]       # Explicit item list (if source=manual-list)
  api: <string>               # API endpoint/command (if source=api)
standards:
  doc: <path>                 # Path to standards/checklist document
  sections: [<string>, ...]   # Specific sections to evaluate (optional, default=all)
  known_gaps: <string>        # Section to skip or flag separately

# --- Optional ---
evidence:
  type: <enum>                # screenshot | test-run | recording | measurement | source-read
  capture: <string>           # Shell command template for capture
  naming: <string>            # Filename template: {id}, {slug}, {date} supported
  storage: <path>             # Output directory for evidence artifacts
evaluation:
  template: <enum>            # component-audit | visual-audit | code-review | generic | custom
  custom_path: <path>         # Path to custom checklist (if template=custom)
  agents: <int>               # Number of parallel evaluator agents (default: 2)
  parallel: <bool>            # Run evaluators in parallel (default: true)
  split_by: <string>          # How to partition work (e.g., "source-vs-visual", "by-epic")
reporting:
  target: <enum>              # bd-comments | markdown-files | stdout
  verdict_taxonomy: [<string>, ...]  # Custom verdicts (default: [PASS, NEEDS_WORK, BLOCKED])
  output_dir: <path>          # Directory for report files (default: .lev/reports/)
summary:
  format: <enum>              # scorecard | heatmap | backlog (default: scorecard)
```
## Pipeline Steps

### Step 1: INVENTORY
Enumerate all items to audit.
| Source | Method |
|---|---|
| `bd-children` | `bd list --parent {parent}` (collects bead IDs and titles) |
| `file-glob` | `Glob({glob})`; each matching file is an item |
| `manual-list` | Literal list from config |
| `api` | Execute `{api}` command, parse JSON/lines output |
**Output:** an ordered list of `(id, title/name, metadata)` tuples.
Persist to `.lev/reports/audit-{domain}-{date}/inventory.json`.
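A minimal Python 3.10+ sketch of this step. The tab-separated output format of `bd list --parent` is an assumption (only the flag itself appears in the table above); adjust the parsing to whatever your `bd` version actually emits:

```python
import glob
import json
import subprocess
from pathlib import Path

def build_inventory(items_cfg: dict) -> list[dict]:
    """Enumerate the items to audit from the configured source."""
    source = items_cfg["source"]
    if source == "bd-children":
        # Assumption: `bd list --parent` prints one "ID<TAB>TITLE" pair per line.
        out = subprocess.run(
            ["bd", "list", "--parent", items_cfg["parent"]],
            capture_output=True, text=True, check=True,
        ).stdout
        return [{"id": bead_id, "title": title}
                for bead_id, _, title in
                (line.partition("\t") for line in out.splitlines() if line.strip())]
    if source == "file-glob":
        paths = sorted(glob.glob(items_cfg["glob"], recursive=True))
        return [{"id": p, "title": Path(p).name} for p in paths]
    if source == "manual-list":
        return [{"id": name, "title": name} for name in items_cfg["list"]]
    raise ValueError(f"unsupported items.source: {source}")

def persist_inventory(inventory: list[dict], report_dir: Path) -> None:
    """Write inventory.json into the per-run report directory."""
    report_dir.mkdir(parents=True, exist_ok=True)
    (report_dir / "inventory.json").write_text(json.dumps(inventory, indent=2))
```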
### Step 2: STANDARDS
Load and parse the evaluation criteria.
- Read the `standards.doc` file
- If `sections` is specified, extract only those sections
- If `known_gaps` is specified, flag that section as informational (not scored)
- Parse into a checklist of `(category, criterion, weight)` triples
**Output:** a structured checklist ready for per-item evaluation.
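How the checklist is extracted depends on how the standards doc is written. Here is a sketch under the assumption of a markdown doc where `##` headings mark categories and `-` bullets mark criteria, with every criterion weighted 1 (the schema does not define a weight syntax):

```python
import re
from pathlib import Path

def parse_standards(doc: Path, sections: list[str] | None = None,
                    known_gaps: str | None = None) -> list[tuple[str, str, int]]:
    """Parse a markdown standards doc into (category, criterion, weight) triples."""
    checklist: list[tuple[str, str, int]] = []
    category = None
    for line in doc.read_text().splitlines():
        if m := re.match(r"^##\s+(.*)", line):
            category = m.group(1).strip()
        elif m := re.match(r"^-\s+(.*)", line):
            if category == known_gaps:
                continue  # informational section: reported but not scored
            if sections and category not in sections:
                continue  # caller asked for a subset of sections
            checklist.append((category or "General", m.group(1).strip(), 1))
    return checklist
```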
### Step 3: EVIDENCE
Capture per-item evidence using the configured method.
| Type | Method |
|---|---|
| `screenshot` | Execute `{capture}` command per item; save to `{storage}/{naming}` |
| `test-run` | Execute test suite; capture per-item results |
| `recording` | Capture screen/interaction recording per item |
| `measurement` | Run measurement command; capture metrics |
| `source-read` | Read source files associated with each item |
Evidence is stored at `{storage}/{naming}` with template variables expanded.
If capture fails for an item, mark it `EVIDENCE_FAILED` and continue.

**Output:** an evidence manifest mapping each item ID to its artifact paths.
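A sketch of the capture loop with the documented failure handling. The `{id}`, `{slug}`, and `{date}` variables come from the schema above; the `{path}` variable passed into the capture command (pointing at the expanded artifact filename) is a hypothetical convenience, as is the timeout:

```python
import subprocess
from datetime import date
from pathlib import Path

def capture_evidence(items: list[dict], evidence_cfg: dict) -> dict[str, dict]:
    """Run the configured capture command per item; failures are non-fatal."""
    storage = Path(evidence_cfg["storage"])
    storage.mkdir(parents=True, exist_ok=True)
    manifest: dict[str, dict] = {}
    for item in items:
        fields = {
            "id": item["id"],
            "slug": item["title"].lower().replace(" ", "-"),
            "date": date.today().isoformat(),
        }
        artifact = storage / evidence_cfg["naming"].format(**fields)
        cmd = evidence_cfg["capture"].format(**fields, path=artifact)
        try:
            subprocess.run(cmd, shell=True, check=True, timeout=120)
            manifest[item["id"]] = {"status": "captured", "paths": [str(artifact)]}
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
            # Non-fatal per the workflow: mark and move on to the next item.
            manifest[item["id"]] = {"status": "EVIDENCE_FAILED", "paths": []}
    return manifest
```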
### Step 4: EVALUATE
Run the evaluation checklist against each item's evidence.
Team delegation pattern:
- If `agents > 1` and `parallel: true`, partition items across N evaluator agents
- Each evaluator receives: item subset, standards checklist, evidence paths
- Use `split_by` to control partitioning (round-robin if not specified; see the sketch below)
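A sketch of the partitioning logic. Treating `split_by` as an item-metadata key to group on is an assumption; values like `"source-vs-visual"` would need domain-specific handling:

```python
def partition(items: list[dict], agents: int,
              split_by: str | None = None) -> list[list[dict]]:
    """Split items across N evaluator agents."""
    buckets: list[list[dict]] = [[] for _ in range(agents)]
    if split_by:
        # Group items sharing a metadata value, then deal groups out to agents.
        groups: dict[str, list[dict]] = {}
        for item in items:
            groups.setdefault(str(item.get(split_by, "")), []).append(item)
        for i, group in enumerate(groups.values()):
            buckets[i % agents].extend(group)
    else:
        # Default: round-robin, one item at a time.
        for i, item in enumerate(items):
            buckets[i % agents].append(item)
    return buckets
```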
Per-item evaluation produces:
```markdown
## {item.title}

### {category_1}
- [ ] {criterion}: {PASS|FAIL} — {note}
- [ ] {criterion}: {PASS|FAIL} — {note}

### {category_2}
...

**Verdict:** {verdict} ({pass_count}/{total_count})
**Issues:**
1. {issue description} — severity: {P0-P3}
```
Built-in templates (see `templates/` directory):

- `component-audit`: Visual + Code Quality + Accessibility + Standards Compliance
- `visual-audit`: Visual Fidelity + Storybook Coverage + Code Structure
- `code-review`: Correctness + Style + Performance + Security + Tests
- `generic`: user-defined categories (bare scaffold)
- `custom`: load from `custom_path`
### Step 5: REPORT
Post the structured verdict for each item.
| Target | Method |
|---|---|
| `bd-comments` | `bd comment {item.id} --body "{verdict}"` per item |
| `markdown-files` | Write `{output_dir}/{item.id}-verdict.md` per item |
| `stdout` | Print verdicts to console |
Each verdict includes: item ID, title, verdict label, pass/fail counts, issue list.
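A sketch of the dispatch logic, using only the `bd comment` invocation shown in the table:

```python
import subprocess
from pathlib import Path

def post_verdict(target: str, item: dict, verdict_md: str, output_dir: Path) -> None:
    """Route one rendered verdict to the configured reporting target."""
    if target == "bd-comments":
        # Same shape as the table row: bd comment {item.id} --body "{verdict}"
        subprocess.run(["bd", "comment", item["id"], "--body", verdict_md], check=True)
    elif target == "markdown-files":
        output_dir.mkdir(parents=True, exist_ok=True)
        (output_dir / f"{item['id']}-verdict.md").write_text(verdict_md)
    elif target == "stdout":
        print(verdict_md)
    else:
        raise ValueError(f"unknown reporting target: {target}")
```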
### Step 6: VERIFY
Confirm 100% coverage.
- Load inventory from Step 1
- Load all verdicts from Step 5
- Check: every item has exactly one verdict
- Check: no items are `EVIDENCE_FAILED` without a retry or exception note
- Check: verdict taxonomy matches config (no unknown labels)
**Output:** coverage report, `{covered}/{total}`, with any gaps listed.
If gaps exist, report them and optionally re-run Steps 3-5 for the missing items.
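A sketch of the coverage checks, assuming verdicts are held as a simple item-ID-to-label map:

```python
def verify_coverage(inventory: list[dict], verdicts: dict[str, str],
                    taxonomy: list[str]) -> dict:
    """Confirm every inventoried item has exactly one verdict with a known label."""
    ids = [item["id"] for item in inventory]
    missing = [item_id for item_id in ids if item_id not in verdicts]
    unknown = {item_id: label for item_id, label in verdicts.items()
               if label not in taxonomy and label != "EVIDENCE_FAILED"}
    return {
        "covered": len(ids) - len(missing),
        "total": len(ids),
        "missing": missing,          # candidates for a Steps 3-5 re-run
        "unknown_labels": unknown,   # verdicts outside the configured taxonomy
    }
```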
### Step 7: SUMMARIZE
Aggregate findings into a top-level summary.
Scorecard format (default):
```
Domain: {domain}
Date: {date}
Items: {total} | PASS: {n} | NEEDS_WORK: {n} | BLOCKED: {n}
Coverage: {covered}/{total} ({pct}%)
Issues: {total_issues} (P0: {n}, P1: {n}, P2: {n}, P3: {n})

Top systemic patterns:
1. {pattern} — {count} items affected
2. {pattern} — {count} items affected
```
Write the full report to `{output_dir}/audit-{domain}-{date}.md`.
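A sketch of the scorecard rendering, assuming each issue dict carries a `severity` key with values `P0` through `P3`:

```python
from collections import Counter

def scorecard(domain: str, run_date: str, total: int,
              verdicts: dict[str, str], issues: list[dict]) -> str:
    """Render the default scorecard from per-item verdicts and the issue list."""
    by_verdict = Counter(verdicts.values())
    by_severity = Counter(issue["severity"] for issue in issues)
    covered = len(verdicts)
    return "\n".join([
        f"Domain: {domain}",
        f"Date: {run_date}",
        f"Items: {total} | " + " | ".join(f"{v}: {n}" for v, n in by_verdict.items()),
        f"Coverage: {covered}/{total} ({100 * covered // max(total, 1)}%)",
        f"Issues: {len(issues)} ("
        + ", ".join(f"{s}: {by_severity.get(s, 0)}" for s in ("P0", "P1", "P2", "P3"))
        + ")",
    ])
```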
## Team Delegation
For audits with >5 items, delegate to parallel evaluators:
| Role | Responsibility |
|---|---|
| Coordinator (you) | Steps 1, 2, 6, 7 — inventory, standards, verify, summarize |
| Evaluator 1..N | Steps 3-5 — capture evidence, evaluate, post verdicts |
Evaluators receive a self-contained work packet:
- Their item subset (IDs + titles)
- The full standards checklist
- The evidence capture config
- The report template + target config
Evaluators return a manifest: `{items_evaluated, verdicts, issues_found, evidence_paths}`.
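As dataclasses, the work packet and manifest might look like this; the field names follow the lists above, while the concrete types are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class WorkPacket:
    """Self-contained assignment handed to one evaluator agent."""
    items: list[dict]        # item subset: IDs + titles
    checklist: list[tuple]   # the full standards checklist
    evidence_cfg: dict       # capture command, naming, storage
    reporting_cfg: dict      # report template + target config

@dataclass
class EvaluatorManifest:
    """What each evaluator returns to the coordinator."""
    items_evaluated: list[str] = field(default_factory=list)
    verdicts: dict[str, str] = field(default_factory=dict)
    issues_found: list[dict] = field(default_factory=list)
    evidence_paths: dict[str, list[str]] = field(default_factory=dict)
```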
## Domain Examples

### UI Components (the original use case)
```yaml
domain: kingly-design-system
items: { source: bd-children, parent: oclaw-top.2 }
standards: { doc: packages/kingly-ui/DESIGN_STANDARDS.md }
evidence: { type: screenshot, capture: "apptestr screenshot" }
evaluation: { template: component-audit, agents: 2, parallel: true }
reporting: { target: bd-comments, verdict_taxonomy: [PASS, NEEDS_WORK] }
```
### Code Review
```yaml
domain: api-endpoints
items: { source: file-glob, glob: "src/routes/**/*.ts" }
standards: { doc: docs/API_STANDARDS.md }
evidence: { type: source-read }
evaluation: { template: code-review, agents: 2 }
reporting: { target: markdown-files, output_dir: .lev/reports/api-review/ }
```
### SDLC / Sprint Review
```yaml
domain: sprint-42
items: { source: bd-children, parent: sprint-42 }
standards: { doc: docs/DEFINITION_OF_DONE.md }
evidence: { type: test-run, capture: "pnpm test --filter {id}" }
evaluation: { template: generic }
reporting: { target: bd-comments, verdict_taxonomy: [SHIP, FIX, HOLD] }
```
### Icon Design
```yaml
domain: app-icons
items: { source: file-glob, glob: "assets/icons/**/*.svg" }
standards: { doc: docs/BRAND_GUIDE.md, sections: [Icons, Color Palette] }
evidence: { type: measurement, capture: "convert {path} -resize 16x16 /tmp/{slug}-16.png" }
evaluation: { template: visual-audit }
reporting: { target: markdown-files, verdict_taxonomy: [APPROVED, REVISE] }
```
## Validation
The audit is complete when:
- Inventory persisted with item count logged
- Standards checklist parsed and loaded
- Evidence captured for all items (or failures documented)
- Every item has exactly one verdict
- Coverage is 100% (or gaps explicitly noted)
- Summary report written to output directory
- All P0/P1 issues have corresponding tracking items (if using bd)
## Notes
- For the `bd-children` source, items inherit the parent's epic/project context
- Evidence capture failures are non-fatal; the item is marked `EVIDENCE_FAILED` and the run continues
- The `known_gaps` standards section is included in reports as informational but not scored
- Parallel evaluation scales roughly linearly: 2 agents gives ~2x throughput for >10 items
- Custom verdict taxonomies must have at least 2 labels (one positive, one negative)
- Templates are composable: `component-audit` includes visual, code, and a11y categories