# audit-context
Diagnose a nao context. Find gaps, MECE violations, failure root causes, and bloat. Output is a short in-conversation report ending in a prioritized plan. Diagnose only — never fix. Route fixes to write-context-rules / add-semantic-layer / create-context-tests.
Run any time: right after setup-context, mid-build, before a release, or when the agent's behavior gets surprising.
## Six checks (run in order)

### 1. Synced context
Read `nao_config.yaml`. What's wired in (warehouse, repos, Notion, semantic layer, MCPs)? What's missing (dbt repo, ETL configs, BI repo, internal docs)? Has `nao sync` run — are `databases/`, `repos/`, `docs/`, `semantics/` populated?
Scope check: <100 tables is the hard ceiling, ≤20 is the target. Better 12 well-documented tables than 80 half-documented ones. Flag oversized scope explicitly — it's the biggest predictor of reliability failure.
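A healthy sync, sketched as a directory layout (the directory names come from this check; every file name is hypothetical):

```
nao_config.yaml
databases/
  orders.md
  customers.md
repos/
  dbt/models/marts/schema.yml
docs/
  metrics-glossary.md
semantics/
  revenue.yml
```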
### 2. RULES.md vs target structure
Six standard sections (from write-context-rules): Business overview, Data architecture, Core data models (Most Used + Tables detail), Key Metrics Reference, Date filtering, Analysis Process. Per section, mark present / missing / thin. Flag placeholders, TODO: markers, and metric entries with no source-of-truth pointer.
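For reference, one plausible rendering of the target skeleton (write-context-rules owns the exact template; Core data models covers the two table sections):

```
## Business overview
## Data architecture
## Most Used Tables
## Tables detail
## Key Metrics Reference
## Date filtering
## Analysis Process
```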
### 3. Context coverage (per table)
For every table in `databases/`: is it in ## Most Used Tables? Does it have a ## Tables detail block? Is there dbt context (`repos/<dbt>/models/**/schema.yml`)? Any extra `.md`?
Then per-table gaps: undocumented columns the agent will reference, calculated fields with no explanation, foreign keys with no relation, common WHERE filters not mentioned anywhere. A table with no docs anywhere is a high-priority finding.
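What closing those gaps looks like, as a sketch of a `databases/<table>.md` entry (table, columns, and values are all hypothetical):

```
# orders
Grain: one row per order line.
- order_id: primary key
- customer_id: FK to customers.customer_id
- amount_usd: gross order value, USD, pre-refund
- status: enum pending | paid | refunded | cancelled
- margin_pct: calculated, (amount_usd - cost_usd) / amount_usd
Common filter: revenue questions use WHERE status = 'paid'.
```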
### 4. Data model consistency (MECE)
- Mutually exclusive? Two tables computing the same metric differently (worst issue — the agent picks one unpredictably).
- Collectively exhaustive? Asked metrics that no in-scope table can answer.
- Duplicated columns? Same logical field under different names (`user_id`/`customer_id`/`account_id`).
- Ambiguous columns? `amount` without a unit, `status` without enum values.
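For the worst case above (two tables computing one metric), the smallest fix to propose is a single source-of-truth line in ## Key Metrics Reference. A sketch, with hypothetical table and metric names:

```
## Key Metrics Reference
- Monthly revenue: SUM(amount_usd) FROM fct_orders WHERE status = 'paid'.
  Source of truth: fct_orders (not stg_payments, which double-counts retries).
```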
### 5. Test coverage
If `tests/` is empty → recommend create-context-tests. Otherwise read `tests/outputs/` (most recent run) and categorize each failure:
| Category | Looks like | Fix |
|---|---|---|
| Data model | Wrong column / wrong table | Add column descriptions; clarify granularity |
| Date selection | Wrong period / week start | Add DO/DON'T SQL in ## Date filtering (sketch below) |
| Test issue | Test SQL itself is wrong | Fix the test, not the context |
| Interpretation | Reasonable but different reading | Add to naming conventions or ## Key Metrics Reference |
| Metric definition | Wrong formula / source | Tighten ## Key Metrics Reference or add a semantic layer |
Propose the smallest rule change per failure. Sort by impact (tests affected).
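For the Date selection row above, a DO/DON'T pair in ## Date filtering might look like this (a sketch; the column name and date functions are warehouse-specific assumptions):

```
## Date filtering
DO ("last month" = the last completed calendar month):
  WHERE order_date >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '1 month'
    AND order_date <  DATE_TRUNC('month', CURRENT_DATE)

DON'T (a rolling 30 days is not "last month"):
  WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
```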
### 6. Token optimization
- Files >40KB (flag).
- `## Tables detail` blocks exceeding the 10-column cap.
- Duplication between `RULES.md` and `databases/<table>.md`.
- In-scope tables with no mention in any test or recent question (trim candidates).
- Raw / staging tables that snuck into scope.
If RULES.md is bloated, suggest moving per-table detail to `databases/<table>.md` and keeping only the one-line pointer in ## Most Used Tables. For multi-domain bloat, propose a per-domain file map referenced from RULES.md. Show the proposed structure before moving anything.
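The proposed map can be as small as this sketch (file locations are an assumption; confirm before moving anything):

```
RULES.md          <- overview, key metrics, date rules, one-line table pointers
databases/
  orders.md       <- per-table detail moved out of RULES.md
  customers.md
docs/
  finance.md      <- per-domain notes, referenced from RULES.md
  product.md
```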
## Output (in conversation, not a file)
Lead with a one-paragraph summary: sync state | scope width (N tables vs the <100 ceiling and ≤20 target) | rules quality (N/6 sections substantive) | test coverage (N tests, X% passing).
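For example (numbers invented for illustration):

> Synced, all sources populated | 34 tables in scope (over the ≤20 target) | 4/6 RULES.md sections substantive | 18 tests, 72% passing.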
Then deep-dive only the sections with findings. Skip clean ones. Format hints:
- Synced / RULES.md / token bloat → bulleted gaps.
- Context coverage → table: `Table | RULES.md | dbt docs | Extra .md | Gap`.
- MECE → bullets.
- Test failures → table: `Test | Category | Proposed fix`.
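A filled-in failure row, for shape (test name and fix are hypothetical):

| Test | Category | Proposed fix |
|---|---|---|
| monthly_revenue_q3 | Metric definition | Pin the formula to fct_orders in ## Key Metrics Reference |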
End with a prioritized plan (easiest-win → biggest-work), each item naming the skill that does the work:
## Plan
1. (easy / 5 min) ... → write-context-rules
2. (small / 30 min) ... → create-context-tests
3. (medium / 1-2 hr) ... → audit-context (rerun after)
4. (large / multi-session) ... → add-semantic-layer
## Guardrails
- Apply one change at a time. Re-run tests between fixes.
- Tests are the source of truth. If the user says "it's working," ask for the latest pass rate first.
- Don't move or split files without confirmation. Show the file map first.
- Don't fix in this skill — diagnose only.
## More from getnao/nao
write-context-rules
Create or extend a nao project's RULES.md. Owns the RULES.md template. Use when the user wants to generate the initial RULES.md from synced metadata (called by setup-context), or improve their existing RULES.md. Do not use for first-time scope setup (use setup-context) or for diagnosing existing problems (use audit-context).
create-context-tests
Generate a test suite of natural-language → SQL pairs that becomes the quality benchmark for a nao agent, then run it via `nao test`. Use when the user wants to start measuring agent reliability, extend an existing test suite, or add tests for new metrics. Tests are the only honest answer to "is the context working?". Do not use for writing rules (write-context-rules) or diagnosing failures (audit-context).
setup-context
Bootstrap a nao agent for a project — gather warehouse + scope + extra-context info in one round, look up the warehouse-specific config from nao docs, write nao_config.yaml, run nao init + nao sync, set up the LLM key, and generate the first RULES.md. Use when the user has just decided to use nao on a new project. Only for first-time setup; for editing rules, generating tests, or reviewing an existing context, use write-context-rules / create-context-tests / audit-context.
add-semantic-layer
Wire a semantic layer into a nao agent so that metric queries are routed through a single source of truth. Supports dbt MetricFlow (dbt Cloud with Semantic Layer), Snowflake (views or semantic views via MCP), an in-house nao YAML semantic layer, or other tools (via MCP discovery). Installs the right MCP server, updates RULES.md to route metric queries through the semantic layer, and (for the nao YAML option) generates starter metric files. Use after a first round of tests has shown the agent struggling with metric reliability. Do not use for raw rule writing (write-context-rules) or first-time setup (setup-context).