LLM Challenge

Benchmark for @tailor-platform/sdk AI-friendliness. Located at llm-challenge/.

Read llm-challenge/README.md for commands, scoring, and verification details.

Core Rule

When the AI fails a challenge, improve the SDK (JSDoc, error messages, types, CLAUDE.md). NEVER add hints to problem.md.

ALWAYS build the SDK before running: pnpm -C packages/sdk build

Structure: problems/<id>-<name>/ with meta.json, problem.md, scaffold/, solution/, tests/

meta.json rules:

id: 3-digit zero-padded, sequential
scoring: Category defaults — tailordb: 20/20/60, resolver/executor/workflow: 15/15/70, config: 30/20/50, fix-broken: 15/15/70
Fix-broken problem: same file appears in both implement and scaffold
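
A minimal sketch of how the id and scoring rules above could be checked. The type names and helper functions are hypothetical and not part of the benchmark. The three weights are kept positional because the rules list them only as a/b/c without naming the components.

```typescript
// Hypothetical types for illustration; the real meta.json schema may differ.
type Category =
  | "tailordb"
  | "resolver"
  | "executor"
  | "workflow"
  | "config"
  | "fix-broken";
type Weights = readonly [number, number, number];

// Category defaults copied from the rules above.
const DEFAULT_SCORING: Record<Category, Weights> = {
  tailordb: [20, 20, 60],
  resolver: [15, 15, 70],
  executor: [15, 15, 70],
  workflow: [15, 15, 70],
  config: [30, 20, 50],
  "fix-broken": [15, 15, 70],
};

// An id is valid when it is exactly three digits, e.g. "007" or "013".
function isValidId(id: string): boolean {
  return /^\d{3}$/.test(id);
}

// Each weight triple should add up to the 100-point total.
function sumsTo100(weights: Weights): boolean {
  return weights[0] + weights[1] + weights[2] === 100;
}
```

Every default triple sums to 100, which matches the 100/100 target used when verifying a solution.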

problem.md rules:

Sections: Goal → Domain Context/Instructions → What to Build → Requirements → Reference
NEVER include SDK code examples — AI must discover API from the SDK package itself
Always end with "Refer to the installed SDK package for ..."
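
The problem.md rules above could be linted roughly as follows. This is a sketch under assumptions: lintProblemMd is a hypothetical helper, sections are assumed to be markdown headings, "Domain Context/Instructions" is read as either heading being acceptable, and any fenced block is treated as a possible SDK code example (a heuristic that may flag legitimate non-SDK snippets).

```typescript
// Expected section order; the second slot accepts either name (assumption).
const SECTION_ORDER: string[][] = [
  ["Goal"],
  ["Domain Context", "Instructions"],
  ["What to Build"],
  ["Requirements"],
  ["Reference"],
];
const CLOSING_PREFIX = "Refer to the installed SDK package for";
const FENCE = "`".repeat(3); // three backticks, built indirectly

function lintProblemMd(text: string): string[] {
  const issues: string[] = [];
  const lines = text.split(/\r?\n/);

  // Collect heading titles in document order.
  const headings = lines
    .map((line) => /^#{1,6}\s+(.*\S)\s*$/.exec(line))
    .filter((m): m is RegExpExecArray => m !== null)
    .map((m) => m[1]);

  // Each expected section must appear after the previous one.
  let cursor = 0;
  for (const names of SECTION_ORDER) {
    const idx = headings.findIndex((h, i) => i >= cursor && names.includes(h));
    if (idx === -1) {
      issues.push(`missing or out-of-order section: ${names.join(" / ")}`);
    } else {
      cursor = idx + 1;
    }
  }

  // Heuristic: flag any fenced block as a possible SDK code example.
  if (lines.some((line) => line.trimStart().startsWith(FENCE))) {
    issues.push("contains a fenced code block (possible SDK code example)");
  }

  // The last non-empty line should be the closing reference sentence.
  const last = [...lines].reverse().find((line) => line.trim() !== "") ?? "";
  if (!last.trim().startsWith(CLOSING_PREFIX)) {
    issues.push(`last line should start with "${CLOSING_PREFIX}"`);
  }
  return issues;
}
```

An empty result means the file passes these three checks; it says nothing about whether the prose itself leaks hints.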

Test rules:

Read existing tests in problems/*/tests/ for patterns
Helpers: shared/test-helpers.ts (createWorkDirContext, importPath, expectFieldType, etc.)
Mocks: shared/mocks.ts (setupTailordbMock, setupWorkflowMock)
ALWAYS use describe.skipIf(!workDirReady) guard

Adding a problem:

Use the next sequential ID (e.g., 013)
Write the solution first, then the tests
Verify: pnpm -C llm-challenge challenge --problem <id> --use-solution → must be 100/100
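
A small sketch of picking the next ID and building the verify command. Both function names are hypothetical, and starting at 001 when no problems exist is an assumption. The command string itself is taken from the Verify step above.

```typescript
// Derive the next id from existing directory names shaped "<id>-<name>".
// Names that do not match the pattern are ignored.
function nextProblemId(dirNames: string[]): string {
  const ids = dirNames
    .map((name) => /^(\d{3})-/.exec(name))
    .filter((m): m is RegExpExecArray => m !== null)
    .map((m) => Number(m[1]));
  // Assumption: numbering starts at 001 when there are no problems yet.
  const next = ids.length === 0 ? 1 : Math.max(...ids) + 1;
  return String(next).padStart(3, "0");
}

// Build the solution-verification command for a given id.
function verifyCommand(id: string): string {
  return `pnpm -C llm-challenge challenge --problem ${id} --use-solution`;
}
```

For example, given directories "001-a" and "012-b", nextProblemId returns "013", and verifyCommand("013") yields the command to run before the problem is considered ready.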

Improving the SDK from failures:

Run pnpm -C llm-challenge challenge:solve --retry 2 → analyze the failures
Improve the SDK source (NOT the problem descriptions)
Rebuild and re-verify: pnpm -C packages/sdk build → pnpm -C llm-challenge challenge:verify-solution
Re-run the benchmark to measure the improvement