# skillshare-cli-e2e-test

Run isolated E2E tests in the devcontainer. `$ARGUMENTS` specifies a runbook name or `new`.
## Flow
### Phase 0: Environment Check

1. Confirm the devcontainer is running and get the container ID:

   ```bash
   CONTAINER=$(docker compose -f .devcontainer/docker-compose.yml ps -q skillshare-devcontainer)
   ```

   - If empty → prompt the user:

     ```bash
     docker compose -f .devcontainer/docker-compose.yml up -d
     ```

   - Ensure `CONTAINER` is set for all subsequent `docker exec` calls.

2. Confirm the Linux binary is available:

   ```bash
   docker exec $CONTAINER bash -c \
     '/workspace/.devcontainer/ensure-skillshare-linux-binary.sh && ss version'
   ```

3. Confirm mdproof is installed:

   ```bash
   docker exec $CONTAINER /workspace/.devcontainer/ensure-mdproof.sh
   ```

   This auto-installs from the GitHub release, or falls back to `/workspace/bin/mdproof` (local dev binary).

4. Check for lessons learned from previous runs:

   ```bash
   test -f /workspace/.mdproof/lessons-learned.md && cat /workspace/.mdproof/lessons-learned.md
   ```

   If the file exists, read it before writing or debugging runbooks — it contains known gotchas and assertion patterns.
### Phase 1: Detect Scope

1. Preview all available runbooks via the container:

   ```bash
   docker exec $CONTAINER mdproof --dry-run --report json /workspace/ai_docs/tests/
   ```

   This returns JSON with every runbook's steps, commands, and expected assertions — no manual markdown parsing needed. Use it to understand what each runbook covers.

2. Identify recent changes (unstaged + recent commits):

   ```bash
   git diff --name-only HEAD~3
   ```

3. Match changes to relevant runbooks (compare changed file paths against step commands in the JSON output).
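To make the matching step concrete, here is a minimal sketch that compares changed file paths against step commands in the dry-run report. It assumes the directory-level report is an array of runbook objects shaped like the single-runbook report shown in Phase 3 (`runbook` plus `steps[].step.command`); verify the real shape against actual `mdproof --dry-run` output before relying on it.

```shell
# Hypothetical dry-run report (array of runbooks) and a changed-file list:
report='[{"runbook":"extras_runbook.md","steps":[{"step":{"command":"ss extras init"}}]},
         {"runbook":"sync_runbook.md","steps":[{"step":{"command":"ss sync"}}]}]'
changed="internal/extras/init.go
cmd/skillshare/sync.go"

for f in $changed; do
  key=$(basename "$f" .go)   # crude heuristic: use the file stem as a keyword
  echo "$report" | jq -r --arg k "$key" \
    '.[] | select([.steps[].step.command] | any(contains($k))) | .runbook' \
    | sed "s|^|$f -> |"
done
```

With the sample data this prints one `changed-file -> runbook` line per match; a real implementation would likely need a smarter mapping than file-stem substring matching.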
### Phase 2: Select Tests

Prompt the user (via AskUserQuestion):

- Option A: Run an existing runbook (list all available and mark those related to recent changes)
- Option B: Auto-generate a new test script based on recent changes
- Option C: If `$ARGUMENTS` specifies a runbook, skip to Phase 3
### Phase 3: Prepare & Execute

**Running an existing runbook:**

1. Create an isolated environment with auto-initialization:

   ```bash
   ENV_NAME="e2e-$(date +%Y%m%d-%H%M%S)"
   # Use --init to automatically run 'ss init -g' with all targets
   docker exec $CONTAINER ssenv create "$ENV_NAME" --init
   ```

2. Execute the entire runbook via mdproof inside the container:

   ```bash
   docker exec $CONTAINER env SKILLSHARE_DEV_ALLOW_WORKSPACE_PROJECT=1 \
     ssenv enter "$ENV_NAME" -- \
     mdproof --report json \
     /workspace/ai_docs/tests/<runbook_file>.md
   ```

   mdproof executes each step (`bash -c <command>`) in the ssenv-isolated HOME, then returns structured JSON:

   ```json
   {
     "version": "1",
     "runbook": "<runbook_file>.md",
     "duration_ms": 12345,
     "summary": { "total": 7, "passed": 5, "failed": 1, "skipped": 1 },
     "steps": [
       {
         "step": { "number": 1, "title": "...", "command": "...", "expected": ["..."] },
         "status": "passed",
         "exit_code": 0,
         "stdout": "...",
         "stderr": "..."
       }
     ]
   }
   ```

   (`status` is one of `"passed"`, `"failed"`, `"skipped"`.)

3. Analyze the JSON output:

   - All passed → proceed to Phase 4.
   - Any failed → filter for failures only (the full JSON can be too large for terminal output):

     ```bash
     mdproof --report json runbook.md 2>&1 | jq '{
       summary: .summary,
       failed: [.steps[] | select(.status == "failed") | {
         step: .step.number,
         title: .step.title,
         exit_code: .exit_code,
         failed_assertions: [.assertions[]? | select(.matched == false) | .pattern],
         stderr: (.stderr // "" | .[0:200])
       }]
     }'
     ```

   - Skipped steps (executor=`manual`) → these need manual verification; run them individually:

     ```bash
     docker exec $CONTAINER env SKILLSHARE_DEV_ALLOW_WORKSPACE_PROJECT=1 \
       ssenv enter "$ENV_NAME" -- <command from step.command>
     ```

4. For failed steps, debug individually using manual `docker exec`:

   ```bash
   docker exec $CONTAINER env SKILLSHARE_DEV_ALLOW_WORKSPACE_PROJECT=1 \
     ssenv enter "$ENV_NAME" -- bash -c '<failed step command>'
   ```

   - Prefer `--json` + `jq` for assertions — see the `--json` Quick Reference below.
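The failed-step commands can also be pulled straight out of the report before re-running them manually. A minimal sketch using an inline sample report (same shape as the mdproof JSON documented above; a real run would read the saved report file instead):

```shell
# Extract the commands of failed steps from an mdproof report.
report='{"summary":{"total":2,"passed":1,"failed":1},
 "steps":[
  {"step":{"number":1,"command":"ss status"},"status":"passed"},
  {"step":{"number":2,"command":"ss sync"},"status":"failed"}]}'

echo "$report" | jq -r '.steps[] | select(.status == "failed") | .step.command'
# -> ss sync
```

Each extracted command can then be fed into the manual `docker exec` debugging template.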
**Generating a new runbook:**

- Read `git diff HEAD~3` to find changed files in `cmd/skillshare/` or `internal/`.
- Read the changed files to understand new/modified functionality.
- Validate all CLI flags before writing — for every `ss <command> <flag>` in the runbook:
  - Grep `cmd/skillshare/<command>.go` for the exact flag string (e.g. `"--force"`).
  - Run `ss <command> --help` inside the container if needed.
  - Common mistakes to avoid:
    - `uninstall --yes` → wrong, use `--force`/`-f`
    - `init --target <name>` → wrong, `init` has no `--target` flag
    - `init -p` has a completely separate flag set from global `init` — it only supports `--targets`, `--discover`, `--select`, `--mode`, `--dry-run`. Global-only flags like `--no-copy`, `--no-skill`, `--no-git`, `--all-targets`, `--force` do NOT exist in project mode.
    - Audit custom rules: disable by rule ID (e.g. `prompt-injection-0`, `prompt-injection-1`), NOT pattern name (e.g. `prompt-injection`). Rule IDs are in `internal/audit/rules.yaml`.
- Generate the new runbook to `ai_docs/tests/<slug>_runbook.md`, following existing conventions:
  - YAML-free, pure Markdown
  - Has Scope, Environment, Steps (each with bash + Expected), and Pass Criteria
  - Use `jq:` assertions in Expected blocks for JSON commands — e.g. `- jq: .extras | length == 1`. This is a native mdproof assertion type, NOT a bash `jq` pipe.
  - Use `--json` + `jq -e` in bash for inline verification within multi-command steps.
  - Config idempotency — never bare `cat >> config.yaml`; always prepend `sed -i '/^section:/,$d'` to remove the existing section first, or use CLI commands (`ss extras init`, `ss extras remove --force`) that handle duplicates.
  - Check `ai_docs/tests/runbook.json` for project-level config (build, setup, teardown, step_setup, timeout) that affects all runbooks.
  - Check `.mdproof/lessons-learned.md` for known assertion patterns and gotchas.
- Run the runbook quality checklist (see below) before executing.
- Then execute the new runbook (same flow as above).
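Put together, a generated step typically looks like this hypothetical fragment. The `ss extras list` command, its `--json` support, and the JSON field names are illustrative assumptions; verify every command and flag against source per the checklist before using them.

````markdown
## Step 3: Verify extras after init

```bash
ss extras list --json
```

Expected:

- exit_code: 0
- jq: .extras | length == 1
- jq: .extras[0].name == "rules"
````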
### Phase 4: Cleanup & Report

1. Ask the user before cleanup (via AskUserQuestion):

   - Option A: Delete the ssenv environment now
   - Option B: Keep it for manual debugging (print the env name for a later `ssenv delete`)

2. If the user chose Option A:

   ```bash
   docker exec $CONTAINER ssenv delete "$ENV_NAME" --force
   ```

3. Output a summary (derived from the runbook JSON output):

   ```text
   ── E2E Test Report ──
   Runbook:  {runbook name}
   Env:      {ENV_NAME}
   Duration: {duration_ms}ms

   Step 1: {title}  PASS
   Step 2: {title}  PASS
   Step 3: {title}  FAIL ← exit_code={N}, stderr: {error detail}
   ...

   Result: {passed}/{total} passed ({skipped} skipped)
   ```

   All values come directly from mdproof's JSON output — `summary.passed`, `summary.total`, `steps[].step.title`, `steps[].status`.

4. If any FAIL → distinguish between a runbook bug and a real bug:

   - Runbook bug: wrong flag, wrong file path, stale assertion → fix the runbook, re-run the step
   - Real bug: CLI misbehavior → analyze the cause, provide fix suggestions

5. Retrospective — ask the user (via AskUserQuestion): "Did you encounter any friction during this test run that the skill or runbook could handle better?"

   - Option A: Yes, improve the e2e skill — review test friction (wrong flags, stale assertions, missing checklist items, unclear instructions), then update SKILL.md and/or runbooks
   - Option B: Yes, but only fix the runbook — fix the specific runbook without changing the skill itself
   - Option C: No, skip

   Improvement targets:

   - SKILL.md: add new checklist items, common-mistake examples, or rule clarifications learned from this run
   - Runbooks: fix stale assertions (e.g. config.yaml → registry.yaml), wrong flags, outdated paths
   - Both: when a systemic issue (e.g. a refactor changed file locations) affects both the skill's guidance and existing runbooks
## Runbook Quality Checklist

Before executing a newly generated runbook, verify:

- **All CLI flags exist** — every `ss <cmd> --flag` was grep-verified against source.
- **`--init` interaction** — if the runbook has `ss init`, account for `ssenv create --init` already initializing (add `--force` to re-init, or skip the init step).
- **`--init` creates default extras** — `ssenv create --init` creates a `rules` extra by default. Runbooks that assume an empty extras list must add cleanup first: `ss extras remove rules --force -g 2>/dev/null || true` + `rm -rf ~/.claude/rules`.
- **Correct confirmation flags** — `uninstall` uses `--force` (not `--yes`); an `init` re-run needs no flag (it just fails gracefully).
- **Skill data lives in registry.yaml** — assertions about installed skills check `registry.yaml`, NOT `config.yaml`; config.yaml should never contain `skills:`.
- **File existence timing** — `registry.yaml` is only created after the first install/reconcile, not on `ss init`.
- **Project mode paths** — project commands use `.skillshare/`, not `~/.config/skillshare/`.
- **Project init flags** — `init -p` only supports `--targets`, `--discover`, `--select`, `--mode`, `--dry-run`; global-only flags (`--no-copy`, `--no-skill`, `--no-git`, `--all-targets`, `--force`) are not available.
- **Audit rule IDs** — custom rules in `audit-rules.yaml` use rule IDs (e.g. `prompt-injection-0`), not pattern names (e.g. `prompt-injection`). Verify IDs against `internal/audit/rules.yaml`.
- **Use `--json` for assertions** — if the command supports `--json`, use it with `jq` instead of grepping human-readable output. Text output changes between versions; JSON structure is stable.
- **Expected = actual substrings, NOT descriptions** — the runbook assertion engine does case-insensitive substring matching. Write `- Installed` or `- cangjie-docs-navigator`, NOT `- Install completes without error` or `- Output contains at least one skill`. Negation: use the `Not <substring>` prefix (e.g. `- Not cangjie-docs-navigator`).
- **Skill name ≠ repo name** — after `ss install <repo>`, the actual skill name may differ from the repo name (e.g. repo `cangjie-docs-mcp` → skill `cangjie-docs-navigator`). Always verify the installed skill name via `ss list` before writing uninstall/check steps.
- **`/tmp/` cleanup** — ssenv only isolates `$HOME`; `/tmp/` is shared across runs. Any step using `/tmp/<path>` must start with `rm -rf /tmp/<path>` to avoid stale state from previous runs.
- **`echo > symlink` writes through** — `echo "content" > path` where `path` is a symlink writes to the symlink's target; it does NOT replace the symlink with a real file. To create a local (non-managed) file at a symlinked path, either use a different filename or `rm` the symlink first, then `echo`.
- **`cat >>` is not idempotent** — appending to config files (`cat >> config.yaml`) duplicates sections on re-run. Prefer `ss extras init` (which validates duplicates) or full file replacement over `cat >>` when possible.
- **Extras source path layout** — extras use `~/.config/skillshare/extras/<name>/` (not the legacy flat path `~/.config/skillshare/<name>/`). Symlink assertions must include `extras/` in the path regex (e.g. `regex: skillshare/extras/rules/tdd\.md`).
- **Prefer `jq:` over `python3 -c`** — for JSON output validation, use mdproof's native `jq:` assertion type (e.g. `- jq: .extras | length == 1`) instead of piping to `python3 -c`. It's one line vs ten, and mdproof handles failure reporting automatically.
- **Config append idempotency** — when appending YAML sections with `cat >>`, always prepend `sed -i '/^section_key:/,$d'` to remove the existing section, or prefer CLI commands (`ss extras init`, `ss extras remove --force`) over manual config editing.
- **Check lessons-learned** — read `.mdproof/lessons-learned.md` before writing new runbooks for known gotchas and proven assertion patterns.
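The `cat >>` and `sed` checklist items combine into one reusable pattern. A self-contained sketch against a temp file (the section name and contents are illustrative, not a real skillshare config):

```shell
# Idempotent YAML-section append: delete any existing `extras:` section
# (from its header line to end of file) before re-appending it.
cfg=$(mktemp)
printf 'version: 1\nextras:\n  - rules\n' > "$cfg"

append_extras() {
  sed -i '/^extras:/,$d' "$cfg"            # drop the existing section
  printf 'extras:\n  - rules\n' >> "$cfg"  # re-append a fresh copy
}

append_extras
append_extras   # a second run must not duplicate the section
grep -c '^extras:' "$cfg"   # -> 1
```

A bare `cat >>` at the same spot would leave two `extras:` sections after the second run; the `sed` range delete makes the step safe to repeat.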
## Runbook Assertion Types

mdproof supports 6 assertion types under `Expected:` blocks. Use the most specific type for each check:

| Type | Syntax | When to use | Example |
|---|---|---|---|
| Substring | plain text | Simple output check | `- hello world` |
| Negated | `Not`/`Should NOT` prefix | Verify absence | `- Not FAIL` |
| Exit code | `exit_code: N` | Every step should have this | `- exit_code: 0` |
| Regex | `regex:` prefix | Pattern matching | `- regex: v\d+\.\d+` |
| jq | `jq:` prefix | JSON output (preferred) | `- jq: .extras \| length == 1` |
| Snapshot | `snapshot:` prefix | Stable output comparison | `- snapshot: api-response` |
`jq:` best practices:

```text
# Simple field check
- jq: .name == "rules"

# Array length
- jq: .extras | length == 3

# Sorted array comparison
- jq: [.extras[].name] | sort | . == ["a","b","c"]

# Null/missing field (omitempty)
- jq: .extras == null

# Nested access
- jq: .[0].targets[0].status == "synced"

# Boolean
- jq: .source_exists == true
```
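Since `jq:` assertion bodies are ordinary jq programs, each one can be sanity-checked with `jq -e` against a sample payload before it goes into a runbook. The payload below is illustrative:

```shell
payload='{"name":"rules","extras":[{"name":"a"},{"name":"b"},{"name":"c"}],"source_exists":true}'

# Each expression prints `true` and exits 0 when the assertion would pass:
echo "$payload" | jq -e '.name == "rules"'
echo "$payload" | jq -e '.extras | length == 3'
echo "$payload" | jq -e '[.extras[].name] | sort | . == ["a","b","c"]'
echo "$payload" | jq -e '.source_exists == true'
echo "$payload" | jq -e '.missing == null'   # omitempty-style check: also true
```

A failing expression would print `false` and exit 1, which is exactly the signal mdproof uses to mark the assertion as failed.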
## Rules

- Always execute inside the devcontainer — use `docker exec`, never run the CLI on the host.
- Always use `ssenv` for HOME isolation — don't pollute the container's default HOME.
- Always create fresh ssenv environments — never reuse an environment from a previous run; stale config/state causes confusing cascade failures (e.g. duplicate YAML keys, "already exists" errors).
- ssenv only isolates `$HOME` — `/tmp/`, `/var/`, and other system paths are shared across all environments. Runbook steps using `/tmp/` must include `rm -rf` cleanup at the start.
- Verify every step — never skip Expected checks.
- Don't abort on failure — record FAIL, continue to the next step, summarize at the end.
- Ask before cleanup — Phase 4 must prompt the user before deleting the ssenv environment.
- `ss` = `skillshare` — same binary in runbooks.
- `~` = ssenv-isolated HOME — `ssenv enter` auto-sets `HOME`.
- Use `--init` — simplify setup with `ssenv create <name> --init`.
- `--init` already runs init — the env is pre-initialized; runbook steps calling `ss init` again will fail unless the step explicitly resets state first.
## ssenv Quick Reference

| Command | Purpose |
|---|---|
| `sshelp` | Show shortcuts and usage |
| `ssls` | List isolated environments |
| `ssnew <name>` | Create + enter isolated shell (interactive) |
| `ssuse <name>` | Enter existing isolated shell (interactive) |
| `ssback` | Leave isolated context |
| `ssenv enter <name> -- <cmd>` | Run a single command in isolation (automation) |

- For interactive debugging: `ssnew <env>`, then `exit` when done.
- For deterministic automation: prefer `ssenv enter <env> -- <command>` one-liners.
## Test Command Policy

When running Go tests inside the devcontainer (not via a runbook):

```bash
# ssenv changes HOME, so always cd to /workspace first for Go test commands
cd /workspace
go build -o bin/skillshare ./cmd/skillshare
SKILLSHARE_TEST_BINARY="$PWD/bin/skillshare" go test ./tests/integration -count=1
go test ./...
```

Always run in the devcontainer unless there is a documented exception.

Note: `ssenv enter` changes HOME, which may affect Go module resolution — always `cd /workspace` before running `go test` or `go build`.
## --json Quick Reference

Most commands support `--json` for structured output, making assertions more reliable than text matching.

| Command | --json | Notes |
|---|---|---|
| `ss status` | `--json` | Skills, targets, sync status |
| `ss list` | `--json` / `-j` | All skills with metadata |
| `ss target list` | `--json` | Configured targets |
| `ss install <src>` | `--json` | Implies `--force --all` (skip prompts) |
| `ss uninstall <name>` | `--json` | Implies `--force` (skip prompts) |
| `ss collect <path>` | `--json` | Implies `--force` (skip prompts) |
| `ss check` | `--json` | Update availability per repo |
| `ss update` | `--json` | Update results per skill |
| `ss diff` | `--json` | Per-file diff details |
| `ss sync` | `--json` | Sync stats per target |
| `ss audit` | `--format json` | Also accepts `--json` (deprecated alias) |
| `ss log` | `--json` | Raw JSONL (one object per line) |

Key behaviors:

- `--json` that implies `--force`/`--all` skips interactive prompts — safe for automation.
- Output goes to stdout only (progress/spinners suppressed).
- `audit` prefers `--format json`; `--json` still works but is the deprecated form.
- `log --json` outputs JSONL (newline-delimited), not a JSON array.
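The JSONL point matters for jq usage: newline-delimited objects are not a JSON array, so array-oriented filters need jq's slurp mode (`-s`), while per-object filters work line by line. A sketch with sample lines standing in for `ss log --json` output (the field names are illustrative):

```shell
log='{"event":"install","skill":"a"}
{"event":"sync","skill":"a"}'

echo "$log" | jq -s 'length'                            # slurp into an array -> 2
echo "$log" | jq -r 'select(.event == "sync") | .skill' # per-line filter     -> a
```

Running `jq 'length'` without `-s` would instead report the key count of each object separately, which is rarely what an assertion wants.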
## Assertion Patterns with jq

```bash
# Count installed skills
ss list --json | jq 'length'

# Check a specific skill exists
ss list --json | jq -e '.[] | select(.name == "my-skill")'

# Verify target is configured
ss target list --json | jq -e '.[] | select(.name == "claude")'

# Assert no critical audit findings
ss audit --format json | jq -e '.summary.critical == 0'

# Check update availability
ss check --json | jq -e '.tracked_repos | length > 0'

# Verify sync succeeded (zero errors)
ss sync --json | jq -e '.errors == 0'

# Install and verify result
ss install https://github.com/user/repo --json | jq -e '.skills | length > 0'
```
When a `jq -e` expression fails (exit code 1 = a false or null result, 4 = no output produced), the step FAILs — no ambiguous text matching needed.
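jq's documented `-e` exit codes (0 for a truthy result, 1 for false/null, 4 when no output is produced) are easy to verify with plain jq on inline JSON, no CLI involved:

```shell
rc0=0; echo '{"errors":0}' | jq -e '.errors == 0' >/dev/null || rc0=$?          # true      -> 0
rc1=0; echo '{"errors":2}' | jq -e '.errors == 0' >/dev/null || rc1=$?          # false     -> 1
rc4=0; echo '[]' | jq -e '.[] | select(.name == "x")' >/dev/null || rc4=$?      # no output -> 4
echo "rc0=$rc0 rc1=$rc1 rc4=$rc4"
```

The third case is why `select()`-style existence checks work well with `-e`: a missing item yields no output at all, which still fails the step.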
## Container Command Templates

```bash
# Single command
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- ss status

# JSON assertion (preferred for verification)
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- bash -c '
  ss list --json | jq -e ".[] | select(.name == \"my-skill\")"
'

# Multi-line compound command (use bash -c) — global mode flags
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- bash -c '
  ss init --no-copy --all-targets --no-git --no-skill
  ss status
'

# Project mode init (different flag set!)
docker exec $CONTAINER env SKILLSHARE_DEV_ALLOW_WORKSPACE_PROJECT=1 \
  ssenv enter "$ENV_NAME" -- bash -c '
  cd /tmp/test-project && ss init -p --targets claude
'

# Check files (HOME is set to the isolated path by ssenv)
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- bash -c '
  cat ~/.config/skillshare/config.yaml
'

# With environment variables
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- bash -c '
  TARGET=~/.claude/skills
  ls -la "$TARGET"
'

# Go tests (must cd /workspace because ssenv changes HOME)
docker exec $CONTAINER ssenv enter "$ENV_NAME" -- bash -c '
  cd /workspace
  go test ./internal/install -run TestParseSource -count=1
'
```
## Relationship with /mdproof Skill

This skill (`/cli-e2e-test`) and the `/mdproof` skill are complementary, not competing:

| Concern | /cli-e2e-test | /mdproof |
|---|---|---|
| Scope | Skillshare project-specific E2E | General-purpose runbook authoring |
| Infrastructure | Devcontainer, ssenv, binary build | None — format and assertions only |
| Config | `ai_docs/tests/runbook.json` (build, setup, teardown) | Assertion types, snapshot, coverage |
| Lessons | Checklist items, CLI flag gotchas | `.mdproof/lessons-learned.md` |
| When | Running or debugging a test | Writing or improving a runbook |

### How they work together

- Writing a new runbook → invoke `/mdproof` first for format guidance (assertion types, `jq:` patterns, snapshot usage), then `/cli-e2e-test` to execute it in isolation.
- Improving existing runbooks → invoke `/mdproof` for an assertion quality review (python3 → `jq:`, idempotency), then `/cli-e2e-test` to verify the changes pass.
- Debugging failures → `/cli-e2e-test` Phase 3 step 4 handles manual `docker exec`; `/mdproof` lessons-learned captures recurring patterns.
- After a test run → the `/mdproof` Self-Learning section guides recording discoveries to `.mdproof/lessons-learned.md`.

### Rule of thumb

- Need to run tests or debug in the devcontainer? → `/cli-e2e-test`
- Need to write assertions or improve runbook quality? → `/mdproof`
- User says "run extras E2E" → `/cli-e2e-test`
- User says "improve runbook assertions" → `/mdproof`, then `/cli-e2e-test` to verify