qa-tui
/qa-tui — Systematic TUI QA via shellwright
Drive a TUI program inside a shellwright-managed PTY, collect evidence (raw bytes + rendered screenshots + recordings), grade issues, and produce a report. Fixing is out of scope — this skill reports. If the user wants to fix bugs, hand the report to a session that has access to the TUI's source.
Why this exists
A TUI is a program that draws its UI with escape sequences on a PTY. That makes it harder to test than a web or desktop app for three reasons:
- There's no DOM. The "UI" is a stream of bytes interpreted by a terminal emulator. Two emulators can render the same bytes differently — or correctly bytes that your TUI emits incorrectly.
- Input is keyboard-only, mostly. Every action is a keystroke, and
many TUIs use modal state (vim-style) or chord keys (tmux prefix, Emacs
C-x C-c). The skill has to know how to drive these. - The terminal's state is shared. If the TUI crashes halfway through a raw-mode switch, it leaves the parent shell unable to echo characters correctly. Cleanup matters more than in a sandboxed GUI app.
Shellwright solves the driving problem by giving us a managed PTY with
shell_send (send keystrokes) + shell_read (read the byte stream) +
shell_screenshot (render the current terminal state to an image) +
shell_record_start/stop (capture a session replay). This skill codifies
the systematic path that catches the most common classes of TUI bugs.
Scope and non-goals
In scope: Black-box testing of a TUI program driven through a shellwright PTY on the current host — sending keystrokes, reading the rendered buffer, capturing screenshots, noting what breaks under resize / non-standard TERM / NO_COLOR / narrow widths.
Out of scope:
- Cross-terminal testing on a single host — the skill tests the emulator shellwright is running inside. If the user needs Kitty + iTerm2 + Apple Terminal coverage, run the skill in each (the taxonomy has a "terminal compat" category for what you find).
- Non-TUI CLI tools (
grep,curl,ls) — use/qapatterns for those, or spot-check manually. The skill assumes the program draws an interactive UI, not just prints output and exits. - Source-code reading or editing (delegate fixes elsewhere)
- Performance profiling (use
flamegraph,samply,perf— out of scope) - Security review of a TUI's config-parsing attack surface
Required context before starting
If any of these are unknown, use AskUserQuestion to collect them:
- Launch command — the exact command that starts the TUI, including any
args/flags. E.g.
nvim,lazygit,htop -d 10,k9s --context prod,./target/release/my-tui --config test.toml. - TUI kind — modal (vim-style) / non-modal / chord-based (tmux / Emacs) / dialog-style (one screen per choice). This affects how Phase 3 drives the app.
- Quit key — how do users exit?
:q,q,Ctrl+C,Esc,Ctrl+X Ctrl+C. Knowing this up front prevents leaving orphan processes. - Scope — "whole app" vs "only the file browser" vs "only the keybind set for ". A scoped run is cheaper.
- Terminal context —
echo $TERM,echo $COLORTERM, emulator name (iTerm2, Kitty, etc.), PTY dimensions (tput cols,tput lines). These go in the report. - Reset command — if the TUI is known to leave the terminal in a bad
state on crash (some do), note the user's recovery command — usually
stty saneorreset.
Do not guess. Launching the wrong command or not knowing the quit key causes the skill to get stuck or kill the wrong process.
Phase 0: Start PTY + baseline
Goal: clean PTY, TUI launched, baseline screenshot + byte log captured, recording started for the full session.
-
Record the pre-launch context (for the report's environment section):
echo "TERM=$TERM" echo "COLORTERM=$COLORTERM" tput cols; tput lines uname -srm -
Start the shellwright session:
mcp__shellwright__shell_startNote the session ID — all subsequent commands target it.
-
Start recording the full session (for the report's appendix):
mcp__shellwright__shell_record_startSave path goes in
/tmp/qa-tui-session/recording.cast(asciicast) or similar — whatever shellwright returns. Capture it. -
Launch the TUI by sending the launch command + Enter:
mcp__shellwright__shell_send({ keys: "<launch-command>\r" })Then wait ~1 second for the TUI to initialize its first frame. Some heavier TUIs (k9s, lazygit with large repos) take 2–5 seconds.
-
Baseline screenshot + byte read:
mcp__shellwright__shell_screenshot → /tmp/qa-tui-session/00-baseline.png mcp__shellwright__shell_read → /tmp/qa-tui-session/00-baseline.txtThe screenshot is what the user sees. The byte read is what the TUI emitted. Both are useful — keep both per surface.
-
Confirm the TUI is interactive: send a no-op key the TUI should swallow (e.g.,
hfor help in many TUIs) and verify the buffer changed with anothershell_read. If nothing changes, the launch may have failed silently — check the buffer for a shell prompt or error.
If the launch fails, STOP and report. Common causes: binary not in PATH, required env var missing, TUI requires a minimum terminal size the PTY doesn't meet (typical: 80×24).
Phase 1: Surface mapping
Goal: inventory every screen / pane / mode before testing any of them.
A TUI's surfaces:
- Main screen — the entrypoint view
- Modes — vim's Normal/Insert/Visual/Command; nvim's terminal mode; tmux's copy mode; lazygit's commit/rebase/stash views
- Panes / splits — if the TUI has a multi-pane layout (lazygit's 5 panes, k9s's namespace/pod panes), each pane is its own surface
- Popups / dialogs — help overlay, confirmation prompts, input boxes
- Modals — search prompt (
/in many TUIs), fuzzy picker (fzf-style)
Walk them by:
- Open the help screen if one exists (usually
?orhorF1). Screenshot it — this is the TUI's own documentation of what the user can do. It also doubles as the checklist for Phase 3 (exercise every keybind listed). - For each mode / pane listed, switch to it (sending the appropriate keys), screenshot, byte-read, and note any visible elements.
- Note the exit path from each — some popups dismiss with Esc, some with
q, some only with a specific confirmation. If a popup has no documented dismiss key, that's a finding.
Budget: 2–5 minutes. If the help screen lists >50 keybinds, ask the user which feature sets matter for this run.
Phase 2: Visual + byte scan
For each surface from Phase 1:
Visual scan
- Screenshot → eyeball for: flicker between screenshots taken back-to-back, garbled wide chars (CJK, emoji), cursor in the wrong spot relative to the last cursor-positioning escape, color leaks across pane boundaries, truncation without ellipsis
- Compare against
references/terminal-conventions.md— minimum 80×24 support? Graceful degradation below that? Consistent color semantics (red for error, green for success, yellow for warning)?
Byte scan
shell_read returns the raw stream the TUI emitted. Look for:
- Unhandled escape sequences leaking to the screen: literal
^[[31mwhere\x1b[31mshould have gone — indicates the TUI emitted an escape with a control-char that the emulator didn't recognize - Trailing / leading junk — mouse-reporting garbage (
^[[M,^[[<0;...) when the TUI doesn't support mouse but also doesn't disable mouse reporting - Flickering patterns — a full clear-and-redraw (
\x1b[2J) on every keystroke is usually a bug; incremental draws (\x1b[H+ deltas) are correct - TERM-specific assumptions — a program that hardcodes truecolor
escapes (
\x1b[38;2;r;g;bm) breaks in 256-color terminals
Phase 3: Keybind coverage
Enumerate every keybind from the help screen (or the TUI's man page /
README) and fire each:
mcp__shellwright__shell_send({ keys: "<key-sequence>" })
mcp__shellwright__shell_screenshot → <Nth>-<action>.png
mcp__shellwright__shell_read → check for error output
For each keybind, verify:
- The expected action happens (screen changes in the documented way)
- The screen returns to a clean state after (no residual overlays, cursor in a sensible spot)
- No unexpected escape garbage in the byte stream
- The keybind's reverse works — e.g., if
jmoves down,kmoves up the same distance
Special keys to always test:
- Ctrl+C — cancels current operation, or quits if at top level. Critical: confirm the terminal returns to a usable state after.
- Ctrl+Z — suspends (sends SIGTSTP). After
fg, does the TUI redraw cleanly? Many don't — they leak the pre-suspend screen. - Ctrl+L — redraw (many TUIs support this). After, is the screen pixel-identical to pre-Ctrl+L? If not, note which chars differ.
- Ctrl+D at an empty input — EOF. Some TUIs exit; some ignore. Document the observed behavior.
- Esc — exit mode / dismiss popup. Verify double-Esc is safe.
- Tab, Shift+Tab — focus cycling in multi-pane TUIs.
- Arrow keys + Home/End/PgUp/PgDn — cursor navigation.
- The documented quit key — must return cleanly to a working shell
prompt with line discipline intact (type
echo hiand confirm it echoes normally).
Key-sequence syntax
Shellwright's shell_send takes a keys string. Typical conventions:
- Literal chars:
"abc" - Enter:
"\r"(or sometimes"\n"— check shellwright behavior on first use) - Control keys:
"\u0003"for Ctrl+C,"\u001b"for Esc,"\u0009"for Tab - Escape sequences (arrow keys):
"\u001b[A"Up,"\u001b[B"Down,"\u001b[C"Right,"\u001b[D"Left
See references/shellwright-tui-reference.md for the full key-code table
and quirk list.
Phase 4: Terminal states
Force the conditions the terminal emulator / shell controls:
| State | How to force | What to check |
|---|---|---|
| Narrow width | Resize PTY to 60 cols (if shellwright supports) or launch with COLUMNS=60 stty cols 60 in the session |
Layout adapts, no overflow, minimum-size message if below floor |
| Tall / wide | Start a large PTY (120×40) | No wasted gaps, tables expand to fill |
| Resize mid-session | Change PTY size after launch | TUI handles SIGWINCH — redraws cleanly, no lingering ghost content |
TERM=dumb |
TERM=dumb <launch-command> |
Program degrades gracefully or prints a readable error — doesn't crash |
TERM=xterm vs xterm-256color |
Set TERM to each, relaunch | Colors adapt; 16-color fallback works if 256 isn't available |
NO_COLOR=1 |
NO_COLOR=1 <launch-command> |
All color disabled, output still legible (spec: https://no-color.org/) |
LANG=C / ASCII-only |
LANG=C <launch-command> |
Box-drawing chars fall back to ASCII, no mojibake |
| Piped stdin | echo foo | <launch-command> |
TUI refuses politely ("not a TTY") or adapts — doesn't segfault |
| Background / SIGSTOP | Send Ctrl+Z, then fg |
Full redraw on resume |
| Long idle | Wait 5 minutes, then send any key | No memory growth, no reconnection delay (for network TUIs), no screen corruption |
At minimum: narrow width + resize mid-session + NO_COLOR + TERM=dumb.
These catch the vast majority of terminal-compat bugs.
Phase 5: Accessibility
- Screen reader compatibility — macOS VoiceOver follows the system cursor; does the TUI keep a meaningful cursor position that VoiceOver can announce? (TUIs that repaint the whole screen often lose the cursor in the upper-left.)
- High contrast — on some terminal emulators, "bold" renders brighter rather than heavier. Does the TUI still distinguish elements when bold is interpreted as color shift?
- NO_COLOR — already covered in Phase 4; if colors were load-bearing (e.g., red = error, green = success) and NO_COLOR disabled them, is there a fallback (icons, text labels)?
- Slow terminals / remote SSH — TUIs that full-redraw on every
keystroke are unusable on high-latency links. Send a sequence of 20
keys rapidly with
shell_send, thenshell_read. Is the output coherent or does it show redraw artifacts?
See references/issue-taxonomy-tui.md for the full category list.
Phase 6: Crash + cleanup behavior
Deliberately stress the exit paths:
- Graceful quit — the documented quit key. Does it return to a clean
shell prompt? Type
echo okat the prompt and confirm it echoes normally. If characters don't echo, raw mode was left on — critical. - SIGINT during work — send Ctrl+C while the TUI is in an active loop (mid-animation, mid-search). Does it quit cleanly? Leave raw mode on? Print a partial output?
- SIGTERM — from outside:
kill <pid>. Same questions. - Force kill —
kill -9 <pid>. The TUI can't catch SIGKILL, so this simulates a crash. Afterward, usestty -aor try typing into the shell. If line discipline is broken, the program should have usedatexitor equivalent to restore — and didn't. Note as critical.
After each, if the terminal is in a bad state, restore with stty sane or
reset and note what was needed. (The user may need to do the same after
a real crash.)
Phase 7: Triage
Take every issue collected across phases and classify:
- severity: critical / high / medium / low (definitions in
references/issue-taxonomy-tui.md) - category: one of
rendering/keybind/terminal-compat/resize/signal-exit/state-config/unicode/performance/accessibility/crash - repro: minimum key sequence to reproduce on a fresh launch, starting from the shell prompt (include launch command + keys)
- evidence: screenshot + byte snippet + recording timestamp if any
De-duplicate aggressively. If the same rendering glitch appears in every pane because a shared component misrenders wide chars, that's one issue with a list of affected panes — not five.
Phase 8: Report
Use templates/qa-report-template-tui.md as the skeleton. Fill in:
- TUI metadata (name, version if
<cmd> --versionworks, language/runtime if known) - Terminal environment (emulator,
$TERM,$COLORTERM, PTY size) - Health score per category (see the template)
- Top 3 things to fix, with issue IDs linking below
- Full issue list, severity-grouped
Save the report to ./qa-reports/tui-<date>-<app>.md relative to the
user's current directory. Save all screenshots, byte dumps, and the
asciicast recording to ./qa-reports/tui-<date>-<app>/.
Don't embed full byte dumps in the report — they can contain long escape sequences and blow up the file. Save them alongside and link. Only embed the specific snippet relevant to an issue.
Phase 9: Session cleanup
Always, even if the report isn't finished:
-
Stop recording:
mcp__shellwright__shell_record_stop -
Quit the TUI via its documented quit key (safer than killing the session from outside):
mcp__shellwright__shell_send({ keys: "<quit-key>" }) -
If the TUI didn't quit cleanly, send Ctrl+C, then
exitto the shell:mcp__shellwright__shell_send({ keys: "\u0003" }) mcp__shellwright__shell_send({ keys: "exit\r" }) -
Stop the shellwright session:
mcp__shellwright__shell_stop -
If the QA run changed terminal state (NO_COLOR, custom TERM, narrow width), those only lived inside the PTY — no host cleanup needed. But if you touched real env for some reason, unset here.
Deliverables checklist
Before telling the user "done":
-
qa-reports/tui-<date>-<app>.mdexists and opens cleanly - At least one screenshot per issue (not one of the whole session)
- Each issue has severity, category, key-sequence repro
- Health score table is filled (no
{SCORE}placeholders left) - "Top 3 Things to Fix" has actual issue titles
- Asciicast recording saved alongside the report
- Terminal emulator +
$TERM+ PTY size recorded (defines coverage) - Cleanup phase ran (shellwright session stopped)
Reference files
references/issue-taxonomy-tui.md— severity + category definitionsreferences/shellwright-tui-reference.md— shellwright MCP tool cheat sheet + key-code tablereferences/terminal-conventions.md— what terminals / TUIs conventionally do, for comparison during Phase 2 + Phase 4templates/qa-report-template-tui.md— fill-in report skeleton
Load the references on demand (they're not in context by default). When you hit a category question during triage, read the taxonomy. When you need a key-code you don't remember, read the shellwright reference.
Escape hatches
- Shellwright session won't start: check the shellwright MCP server is running and healthy. If it's not, ask the user to restart it.
- TUI launches but first screenshot is blank: it may be slow to draw.
Wait 2–3 more seconds and retry. If still blank,
shell_readto see what's actually in the buffer — maybe it's drawing with cursor positioning into a single row and the image capture missed it. - TUI exits immediately after launch: it may be non-interactive with
this stdin (piped). Check by running manually. If the TUI needs a real
TTY that shellwright's PTY should already provide, check shellwright's
TERM env — some TUIs refuse
dumb. - Terminal is in a bad state after a crash: send
\u0003(Ctrl+C) thenstty sane\rthenreset\rto the session. If the session is itself wedged,shell_stop+shell_startagain. shell_sendkeystrokes don't seem to register: some TUIs poll stdin slowly. Add a small wait after each send before the nextshell_read/shell_screenshot. Find the minimum that works, note it in the report.- Character encoding garbled in byte dump: confirm the PTY is UTF-8
(
localeshould showLANG=en_US.UTF-8or similar). If not, the issue may be the PTY's encoding, not the TUI. - Screen reader testing isn't feasible from shellwright alone: flag in the "Coverage notes" section that screen-reader compat requires a human-driven test on the target emulator.