Fix JSONL Surrogates

Repairs Claude Code JSONL chat files that contain lone Unicode surrogates (U+D800-U+DFFF), which cause the Anthropic API to reject requests with "invalid high surrogate in string".

Background

Node.js strings are UTF-16 internally and can hold lone surrogates. JSON.stringify() does not reject them, so a lone surrogate picked up from a file read, terminal output, or web content is serialized as an escape such as \uD8xx with no matching low surrogate. The resulting string cannot be decoded into valid Unicode text, and the API rejects the request.
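The failure mode can be reproduced outside Node.js. The following is a minimal Python sketch using only the standard library; the sample strings are made up for illustration. It shows that a lenient parser accepts a lone high-surrogate escape, but the resulting string cannot be encoded as strict UTF-8, whereas a complete pair decodes to a single character.

```python
import json

# A JSON string whose only content is a high-surrogate escape
# with no low surrogate after it (illustrative sample).
raw = '"\\ud83d"'

# Python's json module is lenient and returns a str holding the lone surrogate.
value = json.loads(raw)

try:
    value.encode("utf-8")  # strict UTF-8 has no encoding for a lone surrogate
    encodable = True
except UnicodeEncodeError:
    encodable = False

# A complete high + low pair is merged into one code point and encodes fine.
pair = json.loads('"\\ud83d\\ude00"')
pair_bytes = pair.encode("utf-8")
```

A strict consumer hits the equivalent of the `UnicodeEncodeError` branch, which is why the request is rejected.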

Common triggers on Windows:

Emoji in code comments or terminal output
Box-drawing characters from build logs
Web content with malformed UTF-16 encoding

Upstream issue: anthropics/claude-code#44230

How to Use

Quick diagnosis

If the user has a specific session or request ID, find the JSONL file:

grep -rl "req_XXXXX" ~/.claude/projects/*/

Run the repair script

The bundled Python script handles scanning and fixing. It works on individual files, session directories, or entire project trees.

# Scan a specific JSONL file + its session directory (subagents, tool-results)
python {SKILL_DIR}/scripts/fix_surrogates.py <path-to-file.jsonl>

# Dry run first to see what would change
python {SKILL_DIR}/scripts/fix_surrogates.py <path-to-file.jsonl> --dry-run --verbose

# Scan all sessions in a project directory
python {SKILL_DIR}/scripts/fix_surrogates.py ~/.claude/projects/<project-dir>/

# Scan everything recursively (includes subagent files)
python {SKILL_DIR}/scripts/fix_surrogates.py ~/.claude/projects/<project-dir>/ --recursive

The script:

Reads each file with errors='surrogateescape' so that undecodable bytes are preserved for detection instead of raising a decode error
Parses JSON and recursively checks all string values for surrogate code points
Replaces lone surrogates with U+FFFD (replacement character) via str.encode('utf-8', 'surrogatepass').decode('utf-8', 'replace')
Backs up originals as .bak before modifying
Reports a summary of what was found and fixed
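The detect-and-replace steps above can be sketched as follows. This is not the bundled script: the names `scrub` and `fix_line` are hypothetical, file I/O and the .bak backup are left out, and a regular expression stands in for the encode/decode round trip, so the number of replacement characters it emits may differ from that expression's.

```python
import json
import re

# After json.loads, valid \uD8xx\uDCxx pairs are already merged into one
# character, so any code point left in this range is a lone surrogate.
SURROGATE = re.compile(r"[\ud800-\udfff]")


def scrub(value):
    """Recursively replace lone surrogates in every string of a parsed JSON value."""
    if isinstance(value, str):
        return SURROGATE.sub("\ufffd", value)
    if isinstance(value, list):
        return [scrub(item) for item in value]
    if isinstance(value, dict):
        return {scrub(key): scrub(item) for key, item in value.items()}
    return value  # numbers, booleans, and None pass through unchanged


def fix_line(line):
    """Return (line, changed) for one JSONL record (hypothetical helper)."""
    record = json.loads(line)
    cleaned = scrub(record)
    if cleaned == record:
        return line, False  # clean input is returned untouched
    return json.dumps(cleaned, ensure_ascii=False), True
```

Passing an already-repaired line back through `fix_line` returns it unchanged, which matches the no-op behavior on clean files described under Important notes.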

Important notes

Surrogates are often transient: They may exist only in Node.js memory during request construction, not in the stored JSONL. If the script finds nothing, the error was probably transient, and retrying the session will usually work.
Check all session artifacts: The script automatically scans the main JSONL plus any subagent files and tool-result cache in the session directory.
Safe to re-run: The script is idempotent. Running it on already-clean files is a no-op.

Manual investigation

If the script finds nothing but the error persists, the surrogate may come from content loaded at request time (CLAUDE.md, memory files, etc.). Scan those too:

python {SKILL_DIR}/scripts/fix_surrogates.py ~/.claude/ --recursive --dry-run
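For a single non-JSONL file such as CLAUDE.md, a quick manual check is to look for bytes that do not decode as UTF-8. The sketch below is an assumption about how such a check could be written, not part of the bundled script; the helper name `undecodable_offsets` is hypothetical.

```python
from pathlib import Path


def undecodable_offsets(path):
    """Return character offsets of bytes in the file that are not valid UTF-8."""
    text = Path(path).read_text(encoding="utf-8", errors="surrogateescape")
    # surrogateescape maps each undecodable byte to a code point
    # in U+DC80..U+DCFF, so those mark the problem spots.
    return [i for i, ch in enumerate(text) if 0xDC80 <= ord(ch) <= 0xDCFF]
```

An empty list means the file decoded cleanly; any offsets returned point at the characters to inspect or replace by hand.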

What `{SKILL_DIR}` means

Replace {SKILL_DIR} with the actual path to this skill directory. When Claude invokes this skill, it knows the skill's location and can substitute the correct path.
