Debug
Goals
- Find why a run is stuck, retrying, or failing.
- Correlate Linear issue identity to a Codex session quickly.
- Read the right logs in the right order to isolate root cause.
Log Sources
- Primary runtime log: log/symphony.log
  - Default comes from SymphonyElixir.LogFile (log/symphony.log).
  - Includes orchestrator, agent runner, and Codex app-server lifecycle logs.
- Rotated runtime logs: log/symphony.log*
  - Check these when the relevant run is older.
Correlation Keys
- issue_identifier: human ticket key (example: MT-625)
- issue_id: Linear UUID (stable internal ID)
- session_id: Codex thread-turn pair (<thread_id>-<turn_id>)
elixir/docs/logging.md requires these fields for issue/session lifecycle logs. Use
them as your join keys during debugging.
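As a minimal sketch of using these fields as join keys, the helper below pulls one field's value out of each log line. It assumes fields are written as key=value and end at a space or semicolon, which is what the search patterns in this guide imply; the function name field and the sample line in the usage note are hypothetical, not taken from Symphony.

```shell
#!/bin/sh
# field KEY: read log lines on stdin and print the value of KEY=... for every
# line that carries it. Assumption: values end at a space or semicolon.
# The key must sit at line start or after a space/semicolon, so asking for
# "issue_id" does not accidentally match inside another field name.
field() {
  sed -E -n "s/(^|.*[ ;])$1=([^ ;]*).*/\2/p"
}
```

For example, `grep -h "issue_identifier=MT-625" log/symphony.log* | field session_id | sort -u` would list the sessions seen for that ticket, provided the log lines follow the assumed layout.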
Quick Triage (Stuck Run)
- Confirm scheduler/worker symptoms for the ticket.
- Find recent lines for the ticket (issue_identifier first).
- Extract session_id from matching lines.
- Trace that session_id across start, stream, completion/failure, and stall handling logs.
- Decide class of failure: timeout/stall, app-server startup failure, turn failure, or orchestrator retry loop.
Commands
# 1) Narrow by ticket key (fastest entry point)
rg -n "issue_identifier=MT-625" log/symphony.log*
# 2) If needed, narrow by Linear UUID
rg -n "issue_id=<linear-uuid>" log/symphony.log*
# 3) Pull session IDs seen for that ticket (filter by ticket first, then extract)
rg "issue_identifier=MT-625" log/symphony.log* | rg -o "session_id=[^ ;]+" | sort -u
# 4) Trace one session end-to-end
rg -n "session_id=<thread>-<turn>" log/symphony.log*
# 5) Focus on stuck/retry signals
rg -n "Issue stalled|scheduling retry|turn_timeout|turn_failed|Codex session failed|Codex session ended with error" log/symphony.log*
Investigation Flow
- Locate the ticket slice:
  - Search by issue_identifier=<KEY>.
  - If noise is high, add issue_id=<UUID>.
- Establish timeline:
  - Identify first Codex session started ... session_id=....
  - Follow with Codex session completed, ended with error, or worker exit lines.
- Classify the problem:
  - Stall loop: Issue stalled ... restarting with backoff.
  - App-server startup: Codex session failed ....
  - Turn execution failure: turn_failed, turn_cancelled, turn_timeout, or ended with error.
  - Worker crash: Agent task exited ... reason=....
- Validate scope:
  - Check whether failures are isolated to one issue/session or repeating across multiple tickets.
- Capture evidence:
  - Save key log lines with timestamps, issue_identifier, issue_id, and session_id.
  - Record probable root cause and the exact failing stage.
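The classification step can be sketched as a small filter that counts how many lines fall into each failure class. The match strings are the log phrases listed in this section; the class labels (stall_loop, app_server_startup, turn_failure, worker_crash) and the function name classify are names invented for this sketch. It uses awk and sort rather than rg so it runs where rg is not installed.

```shell
#!/bin/sh
# classify: read log lines on stdin, print "<class> <count>" per failure class.
# Each line is counted once, under the first class whose phrase it contains.
classify() {
  awk '
    /Issue stalled/        { n["stall_loop"]++; next }
    /Codex session failed/ { n["app_server_startup"]++; next }
    /turn_failed|turn_cancelled|turn_timeout|ended with error/ { n["turn_failure"]++; next }
    /Agent task exited/    { n["worker_crash"]++; next }
    END { for (c in n) print c, n[c] }
  ' | sort
}
```

Feeding it one ticket's slice, for instance `grep -h "issue_identifier=MT-625" log/symphony.log* | classify`, gives a quick view of which class dominates; running it on the whole log helps with the scope check across tickets.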
Reading Codex Session Logs
In Symphony, Codex session diagnostics are emitted into log/symphony.log and
keyed by session_id. Read them as a lifecycle:
- Codex session started ... session_id=...
- Session stream/lifecycle events for the same session_id
- Terminal event:
  - Codex session completed ..., or
  - Codex session ended with error ..., or
  - Issue stalled ... restarting with backoff
For one specific session investigation, keep the trace narrow:
- Capture one session_id for the ticket.
- Build a timestamped slice for only that session:
  rg -n "session_id=<thread>-<turn>" log/symphony.log*
- Mark the exact failing stage:
  - Startup failure before stream events (Codex session failed ...).
  - Turn/runtime failure after stream events (turn_* / ended with error).
  - Stall recovery (Issue stalled ... restarting with backoff).
- Pair findings with issue_identifier and issue_id from nearby lines to confirm you are not mixing concurrent retries.
Always pair session findings with issue_identifier/issue_id to avoid mixing
concurrent runs.
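To mark the failing stage for a single session, a rough sketch is to keep only that session's lines and report the last terminal-looking event among them. This assumes every relevant line, including stall handling lines, carries session_id=<value> followed by a space, a semicolon, or end of line; the function name session_outcome and the outcome labels are hypothetical.

```shell
#!/bin/sh
# session_outcome SESSION_ID: read log lines on stdin and print one word
# describing the last terminal event logged for that session.
# The trailing ([ ;]|$) keeps "a-b" from also matching "a-bc".
session_outcome() {
  grep -E "session_id=$1([ ;]|\$)" |
  awk '
    /Codex session completed/ { last = "completed" }
    /Codex session failed/    { last = "startup_failure" }
    /turn_failed|turn_cancelled|turn_timeout|ended with error/ { last = "turn_failure" }
    /Issue stalled/           { last = "stalled" }
    END { print (last == "" ? "no_terminal_event" : last) }
  '
}
```

A call such as `cat log/symphony.log* | session_outcome <thread>-<turn>` only makes sense if the files are read in chronological order; with rotated logs, check the ordering before trusting the "last" event.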
Notes
- Prefer rg over grep for speed on large logs.
- Check rotated logs (log/symphony.log*) before concluding data is missing.
- If required context fields are missing in new log statements, align with elixir/docs/logging.md conventions.