debug

SKILL.md

Debug

Goals

  • Find why a run is stuck, retrying, or failing.
  • Correlate Linear issue identity to a Codex session quickly.
  • Read the right logs in the right order to isolate root cause.

Log Sources

  • Primary runtime log: log/symphony.log
    • Default comes from SymphonyElixir.LogFile (log/symphony.log).
    • Includes orchestrator, agent runner, and Codex app-server lifecycle logs.
  • Rotated runtime logs: log/symphony.log*
    • Check these when the relevant run is older.

Correlation Keys

  • issue_identifier: human ticket key (example: MT-625)
  • issue_id: Linear UUID (stable internal ID)
  • session_id: Codex thread-turn pair (<thread_id>-<turn_id>)

elixir/docs/logging.md requires these fields for issue/session lifecycle logs. Use them as your join keys during debugging.

Quick Triage (Stuck Run)

  1. Confirm scheduler/worker symptoms for the ticket.
  2. Find recent lines for the ticket (issue_identifier first).
  3. Extract session_id from matching lines.
  4. Trace that session_id across start, stream, completion/failure, and stall handling logs.
  5. Decide class of failure: timeout/stall, app-server startup failure, turn failure, or orchestrator retry loop.

Commands

# 1) Narrow by ticket key (fastest entry point)
rg -n "issue_identifier=MT-625" log/symphony.log*

# 2) If needed, narrow by Linear UUID
rg -n "issue_id=<linear-uuid>" log/symphony.log*

# 3) Pull session IDs seen for that ticket
rg -o "session_id=[^ ;]+" log/symphony.log* | sort -u

# 4) Trace one session end-to-end
rg -n "session_id=<thread>-<turn>" log/symphony.log*

# 5) Focus on stuck/retry signals
rg -n "Issue stalled|scheduling retry|turn_timeout|turn_failed|Codex session failed|Codex session ended with error" log/symphony.log*

Investigation Flow

  1. Locate the ticket slice:
    • Search by issue_identifier=<KEY>.
    • If noise is high, add issue_id=<UUID>.
  2. Establish timeline:
    • Identify first Codex session started ... session_id=....
    • Follow with Codex session completed, ended with error, or worker exit lines.
  3. Classify the problem:
    • Stall loop: Issue stalled ... restarting with backoff.
    • App-server startup: Codex session failed ....
    • Turn execution failure: turn_failed, turn_cancelled, turn_timeout, or ended with error.
    • Worker crash: Agent task exited ... reason=....
  4. Validate scope:
    • Check whether failures are isolated to one issue/session or repeating across multiple tickets.
  5. Capture evidence:
    • Save key log lines with timestamps, issue_identifier, issue_id, and session_id.
    • Record probable root cause and the exact failing stage.

Reading Codex Session Logs

In Symphony, Codex session diagnostics are emitted into log/symphony.log and keyed by session_id. Read them as a lifecycle:

  1. Codex session started ... session_id=...
  2. Session stream/lifecycle events for the same session_id
  3. Terminal event:
    • Codex session completed ..., or
    • Codex session ended with error ..., or
    • Issue stalled ... restarting with backoff

For one specific session investigation, keep the trace narrow:

  1. Capture one session_id for the ticket.
  2. Build a timestamped slice for only that session:
    • rg -n "session_id=<thread>-<turn>" log/symphony.log*
  3. Mark the exact failing stage:
    • Startup failure before stream events (Codex session failed ...).
    • Turn/runtime failure after stream events (turn_* / ended with error).
    • Stall recovery (Issue stalled ... restarting with backoff).
  4. Pair findings with issue_identifier and issue_id from nearby lines to confirm you are not mixing concurrent retries.

Always pair session findings with issue_identifier/issue_id to avoid mixing concurrent runs.

Notes

  • Prefer rg over grep for speed on large logs.
  • Check rotated logs (log/symphony.log*) before concluding data is missing.
  • If required context fields are missing in new log statements, align with elixir/docs/logging.md conventions.
Weekly Installs
76
GitHub Stars
39
First Seen
5 days ago
Installed on
codex75
gemini-cli7
claude-code7
github-copilot7
amp7
cline7