await-polygraph-ci
This skill contains shell command directives (!`command`) that may execute system commands. Review carefully before installing.
name: await-polygraph-ci
description: Wait for CI to settle across all repos in a Polygraph session, then report results and investigate failures. USE WHEN user says "await polygraph", "wait for polygraph ci", "polygraph ci status", "check polygraph ci", "watch polygraph session", "monitor polygraph".
Await Polygraph CI
Wait for all CI pipelines in a Polygraph session to reach a stable state (succeeded, failed, etc.), then produce a unified summary. If any pipelines failed, investigate via child agents and present fix options.
Phase 1: Session Discovery
- Get the current branch name: !`git branch --show-current`
- Use the branch name as the session ID. If on `main`, `master`, or `dev`, ask the user for an explicit session ID.
- Fetch session: `cloud_polygraph_get_session(sessionId: <session-id>)`
- Record `monitorStartedAt` = current timestamp (epoch millis).
- Build a tracking table of all repos with PRs. For each PR, record:
  - `repo`: repository name
  - `prUrl`: PR URL
  - `prStatus`: DRAFT / OPEN / MERGED / CLOSED
  - `ciStatus`: from session (may already be a terminal status from a previous run)
  - `cipeUrl`: CI pipeline URL (null if none)
  - `cipeCompletedAt`: `completedAt` from session (epoch millis, null if CIPE is active or absent)
  - `selfHealingStatus`: self-healing fix status (null if none)
  - `firstSeenAt`: current timestamp
- If no PRs found, report "No PRs in session" and exit.
- Stale detection: For each PR, determine whether its CI status is stale, meaning it reflects a previous run rather than the current one. A PR's CI status is stale if:
  - `cipeCompletedAt` is non-null AND `cipeCompletedAt < monitorStartedAt` (the CIPE finished before the monitor started)
  - Mark these PRs as `stale: true`
- Display the initial status table, annotating stale PRs:
backend: SUCCEEDED (stale) | frontend: SUCCEEDED (stale) | shared-lib: NOT_STARTED
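The tracking table and stale check above can be sketched in Python as follows. This is a minimal illustration, not part of the skill itself: the `TrackedPR` class, its snake_case field names, and the helpers `mark_stale` and `format_status_line` are hypothetical names chosen to mirror the fields listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrackedPR:
    """One row of the tracking table (hypothetical shape mirroring the field list)."""
    repo: str
    pr_url: str
    pr_status: str                      # DRAFT / OPEN / MERGED / CLOSED
    ci_status: str
    cipe_url: Optional[str]             # None if there is no CI pipeline
    cipe_completed_at: Optional[int]    # epoch millis; None if CIPE is active or absent
    self_healing_status: Optional[str]  # None if there is no self-healing fix
    first_seen_at: int                  # epoch millis
    stale: bool = False


def mark_stale(prs: list, monitor_started_at: int) -> None:
    """Flag PRs whose CIPE finished before the monitor started."""
    for pr in prs:
        pr.stale = (
            pr.cipe_completed_at is not None
            and pr.cipe_completed_at < monitor_started_at
        )


def format_status_line(prs: list) -> str:
    """Render the one-line status table, annotating stale PRs."""
    return " | ".join(
        f"{pr.repo}: {pr.ci_status}" + (" (stale)" if pr.stale else "")
        for pr in prs
    )
```

A PR with no completed CIPE (`cipe_completed_at` is `None`) is never marked stale by this check, which matches the rule that only a completion time earlier than `monitorStartedAt` counts.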
Phase 2: Polling Loop
Configuration:
- Timeout: 30 minutes total
- Backoff: 60s → 90s → 120s (cap)
- Circuit breaker: exit after 5 consecutive polls with no status change
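One way to read the configuration above is sketched below. The names `next_delay` and `CircuitBreaker` are hypothetical, and two details are assumptions because the skill does not specify them: the backoff never resets after a status change, and "no status change" means the poll snapshot equals the previous one.

```python
TIMEOUT_S = 30 * 60            # total monitoring budget
BACKOFF_S = (60, 90, 120)      # the last value is the cap
MAX_UNCHANGED_POLLS = 5

_UNSET = object()              # sentinel: no snapshot recorded yet


def next_delay(poll_number: int) -> int:
    """Seconds to wait after the given 1-based poll; stays at the cap once reached."""
    index = min(poll_number - 1, len(BACKOFF_S) - 1)
    return BACKOFF_S[index]


class CircuitBreaker:
    """Trips after `limit` consecutive polls whose snapshot matches the previous one."""

    def __init__(self, limit: int = MAX_UNCHANGED_POLLS):
        self.limit = limit
        self.unchanged = 0
        self._last = _UNSET

    def record(self, snapshot) -> bool:
        """Record a poll snapshot; return True when the breaker has tripped."""
        if snapshot == self._last:
            self.unchanged += 1
        else:
            self.unchanged = 0
            self._last = snapshot
        return self.unchanged >= self.limit
```

The snapshot can be any comparable value, for example a tuple of `(repo, ciStatus, selfHealingStatus)` entries. Under this reading the first poll only sets the baseline, so the breaker trips on the sixth identical poll.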
Each poll iteration:
- Call `cloud_polygraph_get_session(sessionId: <session-id>)`
- Update each tracked PR from the session response: `ciStatus`, `cipeUrl`, `cipeCompletedAt`, and `selfHealingStatus`
- Clear stale flag: If a PR was marked `stale: true` and its `cipeCompletedAt` has changed (or become null, meaning a new CIPE is active), clear the stale flag; this PR now has fresh CI data.
- Display status update:
  [await-polygraph-ci] Poll #N | Elapsed: Xm | Repos: Y total, Z completed
  backend: SUCCEEDED | frontend: FAILED (self-healing: PENDING) | shared-lib: SUCCEEDED (stale)
  Include `selfHealingStatus` inline when non-null. Annotate stale PRs.
- Check exclusion rule: if a PR has `prStatus: DRAFT` and `ciStatus: NOT_STARTED` for more than 5 minutes since `firstSeenAt`, mark it as `EXCLUDED` (DRAFT PRs may not trigger CI)
- Check terminal conditions. A PR is terminal when:
  - It is NOT stale, AND:
    - CI status is `SUCCEEDED`, `CANCELED`, or `TIMED_OUT`, OR
    - CI status is `FAILED` AND there is no active self-healing (i.e., `selfHealingStatus` is null or a final state such as `COMPLETED`, `APPLIED`, `REJECTED`, or `FAILED`)
  - A `FAILED` PR with `selfHealingStatus` indicating an in-progress fix (e.g., `PENDING`, `IN_PROGRESS`) is NOT terminal; keep polling to track the self-healing outcome
  - A stale PR is NOT terminal; keep polling until it gets a fresh CIPE or is excluded
- Stale timeout: If a stale PR remains stale for more than 5 minutes, assume no new CI is expected for it. Clear the stale flag and treat its current status as final.
- If all non-excluded PRs are terminal → proceed to Phase 3
- If timeout or circuit breaker hit → proceed to Phase 3 with partial results
- Otherwise → wait with backoff, then poll again
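The exclusion, terminal, and stale-timeout rules above can be expressed as small predicates. This is a sketch under stated assumptions: the function names are hypothetical, `stale_since` is an illustrative timestamp the skill does not define, and any self-healing status other than `PENDING` or `IN_PROGRESS` is treated as final.

```python
FIVE_MINUTES_MS = 5 * 60 * 1000
SETTLED_CI = {"SUCCEEDED", "CANCELED", "TIMED_OUT"}
ACTIVE_SELF_HEALING = {"PENDING", "IN_PROGRESS"}


def is_excluded(pr_status: str, ci_status: str, first_seen_at: int, now: int) -> bool:
    """DRAFT PRs whose CI has not started within 5 minutes are excluded."""
    return (
        pr_status == "DRAFT"
        and ci_status == "NOT_STARTED"
        and now - first_seen_at > FIVE_MINUTES_MS
    )


def is_terminal(ci_status: str, self_healing_status, stale: bool) -> bool:
    """A PR is terminal only if it is not stale and its CI outcome is settled."""
    if stale:
        return False
    if ci_status in SETTLED_CI:
        return True
    if ci_status == "FAILED":
        # Assumption: anything that is not an in-progress fix counts as final.
        return self_healing_status not in ACTIVE_SELF_HEALING
    return False


def stale_timed_out(stale_since: int, now: int) -> bool:
    """After 5 minutes of staleness, the current status is treated as final."""
    return now - stale_since > FIVE_MINUTES_MS
```

All timestamps are epoch millis, matching `firstSeenAt` and `monitorStartedAt`. When `stale_timed_out` returns true, the caller would clear the stale flag before calling `is_terminal` again.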
Phase 3: Results Analysis
Categorize repos into: succeeded, failed, canceled, timed_out, excluded, in_progress (only when the monitor stopped on its timeout or circuit breaker before that repo's CI settled).
Display final summary table. When showing self-healing status, distinguish clearly between these states:
- `COMPLETED` = a fix was generated and verified, but NOT yet applied. Display as `fix available`.
- `APPLIED` = the fix was applied by the user or agent. Display as `fix applied, awaiting re-run`.
- `IN_PROGRESS` / `PENDING` = the fix is still being generated. Display as `in progress`.
- `REJECTED` = the fix was rejected. Display as `fix rejected`.
- `FAILED` = self-healing failed to produce a fix. Display as `fix failed`.
[await-polygraph-ci] Final Results | Elapsed: Xm
SUCCEEDED: backend, shared-lib
FAILED: frontend (self-healing: fix available)
EXCLUDED: docs (DRAFT, no CI)
Include self-healing status for any repo that has one.
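The mapping from self-healing states to display labels, and the summary layout shown above, might be sketched like this. The helper names `describe` and `format_final_results` are hypothetical; the labels are the ones listed above.

```python
SELF_HEALING_LABELS = {
    "COMPLETED": "fix available",
    "APPLIED": "fix applied, awaiting re-run",
    "IN_PROGRESS": "in progress",
    "PENDING": "in progress",
    "REJECTED": "fix rejected",
    "FAILED": "fix failed",
}


def describe(repo: str, self_healing_status=None, excluded: bool = False) -> str:
    """Render one repo entry, adding its self-healing label when it has one."""
    if excluded:
        return f"{repo} (DRAFT, no CI)"
    label = SELF_HEALING_LABELS.get(self_healing_status)
    return f"{repo} (self-healing: {label})" if label else repo


def format_final_results(groups: dict, elapsed_minutes: int) -> str:
    """Render the summary; `groups` maps a category to described repo entries."""
    lines = [f"[await-polygraph-ci] Final Results | Elapsed: {elapsed_minutes}m"]
    for category, repos in groups.items():
        if repos:  # skip empty categories
            lines.append(f"{category}: {', '.join(repos)}")
    return "\n".join(lines)
```

Categories are printed in the order they appear in `groups`, so the caller controls whether `SUCCEEDED` or `FAILED` is listed first.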
- If all succeeded → report success and exit
- If any failed with `selfHealingStatus: APPLIED`, inform the user that the fix was applied and a CI re-run may be in progress or needed
- If any failed with `selfHealingStatus: COMPLETED`, inform the user that a fix is available but not yet applied, and offer to apply it
- If any failed → proceed to Phase 4
Phase 4: Failure Investigation (Child Agent Delegation)
For each repo with ciStatus: FAILED:
- Display known info from the session data before delegating:
  Repository: frontend
  CI Pipeline: <cipeUrl from session>
  Self-healing: <selfHealingStatus from session, or "None">
  Investigating failure details...
- Delegate investigation (non-blocking): call `cloud_polygraph_delegate` for each failed repo:
  - `sessionId`: the session ID
  - `target`: the repository name
  - `instruction`: Use the `ci_information` MCP tool to investigate the CI failure on this branch. Return a structured summary with: (1) list of failed task IDs with a one-line error summary each, (2) failure category (Build / Test / Lint / E2E / Infra / Other).
  - `context`: Polygraph session monitoring; investigating CI failure for unified summary.
  Since `cloud_polygraph_delegate` is non-blocking, you can delegate to multiple failed repos in parallel.
- Monitor investigation progress: poll `cloud_polygraph_child_status` to wait for each child agent to complete:
  cloud_polygraph_child_status(sessionId: "<session-id>", target: "frontend")
  Poll until the child agent's status indicates completion. Use the `tail` parameter to retrieve recent output lines containing the investigation results.
- Collect each child agent's response from the status output. If a child agent fails or gets stuck, use `cloud_polygraph_stop_child` to terminate it and skip that repo.
- Display failure summary for each repo:
  Repository: frontend
  CI Pipeline: <cipeUrl>
  Failed Tasks (2):
  - frontend:build → TypeScript error in src/app.tsx:42
  - frontend:test → 3 test suites failed
  Category: Build + Test failures
  Self-healing: <selfHealingStatus>
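As an illustration of the delegation step, the arguments for each `cloud_polygraph_delegate` call could be assembled as below. This only builds the argument payloads; it does not call the tool. The function name `build_delegate_requests` and the dict shape of each tracked PR are assumptions for the example, while the four parameter names come from the step above.

```python
INSTRUCTION = (
    "Use the ci_information MCP tool to investigate the CI failure on this branch. "
    "Return a structured summary with: (1) list of failed task IDs with a one-line "
    "error summary each, (2) failure category "
    "(Build / Test / Lint / E2E / Infra / Other)."
)
CONTEXT = "Polygraph session monitoring; investigating CI failure for unified summary."


def build_delegate_requests(session_id: str, tracked: list) -> list:
    """Return one delegate payload per repo whose ciStatus is FAILED."""
    return [
        {
            "sessionId": session_id,
            "target": pr["repo"],
            "instruction": INSTRUCTION,
            "context": CONTEXT,
        }
        for pr in tracked
        if pr["ciStatus"] == "FAILED"
    ]
```

Because delegation is non-blocking, all payloads can be submitted before any child status is polled.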
Phase 5: Fix Planning
- Group failures by category (Build, Test, Lint, E2E, Infra)
- Identify cross-repo dependency issues (e.g., shared-lib build failure blocking frontend)
- Suggest fix order based on dependency graph (upstream repos first)
- Present next actions to the user based on self-healing status:
  - If any repo has `selfHealingStatus` with an available fix → offer to apply self-healing via `update_self_healing_fix(action: "APPLY")` or reject it
  - If self-healing was already applied → offer to resume monitoring to watch the re-triggered CI
  - Delegate fixes: use Polygraph to send fix instructions to child agents (for repos without self-healing or where self-healing was rejected/failed)
  - Get more details: drill into a specific repo's failure
  - Exit: done monitoring
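Ordering fixes so that upstream repos come first is a topological sort. The sketch below uses Python's standard `graphlib` module; the function name `fix_order` and the `depends_on` mapping are hypothetical, since the skill does not say how cross-repo dependencies are obtained.

```python
from graphlib import TopologicalSorter


def fix_order(failed: list, depends_on: dict) -> list:
    """Return failed repos ordered so that upstream repos come first.

    `depends_on` maps a repo to the repos it depends on (assumed input).
    Only dependencies between failed repos are considered.
    """
    failed_set = set(failed)
    graph = {
        repo: set(depends_on.get(repo, ())) & failed_set
        for repo in failed
    }
    # static_order() yields each node after all of its predecessors.
    return list(TopologicalSorter(graph).static_order())
```

For example, if `frontend` depends on `shared-lib` and both failed, `shared-lib` is suggested first. `TopologicalSorter` raises `graphlib.CycleError` when the dependencies are circular, which a caller would need to handle.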
Notes
- This skill does NOT push code directly. The only write action it may take is applying/rejecting a self-healing fix via `update_self_healing_fix`, which is an Nx Cloud operation (not a local code change).
- All heavy CI data inspection happens in child agents via `cloud_polygraph_delegate` to keep this context window clean.
- `cloud_polygraph_delegate` is non-blocking: it starts the child agent and returns immediately. Use `cloud_polygraph_child_status` to poll for results and `cloud_polygraph_stop_child` to terminate stuck agents.
- The `cloud_polygraph_get_session` response is compact and safe to poll from the main agent.