# /debug-pipeline Skill
Diagnose OpenShift Pipelines (Tekton) CI/CD failures by automatically gathering PipelineRun status, failed TaskRun details, step container logs, and related resources.
## Prerequisites
Before running this skill:
- The user is logged in to an OpenShift cluster
- The user has access to the target namespace
- The OpenShift Pipelines operator is installed on the cluster
- The PipelineRun name is known (or can be identified from recent runs)
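The first three prerequisites can be checked by hand with the `oc` CLI. The following is a minimal sketch, not part of the skill itself: `check_prereqs` is a hypothetical helper name, and the presence of the `pipelineruns.tekton.dev` API resource is used only as a proxy for the operator being installed.

```bash
# Sketch: verify login, namespace access, and Tekton API availability.
# check_prereqs is a hypothetical helper; $1 is the target namespace.
check_prereqs() {
  # `oc whoami` fails when there is no active login session.
  oc whoami >/dev/null 2>&1 || { echo "not logged in"; return 1; }
  # `oc auth can-i` exits non-zero when the answer is "no".
  oc auth can-i get pipelineruns.tekton.dev -n "$1" >/dev/null 2>&1 \
    || { echo "no access to PipelineRuns in $1"; return 1; }
  # Assumption: if the API server serves pipelineruns.tekton.dev,
  # OpenShift Pipelines (or upstream Tekton) is installed.
  oc api-resources --api-group=tekton.dev -o name 2>/dev/null \
    | grep -q '^pipelineruns\.' \
    || { echo "Tekton API resources not found"; return 1; }
  echo "prerequisites OK"
}
```

Running `check_prereqs my-namespace` prints the first failing check, or `prerequisites OK` when all three pass.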
## Tekton CRD Access via MCP
Tekton resources are standard Kubernetes CRDs. Use the generic MCP tools with these parameters:
| Resource | kind | apiVersion |
|---|---|---|
| PipelineRun | PipelineRun | tekton.dev/v1 |
| TaskRun | TaskRun | tekton.dev/v1 |
| Pipeline | Pipeline | tekton.dev/v1 |
| Task | Task | tekton.dev/v1 |
| ClusterTask | ClusterTask | tekton.dev/v1beta1 |
| EventListener | EventListener | triggers.tekton.dev/v1beta1 |
| TriggerTemplate | TriggerTemplate | triggers.tekton.dev/v1beta1 |
| TriggerBinding | TriggerBinding | triggers.tekton.dev/v1beta1 |
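When scripting around these resources, the kind-to-apiVersion mapping above can be kept in one place. A minimal sketch follows; `tekton_api_version` is a hypothetical helper name, and the versions are copied from the table rather than discovered from the cluster.

```bash
# Sketch: look up the apiVersion for a Tekton kind, per the table above.
# tekton_api_version is a hypothetical helper; $1 is the kind.
tekton_api_version() {
  case "$1" in
    PipelineRun|TaskRun|Pipeline|Task)
      echo "tekton.dev/v1" ;;
    ClusterTask)
      echo "tekton.dev/v1beta1" ;;
    EventListener|TriggerTemplate|TriggerBinding)
      echo "triggers.tekton.dev/v1beta1" ;;
    *)
      echo "unknown kind: $1" >&2
      return 1 ;;
  esac
}
```

The versions a given cluster actually serves may differ by release; `oc api-resources --api-group=tekton.dev` shows what is available.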
## When to Use This Skill
Use this skill when OpenShift Pipelines (Tekton) fail, hang, or produce unexpected results. It diagnoses PipelineRun failures, TaskRun step errors, workspace/PVC binding issues, and authentication problems by analyzing run status, step container logs, and related resources.
## Critical: Human-in-the-Loop Requirements
See Human-in-the-Loop Requirements for mandatory checkpoint behavior.
## Workflow
### Step 1: Identify Target PipelineRun
## Pipeline Debugging
**Current OpenShift Context:**
- Cluster: [cluster]
- Namespace: [namespace]
Which PipelineRun would you like me to debug?
1. **Specify PipelineRun name** - Enter the PipelineRun name directly
2. **List failed PipelineRuns** - Show recent failed PipelineRuns in current namespace
3. **From Pipeline** - Debug latest run of a specific Pipeline
Select an option or enter a PipelineRun name:
WAIT for user confirmation before proceeding.
If user selects "List failed PipelineRuns":
Use kubernetes MCP resources_list with kind PipelineRun, then keep only the failed runs (those whose `Succeeded` condition has status `False`):
## Recent Failed PipelineRuns in [namespace]
| PipelineRun | Pipeline | Status | Started | Duration |
|-------------|----------|--------|---------|----------|
| [run-name] | [pipeline] | Failed | [timestamp] | [duration] |
Which PipelineRun would you like me to debug?
WAIT for user to select a PipelineRun.
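For a manual cross-check of the failed-run listing, the same filter can be approximated with the `oc` CLI. This is a sketch under stated assumptions: `list_failed_pipelineruns` is a hypothetical helper name, and a run is treated as failed when its `Succeeded` condition is `False`, which also matches cancelled and timed-out runs.

```bash
# Sketch: list PipelineRuns whose Succeeded condition is False.
# list_failed_pipelineruns is a hypothetical helper; $1 is the namespace.
# Output columns: run name, pipeline name, start time.
list_failed_pipelineruns() {
  oc get pipelineruns.tekton.dev -n "$1" --no-headers \
    --sort-by=.metadata.creationTimestamp \
    -o custom-columns='NAME:.metadata.name,PIPELINE:.spec.pipelineRef.name,SUCCEEDED:.status.conditions[?(@.type=="Succeeded")].status,STARTED:.status.startTime' \
  | awk '$3 == "False" { print $1, $2, $4 }'
}
```

Runs that embed a `pipelineSpec` instead of referencing a Pipeline show `<none>` in the pipeline column.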
### Step 2: Get PipelineRun Status Overview
Use kubernetes MCP resources_get for the PipelineRun:
## PipelineRun Status: [pipelinerun-name]
**PipelineRun Info:**
| Field | Value |
|-------|-------|
| Pipeline | [pipeline-name] |
| Status | [Succeeded/Failed/Running/Cancelled] |
| Started | [timestamp] |
| Completed | [timestamp or "Still running"] |
| Duration | [duration] |
**Parameters:**
| Name | Value |
|------|-------|
| [param-name] | [param-value] |
**TaskRun Status:**
| Task | TaskRun | Status | Duration |
|------|---------|--------|----------|
| [task-1] | [taskrun-1] | Succeeded | [duration] |
| [task-2] | [taskrun-2] | **Failed** | [duration] |
| [task-3] | [taskrun-3] | Skipped | - |
**Quick Assessment:**
[Based on status conditions - e.g., "PipelineRun failed because TaskRun 'build' failed at step 'build-push'"]
Continue with failed TaskRun analysis? (yes/no)
WAIT for user confirmation before proceeding.
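The status overview draws on the PipelineRun's `Succeeded` condition and on the TaskRuns it created. A rough `oc` equivalent is sketched below; the helper names are hypothetical, and the sketch assumes TaskRuns carry the `tekton.dev/pipelineRun` and `tekton.dev/pipelineTask` labels that Tekton normally sets.

```bash
# Sketch: summarize why a PipelineRun ended and which TaskRuns failed.
# Both helpers are hypothetical; $1 = PipelineRun name, $2 = namespace.

# Prints "<reason>: <message>" from the Succeeded condition.
pipelinerun_reason() {
  oc get pipelineruns.tekton.dev "$1" -n "$2" \
    -o jsonpath='{.status.conditions[?(@.type=="Succeeded")].reason}{": "}{.status.conditions[?(@.type=="Succeeded")].message}{"\n"}'
}

# Prints "<pipeline task> <taskrun>" for every TaskRun that failed.
failed_taskruns() {
  oc get taskruns.tekton.dev -n "$2" -l "tekton.dev/pipelineRun=$1" --no-headers \
    -o custom-columns='TASK:.metadata.labels.tekton\.dev/pipelineTask,TASKRUN:.metadata.name,SUCCEEDED:.status.conditions[?(@.type=="Succeeded")].status' \
  | awk '$3 == "False" { print $1, $2 }'
}
```

Skipped tasks never get a TaskRun, so they do not appear in this listing; they are reported in the PipelineRun status instead.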
### Step 3: Analyze Failed TaskRun(s)
Use kubernetes MCP resources_get for each failed TaskRun:
## Failed TaskRun: [taskrun-name]
**TaskRun Info:**
| Field | Value |
|-------|-------|
| Task | [task-name] |
| Pod | [taskrun-name]-pod |
| Status | [Failed] |
| Reason | [reason from conditions] |
**Step Status:**
| Step | Container | Status | Exit Code | Reason |
|------|-----------|--------|-----------|--------|
| [step-1] | step-[step-1] | Completed | 0 | - |
| [step-2] | step-[step-2] | **Terminated** | [code] | [reason] |
| [step-3] | step-[step-3] | - | - | Skipped |
**Workspace Bindings:**
| Workspace | Type | Resource | Status |
|-----------|------|----------|--------|
| [shared-workspace] | PVC | [pvc-name] | [Bound/Pending] |
| [output] | EmptyDir | - | OK |
**Issues Found:**
- [Issue 1 - e.g., "Step 'build-push' failed with exit code 1"]
Continue to view step logs? (yes/no)
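Note: Tekton names step containers as step-<step-name> in the TaskRun pod. Use this convention with pod_logs. The pod is usually named [taskrun-name]-pod, but long names may be shortened, so prefer the pod name recorded in the TaskRun's `status.podName`.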
WAIT for user confirmation before proceeding.
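The step table can be derived from the TaskRun's `status.steps` list. The sketch below uses hypothetical helper names and assumes steps are listed in execution order, so the first non-zero exit code marks the step that actually failed (later steps may also report non-zero codes because they never ran).

```bash
# Sketch: extract per-step results from a TaskRun and find the failed step.
# Helper names are hypothetical; $1 = TaskRun name, $2 = namespace.

# Prints tab-separated: step name, container name, exit code, reason.
taskrun_steps() {
  oc get taskruns.tekton.dev "$1" -n "$2" \
    -o jsonpath='{range .status.steps[*]}{.name}{"\t"}{.container}{"\t"}{.terminated.exitCode}{"\t"}{.terminated.reason}{"\n"}{end}'
}

# Reads taskrun_steps output on stdin and prints the container name of
# the first step whose exit code is present and non-zero.
first_failed_step() {
  awk -F'\t' '$3 != "" && $3 != "0" { print $2; exit }'
}
```

A typical use is `taskrun_steps my-taskrun my-namespace | first_failed_step`, which yields a container name such as `step-build-push` for the log lookup.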
### Step 4: Get TaskRun Pod Logs
Use kubernetes MCP pod_logs for the TaskRun pod, targeting the failed step container (step-<step-name>):
## Step Logs: [step-name] (Pod: [taskrun-name]-pod)
**Failed Step Container:** `step-[step-name]`
[log output from the failed step container]
**Log Analysis:**
**Errors Found:**
- Line [X]: [error description]
Continue to check related resources? (yes/no)
WAIT for user confirmation before proceeding.
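Outside the MCP tools, the same logs can be fetched with `oc logs` and scanned for obvious errors. This is a sketch only: the helper names are hypothetical, and the keyword list in `scan_log_errors` is an illustrative assumption, not a catalogue of Tekton error messages.

```bash
# Sketch: fetch logs for one step container and flag suspicious lines.
# Helper names are hypothetical.

# $1 = TaskRun name, $2 = namespace. Prints the pod recorded in the status.
taskrun_pod() {
  oc get taskruns.tekton.dev "$1" -n "$2" -o jsonpath='{.status.podName}'
}

# $1 = pod name, $2 = step name (without the step- prefix), $3 = namespace.
step_logs() {
  oc logs "$1" -c "step-$2" -n "$3"
}

# Reads a log on stdin and prints "<line number>:<line>" for each match.
# The keywords are generic examples; `|| true` keeps "no match" from
# being reported as a failure.
scan_log_errors() {
  grep -n -i -E 'error|fail|denied|unauthorized|no such file' || true
}
```

For example, `step_logs "$(taskrun_pod my-taskrun my-namespace)" build-push my-namespace | scan_log_errors` narrows a long log to the lines worth reading first.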
### Step 5: Check Related Resources
Check resources that could cause pipeline failures:
## Related Resources Analysis
**ServiceAccount:**
| Field | Value | Status |
|-------|-------|--------|
| Name | [sa-name] | [OK] |
| Image Pull Secrets | [secrets] | [OK/MISSING] |
| Linked Secrets | [secrets] | [OK/MISSING] |
**Workspaces/PVCs:**
| PVC | Status | Access Mode | Storage |
|-----|--------|-------------|---------|
| [pvc-name] | [Bound/Pending] | [RWO/RWX] | [size] |
**Secrets:**
| Secret | Type | Referenced By | Status |
|--------|------|---------------|--------|
| [git-creds] | kubernetes.io/basic-auth | git-clone task | [OK/MISSING] |
| [registry-creds] | kubernetes.io/dockerconfigjson | push task | [OK/MISSING] |
**Pipeline/Task Definitions:**
| Resource | Exists | Issues |
|----------|--------|--------|
| Pipeline [name] | [Yes/No] | [none / param mismatch] |
| Task [name] | [Yes/No] | [none / not found] |
[If triggered by EventListener:]
**EventListener:**
| Field | Value | Status |
|-------|-------|--------|
| Name | [el-name] | [Running/NotRunning] |
| TriggerTemplate | [tt-name] | [OK/MISSING] |
| TriggerBinding | [tb-name] | [OK/MISSING] |
**Issues Found:**
- [Issue 1]
Continue to full diagnosis summary? (yes/no)
WAIT for user confirmation before proceeding.
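The OK/MISSING and Bound/Pending columns above reduce to a few existence and phase checks. A minimal `oc` sketch follows; the helper names are hypothetical, and note that `resource_status` reports MISSING for any failed lookup, including a permissions error.

```bash
# Sketch: existence and binding checks for resources a pipeline depends on.
# Helper names are hypothetical.

# $1 = kind, $2 = name, $3 = namespace. Prints OK or MISSING.
resource_status() {
  if oc get "$1" "$2" -n "$3" >/dev/null 2>&1; then
    echo "OK"
  else
    echo "MISSING"
  fi
}

# $1 = ServiceAccount name, $2 = namespace.
# Prints linked secrets, then image pull secrets, on one line.
sa_secrets() {
  oc get serviceaccount "$1" -n "$2" \
    -o jsonpath='{.secrets[*].name}{" "}{.imagePullSecrets[*].name}{"\n"}'
}

# $1 = namespace. Prints the names of PVCs that are not in phase Bound.
unbound_pvcs() {
  oc get pvc -n "$1" --no-headers \
    -o custom-columns='NAME:.metadata.name,PHASE:.status.phase' \
  | awk '$2 != "Bound" { print $1 }'
}
```

For instance, `resource_status secret git-creds my-namespace` covers one row of the Secrets table, and `unbound_pvcs my-namespace` surfaces workspaces that would keep a TaskRun pod from starting.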
### Step 6: Present Diagnosis Summary
## Diagnosis Summary: [pipelinerun-name]
### Root Cause
**Primary Issue:** [Categorized root cause]
| Category | Status | Details |
|----------|--------|---------|
| Pipeline Definition | [OK/FAIL] | [details] |
| TaskRun Execution | [OK/FAIL] | [details] |
| Step Container | [OK/FAIL] | [details] |
| Workspace/PVC | [OK/FAIL] | [details] |
| Authentication | [OK/FAIL] | [details] |
| Resources/Quota | [OK/FAIL] | [details] |
### Detailed Findings
**[Category: e.g., Authentication]**
- Problem: [specific problem]
- Evidence: [from logs/events]
- Impact: [effect on pipeline]
### Recommended Actions
1. **[Action 1]** - [description]
```bash
[command to fix]
```
2. **[Action 2]** - [description]
```bash
[command to fix]
```
### Retry PipelineRun
After fixing the issue:
```bash
# Rerun using the same PipelineRun spec
oc create -f <(oc get pipelinerun [name] -n [namespace] -o json | jq 'del(.metadata.resourceVersion, .metadata.uid, .metadata.creationTimestamp, .status) | .metadata.name = .metadata.name + "-retry"') -n [namespace]
# Or using tkn CLI (if available)
tkn pipeline start [pipeline-name] --use-pipelinerun [pipelinerun-name] -n [namespace]
```
Would you like me to:
- Execute one of the recommended fixes
- Retry the PipelineRun
- Debug the TaskRun pod directly (/debug-pod)
- View Pipeline or Task definition
- Exit debugging
Select an option:
**WAIT for user to select next action.**
## Pipeline Failure Reference
For failure categories, error patterns, and troubleshooting decision trees, see [docs/debugging-patterns.md](../../docs/debugging-patterns.md) (sections: Pipeline/Tekton Failure Patterns, Common Tekton Error Messages).
## Dependencies
### Required MCP Servers
- `openshift` - Kubernetes/OpenShift resource access for PipelineRuns, TaskRuns, and Tekton CRDs
### Related Skills
- `/debug-pod` - To debug TaskRun pods directly
- `/debug-build` - If the pipeline uses OpenShift Build tasks
- `/debug-network` - If pipeline tasks fail due to network issues
- `/validate-environment` - To verify OpenShift and pipeline operator setup
### Reference Documentation
- [docs/debugging-patterns.md](../../docs/debugging-patterns.md) - Common error patterns and pipeline troubleshooting trees
- [docs/prerequisites.md](../../docs/prerequisites.md) - Required tools (oc), cluster access verification