ci-cd-integration by petrkindlmann/qa-skills

Discovery Questions

Which CI platform? GitHub Actions, GitLab CI, CircleCI, Jenkins? This skill focuses on GitHub Actions and GitLab CI.
What test types need to run? Unit, integration, E2E, visual regression, performance? Each has different resource and timing needs.
What is the current CI duration? If over 10 minutes, parallelism and sharding are essential.
How many developers push per day? High-frequency teams need aggressive concurrency controls and caching.
What triggers should run which tests? Not every push needs a full E2E suite.
Check .agents/qa-project-context.md first. Respect existing CI conventions and infrastructure constraints.

Core Principles

Fast feedback: right tests at the right time. Unit tests on every push (under 2 min). E2E on PRs (under 10 min). Full suite on merge and nightly.
Parallel first: shard tests across workers. A 20-minute serial suite drops to roughly 5 minutes across 4 shards, plus per-shard setup time (checkout, install, browser download). Usually worth the runner cost.
Artifacts are evidence. Every CI run must store traces, screenshots, coverage reports, and HTML reports. Without artifacts, CI failures are undebuggable.
Flaky tests need quarantine, not retries. Retrying hides the problem. Move flaky tests to a separate non-blocking job, track them, and fix them.
Quality gates at every stage. Define what must pass before code moves forward. Gates get stricter as code gets closer to production.

Calibrate to your team's maturity (set team_maturity in .agents/qa-project-context.md):

startup — Single pipeline job: lint + unit tests + one E2E smoke test on PR. Fast feedback over completeness.

growing — Separate jobs for unit, integration, E2E. Parallelization, artifact uploads, test result publishing, flaky test quarantine.

established — Full matrix: parallelized E2E sharding, multi-environment promotion gates, performance and security scans, deployment-gated quality checks, SLA-backed pipelines.

Pipeline Architecture

Push to branch:
  ┌────────────────┐
  │  lint + types  │  (30s)
  └──────┬─────────┘
         │
  ┌──────▼─────────┐
  │   unit tests   │  (1-2 min)
  └──────┬─────────┘
         │ (pass)
         ▼
  PR opened/updated:
  ┌────────────────┐
  │  integration   │  (2-3 min)
  └──────┬─────────┘
         │
  ┌──────▼─────────┐
  │ E2E (sharded)  │  (5-8 min)
  └──────┬─────────┘
         │
  ┌──────▼─────────┐
  │  merge report  │
  └────────────────┘

Merge to main:
  ┌────────────┐  ┌────────────┐  ┌────────────┐
  │  full E2E  │  │   visual   │  │    perf    │
  └─────┬──────┘  └─────┬──────┘  └─────┬──────┘
        └───────────────┼───────────────┘
                        ▼
                 ┌────────────┐
                 │   deploy   │
                 └────────────┘

Nightly (scheduled):
  full suite + security scan + a11y audit + flaky quarantine

What Runs When

Trigger	Tests	Max Duration
Push to branch	lint, type-check, unit	2 min
PR opened/updated	+ integration, E2E smoke	10 min
Merge to main	+ full E2E, visual, perf budget	15 min
Nightly schedule	full suite, security, a11y, flaky quarantine	30 min
Release tag	full suite, smoke against staging	20 min
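
One way to express this table in a single GitHub Actions workflow is to gate jobs on the triggering event. The following is a minimal sketch, not a complete pipeline: the job names, the npm scripts, and the @smoke tag are assumptions, and the default branch is assumed to be main.

```yaml
# Sketch only: job names, npm scripts, and the @smoke tag are assumptions.
name: ci
on:
  push:                      # branch pushes and tag pushes
  pull_request:
  schedule:
    - cron: '0 2 * * *'      # nightly, 2am UTC

jobs:
  fast:                      # lint, type-check, unit: runs on every trigger
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: npm }
      - run: npm ci
      - run: npm run lint
      - run: npm run type-check
      - run: npm test

  e2e-smoke:                 # pull requests only
    if: github.event_name == 'pull_request'
    needs: fast
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: npm }
      - run: npm ci
      - run: npx playwright install --with-deps chromium
      - run: npx playwright test --grep @smoke

  e2e-full:                  # main, release tags, and the nightly schedule
    if: >-
      github.event_name == 'schedule' ||
      (github.event_name == 'push' &&
       (github.ref == 'refs/heads/main' || startsWith(github.ref, 'refs/tags/')))
    needs: fast
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: npm }
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npx playwright test
```

Skipped jobs still appear in the run summary, which makes it easy to see which tier a given trigger exercised.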

GitHub Actions Templates

For complete, copy-paste-ready workflow files, see references/github-actions-templates.md.

Key Concepts

Concurrency groups prevent wasted runs when a branch gets multiple pushes:

concurrency:
  group: tests-${{ github.ref }}
  cancel-in-progress: true
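
A variation worth considering, sketched under the assumption that the default branch is named main: cancel superseded runs on feature branches, but let in-progress runs on main finish so merged commits still get a complete result. The group name is illustrative.

```yaml
# Sketch: assumes the default branch is called "main".
concurrency:
  # One group per workflow and ref, so unrelated workflows never cancel each other
  group: ${{ github.workflow }}-${{ github.ref }}
  # false on main (in-progress runs finish), true on every other ref
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
```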

Matrix strategy for sharding tests across runners:

strategy:
  fail-fast: false
  matrix:
    shard: [1, 2, 3, 4]
steps:
  - run: npx playwright test --shard=${{ matrix.shard }}/4
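
For context, the fragment above might sit inside a job like the following. This is a sketch: the job name e2e, the timeout, and the extra total matrix key (used to avoid repeating the shard count) are assumptions.

```yaml
# Sketch of a complete sharded job; names and limits are assumptions.
e2e:
  runs-on: ubuntu-latest
  timeout-minutes: 15
  strategy:
    fail-fast: false         # let every shard finish so all failures are visible
    matrix:
      shard: [1, 2, 3, 4]
      total: [4]             # single place to change the shard count
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-node@v4
      with: { node-version: 20, cache: npm }
    - run: npm ci
    - run: npx playwright install --with-deps chromium
    - run: npx playwright test --shard=${{ matrix.shard }}/${{ matrix.total }}
```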

Caching to avoid reinstalling on every run:

# Node modules: handled by setup-node's cache option
- uses: actions/setup-node@v4
  with: { node-version: 20, cache: npm }

# Playwright browsers: cache separately
- name: Cache Playwright browsers
  id: playwright-cache
  uses: actions/cache@v4
  with:
    path: ~/.cache/ms-playwright
    key: playwright-${{ runner.os }}-${{ hashFiles('package-lock.json') }}

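- name: Install Playwright browsers
  if: steps.playwright-cache.outputs.cache-hit != 'true'
  run: npx playwright install --with-deps chromium

# The cache holds browser binaries only; OS packages must still be installed on a cache hit
- name: Install Playwright system dependencies
  if: steps.playwright-cache.outputs.cache-hit == 'true'
  run: npx playwright install-deps chromium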

Artifact management for reports and traces:

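- uses: actions/upload-artifact@v4
  if: ${{ !cancelled() }}
  with:
    name: test-results-${{ matrix.shard }}
    path: |
      test-results/
      playwright-report/
      blob-report/
      coverage/
    retention-days: 7

Merging sharded reports into a single HTML report. Each shard must run with Playwright's blob reporter (for example, reporter: 'blob' in playwright.config.ts when running in CI) so that blob-report/ exists in the uploaded artifact:

merge-reports:
  needs: e2e
  if: ${{ !cancelled() }}
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-node@v4
      with: { node-version: 20, cache: npm }
    - run: npm ci
    # merge-multiple extracts every shard's artifact into the same directory tree
    - uses: actions/download-artifact@v4
      with: { pattern: 'test-results-*', path: all-results, merge-multiple: true }
    # merge-reports reads blob reports, not per-shard HTML reports or test-results/
    - run: npx playwright merge-reports --reporter=html all-results/blob-report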
    - uses: actions/upload-artifact@v4
      with: { name: playwright-report, path: playwright-report/, retention-days: 14 }

Status Checks Configuration

Required status checks protect your main branch. Configure in GitHub repo settings:

Go to Settings > Branches > Branch protection rules
Enable "Require status checks to pass before merging"
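Add these required checks: lint, unit-tests, and every e2e shard. Matrix jobs report one check per shard (for example, e2e (1) through e2e (4)), and each must be added separately.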
Enable "Require branches to be up to date before merging"
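
Because each shard reports as its own check, changing the shard count means updating branch protection. One common workaround, sketched here under the assumption that the sharded matrix job is named e2e, is a single aggregating job that is marked as the required check instead:

```yaml
# Sketch: "e2e" is assumed to be the sharded matrix job; "e2e-gate" is a hypothetical name.
e2e-gate:
  needs: e2e
  if: ${{ always() }}        # run even when a shard fails, so the check always reports
  runs-on: ubuntu-latest
  steps:
    - name: Fail unless every shard succeeded
      run: |
        if [ "${{ needs.e2e.result }}" != "success" ]; then
          echo "::error::e2e finished with result: ${{ needs.e2e.result }}"
          exit 1
        fi
```

With this in place, only e2e-gate needs to be listed as a required status check for E2E.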

GitLab CI Templates

# .gitlab-ci.yml
stages: [validate, test, e2e, deploy]

variables:
  NODE_ENV: test
  npm_config_cache: '$CI_PROJECT_DIR/.npm'

cache:
  key: ${CI_COMMIT_REF_SLUG}
  paths: [.npm/]  # npm ci deletes node_modules/ before installing, so only the npm cache is worth storing

lint:
  stage: validate
  image: node:20-alpine
  script: [npm ci --prefer-offline, npm run lint, npm run type-check]

unit-tests:
  stage: test
  image: node:20-alpine
  script: [npm ci --prefer-offline, 'npm run test:ci -- --coverage']
  coverage: '/All files[^|]*\|[^|]*\s+([\d\.]+)/'
  artifacts:
    when: always
    paths: [coverage/]
    reports:
      junit: junit.xml
      coverage_report: { coverage_format: cobertura, path: coverage/cobertura-coverage.xml }

e2e-tests:
  stage: e2e
  image: mcr.microsoft.com/playwright:v1.49.0-noble
  parallel: 4  # GitLab provides CI_NODE_INDEX and CI_NODE_TOTAL automatically
  script:
    - npm ci --prefer-offline
    - npm run build
    - npm start &
    - npx wait-on http://localhost:3000 --timeout 60000
    - npx playwright test --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL
  artifacts:
    when: always
    paths: [test-results/, playwright-report/]
    expire_in: 7 days
    reports:
      junit: test-results/junit.xml  # GitLab parses this and shows results in MR UI
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH

deploy-staging:
  stage: deploy
  script: [./deploy.sh staging]
  rules: [{ if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH' }]
  needs: [unit-tests, e2e-tests]
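
The same template can be extended with a non-blocking quarantine job that runs only on pipeline schedules. This is a sketch: the job name and time limit are assumptions, and the @flaky tag follows the convention described under Flaky Test Quarantine.

```yaml
# Sketch: job name and timeout are assumptions; add this job to the .gitlab-ci.yml above.
e2e-quarantine:
  stage: e2e
  image: mcr.microsoft.com/playwright:v1.49.0-noble
  allow_failure: true        # non-blocking, like continue-on-error in GitHub Actions
  interruptible: true        # may be cancelled when a newer pipeline starts on the same ref
  timeout: 15 minutes        # job-level limit so a hung test cannot hold a runner
  script:
    - npm ci --prefer-offline
    - npm run build
    - npm start &
    - npx wait-on http://localhost:3000 --timeout 60000
    - npx playwright test --grep @flaky
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"   # only in scheduled (nightly) pipelines
```

Note that interruptible only takes effect when auto-cancellation of redundant pipelines is enabled for the project.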

Advanced Patterns

Test Result Publishing to PR Comments

Post test results directly on the PR for visibility:

# Add after test step in GitHub Actions
- name: Publish test results
  uses: dorny/test-reporter@v1
  if: ${{ !cancelled() }}
  with:
    name: Test Results
    path: test-results/junit.xml
    reporter: jest-junit  # or java-junit for Playwright

- name: Comment coverage on PR
  uses: marocchino/sticky-pull-request-comment@v2
  if: github.event_name == 'pull_request'
  with:
    header: coverage
    path: coverage/coverage-summary.md

Conditional Test Execution

Only test what changed to save CI time:

- name: Detect changed files
  id: changes
  uses: dorny/paths-filter@v3
  with:
    filters: |
      frontend:
        - 'src/**'
        - 'e2e/**'
      backend:
        - 'api/**'
        - 'lib/**'
      config:
        - 'package.json'
        - 'playwright.config.ts'

- name: Run E2E tests
  if: steps.changes.outputs.frontend == 'true' || steps.changes.outputs.config == 'true'
  run: npx playwright test

- name: Run API tests
  if: steps.changes.outputs.backend == 'true' || steps.changes.outputs.config == 'true'
  run: npm run test:api

Test Timing Optimization

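Playwright --shard assigns whole test files to shards by default, and individual tests when fullyParallel is enabled. It does not use timing data from previous runs, so a few slow files can leave one shard running much longer than the others. To find them, generate timing data: npx playwright test --reporter=json | jq '[.suites[].specs[] | {file: .file, duration: .tests[].results[].duration}]' (this reads specs at the top level of each file only; specs inside describe blocks sit in nested .suites). For Jest, use jest-slow-test-reporter to identify slow tests.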

Flaky Test Quarantine

Separate flaky tests into a non-blocking job:

e2e-stable:
  runs-on: ubuntu-latest
  steps:
    - run: npx playwright test --grep-invert @flaky
  # This job is required for merge

e2e-quarantine:
  runs-on: ubuntu-latest
  continue-on-error: true  # Non-blocking
  steps:
    - run: npx playwright test --grep @flaky
    - name: Report flaky results
      if: failure()
      run: |
        echo "::warning::Quarantined tests failed. Review and fix or remove."

Tag flaky tests in your test files:

test('sometimes fails due to race condition @flaky', async ({ page }) => {
  // This test is quarantined -- runs in CI but doesn't block merges
});

Track flaky tests over time. If a quarantined test passes 10 consecutive runs, remove the @flaky tag.

Cache Strategies

Layer	Path	Cache Key
Node modules	(handled by `setup-node` `cache: npm`)	automatic
Playwright browsers	`~/.cache/ms-playwright`	`playwright-{os}-{hash(package-lock.json)}`
Build cache (Next.js)	`.next/cache`	`nextjs-{os}-{hash(lockfile)}-{hash(src)}`
Test fixtures	`e2e/fixtures/.cache`	`test-data-{hash(seed.sql)}`

Use actions/cache@v4 for layers 2-4. Add restore-keys for build caches to allow partial matches.
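
As an illustration of restore-keys, here is a sketch of the Next.js build cache row. It assumes an npm lockfile at the repository root and source files under src/; adjust both to the project layout.

```yaml
# Sketch: the lockfile location and the src/** glob are assumptions.
- name: Cache Next.js build
  uses: actions/cache@v4
  with:
    path: .next/cache
    # Exact match requires the same OS, lockfile, and source tree
    key: nextjs-${{ runner.os }}-${{ hashFiles('package-lock.json') }}-${{ hashFiles('src/**') }}
    # Otherwise fall back to the most recent cache built with the same lockfile
    restore-keys: |
      nextjs-${{ runner.os }}-${{ hashFiles('package-lock.json') }}-
```

A partial match still lets the build reuse most of its cached work, and the step saves a fresh cache under the exact key when the job finishes.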

Slack/Teams Notification on Failure

Use slackapi/slack-github-action@v2.0.0 with webhook-type: incoming-webhook. Condition on if: failure() && github.ref == 'refs/heads/main' so notifications only fire for main branch failures. See references/github-actions-templates.md (Nightly Full Suite) for a complete example.
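
A minimal sketch of such a step follows. The secret name SLACK_WEBHOOK_URL and the message text are assumptions; the step belongs at the end of the job whose failure should trigger the notification.

```yaml
# Sketch: SLACK_WEBHOOK_URL is a hypothetical secret name.
- name: Notify Slack on main failure
  if: failure() && github.ref == 'refs/heads/main'
  uses: slackapi/slack-github-action@v2.0.0
  with:
    webhook: ${{ secrets.SLACK_WEBHOOK_URL }}
    webhook-type: incoming-webhook
    payload: |
      text: "CI failed on main: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
```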

Quality Gates

Gate Definitions

Gate	When	Required Checks	Blocking?
PR Gate	PR opened/updated	lint, type-check, unit tests	Yes
Merge Gate	Before merge to main	+ E2E smoke suite	Yes
Deploy Gate	Before production deploy	+ full E2E, visual regression, perf budget	Yes
Nightly Gate	Scheduled 2am UTC daily	full suite, security scan, a11y audit	Alert only

PR Gate (fast, under 3 minutes)

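pr-gate:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-node@v4
      with: { node-version: 20, cache: npm }
    - run: npm ci
    - run: npm run lint
    - run: npm run type-check
    # Assumes npm test runs Jest; the json-summary reporter writes coverage/coverage-summary.json
    - run: npm test -- --ci --coverage --coverageReporters=json-summary
    - run: |
        COVERAGE=$(jq '.total.lines.pct' coverage/coverage-summary.json)
        if (( $(echo "$COVERAGE < 80" | bc -l) )); then
          echo "::error::Coverage $COVERAGE% is below 80% threshold"
          exit 1
        fi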

Merge Gate (comprehensive, under 10 minutes)

Requires PR Gate + E2E smoke tests. Configure as required status checks in branch protection.

Deploy Gate (full confidence, under 15 minutes)

deploy-gate:
  needs: [unit-tests, e2e-tests, visual-tests]
  runs-on: ubuntu-latest
  steps:
    - name: Check performance budget
      run: |
        npx @lhci/cli autorun --config=lighthouserc.json
    - name: Deploy to production
      if: success()
      run: ./deploy.sh production

Nightly Gate (thorough, up to 30 minutes)

on:
  schedule:
    - cron: '0 2 * * *'  # 2am UTC daily

Runs everything: full E2E suite across all browsers, security scan (npm audit, Snyk), accessibility audit (axe-core), and the flaky quarantine suite. Results go to Slack, not as blocking checks.
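
A trimmed-down sketch of such a nightly workflow is shown below. The job name and step layout are assumptions, and the Snyk and axe-core steps are left out because their setup is project-specific.

```yaml
# Sketch: job name and steps are assumptions; Snyk and axe-core steps omitted.
name: nightly
on:
  schedule:
    - cron: '0 2 * * *'      # 2am UTC daily
  workflow_dispatch:         # allow manual runs when debugging the nightly

jobs:
  nightly:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: npm }
      - run: npm ci
      - run: npx playwright install --with-deps
      - name: Full E2E suite, all configured browsers
        run: npx playwright test --grep-invert @flaky
      - name: Quarantined tests (never fails the job)
        if: ${{ !cancelled() }}
        continue-on-error: true
        run: npx playwright test --grep @flaky
      - name: Dependency audit
        if: ${{ !cancelled() }}
        run: npm audit --audit-level=high
```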

Anti-Patterns

1. Running all tests on every commit

A 20-minute full suite on every push destroys developer velocity. Use the pipeline architecture above: fast tests on push, comprehensive tests on PR and merge.

2. No artifact storage

When CI tests fail, developers need traces, screenshots, and logs to debug. Without artifacts, every failure requires a "reproduce locally" cycle that wastes hours.

3. Retrying flaky tests without tracking them

Adding retries: 3 hides flakiness. The test passes on retry, the report is green, but the underlying race condition persists. Quarantine flaky tests, track them in a dashboard, and fix the root cause.

4. CI-only failures without local reproduction steps

If a test only fails in CI, document why (e.g., different timezone, missing env var, screen resolution). Add a Makefile or script that replicates CI conditions locally:

# Reproduce CI environment locally (keep the image tag in sync with the installed Playwright version)
docker run --rm --ipc=host -v "$(pwd)":/work -w /work \
  mcr.microsoft.com/playwright:v1.49.0-noble \
  npx playwright test --project=chromium

5. Shared state between CI jobs

Jobs that read files produced by other jobs, without passing them through artifacts or declaring needs dependencies, fail or pick up stale data. Each job starts on a fresh runner. Use actions/upload-artifact and actions/download-artifact to pass data.
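
As a sketch of the artifact hand-off, the following passes a build output from one job to the next. The job names, the build script, and the dist/ directory are assumptions.

```yaml
# Sketch: job names, the build script, and dist/ are assumptions.
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: npm }
      - run: npm ci
      - run: npm run build
      - uses: actions/upload-artifact@v4
        with: { name: build-output, path: dist/, retention-days: 1 }

  e2e:
    needs: build             # guarantees the artifact exists before this job starts
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: npm }
      - run: npm ci
      - uses: actions/download-artifact@v4
        with: { name: build-output, path: dist/ }
      - run: npx playwright install --with-deps chromium
      - run: npx playwright test
```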

6. No concurrency controls

Multiple CI runs for the same branch waste resources. Always use concurrency groups with cancel-in-progress: true.

7. Hardcoded secrets in workflow files

Never put tokens, passwords, or API keys in YAML. Use GitHub Actions secrets (${{ secrets.MY_SECRET }}) or GitLab CI/CD variables.
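
In GitHub Actions, a secret is typically exposed to a single step through env rather than interpolated into the command line. A minimal sketch, where STAGING_API_TOKEN is a hypothetical secret name:

```yaml
# Sketch: STAGING_API_TOKEN is a hypothetical secret defined in the repository settings.
- name: Run E2E against staging
  env:
    API_TOKEN: ${{ secrets.STAGING_API_TOKEN }}   # visible to this step only
  run: npx playwright test
```

Scoping the variable to one step limits how much of the workflow can read it.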

8. Ignoring job timeouts

A stuck test can consume a runner for hours. Always set timeout-minutes on jobs and actionTimeout / navigationTimeout in test configs.
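
On the GitHub Actions side, timeouts can be set per job and per step. The limits below are assumptions to adapt to the suite's real duration.

```yaml
# Sketch: the specific limits are assumptions.
e2e:
  runs-on: ubuntu-latest
  timeout-minutes: 15        # job-level cap; the GitHub default is 360 minutes
  steps:
    - uses: actions/checkout@v4
    - run: npm ci
      timeout-minutes: 5     # step-level cap for dependency installation
    - run: npx playwright test
      timeout-minutes: 10    # step-level cap for the test run itself
```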

Done When

Pipeline runs unit, integration, and E2E tests on every PR
Flaky tests quarantined in a non-blocking job with a tracking ticket — not silently retried without a plan
Test artifacts (reports, screenshots, traces) uploaded and accessible for every CI run
Concurrency groups configured to prevent redundant runs on the same branch
Secrets managed via the CI secrets store — no tokens or passwords hardcoded in workflow files

Related Skills

playwright-automation -- E2E test framework setup, Page Object Model, and test patterns.
qa-metrics -- Test result dashboards, coverage tracking, and flakiness monitoring.
self-healing-tests -- Strategies for reducing test maintenance and auto-recovering from UI changes.
test-strategy -- Overall test planning, pyramid design, and risk-based test selection.

For complete, copy-paste-ready GitHub Actions workflow files, see references/github-actions-templates.md.