CI Optimization Specialist

Quick Start

This skill speeds up GitHub Actions workflows through:

  1. Test sharding: Parallel test execution across multiple runners
  2. Caching: pnpm store, Playwright browsers, Vite build cache
  3. Workflow optimization: Job dependencies and concurrency

When to Use

  • CI execution time exceeds 10-15 minutes
  • GitHub Actions costs are too high
  • Developers need faster feedback loops
  • Tests are not parallelized

Test Sharding Setup

Basic Pattern (Automatic Distribution)

Add a matrix strategy to the E2E job in .github/workflows/ci.yml:

e2e-tests:
  name: 🧪 E2E Tests [Shard ${{ matrix.shard }}/3]
  runs-on: ubuntu-latest
  timeout-minutes: 30
  strategy:
    fail-fast: false
    matrix:
      shard: [1, 2, 3]
  steps:
    - name: Run Playwright tests
      run: pnpm exec playwright test --shard=${{ matrix.shard }}/3
      env:
        CI: true

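Expected improvement: 60-65% shorter E2E wall-clock time with 3 shards. The ideal limit is 1 - 1/3, about 67%; per-shard setup (checkout, install, browser launch) accounts for the gap.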
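The job above repeats the shard total in the job name, the matrix, and the command. A minimal sketch that carries the total in the matrix instead, so it is stated once. The setup steps, the Node version, and the `shardIndex`/`shardTotal` names are assumptions rather than part of the original workflow, and `pnpm/action-setup` is assumed to pick up the pnpm version from the `packageManager` field in package.json:

```yaml
e2e-tests:
  name: E2E Tests [Shard ${{ matrix.shardIndex }}/${{ matrix.shardTotal }}]
  runs-on: ubuntu-latest
  timeout-minutes: 30
  strategy:
    fail-fast: false
    matrix:
      shardIndex: [1, 2, 3]
      shardTotal: [3] # single value: the total lives in one place
  steps:
    - uses: actions/checkout@v4
    - uses: pnpm/action-setup@v4 # assumes "packageManager" is set in package.json
    - uses: actions/setup-node@v4
      with:
        node-version: 20 # assumed version
    - run: pnpm install --frozen-lockfile
    - run: pnpm exec playwright install --with-deps chromium
    - name: Run Playwright tests
      run: pnpm exec playwright test --shard=${{ matrix.shardIndex }}/${{ matrix.shardTotal }}
```

To change the shard count, extend the `shardIndex` list and update `shardTotal`; the job name and command follow automatically.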

Advanced Pattern (Manual Distribution)

For unbalanced test suites, manually distribute by duration:

strategy:
  fail-fast: false
  matrix:
    include:
      - shard: 1
        pattern: 'ai-generation|project-management' # Heavy tests
      - shard: 2
        pattern: 'project-wizard|settings|publishing' # Medium tests
      - shard: 3
        pattern: 'world-building|versioning|mock-validation' # Light tests
steps:
  # Each shard runs only the tests matching its pattern
  - name: Run Playwright tests
    run: pnpm exec playwright test --grep "${{ matrix.pattern }}"

Critical Caching Patterns

pnpm Store Cache

ALWAYS cache pnpm store to avoid re-downloading packages:

- name: Get pnpm store directory
  id: pnpm-cache
  shell: bash
  run: echo "STORE_PATH=$(pnpm store path)" >> $GITHUB_OUTPUT

- name: Setup pnpm cache
  uses: actions/cache@v4
  with:
    path: ${{ steps.pnpm-cache.outputs.STORE_PATH }}
    key: ${{ runner.os }}-pnpm-store-${{ hashFiles('**/pnpm-lock.yaml') }}
    restore-keys: |
      ${{ runner.os }}-pnpm-store-

Playwright Browsers Cache

Cache 500MB+ browser binaries:

- name: Cache Playwright browsers
  uses: actions/cache@v4
  id: playwright-cache
  with:
    path: ~/.cache/ms-playwright
    key: ${{ runner.os }}-playwright-${{ hashFiles('**/pnpm-lock.yaml') }}

- name: Install Playwright browsers
  if: steps.playwright-cache.outputs.cache-hit != 'true'
  run: pnpm exec playwright install --with-deps chromium

- name: Install Playwright system dependencies
  if: steps.playwright-cache.outputs.cache-hit == 'true'
  run: pnpm exec playwright install-deps chromium

Vite Build Cache

For monorepos or frequent builds:

- name: Cache Vite build
  uses: actions/cache@v4
  with:
    path: |
      dist/
      node_modules/.vite/
    key: ${{ runner.os }}-vite-${{ hashFiles('src/**', 'vite.config.ts') }}

Workflow Optimization

Job Dependencies

Use needs to control execution flow:

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - name: Build
        run: pnpm run build
      - name: Run unit tests
        run: pnpm test

  e2e-tests:
    needs: build-and-test # Wait for build to complete
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3]
    steps:
      - name: Run E2E tests
        run: pnpm exec playwright test --shard=${{ matrix.shard }}/3
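Each job runs on a fresh runner, so `needs` only orders the jobs: files produced by `build-and-test` are not visible to `e2e-tests`. If the E2E job should reuse the build output, one option is to hand it over as an artifact. A sketch, assuming the build writes to `dist/` and using a hypothetical artifact name `build-output`:

```yaml
jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      # checkout and dependency install omitted for brevity
      - name: Build
        run: pnpm run build
      - name: Upload build output
        uses: actions/upload-artifact@v4
        with:
          name: build-output # hypothetical name
          path: dist/
          retention-days: 1 # only needed for the duration of the run

  e2e-tests:
    needs: build-and-test
    runs-on: ubuntu-latest
    steps:
      # checkout and dependency install omitted for brevity
      - name: Download build output
        uses: actions/download-artifact@v4
        with:
          name: build-output
          path: dist/
```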

Concurrency Control

Cancel an in-progress run when a newer commit starts the same workflow on the same branch:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
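Cancelling in-progress runs suits pull requests, but on the default branch it can leave commits without a completed run. A variant that cancels only off the default branch, assuming that branch is named `main`:

```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  # Evaluates to false on main, so an in-progress run there is allowed to finish
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
```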

Artifact Management

Per-Shard Artifacts

Upload test reports from each shard:

- name: Upload Playwright report
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: playwright-report-shard-${{ matrix.shard }}-${{ github.sha }}
    path: playwright-report/
    retention-days: 7
    compression-level: 6
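Per-shard HTML reports have to be opened one at a time. Playwright's blob reporter and its `merge-reports` command can combine them into a single report. A sketch of that approach, assuming the sharded job is named `e2e-tests`; the artifact names and the `all-blob-reports` directory are illustrative:

```yaml
jobs:
  e2e-tests:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3]
    steps:
      # checkout, install, and browser setup omitted for brevity
      - name: Run Playwright tests
        run: pnpm exec playwright test --shard=${{ matrix.shard }}/3 --reporter=blob
      - name: Upload blob report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: blob-report-${{ matrix.shard }}
          path: blob-report/
          retention-days: 1

  merge-reports:
    if: always() # merge even when a shard failed
    needs: e2e-tests
    runs-on: ubuntu-latest
    steps:
      # checkout and dependency install omitted for brevity
      - name: Download blob reports
        uses: actions/download-artifact@v4
        with:
          path: all-blob-reports
          pattern: blob-report-*
          merge-multiple: true
      - name: Merge into one HTML report
        run: pnpm exec playwright merge-reports --reporter html ./all-blob-reports
      - name: Upload merged report
        uses: actions/upload-artifact@v4
        with:
          name: playwright-report-${{ github.sha }}
          path: playwright-report/
          retention-days: 7
```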

Artifact Cleanup

Set short retention for test reports to reduce storage costs:

retention-days: 7 # Default is 90 days
compression-level: 6 # 0 = none, 9 = maximum; 6 is the action's default

Performance Monitoring

Expected Benchmarks

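| Optimization             | Before  | After    | Improvement |
| ------------------------ | ------- | -------- | ----------- |
| Test sharding (3 shards) | 27 min  | 9-10 min | 60-65%      |
| pnpm cache hit           | 2-3 min | 10-15s   | 85-90%      |
| Playwright cache hit     | 1-2 min | 5-10s    | 90-95%      |
| Vite build cache         | 1-2 min | 5-10s    | 90-95%      |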

Regression Detection

Set timeout thresholds as guardrails:

timeout-minutes: 30 # Fail if shard exceeds 30 minutes

Monitor shard execution times and rebalance if one shard consistently exceeds others by >2 minutes.
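One low-effort way to collect per-shard timings is to write each shard's duration to the job summary. A sketch under that assumption; the summary wording is illustrative, and this step would replace the plain test step rather than run alongside it:

```yaml
- name: Run Playwright tests (timed)
  shell: bash
  run: |
    start=$(date +%s)
    status=0
    # Capture the exit code so the duration is recorded even when tests fail
    pnpm exec playwright test --shard=${{ matrix.shard }}/3 || status=$?
    echo "Shard ${{ matrix.shard }}: $(( $(date +%s) - start ))s" >> "$GITHUB_STEP_SUMMARY"
    exit "$status"
```

Comparing the summaries across a few runs shows whether one shard is consistently slower than the others.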

Optimization Workflow

Phase 1: Baseline

  1. Record current CI execution times
  2. Identify slowest jobs
  3. Measure cache hit rates (check Actions logs)

Phase 2: Implement Caching

  1. Add pnpm store cache (highest impact)
  2. Add Playwright browser cache
  3. Add build caches if applicable
  4. Verify cache keys work correctly
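To verify step 4, the cache step's `cache-hit` output can be printed in the log. A sketch based on the pnpm store cache step; the `pnpm-store-cache` id is hypothetical and is added only so the output can be referenced:

```yaml
- name: Setup pnpm cache
  id: pnpm-store-cache # hypothetical id, added so the output can be read
  uses: actions/cache@v4
  with:
    path: ${{ steps.pnpm-cache.outputs.STORE_PATH }}
    key: ${{ runner.os }}-pnpm-store-${{ hashFiles('**/pnpm-lock.yaml') }}
    restore-keys: |
      ${{ runner.os }}-pnpm-store-

- name: Report cache result
  # 'true' means the exact key matched; any other value means it did not
  run: echo "pnpm store cache-hit=${{ steps.pnpm-store-cache.outputs.cache-hit }}"
```

A value other than `true` on a run where the lock file did not change suggests the key is less stable than intended.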

Phase 3: Implement Sharding

  1. Calculate optimal shard count (target 3-5 min per shard)
  2. Add matrix strategy to workflow
  3. Test locally: pnpm exec playwright test --shard=1/3
  4. Monitor shard balance in CI

Phase 4: Monitor & Adjust

  1. Track execution times over 5-10 runs
  2. Identify unbalanced shards (>2 min variance)
  3. Adjust shard distribution if needed
  4. Set up alerts for regressions

Common Issues

Shard imbalance (one shard takes 2x longer)

  • Use manual distribution with --grep patterns
  • Balance by duration, not test count: give a shard with heavy tests fewer test groups, and combine the light groups on one shard

Cache misses despite correct key

  • Verify hashFiles glob patterns match actual files
  • Check if lock file changes on every run (shouldn't happen)

Playwright install fails with cache hit

  • The browser cache holds only the browser binaries, so install system dependencies separately on a cache hit: pnpm exec playwright install-deps chromium

Tests fail in CI but pass locally

  • Check environment variables (CI=true may affect behavior)
  • Verify mock setup works in parallel execution
  • Increase timeouts for slow operations

Success Criteria

  • CI execution time < 15 minutes total
  • Cache hit rate > 85% for dependencies
  • Shard execution time variance < 2 minutes
  • Zero timeout failures from slow tests
