CI/CD in Apache Beam

Overview

Apache Beam uses GitHub Actions for CI/CD. Workflows are located in .github/workflows/.

Workflow Types

PreCommit Workflows

  • Run on PRs and merges
  • Validate code changes before merge
  • Naming: beam_PreCommit_*.yml

PostCommit Workflows

  • Run after merge and on schedule
  • More comprehensive testing
  • Naming: beam_PostCommit_*.yml

Scheduled Workflows

  • Run nightly on master
  • Check for external dependency impacts
  • Tag master with nightly-master

Key Workflows

PreCommit

Workflow                      Description
beam_PreCommit_Java.yml       Java build and tests
beam_PreCommit_Python.yml     Python tests
beam_PreCommit_Go.yml         Go tests
beam_PreCommit_RAT.yml        License header checks
beam_PreCommit_Spotless.yml   Code formatting

PostCommit - Java

Workflow                                     Description
beam_PostCommit_Java.yml                     Full Java test suite
beam_PostCommit_Java_ValidatesRunner_*.yml   Runner validation tests
beam_PostCommit_Java_Examples_*.yml          Example pipeline tests

PostCommit - Python

Workflow                                       Description
beam_PostCommit_Python.yml                     Full Python test suite
beam_PostCommit_Python_ValidatesRunner_*.yml   Runner validation
beam_PostCommit_Python_Examples_*.yml          Examples

Load & Performance Tests

Workflow                      Description
beam_LoadTests_*.yml          Load testing
beam_PerformanceTests_*.yml   I/O performance
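
The file-name prefixes listed above can be used to group a checkout's workflows by category. The following is a minimal sketch; the function names classify_workflow and summarize_workflows are hypothetical helpers, not part of the Beam repository.

```shell
# Map a workflow file name to its category using the documented prefixes.
classify_workflow() {
  case "$(basename "$1")" in
    beam_PreCommit_*.yml)        echo "precommit" ;;
    beam_PostCommit_*.yml)       echo "postcommit" ;;
    beam_LoadTests_*.yml)        echo "load" ;;
    beam_PerformanceTests_*.yml) echo "performance" ;;
    *)                           echo "other" ;;
  esac
}

# Count workflows per category in a directory (defaults to .github/workflows).
summarize_workflows() {
  dir="${1:-.github/workflows}"
  for f in "$dir"/*.yml; do
    [ -e "$f" ] || continue          # skip the literal pattern if nothing matches
    classify_workflow "$f"
  done | sort | uniq -c
}
```

Running summarize_workflows from the repository root prints one count per category, which gives a quick overview of how many workflows of each kind exist.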

Triggering Tests

Automatic

  • PRs trigger PreCommit tests
  • Merges trigger PostCommit tests

Triggering Specific Workflows

Use trigger files to run specific workflows.
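
A minimal sketch of the trigger-file approach, assuming trigger files live under .github/trigger_files/ and are JSON files named after the workflow. Both the directory and the file contents shown here are assumptions to verify against the repository, and bump_trigger_file is a hypothetical helper.

```shell
# Assumed location of trigger files; verify against the repository.
TRIGGER_DIR="${TRIGGER_DIR:-.github/trigger_files}"

# Rewrite the trigger file for a workflow so that the PR diff touches it.
bump_trigger_file() {
  workflow="$1"                          # e.g. beam_PostCommit_Python
  file="$TRIGGER_DIR/$workflow.json"
  mkdir -p "$TRIGGER_DIR"
  # The JSON layout below is illustrative, not the project's required schema.
  printf '{\n  "comment": "Modify this file to trigger the workflow",\n  "modification": "%s"\n}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" > "$file"
  echo "$file"                           # print the path that was written
}
```

After calling bump_trigger_file beam_PostCommit_Python, the changed file would be committed and pushed to the PR branch so that the matching workflow picks it up.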

Workflow Dispatch

Most workflows support manual triggering via GitHub UI.
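
Manual dispatch can also be done from a terminal with the GitHub CLI, provided the workflow declares workflow_dispatch and the caller has permission to run it. The sketch below wraps gh workflow run; the helper name dispatch_workflow and the DRY_RUN switch are illustrative assumptions.

```shell
# Dispatch a workflow manually; needs the GitHub CLI (gh) and suitable permissions.
dispatch_workflow() {
  workflow="$1"          # workflow file name, e.g. beam_PreCommit_Java.yml
  ref="${2:-master}"     # branch or tag to run against
  set -- gh workflow run "$workflow" --repo apache/beam --ref "$ref"
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"            # print the command instead of running it
  else
    "$@"
  fi
}
```

Setting DRY_RUN=1 before the call prints the command for review, which is useful before dispatching against a shared repository.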

Understanding Test Results

Finding Logs

  1. Go to PR → Checks tab
  2. Click on failed workflow
  3. Expand failed job
  4. View step logs
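
Step logs for Gradle builds can be long. Assuming a step log has been saved to a local file, the following sketch pulls out the tasks that Gradle marked as failed; it relies on Gradle's "> Task :path FAILED" output lines, and failed_gradle_tasks is a hypothetical helper name.

```shell
# Print the Gradle task paths marked FAILED in a saved log file.
failed_gradle_tasks() {
  grep -o '> Task :[^ ]* FAILED' "$1" \
    | sed -e 's/^> Task //' -e 's/ FAILED$//' \
    | sort -u                      # de-duplicate repeated mentions
}
```

The resulting task paths can then be passed to ./gradlew to reproduce the failure locally.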

Common Failure Patterns

Flaky Tests

  • Random failures unrelated to change
  • Solution: Use trigger files to re-run the specific workflow.

Timeout

  • Increase the timeout in the workflow file if the longer runtime is justified
  • Otherwise, optimize the slow test
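
Before changing a timeout, it helps to see what is currently configured. GitHub Actions reads job and step limits from the timeout-minutes key; the sketch below lists every occurrence, assuming the workflows set their limits that way. The helper name list_timeouts is hypothetical.

```shell
# List every timeout-minutes setting in the workflow files of a directory.
list_timeouts() {
  dir="${1:-.github/workflows}"
  # -H always prints the file name, -n adds the line number.
  grep -Hn 'timeout-minutes:' "$dir"/*.yml
}
```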

Resource Exhaustion

  • Usually caused by GCP quota limits
  • Check the quotas in the GCP project settings

GCP Credentials

Workflows requiring GCP access use these secrets:

  • GCP_PROJECT_ID - Project ID (e.g., apache-beam-testing)
  • GCP_REGION - Region (e.g., us-central1)
  • GCP_TESTING_BUCKET - Temp storage bucket
  • GCP_PYTHON_WHEELS_BUCKET - Python wheels bucket
  • GCP_SA_EMAIL - Service account email
  • GCP_SA_KEY - Base64-encoded service account key

Required IAM roles:

  • Storage Admin
  • Dataflow Admin
  • Artifact Registry Writer
  • BigQuery Data Editor
  • Service Account User
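
Because GCP_SA_KEY is stored base64-encoded, it has to be decoded before tools can use it. A minimal sketch, assuming the secret is available as a shell variable; write_sa_key and the default output path gcp-sa-key.json are hypothetical.

```shell
# Decode a base64-encoded service account key into a file readable only by the owner.
write_sa_key() {
  key_b64="$1"                       # contents of the GCP_SA_KEY secret
  out="${2:-gcp-sa-key.json}"        # hypothetical output path
  # The subshell keeps the restrictive umask from leaking into the caller.
  ( umask 077; printf '%s' "$key_b64" | base64 --decode > "$out" )
  echo "$out"                        # print the path of the decoded key
}
```

The decoded file could then be handed to gcloud auth activate-service-account --key-file=..., and should be deleted once the job finishes.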

Self-hosted vs GitHub-hosted Runners

Self-hosted (majority of workflows)

  • Pre-configured with dependencies
  • GCP credentials pre-configured
  • Naming: beam_*.yml

GitHub-hosted

  • Used for cross-platform testing (Linux, macOS, Windows)
  • May need explicit credential setup

Workflow Structure

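name: Workflow Name
on:
  push:
    branches: [master]          # run after merges to master
  pull_request:
    branches: [master]          # run on PRs targeting master
  schedule:
    - cron: '0 0 * * *'         # daily at 00:00 UTC
  workflow_dispatch:            # allow manual runs from the GitHub UI

jobs:
  build:
    runs-on: [self-hosted, ...] # remaining runner labels elided
    steps:
      - uses: actions/checkout@v4
      - name: Run Gradle
        run: ./gradlew :task:name   # placeholder task path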

Local Debugging

Run Same Commands as CI

Check the run steps in the workflow file and run the same commands locally, for example:

./gradlew :sdks:java:core:test
./gradlew :sdks:python:test
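
To find which Gradle invocations a workflow contains, a simple text search over the workflow file is often enough. This sketch assumes the commands appear on lines that mention "gradle"; gradle_steps is a hypothetical helper name.

```shell
# Show the lines of a workflow file that mention Gradle, with line numbers.
gradle_steps() {
  grep -n -i 'gradle' "$1"
}
```

For example, gradle_steps .github/workflows/beam_PreCommit_Java.yml prints the matching lines, which can then be copied into a local terminal.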

Common Issues

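  • Clear the Gradle caches: rm -rf ~/.gradle .gradle (this also deletes any settings in ~/.gradle/gradle.properties and forces dependencies to be downloaded again)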
  • Remove build directory: rm -rf build
  • Check Java version matches CI
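
To compare the local Java version with the one CI uses, it helps to reduce the java -version banner to a major version number. The sketch below handles both the legacy 1.x scheme and the modern scheme; java_major is a hypothetical helper, and the version CI expects has to be read from the workflow file.

```shell
# Extract the major Java version from `java -version` output passed as a string.
java_major() {
  ver="$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)"
  case "$ver" in
    1.*) echo "$ver" | cut -d. -f2 ;;   # legacy scheme: 1.8.0_281 -> 8
    *)   echo "$ver" | cut -d. -f1 ;;   # modern scheme: 11.0.20 -> 11
  esac
}
```

Typical usage is java_major "$(java -version 2>&1)", since the JVM prints its version banner to standard error.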

Snapshot Builds

Locations

Release Workflows

Workflow                           Purpose
cut_release_branch.yml             Create release branch
build_release_candidate.yml        Build RC
finalize_release.yml               Finalize release
publish_github_release_notes.yml   Publish notes