# CI/CD in Apache Beam

## Overview
Apache Beam uses GitHub Actions for CI/CD. Workflows are located in `.github/workflows/`.
## Workflow Types

### PreCommit Workflows
- Run on PRs and merges
- Validate code changes before merge
- Naming: `beam_PreCommit_*.yml`

### PostCommit Workflows
- Run after merge and on a schedule
- Provide more comprehensive testing
- Naming: `beam_PostCommit_*.yml`

### Scheduled Workflows
- Run nightly on `master`
- Check for the impact of external dependency changes
- Tag `master` with `nightly-master`
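Because the category is encoded in the file name, a checkout can be summarized with plain shell globbing. The sketch below is a hypothetical helper (`count_workflows` is not part of the Beam repository). It relies only on the `beam_PreCommit_*` and `beam_PostCommit_*` prefixes described above and counts every other file as `other`.

```bash
# Hypothetical helper: count workflow files per category using only the
# beam_PreCommit_* / beam_PostCommit_* naming convention.
count_workflows() {
  local dir="${1:-.github/workflows}" f
  local pre=0 post=0 other=0
  for f in "$dir"/*.yml; do
    [ -e "$f" ] || continue    # no match: the glob is left unexpanded
    case "$(basename "$f")" in
      beam_PreCommit_*)  pre=$((pre + 1)) ;;
      beam_PostCommit_*) post=$((post + 1)) ;;
      *)                 other=$((other + 1)) ;;
    esac
  done
  echo "PreCommit=$pre PostCommit=$post other=$other"
}
```

Run it from the repository root as `count_workflows .github/workflows`.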
## Key Workflows

### PreCommit

| Workflow | Description |
|---|---|
| `beam_PreCommit_Java.yml` | Java build and tests |
| `beam_PreCommit_Python.yml` | Python tests |
| `beam_PreCommit_Go.yml` | Go tests |
| `beam_PreCommit_RAT.yml` | License header checks |
| `beam_PreCommit_Spotless.yml` | Code formatting |
### PostCommit - Java

| Workflow | Description |
|---|---|
| `beam_PostCommit_Java.yml` | Full Java test suite |
| `beam_PostCommit_Java_ValidatesRunner_*.yml` | Runner validation tests |
| `beam_PostCommit_Java_Examples_*.yml` | Example pipeline tests |
### PostCommit - Python

| Workflow | Description |
|---|---|
| `beam_PostCommit_Python.yml` | Full Python test suite |
| `beam_PostCommit_Python_ValidatesRunner_*.yml` | Runner validation tests |
| `beam_PostCommit_Python_Examples_*.yml` | Example pipeline tests |
### Load & Performance Tests

| Workflow | Description |
|---|---|
| `beam_LoadTests_*.yml` | Load testing |
| `beam_PerformanceTests_*.yml` | I/O performance |
## Triggering Tests

### Automatic
- PRs trigger PreCommit tests
- Merges trigger PostCommit tests

### Triggering Specific Workflows
Use trigger files to run specific workflows.
### Workflow Dispatch
Most workflows declare a `workflow_dispatch` trigger, so they can be started manually from the GitHub Actions UI.
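The same manual trigger is available from a terminal through the GitHub CLI. This sketch assumes `gh` is installed and authenticated. Dispatching a workflow requires write access to the repository (for example, your own fork). `dispatchable_workflows` and `dispatch_workflow` are hypothetical helper names. The `workflow_dispatch:` check is a plain text match rather than a YAML parse.

```bash
# Hypothetical helper: list workflow files that declare a workflow_dispatch
# trigger (a plain text match, not a YAML parse).
dispatchable_workflows() {
  local dir="${1:-.github/workflows}" f
  for f in "$dir"/*.yml; do
    [ -e "$f" ] || continue
    if grep -q '^[[:space:]]*workflow_dispatch:' "$f"; then
      basename "$f"
    fi
  done
}

# Hypothetical helper: queue a run of one workflow on a branch (default: master).
dispatch_workflow() {
  gh workflow run "$1" --ref "${2:-master}"
}
```

For example, `dispatch_workflow beam_PostCommit_Java.yml master` queues a run on `master`. GitHub rejects the request for workflows that do not declare `workflow_dispatch`.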
## Understanding Test Results

### Finding Logs
- Go to the PR's Checks tab
- Click the failed workflow
- Expand the failed job
- View the step logs
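The same information is available from the GitHub CLI. This is a sketch assuming `gh` is installed and authenticated, and the helper names are hypothetical. The run ID is the number in a run's URL (`.../actions/runs/<run-id>`).

```bash
# Hypothetical helpers around the GitHub CLI for apache/beam.
# Show the status of every check on a pull request.
pr_checks() {
  gh pr checks "$1" --repo apache/beam
}

# Print only the logs of the failed steps of one workflow run.
failed_logs() {
  gh run view "$1" --repo apache/beam --log-failed
}
```

`failed_logs <run-id>` skips the output of passing steps, which is usually the quickest way to find the actual error.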
### Common Failure Patterns

#### Flaky Tests
- Random failures unrelated to the change
- Solution: use trigger files to re-run the specific workflow

#### Timeout
- Increase the workflow's timeout if justified
- Or optimize the test

#### Resource Exhaustion
- GCP quota issues
- Check the quotas in the GCP project settings
## GCP Credentials
Workflows requiring GCP access use these secrets:
- `GCP_PROJECT_ID` - Project ID (e.g., `apache-beam-testing`)
- `GCP_REGION` - Region (e.g., `us-central1`)
- `GCP_TESTING_BUCKET` - Temp storage bucket
- `GCP_PYTHON_WHEELS_BUCKET` - Python wheels bucket
- `GCP_SA_EMAIL` - Service account email
- `GCP_SA_KEY` - Base64-encoded service account key
Required IAM roles:
- Storage Admin
- Dataflow Admin
- Artifact Registry Writer
- BigQuery Data Editor
- Service Account User
## Self-hosted vs GitHub-hosted Runners

### Self-hosted (majority of workflows)
- Pre-configured with dependencies
- GCP credentials pre-configured
- Naming: `beam_*.yml`

### GitHub-hosted
- Used for cross-platform testing (Linux, macOS, Windows)
- May need explicit credential setup
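Explicit credential setup on a GitHub-hosted runner could look roughly like the sketch below. It assumes the job exposes the secrets listed under GCP Credentials as environment variables of the same names (for example through an `env:` block) and that the Google Cloud CLI is installed. `activate_gcp_credentials` is a hypothetical helper, and Beam's own workflows may authenticate differently.

```bash
# Hypothetical helper: authenticate gcloud with the base64-encoded service
# account key from GCP_SA_KEY, then delete the decoded key file.
activate_gcp_credentials() {
  local key_file status
  key_file="$(mktemp)"
  # Decode the secret into a JSON key file; stop if decoding fails.
  if ! printf '%s' "$GCP_SA_KEY" | base64 --decode > "$key_file"; then
    rm -f "$key_file"
    return 1
  fi
  gcloud auth activate-service-account "$GCP_SA_EMAIL" \
    --key-file="$key_file" --project="$GCP_PROJECT_ID"
  status=$?
  rm -f "$key_file"    # do not leave the decoded key on disk
  return "$status"
}
```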
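## Workflow Structure

```yaml
name: Workflow Name

on:
  push:
    branches: [master]          # runs after changes land on master
  pull_request:
    branches: [master]          # runs for PRs targeting master
  schedule:
    - cron: '0 0 * * *'         # daily at 00:00 UTC
  workflow_dispatch:            # manual trigger from the GitHub UI

jobs:
  build:
    runs-on: [self-hosted, ...] # self-hosted runner; other labels omitted
    steps:
      - uses: actions/checkout@v4
      - name: Run Gradle
        run: ./gradlew :task:name   # placeholder Gradle task
```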
## Local Debugging

### Run the Same Commands as CI
Check the `run` commands in the workflow file and execute them locally, for example:

```bash
./gradlew :sdks:java:core:test
./gradlew :sdks:python:test
```
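To find those commands quickly, `sed` can print the single-line `run:` entries of a workflow. This is a hypothetical helper and only a rough text match. Multi-line `run: |` blocks are skipped, and `${{ ... }}` expressions are printed unexpanded.

```bash
# Hypothetical helper: print single-line `run:` commands (including the
# `- run:` form) from a workflow file; block scalars (| or >) are dropped.
ci_run_commands() {
  sed -n 's/^[[:space:]]*\(-[[:space:]]*\)\{0,1\}run:[[:space:]]*//p' "$1" |
    grep -v '^[|>]'
}
```

For example, `ci_run_commands .github/workflows/beam_PreCommit_Java.yml` lists the Gradle invocations that workflow uses.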
### Common Issues
- Clean the Gradle cache: `rm -rf ~/.gradle .gradle`
- Remove the build directory: `rm -rf build`
- Check that your local Java version matches the one used in CI
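For the last check, here is a small sketch that compares the local Java major version with an expected one. You supply the expected version yourself, for example from the workflow you are reproducing. `java_major_version` and `check_java_version` are hypothetical helpers that parse the first `version "..."` string printed by `java -version`.

```bash
# Hypothetical helper: print the major version of the `java` on PATH.
java_major_version() {
  local v
  # `java -version` writes to stderr, e.g. openjdk version "11.0.20" ...
  v=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$v" in
    1.*) v=${v#1.} ;;    # legacy scheme: "1.8.0_392" means Java 8
  esac
  echo "${v%%[._-]*}"    # keep only the leading major number
}

# Hypothetical helper: succeed only if the local major version equals $1.
check_java_version() {
  local actual
  actual=$(java_major_version)
  if [ "$actual" = "$1" ]; then
    echo "OK: Java $actual"
  else
    echo "Mismatch: local Java $actual, CI expects $1" >&2
    return 1
  fi
}
```

With Java 11 installed, `check_java_version 11` prints `OK: Java 11`. A mismatch returns a non-zero status, so the check can gate a local script. The value 11 is only an example.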
## Snapshot Builds

### Locations
- Java SDK: https://repository.apache.org/content/groups/snapshots/org/apache/beam/
- SDK Containers: https://gcr.io/apache-beam-testing/beam-sdk
- Portable Runners: https://gcr.io/apache-beam-testing/beam_portability
- Python SDK: gs://beam-python-nightly-snapshots
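To see which Java snapshots exist, the sketch below reads the artifact-level `maven-metadata.xml` from the snapshot repository listed above. It assumes the repository serves the standard Maven metadata file for each artifact. `list_java_snapshots` is hypothetical, and `beam-sdks-java-core` is used only as an example artifact.

```bash
# Base URL of the Java snapshot repository listed above.
SNAPSHOT_REPO="https://repository.apache.org/content/groups/snapshots/org/apache/beam"

# Hypothetical helper: print every -SNAPSHOT version listed in the
# artifact's maven-metadata.xml, one per line.
list_java_snapshots() {
  local artifact="${1:-beam-sdks-java-core}"
  curl -fsSL "$SNAPSHOT_REPO/$artifact/maven-metadata.xml" |
    sed -n 's:.*<version>\(.*-SNAPSHOT\)</version>.*:\1:p'
}
```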
## Release Workflows

| Workflow | Purpose |
|---|---|
| `cut_release_branch.yml` | Create release branch |
| `build_release_candidate.yml` | Build RC |
| `finalize_release.yml` | Finalize release |
| `publish_github_release_notes.yml` | Publish notes |