java-development
Java Development in Apache Beam
Project Structure
Key Directories
- sdks/java/core - Core Java SDK (PCollection, PTransform, Pipeline)
- sdks/java/harness - SDK harness (container entrypoint)
- sdks/java/io/ - I/O connectors (51+ connectors including BigQuery, Kafka, JDBC, etc.)
- sdks/java/extensions/ - Extensions (SQL, ML, protobuf, etc.)
- runners/ - Runner implementations:
  - runners/direct-java - Direct Runner (local execution)
  - runners/flink/ - Flink Runner
  - runners/spark/ - Spark Runner
  - runners/google-cloud-dataflow-java/ - Dataflow Runner
- examples/java/ - Java examples including WordCount
Build System
Apache Beam uses Gradle with a custom BeamModulePlugin. Every Java project's build.gradle starts with:
apply plugin: 'org.apache.beam.module'
applyJavaNature( ... )
Common Commands
Build Commands
# Compile a specific project
./gradlew -p sdks/java/core compileJava
# Build a project (compile + tests)
./gradlew :sdks:java:harness:build
# Run WordCount example
./gradlew :examples:java:wordCount
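The commands above address a module in two ways: by directory with -p (sdks/java/core) and by Gradle project path (:sdks:java:harness). The sketch below converts between the two forms. It assumes a module's project path simply mirrors its directory, which matches the modules named in this guide but is not guaranteed for every project. GradlePaths and its methods are hypothetical helpers, not part of Beam or Gradle.

```java
final class GradlePaths {
    /** Converts a directory such as "sdks/java/core" to ":sdks:java:core". */
    static String toProjectPath(String dir) {
        // Drop leading and trailing slashes so "sdks/java/io/" also works.
        String trimmed = dir.replaceAll("^/+|/+$", "");
        return ":" + trimmed.replace('/', ':');
    }

    /** Converts a project path such as ":sdks:java:harness" to "sdks/java/harness". */
    static String toDirectory(String projectPath) {
        String trimmed = projectPath.startsWith(":") ? projectPath.substring(1) : projectPath;
        return trimmed.replace(':', '/');
    }

    public static void main(String[] args) {
        System.out.println(toProjectPath("sdks/java/core"));   // prints :sdks:java:core
        System.out.println(toDirectory(":sdks:java:harness")); // prints sdks/java/harness
    }
}
```

Under that assumption, ./gradlew -p sdks/java/core compileJava and ./gradlew :sdks:java:core:compileJava would target the same module.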
Running Unit Tests
# Run all tests in a project
./gradlew :sdks:java:harness:test
# Run a specific test class
./gradlew :sdks:java:harness:test --tests org.apache.beam.fn.harness.CachesTest
# Run tests matching a pattern
./gradlew :sdks:java:harness:test --tests '*CachesTest'
# Run a specific test method
./gradlew :sdks:java:harness:test --tests '*CachesTest.testClearableCache'
Running Integration Tests
Integration tests have filenames ending in IT.java and use TestPipeline.
# Run I/O integration tests on Direct Runner
./gradlew :sdks:java:io:google-cloud-platform:integrationTest
# Run with custom GCP project
./gradlew :sdks:java:io:google-cloud-platform:integrationTest \
-PgcpProject=<project> -PgcpTempRoot=gs://<bucket>/path
# Run on Dataflow Runner
./gradlew :runners:google-cloud-dataflow-java:examplesJavaRunnerV2IntegrationTest \
-PdisableSpotlessCheck=true -PdisableCheckStyle=true -PskipCheckerFramework \
-PgcpProject=<project> -PgcpRegion=us-central1 -PgcsTempRoot=gs://<bucket>/tmp
Code Formatting
# Format Java code
./gradlew spotlessApply
Writing Integration Tests
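import org.apache.beam.sdk.testing.TestPipeline;
import org.junit.Rule;
import org.junit.Test;

@Rule public final transient TestPipeline pipeline = TestPipeline.create();

@Test
public void testSomething() {
  pipeline.apply(...);
  pipeline.run().waitUntilFinish();
}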
Set pipeline options via -DbeamTestPipelineOptions='[...]':
-DbeamTestPipelineOptions='["--runner=TestDataflowRunner","--project=myproject","--region=us-central1","--stagingLocation=gs://bucket/path"]'
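The value of beamTestPipelineOptions shown above is a JSON array of option strings. When that value is generated by a script or a wrapper rather than typed by hand, a small helper like the following sketch can assemble it. BeamTestOptions and toJsonArray are hypothetical names, and the escaping covers only backslashes and double quotes, which is assumed to be enough for typical option values.

```java
import java.util.List;
import java.util.stream.Collectors;

final class BeamTestOptions {
    /** Renders pipeline options as a compact JSON array of strings. */
    static String toJsonArray(List<String> options) {
        return options.stream()
            // Escape backslashes first, then double quotes, and wrap each option in quotes.
            .map(o -> "\"" + o.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
            .collect(Collectors.joining(",", "[", "]"));
    }

    public static void main(String[] args) {
        String json = toJsonArray(List.of(
            "--runner=TestDataflowRunner",
            "--project=myproject",
            "--region=us-central1"));
        // On a shell command line, wrap the value in single quotes as in the example above.
        System.out.println("-DbeamTestPipelineOptions='" + json + "'");
    }
}
```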
Using Modified Beam Code
Publish to Maven Local
# Publish a specific module
./gradlew -Ppublishing -p sdks/java/io/kafka publishToMavenLocal
# Publish all modules
./gradlew -Ppublishing publishToMavenLocal
Building SDK Container
# Build Java SDK container (for Runner v2)
./gradlew :sdks:java:container:java11:docker
# Tag and push
docker tag apache/beam_java11_sdk:2.XX.0.dev \
"us-docker.pkg.dev/your-project/beam/beam_java11_sdk:custom"
docker push "us-docker.pkg.dev/your-project/beam/beam_java11_sdk:custom"
Building Dataflow Worker Jar
./gradlew :runners:google-cloud-dataflow-java:worker:shadowJar
Test Naming Conventions
- Unit tests: *Test.java
- Integration tests: *IT.java
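The naming conventions above are simple suffix rules, so a file can be classified by name alone. The following is a minimal sketch of that check. BeamTestNames and its Kind enum are hypothetical names used for illustration and do not exist in Beam.

```java
final class BeamTestNames {
    enum Kind { UNIT, INTEGRATION, OTHER }

    /** Classifies a Java source file name by the suffix conventions described above. */
    static Kind classify(String fileName) {
        if (fileName.endsWith("IT.java")) {
            return Kind.INTEGRATION;
        }
        if (fileName.endsWith("Test.java")) {
            return Kind.UNIT;
        }
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        System.out.println(classify("CachesTest.java")); // prints UNIT
        System.out.println(classify("BigQueryIT.java")); // prints INTEGRATION (hypothetical file name)
        System.out.println(classify("Caches.java"));     // prints OTHER
    }
}
```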
JUnit Report Location
After running tests, find HTML reports at:
<project>/build/reports/tests/test/index.html
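The report path above follows the pattern of a project directory plus build/reports/tests, then the test task name. A sketch of resolving it in code is shown below. BeamReports and htmlReport are hypothetical names. The task-name segment reflects Gradle's usual default report location, so treating it as applicable to tasks other than test (for example integrationTest) is an assumption.

```java
import java.nio.file.Path;

final class BeamReports {
    /** Resolves the HTML test report for a module directory and a test task name. */
    static Path htmlReport(Path projectDir, String taskName) {
        return projectDir
            .resolve("build")
            .resolve("reports")
            .resolve("tests")
            .resolve(taskName)
            .resolve("index.html");
    }

    public static void main(String[] args) {
        // Report for ./gradlew :sdks:java:harness:test, relative to the repository root.
        System.out.println(htmlReport(Path.of("sdks", "java", "harness"), "test"));
    }
}
```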
IDE Setup (IntelliJ)
- Open /beam (the repository root, NOT sdks/java)
- Wait for indexing to complete
- Find examples/java/build.gradle and click Run next to the wordCount task to verify setup