deploying-airflow
Deploying Airflow
This skill covers deploying Airflow DAGs and projects to production, whether using Astro (Astronomer's managed platform) or open-source Airflow on Docker Compose or Kubernetes.
Choosing a path: Astro is a good fit for managed operations and faster CI/CD. For open-source, use Docker Compose for dev and the Helm chart for production.
Astro (Astronomer)
Astro provides CLI commands and GitHub integration for deploying Airflow projects.
Deploy Commands
| Command | What It Does |
|---|---|
astro deploy |
Full project deploy — builds Docker image and deploys DAGs |
astro deploy --dags |
DAG-only deploy — pushes only DAG files (fast, no image build) |
astro deploy --image |
Image-only deploy — pushes only the Docker image (for multi-repo CI/CD) |
astro deploy --dbt |
dbt project deploy — deploys a dbt project to run alongside Airflow |
Full Project Deploy
Builds a Docker image from your Astro project and deploys everything (DAGs, plugins, requirements, packages):
astro deploy
Use this when you've changed requirements.txt, Dockerfile, packages.txt, plugins, or any non-DAG file.
DAG-Only Deploy
Pushes only files in the dags/ directory without rebuilding the Docker image:
astro deploy --dags
This is significantly faster than a full deploy since it skips the image build. Use this when you've only changed DAG files and haven't modified dependencies or configuration.
Image-Only Deploy
Pushes only the Docker image without updating DAGs:
astro deploy --image
This is useful in multi-repo setups where DAGs are deployed separately from the image, or in CI/CD pipelines that manage image and DAG deploys independently.
dbt Project Deploy
Deploys a dbt project to run with Cosmos on an Astro deployment:
astro deploy --dbt
GitHub Integration
Astro supports branch-to-deployment mapping for automated deploys:
- Map branches to specific deployments (e.g.,
main-> production,develop-> staging) - Pushes to mapped branches trigger automatic deploys
- Supports DAG-only deploys on merge for faster iteration
Configure this in the Astro UI under Deployment Settings > CI/CD.
CI/CD Patterns
Common CI/CD strategies on Astro:
- DAG-only on feature branches: Use
astro deploy --dagsfor fast iteration during development - Full deploy on main: Use
astro deployon merge to main for production releases - Separate image and DAG pipelines: Use
--imageand--dagsin separate CI jobs for independent release cycles
Deploy Queue
When multiple deploys are triggered in quick succession, Astro processes them sequentially in a deploy queue. Each deploy completes before the next one starts.
Reference
Open-Source: Docker Compose
Deploy Airflow using the official Docker Compose setup. This is recommended for learning and exploration — for production, use Kubernetes with the Helm chart (see below).
Prerequisites
- Docker and Docker Compose v2.14.0+
- The official
apache/airflowDocker image
Quick Start
Download the official Airflow 3 Docker Compose file:
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/stable/docker-compose.yaml'
This sets up the full Airflow 3 architecture:
| Service | Purpose |
|---|---|
airflow-apiserver |
REST API and UI (port 8080) |
airflow-scheduler |
Schedules DAG runs |
airflow-dag-processor |
Parses and processes DAG files |
airflow-worker |
Executes tasks (CeleryExecutor) |
airflow-triggerer |
Handles deferrable/async tasks |
postgres |
Metadata database |
redis |
Celery message broker |
Minimal Setup
For a simpler setup with LocalExecutor (no Celery/Redis), create a docker-compose.yaml:
x-airflow-common: &airflow-common
image: apache/airflow:3 # Use the latest Airflow 3.x release
environment: &airflow-common-env
AIRFLOW__CORE__EXECUTOR: LocalExecutor
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
AIRFLOW__CORE__DAGS_FOLDER: /opt/airflow/dags
volumes:
- ./dags:/opt/airflow/dags
- ./logs:/opt/airflow/logs
- ./plugins:/opt/airflow/plugins
depends_on:
postgres:
condition: service_healthy
services:
postgres:
image: postgres:16
environment:
POSTGRES_USER: airflow
POSTGRES_PASSWORD: airflow
POSTGRES_DB: airflow
volumes:
- postgres-db-volume:/var/lib/postgresql/data
healthcheck:
test: ["CMD", "pg_isready", "-U", "airflow"]
interval: 10s
retries: 5
start_period: 5s
airflow-init:
<<: *airflow-common
entrypoint: /bin/bash
command:
- -c
- |
airflow db migrate
airflow users create \
--username admin \
--firstname Admin \
--lastname User \
--role Admin \
--email admin@example.com \
--password admin
depends_on:
postgres:
condition: service_healthy
airflow-apiserver:
<<: *airflow-common
command: airflow api-server
ports:
- "8080:8080"
healthcheck:
test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
interval: 30s
timeout: 10s
retries: 5
start_period: 30s
airflow-scheduler:
<<: *airflow-common
command: airflow scheduler
airflow-dag-processor:
<<: *airflow-common
command: airflow dag-processor
airflow-triggerer:
<<: *airflow-common
command: airflow triggerer
volumes:
postgres-db-volume:
Airflow 3 architecture note: The webserver has been replaced by the API server (
airflow api-server), and the DAG processor now runs as a standalone process separate from the scheduler.
Common Operations
# Start all services
docker compose up -d
# Stop all services
docker compose down
# View logs
docker compose logs -f airflow-scheduler
# Restart after requirements change
docker compose down && docker compose up -d --build
# Run a one-off Airflow CLI command
docker compose exec airflow-apiserver airflow dags list
Installing Python Packages
Add packages to requirements.txt and rebuild:
# Add to requirements.txt, then:
docker compose down
docker compose up -d --build
Or use a custom Dockerfile:
FROM apache/airflow:3 # Pin to a specific version (e.g., 3.1.7) for reproducibility
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
Update docker-compose.yaml to build from the Dockerfile:
x-airflow-common: &airflow-common
build:
context: .
dockerfile: Dockerfile
# ... rest of config
Environment Variables
Configure Airflow settings via environment variables in docker-compose.yaml:
environment:
# Core settings
AIRFLOW__CORE__EXECUTOR: LocalExecutor
AIRFLOW__CORE__PARALLELISM: 32
AIRFLOW__CORE__MAX_ACTIVE_TASKS_PER_DAG: 16
# Email
AIRFLOW__EMAIL__EMAIL_BACKEND: airflow.utils.email.send_email_smtp
AIRFLOW__SMTP__SMTP_HOST: smtp.example.com
# Connections (as URI)
AIRFLOW_CONN_MY_DB: postgresql://user:pass@host:5432/db
Open-Source: Kubernetes (Helm Chart)
Deploy Airflow on Kubernetes using the official Apache Airflow Helm chart.
Prerequisites
- A Kubernetes cluster
kubectlconfiguredhelminstalled
Installation
# Add the Airflow Helm repo
helm repo add apache-airflow https://airflow.apache.org
helm repo update
# Install with default values
helm install airflow apache-airflow/airflow \
--namespace airflow \
--create-namespace
# Install with custom values
helm install airflow apache-airflow/airflow \
--namespace airflow \
--create-namespace \
-f values.yaml
Key values.yaml Configuration
# Executor type
executor: KubernetesExecutor # or CeleryExecutor, LocalExecutor
# Airflow image (pin to your desired version)
defaultAirflowRepository: apache/airflow
defaultAirflowTag: "3" # Or pin: "3.1.7"
# Git-sync for DAGs (recommended for production)
dags:
gitSync:
enabled: true
repo: https://github.com/your-org/your-dags.git
branch: main
subPath: dags
wait: 60 # seconds between syncs
# API server (replaces webserver in Airflow 3)
apiServer:
resources:
requests:
cpu: "250m"
memory: "512Mi"
limits:
cpu: "500m"
memory: "1Gi"
replicas: 1
# Scheduler
scheduler:
resources:
requests:
cpu: "500m"
memory: "1Gi"
limits:
cpu: "1000m"
memory: "2Gi"
# Standalone DAG processor
dagProcessor:
enabled: true
resources:
requests:
cpu: "250m"
memory: "512Mi"
limits:
cpu: "500m"
memory: "1Gi"
# Triggerer (for deferrable tasks)
triggerer:
resources:
requests:
cpu: "250m"
memory: "512Mi"
limits:
cpu: "500m"
memory: "1Gi"
# Worker resources (CeleryExecutor only)
workers:
resources:
requests:
cpu: "500m"
memory: "1Gi"
limits:
cpu: "2000m"
memory: "4Gi"
replicas: 2
# Log persistence
logs:
persistence:
enabled: true
size: 10Gi
# PostgreSQL (built-in)
postgresql:
enabled: true
# Or use an external database
# postgresql:
# enabled: false
# data:
# metadataConnection:
# user: airflow
# pass: airflow
# host: your-rds-host.amazonaws.com
# port: 5432
# db: airflow
Upgrading
# Upgrade with new values
helm upgrade airflow apache-airflow/airflow \
--namespace airflow \
-f values.yaml
# Upgrade to a new Airflow version
helm upgrade airflow apache-airflow/airflow \
--namespace airflow \
--set defaultAirflowTag="<version>"
DAG Deployment Strategies on Kubernetes
- Git-sync (recommended): DAGs are synced from a Git repository automatically
- Persistent Volume: Mount a shared PV containing DAGs
- Baked into image: Include DAGs in a custom Docker image
Useful Commands
# Check pod status
kubectl get pods -n airflow
# View scheduler logs
kubectl logs -f deployment/airflow-scheduler -n airflow
# Port-forward the API server
kubectl port-forward svc/airflow-apiserver 8080:8080 -n airflow
# Run a one-off CLI command
kubectl exec -it deployment/airflow-scheduler -n airflow -- airflow dags list
Related Skills
- setting-up-astro-project: For initializing a new Astro project
- managing-astro-local-env: For local development with
astro dev - authoring-dags: For writing DAGs before deployment
- testing-dags: For testing DAGs before deployment