sync-to-prod
Sync to Production Skill
This skill provides workflows for synchronizing Kubernetes kustomization configurations from the staging environment to the production environment in the simplex-gitops repository.
⚠️ CRITICAL: Production Deployment Policy
Production deployments must be performed manually; automatic sync is prohibited.
The workflow is:
- ✅ Update kustomization.yaml (can be automated)
- ✅ Commit and push to GitLab (can be automated)
- ⛔ ArgoCD sync to production cluster - MUST BE MANUAL
After pushing changes, inform the user:
- Changes are pushed to the repository
- Production ArgoCD app will detect the changes but will NOT auto-sync
- User must manually trigger sync via ArgoCD UI or CLI when ready
# View pending changes (safe, read-only)
argocd app get simplex-aws-prod
argocd app diff simplex-aws-prod
# Manual sync (ONLY when user explicitly requests)
argocd app sync simplex-aws-prod
NEVER run argocd app sync simplex-aws-prod automatically.
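The policy above can be enforced with a small guard in any automation that wraps the ArgoCD CLI. The following is a minimal sketch, not part of the skill's scripts: the function name `build_argocd_command` and the `user_confirmed` flag are hypothetical, and only the `argocd app get`, `diff`, and `sync` subcommands are taken from this document.

```python
from __future__ import annotations

READ_ONLY_ACTIONS = {"get", "diff"}  # safe to run at any time
PROD_APP = "simplex-aws-prod"


def build_argocd_command(action: str, user_confirmed: bool = False) -> list[str]:
    """Return the argocd argv for `action`, refusing unconfirmed syncs.

    `user_confirmed` is a hypothetical flag that the caller sets only
    when the user has explicitly asked for a production sync.
    """
    if action in READ_ONLY_ACTIONS:
        return ["argocd", "app", action, PROD_APP]
    if action == "sync":
        if not user_confirmed:
            raise PermissionError(
                "Production sync must be requested explicitly by the user."
            )
        return ["argocd", "app", "sync", PROD_APP]
    raise ValueError(f"Unsupported action: {action}")
```

The function only builds the argument list and never executes it, so the decision to run a sync stays with the caller.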
File Locations
kubernetes/overlays/aws-staging/kustomization.yaml # Staging config
kubernetes/overlays/aws-prod/kustomization.yaml # Production config
Quick Commands
View Image Differences
# Using the sync script
python3 ~/.claude/skills/sync-to-prod/scripts/sync_images.py --diff
# Or using make target (if in kubernetes/ directory)
make compare-images
Sync Images
# Sync specific services
python3 ~/.claude/skills/sync-to-prod/scripts/sync_images.py --images front,anotherme-agent
# Sync all images (dry-run first)
python3 ~/.claude/skills/sync-to-prod/scripts/sync_images.py --all --dry-run
# Sync all images (apply changes)
python3 ~/.claude/skills/sync-to-prod/scripts/sync_images.py --all
Sync Workflow
Step 1: Compare Environments
Run the diff command to see what's different between staging and production:
python3 ~/.claude/skills/sync-to-prod/scripts/sync_images.py --diff
This shows:
- 🔄 DIFFERENT TAGS: Services with different versions
- ✅ SAME TAGS: Services already in sync
- ⚠️ STAGING ONLY: Services only in staging
- ⚠️ PROD ONLY: Services only in production
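The four categories above amount to a set comparison over service names. As a rough sketch (this is not the code of `sync_images.py`, and the function name and the sample tags are made up for illustration), the classification could look like this:

```python
from __future__ import annotations


def classify_tags(staging: dict[str, str], prod: dict[str, str]) -> dict[str, list[str]]:
    """Group service names by how their image tags compare across environments.

    Both arguments map a service name to its image tag.
    """
    shared = staging.keys() & prod.keys()
    return {
        "different": sorted(n for n in shared if staging[n] != prod[n]),
        "same": sorted(n for n in shared if staging[n] == prod[n]),
        "staging_only": sorted(staging.keys() - prod.keys()),
        "prod_only": sorted(prod.keys() - staging.keys()),
    }


# Hypothetical tags, for illustration only.
staging_tags = {"front": "v2", "crawler": "v1", "litellm": "v3"}
prod_tags = {"front": "v1", "crawler": "v1", "node-server": "v5"}
report = classify_tags(staging_tags, prod_tags)
# report["different"] == ["front"], report["same"] == ["crawler"]
# report["staging_only"] == ["litellm"], report["prod_only"] == ["node-server"]
```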
Step 2: Review and Select Services
Decide which services to promote. Common patterns:
# Promote a single critical service
python3 ~/.claude/skills/sync-to-prod/scripts/sync_images.py --images front --dry-run
# Promote frontend services
python3 ~/.claude/skills/sync-to-prod/scripts/sync_images.py --images front,front-homepage --dry-run
# Promote all AI services
python3 ~/.claude/skills/sync-to-prod/scripts/sync_images.py --images anotherme-agent,anotherme-api,anotherme-search,anotherme-worker --dry-run
# Promote everything
python3 ~/.claude/skills/sync-to-prod/scripts/sync_images.py --all --dry-run
Step 3: Apply Changes
After reviewing dry-run output, apply the changes:
python3 ~/.claude/skills/sync-to-prod/scripts/sync_images.py --images <services>
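To make the dry-run versus apply distinction concrete, here is a minimal sketch of promoting selected tags between two already-parsed `images:` lists. It is an illustration under assumptions, not the actual script: the function name `promote_images` is hypothetical, entries are assumed to be dicts with kustomize's `name` and `newTag` fields, and services are assumed to be selected by the `name` value.

```python
from __future__ import annotations

import copy


def promote_images(
    staging_images: list[dict],
    prod_images: list[dict],
    selected: set[str],
    dry_run: bool = True,
) -> tuple[list[dict], list[tuple]]:
    """Copy newTag from staging to prod for the selected image names.

    Returns (prod_entries, changes). With dry_run=True the returned
    entries are an unmodified copy and only `changes` reports what
    would happen. The input lists are never mutated.
    """
    staging_tags = {e["name"]: e["newTag"] for e in staging_images if "newTag" in e}
    updated = copy.deepcopy(prod_images)
    changes = []
    for entry in updated:
        name = entry["name"]
        new_tag = staging_tags.get(name)
        if name in selected and new_tag is not None and entry.get("newTag") != new_tag:
            changes.append((name, entry.get("newTag"), new_tag))
            if not dry_run:
                entry["newTag"] = new_tag
    return updated, changes
```

Running with `dry_run=True` first and reading `changes` mirrors the review step above; images that exist only in staging are skipped rather than added to production.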
Step 4: Commit and Push
cd /path/to/simplex-gitops
git add kubernetes/overlays/aws-prod/kustomization.yaml
git commit -m "chore: promote <services> to production"
git push
Important: after the push, ArgoCD detects the change but does NOT automatically sync it to the production cluster.
Step 5: Manual Production Sync (User Action Required)
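After the push completes, the user must manually trigger the production sync:
# View pending changes
argocd app get simplex-aws-prod
argocd app diff simplex-aws-prod
# Sync manually once the user has confirmed
argocd app sync simplex-aws-prod
Or click the Sync button manually in the ArgoCD Web UI:
- URL: http://192.168.10.117:31006
- Find the simplex-aws-prod application
- Click the "SYNC" button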
Configuration Sections That May Need Sync
Beyond image tags, these sections may differ between environments:
1. Image Tags (Primary Sync Target)
Located in the images: section. This is what the sync script handles.
2. ConfigMap Patches
Files in patches/ directory may contain environment-specific values:
| Patch File | Purpose | Sync Consideration |
|---|---|---|
| api-cm0-configmap.yaml | API config | Usually environment-specific, don't sync |
| gateway-cm0-configmap.yaml | Gateway config | Usually environment-specific |
| anotherme-agent-env-configmap.yaml | Agent config | May need selective sync |
| anotherme-agent-secrets.yaml | Agent secrets | Never sync, environment-specific |
| anotherme-search-env-configmap.yaml | Search config | May need selective sync |
| simplex-cron-env-configmap.yaml | Cron config | Usually environment-specific |
| simplex-router-cm0-configmap.yaml | Router config | Usually environment-specific |
| frontend-env.yaml | Frontend env vars | Usually environment-specific |
| ingress.yaml | Ingress rules | Never sync, different domains |
3. Replica Counts
Staging often runs with fewer replicas. Production uses base defaults or higher. This is intentional and should NOT be synced.
4. Node Pool Assignments
- Staging: karpenter.sh/nodepool: staging/singleton-staging
- Production: karpenter.sh/nodepool: production/singleton-production
These are environment-specific and should NOT be synced.
5. Storage Classes
Both environments use similar patterns, but production uses the gp3 storage class while staging uses ebs-gp3-auto. This usually does not need to be synced.
6. High Availability Settings
Production has additional HA configurations:
- topologySpreadConstraints for cross-AZ distribution
- terminationGracePeriodSeconds: 60 for graceful shutdown
These are production-specific optimizations and should NOT be synced to staging.
Manual Sync Patterns
For configurations not handled by the script:
Sync a Specific ConfigMap Patch
# Compare
diff kubernetes/overlays/aws-staging/patches/anotherme-agent-env-configmap.yaml \
kubernetes/overlays/aws-prod/patches/anotherme-agent-env-configmap.yaml
# Copy if needed (carefully review first!)
cp kubernetes/overlays/aws-staging/patches/anotherme-agent-env-configmap.yaml \
kubernetes/overlays/aws-prod/patches/anotherme-agent-env-configmap.yaml
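Copying the whole file also overwrites production-only values. When only a few keys should move, a selective merge is safer. The sketch below assumes the `data` section of each ConfigMap patch has already been loaded into a plain dict; the function name `selective_sync` and the keys in the usage example are hypothetical.

```python
from __future__ import annotations


def selective_sync(
    staging_data: dict[str, str],
    prod_data: dict[str, str],
    keys: list[str],
) -> dict[str, str]:
    """Return a copy of prod_data with only `keys` taken from staging_data.

    All other production values are preserved. Raises KeyError if a
    requested key is absent from staging, so typos are not silently ignored.
    """
    missing = [k for k in keys if k not in staging_data]
    if missing:
        raise KeyError(f"Keys not present in staging: {missing}")
    merged = dict(prod_data)
    for key in keys:
        merged[key] = staging_data[key]
    return merged


# Hypothetical keys and values, for illustration only.
staging_cfg = {"LOG_LEVEL": "debug", "FEATURE_X": "on", "API_URL": "https://staging.example"}
prod_cfg = {"LOG_LEVEL": "info", "API_URL": "https://prod.example"}
merged_cfg = selective_sync(staging_cfg, prod_cfg, ["FEATURE_X"])
# merged_cfg keeps the production LOG_LEVEL and API_URL and gains FEATURE_X.
```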
Sync New Resources
If staging has new resources (PV, PVC, etc.) that production needs:
- Check the staging resources: section for new entries
- Copy the resource files to aws-prod
- Add them to the aws-prod kustomization.yaml resources section
- Adjust environment-specific values (namespace, labels, etc.)
Verification After Sync
Check ArgoCD Status (Read-Only, Safe)
# View application status and pending changes
argocd app get simplex-aws-prod
argocd app diff simplex-aws-prod
Manual Sync (User Must Explicitly Request)
# ⛔ Run only when the user explicitly requests it
argocd app sync simplex-aws-prod
Check Deployed Versions
# Production namespace
k1 get pods -n production -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
# Staging namespace
k2 get pods -n staging -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
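The jsonpath template above prints one line per pod: the pod name, a tab, then the container images separated by spaces. If that output needs to be compared programmatically, a small parser is enough. This is a sketch; the function name `parse_pod_images` and the sample pod names and images are made up.

```python
from __future__ import annotations


def parse_pod_images(output: str) -> dict[str, list[str]]:
    """Parse '<pod>\\t<image> <image>...' lines into {pod: [images]}."""
    pods: dict[str, list[str]] = {}
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        name, _, images = line.partition("\t")
        pods[name] = images.split()
    return pods


# Hypothetical output, for illustration only.
sample = "front-abc\texample/front:v1\napi-xyz\texample/api:v2 example/sidecar:v1\n"
pods = parse_pod_images(sample)
```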
Validate Manifests
kubectl kustomize kubernetes/overlays/aws-prod > /tmp/prod-manifests.yaml
kubectl kustomize kubernetes/overlays/aws-staging > /tmp/staging-manifests.yaml
diff /tmp/staging-manifests.yaml /tmp/prod-manifests.yaml
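The full diff is noisy because replicas, node pools, and ingress rules differ on purpose. To focus on image drift only, the rendered manifests can be scanned for `image:` lines. The following is a heuristic sketch, not part of the skill: it uses a regular expression instead of a YAML parser, so any line that looks like `image: <value>` is counted, and the helper names are hypothetical.

```python
from __future__ import annotations

import re

# Matches "image: repo/name:tag" with an optional list dash and optional quotes.
IMAGE_LINE = re.compile(r"^\s*(?:-\s*)?image:\s*[\"']?([^\s\"']+)", re.MULTILINE)


def images_in_manifest(text: str) -> set[str]:
    """Collect every image reference found in a rendered manifest."""
    return set(IMAGE_LINE.findall(text))


def image_drift(staging_text: str, prod_text: str) -> tuple[list[str], list[str]]:
    """Return (images only in staging, images only in production)."""
    staging = images_in_manifest(staging_text)
    prod = images_in_manifest(prod_text)
    return sorted(staging - prod), sorted(prod - staging)
```

For example, passing the contents of /tmp/staging-manifests.yaml and /tmp/prod-manifests.yaml returns two sorted lists of image references that appear on one side only.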
Troubleshooting
Script Not Finding Repository
Ensure you're in the simplex-gitops directory or set the path explicitly:
cd /path/to/simplex-gitops
python3 ~/.claude/skills/sync-to-prod/scripts/sync_images.py --diff
Image Not Found in Staging
The service may use a different image name format (Aliyun vs ECR). Check both formats in the kustomization files.
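One way to match the same service across registries is to compare only the last path component of the image name. The sketch below illustrates that idea and is not taken from the sync script; the function name is hypothetical, and the registry hosts and namespaces in the comments are placeholder examples of the Aliyun and ECR naming shapes.

```python
def short_image_name(image: str) -> str:
    """Strip the registry, namespace, and any tag or digest from an image reference.

    Examples of inputs (placeholder hosts and namespaces):
      registry.cn-hangzhou.aliyuncs.com/example/front:v1
      123456789012.dkr.ecr.us-east-1.amazonaws.com/example/front
    Both reduce to "front".
    """
    last = image.rsplit("/", 1)[-1]  # drop registry host and namespace
    last = last.split("@", 1)[0]     # drop a digest such as @sha256:...
    return last.split(":", 1)[0]     # drop a tag such as :v1
```

If two entries reduce to the same short name, they likely refer to the same service under different registries, which is worth checking by hand before syncing.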
ArgoCD Not Syncing
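# View application status (read-only)
argocd app get simplex-aws-prod --show-operation
# Refresh the application to detect the latest changes (read-only, safe)
argocd app refresh simplex-aws-prod
# ⛔ Manual sync: run only when the user explicitly requests it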
argocd app sync simplex-aws-prod
Service Categories Reference
| Category | Services |
|---|---|
| AI Core | anotherme-agent, anotherme-api, anotherme-search, anotherme-worker |
| Frontend | front, front-homepage |
| Backend | simplex-cron, simplex-gateway-api, simplex-gateway-worker |
| Data | data-search-api, crawler |
| Infrastructure | litellm, node-server, simplex-router, simplex-router-backend, simplex-router-fronted |
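The table can double as a lookup for building the `--images` argument by category. A small sketch follows, with the service names copied verbatim from the table; the helper name `images_argument` is hypothetical and not a feature of `sync_images.py`.

```python
SERVICE_CATEGORIES = {
    "AI Core": ["anotherme-agent", "anotherme-api", "anotherme-search", "anotherme-worker"],
    "Frontend": ["front", "front-homepage"],
    "Backend": ["simplex-cron", "simplex-gateway-api", "simplex-gateway-worker"],
    "Data": ["data-search-api", "crawler"],
    "Infrastructure": [
        "litellm",
        "node-server",
        "simplex-router",
        "simplex-router-backend",
        "simplex-router-fronted",  # spelled as in the table above
    ],
}


def images_argument(*categories: str) -> str:
    """Build the comma-separated value for --images from category names."""
    services = []
    for category in categories:
        for service in SERVICE_CATEGORIES[category]:  # KeyError on unknown category
            if service not in services:
                services.append(service)
    return ",".join(services)


# images_argument("Frontend") returns "front,front-homepage"
```

The result can be passed to the script as `--images <value> --dry-run` to preview a whole category before promoting it.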