Helm Chart Creation & Maintenance (Lerian Conventions)

Overview

This skill enforces Lerian's Helm chart conventions across all services. Every Helm chart MUST follow these patterns to ensure consistency, security, and operability across the platform.

Reference standards: dev-team/docs/standards/helm/ Executor agent: ring:helm-engineer

CRITICAL: Role Clarification

Who	Responsibility
This Skill (ring:dev-helm)	Orchestrates the workflow: validates input, dispatches agent, verifies output
Agent (ring:helm-engineer)	Executes: reads app source, creates chart files, validates with helm lint

Step 1: Validate Input

<verify_before_proceed>

service_name is provided
chart_type is one of: single, multi-component, umbrella
components list is not empty </verify_before_proceed>

REQUIRED INPUT:
- service_name: name of the service
- chart_type: single | multi-component | umbrella
- components: list of component names

OPTIONAL INPUT:
- dependencies: [postgresql, mongodb, rabbitmq, valkey, keda]
- has_worker: true/false
- namespace: target namespace

if any REQUIRED input is missing:
  → STOP and report: "Missing required input: [field]"

Step 2: Scaffold Chart Structure

MUST create the standard directory structure. See helm/conventions.md for:

Chart naming convention (-helm suffix rule and exceptions)
Chart.yaml template with all required fields
Directory structure (per-component directories, common/ for shared resources)
Image repository naming convention
Service type rule (always ClusterIP)
Port allocation ranges

Naming Convention Quick Reference

CHART NAME RULES:
- Default: {service_name}-helm (e.g., reporter-helm, tracer-helm)
- Exception: plugin-access-manager (no -helm suffix)
- Exception: otel-collector-lerian (no -helm suffix)

if service_name is NOT in exception list:
  → chart name = {service_name}-helm

Step 3: Create _helpers.tpl

MUST define helper functions per component. See helm/templates.md for:

Required helper functions (name, fullname, chart, labels, selectorLabels, versionLabelValue)
Mandatory Kubernetes labels (app.kubernetes.io/*)
Multi-component additional labels (component, part-of)

Step 4: Create values.yaml

<cannot_skip> values.yaml MUST follow the exact Lerian structure. </cannot_skip>

See helm/values.md for:

Complete top-level structure (global overrides, per-component config, common shared, dependencies)
ConfigMap vs Secrets classification rules
Mandatory environment variable groups (app config, telemetry, health, auth, database)

<block_condition> HARD GATE: MUST read the application's .env.example or config.go to extract ALL expected environment variables. Do NOT guess. Missing env vars are the #1 cause of CrashLoopBackOff. </block_condition>

Step 5: Dispatch Agent

<dispatch_required agent="ring:helm-engineer"> Create/update Helm chart following Lerian conventions. </dispatch_required>

Task:
  subagent_type: "ring:helm-engineer"
  description: "Create Helm chart for {service_name}"
  prompt: |
    ⛔ MANDATORY: Create Helm chart following Lerian conventions.

    ## Context
    - **Service:** {service_name}
    - **Components:** {components}
    - **Dependencies:** {dependencies}
    - **Chart Type:** {chart_type}

    ## Required Steps
    1. Read application .env.example and config struct
    2. Verify health check endpoints in application source
    3. Create chart structure per Lerian conventions
    4. Map ALL env vars to configmap/secrets
    5. Validate with helm lint and helm template

    ## Required Output
    - Standards Verification (FIRST)
    - Env Var Coverage table (100% coverage required)
    - Validation Results (helm lint MUST pass)

if agent returns env_vars_missing > 0:
  → FAIL: "Chart has missing env vars. Fix before proceeding."

if agent returns helm_lint_status == FAIL:
  → Re-dispatch agent with specific lint errors

if all checks PASS:
  → Proceed to Step 6 (Worker) or Step 7 (Validation)

Step 6: Worker Component (if has_worker)

<verify_before_proceed>

Is there a background worker/consumer component?
Does it use KEDA ScaledJob or standard Deployment? </verify_before_proceed>

See helm/worker-patterns.md for:

Dual-mode pattern (KEDA default + Deployment fallback)
Template guards for mode selection
ScaledJob template requirements
Worker Deployment template requirements

Step 7: Validate Chart

<cannot_skip> ALL checks MUST pass before chart is considered complete. </cannot_skip>

Automated Validation

RUN in order:

1. helm lint .
   → MUST pass with 0 failures

2. helm template test .
   → MUST render without errors

3. helm template test . --set keda.enabled=false
   → MUST render without errors (if worker exists)

4. Verify ALL application env vars are covered:
   → Read app's .env.example or config struct
   → Compare with configmap + secrets in values.yaml
   → REPORT any missing vars

5. Verify health check paths:
   → Read app's health endpoint registration code
   → Compare with probe paths in deployment template
   → REPORT any mismatches

Manual Checklist

CHECK each item:

[ ] Chart.yaml name has -helm suffix (unless exception)
[ ] All values quoted in ConfigMap ({{ $value | quote }})
[ ] No hardcoded credentials in values.yaml (use placeholders)
[ ] Security context: runAsNonRoot: true, drop ALL capabilities
[ ] Service type is ClusterIP (never NodePort or LoadBalancer)
[ ] HPA enabled by default with CPU and memory metrics
[ ] PDB enabled by default
[ ] Probes match actual application health endpoints
[ ] initContainers wait for all infrastructure dependencies
[ ] Secrets support useExistingSecret pattern
[ ] All env vars from app's .env.example are present
[ ] OTEL injection is conditional on ENABLE_TELEMETRY
[ ] AWS IAM sidecar is conditional on aws.rolesAnywhere.enabled
[ ] Ingress disabled by default

Pressure Resistance

See shared-patterns/shared-pressure-resistance.md

User Says	Your Response
"Skip the security context"	"Security context is MANDATORY. All containers MUST run as non-root."
"We don't need PDB"	"PDB is REQUIRED for production readiness. Adding it now."
"Just use the default health path"	"MUST verify health paths against application code. Wrong paths cause CrashLoopBackOff."
"We'll add env vars later"	"Missing env vars are the #1 cause of deployment failures. Reading .env.example now."
"No need for existing secret support"	"MANDATORY for production. Teams use external secret managers."

Anti-Rationalization Table

Rationalization	Why It's WRONG	Required Action
"Health probes are the same for all services"	Each service has unique endpoints. Wrong paths = CrashLoopBackOff	MUST read application source code
"We can use NodePort for testing"	Lerian convention: always ClusterIP. Ingress handles external access	Set service.type: ClusterIP
"Secrets can use default values"	Default passwords in values.yaml are a security risk	Use empty strings or placeholders
"One configmap for everything"	Sensitive data MUST be in Secrets, not ConfigMap	Split per ConfigMap vs Secrets rule
"The chart works, so it's done"	Must validate against app env vars AND lint AND template render	Run ALL validation steps
"initContainers are overkill"	Without dependency checks, pods crash before DB is ready	Add wait-for-dependencies

Execution Report Format

## Helm Chart Report: {service_name}

**Status:** [PASS|FAIL|PARTIAL]
**Chart Type:** [single|multi-component|umbrella]
**Components:** [list]
**Dependencies:** [list]

## Files Created/Modified
| File | Action | Status |
|------|--------|--------|
| Chart.yaml | CREATED | OK |
| values.yaml | CREATED | OK |
| templates/_helpers.tpl | CREATED | OK |
| templates/{component}/deployment.yaml | CREATED | OK |
| ... | ... | ... |

## Env Var Coverage
| Source (.env.example) | In ConfigMap | In Secrets | Status |
|-----------------------|-------------|------------|--------|
| SERVER_PORT | YES | - | OK |
| DB_PASSWORD | - | YES | OK |
| MISSING_VAR | NO | NO | MISSING |

## Validation Results
| Check | Status |
|-------|--------|
| helm lint | PASS/FAIL |
| helm template (default) | PASS/FAIL |
| helm template (no keda) | PASS/FAIL |
| Health paths verified | PASS/FAIL |
| All env vars covered | PASS/FAIL |
| Security context | PASS/FAIL |
| No hardcoded secrets | PASS/FAIL |

## Notes
- [Any deviations from standard with justification]

ring:dev-helm