Env & Secrets Manager

Tier: POWERFUL Category: Engineering / Security Maintainer: Claude Skills Team

Overview

Complete environment variable and secrets management lifecycle: .env file structure across dev/staging/production, .env.example auto-generation that strips sensitive values, required-variable validation at startup, secret leak detection in git history, credential rotation playbooks, environment drift detection, and integration with HashiCorp Vault, AWS SSM, 1Password CLI, and Doppler.

Keywords

secrets management, environment variables, .env, secret rotation, HashiCorp Vault, AWS SSM, 1Password, Doppler, secret leak detection, credential rotation, environment drift

Core Capabilities

1. .env Lifecycle Management

Structured .env layout with categorized sections
Auto-generation of .env.example from .env (strips sensitive values)
Environment-specific files (.env.local, .env.staging, .env.production)
Validation scripts that fail fast on missing required variables

2. Secret Leak Detection

Regex scan of git history for exposed credentials
Pre-commit hook integration to block secret commits
Pattern matching for API keys, tokens, passwords, private keys
Working tree and staged file scanning

3. Credential Rotation

Step-by-step rotation playbooks per secret type
Scope analysis (find everywhere a secret is used)
Zero-downtime rotation with dual-read period
Post-rotation verification and monitoring

4. Secret Manager Integration

HashiCorp Vault KV v2 with OIDC authentication
AWS SSM Parameter Store with KMS encryption
1Password CLI with template injection
Doppler with project/config management

When to Use

Setting up a new project — scaffold .env.example and validation
Before every commit — scan for accidentally staged secrets
Post-incident — rotate leaked credentials systematically
Onboarding developers — provide complete environment setup
Auditing — detect environment drift between staging and production
Compliance — demonstrate secret management practices

.env File Structure

Canonical Layout

# ─── Application ───────────────────────────────────
APP_NAME=myapp
APP_ENV=development              # development | staging | production
APP_PORT=3000
APP_URL=http://localhost:3000    # REQUIRED: public base URL
APP_SECRET=                      # REQUIRED: min 32 chars, used for signing

# ─── Database ──────────────────────────────────────
DATABASE_URL=                    # REQUIRED: full connection string
DATABASE_POOL_MIN=2
DATABASE_POOL_MAX=10
DATABASE_SSL=false               # true in staging/production

# ─── Authentication ────────────────────────────────
AUTH_JWT_SECRET=                  # REQUIRED: min 32 chars
AUTH_JWT_EXPIRY=3600             # seconds
AUTH_REFRESH_SECRET=             # REQUIRED: min 32 chars
AUTH_REFRESH_EXPIRY=604800       # 7 days in seconds

# ─── Third-Party Services ─────────────────────────
STRIPE_SECRET_KEY=               # REQUIRED in production
STRIPE_WEBHOOK_SECRET=           # REQUIRED in production
STRIPE_PUBLISHABLE_KEY=          # REQUIRED (public, safe to expose)
SENDGRID_API_KEY=                # REQUIRED for email features
SENTRY_DSN=                      # Optional: error tracking

# ─── Storage ───────────────────────────────────────
AWS_ACCESS_KEY_ID=               # Prefer IAM roles in production
AWS_SECRET_ACCESS_KEY=
AWS_REGION=us-east-1
S3_BUCKET=

# ─── Monitoring ────────────────────────────────────
DD_API_KEY=
LOG_LEVEL=debug                  # debug | info | warn | error

File Hierarchy

.env.example        → Committed to git. Keys only, no values. Safe defaults noted.
.env                → Local development. NEVER committed. In .gitignore.
.env.local          → Local overrides. NEVER committed.
.env.test           → Test environment. May be committed if no secrets.
.env.staging        → Reference only. Actual values in secret manager.
.env.production     → NEVER exists on disk. Pulled from secret manager at runtime.

.gitignore Patterns (Required)

# Environment files
.env
.env.local
.env.*.local
.env.development
.env.staging
.env.production

# Secret files
*.pem
*.key
*.p12
*.pfx
secrets.json
secrets.yaml
credentials.json
service-account*.json

# Cloud credentials
.aws/credentials
.gcloud/

# Terraform state (may contain secrets)
*.tfstate
*.tfstate.backup

Startup Validation Script

#!/usr/bin/env python3
"""Validate required environment variables at application startup."""

import os
import sys
import re

REQUIRED_VARS = {
    "APP_SECRET": {"min_length": 32, "description": "Application signing secret"},
    "DATABASE_URL": {"pattern": r"^postgres(ql)?://", "description": "PostgreSQL connection string"},
    "AUTH_JWT_SECRET": {"min_length": 32, "description": "JWT signing secret"},
}

REQUIRED_IN_PRODUCTION = {
    "STRIPE_SECRET_KEY": {"pattern": r"^sk_(live|test)_", "description": "Stripe secret key"},
    "STRIPE_WEBHOOK_SECRET": {"pattern": r"^whsec_", "description": "Stripe webhook secret"},
    "SENDGRID_API_KEY": {"pattern": r"^SG\.", "description": "SendGrid API key"},
    "SENTRY_DSN": {"pattern": r"^https://", "description": "Sentry DSN"},
}

def validate() -> list[str]:
    errors = []
    env = os.environ.get("APP_ENV", "development")

    vars_to_check = dict(REQUIRED_VARS)
    if env == "production":
        vars_to_check.update(REQUIRED_IN_PRODUCTION)

    for var_name, rules in vars_to_check.items():
        value = os.environ.get(var_name, "")

        if not value:
            errors.append(f"MISSING: {var_name} — {rules['description']}")
            continue

        if "min_length" in rules and len(value) < rules["min_length"]:
            errors.append(
                f"TOO SHORT: {var_name} is {len(value)} chars, need {rules['min_length']}+"
            )

        if "pattern" in rules and not re.match(rules["pattern"], value):
            errors.append(
                f"INVALID FORMAT: {var_name} does not match expected pattern"
            )

    return errors

if __name__ == "__main__":
    errors = validate()
    if errors:
        print("Environment validation FAILED:", file=sys.stderr)
        for e in errors:
            print(f"  {e}", file=sys.stderr)
        sys.exit(1)
    print("Environment validation passed.")

Secret Leak Detection

Git History Scanner

#!/bin/bash
# Scan git history for leaked secrets

echo "Scanning git history for potential secrets..."

PATTERNS=(
  'AKIA[0-9A-Z]{16}'                          # AWS Access Key
  'AIza[0-9A-Za-z\-_]{35}'                    # Google API Key
  'sk_(live|test)_[0-9a-zA-Z]{24,}'           # Stripe Secret Key
  'ghp_[0-9a-zA-Z]{36}'                       # GitHub Personal Access Token
  'glpat-[0-9a-zA-Z\-]{20,}'                  # GitLab Personal Access Token
  'xoxb-[0-9]{10,}-[0-9]{10,}-[a-zA-Z0-9]{24}' # Slack Bot Token
  'SG\.[0-9A-Za-z\-_]{22}\.[0-9A-Za-z\-_]{43}' # SendGrid API Key
  '-----BEGIN (RSA |EC )?PRIVATE KEY-----'      # Private Keys
  'password\s*=\s*["\x27][^"\x27]{8,}["\x27]'  # Hardcoded passwords
)

FOUND=0
for pattern in "${PATTERNS[@]}"; do
  MATCHES=$(git log -p --all -S "$pattern" --format="%H %an %ad %s" 2>/dev/null | head -20)
  if [ -n "$MATCHES" ]; then
    echo ""
    echo "FOUND pattern: $pattern"
    echo "$MATCHES"
    FOUND=$((FOUND + 1))
  fi
done

if [ "$FOUND" -gt 0 ]; then
  echo ""
  echo "WARNING: Found $FOUND potential secret patterns in git history."
  echo "Run 'git filter-repo' or BFG Repo-Cleaner to remove them."
  exit 1
else
  echo "No secrets detected in git history."
fi

Pre-Commit Hook

#!/bin/bash
# .git/hooks/pre-commit — block commits containing secrets

PATTERNS=(
  'AKIA[0-9A-Z]{16}'
  'sk_(live|test)_[0-9a-zA-Z]{24,}'
  'ghp_[0-9a-zA-Z]{36}'
  '-----BEGIN (RSA |EC )?PRIVATE KEY-----'
)

FILES=$(git diff --cached --name-only --diff-filter=ACM)

for file in $FILES; do
  for pattern in "${PATTERNS[@]}"; do
    if git diff --cached -- "$file" | grep -qE "$pattern"; then
      echo "BLOCKED: Potential secret detected in $file"
      echo "Pattern: $pattern"
      echo "Remove the secret and try again."
      exit 1
    fi
  done
done

Credential Rotation Playbook

Step 1: Scope the Secret

# Find everywhere a secret is referenced
SECRET_NAME="STRIPE_SECRET_KEY"

# In code
grep -r "$SECRET_NAME" src/ lib/ app/ --include="*.ts" --include="*.py" -l

# In CI/CD
grep -r "$SECRET_NAME" .github/ .gitlab-ci.yml docker-compose.yml -l

# In infrastructure
grep -r "$SECRET_NAME" terraform/ k8s/ helm/ -l 2>/dev/null

# In secret managers
vault kv get -field=$SECRET_NAME secret/myapp/prod 2>/dev/null
aws ssm get-parameter --name "/myapp/prod/$SECRET_NAME" 2>/dev/null
doppler secrets get $SECRET_NAME --project myapp --config prod 2>/dev/null

Step 2: Generate New Secret

# Generic secret (32 bytes, base64)
openssl rand -base64 32

# JWT secret (64 bytes for HS256)
openssl rand -base64 64

# API key format (alphanumeric)
openssl rand -hex 32

Step 3: Dual-Write Period

Timeline:
─────────────────────────────────────────────────────
T+0:   Generate new secret
T+1:   Deploy code that accepts BOTH old and new secrets
T+2:   Update secret in ALL locations to new value
T+3:   Verify all services work with new secret
T+4:   Deploy code that accepts ONLY new secret
T+5:   Invalidate/revoke old secret
T+6:   Monitor for 24 hours for any auth failures
─────────────────────────────────────────────────────

Step 4: Verify and Monitor

# Check for auth failures in logs (24 hours after rotation)
# Replace with your actual log query
grep -i "unauthorized\|auth.*fail\|invalid.*token" /var/log/app/*.log | tail -20

# Verify new credentials work
curl -s -o /dev/null -w "%{http_code}" \
  -H "Authorization: Bearer $NEW_TOKEN" \
  https://api.myapp.com/health

Secret Manager Integration

HashiCorp Vault

# Authenticate via OIDC
export VAULT_ADDR="https://vault.company.com"
vault login -method=oidc

# Store secrets
vault kv put secret/myapp/prod \
  DATABASE_URL="postgres://user:pass@host/db" \
  APP_SECRET="$(openssl rand -base64 32)" \
  STRIPE_SECRET_KEY="sk_live_..."

# Read secrets into environment
eval $(vault kv get -format=json secret/myapp/prod | \
  jq -r '.data.data | to_entries[] | "export \(.key)=\(.value|@sh)"')

# Rotate a single secret
vault kv patch secret/myapp/prod \
  APP_SECRET="$(openssl rand -base64 32)"

AWS SSM Parameter Store

# Store as encrypted parameter
aws ssm put-parameter \
  --name "/myapp/prod/DATABASE_URL" \
  --value "postgres://..." \
  --type "SecureString" \
  --key-id "alias/myapp-secrets" \
  --overwrite

# Read all parameters for an environment
aws ssm get-parameters-by-path \
  --path "/myapp/prod/" \
  --with-decryption \
  --query "Parameters[*].[Name,Value]" \
  --output text

Doppler

# Set up project
doppler setup --project myapp --config prod

# Run with secrets injected (recommended for production)
doppler run -- node server.js

# Download for local dev
doppler secrets download --no-file --format env > .env.local

Environment Drift Detection

#!/bin/bash
# Compare environment variable keys between staging and production

STAGING_KEYS=$(doppler secrets --project myapp --config staging --format json | \
  jq -r 'keys[]' | sort)
PROD_KEYS=$(doppler secrets --project myapp --config prod --format json | \
  jq -r 'keys[]' | sort)

ONLY_STAGING=$(comm -23 <(echo "$STAGING_KEYS") <(echo "$PROD_KEYS"))
ONLY_PROD=$(comm -13 <(echo "$STAGING_KEYS") <(echo "$PROD_KEYS"))

if [ -n "$ONLY_STAGING" ]; then
  echo "DRIFT: Keys in STAGING but NOT in PROD:"
  echo "$ONLY_STAGING" | sed 's/^/  /'
fi

if [ -n "$ONLY_PROD" ]; then
  echo "DRIFT: Keys in PROD but NOT in STAGING:"
  echo "$ONLY_PROD" | sed 's/^/  /'
fi

[ -z "$ONLY_STAGING" ] && [ -z "$ONLY_PROD" ] && echo "No drift detected."

Common Pitfalls

Committing .env to git — add .env to .gitignore on day 1; use pre-commit hooks as a safety net
Echoing secrets in CI logs — never echo $SECRET; mask variables in CI settings
Rotating in only one location — secrets exist in CI, hosting, Docker, K8s; update ALL locations
Weak secrets — APP_SECRET=mysecret is not a secret; use openssl rand -base64 32
Shared secrets across environments — dev and prod must have different secrets, always
No monitoring after rotation — watch for auth failures for 24 hours after rotating credentials
.env.example with real values — example files are public; strip everything sensitive
Long-lived credentials — prefer short-lived tokens (OIDC, instance roles) over permanent API keys

Best Practices

Secret manager is source of truth — .env files are for local dev only; never in production
Rotate on a schedule — quarterly minimum for long-lived keys, not just after incidents
Principle of least privilege — each service gets its own API key with minimal permissions
Validate at startup — fail fast on missing required variables before serving traffic
Never log secrets — add middleware that redacts known secret patterns from log output
Use short-lived credentials — prefer OIDC/instance roles over long-lived access keys
Audit access — log every secret read in Vault/SSM; alert on anomalous access patterns
Document rotation playbooks — write them before an incident, not during one

Troubleshooting

Problem	Cause	Solution
Startup validation fails with MISSING for a set variable	Variable is set in `.env` but the app reads from a different file (e.g., `.env.local` overrides it to empty)	Check file hierarchy load order; ensure the correct `.env.*` file is loaded and no override blanks the value
Pre-commit hook passes but CI detects a leaked secret	Pre-commit patterns list is out of date or does not cover the token format CI scans for	Sync the regex pattern list between the pre-commit hook and CI scanner; add the missing pattern
`vault kv get` returns "permission denied"	OIDC token expired or the Vault policy does not grant read access to the target path	Re-authenticate with `vault login -method=oidc` and verify the policy includes `read` capability on the secret path
Environment drift detection shows false positives	One environment uses a prefix convention (e.g., `NEXT_PUBLIC_`) that the other does not	Add an exclusion list of known environment-specific keys to the drift script
Secret rotation causes service outage	Code was deployed without the dual-read period; only the new secret is accepted immediately	Always deploy the dual-read code change first, then update the secret value, then remove old-secret support
`.env.example` accidentally contains real credentials	Developer copied `.env` to `.env.example` without stripping values	Run the auto-generation script to rebuild `.env.example` from `.env` with values stripped; add a CI check that `.env.example` values match safe defaults only
AWS SSM `put-parameter` fails with AccessDeniedException	IAM role lacks `ssm:PutParameter` or `kms:Encrypt` permissions for the target key	Attach the required IAM policy granting `ssm:PutParameter` and `kms:Encrypt` on the KMS key alias used for SecureString

Success Criteria

Zero secrets in git history — secret leak scanner reports 0 findings across all branches
100% startup validation coverage — every required variable is declared in the validation script; no production deploy starts with missing vars
Rotation completed within SLA — credential rotation finishes within 4 hours of incident detection, including dual-write period and verification
Environment drift below 5% — staging and production variable key sets differ by no more than 5% (intentional differences documented)
Pre-commit hook adoption at 100% — every contributor has the secret-blocking pre-commit hook installed and active
Quarterly rotation compliance — all long-lived credentials are rotated at least once per quarter with audit trail in the secret manager
Post-rotation monitoring green — zero authentication failures attributed to stale credentials in the 24-hour window after each rotation

Scope & Limitations

This skill covers:

.env file scaffolding, hierarchy, and validation for any language/framework
Secret leak detection in git history, staged files, and working tree
Credential rotation playbooks with zero-downtime dual-read strategy
Integration patterns for HashiCorp Vault, AWS SSM, 1Password CLI, and Doppler

This skill does NOT cover:

Runtime secret injection in Kubernetes (see engineering/ci-cd-pipeline-builder for deployment pipeline secrets)
Infrastructure-as-code for provisioning Vault clusters or SSM policies (see engineering/ci-cd-pipeline-builder)
Application-level encryption at rest or in transit (see engineering/api-design-reviewer for API security patterns)
Identity and access management (IAM) role design or SSO/OIDC provider configuration (see ra-qm-team/ compliance skills for access control frameworks)

Integration Points

Skill	Integration	Data Flow
`engineering/ci-cd-pipeline-builder`	Inject secrets from Vault/SSM/Doppler into CI/CD pipeline stages	Rotation playbook outputs feed pipeline secret-update steps
`engineering/dependency-auditor`	Flag dependencies that bundle or require hardcoded credentials	Dependency audit findings trigger secret leak scans on affected repos
`engineering/skill-security-auditor`	Validate that no skill packages ship embedded secrets or credentials	Security audit references this skill's regex patterns for detection
`engineering/codebase-onboarding`	Include `.env.example` setup and secret-manager access in onboarding checklists	Onboarding workflow consumes the `.env` hierarchy and validation script
`engineering/observability-designer`	Monitor authentication failures post-rotation; alert on anomalous secret access	Post-rotation verification metrics flow into observability dashboards
`ra-qm-team/soc2-compliance-auditor`	Demonstrate secret management controls for SOC 2 CC6.1 and CC6.6 criteria	Rotation audit logs and access policies serve as SOC 2 evidence artifacts

env-secrets-manager