cloud-ops

Installation
SKILL.md

Cloud Operations & Deployment

Quick Start (Development)

# 1. Infrastructure (PostgreSQL, Valkey, Geo, Filegate)
bun run infra

# 2. Core dev stack — 7 containers, enough to log in, see the dashboard, and manage accounts
bun run dev

# 3. Open the platform
open http://localhost:3000

To stop: bun run dev:down (includes extras via --profile extra) and bun run infra:down.

Dev Stack Shape

The compose file uses profiles so bun run dev stays light. Full spin-up is opt-in.

Command What it does
bun run dev Core set only — gateway, app-core, app-dashboard, app-accounts, app-logging, app-settings, app-notifications (7 containers)
bun run dev:full Core + all extras via --profile extra (20 containers total)
bun run dev:app <name> Start one extra app into the running stack — joins the existing network automatically
bun run dev:app stop <name> / logs <name> Stop / tail that app
bun run dev:down Tear down the dev stack

Why the split: the core set gives you login + dashboard + admin panel + log viewer + settings UI; extras (notebooks, files, spaces, weather, …) are spun up only when a specific app is under development.

Container Architecture

┌─────────────────────────────────────────────────┐
│                 docker compose                  │
│                                                 │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐      │
│  │ gateway  │  │ app-core │  │app-files │ ...   │  ← app containers (compose.dev.yml)
│  │ :3000    │  │ :3000    │  │ :3000    │       │
│  └────┬─────┘  └──────────┘  └──────────┘      │
│       │ proxy                                   │
│       └────────────────┐                        │
│  ┌──────────┐  ┌───────┴──┐  ┌──────────┐      │
│  │ postgres │  │  valkey  │  │ filegate │      │  ← infrastructure (compose.yml)
│  │ :5432    │  │  :6379   │  │ :4000    │      │
│  └──────────┘  └──────────┘  └──────────┘      │
└─────────────────────────────────────────────────┘

Network & Discovery

Both compose files share the same Docker Compose project name (= folder name cloud), which means they share the default network. That's the mechanism that lets an ad-hoc dev:app <name> container reach ipa_postgres, ipa_valkey, and gateway without any explicit network config. Don't override the project name with -p unless you're running parallel stacks.

Every app registers itself in Redis via createHeartbeat (60s interval, 2min TTL), carrying id, nav metadata, and baseUrl (e.g. http://app-files:3000). The gateway watches the registry and rebuilds its prefix-trie route table on change — usually within ≤5s of a new container appearing.

Gateway source: packages/gateway/src/index.ts.

Infrastructure Services (compose.yml)

Started with bun run infra:

Service Image Port Purpose
ipa_postgres postgres:15-alpine 5432 Primary database (max 300 connections)
ipa_valkey valkey/valkey:8-alpine 6379 Sessions, service registry, pub/sub
geo ghcr.io/valentinkolb/geo 8081 Geolocation service
filegate ghcr.io/valentinkolb/filegate 4000 File proxy with token auth

Persistent volumes:

  • ipa_postgres_data — PostgreSQL data
  • ipa_valkey_data — Valkey/Redis data
  • filegate_homes, filegate_groups — File storage

App Containers (compose.dev.yml)

Every app container:

  • Uses Dockerfile.dev (single-stage, oven/bun:1 base)
  • Mounts source for hot reload via --watch
  • Runs the CSS preload: --preload=/app/packages/cloud/scripts/preload.ts
  • Shares env via YAML anchors (x-env, x-app)

Core set (7, no profile — started by bun run dev): gateway, app-core, app-dashboard, app-accounts, app-logging, app-settings, app-notifications.

Extras (13, profiles: [extra]bun run dev:full or ad-hoc via dev:app): app-api-docs, app-notebooks, app-contacts, app-faq, app-files, app-ipa-hosts, app-oauth, app-proxy-auth, app-quotes, app-spaces, app-tools, app-ui-lab, app-weather.

Volume Mounts (Dev)

volumes:
  - ./packages/cloud/src:/app/packages/cloud/src       # shared core library
  - ./packages/{appId}/src:/app/packages/{appId}/src   # app source
  - ./styles.css:/app/styles.css                        # global styles entry

Changes to source files trigger automatic restart via Bun's --watch.

Adding a New App Container

  1. Add a service block in compose.dev.yml (extras go under profiles: [extra]):
app-my-app:
  <<: *app
  container_name: app-my-app
  environment: { <<: *env, APP_ID: my-app }
  profiles: [extra]           # omit for core-set apps
  volumes:
    - ./packages/cloud/src:/app/packages/cloud/src
    - ./packages/my-app/src:/app/packages/my-app/src
    - ./styles.css:/app/styles.css
  command: bun run --preload=/app/packages/cloud/scripts/preload.ts --watch packages/my-app/src/index.ts
  1. Add a COPY packages/my-app/package.json packages/my-app/ line in Dockerfile.dev so the install layer caches the new workspace.
  2. Start it standalone during development: bun run dev:app my-app. The app self-registers in Redis via createHeartbeat() on startup; the gateway picks it up within ~5 s without any central registration step.

Environment Variables

Full reference → references/env-reference.md

Required

DATABASE_URL=postgresql://ipa:ipa@ipa_postgres:5432/ipa
REDIS_URL=redis://ipa_valkey:6379
APP_SECRET=dev-secret-change-me-in-production    # encrypts settings at rest

APP_URL defaults to localhost:3000 if not set.

FreeIPA (optional for local dev)

FreeIPA settings are primarily managed via the runtime settings system (DB-backed, editable in admin UI), not env vars. Environment variables provide initial bootstrap values only:

FREEIPA_URL=freeipa.example.com          # default: freeipa.ipa.example.com
FREEIPA_SVC_USER=svc-cloud               # default: svc-cloud
FREEIPA_SVC_PASSWORD=change-me
GROUPS_ADMIN=admins                      # default: admins
GROUPS_BASE_SYNC=users                   # default: users
GROUPS_BASE_IPA_REALM=cloud              # default: cloud
GROUPS_EXCLUDED=editors,trust admins,admins

Note: These env vars are legacy bootstrap values. The authoritative configuration lives in the runtime settings system (DB-backed, editable in admin UI under freeipa.* keys). The env vars provide initial seed values on first startup and act as fallbacks if no DB value exists.

Development Shortcuts

ADMIN_LOGIN_TOKEN=dev-admin  # Emergency local admin login (any username + this as password)

Note: skipSetup (skip migrations) is an app.start() option, not an environment variable. There is no DISABLE_APPS env var implemented.

Build Process

CSS/Asset Building

Two paths, same Tailwind oxide scanner config:

  • Devpackages/cloud/scripts/preload.ts runs at process start, builds CSS into <workspace>/public/. Bun-plugin-tailwind scans the whole workspace.
  • Prod (docker)packages/cloud/scripts/build.ts runs at image-build time, emits everything into dist/. Generic over APP_ID. Apps that need extra build-time artefacts ship packages/<id>/scripts/build-extras.ts (only core does — global.css, logo.svg, katex.css served at /public/<plain-name>). The post-build pass walks packages/cloud + packages/<APP_ID> for *.{island,client}.tsx and removes any island chunks the SSR plugin emitted from other apps.

Each app's app.css ships at /public/<id>/app.css; shared assets at /public/<plain-name> are served from core.

TypeScript Checking

bun run typecheck    # Runs all checks in sequence:
# 1. check:skills              — validate skill files
# 2. check:boundaries          — enforce package boundaries
# 3. check:cycles              — detect circular dependencies
# 4. check:service-api-contracts — validate service/API contracts
# 5. check:biome               — format + lint (Biome)
# 6. per-package typecheck      — TypeScript compilation check

Linting

bun run lint         # Check only
bun run lint:fix     # Auto-fix
bun run format       # Format only

CI/CD

Two workflows, separate tag namespaces so they don't collide.

.github/workflows/docker.yml — per-app docker images

One single parametrised Dockerfile (3 stages: deps → build → runtime, oven/bun:1-alpine, --build-arg APP_ID=<id>). Multi-arch (linux/amd64 + linux/arm64). 19 apps produce images: gateway, core, plus app-<id> for the rest (including app-api-docs). ui-lab is dev-only and intentionally skipped. The standalone reference app lives in cloud-template.

Trigger What's built Image tags
push to main only apps with changed source. Changes to packages/cloud, Dockerfile, .dockerignore, bun.lock, package.json, styles.css or this workflow file fan out to ALL 19 :sha-<short>, :main
tag cloud-<image>-v<X.Y.Z> (e.g. cloud-app-notebooks-v0.1.2, cloud-gateway-v0.1.2) only that one image, validated against the 19-app allowlist :v<X.Y.Z>, :latest
workflow_dispatch all 19 on demand :sha-<short>

Pushed to ghcr.io/valentinkolb/cloud-<image>. Bulk-tag-push gotcha: GitHub Actions silently drops events past the first 3 tags in a single git push --tags. For multi-app releases, push tags one at a time with a small delay (for tag in ...; do git push origin "$tag"; sleep 3; done).

.github/workflows/npm.yml@valentinkolb/cloud to npm

OIDC trusted publisher (no NPM_TOKEN secret). The trusted publisher is configured once on npmjs.org for the package — Repository: ValentinKolb/cloud, Workflow: npm.yml.

Trigger Behaviour
tag npm-cloud-v<X.Y.Z> publishes that version with --provenance --access public
workflow_dispatch (input: version) manual emergency publish

Bump procedure:

  1. Edit packages/cloud/package.json → bump version
  2. Commit + push to main
  3. git tag npm-cloud-vX.Y.Z && git push origin npm-cloud-vX.Y.Z
  4. CI publishes with provenance

Why npm pkg set and not npm version: npm version triggers an internal install/lockfile update that walks workspace siblings and chokes on their workspace:* deps. npm pkg set version=X.Y.Z is a pure JSON edit — same result, no resolve.

Production Deployment

compose.prod.yml at the repo root pulls all 19 images from ghcr. Companion .env.prod.example.

Shape:

  • One YAML anchor x-shared-env declares DATABASE_URL/REDIS_URL/APP_SECRET; merged into every service's environment via x-app-defaults.
  • Two networks: cloud-internal (apps talk to each other) and traefik (external, only the gateway joins it with routing labels).
  • postgres, valkey, filegate are deliberately not defined in the file — deployments often run them on a separate host or VM. Add the services you need alongside.
  • APP_ID, FILEGATE_URL, FILEGATE_TOKEN are not set: container entrypoint already pins the app, and filegate config moved into runtime settings.

Infrastructure Details

PostgreSQL

  • Version: 15 (Alpine)
  • Max connections: 300 (configured in compose)
  • Schemas: One per app domain (auth.*, logging.*, settings.*, notifications.*, plus app-specific)
  • Migrations: Run on every app startup via lifecycle.setup() (idempotent DDL)
  • Connection: Via Bun's native sql template tag (no connection pool library needed — Bun manages it)

Valkey (Redis)

  • Version: 8 (Alpine) — drop-in Redis replacement
  • Persistence: --save 30 1 (snapshot every 30s if 1+ write)
  • Usage:
    • Session storage (session:{userId}:{token} with TTL)
    • App registry (apps/{appId} with 2min TTL + heartbeat)
    • Rate limiting state
    • Pub/sub for real-time features
  • Client: Bun's native Redis client or @valentinkolb/sync primitives

FreeIPA

External service (not containerized). Provides:

  • User authentication (Kerberos/form-based)
  • Group management
  • Password policies

The cloud communicates via JSON-RPC at https://{freeipa_url}/ipa/session/json.

Filegate

File proxy service for secure file access:

  • Token-based authentication (FILEGATE_TOKEN)
  • Path restrictions (ALLOWED_BASE_PATHS)
  • Redis integration for state
  • Volumes: filegate_homes (user files), filegate_groups (group files)
Related skills

More from valentinkolb/cloud

Installs
2
First Seen
9 days ago