cloud-ops
Cloud Operations & Deployment
Quick Start (Development)
# 1. Infrastructure (PostgreSQL, Valkey, Geo, Filegate)
bun run infra
# 2. Core dev stack — 7 containers, enough to log in, see the dashboard, and manage accounts
bun run dev
# 3. Open the platform
open http://localhost:3000
To stop: bun run dev:down (includes extras via --profile extra) and bun run infra:down.
Dev Stack Shape
The compose file uses profiles so bun run dev stays light. Full spin-up is opt-in.
| Command | What it does |
|---|---|
| bun run dev | Core set only: gateway, app-core, app-dashboard, app-accounts, app-logging, app-settings, app-notifications (7 containers) |
| bun run dev:full | Core + all extras via --profile extra (20 containers total) |
| bun run dev:app <name> | Start one extra app into the running stack; it joins the existing network automatically |
| bun run dev:app stop <name> / logs <name> | Stop / tail that app |
| bun run dev:down | Tear down the dev stack |
Why the split: the core set gives you login + dashboard + admin panel + log viewer + settings UI; extras (notebooks, files, spaces, weather, …) are spun up only when a specific app is under development.
Container Architecture
┌─────────────────────────────────────────────────┐
│ docker compose │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ gateway │ │ app-core │ │app-files │ ... │ ← app containers (compose.dev.yml)
│ │ :3000 │ │ :3000 │ │ :3000 │ │
│ └────┬─────┘ └──────────┘ └──────────┘ │
│ │ proxy │
│ └────────────────┐ │
│ ┌──────────┐ ┌───────┴──┐ ┌──────────┐ │
│ │ postgres │ │ valkey │ │ filegate │ │ ← infrastructure (compose.yml)
│ │ :5432 │ │ :6379 │ │ :4000 │ │
│ └──────────┘ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────┘
Network & Discovery
Both compose files share the same Docker Compose project name (the folder name, cloud), so they share the default network. That is what lets an ad-hoc dev:app <name> container reach ipa_postgres, ipa_valkey, and gateway without any explicit network config. Don't override the project name with -p unless you're running parallel stacks.
Every app registers itself in Redis via createHeartbeat (60 s interval, 2 min TTL), carrying id, nav metadata, and baseUrl (e.g. http://app-files:3000). The gateway watches the registry and rebuilds its prefix-trie route table on change, usually within 5 s of a new container appearing.
Gateway source: packages/gateway/src/index.ts.
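To make the heartbeat/TTL and prefix-trie mechanics concrete, here is a minimal in-memory sketch. It is not the gateway's code: the Registry class, the AppEntry shape, and the assumption that each app is mounted at "/<id>" are all hypothetical; only the 2 min TTL and the baseUrl format come from this page.

```typescript
// Hypothetical model of the registry + gateway routing described above.
// The real createHeartbeat() payload (nav metadata etc.) is richer.
interface AppEntry {
  id: string;
  baseUrl: string; // e.g. "http://app-files:3000"
}

const TTL_MS = 2 * 60 * 1000; // 2 min TTL; the heartbeat refreshes it every 60 s

// In-memory stand-in for the Redis registry keys.
class Registry {
  private entries = new Map<string, { entry: AppEntry; expiresAt: number }>();

  // Each heartbeat rewrites the entry and pushes its expiry forward.
  heartbeat(entry: AppEntry, now: number): void {
    this.entries.set(entry.id, { entry, expiresAt: now + TTL_MS });
  }

  // Entries past their TTL are ignored, like an expired Redis key.
  live(now: number): AppEntry[] {
    return [...this.entries.values()]
      .filter((e) => e.expiresAt > now)
      .map((e) => e.entry);
  }
}

interface TrieNode {
  children: Map<string, TrieNode>;
  baseUrl?: string;
}

// Assumption: each app is mounted at "/<id>".
function buildRouteTable(apps: AppEntry[]): TrieNode {
  const root: TrieNode = { children: new Map() };
  for (const app of apps) {
    let node = root;
    for (const seg of app.id.split("/").filter(Boolean)) {
      let next = node.children.get(seg);
      if (!next) {
        next = { children: new Map() };
        node.children.set(seg, next);
      }
      node = next;
    }
    node.baseUrl = app.baseUrl;
  }
  return root;
}

// Longest-prefix match over path segments: remember the deepest node with a target.
function resolveRoute(root: TrieNode, path: string): string | undefined {
  let node = root;
  let match = root.baseUrl;
  for (const seg of path.split("/").filter(Boolean)) {
    const next = node.children.get(seg);
    if (!next) break;
    node = next;
    if (node.baseUrl) match = node.baseUrl;
  }
  return match;
}
```

Rebuilding the whole trie from the live entries on every registry change keeps the routing logic simple: an app whose heartbeat stops just drops out of the next rebuild once its TTL lapses.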
Infrastructure Services (compose.yml)
Started with bun run infra:
| Service | Image | Port | Purpose |
|---|---|---|---|
| ipa_postgres | postgres:15-alpine | 5432 | Primary database (max 300 connections) |
| ipa_valkey | valkey/valkey:8-alpine | 6379 | Sessions, service registry, pub/sub |
| geo | ghcr.io/valentinkolb/geo | 8081 | Geolocation service |
| filegate | ghcr.io/valentinkolb/filegate | 4000 | File proxy with token auth |
Persistent volumes:
- ipa_postgres_data: PostgreSQL data
- ipa_valkey_data: Valkey/Redis data
- filegate_homes, filegate_groups: file storage
App Containers (compose.dev.yml)
Every app container:
- Uses Dockerfile.dev (single-stage, oven/bun:1 base)
- Mounts source for hot reload via --watch
- Runs the CSS preload: --preload=/app/packages/cloud/scripts/preload.ts
- Shares env via YAML anchors (x-env, x-app)
Core set (7, no profile — started by bun run dev): gateway, app-core, app-dashboard, app-accounts, app-logging, app-settings, app-notifications.
Extras (13, profiles: [extra] — bun run dev:full or ad-hoc via dev:app): app-api-docs, app-notebooks, app-contacts, app-faq, app-files, app-ipa-hosts, app-oauth, app-proxy-auth, app-quotes, app-spaces, app-tools, app-ui-lab, app-weather.
Volume Mounts (Dev)
volumes:
- ./packages/cloud/src:/app/packages/cloud/src # shared core library
- ./packages/{appId}/src:/app/packages/{appId}/src # app source
- ./styles.css:/app/styles.css # global styles entry
Changes to source files trigger automatic restart via Bun's --watch.
Adding a New App Container
- Add a service block in compose.dev.yml (extras go under profiles: [extra]):
app-my-app:
<<: *app
container_name: app-my-app
environment: { <<: *env, APP_ID: my-app }
profiles: [extra] # omit for core-set apps
volumes:
- ./packages/cloud/src:/app/packages/cloud/src
- ./packages/my-app/src:/app/packages/my-app/src
- ./styles.css:/app/styles.css
command: bun run --preload=/app/packages/cloud/scripts/preload.ts --watch packages/my-app/src/index.ts
- Add a COPY packages/my-app/package.json packages/my-app/ line in Dockerfile.dev so the install layer caches the new workspace.
- Start it standalone during development: bun run dev:app my-app. The app self-registers in Redis via createHeartbeat() on startup; the gateway picks it up within ~5 s without any central registration step.
Environment Variables
Full reference → references/env-reference.md
Required
DATABASE_URL=postgresql://ipa:ipa@ipa_postgres:5432/ipa
REDIS_URL=redis://ipa_valkey:6379
APP_SECRET=dev-secret-change-me-in-production # encrypts settings at rest
APP_URL defaults to localhost:3000 if not set.
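A fail-fast check for the three required variables could look like the sketch below. loadEnv and CoreEnv are hypothetical names, not the platform's config code; only the variable names and the APP_URL default come from this page.

```typescript
// Hypothetical config loader; the platform's real startup code may differ.
interface CoreEnv {
  databaseUrl: string;
  redisUrl: string;
  appSecret: string;
  appUrl: string;
}

function loadEnv(env: Record<string, string | undefined>): CoreEnv {
  const required = ["DATABASE_URL", "REDIS_URL", "APP_SECRET"] as const;
  // Report every missing variable at once instead of failing one at a time.
  const missing = required.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(", ")}`);
  }
  return {
    databaseUrl: env.DATABASE_URL!,
    redisUrl: env.REDIS_URL!,
    appSecret: env.APP_SECRET!,
    // Documented default when APP_URL is unset.
    appUrl: env.APP_URL ?? "localhost:3000",
  };
}
```

Called as loadEnv(process.env) at startup, this stops the container before it touches Postgres or Valkey with a half-configured environment.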
FreeIPA (optional for local dev)
FreeIPA settings are primarily managed via the runtime settings system (DB-backed, editable in admin UI), not env vars. Environment variables provide initial bootstrap values only:
FREEIPA_URL=freeipa.example.com # default: freeipa.ipa.example.com
FREEIPA_SVC_USER=svc-cloud # default: svc-cloud
FREEIPA_SVC_PASSWORD=change-me
GROUPS_ADMIN=admins # default: admins
GROUPS_BASE_SYNC=users # default: users
GROUPS_BASE_IPA_REALM=cloud # default: cloud
GROUPS_EXCLUDED=editors,trust admins,admins
Note: These env vars are legacy bootstrap values. On first startup they seed the freeipa.* keys in the runtime settings system, and afterwards they only act as fallbacks for keys that have no DB value.
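The seed-then-fallback order can be sketched as follows. The concrete key names (freeipa.url, freeipa.svcUser, freeipa.svcPassword) are hypothetical, since this page only documents the freeipa.* prefix, and a Map stands in for the DB-backed settings store.

```typescript
// Hypothetical key names; this page only documents the freeipa.* prefix.
const ENV_FALLBACKS: Record<string, { env: string; fallback?: string }> = {
  "freeipa.url": { env: "FREEIPA_URL", fallback: "freeipa.ipa.example.com" },
  "freeipa.svcUser": { env: "FREEIPA_SVC_USER", fallback: "svc-cloud" },
  "freeipa.svcPassword": { env: "FREEIPA_SVC_PASSWORD" },
};

type Env = Record<string, string | undefined>;

// Lookup order: DB value, then env var, then built-in default.
function resolveSetting(key: string, db: Map<string, string>, env: Env): string | undefined {
  const fromDb = db.get(key);
  if (fromDb !== undefined) return fromDb;
  const spec = ENV_FALLBACKS[key];
  if (!spec) return undefined;
  return env[spec.env] ?? spec.fallback;
}

// First startup: copy env values into the DB, never overwriting a key that already has one.
function seedFromEnv(db: Map<string, string>, env: Env): void {
  for (const [key, spec] of Object.entries(ENV_FALLBACKS)) {
    const fromEnv = env[spec.env];
    if (!db.has(key) && fromEnv !== undefined) db.set(key, fromEnv);
  }
}
```

Because seeding skips keys that already exist, changing an env var after the first startup has no effect on a value that was edited in the admin UI.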
Development Shortcuts
ADMIN_LOGIN_TOKEN=dev-admin # Emergency local admin login (any username + this as password)
Note: skipSetup (skip migrations) is an app.start() option, not an environment variable. No DISABLE_APPS env var is implemented.
Build Process
CSS/Asset Building
Two paths, same Tailwind oxide scanner config:
- Dev: packages/cloud/scripts/preload.ts runs at process start and builds CSS into <workspace>/public/. bun-plugin-tailwind scans the whole workspace.
- Prod (Docker): packages/cloud/scripts/build.ts runs at image-build time and emits everything into dist/. It is generic over APP_ID. Apps that need extra build-time artefacts ship packages/<id>/scripts/build-extras.ts (only core does: global.css, logo.svg, and katex.css, served at /public/<plain-name>). The post-build pass walks packages/cloud + packages/<APP_ID> for *.{island,client}.tsx and removes any island chunks the SSR plugin emitted from other apps.
Each app's app.css ships at /public/<id>/app.css; shared assets at /public/<plain-name> are served from core.
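The post-build pruning step can be pictured as a set-membership filter. This is a sketch under an assumption: it pretends each emitted chunk records the island source it came from (the hypothetical Chunk.source field), which may not match how build.ts actually maps chunks to sources.

```typescript
// Island/client sources count only if they live in packages/cloud or packages/<APP_ID>.
const ISLAND_RE = /\.(island|client)\.tsx$/;

function ownedIslandSources(appId: string, files: string[]): Set<string> {
  const roots = ["packages/cloud/", `packages/${appId}/`];
  return new Set(
    files.filter((f) => ISLAND_RE.test(f) && roots.some((root) => f.startsWith(root))),
  );
}

// Hypothetical chunk shape: assumes each emitted chunk knows its originating source file.
interface Chunk {
  file: string; // emitted file under dist/
  source: string; // originating *.island.tsx / *.client.tsx
}

function pruneForeignIslands(
  appId: string,
  allSources: string[],
  chunks: Chunk[],
): { keep: Chunk[]; remove: Chunk[] } {
  const owned = ownedIslandSources(appId, allSources);
  const keep: Chunk[] = [];
  const remove: Chunk[] = [];
  for (const chunk of chunks) {
    // Chunks from another app's islands end up in `remove`.
    (owned.has(chunk.source) ? keep : remove).push(chunk);
  }
  return { keep, remove };
}
```

The trailing slash in each root keeps packages/files from also matching a sibling such as packages/files-extra.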
TypeScript Checking
bun run typecheck # Runs all checks in sequence:
# 1. check:skills — validate skill files
# 2. check:boundaries — enforce package boundaries
# 3. check:cycles — detect circular dependencies
# 4. check:service-api-contracts — validate service/API contracts
# 5. check:biome — format + lint (Biome)
# 6. per-package typecheck — TypeScript compilation check
Linting
bun run lint # Check only
bun run lint:fix # Auto-fix
bun run format # Format only
CI/CD
Two workflows, separate tag namespaces so they don't collide.
.github/workflows/docker.yml — per-app docker images
One single parametrised Dockerfile (3 stages: deps → build → runtime, oven/bun:1-alpine, --build-arg APP_ID=<id>). Multi-arch (linux/amd64 + linux/arm64). 19 apps produce images: gateway, core, plus app-<id> for the rest (including app-api-docs). ui-lab is dev-only and intentionally skipped. The standalone reference app lives in cloud-template.
| Trigger | What's built | Image tags |
|---|---|---|
| push to main | only apps with changed source. Changes to packages/cloud, Dockerfile, .dockerignore, bun.lock, package.json, styles.css or this workflow file fan out to ALL 19 | :sha-<short>, :main |
| tag cloud-<image>-v<X.Y.Z> (e.g. cloud-app-notebooks-v0.1.2, cloud-gateway-v0.1.2) | only that one image, validated against the 19-app allowlist | :v<X.Y.Z>, :latest |
| workflow_dispatch | all 19 on demand | :sha-<short> |
Pushed to ghcr.io/valentinkolb/cloud-<image>. Bulk-tag-push gotcha: GitHub Actions silently drops events past the first 3 tags in a single git push --tags. For multi-app releases, push tags one at a time with a small delay (for tag in ...; do git push origin "$tag"; sleep 3; done).
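The selection and tag rules in the table can be modelled in a few lines. This is a TypeScript sketch of the logic, not the workflow itself (which is YAML): the image list is reconstructed from this page, and the packages/<id>/ layout for app sources and the root-only package.json match are assumptions.

```typescript
// Reconstructed from this page: gateway, core, and app-<id> for the rest (ui-lab skipped).
const IMAGES = [
  "gateway", "core",
  "app-dashboard", "app-accounts", "app-logging", "app-settings", "app-notifications",
  "app-api-docs", "app-notebooks", "app-contacts", "app-faq", "app-files", "app-ipa-hosts",
  "app-oauth", "app-proxy-auth", "app-quotes", "app-spaces", "app-tools", "app-weather",
] as const;

// Shared inputs that fan out to every image.
const SHARED_PREFIXES = ["packages/cloud/"];
const SHARED_FILES = new Set([
  "Dockerfile", ".dockerignore", "bun.lock", "package.json", "styles.css",
  ".github/workflows/docker.yml",
]);

function isImage(name: string): boolean {
  return (IMAGES as readonly string[]).includes(name);
}

// Assumption: app sources live under packages/<id>/ (e.g. packages/gateway, packages/core).
function imageForPath(path: string): string | undefined {
  const m = /^packages\/([^/]+)\//.exec(path);
  if (!m) return undefined;
  const id = m[1];
  const image = id === "gateway" || id === "core" ? id : `app-${id}`;
  return isImage(image) ? image : undefined; // e.g. app-ui-lab is not built
}

// push to main: shared changes rebuild all images, otherwise only the touched apps.
function imagesToBuild(changed: string[]): string[] {
  const shared = changed.some(
    (p) => SHARED_FILES.has(p) || SHARED_PREFIXES.some((s) => p.startsWith(s)),
  );
  if (shared) return [...IMAGES];
  const hit = new Set<string>();
  for (const p of changed) {
    const image = imageForPath(p);
    if (image) hit.add(image);
  }
  return IMAGES.filter((img) => hit.has(img));
}

// Release tag cloud-<image>-v<X.Y.Z>: one allowlisted image, tagged :v<X.Y.Z> and :latest.
function parseReleaseTag(tag: string): { image: string; tags: string[] } {
  const m = /^cloud-(.+)-v(\d+\.\d+\.\d+)$/.exec(tag);
  if (!m) throw new Error(`Not a release tag: ${tag}`);
  const [, image, version] = m;
  if (!isImage(image)) throw new Error(`Image not in allowlist: ${image}`);
  return { image, tags: [`v${version}`, "latest"] };
}
```

The allowlist check is what turns a typo like cloud-app-ui-lab-v1.0.0 into a failed run instead of a stray image.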
.github/workflows/npm.yml — @valentinkolb/cloud to npm
OIDC trusted publisher (no NPM_TOKEN secret). The trusted publisher is configured once on npmjs.org for the package — Repository: ValentinKolb/cloud, Workflow: npm.yml.
| Trigger | Behaviour |
|---|---|
| tag npm-cloud-v<X.Y.Z> | publishes that version with --provenance --access public |
| workflow_dispatch (input: version) | manual emergency publish |
Bump procedure:
- Edit packages/cloud/package.json → bump version (e.g. npm pkg set version=X.Y.Z, run from packages/cloud)
- Commit + push to main
- git tag npm-cloud-vX.Y.Z && git push origin npm-cloud-vX.Y.Z
- CI publishes with provenance
Why npm pkg set and not npm version: npm version triggers an internal install/lockfile update that walks workspace siblings and chokes on their workspace:* deps. npm pkg set version=X.Y.Z is a pure JSON edit — same result, no resolve.
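"Pure JSON edit" means roughly the following: parse, overwrite one field, serialize. The sketch below only illustrates that idea; setVersion and releaseTag are hypothetical helpers, and the 2-space indent plus trailing newline is an assumed formatting convention.

```typescript
// Illustrates the effect of `npm pkg set version=X.Y.Z`: one field changes, nothing is resolved.
const SEMVER = /^\d+\.\d+\.\d+$/;

function setVersion(packageJson: string, version: string): string {
  if (!SEMVER.test(version)) throw new Error(`Expected X.Y.Z, got ${version}`);
  const pkg = JSON.parse(packageJson) as Record<string, unknown>;
  // Overwriting an existing key keeps its position; workspace:* deps stay untouched.
  pkg.version = version;
  return `${JSON.stringify(pkg, null, 2)}\n`;
}

// Tag that triggers .github/workflows/npm.yml for this version.
function releaseTag(version: string): string {
  return `npm-cloud-v${version}`;
}
```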
Production Deployment
compose.prod.yml at the repo root pulls all 19 images from ghcr. Companion .env.prod.example.
Shape:
- One YAML anchor, x-shared-env, declares DATABASE_URL / REDIS_URL / APP_SECRET; it is merged into every service's environment via x-app-defaults.
- Two networks: cloud-internal (apps talk to each other) and traefik (external; only the gateway joins it, with routing labels).
- postgres, valkey, and filegate are deliberately not defined in the file: deployments often run them on a separate host or VM. Add the services you need alongside.
- APP_ID, FILEGATE_URL, and FILEGATE_TOKEN are not set: the container entrypoint already pins the app, and filegate config moved into runtime settings.
Infrastructure Details
PostgreSQL
- Version: 15 (Alpine)
- Max connections: 300 (configured in compose)
- Schemas: One per app domain (auth.*, logging.*, settings.*, notifications.*, plus app-specific)
- Migrations: Run on every app startup via lifecycle.setup() (idempotent DDL)
- Connection: Via Bun's native sql template tag (no connection pool library needed; Bun manages pooling)
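"Idempotent DDL" is what makes running migrations on every startup safe. The sketch below shows the pattern with a hypothetical logging.entries table; the Exec type is an assumption so the example runs without a database (with Bun it could be backed by something like (ddl) => sql.unsafe(ddl)), and none of this is the actual lifecycle.setup() code.

```typescript
// Hypothetical executor; with Bun it could wrap sql.unsafe(ddl).
type Exec = (ddl: string) => Promise<unknown>;

// Every statement is guarded with IF NOT EXISTS, so re-running is a no-op.
const LOGGING_DDL = [
  `CREATE SCHEMA IF NOT EXISTS logging`,
  `CREATE TABLE IF NOT EXISTS logging.entries (
     id BIGSERIAL PRIMARY KEY,
     app_id TEXT NOT NULL,
     level TEXT NOT NULL,
     message TEXT NOT NULL,
     created_at TIMESTAMPTZ NOT NULL DEFAULT now()
   )`,
  `CREATE INDEX IF NOT EXISTS entries_app_created_idx
     ON logging.entries (app_id, created_at DESC)`,
];

async function setupLogging(exec: Exec): Promise<void> {
  // Sequential on purpose: the table needs the schema, the index needs the table.
  for (const ddl of LOGGING_DDL) {
    await exec(ddl);
  }
}
```

Additive changes fit this pattern well; destructive changes (dropping or renaming columns) need more care than a guard clause.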
Valkey (Redis)
- Version: 8 (Alpine) — drop-in Redis replacement
- Persistence: --save 30 1 (snapshot every 30 s if at least 1 write)
- Usage:
  - Session storage (session:{userId}:{token} with TTL)
  - App registry (apps/{appId} with 2 min TTL + heartbeat)
  - Rate limiting state
  - Pub/sub for real-time features
- Client: Bun's native Redis client or @valentinkolb/sync primitives
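One practical effect of the session:{userId}:{token} layout is that all of a user's sessions share a key prefix, so they can be found with a single pattern scan. Whether the platform offers a "log out everywhere" action is not stated here; the sketch uses a hypothetical in-memory TtlStore as a stand-in for Valkey (SET with expiry, SCAN MATCH + DEL).

```typescript
// Key layout from this page.
function sessionKey(userId: string, token: string): string {
  return `session:${userId}:${token}`;
}

// Minimal in-memory stand-in for Valkey with per-key expiry.
class TtlStore {
  private data = new Map<string, { value: string; expiresAt: number }>();

  set(key: string, value: string, ttlSeconds: number, now: number): void {
    this.data.set(key, { value, expiresAt: now + ttlSeconds * 1000 });
  }

  get(key: string, now: number): string | undefined {
    const hit = this.data.get(key);
    if (!hit || hit.expiresAt <= now) return undefined; // expired keys behave as missing
    return hit.value;
  }

  // Rough equivalent of SCAN MATCH "<prefix>*" followed by DEL on each match.
  deletePrefix(prefix: string): number {
    let removed = 0;
    for (const key of [...this.data.keys()]) {
      if (key.startsWith(prefix)) {
        this.data.delete(key);
        removed++;
      }
    }
    return removed;
  }
}

// Revokes every session of one user without touching anyone else's.
function revokeAllSessions(store: TtlStore, userId: string): number {
  return store.deletePrefix(`session:${userId}:`);
}
```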
FreeIPA
External service (not containerized). Provides:
- User authentication (Kerberos/form-based)
- Group management
- Password policies
The cloud communicates via JSON-RPC at https://{freeipa_url}/ipa/session/json.
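A FreeIPA JSON-RPC call is a POST whose body carries the command name plus a params pair of positional args and options. The builder below follows FreeIPA's general conventions, not this repo's client: buildIpaCall is hypothetical, and the session cookie is assumed to come from a prior login (e.g. FreeIPA's /ipa/session/login_password endpoint).

```typescript
// Hypothetical request builder; the real client in this repo may differ.
interface IpaRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildIpaCall(
  freeipaUrl: string, // host only, e.g. "freeipa.example.com"
  command: string, // e.g. "user_show"
  args: unknown[],
  options: Record<string, unknown> = {},
  sessionCookie?: string, // assumed to come from a prior login request
): IpaRequest {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    Accept: "application/json",
    // FreeIPA's CSRF check expects a Referer under https://<host>/ipa.
    Referer: `https://${freeipaUrl}/ipa`,
  };
  if (sessionCookie) headers.Cookie = sessionCookie;
  return {
    url: `https://${freeipaUrl}/ipa/session/json`,
    method: "POST",
    headers,
    // params = [positional args, options]
    body: JSON.stringify({ method: command, params: [args, options], id: 0 }),
  };
}
```

The result can be passed straight to fetch(req.url, req), which ignores the extra url field.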
Filegate
File proxy service for secure file access:
- Token-based authentication (FILEGATE_TOKEN)
- Path restrictions (ALLOWED_BASE_PATHS)
- Redis integration for state
- Volumes: filegate_homes (user files), filegate_groups (group files)
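A base-path restriction only holds if ".." segments are normalised before the prefix check. The sketch illustrates that check; the comma-separated ALLOWED_BASE_PATHS format and both helper names are assumptions, not filegate's actual implementation, and symlinks are not handled here.

```typescript
import { posix } from "node:path";

// Assumed format: comma-separated absolute paths, e.g. "/data/homes,/data/groups".
function parseAllowed(raw: string): string[] {
  return raw
    .split(",")
    .map((p) => p.trim())
    .filter(Boolean)
    .map((p) => posix.resolve(p));
}

// Normalise first ("/a/../b" -> "/b"), then require the target to sit inside a base.
function isAllowed(requested: string, allowed: string[]): boolean {
  const target = posix.resolve("/", requested);
  return allowed.some((base) => {
    const withSlash = base.endsWith("/") ? base : `${base}/`;
    // Matching on "base/" stops "/data/homes-evil" from passing as "/data/homes".
    return target === base || target.startsWith(withSlash);
  });
}
```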