sync-system-bus
Sync System Bus Worker
Deploy system-bus-worker to the local joelclaw k8s cluster (Talos v1.12.4 / k8s v1.35.0).
Important: .github/workflows/system-bus-worker-deploy.yml defines a deploy job that targets a self-hosted runner. That runner does not exist, so deploys must be completed locally.
Quick Deploy
The publish script handles everything — build, auth, push, k8s apply, rollout, verification:
cd ~/Code/joelhooks/joelclaw
k8s/publish-system-bus-worker.sh
Optional: pass a tag (defaults to timestamp):
k8s/publish-system-bus-worker.sh a6de1e0
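The optional tag argument can be pictured with a small sketch. The helper name resolve_tag and the timestamp format are assumptions made for illustration; the real script may format its default tag differently.

```shell
# Hypothetical helper: use the first argument as the image tag,
# otherwise fall back to a timestamp.
# The %Y%m%d%H%M%S format is an assumption, not taken from the script.
resolve_tag() {
  if [ -n "${1:-}" ]; then
    printf '%s\n' "$1"
  else
    date +%Y%m%d%H%M%S
  fi
}
```

Calling resolve_tag a6de1e0 prints the tag unchanged; calling it with no argument prints a 14-digit timestamp.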
GHCR Auth Order
publish-system-bus-worker.sh now authenticates in this order:
1. GHCR_TOKEN env var (if provided)
2. secrets lease ghcr_pat (agent-secrets)
3. gh auth token fallback
If your gh auth token lacks read:packages/write:packages, push will 403. Use ghcr_pat.
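The fallback order above could be sketched roughly as follows. This is not the script's actual code: the function name resolve_ghcr_token is hypothetical, and it assumes that secrets lease ghcr_pat prints the bare token on stdout.

```shell
# Hypothetical sketch of the credential fallback chain.
# Assumption: `secrets lease ghcr_pat` prints only the token on stdout.
resolve_ghcr_token() {
  # 1. An explicit env var wins.
  if [ -n "${GHCR_TOKEN:-}" ]; then
    printf '%s\n' "$GHCR_TOKEN"
    return 0
  fi
  # 2. Lease the PAT from agent-secrets, if the CLI is installed.
  if command -v secrets >/dev/null 2>&1; then
    rgt_token=$(secrets lease ghcr_pat 2>/dev/null) || rgt_token=""
    if [ -n "$rgt_token" ]; then
      printf '%s\n' "$rgt_token"
      return 0
    fi
  fi
  # 3. Fall back to the gh CLI token (may lack package scopes).
  if command -v gh >/dev/null 2>&1; then
    gh auth token
    return $?
  fi
  echo "no GHCR credential available" >&2
  return 1
}
```

The output can be piped into docker login ghcr.io with --password-stdin, which keeps the token out of the process list and shell history.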
What the Script Does
- Builds ARM64 Docker image (required: the Talos/Colima node is aarch64)
- Authenticates to GHCR (prefers agent-secrets lease ghcr_pat; falls back to gh auth token) with temp Docker config
- Pushes ghcr.io/joelhooks/system-bus-worker:${TAG} and :latest
- Updates the image ref in k8s/system-bus-worker.yaml
- kubectl apply the manifest
- Waits for rollout (--timeout=180s)
- Probes the new pod's health endpoint
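The "updates the image ref" step in the list above might look something like the sketch below. How the real script edits the manifest is not shown in this document, so the function name update_image_ref and the sed approach are assumptions. It also assumes the image value in the manifest is unquoted.

```shell
# Hypothetical sketch: rewrite the worker image tag in a manifest file.
# Assumes an unquoted line such as:
#   image: ghcr.io/joelhooks/system-bus-worker:<tag>
update_image_ref() {
  uir_manifest="$1"
  uir_tag="$2"
  uir_repo="ghcr.io/joelhooks/system-bus-worker"
  uir_tmp=$(mktemp) || return 1
  # Replace everything after the final colon of the image ref with the new tag.
  sed -E "s|(image:[[:space:]]*${uir_repo}):[^[:space:]]+|\\1:${uir_tag}|" \
    "$uir_manifest" > "$uir_tmp" || { rm -f "$uir_tmp"; return 1; }
  # Write back in place so the manifest keeps its original permissions.
  cat "$uir_tmp" > "$uir_manifest"
  rm -f "$uir_tmp"
}
```

For example, update_image_ref k8s/system-bus-worker.yaml a6de1e0 would point the manifest at the a6de1e0 tag and leave every other line untouched.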
Post-Deploy Verification
joelclaw refresh # Re-register functions with Inngest
joelclaw functions | grep "<new-function>" # Verify new function appears
joelclaw status # Full health check
joelclaw runs --count 3 # Confirm runs are flowing
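The four checks above can be chained so that the first failure stops the sequence. This is a sketch only: verify_deploy is a hypothetical wrapper, and it assumes each joelclaw subcommand exits non-zero on failure and that joelclaw functions lists function names in plain text.

```shell
# Hypothetical wrapper around the post-deploy checks.
# Assumptions: joelclaw exits non-zero on failure, and
# `joelclaw functions` prints function names as plain text.
verify_deploy() {
  vd_fn="$1"
  joelclaw refresh || { echo "refresh failed" >&2; return 1; }
  if ! joelclaw functions | grep -q -- "$vd_fn"; then
    echo "function not registered: $vd_fn" >&2
    return 2
  fi
  joelclaw status || { echo "status check failed" >&2; return 3; }
  joelclaw runs --count 3
}
```

The distinct return codes make it easy to tell from a calling script which check failed.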
Restart Safety (ADR-0156)
The worker is stateless between Inngest steps. Each step is a separate HTTP call; Inngest stores step output server-side. This means k8s rolling restarts are safe — Inngest retries the in-flight step against the new pod.
Critical rule: NEVER set retries: 0 on Inngest functions. With retries: 0, a worker restart during step execution kills the run permanently. With retries ≥ 1, Inngest retries and hits the new pod.
Current story-pipeline has retries: 2 specifically to survive the ~1s restart window during deploys.
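A pre-deploy guard for the retries rule could be a plain text search. The sketch below assumes function configs write the option literally as retries: 0. The function name find_zero_retries is hypothetical, and the default directory is inferred from the index file locations listed under Key Paths.

```shell
# Hypothetical guard: list any source line that sets retries to exactly 0.
# Assumes the option appears literally as `retries: 0` in the source.
find_zero_retries() {
  grep -rnE 'retries:[[:space:]]*0([^0-9]|$)' \
    "${1:-packages/system-bus/src/inngest/functions}"
}
```

grep exits 0 when it finds a match, so a guard such as "if find_zero_retries; then exit 1; fi" would block a deploy while any offending function remains. A text search cannot see values set through variables, so treat a clean result as a hint rather than proof.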
What happens during deploy
Step executing on old pod → old pod terminates → step fails (SDK unreachable)
→ Inngest retries after backoff → new pod handles retry → step completes
All previously completed steps are memoized. Only the in-flight step reruns.
Long-running steps (codex implement: 5-10 min)
If a deploy kills a codex step mid-execution, the step reruns from scratch on the new pod (5-10 min wasted but not fatal). For time-critical deploys during active loops, check joelclaw loop status first and deploy between stories.
Manual Steps (if script fails)
Build
cd ~/Code/joelhooks/joelclaw
TAG=$(git rev-parse --short HEAD)
IMAGE="ghcr.io/joelhooks/system-bus-worker:${TAG}"
docker build --platform linux/arm64 -t "$IMAGE" -t ghcr.io/joelhooks/system-bus-worker:latest -f packages/system-bus/Dockerfile .
Push
gh auth token | docker login ghcr.io -u "$(gh api user -q .login)" --password-stdin
docker push "$IMAGE"
docker push ghcr.io/joelhooks/system-bus-worker:latest
Deploy
kubectl -n joelclaw set image deployment/system-bus-worker system-bus-worker="$IMAGE"
kubectl -n joelclaw rollout status deployment/system-bus-worker --timeout=180s
Verify
joelclaw refresh
joelclaw status
Log
slog write --action deploy --tool system-bus-worker --detail "deployed ${IMAGE}" --reason "sync worker changes"
Talon Rebuild (Adding Secrets / Changing Worker Supervision)
Talon is a Rust binary that supervises the worker process. It leases secrets from agent-secrets and injects them as env vars. When adding new webhook secrets or changing supervision behavior:
# 1. Add secret to agent-secrets
secrets add my_new_secret --value "the-secret-value"
# 2. Update Talon source — add mapping to SECRET_MAPPINGS array
# File: ~/Code/joelhooks/joelclaw/infra/talon/src/worker.rs
# ("my_new_secret", "MY_NEW_SECRET_ENV_VAR"),
# 3. Recompile (fast — ~3s incremental)
export PATH="$HOME/.cargo/bin:$PATH"
cd ~/Code/joelhooks/joelclaw/infra/talon
cargo build --release
# 4. Install + re-sign (macOS kills unsigned binaries)
cp target/release/talon ~/.local/bin/talon
codesign -fs - ~/.local/bin/talon
# 5. Restart via launchd
launchctl bootout gui/$(id -u)/com.joel.talon
sleep 1
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.joel.talon.plist
sleep 12
# 6. Verify
curl -s http://localhost:3111/ | jq '.status'
curl -X PUT http://localhost:3111/api/inngest # Force function sync
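The fixed sleep 12 in step 5 could be replaced by polling until the worker answers. The sketch below is one way to do that under stated assumptions: wait_until is a hypothetical helper, and it assumes the worker's root endpoint returns a success status once it is ready.

```shell
# Hypothetical helper: run a command once per second until it succeeds
# or the timeout (in seconds) elapses.
wait_until() {
  wu_deadline=$(( $(date +%s) + $1 ))
  shift
  until "$@" >/dev/null 2>&1; do
    if [ "$(date +%s)" -ge "$wu_deadline" ]; then
      return 1
    fi
    sleep 1
  done
  return 0
}

# Example (assumes the endpoint returns 2xx when the worker is ready):
#   wait_until 30 curl -sf http://localhost:3111/
```

Polling returns as soon as the worker is up and fails loudly if it never comes up, where a fixed sleep does neither.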
Current SECRET_MAPPINGS (worker.rs)
| Secret Name | Env Var |
|---|---|
| claude_oauth_token | CLAUDE_CODE_OAUTH_TOKEN |
| todoist_client_secret | TODOIST_CLIENT_SECRET |
| todoist_api_token | TODOIST_API_TOKEN |
| front_rules_webhook_secret | FRONT_WEBHOOK_SECRET |
| front_api_token | FRONT_API_TOKEN |
| vercel_webhook_secret | VERCEL_WEBHOOK_SECRET |
| joelclaw_webhook_secret | JOELCLAW_WEBHOOK_SECRET |
| revalidation_secret | REVALIDATION_SECRET |
Talon Key Paths
| What | Path |
|---|---|
| Binary | ~/.local/bin/talon |
| Source | ~/Code/joelhooks/joelclaw/infra/talon/src/ |
| LaunchAgent plist | ~/Library/LaunchAgents/com.joel.talon.plist |
| Logs | ~/.local/log/talon.log / talon.err |
| ADR | ~/Vault/docs/decisions/0159-talon-worker-manager.md |
Gotcha: codesign -fs - is required
After cargo build, the binary carries only an ad hoc, linker-generated signature, and macOS may kill it with SIGKILL (signal 9) when launchd starts it. Re-signing with codesign -fs - fixes this.
Common Gotchas
| Problem | Cause | Fix |
|---|---|---|
| exec format error in pod | Built for amd64, not arm64 | Rebuild with --platform linux/arm64 |
| GHCR push fails with 403 Forbidden on blob HEAD | gh auth token missing package scopes | Use ghcr_pat via agent-secrets or export GHCR_TOKEN with package scope |
| docker-credential-desktop error | Docker config has credsStore | Script uses temp config dir; if manual, remove "credsStore": "desktop" |
| Function missing after deploy | Not in index file | Add to both index.host.ts AND index.cluster.ts |
| Function still missing | Stale Inngest registration | joelclaw refresh then check again |
| "Unable to reach SDK URL" | Worker pod not ready | Wait for rollout, then joelclaw refresh |
| Runs stuck after deploy | retries: 0 on the function | Set retries: 2 minimum (ADR-0156) |
| Stale app registrations | Multiple apps registered | Delete old registrations in Inngest dashboard (:8289) |
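For manual pushes, the docker-credential-desktop row above can also be sidestepped without editing ~/.docker/config.json by pointing Docker at a throwaway config directory. This sketch relies on Docker's DOCKER_CONFIG environment variable. The helper name use_temp_docker_config is hypothetical, and whether the publish script works exactly this way is an assumption.

```shell
# Hypothetical helper: point the docker CLI at an empty, temporary config dir
# so no credsStore entry is consulted during login and push.
use_temp_docker_config() {
  DOCKER_CONFIG=$(mktemp -d) || return 1
  export DOCKER_CONFIG
  # Start from an empty JSON object; `docker login` will add an auths entry.
  printf '{}\n' > "$DOCKER_CONFIG/config.json"
}
```

After calling it, run the docker login and docker push commands from the Manual Steps section in the same shell. Remove the directory afterwards, because it holds the registry credential in plain base64.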
Key Paths
| What | Path |
|---|---|
| Publish script | k8s/publish-system-bus-worker.sh |
| Dockerfile | packages/system-bus/Dockerfile |
| k8s manifest | k8s/system-bus-worker.yaml |
| Host function index | packages/system-bus/src/inngest/functions/index.host.ts |
| Cluster function index | packages/system-bus/src/inngest/functions/index.cluster.ts |
| Worker entry | packages/system-bus/src/serve.ts |
| GH Actions workflow | .github/workflows/system-bus-worker-deploy.yml |
| ADR-0156 | ~/Vault/docs/decisions/0156-graceful-worker-restart.md |