agentdeploy-deploy
AgentDeploy Deploy
Use this skill when the user wants an application deployed onto AgentDeploy, or when an existing AgentDeploy deployment needs to be updated or debugged.
What this skill covers
- infer the right split between `SharedInfra` and `Service`
- choose the correct workload type and minimum infrastructure
- validate and dry-run before changing live state
- deploy with `agentdeploy` through the Platform API when available, then poll structured status
- debug policy, auth, infrastructure, and rollout failures
Read references/service-contract.md when writing or editing SharedInfra or Service.
Read references/operations.md when running the CLI or handling failures.
Use the templates in assets/ as the starting point:
- `assets/shared-infra.yaml`
- `assets/service-web.yaml`
- `assets/service-api.yaml`
- `assets/service-worker.yaml`
- `assets/service-cron.yaml`
Prerequisite: CLI availability
Before using this skill, make sure `agentdeploy` is installed and on PATH.
Current supported user install path:
```sh
command -v brew
# if brew is missing, stop and ask the user to install Homebrew themselves from:
# https://brew.sh/
# continue only after brew is available on PATH

command -v gh
# if gh is missing:
brew install gh
gh auth login
gh auth setup-git
gh auth status

brew tap elementx-ai/tap https://github.com/elementx-ai/homebrew-tap
brew install --HEAD elementx-ai/tap/agentdeploy
# or, if it is already installed:
brew upgrade --fetch-HEAD elementx-ai/tap/agentdeploy
```
This is the current private macOS install path. If brew is missing, direct the user to brew.sh and wait for them to finish that install themselves before continuing. If gh, GitHub auth, or agentdeploy is still unavailable after that, stop and report the install blocker before attempting deploy commands.
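Before moving on, a quick presence check can confirm the whole tool chain is in place. This is a minimal sketch: it only reports what is missing, it does not install anything.

```sh
# Report whether each required tool is on PATH, then print the agentdeploy
# version when the CLI is available instead of failing hard.
for tool in brew gh agentdeploy; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "ok: $tool"
  else
    echo "missing: $tool"
  fi
done
command -v agentdeploy >/dev/null 2>&1 && agentdeploy version || true
```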
Before debugging any feature mismatch between docs and the installed CLI, run:
```sh
agentdeploy version
```
The current CLI already carries the prototype Platform API URL and Entra scope by default. On the first API-backed command it will run its own Entra device-code login flow and cache the session token until it expires. You only need to set API flags or environment variables when overriding the installation defaults.
Treat `AGENTDEPLOY_CONFIG_REPO_REMOTE` as an explicit fallback only for intentional direct GitOps mode without the Platform API.
For the current ElementX prototype, the shared container registry is:
`acragdpelementxprototype.azurecr.io`
Members of the managed deployers group `agentdeploy-elementx-prototype-deployers` should have the Azure `AcrPush` role on that registry. If the user wants to publish their own image instead of using a prebuilt digest or an AgentDeploy `build:` block, point them at that registry first. If a recent group-membership change is not reflected yet, have them refresh their Azure login before debugging the push path further.
Recommended registry auth path for a human deployer:
```sh
az login --tenant edd2663e-acb4-4eb1-9cc3-51ce8979cc55
az account set --subscription ab282a21-8afb-4389-9fb0-4711a3a92450
az acr login --name acragdpelementxprototype
```
That ACR login flow is separate from `agentdeploy` CLI auth. The CLI no longer depends on the Azure CLI, but Docker-to-ACR auth in this prototype still does. Only fall back to token-based `docker login` if `az acr login` is not viable in the user's environment.
Workflow
- Confirm the CLI is available, then inspect the app before writing contracts.
  - Run `command -v agentdeploy` before using the workflow.
  - Look for whether it serves HTTP, runs background work, or is purely scheduled.
  - Check whether it already expects `PORT`, `DATABASE_URL`, `REDIS_URL`, or migration commands.
  - Prefer reusing an existing immutable image digest. Use a `build:` block only when the user wants AgentDeploy to build from source.
  - If the user needs to push a custom image for the current prototype, prefer the shared registry `acragdpelementxprototype.azurecr.io`.
- Choose the application shape.
  - If the app has exactly one service, no shared state, no `valueFrom.infrastructureRef`, and no `valueFrom.serviceRef`, a single `Service` is enough.
  - In that standalone shape, AgentDeploy can bootstrap `Namespace`, `ResourceQuota`, and `LimitRange` directly from the `Service` record.
  - If multiple services need to share a namespace, PostgreSQL, Redis, object storage, or service-to-service wiring, create a `SharedInfra` record first and then point each service at it through `metadata.application`.
  - Treat `SharedInfra` as the only place that owns built-in DB, Redis, or object-storage resources. Do not put `spec.infrastructure` on a `Service`; that model is gone.
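As a rough illustration of the multi-service shape, here is a hypothetical `SharedInfra` sketch. The dotted field paths follow this guide, but the exact nesting and all names are placeholders; the authoritative schema is in references/service-contract.md.

```yaml
# Hypothetical SharedInfra for a two-service app. Each dependent Service
# points back at it through metadata.application. Names are invented.
kind: SharedInfra
metadata:
  application: expense        # assumed app name
spec:
  infrastructure:
    databases:
      main: {}                # wired into services as DATABASE_URL* via valueFrom.infrastructureRef
    redis:
      cache: {}               # wired as REDIS_URL / REDIS_URL_TLS; internal-only in this prototype
```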
- Choose the workload shape for each service.
  - `web`: browser-facing app with HTTP ingress.
  - `api`: browser-consumed backend with HTTP ingress.
  - `worker`: background process, no ingress.
  - `cron`: scheduled job, no ingress.
- Ask about app visibility before you write ingress settings, unless the user has already made it explicit.
  - Ask in simple terms which of these they want:
    - `internal`: only reachable on the private company/internal network and private DNS path. In the current prototype that means an `apps.elementx.internal` URL, which usually will not resolve from a normal home or public internet browser.
    - `public`: reachable on the public internet domain for the prototype, usually `apps.elx.ai`. This can still stay behind shared auth unless the user explicitly asks for no auth.
  - Do not silently choose `internal` when the user's intended audience is still unclear.
  - If the user wants coworkers to open the app in a normal browser without private-network setup, they probably want `public`.
  - If the app is only for private/internal network access, choose `internal`.
- Create or update the contracts.
  - Start from `assets/shared-infra.yaml` when the app needs shared DB/Redis or multiple services.
  - Start each service from the closest service template in `assets/`.
  - Keep the app name DNS-safe.
  - If the repo does not already declare deploy metadata, derive a unique app name and subdomain from the repo or directory name so the deployment does not collide with an existing app.
  - If `owner` is missing from repo context, prefer a real maintainer email from git config, docs, or project context. Only invent a synthetic owner for an explicit smoke test.
  - If `team` is missing from repo context, prefer an obvious team name from the repo, parent directory, or surrounding project docs. If none exists, choose a clearly temporary team name for a smoke test and call out the assumption.
  - Default to `dataClassification: internal` unless the user explicitly says the data is more sensitive.
  - Keep `metadata.application` explicit on services when more than one service shares the same app.
  - For a tiny standalone app, letting `metadata.application` default to `metadata.name` is fine.
  - If the user already answered the visibility question, use that answer directly.
  - Only default to `visibility: internal` when you truly cannot get a visibility answer and a conservative private default is still appropriate for the task.
  - Default to `authorization.mode: group-based` unless the user explicitly wants `org-wide` and policy allows it.
  - Prefer dedicated Entra security groups for app access. Use broader team-wide groups only when the whole team should be able to use the app.
  - For smoke tests or lab installs without real group IDs, `org-wide` is acceptable only for `internal` apps and only when the installation policy allows it. Do not use it for sensitive data.
  - If the user explicitly wants a fully public app with no shared auth at all, warn them first that the app will be reachable anonymously on the public internet and will not receive any shared identity headers.
  - For that mode, use:
    - `spec.dataClassification: public`
    - `spec.access.auth: none`
    - `spec.access.authorization.mode: none`
  - In the current policy set, unauthenticated app access is allowed only for `dataClassification: public`. Call out that the app will not receive `X-Auth-Request-*` identity headers in that mode.
  - If the app needs PostgreSQL, declare it on `SharedInfra` under `spec.infrastructure.databases.<name>` and wire one of the DB env variants from `valueFrom.infrastructureRef`:
    - `DATABASE_URL` or `DATABASE_URL_SYNC` for libpq / sync clients
    - `DATABASE_URL_ASYNC` for common async Python stacks
  - If the app needs Redis, declare it on `SharedInfra` under `spec.infrastructure.redis.<name>` and wire `REDIS_URL` or `REDIS_URL_TLS` from `valueFrom.infrastructureRef`.
  - If the app needs shared upload or document handoff storage across services, declare it on `SharedInfra` under `spec.infrastructure.objects.<name>` and wire it from `valueFrom.infrastructureRef` with `kind: objectStorage`.
  - For split API/worker document flows, prefer object storage plus object keys over local-path handoff. Keep local disk only for scratch via `runtime.filesystem.writablePaths`.
  - If one service needs to call another service in the same application, wire that URL with `valueFrom.serviceRef` instead of hardcoding domains or patching manifests.
  - In the current prototype, Redis is only supported for `dataClassification: internal`.
  - Make sure the process actually listens on `runtime.port`. If the app expects a `PORT` env var, set it explicitly.
  - `runtime.command` and `runtime.args` are supported. Use them when the workload needs an explicit startup command instead of baking a wrapper image only for process launch.
  - If the app needs writable ephemeral directories, use `runtime.filesystem.writablePaths` instead of asking users to patch manifests by hand.
  - In the current prototype, `spec.build` does not build or publish an image during deploy. If you include `build`, still set `runtime.image` to the prebuilt immutable digest that should actually run.
  - If the user is building their own image, remind them that the platform enforces `runAsNonRoot` with no `runAsUser` override. The image should use a numeric non-root `USER` such as `USER 1000`.
  - For common web images such as `nginx`, expect to add `/tmp` to `runtime.filesystem.writablePaths` unless the image is already prepared for a read-only root filesystem.
  - Only set `runtime.filesystem.readOnlyRootFilesystem: false` when the app genuinely cannot work with explicit writable mounts. Treat that as a security tradeoff and call it out.
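Pulling those defaults together, a standalone web service might look roughly like this. The dotted paths (`spec.dataClassification`, `runtime.port`, `runtime.image`, `runtime.filesystem.writablePaths`, `allowedGroups`) come from this guide; the exact nesting of `access`/`visibility`, the placement of `owner` and `team`, and every concrete name are assumptions, so verify the shape against references/service-contract.md and the templates in `assets/` before use.

```yaml
# Hypothetical standalone web Service. All names are placeholders.
kind: Service
metadata:
  name: hello-web                 # DNS-safe; metadata.application defaults to this
  owner: maintainer@example.com   # placeholder maintainer email; placement assumed
  team: example-team              # placeholder team; placement assumed
spec:
  dataClassification: internal
  access:
    visibility: internal          # confirmed with the user, not silently defaulted
    authorization:
      mode: group-based
      allowedGroups:
        - 00000000-0000-0000-0000-000000000000  # stable Entra group object ID, never a name
  runtime:
    image: acragdpelementxprototype.azurecr.io/hello-web@sha256:...  # immutable digest, not a tag
    port: 8080                    # the process must actually listen here
    filesystem:
      writablePaths:
        - /tmp                    # common need for images like nginx under read-only root
```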
- Validate before deploying.
  - Prefer the API-owned path when the installation provides it. The current CLI already defaults to the prototype API URL and Entra scope, so only set flags or environment variables when you need to override those defaults.
  - Validate `SharedInfra` first when present, then validate each dependent `Service`.
  - Run `agentdeploy validate --file <contract>.yaml`.
  - If you intentionally want offline local-engine validation instead of the hosted API path, pass `--api-url=`.
  - Inspect `effective_service` or `effective_infra`, plus `manifest_files` and `warnings`, in the response. They are the fastest way to catch dropped or mismatched fields before a deploy.
  - Treat `BUILD_IMAGE_NOT_RESOLVED` as a contract problem, not as a rollout problem. In the current prototype it means `spec.build` was provided without a resolved `runtime.image` digest.
  - For a standalone service-only app, `manifest_files` should include `namespace.yaml`, `resourcequota.yaml`, and `limitrange.yaml`. If those files are absent, the app is no longer on the standalone bootstrap path.
  - Treat `QUOTA_*` errors as pre-flight failures against the current rendered namespace limits, not as generic rollout failures.
  - Treat capacity-related warnings as best-effort scheduler signals. They do not block deploy by themselves, but they mean the cluster may be too full to place the new pods.
  - Fix errors by following the exact `field`, `allowed_values`, and `suggested_value` in the JSON response.
  - Then run `agentdeploy deploy --file <contract>.yaml --dry-run`.
  - In the current prototype, `deploy --dry-run` can still return `status: accepted` and an `operation_id`. Treat it as preview-only. Review `preview_only`, `effective_service` or `effective_infra`, `manifest_files`, and `warnings` rather than assuming a live deployment started.
- Deploy for real.
  - Deploy `SharedInfra` first when present, then deploy each dependent `Service`.
  - Run `agentdeploy deploy --file <contract>.yaml`.
  - Capture `operation_id`, the reported record name, and the initial phase.
  - Do not assume `git_commit` or revision are returned immediately. Live deploys are queued and executed asynchronously.
  - If local CLI mode returns `DEPLOY_NO_LIVE_TARGET`, stop. Use the Platform API path or configure a real GitOps remote. Only use `AGENTDEPLOY_ALLOW_LOCAL_GITOPS=true` when the user explicitly wants local-only GitOps testing.
  - If the deploy returns `DEPLOY_MISSING_SHARED_INFRA`, the app is not a true standalone service. Either deploy `SharedInfra` first or remove the extra shared-state / service-ref coupling.
  - If the deploy is rejected with `DEPLOY_OPERATION_ALREADY_IN_PROGRESS`, check whether the active operation is still desirable. If it is stuck or obsolete, run `agentdeploy cancel <record>` and then submit the replacement deploy.
  - Poll with `agentdeploy status <record>` until the phase is `live` or an error is returned.
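The poll-until-live step can be sketched as a small loop. Here `run` stands in for invoking `agentdeploy status <record>` (for example via subprocess) and returning its JSON output; the `live` phase name comes from this guide, while the `error` key and other phase values are assumptions about the response shape.

```python
import json
import time
from typing import Callable

def poll_until_live(run: Callable[[], str],
                    interval: float = 5.0,
                    max_polls: int = 120) -> str:
    """Poll an assumed status-JSON producer until the phase is 'live'."""
    for _ in range(max_polls):
        status = json.loads(run())
        phase = status.get("phase", "")
        if phase == "live":
            return phase
        if status.get("error"):
            raise RuntimeError(f"deploy failed in phase {phase!r}: {status['error']}")
        time.sleep(interval)
    raise TimeoutError("deploy did not reach the live phase in time")
```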
- Verify the result.
  - Start with the aggregate application view for multi-service apps:
    - `agentdeploy applications`
    - `agentdeploy app-status <team> <application>`
    - `agentdeploy app-explain <team> <application>`
  - Use the aggregate view to confirm `SharedInfra` plus all dependent services are converging together before drilling into a single record.
  - Use the URLs returned by `status` or `explain` for services. Do not hardcode domains because each installation can differ.
  - For a complete state dump, run `agentdeploy explain <record>`.
  - For runtime debugging without `kubectl`, use:
    - `agentdeploy describe <record>` for pod names, restart counts, waiting or termination reasons, image digests, requests, limits, and service or endpoint visibility
    - `agentdeploy events <record>` for missing secrets, quota failures, probe failures, and scheduling errors
    - `agentdeploy logs <record> [--follow] [--previous] [--pod <name>] [--container <name>] [--tail N]` for live or previous container logs
  - In `explain`, inspect the live `infrastructureRef` and `serviceRef` sections when secret wiring or same-namespace traffic looks wrong.
  - Compare `requested_revision` against `observed_revision`. If they differ, the control plane has accepted a newer revision than ArgoCD has actually reconciled in the cluster.
  - Treat a stale-reconciliation warning as a real GitOps signal. The platform now requests a targeted Argo refresh automatically, and you can also run `agentdeploy refresh <app>` if the warning persists.
  - If the app depends on PostgreSQL, confirm `SharedInfra` is healthy, the service injects the DB variant it actually uses, and the app-level readiness check matches the app's real dependencies.
  - If the app depends on Redis, confirm `SharedInfra` is healthy, the service injects `REDIS_URL` or `REDIS_URL_TLS`, and the app-level readiness check actually exercises Redis.
  - If the app depends on shared object storage, confirm `SharedInfra` is healthy and the service injects the object-store keys it actually uses. Prefer `OBJECT_STORE_*` for portable app wiring and fall back to `AZURE_STORAGE_*` only when the runtime still needs provider-specific compatibility.
  - Remember that `list`, `status`, and `explain` are usually filtered by team-scoped control-plane RBAC, not by app owner alone.
Default decisions
- Prefer immutable image digests over tags.
- Prefer the smallest viable CPU and memory values; only raise them when the app clearly needs more.
- Prefer PostgreSQL only when the app actually needs persistent relational storage.
- Prefer Redis only when the app actually needs cache or ephemeral key-value state.
- Prefer one service per app unless there is a clear need for a multi-service application with shared namespace and shared infra.
- Prefer the standalone `Service` bootstrap only for a true one-service app. The moment the app needs shared state or a sibling service, switch to explicit `SharedInfra`.
- Prefer internal ingress and group-based authorization for enterprise apps.
High-value gotchas
- Mutable image tags are rejected. Use `repo@sha256:...`.
- `allowedGroups` must contain stable group IDs, not human-readable names.
- Only the `internal` data classification can use `public` visibility; `confidential` and `restricted` cannot use `org-wide`.
- `api` means browser-consumed HTTP API in this platform, not general service-to-service auth.
- Public apps use shared auth by default, but an app can explicitly opt out with `spec.dataClassification: public`, `spec.access.auth: none`, and `spec.access.authorization.mode: none`.
- For `group-based` access, `allowedGroups` should be the stable Entra object IDs of the groups that should be able to pass the shared auth proxy.
- If more than one user set should be allowed, list all of their group object IDs in `allowedGroups` and keep the scope intentional. Prefer app-specific access groups over broad org-wide groups.
- The app receives identity through `X-Auth-Request-*` headers, not raw Entra bearer tokens, by default.
- The concrete headers are `X-Auth-Request-Email`, `X-Auth-Request-Groups`, `X-Auth-Request-Preferred-Username`, and `X-Auth-Request-User`.
- Shared ingress auth no longer forwards bearer `Authorization` headers into apps by default. If an app expects raw OAuth access tokens, call that out as a platform mismatch instead of assuming they will be present.
- `auth: none` means no shared ingress auth and no injected identity headers. In the current policy set, that is allowed only for `dataClassification: public`.
- If PostgreSQL is declared, it must live on `SharedInfra`, not `Service`. Wire the correct DB env variant with `valueFrom.infrastructureRef`. `DATABASE_URL` / `DATABASE_URL_SYNC` are libpq-style, while `DATABASE_URL_ASYNC` is meant for common async Python stacks.
- If Redis is declared, it must live on `SharedInfra`, not `Service`. Wire `REDIS_URL` or `REDIS_URL_TLS` with `valueFrom.infrastructureRef`. The platform now includes `ssl_cert_reqs=required`, so most redis-py and Celery clients should not need app-side query rewriting.
- If one service needs another service's base URL, use `valueFrom.serviceRef`. That resolves to a stable in-namespace URL like `http://expense-api/api` and avoids hardcoding installation domains.
- A single standalone `Service` can bootstrap its own namespace policy, but that only works when there are no `infrastructureRef` or `serviceRef` bindings and no other services in the same application.
- `DEPLOY_MISSING_SHARED_INFRA` means the app has outgrown the standalone path and now needs an explicit `SharedInfra` owner.
- In the current prototype, Redis is an `internal`-only infrastructure option and is exposed over TLS on port `6380`.
- `runtime.filesystem.writablePaths` creates `emptyDir` mounts at those paths. Existing image contents at those paths will be hidden at runtime.
- `runtime.filesystem.readOnlyRootFilesystem` defaults to `true`. Turning it off is a real security relaxation and should be deliberate.
- `spec.build` is metadata only in the current prototype. It does not replace `runtime.image`, and deploys still run the immutable digest from `runtime.image`.
- The platform enforces `runAsNonRoot` with no `runAsUser` override. Images should use a numeric non-root `USER` such as `USER 1000`.
- Common web images such as `nginx` often need writable `/tmp` under the default read-only root filesystem, so expect to add `/tmp` to `runtime.filesystem.writablePaths`.
- If the app does not listen on `runtime.port`, the deployment will roll out but ingress health and readiness will still fail.
- `validate`, `deploy --dry-run`, and `explain` expose the effective normalized contract. Use that output to verify that infrastructure ownership, env wiring, and runtime overrides survived normalization.
- In the intended product mode, deployers should use the Platform API path. They should not need direct Git push access or direct Kubernetes read access for normal lifecycle commands.
- Team visibility is usually team-scoped, not owner-scoped. A caller typically sees all apps for teams they can view.
- A second real deploy for the same app may be rejected while another non-terminal operation is queued or running.
- `agentdeploy cancel <app>` is the current escape hatch for a stuck or obsolete live operation. It cancels the active operation record so a replacement deploy can be accepted.
- `requested_revision` is the last revision the control plane accepted. `observed_revision` is the revision ArgoCD currently reports from the cluster. Treat them as different signals.
- `describe`, `events`, and `logs` depend on a recent hosted platform-api build when you are using `AGENTDEPLOY_API_URL`. If they return `HTTP_NOT_FOUND`, the CLI is newer than the live control plane.
Debug loop
- Read `agentdeploy status <record>` first.
- If the phase is not obviously actionable, read `agentdeploy explain <record>`.
- For rollout or runtime failures, read `agentdeploy describe <record>`.
- If the cause still is not obvious, read `agentdeploy events <record>`.
- Use `agentdeploy logs <record> --previous` for crash loops and `agentdeploy logs <record> --follow` for live request or worker debugging.
- Use the error code prefix to choose the next action:
  - `SCHEMA_*`: fix the contract.
  - `POLICY_*`: change the requested shape or access mode.
  - `AUTH_*`: fix group IDs or auth assumptions.
  - `INFRA_*`: inspect database or Redis claim and secret readiness.
  - `DEPLOY_*`: inspect the workload rollout and health checks.
  - `QUOTA_*`: lower requests or ask for a higher app tier.
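The prefix table above is mechanical enough to encode as a lookup, for example when post-processing CLI output in a script. The prefixes come from this guide; the advice strings simply echo it.

```python
# Error-code prefix -> next debugging action, per the table above.
NEXT_ACTION = {
    "SCHEMA_": "fix the contract",
    "POLICY_": "change the requested shape or access mode",
    "AUTH_": "fix group IDs or auth assumptions",
    "INFRA_": "inspect database or Redis claim and secret readiness",
    "DEPLOY_": "inspect the workload rollout and health checks",
    "QUOTA_": "lower requests or ask for a higher app tier",
}

def next_action(error_code: str) -> str:
    for prefix, action in NEXT_ACTION.items():
        if error_code.startswith(prefix):
            return action
    return "unrecognized code; read explain/events output"
```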
Feedback loop
- If a real deployment exposes a high-value platform bug, contract gap, or reliability issue, raise that feedback rather than treating it as one-off local friction.
- If you have access to elementx-ai/agentdeploy, open or update a GitHub issue with:
  - the affected app, record type, and workload type
  - the relevant `SharedInfra` or `Service` shape
  - operation ID, requested revision, and observed revision when available
  - the exact failure mode, impact, and the smallest useful fix
- Prefer issues for meaningful fixes or improvements. Do not create noise for already-documented prototype limitations unless the observed behavior is worse than documented.
Output expectations
When doing deployment work with this skill:
- keep the contracts small and explicit
- explain which workload type, application shape, and data classification you chose
- surface the exact CLI commands you ran
- quote the operation ID first, then the revision or Git commit once `status` or `explain` reports it
- prefer actionable remediation over generic advice