Model Deployment

This skill enables an AI agent to deploy trained machine learning models into production environments. It covers packaging models into serving APIs with FastAPI or Flask, containerizing with Docker, orchestrating with Kubernetes, and deploying to serverless platforms. The agent handles model versioning, health checks, input validation, logging, and monitoring to ensure reliable and scalable inference in production.

Workflow

  1. Serialize and package the model: Export the trained model to a portable format such as ONNX, TorchScript, SavedModel, or joblib pickle. Bundle the model artifact with its preprocessing pipeline and any required configuration files so inference is self-contained.
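A minimal sketch of step 1 using the standard-library pickle module (joblib is the usual choice for models with large NumPy arrays, but the bundling idea is the same). The `Scaler`, `LinearModel`, and `model_bundle.pkl` names are hypothetical stand-ins for a real preprocessor, trained model, and artifact path:

```python
import pickle
from pathlib import Path

# Hypothetical stand-ins for a real preprocessor and trained model.
class Scaler:
    def __init__(self, mean, std):
        self.mean, self.std = mean, std

    def transform(self, x):
        return [(v - self.mean) / self.std for v in x]

class LinearModel:
    def __init__(self, weights, bias):
        self.weights, self.bias = weights, bias

    def predict(self, x):
        return sum(w * v for w, v in zip(self.weights, x)) + self.bias

# Bundle the model, its preprocessing pipeline, and configuration into
# one artifact so inference is self-contained.
bundle = {
    "model": LinearModel(weights=[0.5, -0.2], bias=1.0),
    "preprocessor": Scaler(mean=10.0, std=2.0),
    "config": {"model_version": "1.0.0", "feature_names": ["f1", "f2"]},
}

artifact = Path("model_bundle.pkl")
artifact.write_bytes(pickle.dumps(bundle))

# At serving time, loading the artifact alone is enough to predict.
loaded = pickle.loads(artifact.read_bytes())
features = loaded["preprocessor"].transform([12.0, 8.0])
prediction = loaded["model"].predict(features)
```

Keeping the preprocessor inside the bundle avoids training/serving skew: the API never has to re-implement feature scaling.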

  2. Build the serving API: Create a REST API using FastAPI or Flask that loads the model at startup and exposes prediction endpoints. Include a health check endpoint, request/response schemas with input validation (Pydantic models), structured logging, and error handling that returns meaningful HTTP status codes.

  3. Containerize with Docker: Write a Dockerfile that installs dependencies from a pinned requirements.txt, copies the model artifact and serving code, and sets the entrypoint to the API server. Use multi-stage builds to minimize image size and avoid including training-only dependencies.
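A multi-stage Dockerfile sketch for step 3. The file names `serve.py` and `model_bundle.pkl` are hypothetical, and the example assumes a FastAPI app served by uvicorn; adjust to your server and artifact layout:

```dockerfile
# Build stage: install pinned dependencies into a virtualenv.
FROM python:3.11-slim AS build
WORKDIR /app
COPY requirements.txt .
RUN python -m venv /venv && /venv/bin/pip install --no-cache-dir -r requirements.txt

# Runtime stage: copy only the virtualenv, serving code, and model artifact,
# leaving training-only dependencies and build tools out of the final image.
FROM python:3.11-slim
WORKDIR /app
COPY --from=build /venv /venv
COPY serve.py model_bundle.pkl ./
ENV PATH="/venv/bin:$PATH"
EXPOSE 8000
CMD ["uvicorn", "serve:app", "--host", "0.0.0.0", "--port", "8000"]
```

The two-stage split keeps compilers and pip caches out of the runtime image, which shrinks the image and its attack surface.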

  4. Configure orchestration and scaling: Define Kubernetes Deployment and Service manifests (or equivalent for your platform) with resource requests/limits, readiness and liveness probes pointing at the health check endpoint, and a Horizontal Pod Autoscaler to scale based on CPU, memory, or custom metrics like request latency.
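The manifests for step 4 might look like the following sketch. Names, the image reference, and the resource and scaling numbers are placeholders to adapt to your cluster:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-serving
spec:
  replicas: 2
  selector:
    matchLabels: {app: model-serving}
  template:
    metadata:
      labels: {app: model-serving}
    spec:
      containers:
        - name: api
          image: registry.example.com/model-serving:1.0.0
          ports: [{containerPort: 8000}]
          resources:
            requests: {cpu: 250m, memory: 512Mi}
            limits: {cpu: "1", memory: 1Gi}
          readinessProbe:
            httpGet: {path: /health, port: 8000}
            initialDelaySeconds: 5
          livenessProbe:
            httpGet: {path: /health, port: 8000}
            periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: model-serving
spec:
  selector: {app: model-serving}
  ports: [{port: 80, targetPort: 8000}]
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: model-serving
spec:
  scaleTargetRef: {apiVersion: apps/v1, kind: Deployment, name: model-serving}
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: {type: Utilization, averageUtilization: 70}
```

The readiness probe gates traffic until the model has loaded, while the liveness probe restarts a pod whose process has hung; scaling on custom metrics such as request latency additionally requires a metrics adapter.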

  5. Deploy and verify: Push the container image to a registry, apply the Kubernetes manifests or deploy to the serverless platform, and run smoke tests against the live endpoint. Validate that responses match expected outputs for a set of known inputs.
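The smoke test in step 5 can be sketched as a comparison of live predictions against golden input/output pairs. `GOLDEN_CASES` and the stub predictor are placeholders; in production, `predict` would POST to the live `/predict` endpoint (e.g. with urllib.request or httpx) rather than call a local function:

```python
import math
from typing import Callable, Sequence

# Golden input/output pairs recorded from the validated model; these
# values are placeholders.
GOLDEN_CASES = [
    ([1.0, 2.0], 3.0),
    ([0.0, 0.0], 0.0),
    ([-1.5, 4.5], 3.0),
]

def smoke_test(
    predict: Callable[[Sequence[float]], float],
    cases=GOLDEN_CASES,
    tol: float = 1e-6,
) -> list[str]:
    """Return failure messages; an empty list means the deployment passed."""
    failures = []
    for features, expected in cases:
        actual = predict(features)
        if not math.isclose(actual, expected, abs_tol=tol):
            failures.append(f"{features}: expected {expected}, got {actual}")
    return failures

# Local stub keeps the sketch self-contained; swap in an HTTP call
# against the live endpoint after deployment.
failures = smoke_test(lambda feats: sum(feats))
```

A tolerance-based comparison matters here: serving-side quantization or different BLAS builds can shift floating-point outputs slightly without indicating a broken deployment.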

  6. Monitor and iterate: Integrate with monitoring tools like Prometheus and Grafana to track request latency, error rates, throughput, and model-specific metrics like prediction distribution drift. Set up alerts for anomalies and establish a redeployment workflow for updated model versions using blue-green or canary strategies.
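In practice step 6 uses a client library such as prometheus_client to expose counters and histograms for Prometheus to scrape; the stdlib-only sketch below just illustrates the quantities worth tracking (request count, error rate, tail latency). The `ServingMetrics` class is an illustration, not a real library API:

```python
import bisect
import threading

class ServingMetrics:
    """Minimal in-process metrics of the kind a Prometheus exporter exposes."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.requests = 0
        self.errors = 0
        self._latencies_ms: list[float] = []  # kept sorted

    def observe(self, latency_ms: float, error: bool = False) -> None:
        # Record one request; call this from the serving path per response.
        with self._lock:
            self.requests += 1
            self.errors += int(error)
            bisect.insort(self._latencies_ms, latency_ms)

    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0

    def latency_p95(self) -> float:
        # Tail latency: the value 95% of requests complete under.
        if not self._latencies_ms:
            return 0.0
        idx = min(len(self._latencies_ms) - 1, int(0.95 * len(self._latencies_ms)))
        return self._latencies_ms[idx]

metrics = ServingMetrics()
for ms in [10.0, 12.0, 11.0, 250.0, 9.0]:
    metrics.observe(ms)
metrics.observe(500.0, error=True)
```

Alerting on error rate and p95 latency catches serving regressions; catching prediction distribution drift additionally requires logging model outputs and comparing them against a baseline window.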
