Vertex Engine Inspector
Overview
Inspect and validate Vertex AI Agent Engine deployments across seven categories: runtime configuration, Code Execution Sandbox, Memory Bank, A2A protocol compliance, security posture, performance metrics, and monitoring and observability. For each deployment, this skill produces a weighted production-readiness score (0-100%) along with actionable recommendations.
Prerequisites
- gcloud CLI authenticated with roles/aiplatform.viewer and roles/monitoring.viewer IAM roles
- Access to the target Google Cloud project hosting the Agent Engine deployment
- curl or gcloud for A2A protocol endpoint testing (AgentCard, Task API, Status API)
- Cloud Monitoring API enabled for performance metrics retrieval
- Familiarity with Vertex AI Agent Engine concepts: Code Execution Sandbox, Memory Bank, Model Armor
Instructions
- Connect to the Agent Engine deployment by retrieving agent metadata via gcloud ai agents describe
- Parse the runtime configuration: model selection (Gemini 2.5 Pro/Flash), tools enabled, VPC settings, and scaling policies
- Validate Code Execution Sandbox settings: confirm state TTL is 7-14 days, sandbox type is SECURE_ISOLATED, and IAM permissions are scoped to required GCP services only
- Check Memory Bank configuration: verify enabled status, retention policy (min 100 memories), Firestore encryption, indexing enabled, and auto-cleanup active
- Test A2A protocol compliance by probing /.well-known/agent-card, POST /v1/tasks:send, and GET /v1/tasks/{task_id} endpoints for correct responses
- Audit security posture: validate IAM least-privilege roles, VPC Service Controls perimeter, Model Armor activation, encryption at rest and in transit, and absence of hardcoded credentials
- Query Cloud Monitoring for performance metrics: request count, error rate (target < 5%), latency percentiles (p50/p95/p99), token usage, and cost estimates over the last 24 hours
- Assess monitoring and observability: confirm Cloud Monitoring dashboards, alerting policies, structured logging, OpenTelemetry tracing, and Cloud Error Reporting are configured
- Calculate weighted scores across all categories and determine overall production readiness status
- Generate a prioritized list of recommendations with estimated score improvement per remediation
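The last two steps can be sketched roughly as follows. This is a minimal illustration, not the skill's actual scoring code: the per-category weights, the `Recommendation` type, and the function names are all hypothetical, since the source does not state how categories are weighted. The 85% threshold is borrowed from the pre-production scenario's target.

```python
from dataclasses import dataclass

# Hypothetical weights (assumption): the real weighting is not given in this
# document. They cover the seven inspection categories and sum to 1.0.
WEIGHTS = {
    "runtime_configuration": 0.10,
    "code_execution_sandbox": 0.15,
    "memory_bank": 0.10,
    "a2a_compliance": 0.15,
    "security_posture": 0.25,
    "performance_metrics": 0.15,
    "monitoring_observability": 0.10,
}

# Assumed readiness bar, taken from the "overall score above 85%" target.
READY_THRESHOLD = 85.0


def overall_score(category_scores: dict[str, float]) -> float:
    """Weighted average of per-category scores, each on a 0-100 scale."""
    missing = set(WEIGHTS) - set(category_scores)
    if missing:
        raise ValueError(f"missing category scores: {sorted(missing)}")
    total = sum(WEIGHTS[name] * category_scores[name] for name in WEIGHTS)
    return round(total, 1)


def is_production_ready(score: float) -> bool:
    """True when the overall score is strictly above the assumed threshold."""
    return score > READY_THRESHOLD


@dataclass
class Recommendation:
    category: str
    action: str
    category_gain: float  # estimated points gained within that category

    @property
    def overall_gain(self) -> float:
        """Estimated improvement to the overall score, in points."""
        return round(WEIGHTS[self.category] * self.category_gain, 2)


def prioritize(recs: list[Recommendation]) -> list[Recommendation]:
    """Order recommendations by estimated overall improvement, largest first."""
    return sorted(recs, key=lambda r: r.overall_gain, reverse=True)
```

Ranking by weighted gain rather than raw category gain means a modest fix in a heavily weighted category can outrank a larger fix in a lightly weighted one.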
See ${CLAUDE_SKILL_DIR}/references/inspection-workflow.md for the phased inspection process and ${CLAUDE_SKILL_DIR}/references/inspection-categories.md for detailed check criteria.
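The A2A probing step could look something like the sketch below. The three endpoint paths are the ones listed in the instructions. Everything else is an assumption made for illustration: the function names, treating HTTP 200 as the only passing status, the empty JSON body sent to the Task API, and the absence of authentication headers, which a real deployment would almost certainly require.

```python
import json
import urllib.error
import urllib.request
from typing import Callable


def build_probes(base_url: str, task_id: str) -> list[tuple[str, str, str]]:
    """Return (check name, HTTP method, URL) for the three A2A endpoints."""
    base = base_url.rstrip("/")
    return [
        ("AgentCard", "GET", f"{base}/.well-known/agent-card"),
        ("Task API", "POST", f"{base}/v1/tasks:send"),
        ("Status API", "GET", f"{base}/v1/tasks/{task_id}"),
    ]


def http_status(method: str, url: str) -> int:
    """Fetch a URL and return its HTTP status (0 if the host is unreachable)."""
    # Placeholder payload (assumption): a real Task API call needs a valid task.
    body = json.dumps({}).encode() if method == "POST" else None
    request = urllib.request.Request(
        url,
        data=body,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except urllib.error.URLError:
        return 0


def compliance_matrix(
    base_url: str,
    task_id: str,
    fetch: Callable[[str, str], int] = http_status,
) -> dict[str, str]:
    """Map each A2A check to "pass" or "fail" based on the returned status."""
    return {
        name: "pass" if fetch(method, url) == 200 else "fail"
        for name, method, url in build_probes(base_url, task_id)
    }
```

Passing `fetch` as a parameter keeps the pass/fail logic testable without network access; a stub that returns canned status codes can stand in for `http_status`.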
Output
- Inspection report in YAML format with per-category scores and overall readiness percentage
- Runtime configuration summary: model, tools, VPC, scaling settings
- A2A protocol compliance matrix: pass/fail for AgentCard, Task API, Status API
- Security posture score with breakdown: IAM, VPC-SC, Model Armor, encryption, secrets
- Performance metrics dashboard: error rate, latency percentiles, token usage, daily cost estimate
- Prioritized recommendations with estimated score improvement per item
See ${CLAUDE_SKILL_DIR}/references/example-inspection-report.md for a complete sample report.
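As a rough idea of how the outputs above might be assembled into one YAML document, the sketch below builds a report dictionary and renders it with a deliberately tiny emitter. The field names and the `build_report` and `render` helpers are hypothetical and do not reflect the real report schema, which is defined by the sample report referenced above. The emitter only handles nested mappings, lists of plain scalars, and scalars that need no quoting.

```python
def to_yaml(mapping: dict, indent: int = 0) -> list[str]:
    """Render a nested dict as YAML lines (plain scalars and flat lists only)."""
    pad = "  " * indent
    lines = []
    for key, item in mapping.items():
        if isinstance(item, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml(item, indent + 1))
        elif isinstance(item, list):
            lines.append(f"{pad}{key}:")
            lines.extend(f"{pad}  - {entry}" for entry in item)
        else:
            lines.append(f"{pad}{key}: {item}")
    return lines


def build_report(
    agent_id: str,
    overall: float,
    category_scores: dict[str, float],
    a2a_matrix: dict[str, str],
    recommendations: list[str],
) -> dict:
    """Collect inspection results under hypothetical report field names."""
    return {
        "agent_id": agent_id,
        "overall_readiness_percent": overall,
        "category_scores": category_scores,
        "a2a_compliance": a2a_matrix,
        "recommendations": recommendations,
    }


def render(report: dict) -> str:
    """Return the report as a YAML string with a trailing newline."""
    return "\n".join(to_yaml(report)) + "\n"
```

Values containing colons, quotes, or leading special characters would need proper quoting, so a full YAML library is the safer choice outside of a sketch like this.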
Error Handling
| Error | Cause | Solution |
|---|---|---|
| Agent metadata not accessible | Insufficient IAM permissions or incorrect agent ID | Verify that roles/aiplatform.viewer is granted; confirm the agent ID with gcloud ai agents list |
| A2A AgentCard endpoint 404 | Agent not configured for A2A protocol or endpoint path incorrect | Check agent configuration for A2A enablement; verify /.well-known/agent-card path |
| Cloud Monitoring metrics empty | Monitoring API not enabled or no recent traffic | Run gcloud services enable monitoring.googleapis.com; generate test traffic first |
| VPC-SC perimeter blocking access | Inspector running outside VPC Service Controls perimeter | Add inspector service account to access level; use VPC-SC bridge or access policy |
| Code Execution TTL out of range | State TTL set below 1 day or above 14 days | Adjust TTL to 7-14 days for production; values above 14 days are rejected by Agent Engine |
See ${CLAUDE_SKILL_DIR}/references/errors.md for additional error scenarios.
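The TTL row in the table implies three bands, which the following sketch makes explicit. The function name and the returned labels are hypothetical; the boundaries (below 1 day or above 14 days is out of range, 7-14 days is the production target) come from the table itself.

```python
def classify_state_ttl(ttl_days: float) -> str:
    """Classify a Code Execution Sandbox state TTL given in days."""
    if ttl_days < 1 or ttl_days > 14:
        # Matches the "Code Execution TTL out of range" error condition.
        return "out_of_range"
    if ttl_days < 7:
        # Within the accepted range but shorter than the 7-14 day production target.
        return "below_production_target"
    return "ok"
```

Treating 1-6 days as a warning rather than an error is an interpretation of the table, which only lists values below 1 day or above 14 days as out of range.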
Examples
Scenario 1: Pre-Production Readiness Check -- Inspect a newly deployed ADK agent before production launch. Run all 28 checklist items across security, performance, monitoring, compliance, and reliability. Target: overall score above 85% before approving production traffic.
Scenario 2: Security Audit After IAM Change -- Re-inspect security posture after modifying service account roles. Validate that least-privilege is maintained (target: IAM score 95%+), VPC-SC perimeter is intact, and Model Armor remains active.
Scenario 3: Performance Degradation Investigation -- Inspect an agent showing elevated error rates. Query 24-hour performance metrics, identify latency spikes at p95/p99, check auto-scaling behavior, and correlate with token usage patterns to isolate the root cause.
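For the performance investigation in Scenario 3, the core arithmetic can be sketched as below, assuming the latency samples and request counts have already been pulled from Cloud Monitoring. The function names and the result keys are hypothetical, and nearest-rank is just one common percentile definition; Cloud Monitoring's own aggregations may compute percentiles differently. The 5% error-rate target is the one stated in the instructions.

```python
import math


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no latency samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(latencies_ms: list[float], request_count: int, error_count: int) -> dict:
    """Summarize error rate and latency percentiles for one time window."""
    error_rate = 100 * error_count / request_count if request_count else 0.0
    return {
        "error_rate_percent": round(error_rate, 2),
        "error_rate_ok": error_rate < 5.0,  # target from the instructions
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
    }
```

A wide gap between p50 and p95/p99 in the summary points at tail latency rather than a uniform slowdown, which is the distinction the scenario is trying to draw before correlating with token usage.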
Resources
- Vertex AI Agent Engine Documentation -- deployment and configuration
- A2A Protocol Specification -- AgentCard, Task API, protocol compliance
- Cloud Monitoring API -- metrics queries and dashboard configuration
- VPC Service Controls -- perimeter setup and access policies
- Model Armor -- prompt injection protection configuration