# agents-optimize
Measure and improve your AgentCore agent's quality through evaluation, monitoring, and observability.
## When to use
- You want to know if your agent is giving good answers
- You want to set up continuous quality monitoring in production
- You want to add a quality gate to your CI/CD pipeline
- You want to understand agent behavior through logs, metrics, and traces
- You want to set up CloudWatch dashboards or X-Ray tracing
Do NOT use for:
- Debugging a specific broken agent (wrong answers, errors) → use agents-debug
- Production security hardening (IAM, auth) → use agents-harden
## Input
$ARGUMENTS can be:
- An eval goal: "add a quality gate", "set up monitoring"
- An observability goal: "set up CloudWatch dashboard", "understand my traces"
- A specific evaluator: "llm-as-a-judge", "code-based"
- Empty — the skill will guide based on project context
## Process

### Step 0: Verify CLI version
Run `agentcore --version`. This skill requires v0.9.0 or later.
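The version check can be scripted so it fails loudly rather than silently proceeding on an old CLI. This is a minimal sketch: it assumes `agentcore --version` prints a semantic version somewhere in its output (the exact format is an assumption, so adjust the `grep` if yours differs).

```shell
# Sketch of the Step 0 check. Assumes the version string appears as
# X.Y.Z somewhere in the CLI output -- adjust parsing if it differs.
meets_minimum() {
  required="0.9.0"
  current="$(printf '%s' "$1" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n1)"
  [ -n "$current" ] &&
    [ "$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n1)" = "$required" ]
}

if command -v agentcore >/dev/null 2>&1; then
  if meets_minimum "$(agentcore --version)"; then
    echo "agentcore version OK (>= 0.9.0)"
  else
    echo "upgrade agentcore to v0.9.0 or later" >&2
  fi
fi
```

`sort -V` does the semantic-version comparison, so `0.10.x` correctly passes where a plain string compare would not.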
### Step 1: Read project context
Read `agentcore/agentcore.json` to understand existing evaluators, online eval configs, and agent setup.

If `agentcore/agentcore.json` is not found:
"This skill requires an AgentCore project. Use agents-get-started to create one."
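The Step 1 guard can be sketched as a small shell check. The config path comes from the skill text; the function name is illustrative, not part of the agentcore CLI.

```shell
# Sketch of the Step 1 guard: succeed if the project config exists,
# otherwise print the get-started hint and fail.
require_project() {
  if [ -f "$1" ]; then
    # Project found; go on to read evaluators and online eval configs.
    return 0
  fi
  echo "This skill requires an AgentCore project. Use agents-get-started to create one." >&2
  return 1
}

require_project "agentcore/agentcore.json" || true
```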
### Step 2: Determine the workflow
| Developer intent | Action |
|---|---|
| Measure quality, add evaluator, run eval, CI/CD gate, online monitoring | Load references/evals.md and follow its workflow |
| Set up observability, CloudWatch, X-Ray, logs, metrics, dashboards | Load references/observability.md and follow its workflow |
| Understand or reduce AgentCore costs | Load references/cost.md |
| Both — "I want to understand and improve my agent" | Start with observability setup, then add evals |
### Step 3: Follow the loaded reference
The reference file contains the full procedure. Follow it step by step.
## Cross-references

- After setting up evals, suggest agents-harden for production readiness
- If eval results reveal agent issues, suggest agents-debug for root cause analysis
- If the developer needs to add capabilities first, suggest agents-build
## Output
Depends on the workflow — see the loaded reference for specific outputs.
## Quality criteria
- Evaluator configuration uses only valid CLI flags
- Online eval sampling rate is appropriate (not 100% in production without discussion)
- CI/CD quality gate has a clear pass/fail threshold
- Observability setup includes both tracing and logging
- The developer understands the eval data delay: roughly 10 seconds from put to get, end to end. A single ingestion step covers both trace reads and eval queries, so there is no separate indexing wait
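The "clear pass/fail threshold" criterion can be sketched as a tiny CI gate helper. The 0.85 threshold and the assumption that your eval run boils down to a single numeric score are illustrative choices, not agentcore conventions; wire in whatever score your eval workflow actually produces.

```shell
# Hedged sketch of a CI quality gate with an explicit pass/fail
# threshold. The 0.85 value and single-score model are assumptions.
gate() {
  # $1 = observed score, $2 = threshold; exit status is the gate result
  awk -v s="$1" -v t="$2" 'BEGIN { exit !(s >= t) }'
}

threshold="0.85"
score="0.91"   # stand-in for a real eval result
if gate "$score" "$threshold"; then
  echo "quality gate passed ($score >= $threshold)"
else
  echo "quality gate failed ($score < $threshold)" >&2
fi
```

In a pipeline, the non-zero exit status on failure is what actually blocks the merge, so the gate needs no output parsing by the CI system.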