The Agent Skills Directory

[COMMAND_EXECUTION]: The evaluation script is designed to execute local MCP server implementations for testing purposes.
Evidence: scripts/evaluation.py accepts a server command and its arguments through the -c/--command and -a/--args flags and launches the resulting process through the mcp library's stdio transport. Launching the server under test is the harness's intended purpose, and execution is limited to the developer's own local code.
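As a minimal sketch of the argument handling described above, the following assumes an argparse-based parser. The flag names come from the finding, but the parser structure, the `server_argv` helper, and the example file `my_server.py` are hypothetical. The sketch only assembles the argv list that would be handed to a stdio transport; it does not start a process.

```python
import argparse
import shlex


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the -c/--command and -a/--args flags
    # described for scripts/evaluation.py; the real script may differ.
    parser = argparse.ArgumentParser(description="Evaluate a local MCP server")
    parser.add_argument("-c", "--command", required=True,
                        help="Executable that starts the MCP server")
    parser.add_argument("-a", "--args", nargs="*", default=[],
                        help="Arguments passed to the server command")
    return parser


def server_argv(argv: list) -> list:
    # Combine the command and its arguments into the argv list that a
    # stdio transport would spawn as a child process.
    ns = build_parser().parse_args(argv)
    return [ns.command, *ns.args]


if __name__ == "__main__":
    cmd = server_argv(["-c", "python", "-a", "my_server.py"])
    print(shlex.join(cmd))
```

Because the command stays a list rather than a single shell string, a process launcher can run it without shell interpretation. In this sketch, server arguments that begin with `-` would be read by argparse as options rather than as values for `-a`.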
[EXTERNAL_DOWNLOADS]: The skill references official documentation and SDKs from the protocol's authoritative sources.
Evidence: SKILL.md contains instructions for the agent to fetch documentation from modelcontextprotocol.io and the modelcontextprotocol organization on GitHub. These are recognized as trusted, well-known services within the MCP ecosystem.
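The trust boundary in this finding can be made concrete with a host allowlist. The sketch below is hypothetical: `is_trusted_source`, the HTTPS-only rule, and the inclusion of `raw.githubusercontent.com` (the host GitHub uses to serve raw file contents) are illustrative assumptions, not behavior documented for the skill.

```python
from urllib.parse import urlparse

# Hypothetical allowlist built from the two sources named in the finding.
TRUSTED_HOSTS = {"modelcontextprotocol.io"}
GITHUB_HOSTS = {"github.com", "raw.githubusercontent.com"}
TRUSTED_GITHUB_ORG = "modelcontextprotocol"


def is_trusted_source(url: str) -> bool:
    parsed = urlparse(url)
    # Assumption: only HTTPS fetches are acceptable.
    if parsed.scheme != "https":
        return False
    host = (parsed.hostname or "").lower()
    # Accept the documentation site and its subdomains, but not look-alike
    # hosts such as "modelcontextprotocol.io.example".
    if host in TRUSTED_HOSTS or any(host.endswith("." + h) for h in TRUSTED_HOSTS):
        return True
    # Accept GitHub URLs whose first path segment is the organization.
    if host in GITHUB_HOSTS:
        segments = [s for s in parsed.path.split("/") if s]
        return bool(segments) and segments[0].lower() == TRUSTED_GITHUB_ORG
    return False
```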
[PROMPT_INJECTION]: The evaluation loop processes external data, which creates a surface for indirect prompt injection.
Ingestion points: scripts/evaluation.py reads questions from a user-provided XML file and receives tool results from the MCP server under test.
Boundary markers: The EVALUATION_PROMPT enforces the use of XML tags (<summary>, <feedback>, <response>) to structure the assistant's output and keep it separate from the ingested data.
Capability inventory: The script can spawn subprocesses to run local MCP servers and can make network requests to the Anthropic API.
Sanitization: No explicit sanitization is performed on the questions or tool outputs before they are passed to the model, an omission that is common in testing utilities of this kind.
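The ingestion and boundary-marker points above can be illustrated with a small sketch. It assumes the question file uses <qa_pair> elements containing <question> and <answer> children (the real layout may differ), and the helpers `load_questions`, `wrap_untrusted`, and `extract_response`, as well as the <tool_output> wrapper tag, are hypothetical. Escaping keeps untrusted text from breaking out of its wrapper or forging a <response> tag; it does not stop the model from following instructions written in that text.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from xml.sax.saxutils import escape


def load_questions(xml_text: str) -> List[Tuple[str, str]]:
    # Assumed layout: <evaluation><qa_pair><question/><answer/></qa_pair>...</evaluation>
    root = ET.fromstring(xml_text)
    pairs = []
    for qa in root.iter("qa_pair"):
        question = (qa.findtext("question") or "").strip()
        answer = (qa.findtext("answer") or "").strip()
        pairs.append((question, answer))
    return pairs


def wrap_untrusted(label: str, content: str) -> str:
    # Escape &, < and > so text from the question file or the server cannot
    # close the wrapper tag or inject <response>-style tags of its own.
    return f"<{label}>\n{escape(content)}\n</{label}>"


RESPONSE_RE = re.compile(r"<response>(.*?)</response>", re.DOTALL)


def extract_response(model_output: str) -> Optional[str]:
    # Return the text inside the first <response> block, or None if absent.
    match = RESPONSE_RE.search(model_output)
    return match.group(1).strip() if match else None
```

Python's `xml` module documentation warns that its parsers are not secure against maliciously constructed data and points to `defusedxml`; that matters here only if question files come from untrusted sources.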

mcp-builder