Evals
Pass
Audited by Gen Agent Trust Hub on Mar 5, 2026
Risk Level: SAFE
Capabilities: COMMAND_EXECUTION
Full Analysis
- [COMMAND_EXECUTION]: The skill uses shell commands to perform automated testing and integration tasks.
  - `BinaryTestsGrader.ts` executes test commands (e.g., `pytest`, `bun test`) against local codebases to verify functionality.
  - `StaticAnalysisGrader.ts` runs analysis tools such as linters and type-checkers to assess code quality.
  - `AlgorithmBridge.ts` executes CLI commands to interact with the `THEALGORITHM` skill for reporting results.
- [DATA_INGESTION]: The skill manages its operational state and configuration through local file access.
  - It reads task configurations, evaluation suites, and agent transcripts from the local filesystem to provide context for grading.
  - It maintains a local failure log in `Data/failures.jsonl` to track and convert agent errors into test cases.
- [PROMPT_INJECTION]: The skill's use of LLM-based grading introduces a surface for indirect prompt injection from the content being evaluated.
  - Ingestion points: `LLMRubricGrader` and `NaturalLanguageAssertGrader` receive untrusted output from other agent runs as input for grading.
  - Boundary markers: The prompts used for LLM judges do not implement strict delimiters to separate the grading instructions from the content being analyzed.
  - Capability inventory: While the skill can execute shell commands, these are triggered by deterministic logic in the runner based on task definitions, not directly by the LLM judge's output.
  - Sanitization: Content under evaluation is passed to the LLM judge without prior sanitization or escaping.
Audit Metadata