validate-agent
Fail
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: HIGHCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
- COMMAND_EXECUTION (HIGH): The skill constructs a shell command in Step 2:
uv run python scripts/braintrust_analyze.py --rag-judge --plan-file <plan-path>. The<plan-path>variable is a dynamic input. If this path contains shell metacharacters (e.g.,path/to/file.md; rm -rf /), it could lead to arbitrary command execution on the host environment. - PROMPT_INJECTION (LOW): Category 8: Indirect Prompt Injection. The skill is designed to process external technical plans which are untrusted data sources.
- Ingestion points: Technical plans are read from the file system and their content is processed in Steps 1, 2, and 3.
- Boundary markers: Absent. The instructions do not provide delimiters or warnings to the agent to disregard instructions found within the technical plan.
- Capability inventory: The agent has the ability to execute subprocesses (via
braintrust_analyze.py), perform web searches, and write files to the handoff directory. - Sanitization: Absent. There is no evidence of input validation or escaping for the plan content or the plan path.
Recommendations
- AI detected serious security threats
Audit Metadata