evaluating-skills-with-models
Pass
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: SAFEPROMPT_INJECTION
Full Analysis
- PROMPT_INJECTION (LOW): The skill is susceptible to Indirect Prompt Injection due to how it processes test scenario files.
- Ingestion points: The skill ingests untrusted data from
tests/scenarios.mdlocated within target skill directories. - Boundary markers: The prompt template used in Step 2 interpolates the
{evaluation_query}directly without using delimiters or instructions to ignore embedded commands. - Capability inventory: The skill has the capability to spawn sub-agents that execute code, read files via Glob/Read, and modify local README files.
- Sanitization: There is no evidence of sanitization or validation of the queries or expected behaviors parsed from the scenario files.
- SAFE (SAFE): No instances of data exfiltration, hardcoded credentials, obfuscation, or persistence mechanisms were detected. The skill's functionality is consistent with its stated purpose of evaluating AI model performance.
Audit Metadata