NYC
skills/smithery/ai/agentic-eval/Gen Agent Trust Hub

agentic-eval

Pass

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: SAFEPROMPT_INJECTIONCOMMAND_EXECUTION
Full Analysis
  • PROMPT_INJECTION (LOW): Indirect Prompt Injection Surface. The skill implements 'Refine' patterns where untrusted output from one LLM call is interpolated directly into subsequent prompts (e.g., in reflect_and_refine).
  • Ingestion points: Variables output, critique, and failed derived from LLM responses in SKILL.md.
  • Boundary markers: Absent. No delimiters are used to separate the system instructions from the potentially adversarial data being refined.
  • Capability inventory: The agent has the capability to generate new prompts and execute generated code via hypothetical helper functions.
  • Sanitization: Absent. The patterns do not include logic to sanitize or escape the LLM output before interpolation.
  • COMMAND_EXECUTION (LOW): Dynamic Execution. Pattern 3 (CodeReflector) references a run_tests(code, tests) function. This pattern encourages the execution of generated code and unit tests, which can lead to arbitrary code execution if the environment is not properly sandboxed.
Audit Metadata
Risk Level
SAFE
Analyzed
Feb 17, 2026, 06:45 PM