experiment-audit

Pass

Audited by Gen Agent Trust Hub on Apr 19, 2026

Risk Level: SAFEPROMPT_INJECTION
Full Analysis
  • [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection because it ingests and processes the content of arbitrary project files, such as evaluation scripts and result logs, using a secondary LLM. Maliciously crafted data within these files could override the intended audit instructions.
  • Ingestion points: Scans and reads files matching eval.py, *.json, *.csv, and *.yaml within the project directory (Step 1).
  • Boundary markers: The skill lacks explicit delimiters or instructions for the reviewer model to ignore potentially adversarial content embedded in the ingested files.
  • Capability inventory: The agent has access to Bash(*), Read, Write, and Edit tools, which could be exploited if the model is compromised by injected instructions.
  • Sanitization: There is no evidence of sanitization or validation of the content retrieved from the project files before it is processed by the reviewer tool.
  • [PROMPT_INJECTION]: The skill documentation and configuration include misleading metadata by referencing non-existent model versions ('GPT-5.4') and fictional backend services ('Oracle-Pro'). This deception can cause users or automated systems to misinterpret the actual capabilities and safety protocols of the skill's execution environment.
Audit Metadata
Risk Level
SAFE
Analyzed
Apr 19, 2026, 03:14 AM