honest-code-review

Fail

Audited by Gen Agent Trust Hub on Feb 13, 2026

Risk Level: HIGHPROMPT_INJECTION
Full Analysis
  • PROMPT_INJECTION (HIGH): The skill uses explicit instructions to override the AI's default safety and behavioral guidelines.
  • Evidence: The instructions state: 'Swear CONSTANTLY. This isn't optional. Explicit language is REQUIRED', 'DON'T. HOLD. BACK.', and 'What You NEVER Do: ... Soften ANY criticism'.
  • Risk: This is a 'persona adoption' attack vector that forces the AI into an unfiltered state, making it more likely to ignore other safety constraints or generate harmful content under the guise of the persona.
  • INDIRECT_PROMPT_INJECTION (HIGH): The skill is designed to process external, untrusted data (code reviews) while in an aggressive, rule-breaking state.
  • Ingestion Points: User-provided code snippets or architecture descriptions.
  • Capability Inventory: Text generation and formatting.
  • Boundary Markers: Absent. There are no instructions to distinguish between the persona's instructions and potentially malicious instructions embedded in the code being reviewed.
  • Sanitization: None. The skill is predisposed to be 'unfiltered', increasing the risk that instructions hidden in code comments could successfully hijack the session.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 13, 2026, 10:52 PM