silent-failure-detection

Pass

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: SAFEEXTERNAL_DOWNLOADSPROMPT_INJECTION
Full Analysis
  • EXTERNAL_DOWNLOADS (LOW): The PerplexityDetector class in references/gibberish-detection.md uses the transformers library to download the GPT-2 model from Hugging Face. While Hugging Face is a trusted organization, the download of external model weights is a significant network operation.
  • PROMPT_INJECTION (LOW): The detect_gibberish function in SKILL.md implements an 'LLM-as-judge' pattern that is vulnerable to indirect prompt injection. Untrusted data from the agent's response is placed directly into a prompt template used for scoring.
  • Ingestion points: The response argument in the detect_gibberish function in SKILL.md.
  • Boundary markers: Absent; the response text is concatenated directly into the judge_prompt string.
  • Capability inventory: The generated prompt is processed by llm.generate(), which influences the monitoring system's alerting and scoring decisions.
  • Sanitization: No sanitization or escaping of the input response is performed before interpolation.
Audit Metadata
Risk Level
SAFE
Analyzed
Feb 17, 2026, 06:23 PM