silent-failure-detection
Pass
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: SAFEEXTERNAL_DOWNLOADSPROMPT_INJECTION
Full Analysis
- EXTERNAL_DOWNLOADS (LOW): The
PerplexityDetectorclass inreferences/gibberish-detection.mduses thetransformerslibrary to download the GPT-2 model from Hugging Face. While Hugging Face is a trusted organization, the download of external model weights is a significant network operation. - PROMPT_INJECTION (LOW): The
detect_gibberishfunction inSKILL.mdimplements an 'LLM-as-judge' pattern that is vulnerable to indirect prompt injection. Untrusted data from the agent's response is placed directly into a prompt template used for scoring. - Ingestion points: The
responseargument in thedetect_gibberishfunction inSKILL.md. - Boundary markers: Absent; the response text is concatenated directly into the
judge_promptstring. - Capability inventory: The generated prompt is processed by
llm.generate(), which influences the monitoring system's alerting and scoring decisions. - Sanitization: No sanitization or escaping of the input response is performed before interpolation.
Audit Metadata