guardrails-safety-filter-builder
Pass
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: SAFE
Full Analysis
- PROMPT_INJECTION (SAFE): The `classify_topic` function in `SKILL.md` interpolates user-provided queries directly into an LLM prompt template. While this is a common surface for indirect prompt injection, the risk is mitigated by the skill's primary purpose as a safety filter and its multi-layered detection approach.
  - Ingestion points: the `query` parameter in `classify_topic`.
  - Boundary markers: absent in the prompt template.
  - Capability inventory: the script facilitates tool calls to `llm()` but does not include file system, network, or shell execution capabilities.
  - Sanitization: the `apply_guardrails` pipeline implements `detect_prompt_injection` regex checks as a primary layer of defense before classification.
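The layered flow the audit describes, regex screening before any LLM classification call, can be sketched as follows. The function names (`detect_prompt_injection`, `classify_topic`, `apply_guardrails`) come from the audit itself, but the patterns, signatures, and the `llm` callable are illustrative assumptions, not the skill's actual implementation.

```python
import re

# Assumed regex patterns; the real skill's rules are not shown in the audit.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    re.compile(r"disregard (the )?system prompt", re.I),
    re.compile(r"you are now", re.I),
]

def detect_prompt_injection(query: str) -> bool:
    """First layer: flag common injection phrasings with regex checks."""
    return any(p.search(query) for p in INJECTION_PATTERNS)

def classify_topic(query: str, llm=None) -> str:
    """Second layer: interpolate the query into an LLM prompt template.

    `llm` is a stand-in for the tool call the audit mentions; this sketch
    falls back to a placeholder label when no model is wired in.
    """
    prompt = f"Classify the safety topic of this user query:\n{query}"
    if llm is None:
        return "unclassified"
    return llm(prompt)

def apply_guardrails(query: str, llm=None) -> str:
    """Run the regex layer before classification, as the audit notes."""
    if detect_prompt_injection(query):
        return "blocked:prompt_injection"
    return classify_topic(query, llm=llm)
```

Note the ordering: because `detect_prompt_injection` runs first, an obviously adversarial query never reaches the prompt template, which is why the audit treats the unmarked interpolation as mitigated rather than unguarded.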
Audit Metadata