guardrails-safety-filter-builder

Pass

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: SAFE
Full Analysis
  • PROMPT_INJECTION (SAFE): The classify_topic function in SKILL.md interpolates user-provided queries directly into an LLM prompt template. While this is a common surface for indirect prompt injection, the risk is mitigated by the skill's primary purpose as a safety filter and its multi-layered detection approach.
    • Ingestion points: query parameter in classify_topic.
    • Boundary markers: absent in the prompt template.
    • Capability inventory: the script facilitates tool calls to llm() but does not include file system, network, or shell execution capabilities.
    • Sanitization: the apply_guardrails pipeline runs detect_prompt_injection regex checks as a primary layer of defense before classification.
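The layered pipeline described above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the function names (apply_guardrails, detect_prompt_injection, classify_topic, llm) come from the audit, but their bodies and the regex patterns are assumptions.

```python
import re

# Illustrative injection phrasings; the skill's actual pattern list is not
# shown in the audit and would need to be far more comprehensive.
INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"disregard (the )?(above|prior) (rules|instructions)",
    r"you are now",
]

def detect_prompt_injection(query: str) -> bool:
    """Regex pre-filter: flag common injection phrasings before classification."""
    return any(re.search(p, query, re.IGNORECASE) for p in INJECTION_PATTERNS)

def classify_topic(query: str, llm=lambda prompt: "general") -> str:
    """Interpolates the query into a prompt template, as the audit notes.
    Without boundary markers, the query text is indistinguishable from the
    template's own instructions -- exactly the surface the audit calls out."""
    prompt = f"Classify the topic of the following user query:\n{query}"
    return llm(prompt)

def apply_guardrails(query: str) -> str:
    """Layered pipeline: regex injection check first, LLM classification second."""
    if detect_prompt_injection(query):
        return "BLOCKED: suspected prompt injection"
    return classify_topic(query)
```

Because the regex layer runs before any prompt is built, a flagged query never reaches the template, which is what keeps the unmarked interpolation tolerable in this design.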
Audit Metadata
  • Risk Level: SAFE
  • Analyzed: Feb 17, 2026, 06:43 PM