input-output-guardrails

Pass

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: SAFE
Full Analysis
  • [SAFE] (SAFE): The skill contains no malicious code or instructions. All patterns and code samples identified are defensive in nature and intended to improve the security posture of an AI agent by filtering harmful inputs and outputs. The patterns for prompt injection and sensitive data detection are implementation examples for safety filters.\n- [Indirect Prompt Injection] (SAFE): The skill intentionally processes untrusted external data to apply security filters. The logic is benign and aligned with its stated security purpose.\n
  • Ingestion points: Untrusted data enters the context through the user_input parameter in InputGuardrails.validate and the response parameter in OutputGuardrails.filter (SKILL.md).\n
  • Boundary markers: While explicit boundary markers are not shown in the Python code snippets, the configuration in assets/guardrails-config.yaml includes regex patterns for detection of delimiters like [INST] and <|system|>.\n
  • Capability inventory: The skill has read/write capabilities for local file management to generate reports (found in scripts/test-guardrails.py) but does not contain dangerous command execution, system modification, or network exfiltration logic.\n
  • Sanitization: The skill implements multi-layer sanitization through PII redaction, content filtering, and blocklist matching.
Audit Metadata
Risk Level
SAFE
Analyzed
Feb 17, 2026, 06:39 PM