constitutional-ai
Pass
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: SAFEPROMPT_INJECTION
Full Analysis
- [Indirect Prompt Injection] (LOW): The skill demonstrates workflows where untrusted data (prompts and initial model responses) are interpolated directly into critique and revision templates. 1. Ingestion points: User-provided prompts and generated initial_responses in SKILL.md. 2. Boundary markers: None identified; templates use simple string interpolation without delimiters or guard instructions for the nested content. 3. Capability inventory: Uses transformers for inference and trl for model training (SFT, Reward modeling, PPO). 4. Sanitization: None present; the skill assumes inputs are benign training data.
Audit Metadata