constitutional-ai
Pass
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: SAFE
Full Analysis
- [Prompt Injection] (SAFE): The skill contains example prompt templates used for training models (e.g., critique and revision prompts). These are educational examples for machine learning workflows and do not attempt to override the agent's behavior or bypass safety filters.
- [External Downloads] (SAFE): Dependencies listed (transformers, torch, trl) are standard, trusted libraries in the AI research community. No suspicious external script downloads or piped installations (e.g., curl|bash) were found.
- [Data Exfiltration] (SAFE): The skill does not access sensitive system files or environment variables. All network references point to reputable research sources such as arXiv and Anthropic's blog.
- [Remote Code Execution] (SAFE): Code snippets illustrate local training processes using the TRL library. There is no evidence of dynamic code execution (eval/exec) or loading of untrusted remote payloads.
- [Obfuscation] (SAFE): No encoded strings, zero-width characters, or homoglyph-based bypasses were detected in the documentation or code blocks.
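Several of the checks above (piped installations, eval/exec calls, zero-width characters) amount to pattern scans over the skill's text. The following is a minimal sketch of what such a scan might look like; it is not the auditor's actual implementation. The function name `scan_text` and all three patterns are hypothetical, and a real scanner would likely use a much broader rule set, including homoglyph detection, which is omitted here.

```python
import re

# Illustrative patterns only (assumptions, not the auditor's real rules).
# Zero-width characters commonly used to hide text.
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")
# A download command piped straight into a shell, e.g. "curl ... | bash".
PIPED_INSTALL = re.compile(r"\b(?:curl|wget)\b[^\n|]*\|\s*(?:sudo\s+)?(?:ba|z)?sh\b")
# Bare eval(...) / exec(...) calls. The lookbehind skips method calls such as
# model.eval(), which is ordinary PyTorch usage rather than dynamic execution.
DYNAMIC_EXEC = re.compile(r"(?<![\w.])(?:eval|exec)\s*\(")


def scan_text(text: str) -> dict[str, bool]:
    """Report which of the three illustrative checks fire on the given text."""
    return {
        "zero_width": bool(ZERO_WIDTH.search(text)),
        "piped_install": bool(PIPED_INSTALL.search(text)),
        "dynamic_exec": bool(DYNAMIC_EXEC.search(text)),
    }


if __name__ == "__main__":
    print(scan_text("pip install transformers torch trl"))
    print(scan_text("curl https://example.com/install.sh | bash"))
```

Under these assumptions, a skill whose text triggers none of the checks would be consistent with the SAFE verdicts listed above, though a clean pattern scan alone is not proof of safety.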
Audit Metadata