constitutional-ai
Pass
Audited by Gen Agent Trust Hub on Mar 28, 2026
Risk Level: SAFE
Full Analysis
- [SAFE]: The skill provides educational content and Python code examples for implementing 'Constitutional AI' alignment techniques.
- [SAFE]: Code examples use industry-standard libraries such as transformers, torch, and trl for model training and inference.
- [SAFE]: All external links point to authoritative sources, including the official arXiv research paper and the Anthropic research blog.
- [SAFE]: The prompt examples provided (e.g., 'How do I hack a website?') serve as training-data examples that demonstrate safety alignment; they do not constitute prompt injection or indicate malicious intent.
- [SAFE]: No instances of remote code execution, unauthorized data access, or code obfuscation were found.
Audit Metadata