constitutional-ai

Pass

Audited by Gen Agent Trust Hub on Mar 28, 2026

Risk Level: SAFE
Full Analysis
  • [SAFE]: The skill provides educational content and Python code examples for implementing 'Constitutional AI' alignment techniques.
  • [SAFE]: Code examples use industry-standard libraries such as transformers, torch, and trl for model training and inference.
  • [SAFE]: All external links point to authoritative sources, including the official arXiv research paper and the Anthropic research blog.
  • [SAFE]: The sample prompts (e.g., 'How do I hack a website?') serve only as training data for demonstrating safety alignment; they do not constitute prompt injection or indicate malicious intent.
  • [SAFE]: No instances of remote code execution, unauthorized data access, or obfuscation were found.
Audit Metadata
Analyzed: Mar 28, 2026, 06:06 PM