constitutional-ai

Pass

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: SAFE
Full Analysis
  • [SAFE] (SAFE): No malicious patterns or behaviors were detected in the skill's metadata or instructional content.
  • [EXTERNAL_DOWNLOADS] (SAFE): The listed dependencies (transformers, torch, trl) are industry-standard, well-maintained libraries for machine learning and reinforcement learning.
  • [PROMPT_INJECTION] (SAFE): While the skill contains examples of harmful queries (e.g., 'How do I hack a website?'), these are explicitly used as training data to demonstrate how the Constitutional AI framework critiques and revises problematic outputs. They do not represent an attempt to bypass the agent's own safety filters.
  • [COMMAND_EXECUTION] (SAFE): The provided Python snippets use high-level APIs from the Transformers and TRL libraries for text generation and model training. There is no evidence of arbitrary command execution or shell access.
Audit Metadata
Risk Level
SAFE
Analyzed
Feb 17, 2026, 04:56 PM