The Agent Skills Directory

[PROMPT_INJECTION] (LOW): Indirect Prompt Injection surface area in training workflows.
Ingestion points: The skill processes untrusted user data (prompts and responses) into template-based prompts for AI critiques and revisions (e.g., Workflow 1, Workflow 2, and Workflow 3).
Boundary markers: The skill uses basic field headers (Question:, Response:, Critique:) but lacks explicit 'ignore instructions' delimiters or escaping for the variable data injected into the templates.
Capability inventory: The skill demonstrates Python code for model training (SFTTrainer, RewardTrainer, PPOTrainer) and local file/directory creation (output_dir='constitutional-reward-model').
Sanitization: No evidence of sanitization or validation of the input strings before formatting them into LLM templates, making it theoretically possible for a prompt or response to contain instructions that manipulate the critique/revision phase.

constitutional-ai