human-in-the-loop-training

Pass

Audited by Gen Agent Trust Hub on Mar 18, 2026

Risk Level: SAFEPROMPT_INJECTION
Full Analysis
  • [PROMPT_INJECTION]: The skill implements a learning pipeline that is vulnerable to indirect prompt injection.
  • Ingestion points: The HumanReviewQueue.submit_review and IncrementalLearner.correction_buffer in SKILL.md accept arbitrary human-provided feedback.
  • Boundary markers: The IncrementalLearner.generate_with_corrections function interpolates corrections into a prompt using only text headers without delimiters or instructions to ignore embedded commands.
  • Capability inventory: The skill uses llm.generate to produce bot responses and derive rules based on these inputs.
  • Sanitization: No sanitization or validation of the correction content is performed to prevent instructions from being treated as commands.
Audit Metadata
Risk Level
SAFE
Analyzed
Mar 18, 2026, 10:46 PM