human-in-the-loop-training

Pass

Audited by Gen Agent Trust Hub on Apr 3, 2026

Risk Level: SAFE
Full Analysis
  • [SAFE]: The skill consists entirely of markdown documentation and Python class definitions intended as implementation examples. No functional scripts, shell commands, or automation routines are provided that could execute on the host system.
  • [INDIRECT_PROMPT_INJECTION]: The skill documents patterns for ingesting untrusted human corrections and interpolating them into future prompts (e.g., IncrementalLearner.generate_with_corrections). While this creates a potential attack surface for indirect prompt injection if implemented as described, the skill itself is a static educational resource and does not execute these patterns.
  • Ingestion points: Human feedback is ingested in LearningPipeline.process_correction and PatternLearner.extract_patterns (SKILL.md).
  • Boundary markers: Absent; the provided code templates interpolate feedback directly into string-based prompts without specific delimiters or safety instructions.
  • Capability inventory: The templates utilize conceptual LLM generation calls (llm.generate, base_model.generate).
  • Sanitization: No sanitization or validation logic for the content of the human feedback is included in the templates.
Audit Metadata
Risk Level
SAFE
Analyzed
Apr 3, 2026, 05:46 PM