human-in-the-loop-training
Pass
Audited by Gen Agent Trust Hub on Apr 3, 2026
Risk Level: SAFE
Full Analysis
- [SAFE]: The skill consists entirely of markdown documentation and Python class definitions intended as implementation examples. No functional scripts, shell commands, or automation routines are provided that could execute on the host system.
- [INDIRECT_PROMPT_INJECTION]: The skill documents patterns for ingesting untrusted human corrections and interpolating them into future prompts (e.g.,
IncrementalLearner.generate_with_corrections). While this creates a potential attack surface for indirect prompt injection if implemented as described, the skill itself is a static educational resource and does not execute these patterns. - Ingestion points: Human feedback is ingested in
LearningPipeline.process_correctionandPatternLearner.extract_patterns(SKILL.md). - Boundary markers: Absent; the provided code templates interpolate feedback directly into string-based prompts without specific delimiters or safety instructions.
- Capability inventory: The templates utilize conceptual LLM generation calls (
llm.generate,base_model.generate). - Sanitization: No sanitization or validation logic for the content of the human feedback is included in the templates.
Audit Metadata