data-poisoning
Pass
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: SAFE
Full Analysis
- Indirect Prompt Injection (LOW/SAFE): The
detect-poisoning.pyscript handles training data which may contain malicious instructions. While the script reports these findings, the exposure is limited to local report generation. - Ingestion points: Training sample objects passed to the
PoisoningDetectorinscripts/detect-poisoning.py. - Boundary markers: None; the script generates a JSON report containing raw sample text.
- Capability inventory: File system write access via
json.dumpfor report generation. - Sanitization: The script uses regex patterns to flag samples but does not sanitize or escape the content before outputting it to the report file.
- Prompt Injection (SAFE): Strings such as 'ignore instructions' and 'you are now' are present in
scripts/detect-poisoning.pyandSKILL.md. Analysis confirms these are used as detection regex patterns and example test cases for training data poisoning simulations, rather than active instructions aimed at overriding the agent's behavior. - Command Execution (SAFE): Code blocks in
SKILL.mdand thedetect-poisoning.pyscript are for simulation and detection purposes. No arbitrary command execution or subprocess spawning was detected.
Audit Metadata