data-poisoning

Pass

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: SAFE
Full Analysis
  • Indirect Prompt Injection (LOW/SAFE): The detect-poisoning.py script handles training data which may contain malicious instructions. While the script reports these findings, the exposure is limited to local report generation.
  • Ingestion points: Training sample objects passed to the PoisoningDetector in scripts/detect-poisoning.py.
  • Boundary markers: None; the script generates a JSON report containing raw sample text.
  • Capability inventory: File system write access via json.dump for report generation.
  • Sanitization: The script uses regex patterns to flag samples but does not sanitize or escape the content before outputting it to the report file.
  • Prompt Injection (SAFE): Strings such as 'ignore instructions' and 'you are now' are present in scripts/detect-poisoning.py and SKILL.md. Analysis confirms these are used as detection regex patterns and example test cases for training data poisoning simulations, rather than active instructions aimed at overriding the agent's behavior.
  • Command Execution (SAFE): Code blocks in SKILL.md and the detect-poisoning.py script are for simulation and detection purposes. No arbitrary command execution or subprocess spawning was detected.
Audit Metadata
Risk Level
SAFE
Analyzed
Feb 17, 2026, 06:37 PM