report-research

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: HIGHCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION] (HIGH): The workflow requires the agent to execute a shell command (python3 -m agents.code.analysis.show_experiment) using variables like <best_model_results_name> and <experiment_name>. Since these values are determined by scanning the file system for directory and file names, a maliciously named file could lead to arbitrary command execution if the agent does not properly escape the input.
  • [PROMPT_INJECTION] (HIGH): This skill represents a Category 8 (Indirect Prompt Injection) vulnerability.
  • Ingestion points: Data is ingested from results/*.json, configs/, and file names within the project structure.
  • Boundary markers: None. There are no instructions to use delimiters or ignore instructions embedded within the metrics or configuration files.
  • Capability inventory: The skill can execute shell commands and modify local files (experiment.md).
  • Sanitization: None. The skill lacks any instructions to validate or sanitize the data extracted from JSON files or the file system before using it in commands or reports.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 16, 2026, 11:11 AM