report-research
Fail
Audited by Gen Agent Trust Hub on Feb 16, 2026
Risk Level: HIGHCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
- [COMMAND_EXECUTION] (HIGH): The workflow requires the agent to execute a shell command (
python3 -m agents.code.analysis.show_experiment) using variables like<best_model_results_name>and<experiment_name>. Since these values are determined by scanning the file system for directory and file names, a maliciously named file could lead to arbitrary command execution if the agent does not properly escape the input. - [PROMPT_INJECTION] (HIGH): This skill represents a Category 8 (Indirect Prompt Injection) vulnerability.
- Ingestion points: Data is ingested from
results/*.json,configs/, and file names within the project structure. - Boundary markers: None. There are no instructions to use delimiters or ignore instructions embedded within the metrics or configuration files.
- Capability inventory: The skill can execute shell commands and modify local files (
experiment.md). - Sanitization: None. The skill lacks any instructions to validate or sanitize the data extracted from JSON files or the file system before using it in commands or reports.
Recommendations
- AI detected serious security threats
Audit Metadata