The Agent Skills Directory

[COMMAND_EXECUTION]: The run_baseline function in run.py constructs a shell command using f-strings and executes it via subprocess.run(shell=True). The parameters suite, tag, and output are retrieved directly from the input JSON without any validation.
Evidence: Line 40 in run.py constructs the command string: cmd = f"go run ./cmd/alex eval foundation-suite --suite {suite} --output {output_dir} --format markdown".
Evidence: Line 42 in run.py executes the command: subprocess.run(cmd, shell=True, ...).
[DATA_EXFILTRATION]: The analyze_failures function in run.py accepts a file path from user input and reads its content, which can be exploited to read sensitive files from the local file system.
Evidence: Line 65 in run.py: path = Path(result_file) where result_file is an unvalidated user input.
Evidence: Line 69 in run.py: data = json.loads(path.read_text(encoding="utf-8")) reads the file content.

eval-systematic-optimization