ais-bench

Warn

Audited by Gen Agent Trust Hub on Feb 25, 2026

Risk Level: MEDIUM
Categories: COMMAND_EXECUTION, EXTERNAL_DOWNLOADS, REMOTE_CODE_EXECUTION
Full Analysis
  • [COMMAND_EXECUTION]: The utility scripts run_accuracy_test.sh and run_performance_test.sh use sed to rewrite Python source files (.py) in place at runtime, injecting configuration parameters such as host IPs and ports.
  • Evidence: sed -i.bak operations in scripts/run_accuracy_test.sh targeting files returned by the ais_bench --search command.
  • [EXTERNAL_DOWNLOADS]: The skill requires downloading the benchmark tool and various datasets from external sources including GitHub and Alibaba Cloud OSS.
  • Evidence: git clone https://github.com/AISBench/benchmark.git and dataset URLs from opencompass.oss-cn-shanghai.aliyuncs.com.
  • [REMOTE_CODE_EXECUTION]: The tool exposes a configuration option to trust remote code from model repositories, which allows the execution of arbitrary code bundled with model weights during the loading process.
  • Evidence: The trust_remote_code parameter is present and configurable in assets/model_config_template.py and references/model-configs.md.
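
To make the COMMAND_EXECUTION finding concrete, the following is a minimal Python sketch of what a `sed -i.bak` style substitution does to a Python config file. It is not the skill's actual script: the function name `patch_config`, the file name `model_config.py`, and the keys `host_ip` and `host_port` are hypothetical, chosen only to match the "host IPs and ports" mentioned above. The point it illustrates is that the file is edited as plain text, so nothing verifies that the result is still valid or safe Python.

```python
import re
import shutil
import tempfile
from pathlib import Path


def patch_config(path: Path, key: str, value: str) -> bool:
    """Rewrite a `key=...` line in a Python source file, keeping a .bak copy.

    Hypothetical helper that imitates the effect of a `sed -i.bak`
    substitution: a purely textual edit with no syntax or safety check.
    Returns True if at least one line was rewritten.
    """
    original = path.read_text()
    # Save a backup next to the file, as `sed -i.bak` does.
    # Note: a second call overwrites the previous backup.
    shutil.copy(path, path.with_name(path.name + ".bak"))
    pattern = re.compile(rf"^([ \t]*){re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    # A function replacement avoids interpreting backslashes in `value`.
    patched, count = pattern.subn(
        lambda m: f"{m.group(1)}{key}={value},", original
    )
    path.write_text(patched)
    return count > 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / "model_config.py"  # hypothetical config file
        cfg.write_text(
            'models = [dict(\n    host_ip="localhost",\n    host_port=8080,\n)]\n'
        )
        patch_config(cfg, "host_ip", '"10.0.0.5"')
        patch_config(cfg, "host_port", "9000")
        print(cfg.read_text())
```

Because the injected value is written verbatim into source that is later imported, any unvalidated value reaching such a substitution ends up as executable Python, which is why the audit flags the pattern.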
Audit Metadata
Risk Level: MEDIUM
Analyzed: Feb 25, 2026, 03:18 PM