nemo-evaluator-sdk

Warn

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: MEDIUM
Categories: REMOTE_CODE_EXECUTION, COMMAND_EXECUTION, DATA_EXFILTRATION, PROMPT_INJECTION
Full Analysis
  • [Dynamic Execution] (MEDIUM): The adapter_config.discovery feature allows the system to load and execute Python modules from user-specified directories (dirs) and package names (modules). Dynamic loading is intended for extensibility, but anyone who can write to a search path or change the discovery configuration can have malicious code executed.
  • Evidence: references/adapter-system.md specifies configuration for discovery.modules and discovery.dirs.
  • [Command Execution] (MEDIUM): Framework Definition Files (FDFs) use a command template (e.g., python -m my_custom_eval.run --model-id {model_id} ...) that is executed via the shell. The documentation does not specify any sanitization of the placeholder values, so an attacker-controlled value for a variable such as model_id or task could lead to command injection.
  • Evidence: references/custom-benchmarks.md describes the defaults.command template in FDFs.
  • [Indirect Prompt Injection] (LOW): The framework is designed to ingest untrusted data (datasets) and prompts for evaluation purposes. It possesses high-privilege capabilities (Docker execution, network access) without documented input sanitization or boundary markers.
  • Ingestion points: Custom datasets loaded via dataset_dir and system prompts configured in system_message interceptors.
  • Boundary markers: None mentioned; documentation suggests direct interpolation of parameters.
  • Capability inventory: The system can spawn Docker containers, submit Slurm jobs via SSH, and make arbitrary HTTP requests through the interceptor pipeline.
  • Sanitization: No mention of sanitization or escaping for command templates or prompt inputs.
  • [Data Exposure & Exfiltration] (LOW): The progress_tracking interceptor allows the framework to send evaluation progress to an external progress_tracking_url. This capability could be used to exfiltrate data if a malicious URL is provided in the configuration.
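
The command-execution and exfiltration findings above both come down to validating values before they reach a shell or an outbound request. The following is a minimal Python sketch of that idea, not part of nemo-evaluator-sdk. The names render_command, is_allowed_progress_url, ALLOWED_PROGRESS_HOSTS, and the host progress.internal.example are hypothetical, and the template is shortened from the FDF example quoted in the report.

```python
import shlex
from urllib.parse import urlparse

# Hypothetical template, shortened from the FDF example in the report.
COMMAND_TEMPLATE = "python -m my_custom_eval.run --model-id {model_id}"

# Hypothetical allowlist of hosts permitted to receive progress updates.
ALLOWED_PROGRESS_HOSTS = {"progress.internal.example"}


def render_command(template: str, **params: str) -> str:
    """Fill a command template, shell-quoting every placeholder value."""
    # shlex.quote wraps unsafe values in single quotes so the shell
    # treats each one as a single literal argument.
    quoted = {name: shlex.quote(str(value)) for name, value in params.items()}
    return template.format(**quoted)


def is_allowed_progress_url(url: str) -> bool:
    """Accept only HTTPS URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_PROGRESS_HOSTS


if __name__ == "__main__":
    # A hostile model_id stays one quoted argument instead of a second command.
    print(render_command(COMMAND_TEMPLATE, model_id="gpt; rm -rf /"))
    print(is_allowed_progress_url("https://progress.internal.example/run/1"))
    print(is_allowed_progress_url("https://evil.example/collect"))
```

Quoting only helps when the rendered string is passed to a shell. Where possible, passing an argument list to subprocess.run without shell=True avoids shell interpretation altogether.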
Audit Metadata
Risk Level: MEDIUM
Analyzed: Feb 17, 2026, 06:21 PM