nemo-evaluator-sdk
Warn
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: MEDIUM
REMOTE_CODE_EXECUTION, COMMAND_EXECUTION, DATA_EXFILTRATION, PROMPT_INJECTION
Full Analysis
- [Dynamic Execution] (MEDIUM): The `adapter_config.discovery` feature allows the system to load and execute Python modules from user-specified directories (`dirs`) and package names (`modules`). This facilitates dynamic code loading, which, while intended for extensibility, can be exploited to run malicious code if the search paths are compromised.
  - Evidence: `references/adapter-system.md` specifies configuration for `discovery.modules` and `discovery.dirs`.
- [Command Execution] (MEDIUM): Framework Definition Files (FDFs) utilize a `command` template (e.g., `python -m my_custom_eval.run --model-id {model_id} ...`) that is executed via the shell. The documentation does not specify sanitization for the placeholders, which could lead to command injection if variables like `model_id` or `task` are manipulated.
  - Evidence: `references/custom-benchmarks.md` describes the `defaults.command` template in FDFs.
- [Indirect Prompt Injection] (LOW): The framework is designed to ingest untrusted data (datasets) and prompts for evaluation purposes. It possesses high-privilege capabilities (Docker execution, network access) without documented input sanitization or boundary markers.
  - Ingestion points: Custom datasets loaded via `dataset_dir` and system prompts configured in `system_message` interceptors.
  - Boundary markers: None mentioned; documentation suggests direct interpolation of parameters.
  - Capability inventory: The system can spawn Docker containers, submit Slurm jobs via SSH, and make arbitrary HTTP requests through the interceptor pipeline.
  - Sanitization: No mention of sanitization or escaping for command templates or prompt inputs.
- [Data Exposure & Exfiltration] (LOW): The `progress_tracking` interceptor allows the framework to send evaluation progress to an external `progress_tracking_url`. This capability could be used to exfiltrate data if a malicious URL is provided in the configuration.
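
The command-injection and exfiltration findings above can be illustrated with a minimal mitigation sketch. It is not part of the audited framework: the helper names `render_command` and `is_allowed_progress_url`, the `--task {task}` flag appended to the example template, and the allowlisted host are all assumptions made for illustration. The sketch shell-quotes every placeholder value with the standard library's `shlex.quote` before interpolation, and accepts a progress URL only if it uses HTTPS and targets an allowlisted host.

```python
import shlex
from urllib.parse import urlparse

# Hypothetical allowlist; the audited documentation defines no such setting.
ALLOWED_PROGRESS_HOSTS = {"progress.internal.example"}


def render_command(template: str, **params: str) -> str:
    """Fill a command template, shell-quoting every placeholder value."""
    quoted = {name: shlex.quote(str(value)) for name, value in params.items()}
    return template.format(**quoted)


def is_allowed_progress_url(url: str) -> bool:
    """Accept only HTTPS URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_PROGRESS_HOSTS


# Template modeled on the audit's example; the --task flag is an assumption.
template = "python -m my_custom_eval.run --model-id {model_id} --task {task}"

# A hostile model_id stays a single shell argument instead of a second command.
print(render_command(template, model_id="my-model; rm -rf /tmp/x", task="mmlu"))
# python -m my_custom_eval.run --model-id 'my-model; rm -rf /tmp/x' --task mmlu
```

Quoting at render time keeps the template format unchanged. Where the caller controls execution, passing an argument list to `subprocess.run` without `shell=True` avoids shell parsing altogether.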
Audit Metadata