nemo-evaluator-sdk
Warn
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: MEDIUM
Categories: COMMAND_EXECUTION, REMOTE_CODE_EXECUTION, EXTERNAL_DOWNLOADS, DATA_EXFILTRATION
Full Analysis
- [COMMAND_EXECUTION] (MEDIUM): In references/custom-benchmarks.md, Framework Definition Files (FDFs) utilize command templates (e.g., python -m ...) that are executed at runtime. This allows arbitrary command execution if an attacker provides a malicious FDF.
- [REMOTE_CODE_EXECUTION] (MEDIUM): In references/adapter-system.md, the adapter_config.discovery feature enables the system to discover and load Python modules from arbitrary directories or module paths, which can lead to code execution within the evaluator's process.
- [COMMAND_EXECUTION] (MEDIUM): The Local Executor described in references/execution-backends.md allows passing raw docker_args. This could be exploited to include high-risk flags like --privileged or unauthorized volume mounts, potentially leading to container escape.
- [DATA_EXFILTRATION] (LOW): The progress_tracking interceptor in references/adapter-system.md sends evaluation data to user-defined external URLs.
- [PROMPT_INJECTION] (LOW): In references/adapter-system.md, the reasoning interceptor extracts reasoning tokens from model output.
  1. Ingestion points: model response processing.
  2. Boundary markers: tags.
  3. Capability inventory: subprocess execution (docker/sbatch), network operations, and file writing across all execution backends.
  4. Sanitization: absent; extracted tokens are tracked and logged, and no validation is mentioned.
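
To illustrate the docker_args finding, the sketch below screens a raw argument string for high-risk Docker flags before it reaches a container launch. It is not part of nemo-evaluator-sdk. The function name `find_risky_docker_args`, the deny-list contents, and the assumption that docker_args arrives as a single shell-style string are all hypothetical choices made for this example. The flags themselves are standard Docker CLI options.

```python
import shlex

# Hypothetical deny-list. The flags are real Docker CLI options, but which
# ones to block is an assumption of this sketch, not a rule from the SDK.
RISKY_FLAGS = {
    "--privileged",    # grants extended privileges to the container
    "--cap-add",       # adds Linux capabilities
    "--device",        # exposes a host device
    "--volume", "-v",  # bind-mounts a host path
    "--mount",         # attaches a filesystem mount
    "--security-opt",  # overrides security profiles
}


def find_risky_docker_args(docker_args: str) -> list[str]:
    """Return the tokens in a raw docker argument string that hit the deny-list.

    Assumes docker_args is one shell-style string. Attached short-option
    values (for example "-v/src:/dst") are not detected by this sketch.
    """
    hits = []
    for token in shlex.split(docker_args):
        # Compare only the flag name so "--cap-add=SYS_ADMIN" also matches.
        flag = token.split("=", 1)[0]
        if flag in RISKY_FLAGS:
            hits.append(token)
    return hits


if __name__ == "__main__":
    sample = "--rm --privileged -v /:/host --shm-size=1g"
    print(find_risky_docker_args(sample))
```

A caller could refuse to launch when the returned list is non-empty, or require explicit approval for each flagged token. A deny-list is easy to bypass with flags it does not know about, so an allow-list of permitted flags is likely the stricter design.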
Audit Metadata