nemo-evaluator-sdk

Warn

Audited by Gen Agent Trust Hub on Mar 28, 2026

Risk Level: MEDIUM | COMMAND_EXECUTION | REMOTE_CODE_EXECUTION | EXTERNAL_DOWNLOADS | DATA_EXFILTRATION
Full Analysis
  • [COMMAND_EXECUTION]: The skill executes a variety of shell commands through the nemo-evaluator-launcher utility, including Docker operations (docker run, docker login), Slurm job submissions (sbatch, sacct), and custom benchmarking commands.
      • Evidence in references/custom-benchmarks.md: Framework Definition Files (FDF) allow users to define arbitrary shell command templates (e.g., python -m my_custom_eval.run ...) that the launcher executes at runtime.
      • Evidence in references/execution-backends.md: The Slurm executor uses SSH to execute commands on remote cluster head nodes.
  • [REMOTE_CODE_EXECUTION]: The skill features a 'Custom Interceptor Discovery' mechanism that dynamically loads and executes Python code from user-defined directories or modules.
      • Evidence in references/adapter-system.md: The discovery configuration allows specifying modules or dirs (e.g., /path/to/custom/interceptors) from which custom executable interceptors are loaded into the evaluation pipeline.
  • [EXTERNAL_DOWNLOADS]: The skill instructions involve downloading and installing external software and assets.
      • Mentions pip install nemo-evaluator-launcher for the core functionality.
      • Downloads container images from the NVIDIA Container Registry (nvcr.io).
  • [DATA_EXFILTRATION]: The skill includes a configurable progress_tracking interceptor that transmits evaluation data to an external endpoint.
      • Evidence in references/adapter-system.md: The progress_tracking_url parameter allows sending runtime data to a remote server, which could serve as a data transmission channel if misconfigured.
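One way to reduce the misconfiguration risk noted for progress_tracking_url is to check the configured value against an allowlist before a run starts. The sketch below is illustrative only: the function name is_allowed_tracking_url, the ALLOWED_HOSTS set, and the example hostnames are assumptions made for this example and are not part of nemo-evaluator-launcher or its configuration schema.

```python
from urllib.parse import urlparse

# Hypothetical allowlist (an assumption for this sketch); replace with
# hosts that your organization actually controls.
ALLOWED_HOSTS = {"tracking.internal.example.com"}


def is_allowed_tracking_url(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Return True only for https URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    # Reject plain http and any other scheme (including an empty one).
    if parsed.scheme != "https":
        return False
    # .hostname is lowercased and excludes port and userinfo, so a URL like
    # https://trusted@evil.example.org/ is judged by its real host.
    host = parsed.hostname
    return host is not None and host in allowed_hosts
```

A pre-flight script could call this helper on the value it finds in the adapter configuration and refuse to launch the evaluation when it returns False.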
Audit Metadata
Risk Level: MEDIUM
Analyzed: Mar 28, 2026, 06:06 PM