The Agent Skills Directory

[COMMAND_EXECUTION]: The skill implements an adapter system that supports custom interceptor discovery, allowing the launcher to load and execute arbitrary Python modules from user-specified directories or module paths provided in the configuration.
[COMMAND_EXECUTION]: Through the Framework Definition File (FDF) system, the skill allows for the definition of custom shell commands that are executed via subprocess to run evaluation harnesses.
[CREDENTIALS_UNSAFE]: The skill is designed to handle and manage sensitive credentials, including NVIDIA NGC API keys, Hugging Face tokens (HF_TOKEN), and SSH private keys required for remote Slurm cluster communication.
[EXTERNAL_DOWNLOADS]: The skill fetches and executes Docker containers from external registries, including NVIDIA's official container registry (nvcr.io) and potentially unverified third-party registries for custom benchmarks.
[PROMPT_INJECTION]: The skill acts as a harness that processes untrusted data from over 100 academic benchmarks, presenting an indirect prompt injection attack surface.
Ingestion points: Automated ingestion of data from academic harnesses like MMLU, HumanEval, and GPQA Diamond within SKILL.md and configuration files.
Boundary markers: The provided documentation and examples do not define explicit delimiters or boundary markers to prevent the model from obeying instructions embedded within benchmark samples.
Capability inventory: The environment has extensive capabilities, including subprocess execution (Docker/Slurm), file system access (writing results/logs), and network access (NVIDIA Build, Lepton AI, MLflow, Weights & Biases).
Sanitization: No evidence of sanitization or filtering of benchmark content is present in the reference documentation.

nemo-evaluator-sdk