evaluating-llms
Pass
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: SAFE
Full Analysis
- SAFE (SAFE): No malicious code or instructions were detected in the skill configuration. The manifest defines a standard framework for AI/ML evaluation tasks.
- Data Exposure (SAFE): The environment variable template (
.env.example) correctly uses placeholders for API keys, adhering to best practices to avoid hardcoded credential exposure. - Dependency Safety (SAFE): All referenced Python packages (ragas, deepeval, lm-eval, etc.) are well-known, legitimate frameworks from trusted registries.
- Attack Surface (SAFE): The skill defines ingestion points for external datasets (questions and answers), which constitutes an indirect prompt injection surface. However, this is the primary purpose of an evaluation skill, and no exploitable capability-triggering logic was found in the configuration.
Audit Metadata