skill-testing-framework

Warn

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: MEDIUM (COMMAND_EXECUTION)
Full Analysis
  • COMMAND_EXECUTION (MEDIUM): The skill is designed to execute arbitrary local scripts defined in test configuration files. Although the primary execution script (run_tests.py) is missing from the provided files, the documentation (SKILL.md) and the examples (assets/test_template.json, references/test_patterns.md) explicitly describe script and args fields that the framework runs. An attacker who can influence the content of a test suite can therefore achieve arbitrary command execution on the host.
  • INDIRECT_PROMPT_INJECTION (MEDIUM): The skill exposes a Category 8 attack surface because it is designed to ingest and act upon external test definitions and baseline output files. The findings for this category are:
      • Ingestion points: Files processed by generate_test_template.py, validate_test_results.py, and the referenced run_tests.py.
      • Boundary markers: None identified in the provided templates or scripts to distinguish test data from malicious instructions.
      • Capability inventory: File system read/write (validate_test_results.py), directory creation, and script execution (referenced in the documentation).
      • Sanitization: None. The scripts perform file operations and pattern matching using values taken directly from the input files.
  • DATA_EXPOSURE (LOW): The validate_test_results.py script includes a --create-baseline feature that uses shutil.copy2 to duplicate files. If mismanaged, this could be used to copy sensitive files into a baseline directory for later exfiltration or unauthorized access.
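
The command-execution and data-exposure findings above both come down to paths and arguments being taken from input files without checks. The following is a minimal sketch of one possible guard, not the framework's actual code (run_tests.py was not available to the audit). It assumes a test case is a dict with a "script" string and an optional "args" list, matching the field names mentioned in the findings. The helper names resolve_inside and build_command and the scripts_root parameter are hypothetical.

```python
import sys
from pathlib import Path


def resolve_inside(root: Path, candidate: str) -> Path:
    """Resolve candidate relative to root and reject anything outside root.

    Hypothetical helper: it blocks "../" traversal and absolute paths
    that point outside the allowed directory.
    """
    root_resolved = root.resolve()
    resolved = (root_resolved / candidate).resolve()
    # Path.is_relative_to requires Python 3.9 or newer.
    if not resolved.is_relative_to(root_resolved):
        raise ValueError(f"path escapes {root_resolved}: {candidate}")
    return resolved


def build_command(test_case: dict, scripts_root: Path) -> list[str]:
    """Build an argv list for a test case without involving a shell.

    Assumed test-case shape: {"script": "<relative path>", "args": [...]}.
    """
    script = resolve_inside(scripts_root, test_case["script"])
    args = test_case.get("args", [])
    if not isinstance(args, list) or not all(isinstance(a, str) for a in args):
        raise ValueError("args must be a list of strings")
    # Returning a list (rather than a single string) lets the caller use
    # subprocess.run(cmd) with the default shell=False, so the arguments
    # are never interpreted by a shell.
    return [sys.executable, str(script), *args]
```

The same resolve_inside check could be applied to the source path before any shutil.copy2 call in a --create-baseline style feature, so that only files under an expected output directory are copied. Note that this limits which script is run, not what that script does; a test suite from an untrusted source would still need review or sandboxing.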
Audit Metadata
Risk Level: MEDIUM
Analyzed: Feb 16, 2026, 12:55 PM