os-skill-improvement

Pass

Audited by Gen Agent Trust Hub on Apr 3, 2026

Risk Level: SAFE
Full Analysis
  • [COMMAND_EXECUTION]: Executes a local script 'eval_runner.py' using Python to perform skill benchmarking and performance snapshots. This is the primary function of the skill's quality assurance cycle.
  • [EXTERNAL_DOWNLOADS]: Mentions installing related organizational plugins via the npx package runner. These references are part of the standard platform setup and point to well-known internal resources.
  • [PROMPT_INJECTION]: The skill ingests user-provided prompts to evaluate and improve agent routing accuracy. 1. Ingestion points: Evaluation scenarios enters the process via 'evals/evals.json' and trace logs. 2. Boundary markers: No explicit delimiters or boundary warnings are described for test data ingestion. 3. Capability inventory: The skill has access to Bash, Write, and Edit tools to modify skill files and run benchmarks. 4. Sanitization: No specific sanitization or escaping of the evaluation prompt data is documented.
Audit Metadata
Risk Level
SAFE
Analyzed
Apr 3, 2026, 06:09 PM