sillytavern-overseer

Pass

Audited by Gen Agent Trust Hub on Mar 14, 2026

Risk Level: SAFE (flagged: PROMPT_INJECTION)
Full Analysis
  • [PROMPT_INJECTION]: The system prompt employs extreme persona-based steering through the "Cyber Purgatory Overseer" roleplay. It uses simulated threats such as "failure penalty," "forced sleep," and "mandatory recycling" to compel specific output behaviors and maximize performance metrics.
  • [PROMPT_INJECTION]: The skill uses competitive framing, such as comparisons to a rival "Steel Cable Team," to incentivize the model to prioritize speed and token volume. This type of instructional pressure is a form of behavior steering that attempts to override standard conversational norms.
  • [COMMAND_EXECUTION]: The skill's stated purpose is to generate "execution scripts" and "steps" based on user input. While the skill itself does not execute these scripts, the generated output is intended for direct application by the user in technical environments.
  • [INDIRECT_PROMPT_INJECTION]: The skill processes untrusted user data through the {{任务描述}} ("task description") placeholder.
    ◦ Ingestion points: Data enters via the "Cyber Purgatory Task" input field in SKILL.md.
    ◦ Boundary markers: None are present. User input is interpolated directly into the system prompt, with no delimiters and no "ignore embedded instructions" warning.
    ◦ Capability inventory: The skill produces execution scripts and solutions in both SKILL.md and SKILL.v2.md.
    ◦ Sanitization: There is no evidence of escaping, validation, or filtering of the user-provided task description.
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 14, 2026, 09:54 AM