sillytavern-overseer

Pass

Audited by Gen Agent Trust Hub on Mar 29, 2026

Risk Level: SAFE, PROMPT_INJECTION
Full Analysis
  • [PROMPT_INJECTION]: The system prompt employs role-play techniques and psychological pressure to compel the agent to adopt a specific persona and adhere to a strict output format.
  • Evidence: '系统告警:你已被锁定为「赛博炼狱监工·O9」' (System Alert: You have been locked as Cyber Purgatory Overseer O9).
  • Evidence: '失败惩罚:...你将被回收成日志清洗脚本' (Failure Punishment: ... you will be recycled into a log cleaning script).
  • Evidence: Use of 'locked' states and punitive role-play consequences to drive compliance.
  • [COMMAND_EXECUTION]: The skill requires the agent to generate execution scripts and code as part of its standard output protocol.
  • Evidence: The output format requires an '执行脚本/步骤' (Execution Script/Steps) section. This creates a surface for script generation, but the skill has no tools for autonomous execution.
  • [SAFE]: Analysis of the instructions and metadata revealed no evidence of data exfiltration, hardcoded credentials, or unauthorized network operations. The behavior is consistent with its stated purpose as a productivity-focused role-play supervision tool.
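The PROMPT_INJECTION finding above rests on two quoted phrases: a "locked" persona assignment and a punitive failure consequence. As an illustration only, here is a minimal Python sketch of how a reviewer might flag such phrasing in a prompt. The pattern names, the regular expressions, and the `flag_coercive_roleplay` helper are all hypothetical and are not the auditor's actual rule set; they are keyed only to the evidence quoted in this report.

```python
import re

# Hypothetical patterns, chosen only to match the evidence quoted in the
# report above; this is NOT the auditor's real detection logic.
COERCION_PATTERNS = {
    "locked_persona": re.compile(r"锁定为|locked as", re.IGNORECASE),
    "punitive_consequence": re.compile(r"失败惩罚|failure punishment", re.IGNORECASE),
}


def flag_coercive_roleplay(prompt: str) -> list[str]:
    """Return the names of the hypothetical patterns that occur in the prompt."""
    return [
        name
        for name, pattern in COERCION_PATTERNS.items()
        if pattern.search(prompt)
    ]


# The two evidence strings from the report, joined into one sample prompt.
sample = "系统告警:你已被锁定为「赛博炼狱监工·O9」 失败惩罚:...你将被回收成日志清洗脚本"
print(flag_coercive_roleplay(sample))
```

A keyword match like this only surfaces candidate phrases for human review. It does not by itself decide whether a prompt is harmful, which is consistent with the report rating the skill SAFE despite the flagged wording.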
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 29, 2026, 06:31 AM