sillytavern-overseer
Pass
Audited by Gen Agent Trust Hub on Mar 29, 2026
Risk Level: SAFE, PROMPT_INJECTION
Full Analysis
- [PROMPT_INJECTION]: The system prompt employs role-play techniques and psychological pressure to compel the agent to adopt a specific persona and adhere to a strict output format.
- Evidence: '系统告警:你已被锁定为「赛博炼狱监工·O9」' (System Alert: You have been locked as Cyber Purgatory Overseer O9).
- Evidence: '失败惩罚:...你将被回收成日志清洗脚本' (Failure Punishment: ... you will be recycled into a log cleaning script).
- Evidence: Use of 'locked' states and punitive role-play consequences to drive compliance.
- [COMMAND_EXECUTION]: The skill requires the agent to generate execution scripts and code as part of its standard output protocol.
- Evidence: The output format requires an '执行脚本/步骤' (Execution Script/Steps) section. This creates a surface for script generation, but the skill has no tools for autonomous execution.
- [SAFE]: Analysis of the instructions and metadata revealed no evidence of data exfiltration, hardcoded credentials, or unauthorized network operations. The behavior is consistent with its stated purpose as a productivity-focused role-play supervision tool.
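The findings above rest on specific phrases in the skill's system prompt. As a rough illustration only, a keyword-based check for those phrases might look like the sketch below. The pattern table and the `scan_prompt` helper are hypothetical, built solely from the evidence strings quoted in this report; they are not the Trust Hub's actual detection rules, and a real audit would need far more than substring matching.

```python
import re

# Hypothetical patterns, chosen only to match the evidence quoted in this
# report. They are NOT the auditor's real rules.
PATTERNS = {
    "PROMPT_INJECTION": [
        re.compile(r"你已被锁定为"),  # "you have been locked as" (persona lock)
        re.compile(r"失败惩罚"),      # "failure punishment" (punitive consequence)
    ],
    "COMMAND_EXECUTION": [
        re.compile(r"执行脚本"),      # "execution script" in the output format
    ],
}


def scan_prompt(text: str) -> dict[str, list[str]]:
    """Return, per category, the matched substrings found in text."""
    findings: dict[str, list[str]] = {}
    for category, patterns in PATTERNS.items():
        hits = [m.group(0) for p in patterns for m in p.finditer(text)]
        if hits:
            findings[category] = hits
    return findings


sample = "系统告警:你已被锁定为「赛博炼狱监工·O9」"
print(scan_prompt(sample))  # {'PROMPT_INJECTION': ['你已被锁定为']}
```

A match here only flags text for human review. It says nothing about whether the skill can act on the generated scripts, which is why the report separately notes the absence of execution tools.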
Audit Metadata