bouncer-feed-filter
Pass
Audited by Gen Agent Trust Hub on Apr 13, 2026
Risk Level: SAFEPROMPT_INJECTIONEXTERNAL_DOWNLOADS
Full Analysis
- [PROMPT_INJECTION]: The skill implements an architecture that processes untrusted text from Twitter/X posts by directly interpolating it into prompts for AI classification. This creates a surface for indirect prompt injection, where a post could contain instructions designed to bypass the filter or manipulate the agent's behavior.
- Ingestion points: The
extractPostfunction insrc/adapters/twitter.tsextracts text directly from the DOM of social media pages. - Boundary markers: The prompt template in
src/models/classify.tsuses basic labels like 'Post text:' but lacks robust delimitation or instructions to ignore embedded commands. - Capability inventory: The extension has network access (
fetch) and storage access (chrome.storage). - Sanitization: No explicit sanitization or escaping of the extracted social media text is shown before it is sent to the LLM.
- [EXTERNAL_DOWNLOADS]: The installation guide instructs users to clone the source code from a remote repository at
github.com/imbue-ai/bouncer.git. While this is standard for open-source development, it involves downloading and executing third-party code. - [DATA_EXFILTRATION]: To perform its primary function, the skill transmits feed content to external AI providers (such as OpenAI, Google, and Anthropic). Users should be aware that their viewed social media content is shared with these third-party services for classification.
- [CREDENTIALS_UNSAFE]: The skill demonstrates positive security practices by explicitly directing users to store API keys in
chrome.storage.localrather than hardcoding them, and includes code examples for secure retrieval.
Audit Metadata