bouncer-feed-filter

Pass

Audited by Gen Agent Trust Hub on Apr 13, 2026

Risk Level: SAFEPROMPT_INJECTIONEXTERNAL_DOWNLOADS
Full Analysis
  • [PROMPT_INJECTION]: The skill implements an architecture that processes untrusted text from Twitter/X posts by directly interpolating it into prompts for AI classification. This creates a surface for indirect prompt injection, where a post could contain instructions designed to bypass the filter or manipulate the agent's behavior.
  • Ingestion points: The extractPost function in src/adapters/twitter.ts extracts text directly from the DOM of social media pages.
  • Boundary markers: The prompt template in src/models/classify.ts uses basic labels like 'Post text:' but lacks robust delimitation or instructions to ignore embedded commands.
  • Capability inventory: The extension has network access (fetch) and storage access (chrome.storage).
  • Sanitization: No explicit sanitization or escaping of the extracted social media text is shown before it is sent to the LLM.
  • [EXTERNAL_DOWNLOADS]: The installation guide instructs users to clone the source code from a remote repository at github.com/imbue-ai/bouncer.git. While this is standard for open-source development, it involves downloading and executing third-party code.
  • [DATA_EXFILTRATION]: To perform its primary function, the skill transmits feed content to external AI providers (such as OpenAI, Google, and Anthropic). Users should be aware that their viewed social media content is shared with these third-party services for classification.
  • [CREDENTIALS_UNSAFE]: The skill demonstrates positive security practices by explicitly directing users to store API keys in chrome.storage.local rather than hardcoding them, and includes code examples for secure retrieval.
Audit Metadata
Risk Level
SAFE
Analyzed
Apr 13, 2026, 01:04 AM