biomni
Fail
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: HIGHREMOTE_CODE_EXECUTIONCOMMAND_EXECUTIONEXTERNAL_DOWNLOADSCREDENTIALS_UNSAFE
Full Analysis
- COMMAND_EXECUTION (HIGH): The skill documentation explicitly states that it executes LLM-generated code with 'full system privileges'. This capability allows the agent to perform any operation the host user can, including modifying system files, installing software, or accessing sensitive data.
- REMOTE_CODE_EXECUTION (HIGH): The core functionality involves 'dynamic analysis pipeline creation' where the agent autonomously writes and runs Python code. This is a direct RCE vector; if an attacker can influence the agent's input (Indirect Prompt Injection via research data), they can execute arbitrary code on the user's machine.
- EXTERNAL_DOWNLOADS (MEDIUM): The framework automatically downloads approximately 11GB of biomedical databases on first use. This large-scale data ingestion from external sources increases the attack surface for data poisoning and supply chain attacks.
- CREDENTIALS_UNSAFE (LOW): The setup instructions involve exporting 'ANTHROPIC_API_KEY' and other LLM provider keys. Given the agent's ability to execute code, these credentials are at risk of exfiltration if the agent is compromised.
- PROMPT_INJECTION (LOW): The skill is highly vulnerable to Indirect Prompt Injection (Category 8).
- Ingestion points: Processes external GWAS files, h5ad datasets, and PubMed literature indices.
- Boundary markers: None mentioned in the skill definition to separate data from instructions.
- Capability inventory: Full Python code execution and system access via
agent.go(). - Sanitization: Documentation suggests running in Docker/VMs as a mitigation, acknowledging the lack of internal sanitization.
Recommendations
- AI detected serious security threats
Audit Metadata