ship
Pass
Audited by Gen Agent Trust Hub on Apr 14, 2026
Risk Level: SAFECOMMAND_EXECUTIONEXTERNAL_DOWNLOADS
Full Analysis
- [COMMAND_EXECUTION]: The skill makes extensive and necessary use of shell scripts to automate the development process. This includes running test suites, type checkers, and linters defined in the project's package configuration. It also manages Git worktrees to isolate the development environment and interacts with Docker for sandboxed execution of sub-agents.
- [EXTERNAL_DOWNLOADS]: The skill performs network operations to interact with official APIs. It conducts connectivity checks against Anthropic's official API and utilizes Bunny Edge Storage for uploading PR-related assets. These operations are conducted through validated channels and target well-known technology services.
- [CREDENTIALS_UNSAFE]: The skill manages authentication for its nested sub-agents by handling Claude CLI credentials and API keys stored in standard locations (e.g., .env files or the user's home directory). These credentials are used locally to enable the autonomous development loop and are passed into Docker sandboxes using secure transfer methods.
- [INDIRECT_PROMPT_INJECTION]: As an autonomous engineer, the skill ingests third-party data including specification files (SPEC.md), code review outputs, and external documentation. While this creates a potential surface for indirect prompt injection, the skill utilizes structured delimiters and phase-aware prompts to mitigate accidental execution of embedded instructions.
- Ingestion points: Specification files (SPEC.md, spec.json), project source code, and review iteration logs.
- Boundary markers: The skill uses explicit delimiters such as '=== STATE FILES ===' and '=== REVIEW ITERATION LOG ===' to isolate processed data from system instructions.
- Capability inventory: The skill possesses extensive capabilities including arbitrary shell execution, file system management, and network access to well-known service providers.
- Sanitization: The skill relies on the robustness of the underlying LLM and the structure of its orchestration prompts rather than programmatic input sanitization.
Audit Metadata