systematic-debugging

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: HIGHDATA_EXFILTRATIONPROMPT_INJECTIONCOMMAND_EXECUTION
Full Analysis
  • [Data Exposure & Exfiltration] (HIGH): The skill provides diagnostic bash examples that expose sensitive system information to the agent's context and logs.
  • Evidence: Phase 1, Step 4 (File: SKILL.md) contains commands to dump the environment (env | grep IDENTITY) and list macOS keychain identities (security list-keychains, security find-identity).
  • Risk: Environment variables and keychains frequently contain API keys, signing certificates, and other secrets which would be exposed in plain text to the LLM context.
  • [Indirect Prompt Injection] (HIGH): The skill creates a vulnerable pipeline where the agent is instructed to ingest untrusted data and act upon it.
  • Ingestion points: Error messages, component inputs, and outputs (SKILL.md, Phase 1, Steps 1 & 4).
  • Boundary markers: Absent. The skill provides no instructions for delimiting or ignoring instructions inside log data.
  • Capability inventory: The agent has capabilities for bash execution (diagnostic examples) and file modification (Phase 4: 'Implement Single Fix').
  • Sanitization: Absent. The skill lacks validation or escaping for the processed logs/errors.
  • Risk: A malicious payload inside an error message or API response could leverage the 'debugging' context to execute arbitrary commands or modify files.
  • [Command Execution] (LOW): The skill utilizes shell commands for system diagnostics and reproduction.
  • Evidence: SKILL.md provides templates for bash, security, and codesign commands.
  • Risk: While common in debugging, these provide the mechanism for the exposures and injections noted above.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 16, 2026, 06:04 AM