extract-test-set

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: HIGH (REMOTE_CODE_EXECUTION, COMMAND_EXECUTION)
Full Analysis
  • REMOTE_CODE_EXECUTION (HIGH): The skill instructions explicitly require the agent to 'Create an ad-hoc Python script' and subsequently 'Run the script'. Executing dynamically generated code is a critical security risk because it can be exploited to run arbitrary code on the host system.
  • COMMAND_EXECUTION (HIGH): The skill directs the agent to 'run the generated test case' using pytest. This means invoking a command-line tool on files that were just generated from potentially untrusted user input.
  • INDIRECT PROMPT INJECTION (HIGH): The skill falls into a high-risk capability tier because it processes external data to generate executable scripts.
      • Ingestion points: Untrusted data enters the context via smart contract addresses and blockchain explorer links.
      • Boundary markers: There are no boundary markers and no instructions to ignore commands embedded in the processed data.
      • Capability inventory: The skill allows file creation, Python script execution, and shell command execution.
      • Sanitization: There is no evidence that user-provided inputs are sanitized or validated before they are interpolated into the generated Python script code.
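
The missing-sanitization finding above could be addressed along the following lines. This is a minimal sketch, not part of the audited skill: it assumes EVM-style addresses (`0x` followed by 40 hex digits), and the names `validate_address`, `validate_explorer_url`, `build_argv`, and the `ALLOWED_EXPLORER_HOSTS` allowlist are hypothetical. The idea is to reject any input that does not match a strict format, and to hand validated values to a script as arguments rather than interpolating them into generated source code.

```python
import re
import sys
from urllib.parse import urlparse

# Assumption: EVM-style address, "0x" plus exactly 40 hex digits.
ADDRESS_RE = re.compile(r"0x[0-9a-fA-F]{40}")

# Hypothetical allowlist; a real deployment would choose its own hosts.
ALLOWED_EXPLORER_HOSTS = {"etherscan.io"}


def validate_address(value: str) -> str:
    """Return the address unchanged, or raise if it is not strictly well-formed."""
    if not ADDRESS_RE.fullmatch(value):
        raise ValueError("input is not a 0x-prefixed 20-byte hex address")
    return value


def validate_explorer_url(value: str) -> str:
    """Accept only https links whose exact hostname is on the allowlist."""
    parsed = urlparse(value)
    if parsed.scheme != "https" or parsed.hostname not in ALLOWED_EXPLORER_HOSTS:
        raise ValueError("explorer link is not on the allowlist")
    return value


def build_argv(script_path: str, address: str) -> list:
    """Build an argument vector so the address is passed as data, not code."""
    return [sys.executable, script_path, "--address", validate_address(address)]
```

Passing the value through an argument vector keeps it out of the script text entirely, so a crafted address cannot change what the script does. Validation of this kind does not remove the risk of running generated code; it only narrows one ingestion point.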
Recommendations
  • AI detected serious security threats
Audit Metadata
  • Risk Level: HIGH
  • Analyzed: Feb 16, 2026, 08:51 AM