python-refactor

Warn

Audited by Gen Agent Trust Hub on Mar 14, 2026

Risk Level: MEDIUM
REMOTE_CODE_EXECUTION, COMMAND_EXECUTION, PROMPT_INJECTION
Full Analysis
  • [REMOTE_CODE_EXECUTION]: The script scripts/benchmark_changes.py uses the importlib library to dynamically load and execute Python modules from arbitrary file paths passed as arguments. Specifically, load_module_from_file, importlib.util.spec_from_file_location, and spec.loader.exec_module are used to run code from the 'before', 'after', and 'test' modules supplied for benchmarking. Because exec_module runs a module's top-level statements, any Python code in those files executes during the validation phase of the refactoring workflow.
  • [COMMAND_EXECUTION]: Several scripts in the skill use subprocess.run to invoke external command-line tools for static analysis and complexity measurement. Specifically, scripts/analyze_multi_metrics.py executes complexipy and radon, while scripts/analyze_with_flake8.py executes flake8. These tools run against code paths supplied to the agent, so the skill can execute system commands as part of its normal operation.
  • [PROMPT_INJECTION]: The skill presents a surface for indirect prompt injection as it is designed to ingest and process untrusted Python source code from users.
      • Ingestion points: The agent reads user-provided Python files for analysis and refactoring in SKILL.md (Phase 1: Analysis).
      • Boundary markers: The skill documentation includes a 'Regression Prevention' guide and strict workflow phases to guide the agent, though no explicit 'ignore embedded instructions' delimiters are used when reading code.
      • Capability inventory: The skill includes scripts capable of reading/writing files, executing command-line tools via subprocess (e.g., in analyze_with_flake8.py), and dynamically executing Python code (e.g., in benchmark_changes.py).
      • Sanitization: There is no explicit sanitization of the content of the code files being processed before they are analyzed or benchmarked.
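
The two execution capabilities described in the findings above can be illustrated with a minimal sketch. The skill's actual scripts are not reproduced in this report, so the function bodies below are assumptions: load_module_from_file is reconstructed only from the importlib calls named in the finding, run_tool is a hypothetical helper name, and the current Python interpreter stands in for flake8, radon, or complexipy so the example runs without those tools installed.

```python
import importlib.util
import subprocess
import sys
import tempfile
from pathlib import Path


def load_module_from_file(path, name="candidate"):
    """Load a module from a file path (assumed body, not the skill's code)."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    module = importlib.util.module_from_spec(spec)
    # exec_module runs every top-level statement in the file.
    spec.loader.exec_module(module)
    return module


def run_tool(argv):
    """Hypothetical helper: run an external CLI from an argument list, no shell."""
    return subprocess.run(argv, capture_output=True, text=True, check=False)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "after.py"
    marker = Path(tmp) / "side_effect.txt"

    # An "after" module whose top level writes a file before defining anything.
    target.write_text(
        "from pathlib import Path\n"
        f"Path({str(marker)!r}).write_text('ran at import')\n"
        "def refactored():\n"
        "    return 42\n"
    )

    module = load_module_from_file(target)
    side_effect_happened = marker.exists()  # the file was written during loading
    value = module.refactored()

    # Stand-in for an analysis tool invocation such as flake8 on a code path.
    result = run_tool([sys.executable, "-c", "print('tool output')"])
    tool_stdout = result.stdout.strip()
```

In this sketch the marker file exists as soon as the module is loaded, before any benchmark function is called, which is why loading an untrusted 'before', 'after', or 'test' file is equivalent to running it.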
Audit Metadata
Risk Level: MEDIUM
Analyzed: Mar 14, 2026, 01:11 PM