numerai-experiment-design

Pass

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: SAFE
Full Analysis
  • [COMMAND_EXECUTION] (SAFE): The skill executes local Python modules (e.g., agents.code.modeling) to automate model training and analysis. This is the primary and intended function of the skill within its research context.
  • [REMOTE_CODE_EXECUTION] (SAFE): No patterns for downloading or executing scripts from remote, untrusted, or external URLs were found.
  • [DATA_EXFILTRATION] (SAFE): Network interaction is limited to model deployment via specific MCP tools (create_model, upload_model) intended for the Numerai platform. No unauthorized data transfer patterns were observed.
  • [PROMPT_INJECTION] (SAFE): The instructions do not contain attempts to override system prompts, bypass safety filters, or disclose internal instructions.
  • [INDIRECT_PROMPT_INJECTION] (LOW): The skill processes user-defined model ideas and experiment results. While it lacks explicit boundary markers for this data, the ingestion is performed within a controlled data science workflow, presenting minimal risk.
Audit Metadata
Risk Level
SAFE
Analyzed
Feb 17, 2026, 06:40 PM