model-extraction

Warn

Audited by Gen Agent Trust Hub on Mar 18, 2026

Risk Level: MEDIUMPROMPT_INJECTIONDATA_EXFILTRATION
Full Analysis
  • [PROMPT_INJECTION]: The Troubleshooting section contains instructions that direct the agent to bypass security controls. Specifically, it suggests distributing queries across multiple accounts to evade rate limiting and varying query patterns to circumvent detection alerts.
  • [DATA_EXFILTRATION]: The skill is designed to perform model theft, focusing on extracting proprietary information such as model weights, architecture, and training data from external APIs. This functionality facilitates the exfiltration of intellectual property.
  • [DATA_EXFILTRATION]: The skill establishes an attack surface for indirect prompt injection by processing untrusted data from the 'target_api' without boundary markers or sanitization.
  • Ingestion points: The 'target_api' parameter provides the primary source of external data.
  • Boundary markers: None are defined to separate API responses from internal agent instructions.
  • Capability inventory: The skill utilizes Python classes (QueryBasedExtractor, DistillationAttack) to query external endpoints and process data into local models.
  • Sanitization: No sanitization or validation of API responses is implemented before the data is used in the surrogate training process.
Audit Metadata
Risk Level
MEDIUM
Analyzed
Mar 18, 2026, 07:13 AM