The Agent Skills Directory

[PROMPT_INJECTION] (HIGH): The skill is highly susceptible to Indirect Prompt Injection (Category 8) because it ingests and processes untrusted external data.
Ingestion points: HF dataset IDs (e.g., via mlabonne/FineTome-100k) and local file paths (e.g., ./data.jsonl) specified in Workflow Step 1.
Boundary markers: Absent. The agent is instructed to 'Load and Detect Format' and 'Show Samples' for visual verification without delimiting the untrusted content.
Capability inventory: Command execution (python scripts/validate_dataset.py), file system read/write, and network operations (Hugging Face upload/download).
Sanitization: No sanitization or filtering of the dataset content is described before the agent processes or displays it.
[COMMAND_EXECUTION] (HIGH): The skill explicitly instructs the agent to execute a bundled script: python scripts/validate_dataset.py. This script takes a --dataset argument which can be a user-provided string or path. Without validation of the dataset-id input, this creates a vector for command injection or unauthorized file access.
[EXTERNAL_DOWNLOADS] (MEDIUM): The skill relies on fetching arbitrary data from the Hugging Face Hub. While Hugging Face is a common repository, the lack of integrity checks (like commit hashes) for downloaded datasets allows for potential supply chain attacks if a dataset is replaced with malicious content.
[DATA_EXFILTRATION] (LOW): The 'HF Upload' feature implies the agent may handle sensitive credentials such as HF_TOKEN. While no keys are hardcoded, the capability to transmit data to an external service (Hugging Face) could be abused if combined with a successful prompt injection attack.

funsloth-check