The Agent Skills Directory

[COMMAND_EXECUTION]: The skill utilizes several CLI commands to perform its discovery and delivery phases.
It runs langwatch prompt list, langwatch trace search, and langwatch trace get to gather context about the AI system being evaluated.
It executes git log to analyze commit history for potential failure modes.
It uses langwatch dataset create and langwatch dataset upload to manage the resulting data.
It also instructs the agent to write and execute a local Python or Node.js script to properly format CSV data, which is a standard procedure for ensuring data integrity.
[DATA_EXFILTRATION]: The skill is designed to collect local information (codebase structure, prompt definitions) and production data (traces) to generate synthetic datasets. These datasets are then uploaded to the LangWatch platform using the langwatch dataset upload command. As the skill and the CLI tool are provided by the same vendor (LangWatch), this represents the core intended functionality of the service.
[EXTERNAL_DOWNLOADS]: The skill fetches data from the LangWatch platform, including production traces and existing dataset information, via the langwatch CLI during the discovery phase.
[PROMPT_INJECTION]: The skill exhibits an indirect prompt injection surface as it processes data from untrusted external sources.
Ingestion points: The skill reads production traces via langwatch trace get, local codebase files, and user-provided documents like PDFs.
Boundary markers: The instructions do not specify the use of delimiters or 'ignore' instructions when processing the content of these external files or traces.
Capability inventory: The skill has the ability to execute shell commands (via langwatch and git), write local files (CSV/scripts), and perform network uploads (via the CLI).
Sanitization: There are no explicit instructions to sanitize or escape the content retrieved from traces or external documents before it is used to generate the dataset. However, per security guidelines, this finding is classified as low risk.

datasets