The Agent Skills Directory

[COMMAND_EXECUTION]: The skill instructs the agent to execute various shell commands using the Bash(*) tool, including running project-specific scripts such as python3 tools/research_wiki.py and tools/save_trace.sh. Broad bash access allows for arbitrary system interaction.
[REMOTE_CODE_EXECUTION]: The workflow includes instructions to execute commands on remote servers via SSH (e.g., ssh server "tail -100 /path/to/training.log"). This pattern allows the agent to interact with and execute code on external infrastructure.
[DATA_EXFILTRATION]: The skill is designed to fetch data from external services, specifically Weights & Biases (W&B), using patterns like wandb.Api().run("<entity>/<project>/<run_id>").history(). This involves transmitting identifiers and potentially credentials to a remote API.
[PROMPT_INJECTION]: The skill has a surface for indirect prompt injection by processing data from files and remote sources and interpolating them into LLM prompts.
Ingestion points: Content is read from EXPERIMENT_LOG.md, EXPERIMENT_TRACKER.md, docs/research_contract.md, remote log files via SSH, and W&B experiment history.
Boundary markers: While the prompt for the Codex sub-agent uses structural headers (e.g., 'Experiments run:', 'Results:'), it lacks explicit delimiters or warnings to ignore malicious instructions embedded within the ingested experimental data or logs.
Capability inventory: The skill possesses Bash(*), Write, and Edit capabilities, and triggers local script executions (research_wiki.py).
Sanitization: There is no evidence of data sanitization or validation performed on the external metrics or log content before inclusion in the reasoning process.

result-to-claim