The Agent Skills Directory

Unverifiable Dependencies (LOW): The skill references standard machine learning packages including gymnasium, stable-baselines3, tensorboard, and optuna. These are widely used and trusted in the RL community.
Dynamic Execution (MEDIUM): The code snippets in SKILL.md and references/algorithms.md call PPO.load(), DQN.load(), and similar methods. Stable-Baselines3 stores a model as a zip archive and uses pickle/cloudpickle when restoring its weights and metadata, so deserialization can run code embedded in the file. Loading a malicious model file (e.g., ppo_cartpole.zip) could therefore result in arbitrary code execution on the user's machine.
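One possible mitigation for the loading risk above is to check a checkpoint's SHA-256 digest against a value recorded when the model was saved, and to refuse deserialization on a mismatch. The following is a minimal sketch under that assumption; the helper names (sha256_of_file, is_trusted_checkpoint, load_verified_ppo) are hypothetical and not part of the skill or of Stable-Baselines3. Only PPO.load() is the library's own call.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_trusted_checkpoint(path: Path, expected_sha256: str) -> bool:
    """True only if the file's digest matches the digest recorded at save time."""
    return hmac.compare_digest(sha256_of_file(path), expected_sha256.lower())


def load_verified_ppo(path: Path, expected_sha256: str):
    """Hypothetical wrapper: deserialize the checkpoint only after the hash check."""
    if not is_trusted_checkpoint(path, expected_sha256):
        raise ValueError(f"Refusing to load {path}: SHA-256 mismatch")
    # Imported lazily so the hash check itself does not need stable-baselines3.
    from stable_baselines3 import PPO

    return PPO.load(str(path))
```

A digest check only shows that the file is the one that was originally saved; it does not make a checkpoint from an untrusted author safe to load.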
Indirect Prompt Injection (INFO): The skill defines patterns for ingesting environment observations. In the provided examples these are typically numeric arrays, but an environment that returns text-based observations is a theoretical ingestion point for untrusted data into an agent's reasoning loop.
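If text observations are ever passed to a language-model agent, one common precaution is to treat them as quoted data rather than as instructions. The sketch below illustrates that idea only; the function name, the <observation> delimiter, and the length cap are assumptions made for this example and do not come from the skill.

```python
import unicodedata

MAX_OBS_CHARS = 2000  # hypothetical cap on observation length


def quote_text_observation(obs: str, max_chars: int = MAX_OBS_CHARS) -> str:
    """Wrap an untrusted text observation in delimiters it cannot close itself."""
    # Drop control and format characters (Unicode categories C*), keeping newline and tab.
    cleaned = "".join(
        ch for ch in obs
        if ch in "\n\t" or not unicodedata.category(ch).startswith("C")
    )
    # Truncate so a single observation cannot crowd out the rest of the prompt.
    cleaned = cleaned[:max_chars]
    # Escape markup characters so the text cannot forge the closing delimiter.
    cleaned = (
        cleaned.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    )
    return "<observation>\n" + cleaned + "\n</observation>"
```

Delimiting reduces the chance that observation text is mistaken for instructions, but it does not by itself prevent prompt injection.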

reinforcement-learning