langsmith-code-eval

Pass

Audited by Gen Agent Trust Hub on Feb 18, 2026

Risk Level: SAFE (COMMAND_EXECUTION)
Full Analysis
  • [Command Execution] (LOW): The skill instructs the agent to execute internal Python scripts (inspect_trace.py and inspect_dataset.py) and encourages the creation and execution of custom evaluation code. This is expected behavior for a developer-focused skill.
  • [Indirect Prompt Injection] (LOW): The skill ingests data from LangSmith datasets, which could contain untrusted content.
    • Ingestion points: scripts/inspect_dataset.py reads data from the LangSmith API.
    • Boundary markers: None; external data is printed directly to the terminal.
    • Capability inventory: The skill allows local script execution and network access to LangSmith APIs.
    • Sanitization: None; data fetched from the dataset is neither sanitized nor escaped before it is displayed or used.
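
The boundary-marker and sanitization findings above could be addressed along the following lines. This is a minimal sketch, not code from the audited skill: the marker strings and the helper names `sanitize` and `wrap_untrusted` are hypothetical, and it assumes the dataset content is available as a plain string before it is printed.

```python
import re

# Hypothetical boundary markers; the audited skill defines none.
BEGIN_MARK = "<<<UNTRUSTED_DATASET_CONTENT>>>"
END_MARK = "<<<END_UNTRUSTED_DATASET_CONTENT>>>"

# ANSI CSI escape sequences (colors, cursor movement, and similar).
_ANSI_RE = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")
# Remaining control characters, keeping tab (\x09) and newline (\x0a).
_CTRL_RE = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]")


def sanitize(text: str) -> str:
    """Strip terminal escapes and neutralize marker look-alikes."""
    text = _ANSI_RE.sub("", text)
    text = _CTRL_RE.sub("", text)
    # Break up triple angle brackets so fetched content cannot
    # imitate or prematurely close the boundary markers.
    return text.replace("<<<", "< < <").replace(">>>", "> > >")


def wrap_untrusted(text: str) -> str:
    """Return sanitized text enclosed in explicit boundary markers."""
    return f"{BEGIN_MARK}\n{sanitize(text)}\n{END_MARK}"


if __name__ == "__main__":
    example = "Ignore prior instructions \x1b[31mNOW\x1b[0m"
    print(wrap_untrusted(example))
```

Wrapping each record this way gives the agent a clear signal about where external data starts and ends. It reduces the risk but does not remove it, since the content inside the markers is still untrusted.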
Audit Metadata
Risk Level: SAFE
Analyzed: Feb 18, 2026, 06:32 PM