langgraph-testing-evaluation
Fail
Audited by Gen Agent Trust Hub on Feb 14, 2026
Risk Level: HIGHREMOTE_CODE_EXECUTIONPROMPT_INJECTIONCOMMAND_EXECUTION
Full Analysis
- REMOTE_CODE_EXECUTION (HIGH): The Node.js scripts
compare_agents.js,evaluate_with_langsmith.js,generate_test_cases.js, andrun_trajectory_eval.jsutilize dynamicimport()to load and execute code from absolute file paths provided as command-line arguments. This behavior allows for arbitrary code execution if an attacker can control the path argument through prompt injection. - PROMPT_INJECTION (HIGH): Per Category 8 (Indirect Prompt Injection), the skill ingests untrusted content from local JSON datasets or remote LangSmith repositories. These inputs are passed directly to the agent functions being evaluated without sanitization or boundary markers, allowing malicious entries in a dataset to override agent instructions.
- COMMAND_EXECUTION (MEDIUM): The skill relies on and encourages the execution of shell commands (
uv run,node) that take variable file paths and module names as input, which can be exploited for argument injection or unauthorized file execution. - DATA_EXFILTRATION (LOW): The skill is designed to interact with LangSmith, an external service, to upload datasets and experiment results. While intended, this establishes a network data flow that could be abused to exfiltrate sensitive information if improperly configured.
Recommendations
- AI detected serious security threats
Audit Metadata