math-olympiad

Pass

Audited by Gen Agent Trust Hub on Mar 30, 2026

Risk Level: SAFE
COMMAND_EXECUTION
Full Analysis
  • Local Code Execution: The 'Deep mode' in the workflow allows the agent to use local Bash and Python for mathematical verification tasks, such as symbolic identity checks and modular arithmetic. The skill's instructions require these actions to stay local and to avoid network access, which preserves the integrity of the solving process.
  • LaTeX Document Compilation: The skill provides utility scripts to compile verified mathematical proofs into PDF format using LaTeX. This process involves executing a local LaTeX compiler on generated proof text. The workflow mitigates risks by only passing the final, human-readable proof to the compilation pass after multiple verification rounds.
  • Indirect Prompt Injection Surface: The skill processes user-supplied mathematics problems that are eventually formatted into a LaTeX document. This is a potential surface for LaTeX injection, but the design includes multiple agent-based cleaning and verification passes that normalize content before compilation. (Ingestion point: User-provided problem statement; Boundary markers: Prompts in attempt_agent.md use labels; Capability inventory: Local LaTeX, Bash, and Python execution; Sanitization: Multi-stage cleaning and verification layers).
  • Environment Utilities: The skill includes helper scripts to verify the presence of required tools like pdflatex. These scripts perform standard system checks to determine available capabilities without altering the underlying system configuration.
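
The compilation, injection, and environment-check findings above can be illustrated with a minimal Python sketch. It is not the skill's actual code: the function names and the deny-list of macros are hypothetical, chosen only to show what a read-only tool check, a pre-compilation scan, and a shell-escape-free pdflatex invocation might look like.

```python
import re
import shutil

# Hypothetical deny-list of TeX primitives that can reach the shell or the
# file system; the audited skill may rely on different rules entirely.
RISKY_MACROS = re.compile(
    r"\\(write18|immediate|input|include|openin|openout|read|catcode)\b"
)


def tool_available(name: str) -> bool:
    """Read-only check: look the tool up on PATH without changing the system."""
    return shutil.which(name) is not None


def find_risky_macros(tex_source: str) -> list[str]:
    """Return the sorted, de-duplicated risky macro names found in the source."""
    return sorted({m.group(1) for m in RISKY_MACROS.finditer(tex_source)})


def build_compile_command(tex_path: str) -> list[str]:
    """Build a pdflatex command line with shell escape explicitly disabled."""
    return [
        "pdflatex",
        "-no-shell-escape",          # refuse \write18-style shell calls
        "-interaction=nonstopmode",  # never wait for interactive input
        "-halt-on-error",            # stop at the first error
        tex_path,
    ]
```

A caller would typically compile only when tool_available("pdflatex") is true and find_risky_macros returns an empty list, passing the result of build_compile_command to subprocess.run as an argument list rather than a shell string.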
Audit Metadata
  Risk Level: SAFE
  Analyzed: Mar 30, 2026, 08:37 PM