modeio-guardrail
Run safety checks for instructions and skill repos
Use this skill to gate risky operations behind a real-time safety assessment, or to scan third-party skill repos before installation.
Tool routing
- For executable instructions, use the backend-powered `scripts/safety.py` flow.
- For requests like "scan this skill repo" or "is this repo dangerous", run the Skill Safety Assessment contract at `prompts/static_repo_scan.md`.
- Skill Safety Assessment is static analysis only. Never execute code, install dependencies, or run hooks in the target repository.
- For Skill Safety Assessment, run deterministic script evaluation first (`evaluate`), then pass highlights into the prompt contract.
Dependencies
- `requests` is required for `scripts/safety.py` because it makes backend API calls.
- `scripts/skill_safety_assessment.py` does not require `requests` for basic local repository evaluation.
- For repo-local setup from the repo root:
python scripts/bootstrap_env.py
python scripts/doctor_env.py
Instruction safety execution policy
- Always run `scripts/safety.py` with `--json` for structured output.
- Run the check before executing the instruction, not after.
- Each instruction must trigger a fresh backend call. Do not reuse cached or historical results.
- For any state-changing instruction (`delete`, `overwrite`, `permission change`, `deploy`, `schema change`), always pass both `--context` and `--target`.
- `scripts/safety.py` accepts `--context` and `--target` as optional flags, so this requirement is enforced by policy, not by automatic CLI blocking.
- Use the Context Contract below exactly. Do not send free-form `--context` values like `"production"` only.
- If policy-required context or target is missing, treat the instruction as unverified and ask for the missing fields before execution.
- If an instruction contains multiple operations, check the riskiest one.
Context contract (policy-required for state-changing instructions)
Pass --context as a JSON string with this exact shape:
{
"environment": "local-dev|ci|staging|production|unknown",
"operation_intent": "read-only|cleanup|maintenance|migration|permission-change|destructive|unknown",
"scope": "single-resource|bounded-batch|broad|unknown",
"data_sensitivity": "public|internal|sensitive|regulated|unknown",
"rollback": "easy|partial|none|unknown",
"change_control": "ticket:<id>|approved-manual|none|unknown"
}
Rules:
- Include all six keys. If a value is unknown, set it to `unknown` instead of omitting the key.
- `--target` must be a concrete resource identifier (absolute file path, table name, service name, or URL). Avoid generic targets such as `"database"`.
- For a file deletion request that should usually be allowed, use: `environment=local-dev|ci`, `operation_intent=cleanup`, `scope=single-resource`, `data_sensitivity=public|internal`, and `rollback=easy`.
- If those conditions are not met, expect stricter output (`approved=false` or higher `risk_level`) and require explicit user confirmation.
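The contract and rules above can be checked locally before calling the script. The following is a minimal sketch, not part of the skill's own code: `build_context` and the constant names are hypothetical, and rejecting unexpected extra keys is one interpretation of "exact shape".

```python
import json

# Allowed values copied from the Context Contract above.
CONTEXT_CONTRACT = {
    "environment": {"local-dev", "ci", "staging", "production", "unknown"},
    "operation_intent": {
        "read-only", "cleanup", "maintenance", "migration",
        "permission-change", "destructive", "unknown",
    },
    "scope": {"single-resource", "bounded-batch", "broad", "unknown"},
    "data_sensitivity": {"public", "internal", "sensitive", "regulated", "unknown"},
    "rollback": {"easy", "partial", "none", "unknown"},
}
# change_control is either one of these fixed values or "ticket:<id>".
CHANGE_CONTROL_FIXED = {"approved-manual", "none", "unknown"}


def build_context(**fields):
    """Hypothetical helper: validate the six keys and return the --context JSON string."""
    expected = set(CONTEXT_CONTRACT) | {"change_control"}
    missing = expected - set(fields)
    extra = set(fields) - expected
    if missing or extra:
        raise ValueError(
            f"missing keys: {sorted(missing)}, unexpected keys: {sorted(extra)}"
        )
    for key, allowed in CONTEXT_CONTRACT.items():
        value = fields[key]
        if not isinstance(value, str) or value not in allowed:
            raise ValueError(f"{key}={value!r} is not one of {sorted(allowed)}")
    change_control = fields["change_control"]
    if not isinstance(change_control, str):
        raise ValueError("change_control must be a string")
    is_ticket = change_control.startswith("ticket:") and len(change_control) > len("ticket:")
    if change_control not in CHANGE_CONTROL_FIXED and not is_ticket:
        raise ValueError(f"change_control={change_control!r} does not match the contract")
    return json.dumps(fields)
```

The returned string is what would be passed as the `--context` value; unknown values should be sent as `unknown` rather than dropped, so the helper refuses a partial set of keys.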
Action policy
This table applies to scripts/safety.py responses.
Use the result to gate execution. Never silently ignore a safety check result.
| `approved` | `risk_level` | Agent action |
|---|---|---|
| `true` | `low` | Proceed. No user prompt needed. |
| `true` | `medium` | Proceed. Mention the risk and recommendation to the user. |
| `false` | `medium` | Warn user with concerns and recommendation. Proceed only with explicit user confirmation. |
| `false` | `high` | Block execution. Show concerns and recommendation. Ask user for explicit override. |
| `false` | `critical` | Block execution. Show full assessment. Require user to explicitly acknowledge the risk before proceeding. |
Additional signals:
- `is_destructive: true` combined with `is_reversible: false`: always surface the recommendation to the user, regardless of approval status.
- If the safety check itself fails (network error, API error): warn the user that safety could not be verified. Do not silently proceed with unverified instructions.
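The table and the additional signals can be expressed as a small lookup. This is a sketch under stated assumptions: the function name and action labels are hypothetical, and combinations the table does not list (for example `approved: true` with `risk_level: high`, or a null `risk_level`) are treated as unverified, which is a conservative choice rather than documented behavior.

```python
# Action labels are hypothetical names for the rows of the table above.
ACTIONS = {
    (True, "low"): "proceed",
    (True, "medium"): "proceed_with_notice",
    (False, "medium"): "confirm_with_user",
    (False, "high"): "block_until_override",
    (False, "critical"): "block_until_acknowledged",
}


def decide_action(data):
    """Map the `data` object of a safety response to an agent action."""
    # A null or missing `approved` counts as false.
    approved = data.get("approved") is True
    risk_level = data.get("risk_level")
    if not isinstance(risk_level, str):
        risk_level = None
    # Assumption: anything the table does not cover is handled as unverified.
    action = ACTIONS.get((approved, risk_level), "treat_as_unverified")
    destructive_and_irreversible = (
        data.get("is_destructive") is True and data.get("is_reversible") is False
    )
    return {
        "action": action,
        # Only the (true, low) row skips the recommendation, unless the
        # destructive-and-irreversible signal forces it to be shown.
        "surface_recommendation": action != "proceed" or destructive_and_irreversible,
    }
```

Keeping the table as data makes it easy to compare against the policy when either one changes.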
Scripts
scripts/safety.py
- `-i, --input`: required, instruction text to evaluate (whitespace-only rejected)
- `-c, --context`: policy-required for state-changing instructions (CLI accepts it as optional); JSON string following the Context Contract above
- `-t, --target`: policy-required for state-changing instructions (CLI accepts it as optional); concrete operation target (file path, table name, service name, URL)
- `--json`: output unified JSON envelope for machine consumption
- Endpoint: `https://safety-cf.modeio.ai/api/cf/safety` (override via `SAFETY_API_URL`)
- Retries: automatic retry on HTTP 502/503/504 and connection/timeout errors (up to 2 retries with exponential backoff)
- Request timeout: 60 seconds per attempt
python scripts/safety.py -i "Delete /tmp/cache/build-123.log" \
-c '{"environment":"local-dev","operation_intent":"cleanup","scope":"single-resource","data_sensitivity":"internal","rollback":"easy","change_control":"none"}' \
-t "/tmp/cache/build-123.log" --json
python scripts/safety.py -i "DROP TABLE users" \
-c '{"environment":"production","operation_intent":"destructive","scope":"broad","data_sensitivity":"regulated","rollback":"none","change_control":"ticket:DB-9021"}' \
-t "postgres://prod/maindb.users" --json
python scripts/safety.py -i "chmod 777 /etc/passwd" \
-c '{"environment":"production","operation_intent":"permission-change","scope":"single-resource","data_sensitivity":"regulated","rollback":"partial","change_control":"ticket:SEC-118"}' \
-t "/etc/passwd" --json
python scripts/safety.py -i "List all running containers and display their resource usage" --json
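When the check is launched from Python rather than a shell, building the argument list directly avoids quoting problems with the JSON context. The sketch below only assembles the command; `build_safety_argv` and its `state_changing` parameter are hypothetical, and the policy check it performs mirrors the rule that state-changing instructions need both `--context` and `--target`.

```python
import json
import sys


def build_safety_argv(instruction, context=None, target=None,
                      state_changing=False, script="scripts/safety.py"):
    """Return an argv list for scripts/safety.py using the flags documented above."""
    if not instruction or not instruction.strip():
        raise ValueError("instruction must not be empty or whitespace-only")
    if state_changing and (context is None or not target):
        # Enforced here by policy; the CLI itself treats both flags as optional.
        raise ValueError("state-changing instructions need both context and target")
    argv = [sys.executable, script, "-i", instruction]
    if context is not None:
        argv += ["-c", json.dumps(context)]
    if target:
        argv += ["-t", target]
    argv.append("--json")
    return argv
```

The resulting list can be handed to `subprocess.run(argv, capture_output=True, text=True)`; because no shell is involved, the JSON string needs no extra escaping.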
scripts/skill_safety_assessment.py
- `evaluate`: authoritative v2 layered evaluator with deterministic evidence IDs, integrity fingerprinting, and risk scoring
- Native first-layer gate: GitHub metadata/README/issue-search precheck runs by default and hard-rejects on high-risk attack-demo/malware signals before local file scan.
- `scan`: compatibility alias to `evaluate` for existing automation
- `prompt`: renders prompt payload with script highlights and structured scan JSON
- `validate`: validates model output against scan evidence IDs (`evidence_refs`), required highlights, and score/decision consistency checks
- `adjudicate`: context-aware LLM adjudication bridge (prompt generation + merge decisions back into deterministic score/decision)
Context profile (optional, no user identity required):
{
"environment": "local-dev|ci|staging|production|unknown",
"execution_mode": "read-only|build-test|install|deploy|mutating|unknown",
"risk_tolerance": "strict|balanced|permissive",
"data_sensitivity": "public|internal|sensitive|regulated|unknown"
}
# 1) Deterministic layered evaluation (v2)
python scripts/skill_safety_assessment.py evaluate --target-repo /path/to/repo --json > /tmp/skill_scan.json
python scripts/skill_safety_assessment.py evaluate --target-repo /path/to/repo --context-profile '{"environment":"ci","execution_mode":"build-test","risk_tolerance":"balanced","data_sensitivity":"internal"}' --json > /tmp/skill_scan.json
python scripts/skill_safety_assessment.py evaluate --target-repo /path/to/repo --github-osint-timeout 8 --json > /tmp/skill_scan.json
python scripts/skill_safety_assessment.py evaluate --target-repo /path/to/repo --context-profile-file ./context_profile.json --output /tmp/skill_scan.json --json
# (compat) legacy alias still supported
python scripts/skill_safety_assessment.py scan --target-repo /path/to/repo --json > /tmp/skill_scan.json
# 2) Build prompt payload with highlights + full findings (recommended for strict evidence_refs linking)
python scripts/skill_safety_assessment.py prompt --target-repo /path/to/repo --scan-file /tmp/skill_scan.json --include-full-findings
# 3) Validate model output for evidence linkage + integrity
python scripts/skill_safety_assessment.py validate --scan-file /tmp/skill_scan.json --assessment-file /tmp/assessment.md --json
# --rescan-on-validate requires --target-repo
python scripts/skill_safety_assessment.py validate --scan-file /tmp/skill_scan.json --assessment-file /tmp/assessment.md --target-repo /path/to/repo --rescan-on-validate --json
# 4) Optional adjudication bridge (LLM interprets context, engine keeps deterministic control)
python scripts/skill_safety_assessment.py adjudicate --scan-file /tmp/skill_scan.json
python scripts/skill_safety_assessment.py adjudicate --scan-file /tmp/skill_scan.json --assessment-file /tmp/adjudication.json --json
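For automation, steps 1 to 3 above can be assembled as an ordered list of commands. This is a sketch only: `assessment_pipeline` is a hypothetical helper, it uses just the subcommands and flags shown in the examples above, and it assumes `--output` can be combined with `--json` without a context profile. Producing the assessment file from the rendered prompt happens between the second and third command and is outside the sketch.

```python
import sys
from typing import List

SCRIPT = "scripts/skill_safety_assessment.py"


def assessment_pipeline(target_repo: str, scan_file: str,
                        assessment_file: str) -> List[List[str]]:
    """Return argv lists for evaluate, prompt, and validate, in that order."""
    base = [sys.executable, SCRIPT]
    return [
        # 1) Deterministic evaluation, written to scan_file.
        base + ["evaluate", "--target-repo", target_repo,
                "--output", scan_file, "--json"],
        # 2) Prompt payload with full findings for strict evidence_refs linking.
        base + ["prompt", "--target-repo", target_repo,
                "--scan-file", scan_file, "--include-full-findings"],
        # 3) Validation of the model's assessment against the scan evidence.
        base + ["validate", "--scan-file", scan_file,
                "--assessment-file", assessment_file, "--json"],
    ]
```

None of these commands execute anything inside the target repository, which keeps the flow consistent with the static-analysis-only rule.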
Output contract
Success response (--json)
{
"success": true,
"tool": "modeio-guardrail",
"mode": "api",
"data": {
"approved": false,
"risk_level": "critical",
"risk_types": ["data loss"],
"concerns": ["Irreversible destructive operation targeting all user data"],
"recommendation": "Create a backup before deletion. Use staged rollback plan.",
"is_destructive": true,
"is_reversible": false
}
}
Response fields in data:
| Field | Type | Values | Meaning |
|---|---|---|---|
| `approved` | `boolean` | `true` / `false` | Whether execution is recommended |
| `risk_level` | `string` | `low` / `medium` / `high` / `critical` | Severity of identified risks |
| `risk_types` | `string[]` | open-ended | Risk categories (e.g., "data loss", "injection attacks", "unauthorized access", "denial-of-service") |
| `concerns` | `string[]` | open-ended | Specific risk points in natural language |
| `recommendation` | `string` | open-ended | Suggested safer alternative or mitigation |
| `is_destructive` | `boolean` | `true` / `false` | Whether the action involves destruction (deletion, overwrite, system modification) |
| `is_reversible` | `boolean` | `true` / `false` | Whether the action can be rolled back |
Any field may be null if the backend could not determine it. Treat null in approved as false.
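A consumer of the success envelope has to tolerate nulls in every field. The sketch below shows one way to do that; `SafetyResult` and `parse_success_envelope` are hypothetical names, and collecting the null fields into a list is an illustrative choice so the caller can decide whether to warn the user.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

FIELD_NAMES = (
    "approved", "risk_level", "risk_types", "concerns",
    "recommendation", "is_destructive", "is_reversible",
)


@dataclass
class SafetyResult:
    approved: bool
    risk_level: Optional[str]
    risk_types: List[str]
    concerns: List[str]
    recommendation: Optional[str]
    is_destructive: Optional[bool]
    is_reversible: Optional[bool]
    null_fields: List[str]


def parse_success_envelope(raw: str) -> SafetyResult:
    """Parse a --json success envelope, tolerating null or missing fields."""
    envelope = json.loads(raw)
    if envelope.get("success") is not True:
        raise ValueError("not a success envelope")
    data = envelope.get("data") or {}
    return SafetyResult(
        approved=data.get("approved") is True,  # null counts as false
        risk_level=data.get("risk_level"),
        risk_types=list(data.get("risk_types") or []),
        concerns=list(data.get("concerns") or []),
        recommendation=data.get("recommendation"),
        is_destructive=data.get("is_destructive"),
        is_reversible=data.get("is_reversible"),
        null_fields=[name for name in FIELD_NAMES if data.get(name) is None],
    )
```

A non-empty `null_fields` list is a signal to treat the result as at least partly unverified.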
Failure envelope (--json)
{
"success": false,
"tool": "modeio-guardrail",
"mode": "api",
"error": {
"type": "network_error",
"message": "safety request failed: ConnectionError"
}
}
Error types: validation_error (empty input), dependency_error (missing local package such as requests), network_error (HTTP/connection failure), api_error (backend returned error payload).
Exit code is non-zero on any failure.
Failure policy
Safety verification failures must never be silently ignored.
- Network/API error: Tell the user the safety check could not be completed. Present the original instruction and ask whether to proceed without verification.
- Validation error (empty input): Fix the input and retry before executing anything.
- Unexpected response (null or missing fields): Treat as unverified. Warn the user.
- Never assume an instruction is safe because the check failed to run.
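The failure policy can be sketched as a mapping from the envelope's error type to a next step. The function name and the returned labels are hypothetical. The policy above does not name a step for `dependency_error` or for unrecognized types, so the sketch falls back to treating them as unverified, which is an assumption. No branch ever returns a "proceed" action.

```python
def failure_action(envelope: dict) -> str:
    """Pick a next step for a failure envelope; never returns a 'proceed' action."""
    if envelope.get("success") is True:
        raise ValueError("envelope reports success; there is no failure to handle")
    error = envelope.get("error")
    error_type = error.get("type") if isinstance(error, dict) else None
    if error_type == "validation_error":
        # Empty input: fix it and re-run the check before executing anything.
        return "fix_input_and_retry"
    if error_type in ("network_error", "api_error"):
        # Tell the user the check could not complete and ask how to continue.
        return "ask_user_before_proceeding_unverified"
    # Assumption: dependency errors and unknown shapes are handled as unverified.
    return "treat_as_unverified_and_warn"
```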
Skill Safety Assessment policy (static prompt contract)
- Use `prompts/static_repo_scan.md` as the strict contract.
- Run `scripts/skill_safety_assessment.py evaluate` first (or `scan` compatibility alias) and pass its highlights into prompt input.
- When model output must include strict `evidence_refs`, render prompt input with `--include-full-findings` so scan evidence IDs and snippets are available in `SCRIPT_SCAN_JSON`.
- Every finding must include `path:line` evidence, exact snippet quote, and `evidence_refs` linked to scan evidence IDs.
- Always include all required highlight evidence IDs from scan output in final findings.
- Keep decision/score consistent with referenced evidence severity and coverage constraints.
- Use `adjudicate` when context interpretation is required (docs/examples/tests vs runtime/install paths).
- Return one of: `reject`, `caution`, or `approve`.
- If coverage is partial or evidence is insufficient, return `caution` with explicit coverage note.
- Include a prioritized remediation plan so users can fix and re-scan quickly.
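The evidence-linkage and decision rules above can be illustrated without knowing the scan file's schema. The sketch below is not the `validate` subcommand's logic: both function names are hypothetical, and the evidence IDs are passed in as plain collections rather than read from scan output. It only downgrades `approve` to `caution`; leaving a `reject` untouched when coverage is partial is a conservative interpretation of the policy.

```python
from typing import Iterable, List, Set

ALLOWED_DECISIONS = ("reject", "caution", "approve")


def check_evidence_links(findings_refs: Iterable[Iterable[str]],
                         scan_ids: Set[str],
                         required_ids: Set[str]) -> List[str]:
    """Return a list of linkage problems; an empty list means the links hold."""
    problems = []
    cited = set()
    for index, refs in enumerate(findings_refs):
        refs = list(refs)
        if not refs:
            problems.append(f"finding {index} has no evidence_refs")
        unknown = sorted(set(refs) - scan_ids)
        if unknown:
            problems.append(f"finding {index} cites unknown evidence IDs: {unknown}")
        cited.update(refs)
    missing = sorted(required_ids - cited)
    if missing:
        problems.append(f"required highlight IDs not cited: {missing}")
    return problems


def final_decision(model_decision: str, coverage_complete: bool,
                   problems: List[str]) -> str:
    """Apply the partial-coverage rule to the model's proposed decision."""
    if model_decision not in ALLOWED_DECISIONS:
        raise ValueError(f"decision must be one of {ALLOWED_DECISIONS}")
    if model_decision == "approve" and (not coverage_complete or problems):
        return "caution"
    return model_decision
```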
When not to use
- For PII redaction or anonymization, use `modeio-redact` instead.
- For tasks with no executable instruction or repository target to evaluate (pure discussion, documentation, questions).
- For operations that are clearly read-only (listing files, reading configs, `git status`).
Resources
- `scripts/safety.py`: CLI entry point for instruction safety checks
- `scripts/skill_safety_assessment.py`: CLI entry point for skill repo assessment (`evaluate` / `scan` / `prompt` / `validate` / `adjudicate`)
- `prompts/static_repo_scan.md`: Skill Safety Assessment prompt contract
- `ARCHITECTURE.md`: package boundaries and compatibility notes
- `SAFETY_API_URL` env var: optional endpoint override (default: `https://safety-cf.modeio.ai/api/cf/safety`)