NYC
skills/letta-ai/skills/code-from-image

code-from-image

SKILL.md

Code From Image

Overview

This skill provides guidance for extracting code or pseudocode from images and implementing it correctly. It covers OCR tool selection, handling ambiguous text extraction, and verification strategies to ensure accurate implementation.

Workflow

Step 1: Environment Preparation

Before attempting to read an image, check available tools and packages:

  1. Check what package managers are available (pip, pip3, uv, conda)
  2. Check what image processing tools are installed (tesseract, pytesseract, PIL/Pillow)
  3. Install missing dependencies before proceeding

This avoids wasted attempts with unavailable tools.

Step 2: Image Analysis

Examine the image before OCR extraction:

  1. Use file <image> to verify the file type and ensure it's a valid image
  2. Open the image visually if possible to understand content structure
  3. Note the image quality, contrast, and text clarity

Step 3: OCR Extraction with Multiple Attempts

OCR is inherently error-prone. To maximize accuracy:

  1. First attempt: Use standard OCR (pytesseract with default settings)
  2. If output is garbled: Apply image preprocessing:
    • Increase contrast
    • Convert to grayscale
    • Apply binarization (threshold)
    • Resize the image (2x or 3x upscaling can help)
  3. Compare outputs: If multiple OCR attempts yield different results, cross-reference them

Example preprocessing with PIL:

from PIL import Image, ImageEnhance, ImageFilter

img = Image.open("code.png")
# Convert to grayscale
img = img.convert("L")
# Increase contrast
enhancer = ImageEnhance.Contrast(img)
img = enhancer.enhance(2.0)
# Apply threshold for binarization
img = img.point(lambda x: 0 if x < 128 else 255, '1')
img.save("preprocessed.png")

Step 4: Interpreting OCR Output

OCR frequently produces character substitution errors. Document all interpretations explicitly:

Common OCR Misreadings:

  • 0 (zero) vs O (letter O) vs o (lowercase o)
  • 1 (one) vs l (lowercase L) vs I (uppercase i)
  • S vs 5 vs $
  • G vs 6
  • B vs 8
  • : vs ;
  • sha256 may appear as cha256 or sha2S6
  • Variable names may have incorrect characters (e.g., GALT instead of SALT)
  • Quote characters may be mangled (6" instead of b" for byte strings)
  • Array slicing may be garbled (h0[:10] appearing as hof:10])

Process for interpretation:

  1. List each unclear portion of the OCR output
  2. Document the most likely correct interpretation
  3. Explain reasoning for each interpretation
  4. Flag any interpretations with high uncertainty

Step 5: Implementation

When implementing the extracted code:

  1. Preserve the algorithm structure: Follow the logic as written, don't optimize prematurely
  2. Handle encoding explicitly: For cryptographic operations, be explicit about string vs bytes encoding
  3. Add basic error handling: Include try/except for file operations and external calls
  4. Log intermediate values: Print or log intermediate results for debugging

Step 6: Verification

Verify the implementation systematically:

  1. If a hint is provided (e.g., expected output prefix): Use it to validate, but don't rely on it exclusively
  2. Trace through the algorithm manually: Verify your understanding matches the implementation
  3. Test with known inputs: If possible, create test cases with predictable outputs
  4. Check edge cases: Empty inputs, special characters, boundary conditions

Warning: Using hints as the sole validation is brittle. A correct output prefix doesn't guarantee the algorithm is fully correct for all inputs.

Common Pitfalls

OCR-Related

  • Accepting first OCR output without verification: Always cross-check unclear characters
  • Not documenting assumptions: When interpreting garbled text, explicitly state what you're assuming
  • Skipping preprocessing: Image enhancement significantly improves OCR accuracy

Implementation-Related

  • String vs bytes confusion: In Python, cryptographic functions often require bytes (b"string") not strings
  • Missing imports: Ensure all required modules are imported before running
  • Silent failures: Add explicit error messages for file operations

Verification-Related

  • Over-relying on partial hints: A matching prefix doesn't mean the full output is correct
  • Not validating intermediate steps: Check values at each stage, not just the final output
  • Assuming OCR was correct: If output doesn't match expectations, revisit OCR interpretation

Fallback Strategy

If the initial interpretation produces incorrect results:

  1. Re-examine the original image, focusing on unclear characters
  2. Try alternative OCR preprocessing techniques
  3. List all ambiguous characters and test alternative interpretations systematically
  4. If multiple interpretations exist, implement and test each one

Example Workflow

For a task like "Extract pseudocode from image and compute hash":

  1. Check environment: which tesseract, pip3 list | grep -i pil
  2. Install if needed: pip3 install pillow pytesseract
  3. Analyze image: file code.png
  4. Extract text with OCR
  5. If garbled, preprocess image and retry OCR
  6. Document interpretations: "OCR shows GALT = 6"0000... - interpreting as SALT = b"0000..." because G/S confusion is common and 6" likely represents b" for bytes"
  7. Implement the algorithm
  8. Verify output against any provided hints
  9. If verification fails, revisit step 5-6 with alternative interpretations
Weekly Installs
32
Repository
letta-ai/skills
First Seen
Jan 24, 2026
Installed on
opencode20
claude-code19
gemini-cli17
openclaw16
codex16
antigravity13