Code From Image

Overview

This skill provides guidance for extracting code or pseudocode from images and implementing it correctly. It covers OCR tool selection, handling ambiguous text extraction, and verification strategies to ensure accurate implementation.

Workflow

Step 1: Environment Preparation

Before attempting to read an image, check available tools and packages:

Check what package managers are available (pip, pip3, uv, conda)
Check what image processing tools are installed (tesseract, pytesseract, PIL/Pillow)
Install missing dependencies before proceeding

This avoids wasted attempts with unavailable tools.
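The same checks can be scripted from Python. This is a minimal sketch. The tool list is illustrative, and it assumes the import names PIL (installed by the Pillow package) and pytesseract. pytesseract is only a wrapper around the tesseract executable, so both need to be present.

```python
import importlib.util
import shutil


def check_environment(
    executables=("tesseract", "pip", "pip3", "uv", "conda"),
    modules=("PIL", "pytesseract"),
):
    """Report which CLI tools are on PATH and which Python modules are importable."""
    tools = {name: shutil.which(name) is not None for name in executables}
    # find_spec returns None for top-level modules that are not installed,
    # without actually importing them
    packages = {name: importlib.util.find_spec(name) is not None for name in modules}
    return tools, packages


if __name__ == "__main__":
    tools, packages = check_environment()
    for name, found in {**tools, **packages}.items():
        print(f"{name:12} {'found' if found else 'MISSING'}")
```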

Step 2: Image Analysis

Examine the image before OCR extraction:

Use file <image> to verify the file type and ensure it's a valid image
If possible, view the image directly to understand how the content is laid out (indentation, line breaks, columns)
Note the image quality, contrast, and text clarity
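If the file command is unavailable, you can approximate its check by reading the file's leading signature bytes. This stdlib-only sketch recognizes far fewer types than file does. sniff_image and sniff_image_file are hypothetical helper names, and the sketch decodes dimensions only for PNG, taken from the IHDR chunk.

```python
import struct

# Leading "magic" bytes for a few common image formats
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"BM": "bmp",
}


def sniff_image(header: bytes):
    """Return (format, width, height) from the first bytes of a file.

    Width and height are decoded only for PNG; other formats report None.
    Raises ValueError if no known signature matches.
    """
    for magic, fmt in SIGNATURES.items():
        if header.startswith(magic):
            if fmt == "png" and len(header) >= 24:
                # After the 8-byte signature come the IHDR length (4 bytes) and
                # type (4 bytes); width and height follow as big-endian uint32
                width, height = struct.unpack(">II", header[16:24])
                return fmt, width, height
            return fmt, None, None
    raise ValueError("unrecognized image signature")


def sniff_image_file(path: str):
    with open(path, "rb") as f:
        return sniff_image(f.read(32))
```

A ValueError here usually means the "image" is really HTML, text, or a truncated download, so OCR would fail on it.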

Step 3: OCR Extraction with Multiple Attempts

OCR is inherently error-prone. To maximize accuracy:

First attempt: Use standard OCR (pytesseract with default settings)
If output is garbled: Apply image preprocessing:
- Increase contrast
- Convert to grayscale
- Apply binarization (threshold)
- Resize the image (2x or 3x upscaling can help)
Compare outputs: If multiple OCR attempts yield different results, cross-reference them

Example preprocessing with PIL:

from PIL import Image, ImageEnhance

img = Image.open("code.png")
# Convert to grayscale
img = img.convert("L")
# Upscale 2x so small glyphs span more pixels
img = img.resize((img.width * 2, img.height * 2))
# Increase contrast
img = ImageEnhance.Contrast(img).enhance(2.0)
# Binarize: pixels darker than 128 become black, the rest white (1-bit mode "1")
img = img.point(lambda x: 0 if x < 128 else 255, "1")
img.save("preprocessed.png")
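You can partly automate cross-referencing by running OCR on several image variants and listing the lines where the results disagree. This sketch assumes pytesseract and Pillow are installed. Page segmentation mode --psm 6 treats the image as a single uniform block of text, which often suits code screenshots, but that is a heuristic. The comparison aligns outputs line by line, so it assumes every variant produces the same line breaks.

```python
def ocr_variants(path: str, psm_modes=(3, 6)) -> dict:
    """Run Tesseract on raw and preprocessed versions of the image."""
    # Imported lazily so disagreements() below works without OCR installed
    import pytesseract
    from PIL import Image, ImageEnhance

    raw = Image.open(path)
    gray = raw.convert("L")
    gray = gray.resize((gray.width * 2, gray.height * 2))
    enhanced = ImageEnhance.Contrast(gray).enhance(2.0)
    binary = enhanced.point(lambda x: 0 if x < 128 else 255, "1")
    outputs = {}
    for label, img in (("raw", raw), ("enhanced", enhanced), ("binary", binary)):
        for psm in psm_modes:
            outputs[f"{label}/psm{psm}"] = pytesseract.image_to_string(
                img, config=f"--psm {psm}"
            )
    return outputs


def disagreements(outputs: dict) -> list:
    """Return (line_number, {variant: text}) for lines where variants differ."""
    split = {name: text.splitlines() for name, text in outputs.items()}
    n_lines = max((len(lines) for lines in split.values()), default=0)
    result = []
    for i in range(n_lines):
        # A variant with fewer lines contributes "" for the missing ones
        row = {name: (lines[i] if i < len(lines) else "") for name, lines in split.items()}
        if len(set(row.values())) > 1:
            result.append((i + 1, row))
    return result
```

Typical use: for line, readings in disagreements(ocr_variants("code.png")): print(line, readings). The printed lines are the ones to examine by eye.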

Step 4: Interpreting OCR Output

OCR frequently produces character substitution errors. Document all interpretations explicitly:

Common OCR Misreadings:

0 (zero) vs O (letter O) vs o (lowercase o)
1 (one) vs l (lowercase L) vs I (uppercase i)
S vs 5 vs $
G vs 6
B vs 8
: vs ;
sha256 may appear as cha256 or sha2S6
Variable names may have incorrect characters (e.g., GALT instead of SALT)
Byte-string prefixes may be misread (6" instead of b")
Array slicing may be garbled (h0[:10] appearing as hof:10])
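The list above can drive a mechanical enumeration of candidate readings for a suspicious token. This is a minimal sketch. The CONFUSIONS table is a hypothetical encoding of the pairs listed above plus the G/S confusion. The number of candidates grows exponentially with token length, so the output is capped.

```python
from itertools import islice, product

# Hypothetical confusion groups based on the misreadings listed above
CONFUSIONS = [
    {"0", "O", "o"},
    {"1", "l", "I"},
    {"S", "5", "$"},
    {"G", "6"},
    {"G", "S"},
    {"B", "8"},
    {":", ";"},
]


def alternatives(ch: str) -> list:
    """Every character OCR might have confused with ch, including ch itself."""
    alts = {ch}
    for group in CONFUSIONS:
        if ch in group:
            alts |= group
    return sorted(alts)


def candidates(token: str, limit: int = 1000) -> list:
    """Spellings of token obtained by swapping confusable characters (capped at limit)."""
    options = [alternatives(ch) for ch in token]
    return ["".join(combo) for combo in islice(product(*options), limit)]
```

For example, candidates("GALT") returns ["6ALT", "GALT", "SALT"]. Context then decides which candidate is plausible.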

Process for interpretation:

List each unclear portion of the OCR output
Document the most likely correct interpretation
Explain reasoning for each interpretation
Flag any interpretations with high uncertainty
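One way to keep interpretations explicit and reviewable is to record each one as data and apply them all in one step. This sketch uses hypothetical names (Interpretation, apply_interpretations). str.replace substitutes every occurrence, so prefer longer, more specific OCR fragments when a short one could also appear elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Interpretation:
    ocr_text: str          # exactly what OCR produced
    interpreted_as: str    # the reading being assumed
    reason: str            # why this reading is most likely
    uncertain: bool = False


def apply_interpretations(ocr_output: str, notes: list) -> str:
    """Apply each documented substitution in order (every occurrence is replaced).

    Raises ValueError if a recorded fragment is no longer present, which usually
    means the OCR text changed since the notes were written.
    """
    text = ocr_output
    for note in notes:
        if note.ocr_text not in text:
            raise ValueError(f"interpretation target not found: {note.ocr_text!r}")
        text = text.replace(note.ocr_text, note.interpreted_as)
    return text


def flagged(notes: list) -> list:
    """The interpretations marked as high-uncertainty, to revisit first."""
    return [n for n in notes if n.uncertain]
```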

Step 5: Implementation

When implementing the extracted code:

Preserve the algorithm structure: Follow the logic as written, don't optimize prematurely
Handle encoding explicitly: For cryptographic operations, be explicit about string vs bytes encoding
Add basic error handling: Include try/except for file operations and external calls
Log intermediate values: Print or log intermediate results for debugging
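The sketch below applies these guidelines to a hypothetical salted SHA-256 step. SALT, salted_digest, and read_input are placeholders and do not come from any particular image. The code encodes text explicitly, turns file errors into a readable message, and logs intermediate values at DEBUG level.

```python
import hashlib
import logging
import sys

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger(__name__)

# Hypothetical value standing in for whatever the extracted pseudocode defines
SALT = b"example-salt"


def salted_digest(message: str, salt: bytes = SALT) -> str:
    data = message.encode("utf-8")  # explicit str -> bytes conversion
    log.debug("encoded message: %r", data)
    digest = hashlib.sha256(salt + data).hexdigest()
    log.debug("sha256(salt + data): %s", digest)
    return digest


def read_input(path: str) -> str:
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except OSError as exc:
        # Exit with a clear message instead of a bare traceback
        sys.exit(f"could not read {path}: {exc}")
```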

Step 6: Verification

Verify the implementation systematically:

If a hint is provided (e.g., expected output prefix): Use it to validate, but don't rely on it exclusively
Trace through the algorithm manually: Verify your understanding matches the implementation
Test with known inputs: If possible, create test cases with predictable outputs
Check edge cases: Empty inputs, special characters, boundary conditions

Warning: Using hints as the sole validation is brittle. A correct output prefix doesn't guarantee the algorithm is fully correct for all inputs.
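Known-answer tests check a hashing component independently of any hint. They catch mistakes in encoding, concatenation, or hex formatting. This sketch uses the widely published SHA-256 digests of b"abc" and the empty message. digest_fn is a hypothetical stand-in for whatever bytes-to-hex function your implementation exposes.

```python
import hashlib

# Widely published SHA-256 digests of b"abc" and the empty message
KNOWN_SHA256 = {
    b"abc": "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
    b"": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
}


def check_known_answers(digest_fn) -> None:
    """Raise AssertionError if digest_fn (bytes -> hex str) disagrees with a known vector."""
    for data, expected in KNOWN_SHA256.items():
        got = digest_fn(data)
        if got != expected:
            raise AssertionError(f"{data!r}: expected {expected}, got {got}")


def matches_hint(output: str, expected_prefix: str) -> bool:
    """Necessary but not sufficient: a matching prefix says nothing about the rest."""
    return output.startswith(expected_prefix)


if __name__ == "__main__":
    check_known_answers(lambda data: hashlib.sha256(data).hexdigest())
    print("known-answer tests passed")
```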

Common Pitfalls

OCR-Related

Accepting first OCR output without verification: Always cross-check unclear characters
Not documenting assumptions: When interpreting garbled text, explicitly state what you're assuming
Skipping preprocessing: Grayscale conversion, contrast enhancement, upscaling, and binarization often improve OCR accuracy on low-quality images

Implementation-Related

String vs bytes confusion: In Python, hashlib and hmac require bytes (b"string"), not str; passing a str raises TypeError, so encode text explicitly (e.g., text.encode("utf-8"))
Missing imports: Ensure all required modules are imported before running
Silent failures: Add explicit error messages for file operations

Verification-Related

Over-relying on partial hints: A matching prefix doesn't mean the full output is correct
Not validating intermediate steps: Check values at each stage, not just the final output
Assuming OCR was correct: If output doesn't match expectations, revisit OCR interpretation

Fallback Strategy

If the initial interpretation produces incorrect results:

Re-examine the original image, focusing on unclear characters
Try alternative OCR preprocessing techniques
List all ambiguous characters and test alternative interpretations systematically
If multiple interpretations exist, implement and test each one
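When several characters are ambiguous, you can test every combination of readings. This is a minimal sketch. slots maps each hypothetical ambiguous position to its candidate readings. run is a placeholder for your implementation, called with one reading per slot. Restrict slots to the flagged positions, because the search grows multiplicatively.

```python
from itertools import product


def search_interpretations(slots: dict, run, expected_prefix: str) -> list:
    """Return every reading combination whose output starts with expected_prefix."""
    names = list(slots)
    matches = []
    for combo in product(*(slots[n] for n in names)):
        choice = dict(zip(names, combo))
        if run(choice).startswith(expected_prefix):
            matches.append(choice)
    return matches
```

For example, search_interpretations({"a": ["1", "l"], "b": ["0", "O"]}, lambda c: c["a"] + c["b"], "l0") returns [{"a": "l", "b": "0"}]. Several combinations can match a short prefix, so a unique match is strong evidence but not proof.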

Example Workflow

For a task like "Extract pseudocode from image and compute hash":

Check environment: which tesseract, pip3 list | grep -i pil
Install if needed: pip3 install pillow pytesseract
Analyze image: file code.png
Extract text with OCR
If garbled, preprocess image and retry OCR
Document interpretations, e.g.: OCR shows GALT = 6"0000...; interpret it as SALT = b"0000..." because G/S confusion is common and 6" most likely stands for the b" byte-string prefix
Implement the algorithm
Verify output against any provided hints
If verification fails, repeat the preprocessing and interpretation steps above with alternative interpretations
