GPT-2 Code Golf

Overview

This skill provides guidance for implementing GPT-2 or similar transformer model inference in minimal code, typically for code golf challenges. These tasks require parsing binary checkpoint formats, implementing BPE tokenization, and performing forward passes through transformer architectures—all within strict size constraints.

Critical Principles

1. Verify Before Optimizing

Never optimize for size before functionality is verified. A working 10KB solution is better than a broken 4KB solution. Size optimization should be the final step, not a constraint during development.

2. Incremental Development and Testing

Build and test components independently before integration:

  1. Checkpoint parsing - Verify weights load correctly
  2. Tokenization - Verify BPE encoding/decoding works
  3. Forward pass - Verify matrix operations produce reasonable outputs
  4. Integration - Combine components only after individual verification

3. Format Analysis Before Implementation

Before writing any parsing code, analyze the actual file formats:

# Examine checkpoint structure
hexdump -C model.ckpt.data-00000-of-00001 | head -100

# Check index file format
cat model.ckpt.index | xxd | head -50

# Examine vocabulary/BPE file structure
head -50 vocab.bpe

Approach: Checkpoint Parsing

Understanding TensorFlow Checkpoint Format

TensorFlow checkpoints consist of multiple files:

  • .index file: Contains tensor metadata (names, shapes, offsets)
  • .data-* files: Contain the actual weight values

The index file stores per-tensor metadata as protocol-buffer entries keyed by tensor name. Parsing it correctly requires understanding the following (a minimal varint decoder is sketched after this list):

  • Varint encoding for integers
  • String length prefixes
  • Tensor shape encodings
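
A minimal sketch of decoding one base-128 varint, the integer encoding protocol buffers use; buffer bounds checks and the surrounding record parsing are omitted:

#include <stdint.h>

/* Sketch: decode one protobuf base-128 varint starting at *p.
 * Each byte contributes 7 bits, least significant first; a set high bit
 * means "more bytes follow". Advances *p past the varint. */
uint64_t read_varint(const unsigned char **p) {
    uint64_t v = 0;
    int shift = 0;
    while (**p & 0x80) {
        v |= (uint64_t)(**p & 0x7f) << shift;
        shift += 7;
        (*p)++;
    }
    v |= (uint64_t)(**p & 0x7f) << shift;
    (*p)++;
    return v;
}

Strings inside the protobuf messages are length-prefixed, so the same decoder also yields the string length prefixes listed above.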

Verification Strategy for Weight Loading

After implementing checkpoint parsing, verify weights loaded correctly:

// Print statistics for loaded weights
printf("wte sum: %f\n", sum_tensor(wte, vocab_size * n_embd));
printf("wpe[0]: %f %f %f\n", wpe[0], wpe[1], wpe[2]);
printf("First layer attn weight samples: %f %f\n",
       attn_w[0][0], attn_w[0][100]);
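
The sum_tensor helper above is not a library function; a minimal sketch, accumulating in double so round-off does not dominate the check on large tensors:

#include <stddef.h>

/* Sketch: sum n float values in double precision for a quick sanity check.
 * A result of nan or inf is itself one of the red flags listed below. */
double sum_tensor(const float *t, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += t[i];
    return s;
}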

Red flags indicating parsing failure:

  • All values are zero
  • All values are identical
  • Values are extremely large (e.g., 1e30+)
  • NaN or Inf values

Common Mistakes in Checkpoint Parsing

  1. Using arbitrary magic offsets - Offsets depend on tensor order and checkpoint version
  2. Ignoring endianness - TensorFlow stores float32 weights little-endian (a byte-order-independent read helper is sketched after this list)
  3. Skipping index file - The data file alone lacks tensor boundaries
  4. Pattern matching on binary data - Binary patterns like 0x08 are not reliable markers
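
A sketch of reading one little-endian float32 from a raw byte buffer without depending on the host's byte order (assumes the host uses IEEE-754 floats):

#include <stdint.h>
#include <string.h>

/* Sketch: assemble a little-endian float32 from raw checkpoint bytes. */
float read_f32_le(const unsigned char *p) {
    uint32_t u = (uint32_t)p[0]
               | ((uint32_t)p[1] << 8)
               | ((uint32_t)p[2] << 16)
               | ((uint32_t)p[3] << 24);
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

On a little-endian host a plain memcpy of four bytes produces the same result and is shorter; the explicit form only matters on big-endian machines.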

Approach: BPE Tokenization

Understanding BPE Structure

GPT-2 uses byte-pair encoding with:

  • A vocabulary file mapping tokens to IDs
  • Merge rules defining how to combine byte sequences, applied greedily in rank order (a sketch of the merge loop follows this list)
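
A minimal sketch of the greedy merge loop for one word; merge_rank is a hypothetical lookup into the parsed vocab.bpe merge table (lower rank = earlier merge, INT_MAX = pair never merges):

#include <limits.h>
#include <string.h>

extern int merge_rank(const char *a, const char *b);  /* hypothetical table lookup */

/* Sketch: syms[] starts as one byte-level symbol per entry; repeatedly
 * merge the adjacent pair with the best (lowest) rank until none applies. */
int bpe_merge(char syms[][64], int n) {
    for (;;) {
        int best = INT_MAX, best_i = -1;
        for (int i = 0; i + 1 < n; i++) {           /* find best-ranked pair */
            int r = merge_rank(syms[i], syms[i + 1]);
            if (r < best) { best = r; best_i = i; }
        }
        if (best_i < 0) break;                      /* no applicable merge left */
        strcat(syms[best_i], syms[best_i + 1]);     /* fuse the winning pair */
        for (int i = best_i + 1; i + 1 < n; i++)    /* close the gap */
            strcpy(syms[i], syms[i + 1]);
        n--;
    }
    return n;   /* remaining symbols are looked up as vocabulary IDs */
}

Note that the loop re-scans for the single best-ranked pair on every iteration; applying merges in file order instead is the "ignoring merge order" mistake listed below.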

Verification Strategy for Tokenization

Test with known input/output pairs:

// Known tokenization for GPT-2
// "Hello" -> [15496]
// " world" -> [995]
// "Hello world" -> [15496, 995]

int tokens[MAX_TOKENS];
int n = tokenize("Hello world", tokens);
assert(n == 2);
assert(tokens[0] == 15496);
assert(tokens[1] == 995);

Common Mistakes in BPE Implementation

  1. Word-level splitting - GPT-2 BPE operates on bytes, not words
  2. Ignoring merge order - Merges must be applied in priority order
  3. Missing special handling - Spaces are encoded as part of tokens (e.g., " world" is one token)
  4. Reading wrong file sections - Vocabulary IDs vs merge rules are in different sections

Approach: Forward Pass

Verification Strategy for Forward Pass

  1. Test with simple inputs first:
// Single token input, check output shape and range
float logits[VOCAB_SIZE];
forward_pass(single_token, 1, logits);
// Logits should be roughly in range [-10, 10]
// Softmax over the logits should sum to 1.0
  2. Compare against a reference implementation:
# Generate reference outputs with Hugging Face
from transformers import GPT2LMHeadModel, GPT2Tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
ids = tokenizer("Hello world", return_tensors='pt').input_ids
out = model(ids, output_hidden_states=True)
# Compare out.logits and out.hidden_states against your own activations
  3. Check numerical stability (a stats helper is sketched after this list):
// After attention, values should be bounded
// After layer norm, mean should be ~0, std ~1
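
A small helper for the stability checks in step 3; the name is illustrative:

#include <math.h>
#include <stdio.h>

/* Sketch: print mean and standard deviation of an activation vector,
 * e.g. right after a layer norm (expect mean ~0, std ~1). */
void print_stats(const char *label, const float *x, int n) {
    double mean = 0.0, var = 0.0;
    for (int i = 0; i < n; i++) mean += x[i];
    mean /= n;
    for (int i = 0; i < n; i++) var += (x[i] - mean) * (x[i] - mean);
    var /= n;
    printf("%s: mean=%.4f std=%.4f\n", label, mean, sqrt(var));
}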

Common Numerical Issues

  1. Overflow in softmax - Subtract the max logit before exponentiating (see the sketch after this list)
  2. Accumulation errors - Use double precision for reductions
  3. Missing layer normalization - Each transformer block requires layer norm
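
A sketch combining the first two fixes: subtract the max before exponentiating and accumulate the normalizer in double:

#include <math.h>

/* Sketch: numerically stable softmax over logits[0..n-1], in place. */
void softmax(float *logits, int n) {
    float max = logits[0];
    for (int i = 1; i < n; i++)
        if (logits[i] > max) max = logits[i];
    double sum = 0.0;                       /* double reduces round-off over ~50k entries */
    for (int i = 0; i < n; i++) {
        logits[i] = expf(logits[i] - max);  /* max subtraction prevents overflow */
        sum += logits[i];
    }
    for (int i = 0; i < n; i++)
        logits[i] /= (float)sum;
}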

Verification Checklist

Before declaring completion, verify each component:

  • Checkpoint parsing: Print sample weights, verify non-zero and reasonable values
  • Vocabulary loading: Print sample token mappings, verify expected tokens exist
  • BPE encoding: Test with known strings, verify token IDs match reference
  • BPE decoding: Round-trip test (encode then decode should return the original string; a minimal check is sketched after this list)
  • Forward pass: Single token produces logits in expected range
  • Sampling: Top token for common prefixes matches expected continuations
  • End-to-end: Generated text is coherent (not garbage or repeated patterns)
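
A minimal round-trip check for the BPE items above; tokenize and detokenize stand in for the project's own encode/decode functions:

#include <assert.h>
#include <string.h>

extern int tokenize(const char *s, int *tokens);             /* project's encoder */
extern void detokenize(const int *tokens, int n, char *out); /* project's decoder */

/* Sketch: encoding then decoding should reproduce the input exactly. */
void roundtrip(const char *s) {
    int tokens[1024];
    char out[4096];
    int n = tokenize(s, tokens);
    detokenize(tokens, n, out);
    assert(strcmp(s, out) == 0);
}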

When to Simplify vs. Ask for Alternatives

If proper implementation of a component would exceed constraints:

  1. Checkpoint format too complex: Ask if weights can be provided in a simpler format (raw binary floats, NumPy arrays)
  2. Full BPE too large: Ask if a pre-tokenized input is acceptable
  3. Full transformer too large: Ask about acceptable accuracy tradeoffs (fewer layers, smaller dimensions)

Never implement a non-functional simplification silently. If corners must be cut, communicate clearly which functionality is affected.

Anti-Patterns to Avoid

  1. Claiming success without testing - Always run the program with actual inputs before completion
  2. Ignoring self-identified risks - If a limitation is noted, address it or communicate it
  3. Premature size optimization - Get it working first, then optimize
  4. Testing only happy path - Verify error handling and edge cases
  5. Assuming file format knowledge - Always examine actual files before parsing