classical-cipher-analysis
SKILL: Classical Cipher Analysis — Expert Cryptanalysis Playbook
AI LOAD INSTRUCTION: Expert classical cipher identification and breaking techniques for CTF. Covers cipher identification methodology (frequency analysis, IC, Kasiski), monoalphabetic substitution, Caesar/ROT, Vigenere, Enigma, affine, Hill, transposition ciphers, Bacon/Polybius/Playfair, and XOR ciphers. Base models often skip the identification step and jump to the wrong cipher type, or fail to recognize encoded (base64/hex) ciphertext that needs decoding before analysis.
0. RELATED ROUTING
- symmetric-cipher-attacks when dealing with modern symmetric ciphers (AES/DES) rather than classical
- hash-attack-techniques when the challenge involves hash-based constructions
- lattice-crypto-attacks when knapsack-based ciphers are encountered
Quick identification guide
| Observation | Likely Cipher | First Action |
|---|---|---|
| All uppercase letters, uneven frequency | Monoalphabetic substitution | Frequency analysis |
| All uppercase, flat frequency distribution | Polyalphabetic (Vigenere) | IC + Kasiski |
| Only A-Z shifted uniformly | Caesar/ROT | Brute force 25 shifts |
| Base64 alphabet (A-Za-z0-9+/=) | Base64 encoded (decode first) | Base64 decode |
| Hex string (0-9a-f) | Hex encoded (decode first) | Hex decode |
| Binary (0s and 1s) | Binary encoded | Convert to ASCII |
| Dots and dashes | Morse code | Morse decode |
| Raised/normal text pattern | Bacon cipher | Map to A/B, decode |
| 2-digit number pairs (11-55) | Polybius square | Grid lookup |
| Text appears scrambled (right letters, wrong order) | Transposition | Anagram analysis |
| Non-printable bytes XOR-like | XOR cipher | Single/repeating key XOR analysis |
1. CIPHER IDENTIFICATION METHODOLOGY
1.1 Step 1: Character Set Analysis
def analyze_charset(ciphertext):
"""Identify encoding/cipher by character set."""
chars = set(ciphertext.strip())
if chars <= set('01 \n'):
return "Binary encoding"
if chars <= set('.-/ \n'):
return "Morse code"
if chars <= set('0123456789abcdef \n'):
return "Hex encoding"
if chars <= set('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=\n'):
if '=' in ciphertext or len(ciphertext) % 4 == 0:
return "Base64 encoding"
if chars <= set('ABCDEFGHIJKLMNOPQRSTUVWXYZ \n'):
return "Uppercase only — classical cipher"
if all(c in '12345' for c in ciphertext.replace(' ', '').replace('\n', '')):
return "Polybius square (digits 1-5)"
return "Mixed charset — needs further analysis"
1.2 Step 2: Frequency Analysis
from collections import Counter
def frequency_analysis(text):
"""Compute letter frequency distribution."""
text = text.upper()
letters = [c for c in text if c.isalpha()]
total = len(letters)
freq = Counter(letters)
print("Letter frequencies:")
for letter, count in freq.most_common():
pct = count / total * 100
bar = '#' * int(pct)
print(f" {letter}: {pct:5.1f}% {bar}")
return freq
# English letter frequency (for comparison):
# E T A O I N S H R D L C U M W F G Y P B V K J X Q Z
# 12.7 9.1 8.2 7.5 7.0 6.7 6.3 6.1 6.0 4.3 4.0 2.8 ...
1.3 Step 3: Index of Coincidence (IC)
def index_of_coincidence(text):
"""
IC ≈ 0.065 → English / monoalphabetic substitution
IC ≈ 0.038 → random / polyalphabetic cipher
"""
text = [c for c in text.upper() if c.isalpha()]
N = len(text)
freq = Counter(text)
ic = sum(f * (f - 1) for f in freq.values()) / (N * (N - 1))
return ic
# Interpretation:
# IC > 0.060 → monoalphabetic (Caesar, simple substitution, Playfair)
# IC ≈ 0.045-0.055 → polyalphabetic with short key (Vigenere key < 10)
# IC ≈ 0.038-0.042 → polyalphabetic with long key or random
1.4 Step 4: Kasiski Examination (for Polyalphabetic)
from math import gcd
from functools import reduce
def kasiski(ciphertext, min_len=3):
"""Find repeated sequences and their distances → key length."""
text = ''.join(c for c in ciphertext.upper() if c.isalpha())
distances = []
for length in range(min_len, min(20, len(text) // 3)):
for i in range(len(text) - length):
seq = text[i:i+length]
j = text.find(seq, i + 1)
while j != -1:
distances.append(j - i)
j = text.find(seq, j + 1)
if not distances:
return None
# Key length is likely GCD of common distances
common_gcds = Counter()
for d in distances:
for factor in range(2, min(d + 1, 30)):
if d % factor == 0:
common_gcds[factor] += 1
print("Likely key lengths (by frequency):")
for length, count in common_gcds.most_common(5):
print(f" Key length {length}: {count} occurrences")
return common_gcds.most_common(1)[0][0]
2. MONOALPHABETIC SUBSTITUTION
2.1 Frequency Analysis Attack
def solve_substitution(ciphertext, interactive=False):
"""Solve monoalphabetic substitution via frequency analysis."""
freq = frequency_analysis(ciphertext)
# English frequency order
eng_order = "ETAOINSRHLDCUMWFGYPBVKJXQZ"
cipher_order = ''.join(c for c, _ in freq.most_common())
# Initial mapping (frequency-based guess)
mapping = {}
for i, c in enumerate(cipher_order):
if i < len(eng_order):
mapping[c] = eng_order[i]
# Apply mapping
result = ""
for c in ciphertext.upper():
result += mapping.get(c, c)
return result, mapping
# Better approach: use automated solvers
# quipqiup.com — online substitution solver
# dcode.fr/monoalphabetic-substitution — with word pattern matching
2.2 Known Plaintext (Crib Dragging)
If part of the plaintext is known (e.g., "flag{" prefix):
def crib_drag_substitution(ciphertext, known_plain, known_cipher):
"""Build partial mapping from known plaintext-ciphertext pair."""
mapping = {}
for p, c in zip(known_plain.upper(), known_cipher.upper()):
mapping[c] = p
# Apply partial mapping
result = ""
for c in ciphertext.upper():
result += mapping.get(c, '?')
return result, mapping
3. CAESAR / ROT CIPHERS
3.1 Brute Force
def caesar_bruteforce(ciphertext):
"""Try all 25 shifts, score by English frequency."""
results = []
for shift in range(26):
decrypted = ""
for c in ciphertext:
if c.isalpha():
base = ord('A') if c.isupper() else ord('a')
decrypted += chr((ord(c) - base - shift) % 26 + base)
else:
decrypted += c
# Chi-squared scoring against English frequency
score = chi_squared_score(decrypted)
results.append((shift, score, decrypted))
results.sort(key=lambda x: x[1])
return results[0] # best match
def chi_squared_score(text):
"""Lower score = closer to English."""
expected = {
'E': 12.7, 'T': 9.1, 'A': 8.2, 'O': 7.5, 'I': 7.0,
'N': 6.7, 'S': 6.3, 'H': 6.1, 'R': 6.0, 'D': 4.3,
'L': 4.0, 'C': 2.8, 'U': 2.8, 'M': 2.4, 'W': 2.4,
'F': 2.2, 'G': 2.0, 'Y': 2.0, 'P': 1.9, 'B': 1.5,
'V': 1.0, 'K': 0.8, 'J': 0.2, 'X': 0.2, 'Q': 0.1, 'Z': 0.1,
}
text = text.upper()
letters = [c for c in text if c.isalpha()]
total = len(letters)
if total == 0:
return float('inf')
freq = Counter(letters)
score = sum(
(freq.get(c, 0) / total * 100 - expected.get(c, 0)) ** 2 / max(expected.get(c, 0.1), 0.1)
for c in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
)
return score
3.2 ROT13 and ROT47
import codecs
# ROT13 (letters only)
rot13 = codecs.decode(ciphertext, 'rot_13')
# ROT47 (ASCII 33-126)
def rot47(text):
return ''.join(
chr(33 + (ord(c) - 33 + 47) % 94) if 33 <= ord(c) <= 126 else c
for c in text
)
4. VIGENERE CIPHER
4.1 Full Attack Workflow
Step 1: Confirm polyalphabetic (IC ≈ 0.04-0.05)
Step 2: Find key length (Kasiski + IC per period)
Step 3: For each key position, solve as single Caesar cipher
Step 4: Assemble key → decrypt
4.2 IC-Based Key Length Detection
def find_vigenere_key_length(ciphertext, max_key=20):
"""Use IC to find Vigenere key length."""
text = [c for c in ciphertext.upper() if c.isalpha()]
results = []
for kl in range(1, max_key + 1):
# Split text into kl columns
columns = [[] for _ in range(kl)]
for i, c in enumerate(text):
columns[i % kl].append(c)
# Average IC across columns
avg_ic = sum(
index_of_coincidence(''.join(col)) for col in columns
) / kl
results.append((kl, avg_ic))
print(f" Key length {kl:2d}: IC = {avg_ic:.4f}")
# Key length with IC closest to 0.065
best = max(results, key=lambda x: x[1])
return best[0]
4.3 Per-Position Frequency Attack
def crack_vigenere(ciphertext, key_length):
"""Crack Vigenere given known key length."""
text = [c for c in ciphertext.upper() if c.isalpha()]
key = ""
for pos in range(key_length):
column = ''.join(text[i] for i in range(pos, len(text), key_length))
# Solve as Caesar cipher
shift, score, _ = caesar_bruteforce(column)
key += chr(shift + ord('A'))
# Decrypt
plaintext = ""
ki = 0
for c in ciphertext:
if c.isalpha():
shift = ord(key[ki % key_length]) - ord('A')
base = ord('A') if c.isupper() else ord('a')
plaintext += chr((ord(c) - base - shift) % 26 + base)
ki += 1
else:
plaintext += c
return key, plaintext
5. AFFINE CIPHER
5.1 Definition
E(x) = (a·x + b) mod 26 where gcd(a, 26) = 1.
Valid a values: 1, 3, 5, 7, 9, 11, 15, 17, 19, 21, 23, 25 (12 values).
5.2 Brute Force (312 combinations)
def crack_affine(ciphertext):
"""Brute force affine cipher: 12 × 26 = 312 combinations."""
valid_a = [a for a in range(1, 26) if gcd(a, 26) == 1]
for a in valid_a:
a_inv = pow(a, -1, 26)
for b in range(26):
plaintext = ""
for c in ciphertext.upper():
if c.isalpha():
y = ord(c) - ord('A')
x = (a_inv * (y - b)) % 26
plaintext += chr(x + ord('A'))
else:
plaintext += c
score = chi_squared_score(plaintext)
if score < 50: # reasonable English
print(f"a={a}, b={b}: {plaintext[:50]}...")
5.3 Known Plaintext
def affine_from_known(plain1, cipher1, plain2, cipher2):
"""Recover (a, b) from two known plaintext-ciphertext pairs."""
p1, c1 = ord(plain1) - ord('A'), ord(cipher1) - ord('A')
p2, c2 = ord(plain2) - ord('A'), ord(cipher2) - ord('A')
# c1 = a*p1 + b, c2 = a*p2 + b
# c1 - c2 = a*(p1 - p2) mod 26
diff_p = (p1 - p2) % 26
diff_c = (c1 - c2) % 26
if gcd(diff_p, 26) != 1:
return None
a = (diff_c * pow(diff_p, -1, 26)) % 26
b = (c1 - a * p1) % 26
return a, b
6. HILL CIPHER
Matrix-based cipher: C = K · P mod 26 where K is an n×n key matrix.
6.1 Known-Plaintext Attack
import numpy as np
def crack_hill(known_plain, known_cipher, n=2):
"""Recover Hill cipher key from known plaintext-ciphertext (mod 26)."""
# Convert to numbers
P = [ord(c) - ord('A') for c in known_plain.upper()]
C = [ord(c) - ord('A') for c in known_cipher.upper()]
# Build matrices (need at least n pairs of n-grams)
P_matrix = np.array(P[:n*n]).reshape(n, n).T
C_matrix = np.array(C[:n*n]).reshape(n, n).T
# K = C · P⁻¹ mod 26
# Need modular matrix inverse
from sympy import Matrix
P_mat = Matrix(P_matrix.tolist())
C_mat = Matrix(C_matrix.tolist())
P_inv = P_mat.inv_mod(26)
K = (C_mat * P_inv) % 26
return K
7. TRANSPOSITION CIPHERS
7.1 Rail Fence
def rail_fence_decrypt(ciphertext, rails):
"""Decrypt rail fence cipher."""
n = len(ciphertext)
# Build the zigzag pattern
pattern = []
for i in range(n):
row = 0
cycle = 2 * (rails - 1)
pos = i % cycle
row = pos if pos < rails else cycle - pos
pattern.append((row, i))
pattern.sort()
# Fill in characters
result = [''] * n
ci = 0
for _, orig_pos in pattern:
result[orig_pos] = ciphertext[ci]
ci += 1
return ''.join(result)
# Brute force all rail counts
for rails in range(2, 20):
print(f"Rails {rails}: {rail_fence_decrypt(ct, rails)[:50]}")
7.2 Columnar Transposition
def columnar_decrypt(ciphertext, key):
"""Decrypt columnar transposition given key word."""
n_cols = len(key)
n_rows = -(-len(ciphertext) // n_cols) # ceiling division
# Determine column order from key
order = sorted(range(n_cols), key=lambda i: key[i])
# Calculate column lengths (some may be shorter)
full_cols = len(ciphertext) % n_cols
if full_cols == 0:
full_cols = n_cols
# Split ciphertext into columns (in key order)
columns = [''] * n_cols
pos = 0
for col_idx in order:
col_len = n_rows if col_idx < full_cols else n_rows - 1
columns[col_idx] = ciphertext[pos:pos + col_len]
pos += col_len
# Read off row by row
plaintext = ''
for row in range(n_rows):
for col in range(n_cols):
if row < len(columns[col]):
plaintext += columns[col][row]
return plaintext
8. XOR CIPHER
8.1 Single-Byte XOR
See symmetric-cipher-attacks Section 4.2 for full implementation.
8.2 Multi-Byte XOR (xortool)
# Automatic key length detection and cracking
xortool ciphertext.bin -l 5 # try key length 5
xortool ciphertext.bin -b # brute force key length
xortool ciphertext.bin -c 20 # assume most common char is space (0x20)
8.3 Known Plaintext XOR
def xor_known_plaintext(ciphertext, known_plain, offset=0):
"""Recover XOR key from known plaintext at given offset."""
key_fragment = bytes(
c ^ p for c, p in zip(ciphertext[offset:], known_plain)
)
print(f"Key fragment: {key_fragment}")
# If repeating key, infer full key from fragment
return key_fragment
9. SPECIAL CIPHERS
9.1 Bacon Cipher
Binary encoding using two typefaces (A=normal, B=bold/italic).
BACON = {
'AAAAA': 'A', 'AAAAB': 'B', 'AAABA': 'C', 'AAABB': 'D',
'AABAA': 'E', 'AABAB': 'F', 'AABBA': 'G', 'AABBB': 'H',
'ABAAA': 'I', 'ABAAB': 'J', 'ABABA': 'K', 'ABABB': 'L',
'ABBAA': 'M', 'ABBAB': 'N', 'ABBBA': 'O', 'ABBBB': 'P',
'BAAAA': 'Q', 'BAAAB': 'R', 'BAABA': 'S', 'BAABB': 'T',
'BABAA': 'U', 'BABAB': 'V', 'BABBA': 'W', 'BABBB': 'X',
'BAAAA': 'Y', 'BAAAB': 'Z',
}
def decode_bacon(text):
"""Decode Bacon cipher: uppercase=B, lowercase=A (or similar mapping)."""
binary = ''.join('B' if c.isupper() else 'A' for c in text if c.isalpha())
result = ''
for i in range(0, len(binary) - 4, 5):
chunk = binary[i:i+5]
result += BACON.get(chunk, '?')
return result
9.2 Polybius Square
1 2 3 4 5
┌──────────
1 │ A B C D E
2 │ F G H I/J K
3 │ L M N O P
4 │ Q R S T U
5 │ V W X Y Z
"HELLO" = "23 15 31 31 34"
9.3 Playfair
5×5 grid cipher encrypting digraphs.
Key: "MONARCHY" → grid:
M O N A R
C H Y B D
E F G I/J K
L P Q S T
U V W X Z
Rules:
Same row → shift right: HE → FE → "GF"
Same col → shift down
Rectangle → swap columns
10. DECISION TREE
Unknown ciphertext — how to identify and break?
│
├─ Step 1: Check encoding
│ ├─ Base64 alphabet with padding? → Decode first, then re-analyze
│ ├─ Hex string? → Convert to bytes, re-analyze
│ ├─ Binary (01)? → Convert to ASCII
│ ├─ Morse (.-/)? → Decode Morse
│ └─ Printable text? → Continue to Step 2
│
├─ Step 2: Character set
│ ├─ Only letters (A-Z)?
│ │ ├─ Compute IC
│ │ │ ├─ IC ≈ 0.065 → Monoalphabetic
│ │ │ │ ├─ Uniform shift in freq? → Caesar → brute force 25
│ │ │ │ ├─ Random-looking mapping? → Simple substitution → frequency analysis
│ │ │ │ └─ Digraph patterns? → Playfair → digraph analysis
│ │ │ │
│ │ │ ├─ IC ≈ 0.04-0.05 → Polyalphabetic
│ │ │ │ ├─ Kasiski → find key length
│ │ │ │ └─ Per-position frequency → crack Vigenere
│ │ │ │
│ │ │ └─ IC ≈ 0.038 → Very long key or one-time pad
│ │ │ └─ Look for key reuse or weak key generation
│ │ │
│ │ └─ Letters appear scrambled (right freq, wrong order)?
│ │ └─ Transposition
│ │ ├─ Rail fence → brute force rail count
│ │ └─ Columnar → try common key lengths
│ │
│ ├─ Numbers (digit pairs)?
│ │ ├─ Pairs in range 11-55 → Polybius square
│ │ └─ Numbers mod 26 → numeric substitution
│ │
│ ├─ Mixed case with pattern?
│ │ └─ Upper/lower encodes binary → Bacon cipher
│ │
│ └─ Non-printable bytes?
│ └─ XOR cipher
│ ├─ Single-byte key → brute force 256
│ ├─ Repeating key → xortool / Hamming distance
│ └─ Known plaintext → direct key recovery
│
└─ Step 3: Apply specific attack
├─ Substitution → quipqiup.com / frequency analysis
├─ Caesar → dcode.fr / brute force
├─ Vigenere → Kasiski + per-column Caesar
├─ Affine → brute force 312 combinations
├─ Hill → known-plaintext matrix attack
├─ Transposition → pattern analysis + brute force
└─ XOR → xortool / crib dragging
11. TOOLS
| Tool | Purpose | URL/Usage |
|---|---|---|
| CyberChef | Universal encoding/cipher Swiss army knife | gchq.github.io/CyberChef |
| dcode.fr | 200+ cipher solvers online | dcode.fr |
| quipqiup | Automated substitution cipher solver | quipqiup.com |
| xortool | XOR cipher analysis and cracking | pip install xortool |
| RsaCtfTool | RSA + some classical cipher support | GitHub |
| Ciphey | Automated cipher detection and decryption | pip install ciphey |
| hashID | Identify hash types | pip install hashid |
| Python | Custom frequency analysis and scripting | All attacks above |
CyberChef Recipes (Common)
ROT13: ROT13
Caesar brute force: ROT13 (with offset slider)
Base64 decode: From Base64
Hex decode: From Hex
XOR: XOR (key as hex/utf8)
Vigenere: Vigenère Decode
Morse: From Morse Code