Style Analyzer

Overview

Extract a structured, quantitative writing style profile from text samples. The profile captures vocabulary patterns, sentence structure, tone markers, and formatting preferences as measurable metrics that can constrain downstream content generation.

Core principle: Style is measurable. Every writer produces a statistical fingerprint across vocabulary, syntax, and formatting dimensions.

When to Use

Before generating content that should match a specific author's voice
When building a reusable style profile for brand voice consistency
When comparing writing styles across multiple authors or documents
When auditing content for style drift from an established voice

Workflow

Step 1: Ingest Text Samples

Accept text from one or more sources:

# Analyze a single file
node .claude/tools/cli/style-profiler.cjs /path/to/sample.txt

# Analyze a directory of text files
node .claude/tools/cli/style-profiler.cjs /path/to/samples/

# Output is written to .claude/context/data/user-style-profile.json

Alternatively, read files with the Read tool and analyze inline for smaller samples.

Minimum input requirements:

At least 3 text samples for reliable metrics
Each sample should be 200+ words
Samples should represent the author's typical output (not edge cases)
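
The minimums above can be checked before any metrics are computed. The following is a minimal sketch, not the behavior of style-profiler.cjs; `checkSamples` and `countWords` are hypothetical names, and words are counted by splitting on whitespace.

```javascript
// Hypothetical pre-flight check for the minimum input requirements.
const MIN_SAMPLES = 3;
const MIN_WORDS_PER_SAMPLE = 200;

// Count whitespace-separated words in a string.
function countWords(text) {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// Return { ok, problems } describing which minimums are not met.
function checkSamples(samples) {
  const problems = [];
  if (samples.length < MIN_SAMPLES) {
    problems.push(`need at least ${MIN_SAMPLES} samples, got ${samples.length}`);
  }
  samples.forEach((text, i) => {
    const n = countWords(text);
    if (n < MIN_WORDS_PER_SAMPLE) {
      problems.push(`sample ${i + 1} has ${n} words, expected ${MIN_WORDS_PER_SAMPLE}+`);
    }
  });
  return { ok: problems.length === 0, problems };
}
```

Whether a sample is typical of the author cannot be checked mechanically; that judgment stays with whoever selects the samples.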

Step 2: Extract Vocabulary Patterns

Analyze word usage across all samples:

Tokenize all text into words (lowercase, strip punctuation)
Remove stop words (the, a, is, are, etc.)
Compute frequency distribution of remaining words
Extract top 50 vocabulary (most frequently used content words)
Calculate type-token ratio (unique words / total words) as vocabulary richness score
Identify signature phrases (2-3 word combinations that appear 3+ times)
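
The vocabulary steps above can be sketched as follows. This is an illustration under stated assumptions, not the profiler's actual code: the stop-word list is abbreviated, the type-token ratio is computed over all tokens (the text does not say whether stop words are excluded), and phrases are counted within each sample so none spans two samples. All function names are hypothetical.

```javascript
// Abbreviated stop-word list for illustration; a real run would use a fuller list.
const STOP_WORDS = new Set([
  "the", "a", "an", "is", "are", "was", "were", "of", "to", "in", "and", "or", "it", "that", "this",
]);

// Lowercase the text and keep runs of letters/digits, allowing internal apostrophes.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+(?:'[a-z0-9]+)*/g) || [];
}

// Count occurrences of each item in an array.
function countItems(items) {
  const counts = new Map();
  for (const item of items) counts.set(item, (counts.get(item) || 0) + 1);
  return counts;
}

function extractVocabulary(samples, topN = 50, minPhraseCount = 3) {
  const tokens = samples.flatMap(tokenize);
  const contentWords = tokens.filter((w) => !STOP_WORDS.has(w));

  // Most frequent content words, highest count first.
  const topWords = [...countItems(contentWords).entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, topN)
    .map(([word]) => word);

  // Assumption: ratio of unique tokens to all tokens, stop words included.
  const typeTokenRatio = tokens.length ? new Set(tokens).size / tokens.length : 0;

  // 2- and 3-word phrases, built per sample from the unfiltered tokens.
  const ngrams = [];
  for (const sample of samples) {
    const t = tokenize(sample);
    for (const n of [2, 3]) {
      for (let i = 0; i + n <= t.length; i++) ngrams.push(t.slice(i, i + n).join(" "));
    }
  }
  const signaturePhrases = [...countItems(ngrams).entries()]
    .filter(([, count]) => count >= minPhraseCount)
    .map(([phrase]) => phrase);

  return { topWords, typeTokenRatio, signaturePhrases };
}
```

Type-token ratio falls as text gets longer, so ratios are most comparable between profiles built from similar word counts.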

Step 3: Analyze Sentence Structure

Measure syntactic patterns:

Average sentence length (words per sentence)
Sentence length variability (standard deviation of words per sentence, stored as lengthVariance)
Short sentence ratio (sentences under 8 words / total sentences)
Long sentence ratio (sentences over 25 words / total sentences)
Question frequency (questions / total sentences)
Clause complexity (average commas per sentence as a proxy)
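
A minimal sketch of these measurements, assuming a naive sentence splitter that breaks after any run of ., !, or ? (so abbreviations such as "e.g." and decimals will be split incorrectly). `splitSentences` and `analyzeSentences` are hypothetical names; the returned keys mirror the sentenceStructure fields of the profile.

```javascript
// Naive splitter: a sentence ends at a run of . ! ? or at the end of the text.
function splitSentences(text) {
  return (text.match(/[^.!?]+(?:[.!?]+|$)/g) || [])
    .map((s) => s.trim())
    .filter(Boolean);
}

function analyzeSentences(text) {
  const sentences = splitSentences(text);
  const n = sentences.length;
  if (n === 0) return null;

  // Words per sentence, split on whitespace.
  const lengths = sentences.map((s) => s.split(/\s+/).filter(Boolean).length);
  const avg = lengths.reduce((a, b) => a + b, 0) / n;
  // Population standard deviation of sentence length.
  const std = Math.sqrt(lengths.reduce((a, len) => a + (len - avg) ** 2, 0) / n);
  const ratio = (predicate) => lengths.filter(predicate).length / n;
  const commas = sentences.reduce((a, s) => a + (s.match(/,/g) || []).length, 0);

  return {
    avgLength: avg,
    lengthVariance: std, // standard deviation, despite the field name
    shortSentenceRatio: ratio((len) => len < 8),
    longSentenceRatio: ratio((len) => len > 25),
    questionFrequency: sentences.filter((s) => s.endsWith("?")).length / n,
    avgCommasPerSentence: commas / n,
  };
}
```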

Step 4: Identify Tone Markers

Score the writing on five dimensions (each 1.0 to 5.0):

Dimension	1.0 (Low)	5.0 (High)	How Measured
Formality	Very casual	Very formal	Contraction frequency and slang usage (both lower the score), punctuation
Directness	Hedged	Blunt	Hedge word frequency (maybe, perhaps, somewhat)
Emotion	Neutral	Expressive	Exclamation marks, emotional adjectives
Humor	Serious	Playful	Parenthetical asides, informal interjections
Authority	Tentative	Authoritative	Imperative sentences, certainty language
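
The table names the signals but not how they map onto the 1.0 to 5.0 scale. The sketch below shows one possible mapping for a single dimension, Directness, using the hedge words listed in the table. The linear mapping and the `HEDGE_RATE_CAP` calibration value are assumptions made for illustration, and `scoreDirectness` is a hypothetical name.

```javascript
// Hedge words taken from the table; a fuller list would be needed in practice.
const HEDGE_WORDS = new Set(["maybe", "perhaps", "somewhat"]);

// Assumed calibration: this many hedges per 100 words maps to the lowest score (1.0).
const HEDGE_RATE_CAP = 2.0;

// Score directness from 1.0 (hedged) to 5.0 (blunt) using hedge-word frequency.
function scoreDirectness(text) {
  const words = text.toLowerCase().match(/[a-z']+/g) || [];
  if (words.length === 0) return null;

  const hedges = words.filter((w) => HEDGE_WORDS.has(w)).length;
  const per100 = (hedges / words.length) * 100;
  // 0 means no hedging; 1 means at or above the cap.
  const penalty = Math.min(per100 / HEDGE_RATE_CAP, 1);
  // Linear interpolation between 5.0 and 1.0, rounded to one decimal place.
  return Math.round((5 - 4 * penalty) * 10) / 10;
}
```

The other four dimensions could follow the same shape: count the listed signals, normalize by text length, and map the rate onto the scale with a calibration chosen from representative samples.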

Step 5: Detect Formatting Preferences

Analyze structural patterns:

Average paragraph length (sentences per paragraph)
Heading depth (H1 only, H1-H2, H1-H3, etc.)
List frequency (bulleted/numbered lists per 1000 words)
Code block frequency (code blocks per 1000 words, if technical)
Bold/italic usage (emphasis markers per 1000 words)
Punctuation signature (em-dash, semicolon, ellipsis, and exclamation mark frequency)
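
The structural metrics can be sketched as a single pass over the lines of a sample. This assumes the samples are Markdown, which the text does not state, and it skips the punctuation signature because the unit for those frequencies is not specified. `analyzeFormatting` is a hypothetical name; only asterisk-delimited emphasis is counted, to avoid false matches on snake_case identifiers.

```javascript
const FENCE = "`".repeat(3); // Markdown code-fence marker

function analyzeFormatting(markdown) {
  let inCode = false;
  let prevWasListItem = false;
  let words = 0, lists = 0, codeBlocks = 0, emphasis = 0, headingDepth = 0;
  const paragraphs = [];
  let current = [];
  const flush = () => {
    if (current.length) paragraphs.push(current.join(" "));
    current = [];
  };

  for (const line of markdown.split(/\r?\n/)) {
    if (line.trimStart().startsWith(FENCE)) {
      if (!inCode) codeBlocks += 1; // count each opening fence once
      inCode = !inCode;
      prevWasListItem = false;
      flush();
      continue;
    }
    if (inCode) continue; // code is excluded from word counts and other metrics

    words += (line.match(/\b\w[\w'-]*/g) || []).length; // ASCII word tokens
    const heading = line.match(/^(#{1,6})\s/);
    const isListItem = /^\s*(?:[-*+]|\d+\.)\s+/.test(line);

    if (heading) headingDepth = Math.max(headingDepth, heading[1].length);
    if (isListItem && !prevWasListItem) lists += 1; // a new run of items is one list
    if (line.trim() !== "") prevWasListItem = isListItem; // blank lines do not end a list

    // Strip a leading bullet so "* item" is not mistaken for emphasis.
    const body = line.replace(/^\s*[-*+]\s+/, "");
    emphasis += (body.match(/\*\*[^*\n]+\*\*|\*[^*\n]+\*/g) || []).length;

    if (line.trim() === "" || heading || isListItem) flush();
    else current.push(line.trim());
  }
  flush();

  // Same naive sentence split as a period/question/exclamation run or end of text.
  const sentenceCount = (p) =>
    (p.match(/[^.!?]+(?:[.!?]+|$)/g) || []).filter((s) => s.trim()).length;
  const per1000 = (count) => (words ? (count / words) * 1000 : 0);

  return {
    avgParagraphLength: paragraphs.length
      ? paragraphs.reduce((a, p) => a + sentenceCount(p), 0) / paragraphs.length
      : 0,
    headingDepth,
    listFrequencyPer1000: per1000(lists),
    codeBlockFrequencyPer1000: per1000(codeBlocks),
    emphasisFrequencyPer1000: per1000(emphasis),
  };
}
```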

Step 6: Build Style Profile JSON

Assemble all metrics into a structured profile:

{
  "version": "1.0.0",
  "createdAt": "2026-03-21T00:00:00Z",
  "sampleCount": 5,
  "totalWords": 12450,
  "vocabulary": {
    "topWords": ["specific", "pattern", "implementation", "..."],
    "typeTokenRatio": 0.42,
    "signaturePhrases": ["in practice", "the key insight"]
  },
  "sentenceStructure": {
    "avgLength": 18.3,
    "lengthVariance": 7.2,
    "shortSentenceRatio": 0.15,
    "longSentenceRatio": 0.22,
    "questionFrequency": 0.08,
    "avgCommasPerSentence": 1.4
  },
  "tone": {
    "formality": 2.8,
    "directness": 4.1,
    "emotion": 2.0,
    "humor": 1.5,
    "authority": 3.8
  },
  "formatting": {
    "avgParagraphLength": 3.2,
    "headingDepth": 3,
    "listFrequencyPer1000": 2.1,
    "codeBlockFrequencyPer1000": 0.8,
    "emphasisFrequencyPer1000": 4.5,
    "punctuation": {
      "emDashFrequency": 0.03,
      "semicolonFrequency": 0.01,
      "ellipsisFrequency": 0.005,
      "exclamationFrequency": 0.02
    }
  }
}

Save to .claude/context/data/user-style-profile.json.
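
Assembly and saving might look like the sketch below. It assumes the four metric groups have already been computed elsewhere and are passed in as plain objects. `buildProfile`, `saveProfile`, and the `rootDir` parameter are hypothetical; only the output path comes from this document.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Output path required by this skill, relative to the project root.
const PROFILE_PATH = ".claude/context/data/user-style-profile.json";

// Combine precomputed metric groups with the bookkeeping fields.
function buildProfile({ samples, vocabulary, sentenceStructure, tone, formatting }) {
  const totalWords = samples.reduce(
    (sum, text) => sum + text.trim().split(/\s+/).filter(Boolean).length,
    0
  );
  return {
    version: "1.0.0",
    createdAt: new Date().toISOString(),
    sampleCount: samples.length,
    totalWords,
    vocabulary,
    sentenceStructure,
    tone,
    formatting,
  };
}

// Write the profile as pretty-printed JSON, creating directories as needed.
// rootDir is an assumed parameter; it defaults to the current working directory.
function saveProfile(profile, rootDir = process.cwd()) {
  const target = path.join(rootDir, PROFILE_PATH);
  fs.mkdirSync(path.dirname(target), { recursive: true });
  fs.writeFileSync(target, JSON.stringify(profile, null, 2) + "\n", "utf8");
  return target;
}
```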

Iron Laws

ALWAYS require a minimum of 3 text samples before computing a style profile -- fewer samples produce unreliable metrics that do not generalize.
NEVER include stop words in the top-50 vocabulary list -- stop words are universal and carry no style signal.
ALWAYS save the profile to .claude/context/data/user-style-profile.json -- downstream skills depend on this exact path.
NEVER treat a single metric in isolation as the style fingerprint -- style is the combination of all dimensions; individual metrics can coincide across very different authors.
ALWAYS include the sampleCount and totalWords fields in the profile -- consumers need to assess the statistical reliability of the metrics.
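
Several of these laws can be checked mechanically on a finished profile. The sketch below is one way to do that; `validateProfile` is a hypothetical name, the stop-word list is abbreviated for illustration, and the tone range check comes from the 1.0 to 5.0 scale in Step 4 rather than from the laws themselves.

```javascript
// Abbreviated stop-word list for illustration only.
const COMMON_STOP_WORDS = new Set(["the", "a", "an", "is", "are", "of", "to", "and"]);

// Return a list of human-readable violations; an empty list means the checks passed.
function validateProfile(profile) {
  const errors = [];

  if (!Number.isInteger(profile.sampleCount)) errors.push("sampleCount is missing");
  else if (profile.sampleCount < 3) errors.push("fewer than 3 samples");

  if (!Number.isInteger(profile.totalWords)) errors.push("totalWords is missing");

  const topWords = (profile.vocabulary && profile.vocabulary.topWords) || [];
  if (topWords.length > 50) errors.push("topWords has more than 50 entries");
  const leaked = topWords.filter((w) => COMMON_STOP_WORDS.has(String(w).toLowerCase()));
  if (leaked.length) errors.push(`stop words in topWords: ${leaked.join(", ")}`);

  for (const [name, value] of Object.entries(profile.tone || {})) {
    if (typeof value !== "number" || value < 1 || value > 5) {
      errors.push(`tone.${name} is outside 1.0-5.0`);
    }
  }
  return errors;
}
```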

Anti-Patterns

Anti-Pattern	Why It Fails	Correct Approach
Analyzing a single short sample	Insufficient data; metrics reflect one mood, not style	Require 3+ samples of 200+ words each
Including stop words in vocabulary metrics	Universal words add noise; no discriminative power	Filter all stop words before frequency analysis
Outputting profile to a non-standard path	Downstream skills cannot find the profile	Always write to .claude/context/data/user-style-profile.json
Treating tone as a single number	Tone is multidimensional; a single score loses nuance	Score all five dimensions independently
Skipping formatting analysis	Style includes structure, not just words	Always analyze paragraph length, heading depth, list usage

Assigned Agents

This skill is used by:

voice-replicator-agent -- Primary consumer for style-constrained content generation

Memory Protocol (MANDATORY)

Before starting:

node .claude/lib/memory/memory-search.cjs "style analysis text profiling"

Read .claude/context/memory/learnings.md

After completing:

New style pattern discovered -> .claude/context/memory/learnings.md
Issue with analysis -> .claude/context/memory/issues.md
Decision about metrics -> .claude/context/memory/decisions.md

ASSUME INTERRUPTION: Your context may reset. If it's not in memory, it didn't happen.
