VADER Sentiment Analysis

Overview

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment tool optimized for social media. Returns compound score [-1, +1] combining positive, negative, and neutral proportions. Runs in O(n) per text where n = word count. No training required.

When to Use

Trigger conditions:

Analyzing sentiment in social media posts, tweets, or reviews
Quick sentiment scoring without ML model training
Processing text with slang, emoticons, and informal language

When NOT to use:

For formal/academic text (VADER is tuned for social media)
When domain-specific sentiment matters (e.g., financial sentiment — use FinBERT)
When sarcasm detection is critical (VADER doesn't detect sarcasm)

Algorithm

IRON LAW: VADER Is Designed for SOCIAL MEDIA Text
It handles slang, emoticons, capitalization, and punctuation as
sentiment modifiers. Applying it to formal documents (legal, academic,
medical) produces unreliable scores. For domain-specific text, use
domain-trained models instead.

Phase 1: Input Validation

Tokenize text. Preserve: capitalization (ALL CAPS = emphasis), punctuation (! amplifies), emoticons/emoji. Gate: Text is non-empty, encoding handled correctly.

Phase 2: Core Algorithm

Look up each token in VADER lexicon (7,500+ sentiment-rated terms)
Apply grammatical rules: negation ("not good" = negative), degree modifiers ("very good" > "good"), capitalization boost, punctuation amplification
Compute raw valence scores for positive, negative, neutral proportions
Compute compound score: normalized sum of all valence scores using formula: compound = sum / √(sum² + α) where α = 15

Phase 3: Verification

Classify: compound ≥ 0.05 → positive, ≤ -0.05 → negative, else neutral. Spot-check sample results. Gate: Classifications pass manual spot-check on 10-20 examples.

Phase 4: Output

Return compound score and polarity classification per text.

Output Format

{
  "results": [{"text": "...", "compound": 0.76, "pos": 0.45, "neu": 0.55, "neg": 0.0, "label": "positive"}],
  "metadata": {"texts_analyzed": 500, "distribution": {"positive": 0.45, "neutral": 0.35, "negative": 0.20}}
}

Examples

Sample I/O

Input: "This product is AMAZING!!! 😍" Expected: compound ≈ 0.87 (positive). Boosted by: CAPS, !!!, 😍 emoji.

Edge Cases

Input	Expected	Why
"Not bad at all"	Slightly positive (~0.2)	Double negation partially handled
"😂😂😂"	Positive	Emoji mapped in lexicon
Empty string	Compound = 0, neutral	No tokens to score

Gotchas

Sarcasm is invisible: "Oh great, another meeting" reads as positive. VADER has no sarcasm detection.
Negation window: VADER applies negation within a 3-word window. "I do not think this is bad" may misparse the negation chain.
Emoji coverage: VADER's emoji lexicon may not cover newer emoji. Update or supplement as needed.
Language limitation: VADER is English-only. For Chinese/Japanese, use language-specific tools (e.g., SnowNLP for Chinese).
Compound threshold sensitivity: The 0.05 boundary is arbitrary. Adjust thresholds based on your specific use case and tolerance for false positives.

References

For VADER lexicon and rules documentation, see references/vader-rules.md
For comparison with transformer-based sentiment models, see references/model-comparison.md

algo-social-sentiment