algo-rank-bayesian
Bayesian Average Rating
Overview
Bayesian average combines an item's observed average rating with a prior (global average), weighted by review count. Formula: BR = (C × m + Σrᵢ) / (C + n) where m=global mean, C=confidence parameter, n=item reviews, Σrᵢ=sum of item ratings. Items with few reviews are pulled toward the global mean.
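The formula maps directly onto a few lines of Python. This is a minimal sketch, not the contents of scripts/bayesian_avg.py. The function name `bayesian_average` and its argument order are illustrative assumptions.

```python
def bayesian_average(sum_ratings: float, n: int, m: float, C: float) -> float:
    """Return BR = (C * m + sum of item ratings) / (C + n).

    sum_ratings -- sum of the item's ratings (the Σrᵢ term)
    n           -- number of reviews for the item
    m           -- global mean rating (the prior)
    C           -- confidence parameter (phantom votes placed at m)
    """
    if C + n <= 0:
        raise ValueError("C + n must be positive")
    return (C * m + sum_ratings) / (C + n)


# Three 5-star reviews, global mean 3.5, C = 10: pulled well below 5.0
print(round(bayesian_average(15.0, 3, 3.5, 10), 3))  # 3.846
```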
When to Use
Trigger conditions:
- Ranking items by continuous ratings (1-5 stars) with varying review counts
- IMDB-style "Top 250" lists that balance quality and popularity
- Any rating aggregation where new items shouldn't dominate with few high ratings
When NOT to use:
- Binary (upvote/downvote) data: use Wilson Score instead
- All items have similar review counts: a simple average is sufficient
Algorithm
IRON LAW: The Prior Protects Against Small-Sample Extremes
Without a prior, a single 5-star review makes an item "the best."
The Bayesian average adds C "phantom votes" at the global mean m,
shrinking small-sample items toward average. C controls shrinkage
strength: higher C = more conservative (more phantom votes).
Typical C = median review count across all items.
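The "phantom votes" reading can be checked literally. In this sketch, which assumes an integer C so the padding is well defined, padding an item's real ratings with C copies of m and taking a plain mean gives the same number as the closed-form Bayesian average. The ratings and C = 15 are made-up illustration values.

```python
from statistics import mean


def closed_form(ratings: list[float], m: float, C: int) -> float:
    # (C*m + sum of ratings) / (C + n)
    return (C * m + sum(ratings)) / (C + len(ratings))


def with_phantom_votes(ratings: list[float], m: float, C: int) -> float:
    # Literally add C votes at the global mean, then take the ordinary average
    return mean(ratings + [m] * C)


one_review = [5.0]  # a single 5-star review
print(closed_form(one_review, 3.5, 15))         # 3.59375, not 5.0
print(with_phantom_votes(one_review, 3.5, 15))  # 3.59375
```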
Phase 1: Input Validation
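Collect per item: review count (n) and either the average rating or the sum of ratings. Validate that every rating uses the same scale and that no review count is negative. Then compute the global mean rating (m) across all items and choose C (phantom vote count). Gate: m computed, C selected, item data available and validated.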
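A sketch of Phase 1 under stated assumptions: each item is a dict with hypothetical keys `n` and `sum_ratings`, all ratings share one known scale (1 to 5 here), and C defaults to the median review count. `prepare_prior` is an illustrative name, not part of the referenced script.

```python
from statistics import median


def prepare_prior(items: list[dict], scale: tuple[float, float] = (1.0, 5.0)) -> tuple[float, float]:
    """Validate per-item data, then return (m, C)."""
    lo, hi = scale
    total_sum, total_n = 0.0, 0
    for it in items:
        n, s = it["n"], it["sum_ratings"]
        if n < 0:
            raise ValueError(f"negative review count: {it}")
        if n == 0 and s != 0:
            raise ValueError(f"ratings sum without any reviews: {it}")
        # An item's average must fall inside the rating scale
        if n > 0 and not (lo <= s / n <= hi):
            raise ValueError(f"average outside {scale}: {it}")
        total_sum += s
        total_n += n
    if total_n == 0:
        raise ValueError("no reviews at all; global mean is undefined")
    m = total_sum / total_n              # Σ(all ratings) / Σ(all review counts)
    C = median(it["n"] for it in items)  # common default for the phantom vote count
    return m, C


items = [{"n": 4, "sum_ratings": 18.0}, {"n": 10, "sum_ratings": 35.0}, {"n": 0, "sum_ratings": 0.0}]
print(prepare_prior(items))
```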
Phase 2: Core Algorithm
- Global mean: m = Σ(all ratings) / Σ(all review counts)
- Bayesian average per item: BR = (C × m + n × avg_rating) / (C + n)
- Rank items by BR descending
- For items with n >> C, BR ≈ avg_rating (data dominates). For n << C, BR ≈ m (prior dominates).
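Phase 2 as a sketch, with a hypothetical `Item` record holding the review count and raw average. Rewriting BR as w × avg_rating + (1 − w) × m with w = n / (C + n) makes the two limits above visible: w → 1 when n >> C and w → 0 when n << C. The catalog values are made up.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    n: int             # review count
    avg_rating: float  # raw average rating (ignored when n == 0)


def bayesian_rank(items: list[Item], m: float, C: float) -> list[tuple[str, float]]:
    """Score every item with BR and return (name, BR) pairs, best first."""
    scored = []
    for it in items:
        w = it.n / (C + it.n)                 # weight on the item's own data
        br = w * it.avg_rating + (1 - w) * m  # same as (C*m + n*avg) / (C + n)
        scored.append((it.name, br))
    # Sort by BR descending; ties keep input order because sorted() is stable
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


catalog = [Item("new_hit", 2, 5.0), Item("steady", 400, 4.3), Item("unrated", 0, 0.0)]
for name, br in bayesian_rank(catalog, m=3.5, C=20):
    print(name, round(br, 3))
```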
Phase 3: Verification
Check: items with very few reviews should score near the global mean, and items with many reviews near their raw average. Spot-check that the top of the ranking looks sensible. Gate: shrinkage behavior confirmed; top items have both high ratings AND sufficient reviews.
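The shrinkage check can be automated. Because BR is a weighted average of m and the raw average, it must sit between the two, and its distance from the raw average, C / (C + n) × |avg − m|, must never grow as n increases. A sketch of that check, using m = 6.8 and C = 500 with made-up review counts:

```python
def bayesian_average(avg: float, n: int, m: float, C: float) -> float:
    return (C * m + n * avg) / (C + n)


def shrinkage_ok(avg: float, m: float, C: float, counts: list[int], tol: float = 1e-9) -> bool:
    """True if BR stays between m and avg and approaches avg as n grows."""
    lo, hi = min(m, avg), max(m, avg)
    prev_gap = float("inf")
    for n in sorted(counts):
        br = bayesian_average(avg, n, m, C)
        if not (lo - tol <= br <= hi + tol):
            return False                    # escaped the [m, avg] interval
        gap = abs(br - avg)                 # equals C / (C + n) * |avg - m|
        if gap > prev_gap + tol:
            return False                    # more reviews pulled BR away from avg
        prev_gap = gap
    return True


print(shrinkage_ok(avg=9.1, m=6.8, C=500, counts=[0, 1, 10, 500, 5000, 100_000]))  # True
```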
Phase 4: Output
Return ranked items with Bayesian scores.
Output Format
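```json
{
  "rankings": [{"item": "Movie_A", "bayesian_avg": 8.89, "raw_avg": 9.1, "reviews": 5000, "shrinkage": 0.09}],
  "metadata": {"global_mean": 6.8, "confidence_C": 500, "items_ranked": 10000}
}
```
Here bayesian_avg = (500 × 6.8 + 5000 × 9.1) / 5500 ≈ 8.89, and shrinkage is the weight on the prior, C / (C + n) = 500 / 5500 ≈ 0.09.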
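A sketch that emits this shape with Python's standard `json` module. The input keys (`item`, `raw_avg`, `reviews`) are assumed to mirror the output fields, and `shrinkage` is assumed to mean the weight on the prior, C / (C + n).

```python
import json


def build_output(items: list[dict], m: float, C: float) -> str:
    """Rank items by BR and serialize them in the Output Format shape."""
    scored = []
    for it in items:
        n, avg = it["reviews"], it["raw_avg"]
        br = (C * m + n * avg) / (C + n)
        scored.append((br, {
            "item": it["item"],
            "bayesian_avg": round(br, 2),
            "raw_avg": avg,
            "reviews": n,
            "shrinkage": round(C / (C + n), 2),  # assumed meaning: weight on the prior
        }))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # sort on unrounded BR
    return json.dumps({
        "rankings": [row for _, row in scored],
        "metadata": {"global_mean": m, "confidence_C": C, "items_ranked": len(scored)},
    }, indent=2)


print(build_output([{"item": "Movie_A", "raw_avg": 9.1, "reviews": 5000}], m=6.8, C=500))
```

With that single item the sketch reports bayesian_avg 8.89 and shrinkage 0.09.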
Examples
Sample I/O
Input: m=7.0, C=100. Item A: avg=9.5, n=5. Item B: avg=8.5, n=500. Expected: BR_A = (100×7 + 5×9.5)/(105) = 7.12. BR_B = (100×7 + 500×8.5)/(600) = 8.25. B ranks higher.
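The sample can be reproduced directly. `bayesian_average` here is an illustrative helper that takes the raw average and review count.

```python
def bayesian_average(avg: float, n: int, m: float, C: float) -> float:
    # BR = (C*m + n*avg) / (C + n)
    return (C * m + n * avg) / (C + n)


m, C = 7.0, 100
br_a = bayesian_average(9.5, 5, m, C)    # (700 + 47.5) / 105
br_b = bayesian_average(8.5, 500, m, C)  # (700 + 4250) / 600
print(f"{br_a:.2f} {br_b:.2f}")          # 7.12 8.25
assert br_b > br_a  # B ranks higher despite its lower raw average
```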
Edge Cases
| Input | Expected | Why |
|---|---|---|
| n=0 | BR = m (global mean) | No data, fully prior-driven |
| n=100000 | BR ≈ raw average | Massive sample overwhelms prior |
| All items same n | Equivalent to simple average ranking | Uniform shrinkage, ordering preserved |
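The three table rows translate into assertions. This sketch uses arbitrary values (m = 3.5, C = 50, made-up averages) and checks the behavior the table describes rather than any particular dataset.

```python
def bayesian_average(avg: float, n: int, m: float, C: float) -> float:
    return (C * m + n * avg) / (C + n)


m, C = 3.5, 50

# n = 0: no data, so BR equals the prior
assert bayesian_average(4.8, 0, m, C) == m

# n = 100000: prior weight C/(C+n) is about 0.0005, so BR is within 0.001 of the raw average
assert abs(bayesian_average(4.8, 100_000, m, C) - 4.8) < 1e-3

# Same n for every item: BR is an increasing linear function of avg, so order is preserved
avgs = [4.1, 2.7, 4.9, 3.3]
by_raw = sorted(range(len(avgs)), key=lambda i: avgs[i], reverse=True)
by_br = sorted(range(len(avgs)), key=lambda i: bayesian_average(avgs[i], 20, m, C), reverse=True)
assert by_raw == by_br
print("edge cases OK")
```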
Gotchas
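- C selection is subjective: common choices are the median review count or the minimum number of reviews for a "reliable" rating. IMDB's published Top 250 formula has used a minimum-vote threshold (25,000 votes at one point) as its prior weight; note that IMDB's notation swaps the letters, with m as the vote threshold and C as the global mean. There is no universally correct value.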
- Rating scale matters: A 4.0 on a 5-point scale means something different than 4.0 on a 10-point scale. Normalize or use the same scale.
- Category-specific priors: A 4.0 average in "horror movies" might be exceptional, while 4.0 in "Studio Ghibli" might be below average. Consider category-level priors.
- Temporal bias: Old items accumulate reviews. Unless you weight recent reviews more, established items permanently dominate "top" lists.
- Review gaming: Bayesian average doesn't prevent review manipulation — it only mitigates small-sample extremes. Pair with fraud detection.
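Several of these gotchas can be made concrete. The sketch below is illustrative only: the items, category keys, and scale bounds are hypothetical, the linear rescale assumes both scales' endpoints correspond (1 to 1, 5 to 10), and exponential time decay with a half-life is just one possible way to weight recent reviews, not something the method above prescribes.

```python
from collections import defaultdict


def bayesian_average(avg: float, n: float, m: float, C: float) -> float:
    # BR = (C*m + n*avg) / (C + n)
    return (C * m + n * avg) / (C + n)


# C selection: the winner can depend on C (hypothetical items: name -> (avg, n))
ITEMS = {"A": (4.9, 3), "B": (4.4, 250)}

def top_item(m: float, C: float) -> str:
    return max(ITEMS, key=lambda k: bayesian_average(ITEMS[k][0], ITEMS[k][1], m, C))


# Rating scale: map every rating onto one scale before computing m or BR
def rescale(r: float, src: tuple[float, float], dst: tuple[float, float]) -> float:
    """Linearly map rating r from scale src=(lo, hi) onto scale dst=(lo, hi)."""
    (a, b), (c, d) = src, dst
    return c + (r - a) * (d - c) / (b - a)


# Category priors: review-weighted mean per category; categories with no
# reviews are omitted, so look them up with priors.get(category, global_m)
def category_priors(items: list[dict]) -> dict[str, float]:
    sums: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for it in items:  # hypothetical keys: category, n, sum_ratings
        sums[it["category"]] += it["sum_ratings"]
        counts[it["category"]] += it["n"]
    return {cat: sums[cat] / counts[cat] for cat in sums if counts[cat] > 0}


# Temporal bias: each review counts as 0.5 ** (age / half_life) votes
def decayed_bayesian_average(reviews: list[tuple[float, float]], m: float, C: float,
                             half_life_days: float = 365.0) -> float:
    """reviews: (rating, age_in_days) pairs."""
    weights = [0.5 ** (age / half_life_days) for _, age in reviews]
    weighted_sum = sum(w * r for w, (r, _) in zip(weights, reviews))
    return (C * m + weighted_sum) / (C + sum(weights))


print(top_item(3.6, 1), top_item(3.6, 25))                              # A B
print(rescale(4.0, (1, 5), (1, 10)))                                    # 7.75
print(decayed_bayesian_average([(5.0, 0.0), (5.0, 365.0)], 3.0, 1.5))  # 4.0
```

With these made-up items, C = 1 lets A's three high ratings win while C = 25 ranks B first. The rescale line also shows that 4.0 on a 1 to 5 scale corresponds to 7.75 on a 1 to 10 scale, not 8.0.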
Scripts
| Script | Description | Usage |
|---|---|---|
| scripts/bayesian_avg.py | Rank items using Bayesian average to handle small-sample extremes | python scripts/bayesian_avg.py --help |
Run python scripts/bayesian_avg.py --verify to execute built-in sanity tests.
References
- For the IMDB weighted rating formula, see references/imdb-formula.md
- For multi-dimensional Bayesian rating, see references/multi-dimensional.md