Opponent Archetype Classifier

Example
Workflow
Common Patterns
Guardrails
Quick Reference

Example

Scenario: Fantasy baseball league, Week 5. Classify opponent "Manager A" into one of six archetypes -- balanced, stars_and_scrubs, punt_sv, punt_sb, punt_wins_qs, hitter_heavy.

Inputs (abbreviated; full taxonomy in resources/template.md):

archetype_taxonomy:
  balanced:          {prior: 0.30, feature_distributions: {sp_roster_share: {mean: 0.40, std: 0.06}, closer_count: {mean: 2.0, std: 0.6}, sb_speed_count: {mean: 3.0, std: 1.0}, moves_per_week: {mean: 2.5, std: 1.0}, bid_aggression: {low: 0.5, high: 0.5}}}
  stars_and_scrubs:  {prior: 0.15, ...}
  punt_sv:           {prior: 0.15, feature_distributions: {closer_count: {mean: 0.3, std: 0.4}, ...}}
  punt_sb:           {prior: 0.15, feature_distributions: {sb_speed_count: {mean: 0.8, std: 0.7}, ...}}
  punt_wins_qs:      {prior: 0.10, feature_distributions: {sp_roster_share: {mean: 0.20, std: 0.05}, closer_count: {mean: 3.0, std: 0.8}, moves_per_week: {mean: 4.0, std: 1.2}, ...}}
  hitter_heavy:      {prior: 0.15, feature_distributions: {sp_roster_share: {mean: 0.28, std: 0.05}, ...}}

observed_features:
  sp_roster_share:   0.22    # low -- thin on starters
  closer_count:      3       # heavy RP
  sb_speed_count:    2       # moderate speed
  moves_per_week:    4.2     # high activity
  bid_aggression:    high    # active FAAB bidder

observation_weight: 0.7        # ~4 weeks of data

Computation (per-feature log-likelihood, summed, exponentiated, multiplied by prior, normalized):

Archetype	log L(features)	L * prior	Normalized Posterior
balanced	-14.8	1.1e-7	0.04
stars_and_scrubs	-12.5	5.6e-7	0.21
punt_sv	-18.2	1.8e-9	0.00
punt_sb	-22.6	2.3e-11	0.00
punt_wins_qs	-10.1	1.8e-6	0.68
hitter_heavy	-13.2	4.2e-7	0.16

Outputs:

posterior:
  balanced: 0.04
  stars_and_scrubs: 0.21
  punt_sv: 0.00
  punt_sb: 0.00
  punt_wins_qs: 0.68
  hitter_heavy: 0.16

map_archetype: punt_wins_qs

classification_confidence: 47.6     # = max_posterior (0.68) * observation_weight (0.7) * 100

best_response_hints:
  - "Concede K and QS; lock 6 of remaining 8 cats"
  - "Don't stream starting pitchers against them"
  - "They will dominate SV and ratios via all-RP staff; push hitting cats hard"

feature_contribution_breakdown:
  sp_roster_share:  {map_likelihood: 0.92, alternative_max: 0.15, likelihood_ratio: 6.1}   # low SP share -- strong punt_wins_qs signal
  closer_count:     {map_likelihood: 0.52, alternative_max: 0.46, likelihood_ratio: 1.1}   # 3 closers fits several archetypes
  sb_speed_count:   {map_likelihood: 0.38, alternative_max: 0.41, likelihood_ratio: 0.9}   # not discriminating
  moves_per_week:   {map_likelihood: 0.33, alternative_max: 0.28, likelihood_ratio: 1.2}
  bid_aggression:   {map_likelihood: 0.70, alternative_max: 0.60, likelihood_ratio: 1.2}

assumptions_flagged:
  - "Conditional independence across features assumed -- sp_roster_share and moves_per_week may be correlated (active punt strategy drives both)"
  - "Feature distributions are SME priors, not empirically fit -- refresh after season 1 with posterior data"

Note on confidence: posterior peaks at 0.68 but observation_weight=0.7 (only 4 weeks of data) dampens confidence to 47.6. Above the 40 threshold, so MAP is reported; but caller is advised that another 2-3 weeks of observation will sharpen the call.

Workflow

Copy this checklist and track progress:

Opponent Archetype Classification Progress:
- [ ] Step 1: Load archetype taxonomy (names, priors, feature distributions)
- [ ] Step 2: Collect observed features for the target opponent
- [ ] Step 3: Compute per-feature likelihood under each archetype
- [ ] Step 4: Combine likelihoods (assume conditional independence; flag it)
- [ ] Step 5: Apply Bayes rule, normalize posterior
- [ ] Step 6: Select MAP archetype; compute confidence
- [ ] Step 7: Check inconclusive threshold; report or defer
- [ ] Step 8: Produce feature-contribution breakdown and best-response hints

Step 1: Load archetype taxonomy

The caller supplies the taxonomy. See resources/template.md for required fields.

Each archetype has a name, a prior probability, and a feature distribution per observable
Priors sum to 1.0 (normalize if not)
Each feature has either a parametric distribution (Gaussian with mean/std) or a categorical (value -> probability)
Each archetype has a documented best_response string array

Step 2: Collect observed features

Observed features must match feature names in the taxonomy. Missing features are dropped (not imputed) and flagged.

Feature names match taxonomy keys exactly
Numeric features in the taxonomy's expected units
Categorical features use the taxonomy's category labels
observation_weight (0-1) supplied; rises with sample size -- see methodology.md

Step 3: Compute per-feature likelihood

For each (archetype, feature) pair compute P(feature_value | archetype). See methodology.md.

Gaussian feature: L = (1/(std*sqrt(2*pi))) * exp(-0.5 * ((x - mean)/std)^2)
Categorical feature: L = P_archetype[category] with Laplace smoothing if zero
Work in log-space (log L) to avoid underflow when combining 5+ features

Step 4: Combine likelihoods (conditional independence)

First-approximation assumption: features are conditionally independent given archetype. This is rarely exactly true; flag it.

Sum log-likelihoods across features per archetype
Flag correlated feature pairs (e.g., sp_roster_share and moves_per_week in fantasy baseball)
If two features are strongly correlated, consider merging them or down-weighting one; document the choice

Step 5: Apply Bayes rule, normalize

posterior_unnorm[a] = exp(sum_log_L[a]) * prior[a]
posterior[a] = posterior_unnorm[a] / sum_a(posterior_unnorm[a])

Compute in log-space for numerical stability, then subtract max before exponentiating
Verify sum(posterior) == 1.0 within rounding
Document any archetype with posterior < 1e-6 as "ruled out"

Step 6: Select MAP archetype; compute confidence

map_archetype = argmax(posterior)
classification_confidence = max(posterior) * observation_weight * 100

MAP is the most likely archetype, not the only possibility
Confidence blends how peaked the posterior is with how much we trust the observation

Step 7: Inconclusive threshold

If classification_confidence < 40:

Return map_archetype: "inconclusive"
Return the full posterior anyway (caller may still act on the top-2)
Advise: "Collect 2-3 more weeks of observation before committing to a classification"

Step 8: Feature-contribution breakdown + best-response hints

For each feature compute likelihood ratio: L(feature | MAP) / max_{a != MAP} L(feature | a)
Pull best_response_hints from the MAP archetype's documented string array
Return alongside the posterior so caller sees why this archetype was selected

Common Patterns

Pattern 1: Fantasy sports (baseball, basketball, hockey) manager archetypes

Taxonomy: 5-8 archetypes like balanced, punt_<cat>, stars_and_scrubs, inactive covering category-league strategy.
Features: roster composition by position, transaction frequency, FAAB aggression, lineup-setting accuracy.
Conditional-independence risk: several features covary (an inactive manager has low moves AND low bids AND stale lineup). Down-weight or collapse.
Observation weight: 0.3 by Week 2, 0.6 by Week 5, 0.85 by Week 10.

Pattern 2: Poker opponent archetypes

Taxonomy: tight_aggressive (TAG), loose_aggressive (LAG), tight_passive (rock), loose_passive (calling station), maniac.
Features: VPIP, PFR, 3-bet %, aggression factor, c-bet frequency, showdown frequency.
Conditional-independence risk: VPIP and PFR are tightly linked; keep both but note the correlation.
Observation weight: rises sharply with hand count -- 0.4 at 100 hands, 0.75 at 500 hands, 0.9 at 2000+.

Pattern 3: DFS lineup-construction archetypes

Taxonomy: cash_game_optimizer, GPP_ceiling_chaser, contrarian_pivot, chalk_herding.
Features: average salary usage, stack count, ownership-vs-projection ratio, tournament vs cash entry split.
Conditional-independence risk: stack count and ownership-vs-projection covary for GPP players.

Pattern 4: M&A / auction bidder archetypes

Taxonomy: strategic_premium_bidder, financial_disciplined_bidder, fishing_expedition, structured_earnout_preferer.
Features: announced bid count per quarter, strategic vs financial press-release language, premium multiple paid, deal structure (cash vs stock vs earnout).
Observation weight: low at any single deal; rises with repeated bidding history.

Guardrails

Conditional independence is almost never exactly true. Always flag it. If two features are strongly correlated (|r| > 0.6), merge them into a single composite feature or down-weight one by 50%. Otherwise the confident archetype gets credited twice for the same underlying signal.
Priors matter when data is thin. Uniform priors are a choice, not a neutral default. Use domain-informed priors when the population distribution is known (e.g., in a 12-team fantasy league, inactive has a real base rate of ~1-2 managers, not 1/12).
Numeric stability: work in log-space. Multiplying 5+ small likelihoods underflows to 0 in float64. Sum log-likelihoods, subtract the max log-posterior before exponentiating, then normalize.
Laplace smoothing for categoricals. If an archetype has P(category=X) = 0 in its distribution and the observation is X, the posterior for that archetype becomes 0 -- permanently ruling it out on one data point. Apply add-epsilon smoothing (epsilon = 0.01 is typical).
Inconclusive is a feature, not a failure. A low-confidence classification is valuable information -- it tells the caller to gather more data before committing. Don't force a MAP when confidence is below 40.
Best-response hints come from the taxonomy, not the classifier. The skill should not invent strategy; it should retrieve what the taxonomy author documented for that archetype. If the taxonomy's best_response is empty, return an empty array and note that the taxonomy needs enrichment.
Posterior is a distribution, not a point estimate. When downstream agents consume the output, they should ideally consume the full posterior (and make expected-value decisions over it) rather than collapsing to MAP. Expose both.
Update sequentially as new observations arrive. The current week's posterior becomes next week's prior. See methodology.md for the recursive formula.
Feature contribution breakdown guards against overfitting. If one feature's likelihood ratio is > 10, that single feature is driving the classification -- verify the feature was measured correctly before trusting the result.
Refresh feature distributions with empirical data once available. Initial distributions are SME priors. After N opponents have been labelled and observed, fit the distributions empirically and replace the SME priors.

Quick Reference

Key formulas:

Gaussian likelihood:
  L(x | a) = (1 / (std_a * sqrt(2*pi))) * exp(-0.5 * ((x - mean_a) / std_a)^2)

Categorical likelihood (with Laplace smoothing, epsilon = 0.01):
  L(x = c | a) = (count_a[c] + epsilon) / (sum_c' count_a[c'] + epsilon * num_categories)

Joint likelihood (conditional independence assumption):
  L(features | a) = prod_f L(feature_f | a)

Log-space joint likelihood:
  log L(features | a) = sum_f log L(feature_f | a)

Bayes posterior:
  posterior(a) proportional to L(features | a) * prior(a)
  posterior(a) = posterior_unnorm(a) / sum_a' posterior_unnorm(a')

MAP selection:
  map_archetype = argmax_a posterior(a)

Classification confidence:
  confidence = max_a posterior(a) * observation_weight * 100
  if confidence < 40:
      map_archetype = "inconclusive"

Sequential update (week t):
  prior_t(a) = posterior_{t-1}(a)
  posterior_t(a) proportional to L(new_features_t | a) * prior_t(a)

Feature contribution (likelihood ratio):
  LR(feature_f) = L(feature_f | MAP) / max_{a != MAP} L(feature_f | a)

Confidence bands:

Confidence	Interpretation	Action
0-39	Inconclusive	Gather more data
40-59	Weak MAP	Treat MAP as tentative; hedge downstream decisions
60-79	Solid MAP	Act on MAP but keep top-2 in mind
80-100	Confident MAP	Commit to MAP

Key resources:

resources/template.md: Archetype taxonomy schema (YAML), observed-features schema, output schema, fully worked 6-archetype fantasy baseball example.
resources/methodology.md: Bayesian classification primer, likelihood function design, characteristic-feature distribution construction (empirical vs SME), conditional independence assumption and when it breaks, sequential posterior updating.
resources/evaluators/rubric_opponent_archetype_classifier.json: 10 quality criteria for evaluating the output.

Inputs required:

archetype_taxonomy: dict of archetype_name -> {prior, feature_distributions, best_response}
observed_features: dict of feature_name -> observed value
observation_weight: 0-1, how much to trust the observation vs the prior
archetype_prior (optional): dict of archetype_name -> prior; defaults to taxonomy priors (or uniform)

Outputs produced:

posterior: dict<archetype, probability>, sums to 1
map_archetype: string, most likely archetype (or "inconclusive")
classification_confidence: 0-100
best_response_hints: string[], pulled from the MAP archetype's documented best-response
feature_contribution_breakdown: dict<feature, {map_likelihood, alternative_max, likelihood_ratio}>
assumptions_flagged: string[], e.g., correlated features, smoothing applied, priors forced uniform