algo-rec-session
Session-Based Recommendation
Overview
Session-based recommendation predicts the next item a user will interact with from the click/view sequence of the current session, without relying on a long-term user profile. Common approaches are Markov chains, association rules, and neural models such as GRU4Rec. Inference runs in real time at O(sequence_length) cost per request.
When to Use
Trigger conditions:
- Anonymous users (no login, no long-term profile)
- Short browsing sessions where recency matters most
- Real-time "next item" prediction during active sessions
When NOT to use:
- When rich user history is available (use collaborative filtering (CF) or content-based methods for better personalization)
- When sessions are extremely short (1-2 clicks): too little signal to infer intent
Algorithm
IRON LAW: First Few Clicks Are Disproportionately Important
Session-based methods operate WITHOUT long-term profiles. Intent must
be inferred from SHORT sequences. The first 2-3 clicks establish the
session's intent — misreading early signals derails the entire session.
Phase 1: Input Validation
Parse the clickstream into sessions, either by session ID or by timeout-based splitting (typically 30 minutes of inactivity). Discard sessions shorter than the minimum length (keep those with 3+ events). Gate: Sessions parsed, minimum length threshold applied.
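The timeout-based splitting described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the `(user_key, timestamp_s, item_id)` event layout, the function name `split_sessions`, and the constant names are assumptions made for the example.

```python
from collections import defaultdict

SESSION_TIMEOUT_S = 30 * 60   # assumed default: 30 minutes of inactivity
MIN_SESSION_LENGTH = 3        # assumed default: keep sessions with 3+ events


def split_sessions(events, timeout_s=SESSION_TIMEOUT_S, min_length=MIN_SESSION_LENGTH):
    """Split (user_key, timestamp_s, item_id) events into per-session item lists."""
    by_user = defaultdict(list)
    for user_key, ts, item_id in events:
        by_user[user_key].append((ts, item_id))

    sessions = []
    for clicks in by_user.values():
        clicks.sort(key=lambda c: c[0])  # chronological order
        current, last_ts = [], None
        for ts, item_id in clicks:
            # A gap longer than the timeout closes the running session.
            if last_ts is not None and ts - last_ts > timeout_s:
                if len(current) >= min_length:
                    sessions.append(current)
                current = []
            current.append(item_id)
            last_ts = ts
        if len(current) >= min_length:
            sessions.append(current)
    return sessions


events = [
    ("u1", 0, "a"), ("u1", 60, "b"), ("u1", 120, "c"),
    ("u1", 4000, "d"), ("u1", 4030, "e"),   # gap > 30 min: new session, too short to keep
    ("u2", 10, "x"), ("u2", 20, "y"), ("u2", 30, "z"), ("u2", 40, "x"),
]
sessions = split_sessions(events)
print(sessions)  # [['a', 'b', 'c'], ['x', 'y', 'z', 'x']]
```

If the clickstream already carries a session ID, grouping by that ID replaces the timeout logic; only the length filter is still needed.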
Phase 2: Core Algorithm
Markov Chain approach:
- Build transition matrix from item-to-item sequences across all sessions
- For current session [A, B, C], predict next item from P(next | C) or higher-order P(next | B, C)
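A minimal first-order sketch of the Markov chain approach, assuming sessions are plain lists of item IDs. The names `build_transitions` and `predict_next` are illustrative. A higher-order variant would key the table on the last two items instead of one.

```python
from collections import Counter, defaultdict


def build_transitions(sessions):
    """Count item-to-item transitions and normalize each row to P(next | current)."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    return {
        item: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for item, followers in counts.items()
    }


def predict_next(transitions, session, k=3):
    """Rank candidates by P(next | last item); empty list if the last item is unseen."""
    followers = transitions.get(session[-1], {}) if session else {}
    ranked = sorted(followers.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


sessions = [["A", "B", "C", "D"], ["B", "C", "D"], ["A", "C", "E"], ["C", "D", "F"]]
transitions = build_transitions(sessions)
print(predict_next(transitions, ["A", "B", "C"]))  # [('D', 0.75), ('E', 0.25)]
```

In the toy data, C is followed by D three times and by E once, which gives the 0.75 / 0.25 split.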
Association Rules approach:
- Mine frequent item sequences (sequential pattern mining)
- Match current session suffix against known patterns
- Recommend items that frequently follow the matched pattern
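The suffix-matching idea can be illustrated with a simplified stand-in for full sequential pattern mining: count which item follows each contiguous run of up to `max_len` items, keep pairs that meet a support threshold, and match the longest suffix of the current session. The function names, `max_len`, and `min_support` are assumptions for this sketch. A production miner would also handle non-contiguous patterns.

```python
from collections import Counter, defaultdict


def mine_rules(sessions, max_len=3, min_support=2):
    """Map each contiguous pattern to {next_item: confidence} for supported pairs."""
    counts = defaultdict(Counter)
    for session in sessions:
        for end in range(1, len(session)):
            for n in range(1, min(max_len, end) + 1):
                counts[tuple(session[end - n:end])][session[end]] += 1

    rules = {}
    for pattern, followers in counts.items():
        total = sum(followers.values())  # times the pattern was followed by anything
        kept = {item: c / total for item, c in followers.items() if c >= min_support}
        if kept:
            rules[pattern] = kept
    return rules


def recommend(rules, session, max_len=3, k=3):
    """Return followers of the longest session suffix that matches a mined pattern."""
    for n in range(min(max_len, len(session)), 0, -1):
        followers = rules.get(tuple(session[-n:]))
        if followers:
            ranked = sorted(followers.items(), key=lambda kv: (-kv[1], kv[0]))
            return ranked[:k]
    return []


sessions = [["A", "B", "C"], ["A", "B", "C"], ["A", "B", "D"], ["X", "B", "E"], ["X", "B", "E"]]
rules = mine_rules(sessions)
print(recommend(rules, ["Q", "A", "B"]))  # matches the suffix (A, B)
print(recommend(rules, ["Q", "B"]))       # falls back to the shorter suffix (B,)
```

Preferring the longest matching suffix uses the most specific context available and degrades to single-item rules when the longer context was never observed.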
Phase 3: Verification
Evaluate with leave-one-out: hide the last item in each session, predict it from the preceding items, and measure hit rate (Hit@K) and MRR (Mean Reciprocal Rank). Gate: Hit@20 significantly above random baseline.
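A sketch of the leave-one-out evaluation, assuming a hypothetical `predict` callable that takes the visible session prefix and returns a ranked list of item IDs. The toy predictor below exists only to make the example runnable. In practice the model should be fit on sessions that are disjoint from the evaluated ones.

```python
def evaluate_leave_one_out(sessions, predict, k=20):
    """Hide each session's last item, rank candidates, and report Hit@k and MRR@k."""
    hits, reciprocal_ranks, evaluated = 0, 0.0, 0
    for session in sessions:
        if len(session) < 2:
            continue  # need at least one visible click plus the hidden target
        prefix, target = session[:-1], session[-1]
        ranked = predict(prefix)[:k]
        evaluated += 1
        if target in ranked:
            hits += 1
            reciprocal_ranks += 1.0 / (ranked.index(target) + 1)
    if evaluated == 0:
        return {"hit_rate": 0.0, "mrr": 0.0, "sessions": 0}
    return {
        "hit_rate": hits / evaluated,
        "mrr": reciprocal_ranks / evaluated,
        "sessions": evaluated,
    }


def toy_predict(prefix):
    """Hypothetical stand-in model: fixed followers of the last visible item."""
    lookup = {"a": ["b", "c"], "b": ["c", "a"], "c": ["a"]}
    return lookup.get(prefix[-1], [])


test_sessions = [["a", "b"], ["a", "c"], ["b", "d"], ["c"]]
metrics = evaluate_leave_one_out(test_sessions, toy_predict, k=2)
print(metrics)
```

Here two of the three evaluated sessions are hits (ranks 1 and 2), so the hit rate is 2/3 and the MRR is (1 + 0.5 + 0) / 3 = 0.5. The single-click session is skipped because it has no prefix.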
Phase 4: Output
Return ranked next-item predictions with confidence scores.
Output Format
{
"predictions": [{"item_id": "789", "score": 0.65, "based_on": "last_3_clicks"}],
"session": {"length": 5, "items_viewed": ["a", "b", "c", "d", "e"]},
"metadata": {"method": "markov_order2", "hit_rate_at_20": 0.35}
}
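One way to assemble this payload from ranked `(item_id, score)` pairs is sketched below. The helper name `build_response` and its `window` parameter (how many trailing clicks the prediction used) are assumptions for illustration only.

```python
import json


def build_response(session_items, ranked, method, hit_rate_at_20=None, window=3):
    """Assemble the output structure from ranked (item_id, score) pairs."""
    based_on = f"last_{min(window, len(session_items))}_clicks"
    return {
        "predictions": [
            {"item_id": str(item), "score": round(score, 4), "based_on": based_on}
            for item, score in ranked
        ],
        "session": {"length": len(session_items), "items_viewed": list(session_items)},
        "metadata": {"method": method, "hit_rate_at_20": hit_rate_at_20},
    }


payload = build_response(
    ["a", "b", "c", "d", "e"], [("789", 0.65)], "markov_order2", hit_rate_at_20=0.35
)
print(json.dumps(payload, indent=2))
```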
Examples
Sample I/O
Input: Session: [shoes_page, running_shoes, nike_air_max]
Expected: Recommend: nike_air_zoom (0.72), adidas_ultraboost (0.58), shoe_size_guide (0.41)
Edge Cases
| Input | Expected | Why |
|---|---|---|
| Session length = 1 | Popularity fallback | Single click insufficient for sequence pattern |
| Repeated item views | Weight recency, not count | User may be comparing, not broadening |
| Session intent shift | Adapt to latest clicks | User changed their goal mid-session |
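The three edge cases in the table can be handled in one scoring routine, sketched here under several assumptions: `transitions` is a dict of `{item: {next_item: probability}}`, `popularity` is a `Counter` of global item counts, and the `decay` and `window` values are arbitrary illustrative defaults. Sessions shorter than two clicks use the popularity fallback, repeated views are collapsed to their most recent position, and an exponential decay lets the latest clicks dominate after an intent shift.

```python
from collections import Counter


def recommend_with_fallback(session, transitions, popularity, k=3, decay=0.5, window=3):
    """Popularity for very short sessions; otherwise recency-weighted transition votes."""
    popular = [item for item, _ in popularity.most_common(k)]
    if len(session) < 2:
        return popular

    # Collapse repeat views: keep each item once, ordered from most recent click.
    recent = []
    for item in reversed(session):
        if item not in recent:
            recent.append(item)
        if len(recent) == window:
            break

    scores = Counter()
    for age, item in enumerate(recent):  # age 0 is the latest click
        for nxt, prob in transitions.get(item, {}).items():
            scores[nxt] += (decay ** age) * prob
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]] or popular


transitions = {
    "shoe": {"sock": 0.6, "lace": 0.4},
    "tent": {"stove": 0.7, "lamp": 0.3},
}
popularity = Counter({"sock": 50, "tent": 30, "stove": 20})

print(recommend_with_fallback(["shoe"], transitions, popularity))                  # popularity fallback
print(recommend_with_fallback(["shoe", "shoe", "tent"], transitions, popularity))  # tent-driven items lead
```

In the second call, viewing `shoe` twice does not double its vote, and the switch to `tent` puts `stove` ahead of the shoe-related items.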
Gotchas
- Session definition matters: 30-minute timeout is conventional but arbitrary. E-commerce may need shorter (15min); research browsing may need longer (60min).
- Position bias: Users click top results more often, so session data reflects UI position as well as preference. Correct for it before training, for example by down-weighting clicks that came from top-ranked slots.
- Repeat recommendations: Users often revisit items. Distinguish "recommend something new" from "remind of previously viewed."
- Cold start for new items: Items with zero prior session appearances can't be predicted by transition matrices. Mix in feature-based candidates.
- Computational efficiency: For real-time inference, pre-compute transition probabilities. Recomputing per-request at scale is too slow.
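As a sketch of the pre-computation point, the top-K followers of every item can be materialized offline so that serving becomes a single dictionary lookup. The function name `precompute_top_k` and the `{item: {next_item: probability}}` table layout are assumptions for this example.

```python
import heapq


def precompute_top_k(transitions, k=20):
    """Offline step: store each item's k most probable followers, best first."""
    return {
        item: heapq.nlargest(k, followers.items(), key=lambda kv: kv[1])
        for item, followers in transitions.items()
    }


transitions = {"a": {"b": 0.5, "c": 0.3, "d": 0.2}, "b": {"a": 1.0}}
top_k_table = precompute_top_k(transitions, k=2)

# Online step: no sorting or normalization at request time.
print(top_k_table.get("a", []))  # [('b', 0.5), ('c', 0.3)]
```

The table would be rebuilt on whatever schedule the transition counts are refreshed. Items missing from the table fall through to a default such as the empty list shown here.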
References
- For GRU4Rec neural session model, see references/gru4rec.md
- For session splitting heuristics, see references/session-splitting.md