product-antifraud

SKILL.md

Product Antifraud -- Log-Based Fraud Detection

Aspect Detail
Purpose Rule-based fraud detection from application logs (registration + auth flows)
Approach Pure Python + pandas -- counting, grouping, threshold problems
Not for ML classification -- use at moderate volumes (~50K entries/day)
Outputs Markdown report + CSV alerts for security teams

When to Use This Skill

Task This Skill Applies
Building fraud detection scripts for registration or auth logs Yes
Analyzing K8s application logs for suspicious behavioral patterns Yes
Detecting bots, credential stuffing, or velocity abuse from structured logs Yes
Auditing logs for GDPR PII exposure (unmasked emails, phones, names) Yes
Designing tunable threshold-based rule engines with JSON config Yes
Reviewing or extending existing antifraud detection rules Yes
Building fraud alerting reports (Markdown + CSV) for security teams Yes
ML-based fraud scoring (real-time model inference) No -- use ai-ml-data-science
Application security hardening (OWASP, auth implementation) No -- use software-security-appsec
Infrastructure log analysis (access logs, firewall, WAF) No -- use ops-devops-platform
Real-time streaming fraud detection No -- use data-lake-platform

Quick-Start Checklist

Step Action Notes
1 Identify log type Registration (.txt.gz/.debug.gz) or auth (.log/.log.gz)
2 Create directory structure config/, reports/, script file
3 Build LogParser Correct timestamp format: , for registration ms, . for auth ms
4 Implement SessionAggregator pandas groupby for key dimensions (token, IP, device, email)
5 Create JSON config Default thresholds (see Configuration Pattern below)
6 Implement velocity rules R1-R12 or A1-A13 -- highest signal-to-noise ratio
7 Add bot detection rules R13-R17 or A14-A19
8 Add behavioral analysis rules R18-R22 or A20-A25
9 Enable PIIScanner GDPR compliance pass on all log lines
10 Test in discover mode --mode discover against example data
11 Tune thresholds Reduce false positives, verify known fraud patterns surface
12 Cross-node correlation Merge by token/session_id before aggregation

Quick Reference

Architecture (4 Layers)

Every fraud detection script follows this pattern:

Log Files (.gz, .log)
    |
    v
[1] LogFileReader       -- Walk dirs, handle .gz decompression, iterate lines
    |
    v
[2] LogParser           -- Regex extraction -> dataclass (RegistrationEvent / AuthEvent)
    |
    v
[3] SessionAggregator   -- Group by token/IP/device/email, compute features
    |
    v
[4] RuleEngine + Report -- Evaluate rules, produce Markdown + CSV alerts

Detection Rule Categories

Category Registration (R) Authentication (A) Reference
Fraud velocity R1-R12 A1-A13 references/registration-fraud-rules.md, references/auth-fraud-rules.md
Bot vs human R13-R17 A14-A19 references/bot-detection-patterns.md
Behavioral analysis R18-R22 A20-A25 references/behavioral-analysis-rules.md
GDPR PII scanning Both scripts Both scripts references/gdpr-pii-scanning.md

Rule Severity Quick Map

Severity Registration Examples Auth Examples
CRITICAL JNDI injection (R10), national ID exposure Personal data API response leaks
HIGH Email/device velocity (R1-R2), IP hopping (R6) Brute force (A1), credential stuffing (A2), session hijack (A4)
MEDIUM Partial phone masking, confirmation brute force (R8) Captcha trigger rate (A8), off-hours surge (A10)
LOW Sequential email patterns (R17) Auth method escalation (A21)

Decision Tree

New fraud detection task:
    |
    +-- Registration logs?
    |   +-- .txt.gz / .debug.gz format?
    |   |   -> Use RegistrationEvent parser (references/log-parser-architecture.md)
    |   +-- What signals available?
    |       +-- Token, IP, DeviceSerial, Email, Phone -> R1-R12 velocity rules
    |       +-- Timing data -> R13-R15 bot detection
    |       +-- Platform field -> R12, R16 device fingerprinting
    |
    +-- Authentication logs?
    |   +-- .log / .log.gz format?
    |   |   -> Use AuthEvent parser (references/log-parser-architecture.md)
    |   +-- What signals available?
    |       +-- user_id, IP, device_id -> A1-A6 velocity rules
    |       +-- Fraud check weights -> A5, A11 risk scoring
    |       +-- Country field -> A4, A20 impossible travel
    |       +-- Auth type field -> A12, A21 method switching
    |
    +-- GDPR compliance audit?
        -> Run PIIScanner pass on both log types
        -> See references/gdpr-pii-scanning.md

CLI Interface Pattern

# Discover mode: analyze example logs, output pattern statistics
python registration_fraud.py examples/epa-registration/ --mode discover --output reports/

# Detect mode: apply rules to new logs, generate alerts
python registration_fraud.py /path/to/new-day-logs/ \
  --config config/registration_rules.json --output reports/

# Auth fraud (same pattern)
python auth_fraud.py examples/epa-identity-auth-publicapi/ --mode discover --output reports/

Output Format

Output Filename Contents
Markdown report report_YYYYMMDD_HHMMSS.md Summary table, severity breakdown, detailed alerts with log line evidence
CSV export alerts_YYYYMMDD_HHMMSS.csv One row per alert, importable into SIEM/ticketing

Tech Stack

Component Tool Notes
Runtime Python 3.10+ Standard library: re, gzip, json, csv, argparse, dataclasses, collections, datetime, pathlib, statistics
Data analysis pandas Time-window grouping and aggregation (only pip dependency)
Report formatting tabulate (optional) Pretty markdown tables

Configuration Pattern

Rules use external JSON configs for tunable thresholds (no code changes needed):

{
  "gdpr_pii_scanner": {
    "enabled": true,
    "check_emails": true,
    "check_phones": true,
    "check_names": true,
    "check_national_ids": true
  },
  "bot_detection": {
    "timing_variance_threshold_ms": 50,
    "min_human_step_interval_seconds": 2,
    "known_emulator_serials": ["000000000000000", "emulator-5554"],
    "scripting_user_agents": ["python-requests", "curl", "Go-http-client"]
  },
  "behavioral": {
    "impossible_travel_speed_kmh": 900,
    "burst_silence_ratio_threshold": 5.0,
    "session_abandonment_rate_threshold": 0.8
  }
}

Common Anti-Patterns

Anti-Pattern Why It Fails Instead
Hardcoded thresholds in code Cannot tune without redeployment External JSON config per rule
Single-dimension rules only Easy to evade by changing one variable Cross-correlate IP + device + email + timing
No deduplication Duplicate log lines inflate counts Deduplicate by (timestamp, request_id, message hash)
Ignoring multi-line entries Auth logs have stack traces across lines Parser must detect continuation lines
Treating all timestamps alike Registration uses , for ms; auth uses . Normalize timestamp parsing per log type
Cross-node blind spots Same session spans K8s nodes Merge by token/session_id before aggregation
PII in fraud reports GDPR violation in the detection output itself Mask PII in report output, reference by hash/ID

Known Challenges

Challenge Impact Mitigation
Multi-line log entries Auth logs have stack traces across lines Detect continuation lines (leading whitespace, at , Caused by:)
Duplicate log lines Registration logs inflate counts Deduplicate by (timestamp, request_id, message hash)
Masked data (***MASKED***) Auth logs limit email/phone correlation IP/device/user_id analysis still works
Different timestamp formats Registration , for ms; auth . for ms Normalize parsing per log type
Cross-node correlation Same session spans K8s nodes Merge by token/session_id before aggregation
Internal scanner noise Qualys scanner IP 10.7.2.171 triggers R10 Flag but annotate as likely internal scan

Trend Awareness Protocol

When users ask about current fraud detection approaches, search before answering:

# Search Query Domain
1 "fintech fraud detection patterns 2026" Fraud patterns
2 "application log fraud analysis tools 2026" Tooling
3 "GDPR log compliance requirements 2026" Compliance
4 "bot detection registration abuse 2026" Bot detection

Navigation

Reference Guides

File Coverage
references/registration-fraud-rules.md Registration fraud rules R1-R12: thresholds, signals, detection logic
references/auth-fraud-rules.md Auth fraud rules A1-A13: thresholds, signals, detection logic
references/bot-detection-patterns.md Bot vs human: timing analysis, UA fingerprinting, speed checks, emulators
references/behavioral-analysis-rules.md Behavioral: impossible travel, session abandonment, burst-then-silence
references/gdpr-pii-scanning.md GDPR PII scanner: regex patterns, severity levels, config, report format
references/log-parser-architecture.md 4-layer architecture: LogFileReader, LogParser, SessionAggregator, RuleEngine
data/sources.json 18 curated antifraud, OWASP, GDPR, and log analysis resources

Related Skills

Skill Use For
software-security-appsec Application security patterns, OWASP Top 10
ai-ml-data-science ML-based fraud classification (when rule-based is insufficient)
data-analytics-engineering Data pipeline patterns for log aggregation
qa-observability Observability, structured logging, SIEM integration
Weekly Installs
14
GitHub Stars
41
First Seen
Feb 25, 2026
Installed on
codex14
opencode13
gemini-cli13
github-copilot13
cursor13
amp12