product-antifraud
SKILL.md
Product Antifraud -- Log-Based Fraud Detection
| Aspect | Detail |
|---|---|
| Purpose | Rule-based fraud detection from application logs (registration + auth flows) |
| Approach | Pure Python + pandas -- counting, grouping, threshold problems |
| Not for | ML classification; designed for moderate volumes (~50K entries/day) |
| Outputs | Markdown report + CSV alerts for security teams |
When to Use This Skill
| Task | This Skill Applies |
|---|---|
| Building fraud detection scripts for registration or auth logs | Yes |
| Analyzing K8s application logs for suspicious behavioral patterns | Yes |
| Detecting bots, credential stuffing, or velocity abuse from structured logs | Yes |
| Auditing logs for GDPR PII exposure (unmasked emails, phones, names) | Yes |
| Designing tunable threshold-based rule engines with JSON config | Yes |
| Reviewing or extending existing antifraud detection rules | Yes |
| Building fraud alerting reports (Markdown + CSV) for security teams | Yes |
| ML-based fraud scoring (real-time model inference) | No -- use ai-ml-data-science |
| Application security hardening (OWASP, auth implementation) | No -- use software-security-appsec |
| Infrastructure log analysis (access logs, firewall, WAF) | No -- use ops-devops-platform |
| Real-time streaming fraud detection | No -- use data-lake-platform |
Quick-Start Checklist
| Step | Action | Notes |
|---|---|---|
| 1 | Identify log type | Registration (.txt.gz/.debug.gz) or auth (.log/.log.gz) |
| 2 | Create directory structure | config/, reports/, script file |
| 3 | Build LogParser | Match the timestamp format per log type: registration logs use a comma before milliseconds, auth logs use a period |
| 4 | Implement SessionAggregator | pandas groupby for key dimensions (token, IP, device, email) |
| 5 | Create JSON config | Default thresholds (see Configuration Pattern below) |
| 6 | Implement velocity rules | R1-R12 or A1-A13 -- highest signal-to-noise ratio |
| 7 | Add bot detection rules | R13-R17 or A14-A19 |
| 8 | Add behavioral analysis rules | R18-R22 or A20-A25 |
| 9 | Enable PIIScanner | GDPR compliance pass on all log lines |
| 10 | Test in discover mode | --mode discover against example data |
| 11 | Tune thresholds | Reduce false positives, verify known fraud patterns surface |
| 12 | Cross-node correlation | Merge by token/session_id before aggregation |
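
Step 3 is where the two log types diverge, so a small sketch may help. The skill only states the millisecond separator (comma for registration, period for auth); the date and time layout below, and the names `TIMESTAMP_FORMATS` and `parse_timestamp`, are assumptions made for illustration.

```python
from datetime import datetime

# Assumed layouts: only the millisecond separator comes from this skill;
# the "YYYY-MM-DD HH:MM:SS" portion is a hypothetical example.
TIMESTAMP_FORMATS = {
    "registration": "%Y-%m-%d %H:%M:%S,%f",  # comma before milliseconds
    "auth": "%Y-%m-%d %H:%M:%S.%f",          # period before milliseconds
}


def parse_timestamp(raw: str, log_type: str) -> datetime:
    """Parse a timestamp string using the format registered for log_type."""
    return datetime.strptime(raw.strip(), TIMESTAMP_FORMATS[log_type])


reg_ts = parse_timestamp("2026-02-25 10:15:30,123", "registration")
auth_ts = parse_timestamp("2026-02-25 10:15:30.123", "auth")
```

Keeping the formats in one lookup table means a new log type only needs a new entry, and both parsers return comparable `datetime` objects for cross-log correlation.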
Quick Reference
Architecture (4 Layers)
Every fraud detection script follows this pattern:
Log Files (.gz, .log)
|
v
[1] LogFileReader -- Walk dirs, handle .gz decompression, iterate lines
|
v
[2] LogParser -- Regex extraction -> dataclass (RegistrationEvent / AuthEvent)
|
v
[3] SessionAggregator -- Group by token/IP/device/email, compute features
|
v
[4] RuleEngine + Report -- Evaluate rules, produce Markdown + CSV alerts
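
As a rough illustration of layer 1, the sketch below walks a directory and yields lines from both plain and gzip-compressed files. The function name `iter_log_lines` and the decision to pick the opener from the `.gz` suffix are assumptions, not the skill's actual LogFileReader.

```python
import gzip
from pathlib import Path
from typing import Iterator


def iter_log_lines(root: Path) -> Iterator[tuple[Path, int, str]]:
    """Yield (file, line_number, line) for every file found under root."""
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # .txt.gz, .debug.gz, and .log.gz all end in ".gz"; anything else
        # is read as plain text.
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                yield path, line_number, line.rstrip("\n")
```

Yielding the file path and line number alongside each line lets later layers attach log line evidence to alerts without re-reading the files.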
Detection Rule Categories
| Category | Registration (R) | Authentication (A) | Reference |
|---|---|---|---|
| Fraud velocity | R1-R12 | A1-A13 | references/registration-fraud-rules.md, references/auth-fraud-rules.md |
| Bot vs human | R13-R17 | A14-A19 | references/bot-detection-patterns.md |
| Behavioral analysis | R18-R22 | A20-A25 | references/behavioral-analysis-rules.md |
| GDPR PII scanning | Both scripts | Both scripts | references/gdpr-pii-scanning.md |
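
To make the "counting, grouping, threshold" idea concrete, here is a minimal sketch of one velocity-style check with pandas. The column names, the 60-minute window, the threshold of 3, and the function `distinct_emails_per_ip` are all hypothetical; the real rule definitions live in the reference files listed above.

```python
import pandas as pd

# Hypothetical parsed events; emails are already replaced by hashes.
events = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2026-02-25 10:00:05",
                "2026-02-25 10:10:00",
                "2026-02-25 10:20:00",
                "2026-02-25 10:30:00",
                "2026-02-25 11:05:00",
                "2026-02-25 10:15:00",
            ]
        ),
        "ip": ["203.0.113.7"] * 5 + ["198.51.100.2"],
        "email_hash": ["a1", "b2", "c3", "d4", "e5", "f6"],
    }
)


def distinct_emails_per_ip(
    df: pd.DataFrame, window: str = "60min", threshold: int = 3
) -> pd.DataFrame:
    """Return (ip, window) buckets with more than `threshold` distinct emails."""
    counts = (
        df.groupby(["ip", pd.Grouper(key="timestamp", freq=window)])["email_hash"]
        .nunique()
        .reset_index(name="distinct_emails")
    )
    return counts[counts["distinct_emails"] > threshold]


alerts = distinct_emails_per_ip(events)
```

Note that `pd.Grouper` produces fixed (tumbling) windows, so a burst that straddles a window boundary is split across two buckets; a sliding window would need a different approach.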
Rule Severity Quick Map
| Severity | Registration Examples | Auth Examples |
|---|---|---|
| CRITICAL | JNDI injection (R10), national ID exposure | Personal data API response leaks |
| HIGH | Email/device velocity (R1-R2), IP hopping (R6) | Brute force (A1), credential stuffing (A2), session hijack (A4) |
| MEDIUM | Partial phone masking, confirmation brute force (R8) | Captcha trigger rate (A8), off-hours surge (A10) |
| LOW | Sequential email patterns (R17) | Auth method escalation (A21) |
Decision Tree
New fraud detection task:
|
+-- Registration logs?
|   +-- .txt.gz / .debug.gz format?
|   |   -> Use RegistrationEvent parser (references/log-parser-architecture.md)
|   +-- What signals available?
|       +-- Token, IP, DeviceSerial, Email, Phone -> R1-R12 velocity rules
|       +-- Timing data -> R13-R15 bot detection
|       +-- Platform field -> R12, R16 device fingerprinting
|
+-- Authentication logs?
|   +-- .log / .log.gz format?
|   |   -> Use AuthEvent parser (references/log-parser-architecture.md)
|   +-- What signals available?
|       +-- user_id, IP, device_id -> A1-A6 velocity rules
|       +-- Fraud check weights -> A5, A11 risk scoring
|       +-- Country field -> A4, A20 impossible travel
|       +-- Auth type field -> A12, A21 method switching
|
+-- GDPR compliance audit?
    -> Run PIIScanner pass on both log types
    -> See references/gdpr-pii-scanning.md
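
For the "Timing data -> bot detection" branch, one possible check is sketched below. It assumes that a session is suspicious when the gaps between steps are either nearly identical or shorter than a human could manage; the defaults mirror the `timing_variance_threshold_ms` and `min_human_step_interval_seconds` config keys, but how the real rules combine them is not stated here, and `looks_scripted` is a hypothetical name.

```python
from statistics import pstdev


def looks_scripted(
    step_times_ms: list[float],
    variance_threshold_ms: float = 50.0,
    min_interval_s: float = 2.0,
) -> bool:
    """Flag a session whose step timing is too regular or too fast.

    step_times_ms holds the timestamps of consecutive steps in milliseconds.
    Combining the two signals with OR is an assumption of this sketch.
    """
    if len(step_times_ms) < 3:
        return False  # not enough intervals to judge regularity
    intervals = [b - a for a, b in zip(step_times_ms, step_times_ms[1:])]
    too_regular = pstdev(intervals) < variance_threshold_ms
    too_fast = min(intervals) < min_interval_s * 1000
    return too_regular or too_fast


bot_session = [0, 500, 1000, 1500]        # identical 500 ms gaps
human_session = [0, 4200, 9100, 17500]    # irregular gaps, all above 2 s
```

Sessions with fewer than three steps are left unflagged because a single interval says nothing about regularity.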
CLI Interface Pattern
# Discover mode: analyze example logs, output pattern statistics
python registration_fraud.py examples/epa-registration/ --mode discover --output reports/
# Detect mode: apply rules to new logs, generate alerts
python registration_fraud.py /path/to/new-day-logs/ \
--config config/registration_rules.json --output reports/
# Auth fraud (same pattern)
python auth_fraud.py examples/epa-identity-auth-publicapi/ --mode discover --output reports/
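
A minimal `argparse` sketch that would accept the commands above. The `detect` default for `--mode` is inferred from the second example, which omits the flag; the `reports` default for `--output` and the help texts are assumptions.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser matching the invocation pattern shown above."""
    parser = argparse.ArgumentParser(
        description="Log-based registration fraud detection"
    )
    parser.add_argument("log_dir", type=Path, help="Directory containing log files")
    parser.add_argument(
        "--mode",
        choices=["discover", "detect"],
        default="detect",  # assumed default, see lead-in
        help="discover: pattern statistics; detect: apply rules and alert",
    )
    parser.add_argument("--config", type=Path, default=None, help="JSON rule config")
    parser.add_argument("--output", type=Path, default=Path("reports"))
    return parser


args = build_parser().parse_args(
    ["examples/epa-registration/", "--mode", "discover", "--output", "reports/"]
)
```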
Output Format
| Output | Filename | Contents |
|---|---|---|
| Markdown report | report_YYYYMMDD_HHMMSS.md | Summary table, severity breakdown, detailed alerts with log line evidence |
| CSV export | alerts_YYYYMMDD_HHMMSS.csv | One row per alert, importable into SIEM/ticketing |
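
The naming scheme above could be produced along these lines. The alert fields (`rule_id`, `severity`, `entity`, `evidence`), the report heading, and the function `write_outputs` are hypothetical; only the two filename patterns come from the table.

```python
import csv
from datetime import datetime
from pathlib import Path
from typing import Optional

FIELDS = ["rule_id", "severity", "entity", "evidence"]  # assumed alert schema


def write_outputs(
    alerts: list[dict], output_dir: Path, now: Optional[datetime] = None
) -> tuple[Path, Path]:
    """Write report_<stamp>.md and alerts_<stamp>.csv, returning both paths."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    output_dir.mkdir(parents=True, exist_ok=True)
    md_path = output_dir / f"report_{stamp}.md"
    csv_path = output_dir / f"alerts_{stamp}.csv"

    # One CSV row per alert.
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(alerts)

    # Minimal Markdown table; a fuller report would add a severity summary.
    lines = [
        "# Fraud Detection Report",
        "",
        "| Rule | Severity | Entity | Evidence |",
        "|---|---|---|---|",
    ]
    lines += [
        f"| {a['rule_id']} | {a['severity']} | {a['entity']} | {a['evidence']} |"
        for a in alerts
    ]
    md_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return md_path, csv_path
```

Using one timestamp for both files keeps a report and its CSV export paired by name.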
Tech Stack
| Component | Tool | Notes |
|---|---|---|
| Runtime | Python 3.10+ | Standard library: re, gzip, json, csv, argparse, dataclasses, collections, datetime, pathlib, statistics |
| Data analysis | pandas | Time-window grouping and aggregation (only required pip dependency) |
| Report formatting | tabulate (optional) | Pretty markdown tables |
Configuration Pattern
Rules use external JSON configs for tunable thresholds (no code changes needed):
{
  "gdpr_pii_scanner": {
    "enabled": true,
    "check_emails": true,
    "check_phones": true,
    "check_names": true,
    "check_national_ids": true
  },
  "bot_detection": {
    "timing_variance_threshold_ms": 50,
    "min_human_step_interval_seconds": 2,
    "known_emulator_serials": ["000000000000000", "emulator-5554"],
    "scripting_user_agents": ["python-requests", "curl", "Go-http-client"]
  },
  "behavioral": {
    "impossible_travel_speed_kmh": 900,
    "burst_silence_ratio_threshold": 5.0,
    "session_abandonment_rate_threshold": 0.8
  }
}
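
One way to consume such a file is to merge it over built-in defaults, so a config only needs to list the thresholds it changes. This is a sketch under that assumption; `DEFAULTS`, `deep_merge`, and `load_config` are hypothetical names, and the defaults shown are a subset of the keys above.

```python
import json
from copy import deepcopy
from pathlib import Path
from typing import Optional

# Subset of the thresholds shown above, used as fallback values.
DEFAULTS = {
    "bot_detection": {
        "timing_variance_threshold_ms": 50,
        "min_human_step_interval_seconds": 2,
    },
    "behavioral": {"impossible_travel_speed_kmh": 900},
}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override applied recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def load_config(path: Optional[Path] = None) -> dict:
    """Load a JSON config over DEFAULTS; with no path, return the defaults."""
    if path is None:
        return deepcopy(DEFAULTS)
    with Path(path).open(encoding="utf-8") as handle:
        return deep_merge(DEFAULTS, json.load(handle))


config = deep_merge(DEFAULTS, {"bot_detection": {"timing_variance_threshold_ms": 30}})
```

A plain `dict.update` would replace the whole `bot_detection` section and drop the untouched keys, which is why the merge recurses.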
Common Anti-Patterns
| Anti-Pattern | Why It Fails | Instead |
|---|---|---|
| Hardcoded thresholds in code | Cannot tune without redeployment | External JSON config per rule |
| Single-dimension rules only | Easy to evade by changing one variable | Cross-correlate IP + device + email + timing |
| No deduplication | Duplicate log lines inflate counts | Deduplicate by (timestamp, request_id, message hash) |
| Ignoring multi-line entries | Auth logs have stack traces across lines | Parser must detect continuation lines |
| Treating all timestamps alike | Registration uses a comma before ms; auth uses a period | Normalize timestamp parsing per log type |
| Cross-node blind spots | Same session spans K8s nodes | Merge by token/session_id before aggregation |
| PII in fraud reports | GDPR violation in the detection output itself | Mask PII in report output, reference by hash/ID |
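
Two of the remedies above, deduplication and referencing PII by hash, can be sketched in a few lines. The record keys (`timestamp`, `request_id`, `message`), the salted SHA-256 scheme, and the 12-character truncation are assumptions for illustration.

```python
import hashlib


def dedupe(records: list[dict]) -> list[dict]:
    """Drop records repeating the same (timestamp, request_id, message hash)."""
    seen: set[tuple[str, str, str]] = set()
    unique = []
    for record in records:
        digest = hashlib.sha256(record["message"].encode("utf-8")).hexdigest()
        key = (record["timestamp"], record["request_id"], digest)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def pseudonymize(value: str, salt: str) -> str:
    """Return a short, stable reference for a PII value (for report output)."""
    normalized = value.strip().lower()
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()[:12]


records = [
    {"timestamp": "10:00:00,001", "request_id": "r1", "message": "register ok"},
    {"timestamp": "10:00:00,001", "request_id": "r1", "message": "register ok"},
    {"timestamp": "10:00:00,001", "request_id": "r2", "message": "register ok"},
]
unique_records = dedupe(records)
```

The same email always maps to the same reference for a given salt, so reports can still show that two alerts involve one account without printing the address.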
Known Challenges
| Challenge | Impact | Mitigation |
|---|---|---|
| Multi-line log entries | Auth logs have stack traces across lines | Detect continuation lines (leading whitespace, at , Caused by:) |
| Duplicate log lines | Duplicates in registration logs inflate counts | Deduplicate by (timestamp, request_id, message hash) |
| Masked data (***MASKED***) | Masked fields in auth logs limit email/phone correlation | IP/device/user_id analysis still works |
| Different timestamp formats | Registration uses a comma before ms; auth uses a period | Normalize parsing per log type |
| Cross-node correlation | Same session spans K8s nodes | Merge by token/session_id before aggregation |
| Internal scanner noise | Qualys scanner IP 10.7.2.171 triggers R10 | Flag but annotate as likely internal scan |
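
The continuation-line mitigation could look roughly like this. The three markers (leading whitespace, `at `, `Caused by:`) come from the table; the function name `join_multiline` and the sample lines are hypothetical.

```python
import re
from typing import Iterable, Iterator

# A line continues the previous entry if it starts with whitespace,
# "at ", or "Caused by:".
CONTINUATION = re.compile(r"^(\s+|at |Caused by:)")


def join_multiline(lines: Iterable[str]) -> Iterator[str]:
    """Merge stack-trace continuation lines into the entry they belong to."""
    current = None
    for line in lines:
        if current is not None and CONTINUATION.match(line):
            current += "\n" + line
        else:
            if current is not None:
                yield current
            current = line
    if current is not None:
        yield current


sample = [
    "2026-02-25 10:00:00.001 ERROR login failed: IllegalStateException",
    "\tat com.example.Auth.login(Auth.java:42)",
    "Caused by: java.io.IOException: upstream timeout",
    "\t... 3 more",
    "2026-02-25 10:00:01.002 INFO login ok",
]
entries = list(join_multiline(sample))
```

A stricter variant would treat any line that does not begin with a timestamp as a continuation, which also catches exception headers printed at column zero.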
Trend Awareness Protocol
When users ask about current fraud detection approaches, search before answering:
| # | Search Query | Domain |
|---|---|---|
| 1 | "fintech fraud detection patterns 2026" | Fraud patterns |
| 2 | "application log fraud analysis tools 2026" | Tooling |
| 3 | "GDPR log compliance requirements 2026" | Compliance |
| 4 | "bot detection registration abuse 2026" | Bot detection |
Navigation
Reference Guides
| File | Coverage |
|---|---|
| references/registration-fraud-rules.md | Registration fraud rules R1-R12: thresholds, signals, detection logic |
| references/auth-fraud-rules.md | Auth fraud rules A1-A13: thresholds, signals, detection logic |
| references/bot-detection-patterns.md | Bot vs human: timing analysis, UA fingerprinting, speed checks, emulators |
| references/behavioral-analysis-rules.md | Behavioral: impossible travel, session abandonment, burst-then-silence |
| references/gdpr-pii-scanning.md | GDPR PII scanner: regex patterns, severity levels, config, report format |
| references/log-parser-architecture.md | 4-layer architecture: LogFileReader, LogParser, SessionAggregator, RuleEngine |
| data/sources.json | 18 curated antifraud, OWASP, GDPR, and log analysis resources |
Related Skills
| Skill | Use For |
|---|---|
| software-security-appsec | Application security patterns, OWASP Top 10 |
| ai-ml-data-science | ML-based fraud classification (when rule-based is insufficient) |
| data-analytics-engineering | Data pipeline patterns for log aggregation |
| qa-observability | Observability, structured logging, SIEM integration |