Extracting Session Data Skill

Core Responsibility

Provide raw access to Claude Code session logs stored in ~/.claude/projects/{project-identifier}/{session-id}.jsonl.

Key Principle: This skill only extracts data. Return the raw output to the calling skill for analysis; do not analyze or interpret it here.

Available Scripts

All scripts are located in the scripts/ subdirectory relative to this skill.

1. locate-logs.sh

Find log directory or specific session file path.

# Get logs directory for current working directory
scripts/locate-logs.sh

# Get logs directory for specific project
scripts/locate-logs.sh /path/to/project

# Get specific session log file path
scripts/locate-logs.sh /path/to/project abc123-session-id

Use when: Building dynamic paths, verifying logs exist before processing.

2. list-sessions.sh

Enumerate all sessions with metadata (ID, size, lines, date, branch).

# List all sessions (table format)
scripts/list-sessions.sh

# JSON output
scripts/list-sessions.sh --format json

# Sort by size or lines
scripts/list-sessions.sh --sort size
scripts/list-sessions.sh --sort lines

# Specific project
scripts/list-sessions.sh /path/to/project

Output formats: table, json, csv
Sort options: date, size, lines

Use when: Starting a retrospective, showing available sessions to the user, checking for recent sessions.

3. extract-data.sh

Parse JSONL logs and extract specific data types.

Available extraction types:

metadata - Session info (ID, timestamps, branch, working dir)
user-prompts - All user messages
tool-usage - Tool call statistics
errors - Failed tool calls with timestamps
thinking - Thinking blocks (if extended thinking is enabled)
text-responses - Assistant text responses only
statistics - Session metrics (message counts, tool calls, errors)
all - Combined extraction

# Extract from specific session
scripts/extract-data.sh --type statistics --session SESSION_ID
scripts/extract-data.sh --type errors --session SESSION_ID
scripts/extract-data.sh --type tool-usage --session SESSION_ID

# Extract from all sessions (omit --session)
scripts/extract-data.sh --type statistics

# Limit output
scripts/extract-data.sh --type user-prompts --limit 10

# Different project
scripts/extract-data.sh --type metadata --project /path/to/project

Use when: Extracting specific data without loading the entire log, generating metrics, identifying errors.

4. filter-sessions.sh

Find sessions matching criteria.

Filter options:

--since DATE - Sessions modified since date ("2 days ago", "2025-10-20")
--until DATE - Sessions modified until date
--branch NAME - Sessions on specific git branch
--min-size SIZE - Minimum file size ("1M", "500K")
--max-size SIZE - Maximum file size
--min-lines N - Minimum line count
--max-lines N - Maximum line count
--has-errors - Only sessions with failed tool calls
--keyword WORD - Sessions containing keyword

Output formats: list, paths, json

# Recent sessions
scripts/filter-sessions.sh --since "2 days ago"

# Large sessions with errors
scripts/filter-sessions.sh --min-lines 500 --has-errors

# Sessions on main branch in last week
scripts/filter-sessions.sh --branch main --since "7 days ago"

# Sessions containing keyword
scripts/filter-sessions.sh --keyword "authentication"

# Get paths only (for piping)
scripts/filter-sessions.sh --since "1 day ago" --format paths

Use when: Analyzing recent sessions at the user's request, finding sessions for a specific feature or branch, identifying problematic sessions.
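
The paths format is intended for piping into other commands. As a minimal sketch of a consumer, assuming `--format paths` prints one file path per line (the helper name `count_lines_per_session` is hypothetical and not part of this skill):

```shell
# Hypothetical helper: read session log paths on stdin (one per line)
# and print "<session-id><TAB><line-count>" for each readable file.
count_lines_per_session() {
    while IFS= read -r path; do
        [ -r "$path" ] || continue            # skip missing or unreadable files
        id=$(basename "$path" .jsonl)         # session ID = file name without .jsonl
        lines=$(wc -l < "$path" | tr -d ' ')  # some wc implementations pad with spaces
        printf '%s\t%s\n' "$id" "$lines"
    done
}

# Intended usage (assumes one path per line on stdout):
#   scripts/filter-sessions.sh --since "1 day ago" --format paths | count_lines_per_session
```

Reading line by line keeps paths that contain spaces intact, and only the counts, not the log contents, reach the caller.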

Working Process

Single Session Analysis

# 1. Verify session exists and get metadata
scripts/extract-data.sh --type metadata --session SESSION_ID

# 2. Get session statistics (to determine size)
scripts/extract-data.sh --type statistics --session SESSION_ID

# 3. Extract specific data as needed
scripts/extract-data.sh --type errors --session SESSION_ID
scripts/extract-data.sh --type tool-usage --session SESSION_ID

Multiple Session Analysis

# 1. Filter to find relevant sessions
scripts/filter-sessions.sh --since "7 days ago" --branch main

# 2. Extract data from all filtered sessions
scripts/extract-data.sh --type statistics

# 3. Or iterate through filtered subset
# Read one path per line so paths containing spaces are not split
scripts/filter-sessions.sh --has-errors --format paths | while IFS= read -r session; do
    SESSION_ID=$(basename "$session" .jsonl)
    scripts/extract-data.sh --type errors --session "$SESSION_ID"
done

Integration Pattern for Calling Skills

When another skill (like retrospecting) needs session data:

1. Discovery: Use list-sessions.sh or filter-sessions.sh to find relevant sessions
2. Size Check: Use extract-data.sh --type statistics to determine session complexity
3. Targeted Extraction: Use extract-data.sh with specific types for needed data
4. Return Raw Data: Return extracted data to caller for analysis

Example:

# Get latest session ID
LATEST=$(scripts/list-sessions.sh --format json --sort date | jq -r '.[0].sessionId')

# Check size before processing
STATS=$(scripts/extract-data.sh --type statistics --session "$LATEST")
LINE_COUNT=$(echo "$STATS" | grep "Total Lines:" | awk '{print $3}')

# Extract based on size
if [ "$LINE_COUNT" -lt 500 ]; then
    # Small session: extract detail
    scripts/extract-data.sh --type errors --session "$LATEST"
    scripts/extract-data.sh --type tool-usage --session "$LATEST"
else
    # Large session: summary only
    scripts/extract-data.sh --type statistics --session "$LATEST"
fi
fi
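
If the statistics output lacks a "Total Lines:" field, the grep/awk pipeline in the example yields an empty string and the numeric comparison fails. A slightly more defensive sketch follows, assuming the "Total Lines:" label shown under Output Format (`total_lines` is a hypothetical helper name):

```shell
# Hypothetical helper: read statistics text on stdin and print the value of the
# first "Total Lines:" field. Exits non-zero if the field is missing or not numeric.
total_lines() {
    awk '
        /Total Lines:/ {
            if ($NF ~ /^[0-9]+$/) { print $NF; found = 1 }
            exit
        }
        END { exit(found ? 0 : 1) }
    '
}

# Intended usage:
#   if LINE_COUNT=$(scripts/extract-data.sh --type statistics --session "$LATEST" | total_lines); then
#       ... compare "$LINE_COUNT" against the size thresholds ...
#   fi
```

Checking the exit status separates "no usable count" from a genuinely small session.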

Context Budget Management

CRITICAL: This skill is designed for context efficiency. Extract only what the caller needs and keep full logs out of context.

Use Bash Processing, Not Read Tool

# GOOD: Extract via bash, stays in bash context
STATS=$(scripts/extract-data.sh --type statistics)
# Process $STATS in bash

# BAD: Reading full log files
Read ~/.claude/projects/-path/session.jsonl
# Loads entire file into context unnecessarily

Check Session Size Before Loading

Never load full session logs into context without checking size first.

# Always check statistics first
scripts/extract-data.sh --type statistics --session SESSION_ID
# Shows total lines, message counts, etc.

# Decision rules:
# - Small (<500 lines): Can extract detail safely
# - Medium (500-2000 lines): Use selective extraction
# - Large (>2000 lines): Statistics only, offer targeted deep-dives
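
The decision rules above can be sketched as a small function. This is an illustration only: the name `size_tier` is hypothetical, and it treats the 500 and 2000 boundaries as belonging to the medium tier, as the ranges above imply.

```shell
# Hypothetical helper: map a line count to the tiers listed above.
# Prints small, medium, or large; returns 1 if the argument is not a whole number.
size_tier() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    if [ "$1" -lt 500 ]; then
        echo small
    elif [ "$1" -le 2000 ]; then
        echo medium
    else
        echo large
    fi
}
```

A caller could then branch on the tier name instead of repeating the numeric thresholds in several places.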

Return Raw Data to Caller

This skill should:

Execute bash scripts to extract data
Return raw text output to calling skill
Let calling skill manage context for analysis
Avoid interpretation or analysis within this skill

Output Format

Return raw extracted data with minimal formatting:

# Statistics output
Session: abc123-def456-ghi789
  Total Lines: 450
  User Messages: 12
  Assistant Messages: 23
  Tool Calls: 45
  Errors: 2

# Tool usage output
=== Tool Usage: abc123-def456-ghi789 ===
Read                          15
Bash                          12
Edit                          8
Grep                          5
Write                         3

No analysis, no interpretation - just data extraction.
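
A calling skill that post-processes the tool-usage output may prefer tab-separated fields to the padded columns. A minimal sketch on the caller's side, assuming the layout shown above (the helper name `tool_usage_to_tsv` is hypothetical):

```shell
# Hypothetical helper: convert tool-usage text on stdin to "<tool><TAB><count>" lines.
# Skips "=== ... ===" headers, "#" comment lines, and any line whose last field
# is not a whole number.
tool_usage_to_tsv() {
    awk '
        /^===/ || /^#/ { next }
        NF >= 2 && $NF ~ /^[0-9]+$/ { printf "%s\t%s\n", $1, $NF }
    '
}

# Intended usage:
#   scripts/extract-data.sh --type tool-usage --session SESSION_ID | tool_usage_to_tsv
```

This only reshapes the data. Interpretation remains the caller's job.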

Error Handling

All scripts exit with a non-zero status on error and write the error message to stderr.

Check exit status before processing:

if ! scripts/locate-logs.sh /path/to/project &>/dev/null; then
    # Handle: logs directory doesn't exist
    echo "Project has no session logs yet"
fi

if ! scripts/extract-data.sh --type metadata --session abc123 &>/dev/null; then
    # Handle: session doesn't exist
    echo "Session not found"
fi
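
The checks above discard stderr, which hides the reason for a failure. Where the message matters, a wrapper along these lines could keep it. This is a sketch only: `run_checked` is a hypothetical name, and it assumes the script writes a single short message to stderr.

```shell
# Hypothetical wrapper: run a command, pass its stdout through, and on failure
# print the captured stderr (with the exit status) and return that status.
run_checked() {
    err_file=$(mktemp) || return 1
    status=0
    "$@" 2>"$err_file" || status=$?
    if [ "$status" -ne 0 ]; then
        printf 'command failed (exit %s): %s\n' "$status" "$(cat "$err_file")" >&2
    fi
    rm -f "$err_file"
    return "$status"
}

# Intended usage:
#   run_checked scripts/extract-data.sh --type metadata --session abc123 || echo "Session not found"
```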

Common error messages:

Error: Logs directory not found: ~/.claude/projects/-path
Error: Session file not found: ~/.claude/projects/-path/session-id.jsonl
Error: --type is required
Error: jq is required but not installed. Install with: brew install jq

Path Calculation

Claude Code stores sessions using this pattern:

~/.claude/projects/{project-identifier}/{session-id}.jsonl

Where {project-identifier} is calculated by replacing all / with - in the absolute working directory path:

# Example: /Users/user/project → -Users-user-project
PROJECT_ID=$(printf '%s' "$PWD" | tr '/' '-')
LOGS_DIR="${HOME}/.claude/projects/${PROJECT_ID}"

All scripts use locate-logs.sh internally for consistent path calculation.
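
To illustrate the rule for an arbitrary project path, here is a sketch that applies only the "/" to "-" substitution described above. The function name `logs_dir_for` is hypothetical. In practice, prefer locate-logs.sh, which is the supported way to resolve paths.

```shell
# Hypothetical helper: print the logs directory for a project path
# (defaults to the current directory), replacing every "/" with "-".
logs_dir_for() {
    project_path=${1:-$PWD}
    project_id=$(printf '%s' "$project_path" | tr '/' '-')
    printf '%s/.claude/projects/%s\n' "$HOME" "$project_id"
}

# Example: logs_dir_for /Users/user/project
# prints "$HOME/.claude/projects/-Users-user-project"
```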

Anti-Patterns to Avoid

Don't:

Load full session logs into context without checking size
Parse JSONL manually - use extract-data.sh
Hardcode log paths - use locate-logs.sh
Analyze or interpret data - return raw data to caller
Process large logs synchronously without user awareness

Do:

Check session size with --type statistics before processing
Use appropriate extraction type for specific needs
Filter sessions before extraction for efficiency
Stream/pipe data when processing multiple sessions
Return raw data for caller to analyze

Success Criteria

Effective use of this skill means:

Efficient Discovery: Quickly find relevant sessions without manual searching
Targeted Extraction: Get exactly the data needed, nothing more
Context Preservation: Avoid loading unnecessary data into context
Raw Data Focus: Return unprocessed data for caller to analyze
Multi-Session Support: Handle analysis across timeframes or branches efficiently

Dependencies

Required:

bash (v4.0+)
jq (JSON parser)

Scripts check for jq and provide installation instructions if missing:

Error: jq is required but not installed. Install with: brew install jq
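
A caller can run the same kind of preflight check before invoking any script. The sketch below is an assumption about how such a check might look, not the scripts' actual code, and both function names are hypothetical:

```shell
# Hypothetical preflight check: succeed only if every named command is on PATH.
require_commands() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "Error: $cmd is required but not installed." >&2
            missing=1
        fi
    done
    return "$missing"
}

# Hypothetical check for the bash 4.0+ requirement. BASH_VERSINFO is set only by
# bash; in other shells it is unset and the check fails.
bash_is_v4_or_newer() {
    [ "${BASH_VERSINFO:-0}" -ge 4 ]
}

# Intended usage:
#   require_commands bash jq && bash_is_v4_or_newer || exit 1
```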
