NYC

regex-log

SKILL.md

Regex Log Parsing Skill

This skill provides a systematic approach for constructing complex regular expressions that extract and validate structured data from log files.

When to Use This Skill

This skill applies when:

  • Building regex patterns to extract data from log entries
  • Validating specific formats (IPv4 addresses, dates, timestamps) within logs
  • Handling requirements for first/last occurrence selection
  • Enforcing word boundary conditions
  • Combining multiple validation constraints in a single pattern

Approach: Decomposition Strategy

Complex log parsing regex should be built by decomposing the problem into sub-patterns:

Step 1: Identify All Requirements

Before writing any regex, create a complete list of requirements:

  • What data needs to be validated (present but not captured)?
  • What data needs to be captured?
  • What boundary conditions apply (word boundaries, line anchors)?
  • Are there positional requirements (first, last, nth occurrence)?
  • What constitutes an invalid match?

Step 2: Build Sub-Patterns Independently

Construct each validation pattern separately before combining:

IPv4 Address Pattern

For valid IPv4 addresses (0-255 per octet, no leading zeros except for 0 itself):

  • Octet pattern: (?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])
  • Order alternatives from most specific to least specific
  • Full IPv4: (?:(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])

Date Pattern (YYYY-MM-DD)

For valid dates with proper month-day validation:

  • 31-day months: (?:0[13578]|1[02])-(?:0[1-9]|[12][0-9]|3[01])
  • 30-day months: (?:0[469]|11)-(?:0[1-9]|[12][0-9]|30)
  • February (up to 29): 02-(?:0[1-9]|1[0-9]|2[0-9])
  • Combine with year: [0-9]{4}-(?:...combined month-day patterns...)

Step 3: Apply Positional Requirements

Selecting Last Occurrence

To capture the last valid pattern in a line:

^.*<pattern>(?!.*<pattern>)
  • Use ^.* to greedily consume characters
  • Use negative lookahead (?!.*<pattern>) to ensure no pattern follows

Selecting First Occurrence

To capture the first valid pattern:

^(?:(?!<pattern>).)*<pattern>

Or simply rely on regex engines returning the first match by default.

Step 4: Apply Validation Without Capture

To require presence of a pattern without capturing it:

  • Use lookahead: (?=.*<pattern>) at the start of the regex
  • This validates the line contains the pattern without affecting the capture

Step 5: Apply Word Boundaries

For patterns that must not be adjacent to alphanumeric characters:

  • Use \b word boundaries: \b<pattern>\b
  • Be aware that \b matches between word and non-word characters

Verification Strategy

Create Comprehensive Test Cases

Organize tests by category:

  1. Valid cases: Confirm expected matches

    • Minimum/maximum valid values (e.g., 0.0.0.0, 255.255.255.255)
    • Edge values for each component
  2. Invalid format cases: Confirm rejection

    • Out-of-range values (e.g., 256.0.0.0)
    • Invalid formatting (leading zeros where prohibited)
    • Invalid months (00, 13) or days (32)
  3. Boundary condition cases:

    • Pattern at start/end of line
    • Pattern adjacent to alphanumeric characters (should fail with word boundaries)
    • Pattern adjacent to punctuation (should pass with word boundaries)
  4. Positional cases:

    • Multiple valid patterns in one line (verify correct one is captured)
    • Single pattern in line
    • No valid pattern in line

Test File Structure

Create a structured test file that:

  • Groups tests by category
  • Uses clear naming for each test case
  • Reports pass/fail status for each test
  • Summarizes overall results

Example structure:

test_cases = {
    "valid_ipv4": [...],
    "invalid_ipv4": [...],
    "valid_dates": [...],
    "invalid_dates": [...],
    "last_occurrence": [...],
    "boundary_conditions": [...]
}

Common Pitfalls

1. Incomplete First Attempt

  • Problem: Creating incomplete or truncated test files
  • Solution: Plan the full test structure before writing; validate file completeness before execution

2. Environment Assumptions

  • Problem: Assuming python command exists when only python3 is available
  • Solution: Check the Python environment first or use python3 explicitly

3. Scattered Reasoning

  • Problem: Disorganized thought process leading to repeated work
  • Solution: Follow the decomposition strategy linearly; complete each sub-pattern before moving to the next

4. Duplicate Patterns Without Abstraction

  • Problem: Same regex pattern repeated multiple times, increasing error risk
  • Solution: Define complex sub-patterns once in reasoning, then reference them; in code, use variables

5. Missing Edge Cases

  • Problem: Focusing only on happy path validation
  • Solution: Explicitly test:
    • Boundary values (min/max for each component)
    • Invalid values just outside valid range
    • Empty and null cases
    • Patterns at different positions in the line

6. Order of Alternatives

  • Problem: Less specific alternatives matching before more specific ones
  • Solution: Order regex alternatives from most specific to least specific (e.g., 25[0-5] before 2[0-4][0-9] before [0-9])

7. Greedy vs Non-Greedy Matching

  • Problem: Unexpected capture due to greedy quantifiers
  • Solution: Understand when to use .* vs .*?; for "last occurrence" patterns, greedy .* is typically correct

Workflow Summary

  1. List all requirements explicitly
  2. Build and test sub-patterns independently
  3. Combine sub-patterns with appropriate anchors and lookaheads
  4. Create comprehensive test cases covering all categories
  5. Run tests and verify all pass
  6. Clean up test files after validation
Weekly Installs
18
Repository
letta-ai/skills
First Seen
Jan 24, 2026
Installed on
claude-code13
codex12
gemini-cli12
opencode12
antigravity11
cursor9