cairn-ai-pentest

Cairn AI Automated Penetration Testing System

Skill by ara.so — Daily 2026 Skills collection.

Cairn is an AI-driven framework for automated penetration testing and general problem solving, developed by the Bytex@起零衍迹实验室 team. It achieved the only "AK" (All Killed, i.e. full score) result in the Intelligent Penetration Challenge of the 2nd TCH Tencent Cloud Hackathon, placing 4th online. The system uses LLM-based agents to autonomously reason about, plan, and execute multi-step security testing tasks.


What Cairn Does

  • Autonomous AI Agent Loop: Iteratively reasons about a target, selects tools, executes commands, and interprets results
  • Penetration Testing Automation: Web vulnerability discovery, exploitation, CTF-style challenge solving
  • General Problem Solving: Extensible to non-security tasks via tool/plugin architecture
  • Multi-step Planning: Breaks complex objectives into subtasks with memory and context management
  • Tool Integration: Wraps common pentest tools (nmap, sqlmap, curl, custom scripts) as callable agent actions

Project Status

⚠️ Code is still being organized and is expected to be open-sourced soon. The examples below reflect the architecture described in the competition writeup and visible repository structure.

Follow the writeup for architecture details: https://mp.weixin.qq.com/s/DlpEH7bVr0xi0VawPJs3XA


Installation

# Clone the repository
git clone https://github.com/oritera/Cairn.git
cd Cairn

# Install Python dependencies (expected)
pip install -r requirements.txt

# Or with uv (modern Python tooling)
uv sync

Environment Configuration

Create a .env file in the project root:

# LLM Provider (OpenAI-compatible endpoint)
OPENAI_API_KEY=your_api_key_here
OPENAI_BASE_URL=https://api.openai.com/v1
MODEL_NAME=gpt-4o

# OR use a local/alternative provider
# OPENAI_BASE_URL=https://api.deepseek.com/v1
# MODEL_NAME=deepseek-chat

# Agent configuration
MAX_ITERATIONS=30
TIMEOUT_PER_STEP=60

# Target scope (safety guard)
TARGET_SCOPE=192.168.1.0/24

# Logging
LOG_LEVEL=INFO
LOG_FILE=./logs/cairn.log
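Since Cairn's code is not yet published, how TARGET_SCOPE is enforced is unknown. The following is a minimal sketch of one way such a safety guard could work, using only the Python standard library. The function name `in_scope` and the comma-separated CIDR format are assumptions made for illustration.

```python
import ipaddress
from urllib.parse import urlparse


def in_scope(target: str, scope: str) -> bool:
    """Return True if target (a URL or bare IP) lies inside any CIDR in scope.

    `scope` is assumed to be a comma-separated CIDR list such as the
    TARGET_SCOPE value. Hostnames are rejected rather than resolved, so the
    check never touches the network.
    """
    # Accept both "http://192.168.1.100:8080/path" and "192.168.1.100".
    host = urlparse(target).hostname if "://" in target else target
    try:
        addr = ipaddress.ip_address(host)
    except (ValueError, TypeError):
        return False  # not a literal IP address
    networks = [
        ipaddress.ip_network(cidr.strip(), strict=False)
        for cidr in scope.split(",")
        if cidr.strip()
    ]
    return any(addr in net for net in networks)
```

Rejecting hostnames outright is a deliberately conservative choice: resolving them would make the result depend on DNS, which an attacker-controlled target could influence.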

Core Architecture

Cairn follows a ReAct (Reasoning + Acting) agent pattern:

User Goal
              │
              ▼
┌─────────────────────────────┐
│         Agent Loop          │
│  ┌────────────────────────┐ │
│  │  Think (LLM Reasoning) │ │
│  └──────────┬─────────────┘ │
│             │               │
│  ┌──────────▼─────────────┐ │
│  │  Act (Tool Selection)  │ │
│  └──────────┬─────────────┘ │
│             │               │
│  ┌──────────▼─────────────┐ │
│  │  Observe (Parse Result)│ │
│  └──────────┬─────────────┘ │
│             │               │
│         (loop until done)   │
└─────────────────────────────┘
              │
              ▼
Final Answer / Exploit / Report
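The Think / Act / Observe cycle in the diagram can be sketched in a few lines of Python. This is a generic ReAct loop, not Cairn's implementation: the `react_loop` function, the decision-dict shape (`action` / `input` / `final`), and the scripted stand-in model are all hypothetical.

```python
from typing import Callable

# Hypothetical interfaces: the "llm" maps the transcript so far to a decision
# dict, and each tool maps a string argument to a string observation.
LLM = Callable[[list[str]], dict]
Tool = Callable[[str], str]


def react_loop(goal: str, llm: LLM, tools: dict[str, Tool],
               max_iterations: int = 30) -> str:
    """Run Think -> Act -> Observe until the model returns a final answer."""
    transcript = [f"Goal: {goal}"]
    for _ in range(max_iterations):
        decision = llm(transcript)                          # Think
        if "final" in decision:
            return decision["final"]
        name, arg = decision["action"], decision["input"]
        tool = tools.get(name)                              # Act
        observation = tool(arg) if tool else f"unknown tool: {name}"
        transcript.append(f"Action: {name}({arg})")
        transcript.append(f"Observation: {observation}")    # Observe
    return "stopped: iteration limit reached"


def scripted_llm(transcript: list[str]) -> dict:
    """Stand-in for a real model: call one tool, then answer from its output."""
    if len(transcript) == 1:
        return {"action": "echo", "input": "ping"}
    return {"final": transcript[-1]}


answer = react_loop("demo", scripted_llm, {"echo": lambda s: s.upper()})
```

The iteration cap plays the same role as MAX_ITERATIONS in the configuration: it guarantees termination even if the model never produces a final answer.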

Key Usage Patterns

1. Basic Agent Invocation (Expected CLI)

# Run against a CTF challenge or target
python cairn.py --target "http://192.168.1.100" --goal "Find and exploit SQL injection to retrieve admin credentials"

# With custom model
python cairn.py --target "http://challenge.example.com" \
  --goal "Solve this web CTF challenge and get the flag" \
  --model gpt-4o \
  --max-iterations 25

# Dry run (plan only, no execution)
python cairn.py --target "http://192.168.1.100" \
  --goal "Enumerate all open services" \
  --dry-run

2. Python API Usage (Expected)

import os

from cairn import CairnAgent
from cairn.tools import ToolRegistry
from cairn.config import CairnConfig

# Initialize configuration
config = CairnConfig(
    model_name="gpt-4o",
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
    max_iterations=30,
    target_scope=["192.168.1.0/24"],
)

# Build tool registry
tools = ToolRegistry()
tools.register_defaults()  # nmap, curl, sqlmap, ffuf, etc.

# Create and run agent
agent = CairnAgent(config=config, tools=tools)

result = agent.run(
    target="http://192.168.1.100",
    goal="Find all web vulnerabilities and attempt exploitation",
)

print(result.summary)
print(result.findings)

3. Custom Tool Registration

import subprocess

from cairn.tools import Tool, ToolResult

class CustomExploitTool(Tool):
    name = "custom_exploit"
    description = "Exploits a specific vulnerability in target application"

    def execute(self, target: str, payload: str, **kwargs) -> ToolResult:
        # Pass arguments as a list (no shell=True) so that target and payload
        # values cannot inject extra shell commands.
        cmd = ["python", "exploit.py", "--target", target, "--payload", payload]
        output = subprocess.run(cmd, capture_output=True, text=True)
        return ToolResult(
            success=output.returncode == 0,
            output=output.stdout,
            error=output.stderr,
        )

# Register with agent
tools.register(CustomExploitTool())
agent = CairnAgent(config=config, tools=tools)

4. Multi-Phase Penetration Test

from cairn import CairnAgent, Phase
from cairn.pipeline import PentestPipeline

pipeline = PentestPipeline(agent=agent)

# Define phases
pipeline.add_phase(Phase(
    name="reconnaissance",
    goal="Enumerate all open ports and services on {target}",
))
pipeline.add_phase(Phase(
    name="vulnerability_scan",
    goal="Based on discovered services, identify exploitable vulnerabilities",
    depends_on=["reconnaissance"],
))
pipeline.add_phase(Phase(
    name="exploitation",
    goal="Exploit identified vulnerabilities and achieve {objective}",
    depends_on=["vulnerability_scan"],
))

# Run full pipeline
report = pipeline.run(
    target="192.168.1.100",
    objective="obtain root shell or read /flag",
)
report.save("./reports/pentest_report.json")
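The depends_on fields above imply a dependency order between phases, and the {target} and {objective} placeholders imply goal templating. A minimal sketch of both, assuming nothing about how PentestPipeline actually does it, can be built on the standard library's graphlib. The helper names `phase_order` and `render_goal` are hypothetical.

```python
from graphlib import TopologicalSorter


def phase_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Return phase names so that every phase follows its dependencies.

    Raises graphlib.CycleError (a ValueError subclass) on circular dependencies.
    """
    return list(TopologicalSorter(depends_on).static_order())


def render_goal(template: str, **values: str) -> str:
    """Fill {target}-style placeholders; unused values are ignored."""
    return template.format(**values)


phases = {
    "reconnaissance": [],
    "vulnerability_scan": ["reconnaissance"],
    "exploitation": ["vulnerability_scan"],
}
order = phase_order(phases)
goal = render_goal(
    "Enumerate all open ports and services on {target}",
    target="192.168.1.100",
    objective="obtain root shell or read /flag",
)
```

A topological sort also makes misconfigured pipelines fail early: a cycle in depends_on raises before any phase runs.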

Tool Integration Examples

Built-in Tool Wrappers (Expected)

# nmap integration
from cairn.tools.network import NmapTool

nmap = NmapTool()
result = nmap.execute(target="192.168.1.100", flags="-sV -sC -p-")
# Returns structured service enumeration data

# HTTP request tool
from cairn.tools.web import HTTPTool

http = HTTPTool()
result = http.execute(
    url="http://target.com/login",
    method="POST",
    data={"username": "admin' OR '1'='1", "password": "x"},
    follow_redirects=True,
)

# Command execution tool (sandboxed)
from cairn.tools.shell import ShellTool

shell = ShellTool(allowed_commands=["curl", "nmap", "sqlmap", "ffuf"])
result = shell.execute(command="sqlmap -u 'http://target.com/?id=1' --dbs --batch")
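The allowed_commands argument suggests an allowlist on the executable name. How ShellTool enforces it is not documented, so the following is only a sketch of one plausible check. The function name `parse_allowed` is hypothetical, and the design assumes the resulting argument list is run without a shell.

```python
import shlex


def parse_allowed(command: str, allowed_commands: list[str]) -> list[str]:
    """Split a command line and return its argv if the executable is allowed.

    Raises ValueError for empty or malformed input (e.g. unbalanced quotes)
    and PermissionError when the executable is not on the allowlist.
    """
    argv = shlex.split(command)
    if not argv:
        raise ValueError("empty command")
    if argv[0] not in allowed_commands:
        raise PermissionError(f"command not allowed: {argv[0]}")
    return argv


argv = parse_allowed(
    "sqlmap -u 'http://target.com/?id=1' --dbs --batch",
    ["curl", "nmap", "sqlmap", "ffuf"],
)
```

Passing the returned list to subprocess.run without shell=True means operators such as `;` or `&&` are treated as plain arguments rather than interpreted, so checking only the first token is sufficient in this sketch.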

Agent Memory and Context

from cairn.memory import AgentMemory

# Memory persists findings across agent steps
memory = AgentMemory(
    short_term_limit=20,    # Recent observations in context
    long_term_enabled=True, # Summarize older context
    facts_store=True,       # Extract and index key facts
)

agent = CairnAgent(config=config, tools=tools, memory=memory)

# Access collected facts after run
for finding in agent.memory.findings:
    print(f"[{finding.severity}] {finding.description}")
    print(f"  Evidence: {finding.evidence}")
    print(f"  Recommendation: {finding.remediation}")
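To illustrate what a short_term_limit could mean in practice, here is a minimal sketch of a rolling observation window. The class `RollingMemory` is hypothetical and is not Cairn's AgentMemory; where a real system would presumably summarize evicted observations with the LLM, this sketch only counts them.

```python
from collections import deque


class RollingMemory:
    """Keep the most recent observations and archive the ones pushed out."""

    def __init__(self, short_term_limit: int = 20) -> None:
        if short_term_limit < 1:
            raise ValueError("short_term_limit must be at least 1")
        self.recent: deque[str] = deque(maxlen=short_term_limit)
        self.archived: list[str] = []

    def add(self, observation: str) -> None:
        # When the window is full, the oldest entry is about to be evicted.
        if len(self.recent) == self.recent.maxlen:
            self.archived.append(self.recent[0])
        self.recent.append(observation)

    def context(self) -> str:
        """Build the text that would be placed in the model's context."""
        lines = []
        if self.archived:
            # Placeholder for a summary of the older observations.
            lines.append(f"[earlier observations omitted: {len(self.archived)}]")
        lines.extend(self.recent)
        return "\n".join(lines)
```

Bounding the window keeps prompt size roughly constant across long runs, which is the concern behind the "Out of context window" entry in the troubleshooting table.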

Configuration Reference

# cairn/config.py (expected structure)
import os
from dataclasses import dataclass, field

@dataclass
class CairnConfig:
    # LLM settings
    model_name: str = "gpt-4o"
    api_key: str = field(default_factory=lambda: os.environ["OPENAI_API_KEY"])
    base_url: str = "https://api.openai.com/v1"
    temperature: float = 0.1       # Low temp for consistent tool use
    max_tokens: int = 4096
    
    # Agent behavior
    max_iterations: int = 30       # Hard stop on runaway loops
    timeout_per_step: int = 60     # Seconds per tool execution
    verbose: bool = False
    
    # Safety
    target_scope: list[str] = field(default_factory=list)
    dry_run: bool = False          # Plan without executing
    require_confirmation: bool = False  # Interactive approval per step
    
    # Output
    report_format: str = "json"    # json | markdown | html
    report_path: str = "./reports"
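The .env keys shown earlier map naturally onto these fields. A minimal sketch of that mapping follows, using a reduced stand-in dataclass so the example runs on its own. `EnvSettings` and `settings_from_env` are hypothetical names, and whether Cairn reads its environment this way is an assumption.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass
class EnvSettings:
    """Reduced stand-in for the expected CairnConfig fields."""
    model_name: str = "gpt-4o"
    base_url: str = "https://api.openai.com/v1"
    max_iterations: int = 30
    timeout_per_step: int = 60
    target_scope: list[str] = field(default_factory=list)


def settings_from_env(env: Optional[Mapping[str, str]] = None) -> EnvSettings:
    """Build settings from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    defaults = EnvSettings()
    scope = env.get("TARGET_SCOPE", "")
    return EnvSettings(
        model_name=env.get("MODEL_NAME", defaults.model_name),
        base_url=env.get("OPENAI_BASE_URL", defaults.base_url),
        max_iterations=int(env.get("MAX_ITERATIONS", defaults.max_iterations)),
        timeout_per_step=int(env.get("TIMEOUT_PER_STEP", defaults.timeout_per_step)),
        # Assumed format: comma-separated CIDR blocks.
        target_scope=[c.strip() for c in scope.split(",") if c.strip()],
    )
```

Accepting an explicit mapping instead of always reading os.environ keeps the loader easy to test without modifying the process environment.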

Prompt Engineering Patterns

Cairn uses structured system prompts for reliable tool invocation:

# Example system prompt structure (inferred from competition writeup)
SYSTEM_PROMPT = """You are an expert penetration tester AI agent.

## Objective
{goal}

## Target
{target}

## Available Tools
{tool_descriptions}

## Rules
1. Always reason step-by-step before acting
2. Stay within scope: {scope}
3. Prefer non-destructive enumeration before exploitation
4. Document every finding with evidence

## Response Format
Thought: <your reasoning>
Action: <tool_name>
Action Input: <tool parameters as JSON>

After receiving Observation, continue until you reach a Final Answer.
"""

CTF / Challenge Mode

# Optimized for CTF flag capture
python cairn.py \
  --mode ctf \
  --target "http://ctf-challenge.com:8080" \
  --goal "Find the hidden flag in format FLAG{...}" \
  --model gpt-4o \
  --max-iterations 50 \
  --verbose

# With flag pattern matching
python cairn.py \
  --mode ctf \
  --target "http://target.com" \
  --flag-pattern "CTF\{[a-zA-Z0-9_]+\}" \
  --auto-submit
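Flag pattern matching of this kind amounts to a regular-expression search over each observation. A minimal sketch, with the hypothetical helper `find_flag` and the pattern from the command above as its default:

```python
import re
from typing import Optional


def find_flag(observation: str,
              flag_pattern: str = r"CTF\{[a-zA-Z0-9_]+\}") -> Optional[str]:
    """Return the first substring matching the flag pattern, or None."""
    match = re.search(flag_pattern, observation)
    return match.group(0) if match else None


flag = find_flag("<p>Welcome admin! CTF{sql_1nj3ct10n}</p>")
```

Checking every tool output against the pattern lets a run stop as soon as a flag appears, instead of relying on the model to notice it.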

Logging and Debugging

import logging
from cairn import CairnAgent

# Enable detailed agent trace logging
logging.basicConfig(level=logging.DEBUG)

agent = CairnAgent(config=config, tools=tools, verbose=True)

# Each step is logged:
# [THINK] Analyzing login form for injection points...
# [ACT]   Calling tool: http_request
# [INPUT] {"url": "...", "method": "POST", "data": {...}}
# [OBS]   Response 200, contains "Invalid credentials"
# [THINK] Response suggests valid injection point, trying UNION...
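Each trace line above corresponds to one field of the Thought / Action / Action Input response format requested in the system prompt. A minimal sketch of parsing one model response into those fields follows. The function `parse_step` is hypothetical, and the literal "Final Answer:" label is an assumption, since the prompt names a Final Answer but does not fix its exact syntax.

```python
import json
import re

_STEP_RE = re.compile(
    r"Thought:\s*(?P<thought>.*?)\s*"
    r"Action:\s*(?P<action>[\w.-]+)\s*"
    r"Action Input:\s*(?P<input>\{.*\})",
    re.DOTALL,
)


def parse_step(text: str) -> dict:
    """Parse one model response into a tool call or a final answer."""
    final = re.search(r"Final Answer:\s*(.*)", text, re.DOTALL)
    if final:
        return {"final": final.group(1).strip()}
    match = _STEP_RE.search(text)
    if not match:
        raise ValueError("response does not follow the expected format")
    return {
        "thought": match["thought"],
        "action": match["action"],
        # Tool parameters are expected as a JSON object.
        "input": json.loads(match["input"]),
    }


step = parse_step(
    "Thought: The login form may be injectable.\n"
    "Action: http_request\n"
    'Action Input: {"url": "http://target.com/login", "method": "POST"}'
)
```

Raising on malformed output, rather than guessing, gives the agent loop a clear signal to ask the model to retry.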

Troubleshooting

Issue                        | Cause                                    | Fix
-----------------------------|------------------------------------------|------------------------------------------------------
Agent loops without progress | Goal too vague or tools failing silently | Add --max-iterations 15, use --verbose to inspect loop
Tool execution timeout       | Slow network or heavy scan               | Increase TIMEOUT_PER_STEP in config
LLM refuses tool call        | Safety filter on model provider          | Use a less restrictive model endpoint or rephrase goal
Out of context window        | Long agent history                       | Reduce short_term_limit or enable memory summarization
Scope violation error        | Target not in allowed scope              | Add target CIDR to TARGET_SCOPE in .env
Empty findings report        | Agent completed but found nothing        | Check target accessibility, increase iterations

Responsible Use

Cairn is licensed under AGPL-3.0. Usage must comply with:

  • ✅ Authorized penetration tests with written permission
  • ✅ CTF competitions and intentionally vulnerable lab environments
  • ✅ Personal security research on systems you own
  • ❌ Unauthorized access to systems you don't own
  • ❌ Commercial use without a separate commercial license

For commercial licensing inquiries, contact the maintainer through the repository.

