# obsidian-rlm

**Obsidian RLM - Large File Processor**

- **Status:** Production-Tested
- **Last Updated:** 2026-01-26
- **Dependencies:** RLM server running; API provider recommended (OpenAI/OpenRouter)
- **Integrates With:** ContentImporter agent, Obsidian vault workflows
## ⚠️ Critical: Provider Requirements

Local models (14B-24B) are unreliable for RLM's JSON protocol. Testing with a 72MB Claude export showed:
| Provider | JSON Reliability | Recommended |
|---|---|---|
| OpenAI GPT-4o | ✅ Excellent | Yes |
| OpenRouter | ✅ Excellent | Yes |
| DeepSeek API | ✅ Excellent | Yes |
| Ollama 70B+ | ⚠️ Moderate | Fallback |
| Ollama 14B-24B | ❌ Unreliable | No |
For production use with large files, use API providers.
## Recommended: Hybrid Approach
For files >50MB, the hybrid approach is most reliable:
- Python/jq for metadata (instant, 100% reliable)
- RLM for content analysis on extracted segments
See "Hybrid Workflow" section below.
## Purpose
When the ContentImporter agent encounters files too large for direct processing (10MB+), this skill takes over to:
- Analyze the file structure via RLM iterative commands
- Extract metadata, counts, and summaries without loading entire file
- Return structured data for ContentImporter's triage workflow
- Generate Obsidian-ready output (frontmatter, wikilinks, tags)
## Hybrid Workflow (Recommended for 50MB+)

*Tested with a 72MB Claude export (1,375 conversations).*
### Step 1: Python Metadata Extraction (Instant)

```python
import json

with open('conversations.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Basic stats
print(f"Total conversations: {len(data)}")
dates = [c['created_at'][:10] for c in data if c.get('created_at')]
print(f"Date range: {min(dates)} to {max(dates)}")

# Triage by message count
msg_counts = [len(c.get('chat_messages', [])) for c in data]
high_value = sum(1 for c in msg_counts if c > 50)
medium = sum(1 for c in msg_counts if 10 < c <= 50)
low = sum(1 for c in msg_counts if 0 < c <= 10)
print(f"HIGH (50+ msgs): {high_value}")
print(f"MEDIUM (11-50): {medium}")
print(f"LOW (1-10): {low}")

# Top conversations
convos = [(c.get('name', ''), len(c.get('chat_messages', []))) for c in data]
convos.sort(key=lambda x: x[1], reverse=True)
print("\nTop 10:")
for name, count in convos[:10]:
    print(f"  {count:3} msgs - {name[:50]}")
```
### Step 2: RLM for Content Analysis

For specific conversations needing deeper analysis:

```bash
# Extract one conversation to a smaller file
python -c "
import json
with open('conversations.json') as f:
    data = json.load(f)
# Find conversation by name
conv = next(c for c in data if 'BMAD' in c.get('name', ''))
with open('single_conv.json', 'w') as f:
    json.dump(conv, f)
"

# Analyze with RLM
curl -X POST http://localhost:4539/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Summarize the key decisions and action items from this conversation",
    "context_file": "demo/single_conv.json"
  }'
```
## Quick Start

### Prerequisites

- **RLM server running** at `http://localhost:4539`:

  ```powershell
  cd D:\rlm-project\rlm-orchestrator
  # Use OpenAI config for reliability
  $env:LITELLM_API_KEY = "your-openai-key"
  .\target\release\rlm-server.exe config-openai.toml
  ```

- **Verify health:**

  ```bash
  curl http://localhost:4539/health
  # Expected: {"status":"healthy","wasm_enabled":false}
  ```

Note: WASM is disabled for large files (it crashes on 70MB+). DSL-only mode is sufficient.
### Basic Usage

```bash
# Count conversations in large Claude export
curl -X POST http://localhost:4539/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Count the total number of conversations in this JSON",
    "context_file": "C:/exports/conversations.json"
  }'

# Extract conversation titles for triage
curl -X POST http://localhost:4539/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "List all conversation names with their created_at dates",
    "context_file": "C:/exports/conversations.json"
  }'
```
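The same calls can be scripted instead of issued via curl. Below is a minimal Python sketch of a `/query` client, assuming only the endpoint and JSON fields shown above; the function names (`build_payload`, `rlm_query`) are illustrative, and no retry or error handling is included:

```python
import json
import urllib.request

# Assumption: server address and endpoint as documented above.
RLM_URL = "http://localhost:4539/query"

def build_payload(query: str, context_file: str) -> bytes:
    """Build the JSON body the /query endpoint expects."""
    return json.dumps({"query": query, "context_file": context_file}).encode("utf-8")

def rlm_query(query: str, context_file: str, url: str = RLM_URL, timeout: int = 300) -> dict:
    """POST a query to the RLM server and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=build_payload(query, context_file),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A long timeout matters here: full-triage queries on 64MB files can run for minutes (see Performance Expectations below).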
## Size Threshold Logic
| File Size | Recommended Approach | Why |
|---|---|---|
| < 1MB | Direct parsing | Fast, no overhead |
| 1-10MB | Direct with streaming | Manageable in memory |
| 10-50MB | RLM with API provider | Iterative processing, reliable JSON |
| 50MB+ | Hybrid: Python + RLM | Python for metadata, RLM for analysis |
| 70MB+ | Hybrid only | WASM crashes, DSL-only too slow |
**⚠️ WASM crashes on files >70MB** - always disable WASM for large files.
### Detection Code (for ContentImporter)

```javascript
// In ContentImporter workflow
const fileSizeMB = fs.statSync(filePath).size / (1024 * 1024);

if (fileSizeMB > 10) {
  // Trigger obsidian-rlm skill
  console.log(`Large file detected (${fileSizeMB.toFixed(1)}MB) - using RLM`);
  return processWithRLM(filePath, query);
} else {
  // Standard processing
  return JSON.parse(fs.readFileSync(filePath));
}
```
## Supported File Formats

### Claude/Anthropic Exports

Structure:

```json
{
  "conversations": [
    {
      "uuid": "abc123",
      "name": "Project Discussion",
      "created_at": "2024-01-15T10:30:00Z",
      "updated_at": "2024-01-15T12:45:00Z",
      "chat_messages": [
        {"sender": "human", "text": "..."},
        {"sender": "assistant", "text": "..."}
      ]
    }
  ]
}
```
Common RLM Queries:
- "Count total conversations"
- "List conversation names with dates"
- "Find conversations mentioning [topic]"
- "Extract conversations longer than 20 messages"
- "Identify conversations with code blocks"
### ChatGPT/OpenAI Exports

Structure:

```json
{
  "title": "Conversation Title",
  "create_time": 1705312800,
  "mapping": {
    "node_id": {
      "message": {
        "author": {"role": "user|assistant"},
        "content": {"parts": ["..."]}
      }
    }
  }
}
```
Common RLM Queries:
- "Count total conversations across all files"
- "List titles with create_time converted to dates"
- "Find conversations about [topic]"
- "Extract conversations with assistant code responses"
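Unlike Claude exports, ChatGPT's `mapping` is a tree keyed by node id, so recovering messages in conversation order means walking the tree from the root. A sketch for local preprocessing, assuming the `parent`/`children` links that ChatGPT exports include alongside the structure shown above (the function name is illustrative):

```python
def flatten_mapping(mapping: dict) -> list:
    """Walk a ChatGPT export's mapping tree from the root node and
    return messages in conversation order. Assumes each node carries
    'parent'/'children' links as in real ChatGPT exports."""
    # The root is the node with no parent.
    node_id = next(nid for nid, node in mapping.items() if node.get("parent") is None)
    messages = []
    while node_id is not None:
        node = mapping[node_id]
        msg = node.get("message")
        # Skip structural nodes with no message content.
        if msg and msg.get("content", {}).get("parts"):
            messages.append({
                "role": msg["author"]["role"],
                "text": "".join(p for p in msg["content"]["parts"] if isinstance(p, str)),
            })
        children = node.get("children") or []
        node_id = children[0] if children else None  # follow the first branch
    return messages
```

Note the simplification: when a conversation was regenerated, a node has multiple children; this sketch follows only the first branch.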
## RLM Query Patterns for Obsidian

### Pattern 1: Generate Triage Report

Query:

```
Analyze this Claude conversations export and generate a triage report:
- Count total conversations
- Categorize by apparent topic (coding, writing, research, general)
- Flag conversations with >50 messages as HIGH VALUE
- Flag conversations with code blocks as HIGH VALUE
- Return as markdown table format
```
Expected Output:
| Category | Count | High Value |
|----------|-------|------------|
| Coding | 47 | 23 |
| Writing | 18 | 5 |
| Research | 12 | 8 |
| General | 89 | 2 |
| **Total** | **166** | **38** |
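A first-pass version of this triage can also run locally before involving RLM. The sketch below categorizes Claude-format conversations by title keywords and emits a table in the format above; the keyword lists and the fenced-block test for "has code" are illustrative assumptions, not part of the RLM protocol:

```python
# Assumption: keyword lists are illustrative, tune them per vault.
TOPIC_KEYWORDS = {
    "Coding": ["code", "python", "bug", "function", "error"],
    "Writing": ["draft", "essay", "rewrite", "tone"],
    "Research": ["paper", "study", "source", "compare"],
}

def triage_table(conversations: list) -> str:
    """Categorize Claude-format conversations by title keywords and
    render a markdown table of counts and HIGH VALUE counts."""
    rows = {cat: [0, 0] for cat in TOPIC_KEYWORDS}
    rows["General"] = [0, 0]
    for conv in conversations:
        name = conv.get("name", "").lower()
        msgs = conv.get("chat_messages", [])
        # HIGH VALUE per the query above: >50 messages or code blocks.
        high = len(msgs) > 50 or any("```" in m.get("text", "") for m in msgs)
        for cat, words in TOPIC_KEYWORDS.items():
            if any(w in name for w in words):
                break
        else:
            cat = "General"
        rows[cat][0] += 1
        rows[cat][1] += high
    lines = ["| Category | Count | High Value |",
             "|----------|-------|------------|"]
    for cat, (count, hv) in rows.items():
        lines.append(f"| {cat} | {count} | {hv} |")
    total = sum(r[0] for r in rows.values())
    total_hv = sum(r[1] for r in rows.values())
    lines.append(f"| **Total** | **{total}** | **{total_hv}** |")
    return "\n".join(lines)
```

Title keywords are a crude proxy; the RLM query above can classify on full message content where this heuristic misfires.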
### Pattern 2: Extract Metadata for Staging

Query:

```
For each conversation in this export, extract:
- uuid
- name (truncated to 50 chars)
- created_at date (YYYY-MM-DD format)
- message_count
- has_code_blocks (true/false)
Return as JSON array
```
Expected Output:

```json
[
  {"uuid": "abc123", "name": "Project Architecture Discussion", "date": "2024-01-15", "message_count": 67, "has_code": true},
  {"uuid": "def456", "name": "Quick Question About Syntax", "date": "2024-01-14", "message_count": 4, "has_code": false}
]
```
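For Claude exports this metadata is cheap to compute locally too, since every field is structural. A sketch assuming the list-of-conversations layout used in Step 1 (detecting code via fenced-block markers is an assumption):

```python
def extract_metadata(conversations: list) -> list:
    """Build per-conversation metadata records in the shape shown
    above from a Claude-format export already loaded via json.load()."""
    records = []
    for conv in conversations:
        msgs = conv.get("chat_messages", [])
        records.append({
            "uuid": conv.get("uuid", ""),
            "name": conv.get("name", "")[:50],          # truncated to 50 chars
            "date": (conv.get("created_at") or "")[:10],  # YYYY-MM-DD prefix
            "message_count": len(msgs),
            # Assumption: code blocks are marked by ``` fences in message text.
            "has_code": any("```" in m.get("text", "") for m in msgs),
        })
    return records
```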
### Pattern 3: Search for Specific Topics

Query:

```
Find all conversations that discuss "Obsidian" or "vault" or "PKM".
Return conversation names and relevant excerpt (first 200 chars of matching message).
```
### Pattern 4: Generate Frontmatter Batch

Query:

```
For HIGH VALUE conversations (>30 messages or contains code), generate Obsidian frontmatter:
- title: conversation name
- date: created_at
- tags: [inferred from content]
- source: "Claude Export"
- import-date: "[[2026-01-25]]"
Return as YAML blocks ready for copy-paste
```
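For conversations extracted locally, the same frontmatter can be rendered without an LLM call. A minimal sketch; the field names follow the query above, the `frontmatter` helper name is illustrative, and tag inference is left as a placeholder since it needs content analysis:

```python
def frontmatter(conv: dict, import_date: str = "2026-01-25") -> str:
    """Render a YAML frontmatter block for one Claude-format conversation."""
    title = conv.get("name", "Untitled")
    date = (conv.get("created_at") or "")[:10]
    lines = [
        "---",
        f'title: "{title}"',
        f"date: {date}",
        "tags: []  # placeholder: fill in from content analysis",
        'source: "Claude Export"',
        f'import-date: "[[{import_date}]]"',
        "---",
    ]
    return "\n".join(lines)
```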
## Integration with ContentImporter

### Workflow: Large Claude Export

1. User: "Process my Claude export at C:\exports\conversations.json"
2. ContentImporter checks file size:
   - Size: 64MB → triggers obsidian-rlm
3. obsidian-rlm runs initial analysis:
   - Query: "Count conversations, identify structure"
   - Returns: 847 conversations, Claude format
4. obsidian-rlm runs triage:
   - Query: "Categorize by value (HIGH/MEDIUM/LOW)"
   - Returns: 52 HIGH, 189 MEDIUM, 606 LOW
5. ContentImporter generates Triage Report
6. User selects items to import
7. obsidian-rlm extracts selected conversations:
   - Query: "Extract full conversation with uuid [X]"
   - Returns: complete conversation text
8. ContentImporter creates notes in 00 NoteLab/
### Data Handoff Format

obsidian-rlm returns data in this format for ContentImporter:

```json
{
  "analysis": {
    "file_path": "C:/exports/conversations.json",
    "file_size_mb": 64.2,
    "format": "claude",
    "total_conversations": 847,
    "processing_time_seconds": 45
  },
  "triage": {
    "high_value": [
      {
        "uuid": "abc123",
        "name": "System Architecture Design",
        "date": "2024-01-15",
        "message_count": 89,
        "has_code": true,
        "suggested_folder": "02 Cards/Reference",
        "suggested_tags": ["#Architecture", "#Reference", "#🌲"]
      }
    ],
    "medium_value": [...],
    "low_value_count": 606
  }
}
```
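Because this payload crosses an agent boundary, it is worth sanity-checking before handoff. A minimal sketch, not a full schema validator; the required keys are taken from the example above, and which keys count as required is an assumption:

```python
def validate_handoff(payload: dict) -> list:
    """Check a handoff payload against the documented shape and return
    a list of problems; an empty list means the payload looks usable."""
    problems = []
    analysis = payload.get("analysis", {})
    # Assumption: these analysis keys are the minimum ContentImporter needs.
    for key in ("file_path", "format", "total_conversations"):
        if key not in analysis:
            problems.append(f"analysis missing '{key}'")
    triage = payload.get("triage", {})
    for key in ("high_value", "medium_value", "low_value_count"):
        if key not in triage:
            problems.append(f"triage missing '{key}'")
    # Each HIGH VALUE item must at least carry a uuid for later extraction.
    for item in triage.get("high_value", []):
        if "uuid" not in item:
            problems.append("high_value item missing 'uuid'")
    return problems
```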
## Configuration

### RLM Config for Large Files

Use increased limits for 64MB+ files:

```toml
# config-large-files.toml
max_iterations = 50     # More iterations for complex queries
max_sub_calls = 100     # More sub-calls for detailed analysis
output_limit = 50000    # Larger output for full extractions
bypass_enabled = true
bypass_threshold = 4000

[wasm]
enabled = false             # WASM crashes on 70MB+ files; keep it disabled here
rust_wasm_enabled = true
fuel_limit = 5000000        # 5x normal for large files
memory_limit = 268435456    # 256MB for large context
codegen_provider = "ollama"
codegen_url = "http://192.168.1.120:11434"
codegen_model = "qwen2.5:14b-instruct-q4_K_M"

[[providers]]
provider_type = "ollama"
base_url = "http://192.168.1.120:11434"
model = "qwen2.5:14b-instruct-q4_K_M"
role = "root"
weight = 1

[[providers]]
provider_type = "ollama"
base_url = "http://192.168.1.120:11434"
model = "qwen3:1.7b-q4_K_M"
role = "sub"
weight = 1
## Troubleshooting

### Problem: RLM times out on 64MB file

Solution: Use chunked queries. Instead of loading the entire file, query specific ranges:

```
"Analyze conversations 0-100 in this export"
"Analyze conversations 100-200 in this export"
```
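An alternative to range queries is splitting the export into smaller files and pointing each RLM call's `context_file` at one chunk. A sketch, assuming the export is a top-level JSON array of conversations as in Step 1 (the `write_chunks` helper name and `{}` template convention are illustrative):

```python
import json

def write_chunks(src_path: str, dst_template: str, chunk_size: int = 100) -> int:
    """Split a Claude export into files of chunk_size conversations each,
    so each chunk can be analyzed as its own RLM context file.
    dst_template must contain one '{}' for the chunk index; returns the
    number of chunk files written."""
    with open(src_path, "r", encoding="utf-8") as f:
        data = json.load(f)
    n_chunks = 0
    for i in range(0, len(data), chunk_size):
        with open(dst_template.format(n_chunks), "w", encoding="utf-8") as out:
            json.dump(data[i:i + chunk_size], out)
        n_chunks += 1
    return n_chunks
```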
### Problem: JSON parse errors on Claude export

Solution: Verify the file is valid JSON. Claude exports occasionally have encoding issues:

```bash
# Validate JSON
python -c "import json; json.load(open('conversations.json'))"
```
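When the one-liner fails, a slightly richer check reports where the file breaks. A sketch: the `utf-8-sig` codec tolerates a leading BOM, which is one plausible encoding issue in exports (the `validate_json_file` helper name is illustrative):

```python
import json

def validate_json_file(path: str) -> str:
    """Return 'ok' if the file parses as JSON; otherwise a message
    locating the first parse or decode error."""
    try:
        # utf-8-sig silently strips a UTF-8 BOM if one is present.
        with open(path, "r", encoding="utf-8-sig") as f:
            json.load(f)
        return "ok"
    except json.JSONDecodeError as e:
        return f"invalid JSON at line {e.lineno}, column {e.colno}: {e.msg}"
    except UnicodeDecodeError as e:
        return f"encoding error at byte {e.start}: {e.reason}"
```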
### Problem: RLM server not responding

Solution: Check that the server is running and accessible:

```bash
curl http://localhost:4539/health
# If no response, restart the server
```

### Problem: Ollama model too slow

Solution: For initial triage (not extraction), use faster/smaller models:

```toml
# Fast triage model
model = "qwen3:8b"  # Faster than 14b for simple counts
```
## Performance Expectations
| File Size | Query Type | Expected Time |
|---|---|---|
| 10MB | Count conversations | 5-10 seconds |
| 10MB | Full triage | 30-60 seconds |
| 64MB | Count conversations | 15-30 seconds |
| 64MB | Full triage | 2-5 minutes |
| 64MB | Extract specific conversation | 10-20 seconds |
## Complete Example: Process 64MB Export

```bash
# 1. Start RLM server
cd D:\rlm-project\rlm-orchestrator
.\target\release\rlm-server.exe config-lan-ollama.toml

# 2. Initial analysis
curl -X POST http://localhost:4539/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Analyze this Claude export: count conversations, identify date range, count total messages",
    "context_file": "C:/exports/conversations.json"
  }'
# Response: 847 conversations, 2023-06-01 to 2024-12-31, 42,891 messages

# 3. Triage by value
curl -X POST http://localhost:4539/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Categorize conversations: HIGH VALUE (>50 messages OR has code), MEDIUM (10-50 messages), LOW (<10 messages). Return counts and list HIGH VALUE names.",
    "context_file": "C:/exports/conversations.json"
  }'
# Response: HIGH: 52, MEDIUM: 189, LOW: 606. HIGH VALUE list...

# 4. Extract specific conversation
curl -X POST http://localhost:4539/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Extract the full conversation named \"System Architecture Design\" with all messages",
    "context_file": "C:/exports/conversations.json"
  }'

# 5. Generate Obsidian note
# ContentImporter takes the extracted conversation and creates the note in 00 NoteLab/
```
## Related Skills

| Skill | Purpose |
|---|---|
| rlm-project-assistant | Full RLM setup and configuration |
| obsidian-markdown | Note formatting and frontmatter |
| json-canvas | Processing .canvas files |
## References

- See the rlm-project-assistant skill for full RLM setup instructions
- See `references/QUERY_PATTERNS.md` for additional query examples
- See ContentImporter agent: `06 Toolkit/Agents/Sub Agents/10-ContentImporter.md`

---

**Production Tested:** 64MB Claude export, 847 conversations | **Processing Time:** ~3 minutes for full triage | **Integration:** ContentImporter agent ready