notemdpro-batch-processor
NoteMD Pro - Batch Processing
Overview
This skill provides utilities for batch processing multiple markdown files with proper concurrency control, rate limiting, and error handling. It prevents API rate limits and ensures stable processing of large vaults.
When to Use
- Large vault migrations: Process 100+ files
- Bulk operations: Translate, generate, or extract from multiple files
- Rate limit prevention: Avoid overwhelming LLM APIs
- Progress tracking: Monitor batch progress
Key Functions (from utils.ts)
createConcurrentProcessor
Creates a concurrent processor with staggered task execution.
```typescript
export function createConcurrentProcessor<T, R>(
  concurrency: number,
  apiCallIntervalMs: number,
  progressReporter: ProgressReporter,
): (tasks: (() => Promise<T>)[]) => Promise<R[]>;

// Features:
// - Staggered worker start (prevents burst)
// - Progress reporting
// - Cancellation support
// - Result ordering
```
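The actual implementation lives in utils.ts; the sketch below only shows the general idea (bounded concurrency, staggered start, ordered results) under assumed types. The ProgressReporter shape is inferred from the Progress Reporting section below, and cancellation is omitted for brevity.

```typescript
// Sketch only -- the real createConcurrentProcessor in utils.ts may differ.
// ProgressReporter is an assumed shape (see Progress Reporting below).
interface ProgressReporter {
  log(message: string): void;
  updateStatus(message: string, percent: number): void;
  updateActiveTasks(delta: number): void;
}

const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

function createConcurrentProcessorSketch<T>(
  concurrency: number,
  apiCallIntervalMs: number,
  progressReporter: ProgressReporter,
): (tasks: (() => Promise<T>)[]) => Promise<T[]> {
  return async (tasks) => {
    const results = new Array<T>(tasks.length);
    let next = 0;
    let done = 0;

    // Each worker waits out its stagger delay, then pulls tasks off a shared index.
    const worker = async (workerIndex: number) => {
      await delay(workerIndex * apiCallIntervalMs); // staggered start: no initial burst
      while (next < tasks.length) {
        const i = next++;
        progressReporter.updateActiveTasks(1);
        try {
          results[i] = await tasks[i](); // store by original index to preserve order
        } finally {
          progressReporter.updateActiveTasks(-1);
          done++;
          progressReporter.updateStatus("Processing...", Math.floor((done / tasks.length) * 100));
        }
      }
    };

    const workerCount = Math.min(concurrency, tasks.length);
    await Promise.all(Array.from({ length: workerCount }, (_, w) => worker(w)));
    return results;
  };
}
```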
chunkArray
Splits an array into chunks of specified size.
```typescript
export function chunkArray<T>(arr: T[], size: number): T[][];

// Example:
chunkArray([1, 2, 3, 4, 5, 6, 7, 8], 3);
// Returns: [[1,2,3], [4,5,6], [7,8]]
```
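A straightforward implementation fits in a few lines; this is a sketch of the obvious approach, not necessarily the exact code in utils.ts:

```typescript
export function chunkArray<T>(arr: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < arr.length; i += size) {
    chunks.push(arr.slice(i, i + size)); // last chunk may be shorter than size
  }
  return chunks;
}
```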
retry
Retry with exponential backoff.
```typescript
export async function retry<T>(
  fn: () => Promise<T>,
  maxRetries?: number,
  delayMs?: number,
  signal?: AbortSignal,
): Promise<T>;

// Features:
// - Exponential backoff (delay * 2^i)
// - Abort signal support
// - Error propagation
```
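A sketch of how such a retry helper can be written; the version in utils.ts may differ in details such as abort handling and which errors it retries:

```typescript
const wait = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

export async function retrySketch<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  delayMs = 1000,
  signal?: AbortSignal,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    if (signal?.aborted) throw new Error("Operation aborted");
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Exponential backoff: delay * 2^attempt before trying again.
      if (attempt < maxRetries - 1) await wait(delayMs * 2 ** attempt);
    }
  }
  throw lastError; // propagate the final error to the caller
}
```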
Batch Processing Flow
```
batchOperation
├── Get list of files
├── Create concurrent processor
├── Create task functions
│   └── For each file
│       └── Process file
├── Execute with staggered start
├── Handle errors per file
└── Collect results
```
Settings
Concurrency Settings
| Setting | Description | Default |
|---|---|---|
| `enableBatchParallelism` | Enable parallel processing | `false` |
| `batchConcurrency` | Number of concurrent operations | `1` |
| `batchSize` | Files per batch | `10` |
| `batchInterDelayMs` | Delay between batches | `1000` |
| `apiCallIntervalMs` | Delay between API calls | `0` |
API Stability
| Setting | Description |
|---|---|
| `enableStableApiCall` | Enable stable API calls |
| `apiCallInterval` | Interval between calls (ms) |
| `apiCallMaxRetries` | Max retry attempts |
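Taken together, the two tables imply a settings shape roughly like the interface below. This is an illustrative sketch; the plugin's real settings type may name or group these fields differently.

```typescript
// Illustrative only -- not the plugin's actual settings interface.
interface BatchProcessingSettings {
  // Concurrency settings
  enableBatchParallelism: boolean; // default: false
  batchConcurrency: number;        // default: 1
  batchSize: number;               // default: 10
  batchInterDelayMs: number;       // default: 1000
  apiCallIntervalMs: number;       // default: 0
  // API stability settings
  enableStableApiCall: boolean;
  apiCallInterval: number;   // interval between calls (ms)
  apiCallMaxRetries: number; // max retry attempts
}
```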
Concurrency Strategy
Staggered Start
Instead of starting all workers at once, workers are started with delays:
```
Worker 0: starts at 0 ms
Worker 1: starts at interval * 1 ms
Worker 2: starts at interval * 2 ms
...
Worker N: starts at interval * N ms
```
This prevents API rate limits by distributing requests over time.
Inter-batch Delays
For very large batches, delays between batches prevent overwhelming APIs:
Batch 1: [====] → delay → Batch 2: [====] → delay → Batch 3: [====]
Error Handling Strategy
Per-File Errors
Each file is processed independently. Errors don't stop the entire batch:
```typescript
// Failures are recorded per file so the rest of the batch keeps running.
const results: { file: TFile; success: boolean; error?: unknown }[] = [];
for (const file of files) {
  try {
    await processFile(file);
    results.push({ file, success: true });
  } catch (error) {
    results.push({ file, success: false, error });
  }
}
```
Retry Logic
For transient errors (network, rate limits), use exponential backoff:
```
Attempt 1: wait 1000 ms
Attempt 2: wait 2000 ms
Attempt 3: wait 4000 ms
```
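For example, wrapping a single provider request in retry with those defaults (callLLM and prompt are placeholders for the real API call and its input):

```typescript
const controller = new AbortController();

// callLLM is a hypothetical stand-in for the actual provider request.
const response = await retry(
  () => callLLM(prompt),
  3,    // maxRetries
  1000, // delayMs: waits 1000 ms, then 2000 ms, then 4000 ms
  controller.signal,
);
```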
🛑 Fatal Errors & HTTP 429 (Independent Operation)
When running as an independent AI agent, you must rigorously guard against LLM API failures:
- HTTP 429 (Rate Limit Exceeded): Immediately pause the batch processor. Do not rely on naive retries if the bucket is exhausted. Implement a generic `AbortSignal` to gracefully halt the queue (see the sketch below).
- Gateway Timeouts/502: API providers often return HTML instead of JSON during outages. Wrap all `JSON.parse` or provider SDK calls in resilient `try/catch` blocks that log the raw text to `error_processing_filename.log` before aborting.
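A sketch of both guards in one place. fetchCompletion and logRawError are hypothetical stand-ins for the real provider call and error logger; the 429 check and the JSON.parse guard are the point here.

```typescript
// Hypothetical helpers: fetchCompletion performs the LLM request, logRawError
// appends the raw response text to error_processing_filename.log.
async function guardedCall(
  fetchCompletion: () => Promise<Response>,
  logRawError: (text: string) => Promise<void>,
  controller: AbortController,
): Promise<unknown> {
  const res = await fetchCompletion();

  // HTTP 429: halt the queue instead of hammering an exhausted rate-limit bucket.
  if (res.status === 429) {
    controller.abort();
    throw new Error("Rate limit exceeded (429); batch halted");
  }

  const raw = await res.text();
  try {
    return JSON.parse(raw);
  } catch {
    // Gateways sometimes return HTML error pages instead of JSON during outages.
    await logRawError(raw);
    controller.abort();
    throw new Error("Non-JSON response from provider; batch halted");
  }
}
```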
Persistent Error Logging (saveErrorLog)
> [!IMPORTANT] Essential Debugging Output
> The notemdpro-batch-processor workflow uses `saveErrorLog` (from `fileUtils.ts`) to write detailed stack traces to `error_processing_filename.log` in the Vault root. When acting as an AI Agent, if a user reports a batch failure, you MUST proactively read this log file to obtain the stack trace rather than asking the user for screenshots.
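A sketch of what an error-log writer along these lines might look like using the Obsidian vault API. The real saveErrorLog lives in fileUtils.ts; its signature, file-naming scheme, and log format are assumptions here.

```typescript
import { App, normalizePath } from "obsidian";

// Illustrative sketch only; the real saveErrorLog in fileUtils.ts may differ.
async function saveErrorLogSketch(app: App, fileName: string, error: unknown): Promise<void> {
  const logPath = normalizePath("error_processing_filename.log"); // Vault root, per the note above
  const stack = error instanceof Error ? (error.stack ?? error.message) : String(error);
  const entry = `[${new Date().toISOString()}] ${fileName}\n${stack}\n\n`;

  if (await app.vault.adapter.exists(logPath)) {
    await app.vault.adapter.append(logPath, entry);
  } else {
    await app.vault.create(logPath, entry);
  }
}
```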
Progress Reporting
```typescript
progressReporter.log(`Processing file ${i}/${total}: ${file.name}`);
progressReporter.updateStatus(`Processing...`, Math.floor((i / total) * 100));
progressReporter.updateActiveTasks(1); // Increment active tasks
```
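These calls imply a reporter interface roughly like the one below (an assumed shape; the real ProgressReporter type may carry more members). The console-backed implementation is purely illustrative, e.g. for testing outside Obsidian.

```typescript
// Assumed shape, inferred from the calls above; the plugin's interface may differ.
interface ProgressReporter {
  log(message: string): void;
  updateStatus(message: string, percent: number): void;
  updateActiveTasks(delta: number): void;
}

// Hypothetical console-backed reporter for testing outside the plugin UI.
const consoleReporter: ProgressReporter = {
  log: (message) => console.log(message),
  updateStatus: (message, percent) => console.log(`${percent}% - ${message}`),
  updateActiveTasks: (delta) => console.log(`active tasks delta: ${delta}`),
};
```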
Usage Example
```typescript
import { createConcurrentProcessor, chunkArray, delay } from "./utils";

// Settings
const concurrency = 5;
const apiCallIntervalMs = 1000;
const batchSize = 10;

// Get files
const files = app.vault.getMarkdownFiles();

// Create processor
const processor = createConcurrentProcessor(
  concurrency,
  apiCallIntervalMs,
  progressReporter,
);

// Create tasks (each task records its own outcome so one failure
// does not abort the whole batch)
const tasks = files.map((file) => async () => {
  try {
    await processFile(file, settings, progressReporter);
    return { file: file.name, success: true };
  } catch (error) {
    return { file: file.name, success: false, error };
  }
});

// Process in batches
const fileBatches = chunkArray(tasks, batchSize);
for (const batch of fileBatches) {
  const results = await processor(batch);

  // Handle batch results
  const errors = results.filter((r) => !r.success);
  if (errors.length > 0) {
    console.log(`Batch had ${errors.length} errors`);
  }

  // Delay between batches
  await delay(1000);
}
```
Best Practices
- Start with low concurrency: Test with 1-3 concurrent operations
- Monitor rate limits: Adjust delays based on API responses
- Use meaningful progress messages: Help users understand status
- Collect errors: Don't fail fast; collect all errors for review
- Implement cancellation: Allow users to stop long-running batches
- Log every N files: Don't log every file; log every 10-50 (see the sketch below)
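A small illustration of the last two practices, combining an AbortController for user-triggered cancellation with logging only every 25th file (both the wiring and the interval are arbitrary examples):

```typescript
const controller = new AbortController();
// Wire controller.abort() to a "Cancel" button or command in the UI.

for (let i = 0; i < files.length; i++) {
  if (controller.signal.aborted) break; // stop cleanly when the user cancels
  await processFile(files[i], settings, progressReporter);
  if (i % 25 === 0) {
    progressReporter.log(`Processed ${i + 1}/${files.length} files`);
  }
}
```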
Common Issues
Rate Limit (429)
- Symptom: API returns 429 error
- Solution: Increase `apiCallIntervalMs`, reduce `batchConcurrency`
Timeout
- Symptom: Request times out
- Solution: Increase timeout settings, check network
Memory Issues & OOM Avoidance
- Symptom: Heap Out of Memory (OOM) with large vaults or large files.
- Root Cause: `splitContent` attempts to chunk by word count, which fails completely if the Markdown file contains massive Base64 encoded images (`![[data:image/png;base64,...]]`). This causes massive strings to be cloned in memory during parallel processing.
- Solution (Mandatory Pre-processing):
  - Base64 Sanitization: Before passing any file to the chunker, strip or replace Base64 strings with a placeholder (see the sketch below).
  - Reduce `batchSize` heavily (e.g., to 2).
  - Reduce `batchConcurrency` to 1 for extremely large vaults.
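One way to do the sanitization step, as a sketch. The regex is illustrative; the exact embed syntax the chunker encounters may vary.

```typescript
// Replaces inline base64 image payloads with a short placeholder so the chunker
// never has to copy multi-megabyte strings. Illustrative regex, not exhaustive.
function stripBase64Images(markdown: string): string {
  return markdown.replace(
    /data:image\/[a-zA-Z+.-]+;base64,[A-Za-z0-9+/=]+/g,
    "[base64 image removed]",
  );
}
```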
🧠 Token-Safe Chunking Guidelines
When building an independent implementation of `splitContent`, the AI MUST NOT blindly slice strings at the Nth character or word.
- Never Split Code Fences: Do not split inside ``` ... ``` blocks.
- Never Split LaTeX: Do not split inside `$$ ... $$` math blocks.
- Never Split Frontmatter: Do not split inside the YAML `---` header.
- Safe Boundaries: Always attempt to split at Markdown headers (`##`, `###`) or at least at double newlines (`\n\n`). A simplified splitter sketch follows this list.
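A simplified sketch of a boundary-aware splitter: it cuts only at blank lines outside fenced code blocks and approximates token limits by character count. A real splitContent replacement would also need to respect frontmatter and `$$` blocks, and prefer header boundaries.

```typescript
// Simplified sketch: split only at blank lines that are outside code fences.
// Frontmatter, $$ math blocks, and header-preferred boundaries are left as an
// exercise; token limits are approximated by character count here.
function splitMarkdownSafely(markdown: string, maxChars = 4000): string[] {
  const lines = markdown.split("\n");
  const chunks: string[] = [];
  let current: string[] = [];
  let inFence = false;

  const flush = () => {
    if (current.length > 0) {
      chunks.push(current.join("\n"));
      current = [];
    }
  };

  for (const line of lines) {
    if (line.trim().startsWith("```")) inFence = !inFence; // track fence state
    current.push(line);
    const atSafeBoundary = !inFence && line.trim() === "";
    if (atSafeBoundary && current.join("\n").length >= maxChars) {
      flush(); // cut only at a blank line outside any fence
    }
  }
  flush();
  return chunks;
}
```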
Related skills
- notemdpro - for broader NoteMD Pro workflow routing
- notemdpro-content-generator - for large-scale generation workflows
- notemdpro-web-researcher - when batches need research enrichment first