notemdpro-batch-processor
NoteMD Pro - Batch Processing
Overview
This skill provides utilities for batch processing multiple markdown files with proper concurrency control, rate limiting, and error handling. It prevents API rate limits and ensures stable processing of large vaults.
When to Use
- Large vault migrations: Process 100+ files
- Bulk operations: Translate, generate, or extract from multiple files
- Rate limit prevention: Avoid overwhelming LLM APIs
- Progress tracking: Monitor batch progress
Key Functions (from utils.ts)
createConcurrentProcessor
Creates a concurrent processor with staggered task execution.
```typescript
export function createConcurrentProcessor<T, R>(
  concurrency: number,
  apiCallIntervalMs: number,
  progressReporter: ProgressReporter,
): (tasks: (() => Promise<T>)[]) => Promise<R[]>;

// Features:
// - Staggered worker start (prevents burst)
// - Progress reporting
// - Cancellation support
// - Result ordering
```
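The actual implementation lives in utils.ts; the sketch below only shows the general idea (bounded concurrency, staggered start, ordered results) under assumed types. The ProgressReporter shape is inferred from the Progress Reporting section below, and cancellation is omitted for brevity.

```typescript
// Sketch only -- the real createConcurrentProcessor in utils.ts may differ.
// ProgressReporter is an assumed shape (see Progress Reporting below).
interface ProgressReporter {
  log(message: string): void;
  updateStatus(message: string, percent: number): void;
  updateActiveTasks(delta: number): void;
}

const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

function createConcurrentProcessorSketch<T>(
  concurrency: number,
  apiCallIntervalMs: number,
  progressReporter: ProgressReporter,
): (tasks: (() => Promise<T>)[]) => Promise<T[]> {
  return async (tasks) => {
    const results = new Array<T>(tasks.length);
    let next = 0;
    let done = 0;

    // Each worker waits out its stagger delay, then pulls tasks off a shared index.
    const worker = async (workerIndex: number) => {
      await delay(workerIndex * apiCallIntervalMs); // staggered start: no initial burst
      while (next < tasks.length) {
        const i = next++;
        progressReporter.updateActiveTasks(1);
        try {
          results[i] = await tasks[i](); // store by original index to preserve order
        } finally {
          progressReporter.updateActiveTasks(-1);
          done++;
          progressReporter.updateStatus("Processing...", Math.floor((done / tasks.length) * 100));
        }
      }
    };

    const workerCount = Math.min(concurrency, tasks.length);
    await Promise.all(Array.from({ length: workerCount }, (_, w) => worker(w)));
    return results;
  };
}
```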
chunkArray
Splits an array into chunks of specified size.
```typescript
export function chunkArray<T>(arr: T[], size: number): T[][];

// Example:
chunkArray([1, 2, 3, 4, 5, 6, 7, 8], 3);
// Returns: [[1,2,3], [4,5,6], [7,8]]
```
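A straightforward implementation fits in a few lines; this is a sketch of the obvious approach, not necessarily the exact code in utils.ts:

```typescript
export function chunkArray<T>(arr: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < arr.length; i += size) {
    chunks.push(arr.slice(i, i + size)); // last chunk may be shorter than size
  }
  return chunks;
}
```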
retry
Retry with exponential backoff.
```typescript
export async function retry<T>(
  fn: () => Promise<T>,
  maxRetries?: number,
  delayMs?: number,
  signal?: AbortSignal,
): Promise<T>;

// Features:
// - Exponential backoff (delay * 2^i)
// - Abort signal support
// - Error propagation
```
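A sketch of how such a retry helper can be written; the version in utils.ts may differ in details such as abort handling and which errors it retries:

```typescript
const wait = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

export async function retrySketch<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  delayMs = 1000,
  signal?: AbortSignal,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    if (signal?.aborted) throw new Error("Operation aborted");
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Exponential backoff: delay * 2^attempt before trying again.
      if (attempt < maxRetries - 1) await wait(delayMs * 2 ** attempt);
    }
  }
  throw lastError; // propagate the final error to the caller
}
```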
Batch Processing Flow
```
batchOperation
├── Get list of files
├── Create concurrent processor
├── Create task functions
│   └── For each file
│       └── Process file
├── Execute with staggered start
├── Handle errors per file
└── Collect results
```
Settings
Concurrency Settings
| Setting | Description | Default |
|---|---|---|
| `enableBatchParallelism` | Enable parallel processing | `false` |
| `batchConcurrency` | Number of concurrent operations | `1` |
| `batchSize` | Files per batch | `10` |
| `batchInterDelayMs` | Delay between batches | `1000` |
| `apiCallIntervalMs` | Delay between API calls | `0` |
API Stability
| Setting | Description |
|---|---|
| `enableStableApiCall` | Enable stable API calls |
| `apiCallInterval` | Interval between calls (ms) |
| `apiCallMaxRetries` | Max retry attempts |
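Taken together, the two tables imply a settings shape roughly like the interface below. This is an illustrative sketch; the plugin's real settings type may name or group these fields differently.

```typescript
// Illustrative only -- not the plugin's actual settings interface.
interface BatchProcessingSettings {
  // Concurrency settings
  enableBatchParallelism: boolean; // default: false
  batchConcurrency: number;        // default: 1
  batchSize: number;               // default: 10
  batchInterDelayMs: number;       // default: 1000
  apiCallIntervalMs: number;       // default: 0
  // API stability settings
  enableStableApiCall: boolean;
  apiCallInterval: number;   // interval between calls (ms)
  apiCallMaxRetries: number; // max retry attempts
}
```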
Concurrency Strategy
Staggered Start
Instead of starting all workers at once, workers are started with delays:
```
Worker 0: starts at 0 ms
Worker 1: starts at interval * 1 ms
Worker 2: starts at interval * 2 ms
...
Worker N: starts at interval * N ms
```
This prevents API rate limits by distributing requests over time.
Inter-batch Delays
For very large batches, delays between batches prevent overwhelming APIs:
Batch 1: [====] → delay → Batch 2: [====] → delay → Batch 3: [====]
Error Handling Strategy
Per-File Errors
Each file is processed independently. Errors don't stop the entire batch:
```typescript
// Failures are recorded per file so the rest of the batch keeps running.
const results: { file: TFile; success: boolean; error?: unknown }[] = [];
for (const file of files) {
  try {
    await processFile(file);
    results.push({ file, success: true });
  } catch (error) {
    results.push({ file, success: false, error });
  }
}
```
Retry Logic
For transient errors (network, rate limits), use exponential backoff:
```
Attempt 1: wait 1000 ms
Attempt 2: wait 2000 ms
Attempt 3: wait 4000 ms
```
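For example, wrapping a single provider request in retry with those defaults (callLLM and prompt are placeholders for the real API call and its input):

```typescript
const controller = new AbortController();

// callLLM is a hypothetical stand-in for the actual provider request.
const response = await retry(
  () => callLLM(prompt),
  3,    // maxRetries
  1000, // delayMs: waits 1000 ms, then 2000 ms, then 4000 ms
  controller.signal,
);
```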
🛑 Fatal Errors & HTTP 429 (Independent Operation)
When running as an independent AI agent, you must rigorously guard against LLM API failures:
- HTTP 429 (Rate Limit Exceeded): Immediately pause the batch processor. Do not rely on naive retries if the bucket is exhausted. Implement a generic `AbortSignal` to gracefully halt the queue (see the sketch below).
- Gateway Timeouts/502: API providers often return HTML instead of JSON during outages. Wrap all `JSON.parse` or provider SDK calls in resilient `try/catch` blocks that log the raw text to `error_processing_filename.log` before aborting.
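A sketch of both guards in one place. fetchCompletion and logRawError are hypothetical stand-ins for the real provider call and error logger; the 429 check and the JSON.parse guard are the point here.

```typescript
// Hypothetical helpers: fetchCompletion performs the LLM request, logRawError
// appends the raw response text to error_processing_filename.log.
async function guardedCall(
  fetchCompletion: () => Promise<Response>,
  logRawError: (text: string) => Promise<void>,
  controller: AbortController,
): Promise<unknown> {
  const res = await fetchCompletion();

  // HTTP 429: halt the queue instead of hammering an exhausted rate-limit bucket.
  if (res.status === 429) {
    controller.abort();
    throw new Error("Rate limit exceeded (429); batch halted");
  }

  const raw = await res.text();
  try {
    return JSON.parse(raw);
  } catch {
    // Gateways sometimes return HTML error pages instead of JSON during outages.
    await logRawError(raw);
    controller.abort();
    throw new Error("Non-JSON response from provider; batch halted");
  }
}
```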
Persistent Error Logging (saveErrorLog)
> [!IMPORTANT] Essential Debugging Output
> The notemdpro-batch-processor workflow uses `saveErrorLog` (from `fileUtils.ts`) to write detailed stack traces to `error_processing_filename.log` in the Vault root. When acting as an AI Agent, if a user reports a batch failure, you MUST proactively read this log file to obtain the stack trace rather than asking the user for screenshots.
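A sketch of what an error-log writer along these lines might look like using the Obsidian vault API. The real saveErrorLog lives in fileUtils.ts; its signature, file-naming scheme, and log format are assumptions here.

```typescript
import { App, normalizePath } from "obsidian";

// Illustrative sketch only; the real saveErrorLog in fileUtils.ts may differ.
async function saveErrorLogSketch(app: App, fileName: string, error: unknown): Promise<void> {
  const logPath = normalizePath("error_processing_filename.log"); // Vault root, per the note above
  const stack = error instanceof Error ? (error.stack ?? error.message) : String(error);
  const entry = `[${new Date().toISOString()}] ${fileName}\n${stack}\n\n`;

  if (await app.vault.adapter.exists(logPath)) {
    await app.vault.adapter.append(logPath, entry);
  } else {
    await app.vault.create(logPath, entry);
  }
}
```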
Progress Reporting
```typescript
progressReporter.log(`Processing file ${i}/${total}: ${file.name}`);
progressReporter.updateStatus(`Processing...`, Math.floor((i / total) * 100));
progressReporter.updateActiveTasks(1); // Increment active tasks
```
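These calls imply a reporter interface roughly like the one below (an assumed shape; the real ProgressReporter type may carry more members). The console-backed implementation is purely illustrative, e.g. for testing outside Obsidian.

```typescript
// Assumed shape, inferred from the calls above; the plugin's interface may differ.
interface ProgressReporter {
  log(message: string): void;
  updateStatus(message: string, percent: number): void;
  updateActiveTasks(delta: number): void;
}

// Hypothetical console-backed reporter for testing outside the plugin UI.
const consoleReporter: ProgressReporter = {
  log: (message) => console.log(message),
  updateStatus: (message, percent) => console.log(`${percent}% - ${message}`),
  updateActiveTasks: (delta) => console.log(`active tasks delta: ${delta}`),
};
```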
Usage Example
```typescript
import { createConcurrentProcessor, chunkArray, delay } from "./utils";

// Settings
const concurrency = 5;
const apiCallIntervalMs = 1000;
const batchSize = 10;

// Get files
const files = app.vault.getMarkdownFiles();

// Create processor
const processor = createConcurrentProcessor(
  concurrency,
  apiCallIntervalMs,
  progressReporter,
);

// Create tasks (each task records its own outcome so one failure
// does not abort the whole batch)
const tasks = files.map((file) => async () => {
  try {
    await processFile(file, settings, progressReporter);
    return { file: file.name, success: true };
  } catch (error) {
    return { file: file.name, success: false, error };
  }
});

// Process in batches
const fileBatches = chunkArray(tasks, batchSize);
for (const batch of fileBatches) {
  const results = await processor(batch);

  // Handle batch results
  const errors = results.filter((r) => !r.success);
  if (errors.length > 0) {
    console.log(`Batch had ${errors.length} errors`);
  }

  // Delay between batches
  await delay(1000);
}
```
Best Practices
- Start with low concurrency: Test with 1-3 concurrent operations
- Monitor rate limits: Adjust delays based on API responses
- Use meaningful progress messages: Help users understand status
- Collect errors: Don't fail fast; collect all errors for review
- Implement cancellation: Allow users to stop long-running batches
- Log every N files: Don't log every file; log every 10-50 (see the sketch below)
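A small illustration of the last two practices, combining an AbortController for user-triggered cancellation with logging only every 25th file (both the wiring and the interval are arbitrary examples):

```typescript
const controller = new AbortController();
// Wire controller.abort() to a "Cancel" button or command in the UI.

for (let i = 0; i < files.length; i++) {
  if (controller.signal.aborted) break; // stop cleanly when the user cancels
  await processFile(files[i], settings, progressReporter);
  if (i % 25 === 0) {
    progressReporter.log(`Processed ${i + 1}/${files.length} files`);
  }
}
```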
Common Issues
Rate Limit (429)
- Symptom: API returns 429 error
- Solution: Increase `apiCallIntervalMs`, reduce `batchConcurrency`
Timeout
- Symptom: Request times out
- Solution: Increase timeout settings, check network
Memory Issues & OOM Avoidance
- Symptom: Heap Out of Memory (OOM) with large vaults or large files.
- Root Cause: `splitContent` attempts to chunk by word count, which fails completely if the Markdown file contains massive Base64 encoded images (`![[data:image/png;base64,...]]`). This causes massive strings to be cloned in memory during parallel processing.
- Solution (Mandatory Pre-processing):
  - Base64 Sanitization: Before passing any file to the chunker, strip or replace Base64 strings with a placeholder (see the sketch below).
  - Reduce `batchSize` heavily (e.g., to 2).
  - Reduce `batchConcurrency` to 1 for extremely large vaults.
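One way to do the sanitization step, as a sketch. The regex is illustrative; the exact embed syntax the chunker encounters may vary.

```typescript
// Replaces inline base64 image payloads with a short placeholder so the chunker
// never has to copy multi-megabyte strings. Illustrative regex, not exhaustive.
function stripBase64Images(markdown: string): string {
  return markdown.replace(
    /data:image\/[a-zA-Z+.-]+;base64,[A-Za-z0-9+/=]+/g,
    "[base64 image removed]",
  );
}
```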
🧠 Token-Safe Chunking Guidelines
When building an independent implementation of `splitContent`, the AI MUST NOT blindly slice strings at the Nth character or word.
- Never Split Code Fences: Do not split inside ``` ... ``` blocks.
- Never Split LaTeX: Do not split inside `$$ ... $$` math blocks.
- Never Split Frontmatter: Do not split inside the YAML `---` header.
- Safe Boundaries: Always attempt to split at Markdown headers (`##`, `###`) or at least at double newlines (`\n\n`). A simplified splitter sketch follows this list.
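A simplified sketch of a boundary-aware splitter: it cuts only at blank lines outside fenced code blocks and approximates token limits by character count. A real splitContent replacement would also need to respect frontmatter and `$$` blocks, and prefer header boundaries.

```typescript
// Simplified sketch: split only at blank lines that are outside code fences.
// Frontmatter, $$ math blocks, and header-preferred boundaries are left as an
// exercise; token limits are approximated by character count here.
function splitMarkdownSafely(markdown: string, maxChars = 4000): string[] {
  const lines = markdown.split("\n");
  const chunks: string[] = [];
  let current: string[] = [];
  let inFence = false;

  const flush = () => {
    if (current.length > 0) {
      chunks.push(current.join("\n"));
      current = [];
    }
  };

  for (const line of lines) {
    if (line.trim().startsWith("```")) inFence = !inFence; // track fence state
    current.push(line);
    const atSafeBoundary = !inFence && line.trim() === "";
    if (atSafeBoundary && current.join("\n").length >= maxChars) {
      flush(); // cut only at a blank line outside any fence
    }
  }
  flush();
  return chunks;
}
```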
Related skills
- notemdpro - for broader NoteMD Pro workflow routing
- notemdpro-content-generator - for large-scale generation workflows
- notemdpro-web-researcher - when batches need research enrichment first