gpu-document-processing
GPU Document Processing Skill
Process large documents and document collections using GPU-accelerated tools. This skill uses the sandbox-as-tool pattern: the agent reasons on CPU and sends document processing work to a GPU-equipped sandbox.
When to Use This Skill
Use this skill when:
- Processing large PDF files (50+ pages)
- Analyzing collections of documents (10+ files)
- Extracting structured data from unstructured documents
- Performing bulk text extraction and chunking
- Generating embeddings for large document sets
- Handling large documents that the user uploads or references for analysis
Architecture: Sandbox as Tool
This skill follows the sandbox-as-tool pattern for GPU execution:
- Agent reasons on CPU: planning, synthesis, report writing
- Processing is sent to the GPU sandbox: document parsing, embedding, extraction
- Results are returned to the agent: structured output for further analysis
This separation ensures:
- API keys stay outside the sandbox (security)
- Agent state persists independently of processing jobs
- Processing can be parallelized across documents
- GPU cost is incurred only during processing, not during reasoning
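The separation above can be sketched as a thin tool wrapper around the sandbox. This is a minimal illustration under stated assumptions, not the skill's actual interface: `ProcessingJob`, `ProcessingResult`, `SandboxTool`, and `fake_gpu_runner` are hypothetical names, and the runner stands in for whatever transport reaches the GPU environment.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class ProcessingJob:
    """A request the agent hands to the sandbox (hypothetical shape)."""
    document_ref: str                      # path or URI of the document
    task: str                              # e.g. "extract" or "embed"
    options: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ProcessingResult:
    """Structured output returned to the agent (hypothetical shape)."""
    document_ref: str
    task: str
    payload: Dict[str, Any]


class SandboxTool:
    """Exposes a sandbox runner to the agent as an ordinary callable tool."""

    def __init__(self, runner: Callable[[ProcessingJob], Dict[str, Any]]):
        self._runner = runner

    def __call__(self, job: ProcessingJob) -> ProcessingResult:
        # Only the job description crosses the boundary:
        # no API keys and no agent state are passed to the runner.
        payload = self._runner(job)
        return ProcessingResult(job.document_ref, job.task, payload)


def fake_gpu_runner(job: ProcessingJob) -> Dict[str, Any]:
    # Stand-in for the real GPU sandbox; returns a canned structured result.
    return {"status": "ok", "pages": job.options.get("pages", 0)}


tool = SandboxTool(fake_gpu_runner)
result = tool(ProcessingJob("report.pdf", "extract", {"pages": 120}))
```

Because the agent only holds a `SandboxTool`, the runner can be swapped for a real remote call without changing the reasoning code.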
Capabilities
PDF Text Extraction
Extract text content from PDF documents with layout preservation:
- Headers, paragraphs, lists, and tables detected separately
- Page numbers and section boundaries preserved
- Multi-column layout handling
Tabular Data Extraction
Extract tables from documents into structured formats:
- PDF tables to CSV/DataFrames using GPU-accelerated parsing
- Automatic column type detection
- Handles merged cells and multi-row headers
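To illustrate what automatic column type detection might involve, here is a CPU-only sketch in plain Python. It is not the GPU-accelerated parser this skill refers to; `infer_column_type` and `infer_table_types` are hypothetical helpers that try `int`, then `float`, and fall back to `str`.

```python
from typing import Dict, List, Sequence


def infer_column_type(values: Sequence[str]) -> str:
    """Return 'int', 'float', or 'str' for a column of raw cell strings."""
    cells = [v.strip() for v in values if v.strip()]  # ignore empty cells
    if not cells:
        return "str"
    # Try the narrowest type first; fall through on the first cell that fails.
    for type_name, caster in (("int", int), ("float", float)):
        try:
            for cell in cells:
                caster(cell)
            return type_name
        except ValueError:
            continue
    return "str"


def infer_table_types(header: Sequence[str], rows: List[Sequence[str]]) -> Dict[str, str]:
    """Map each column name to its inferred type."""
    columns = list(zip(*rows))  # transpose rows into columns
    return {name: infer_column_type(col) for name, col in zip(header, columns)}


types = infer_table_types(
    ["id", "price", "name"],
    [["1", "9.5", "widget"], ["2", "10", "gadget"]],
)
```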
Document Chunking
Split large documents into meaningful chunks for analysis:
- Semantic chunking (by topic/section boundaries)
- Fixed-size chunking with overlap for embedding
- Configurable chunk sizes (default: 512 tokens)
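Fixed-size chunking with overlap can be sketched as a sliding window over a token list. The 512-token default comes from the text above; the overlap of 64 and the function name `chunk_tokens` are assumptions for illustration, and how text is tokenized depends on the embedding model in use.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int = 512, overlap: int = 64) -> List[List[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the window has reached the end, so no chunk is
        # emitted that contains only already-covered overlap tokens.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Assumed whitespace tokenization, used here only to keep the demo small.
demo_chunks = chunk_tokens("a b c d e f g h i j".split(), chunk_size=4, overlap=1)
```

With these demo values each chunk begins with the last token of the previous one, which is what lets an embedding model see context across chunk boundaries.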
Embedding Generation
Generate vector embeddings for document chunks:
- Uses NVIDIA NeMo Retriever NIM for GPU-accelerated embedding
- Supports batch processing for large document sets
- Compatible with standard vector stores (Milvus, ChromaDB)
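Batch processing for embeddings can be sketched independently of the embedding backend. In the example below, `embed_fn` is a hypothetical callable standing in for a request to the embedding service, `toy_embed` is a placeholder that returns made-up two-dimensional vectors, and the batch size of 32 is an assumed default.

```python
from typing import Callable, Iterator, List, Sequence

Vector = List[float]


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(
    chunks: Sequence[str],
    embed_fn: Callable[[List[str]], List[Vector]],
    batch_size: int = 32,
) -> List[Vector]:
    """Embed chunks batch by batch, preserving input order."""
    vectors: List[Vector] = []
    for batch in batched(chunks, batch_size):
        batch_vectors = embed_fn(list(batch))
        if len(batch_vectors) != len(batch):
            raise RuntimeError("embedding backend returned a mismatched batch")
        vectors.extend(batch_vectors)
    return vectors


def toy_embed(texts: List[str]) -> List[Vector]:
    # Placeholder only: a real backend would return model embeddings.
    return [[float(len(t)), float(t.count(" "))] for t in texts]


vectors = embed_in_batches(["a b", "cc", "d e f"], toy_embed, batch_size=2)
```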
Workflow
- Receive document reference from the orchestrator
- Determine processing type (extraction, analysis, embedding)
- Send to GPU sandbox for processing
- Collect structured results (text, tables, embeddings)
- Write findings to /shared/ for the orchestrator to synthesize
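The workflow above can be sketched as a dispatch on processing type followed by a write to the shared directory. This is a sketch under assumptions: `process_document`, the handler mapping, the JSON layout, and the output file naming are all hypothetical, and the demo writes to a temporary directory rather than `/shared/`.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict

Handler = Callable[[str], Dict[str, Any]]


def process_document(
    document_ref: str,
    task: str,
    handlers: Dict[str, Handler],
    shared_dir: Path,
) -> Path:
    """Run one processing type on a document and write the findings as JSON."""
    if task not in handlers:
        raise ValueError(f"unknown processing type: {task!r}")

    # The handler stands in for the call into the GPU sandbox.
    result = handlers[task](document_ref)

    shared_dir.mkdir(parents=True, exist_ok=True)
    out_path = shared_dir / f"{Path(document_ref).stem}.{task}.json"
    findings = {"document": document_ref, "task": task, "result": result}
    out_path.write_text(json.dumps(findings, indent=2), encoding="utf-8")
    return out_path


# Demo with a canned handler and a temporary stand-in for /shared/.
with tempfile.TemporaryDirectory() as tmp:
    demo_handlers = {"extraction": lambda ref: {"text": "hello"}}
    out_path = process_document(
        "docs/report.pdf", "extraction", demo_handlers, Path(tmp) / "shared"
    )
    written = json.loads(out_path.read_text(encoding="utf-8"))
```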
Processing Large Document Collections
For multiple documents:
- Process documents in parallel batches (3-5 concurrent)
- Extract key metadata first (title, date, author, page count)
- Generate per-document summaries
- Cross-reference findings across documents
- Write consolidated findings with per-document citations
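Bounded parallelism over a collection can be sketched with a thread pool from the standard library. The concurrency limit of 4 sits inside the 3-5 range given above; `process_collection` and `fake_metadata` are hypothetical names, and the per-document function stands in for a sandbox call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def process_collection(
    paths: List[str],
    process_one: Callable[[str], Dict[str, Any]],
    max_concurrent: int = 4,
) -> Dict[str, Dict[str, Any]]:
    """Process documents with at most max_concurrent jobs in flight."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # Executor.map yields results in input order, so each result
        # can be paired with its source path for per-document citations.
        results = list(pool.map(process_one, paths))
    return dict(zip(paths, results))


def fake_metadata(path: str) -> Dict[str, Any]:
    # Placeholder for the metadata-first pass (title, page count, ...).
    return {"title": path.rsplit("/", 1)[-1], "page_count": len(path)}


collection = process_collection(["a.pdf", "b/c.pdf"], fake_metadata, max_concurrent=3)
```

Threads suit this case because each job mostly waits on the sandbox; an exception raised by any job propagates when the results are collected.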
Output Format
When reporting document processing results:
- Include document metadata (filename, pages, size)
- Structure extracted content by section/chapter
- Format tables as markdown tables
- Include page references for all extracted content
- Note any extraction quality issues (scanned images, corrupted pages)
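Formatting an extracted table as a markdown table can be sketched in a few lines. `to_markdown_table` is a hypothetical helper; it escapes pipe characters in cells so that cell text cannot break the table layout.

```python
from typing import Sequence


def to_markdown_table(header: Sequence[object], rows: Sequence[Sequence[object]]) -> str:
    """Render a header and rows as a markdown table."""

    def fmt(cells: Sequence[object]) -> str:
        # Escape pipes so a cell value is never read as a column separator.
        return "| " + " | ".join(str(c).replace("|", "\\|") for c in cells) + " |"

    lines = [fmt(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines.extend(fmt(row) for row in rows)
    return "\n".join(lines)


table_md = to_markdown_table(["Item", "Qty", "Page"], [["Bolt", 4, 12], ["A|B nut", 9, 13]])
```

Carrying the page number as a column, as in this demo, is one way to keep a page reference next to every extracted row.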
Integration with NVIDIA NIM
For production deployments, GPU document processing can build on:
- NVIDIA NeMo Retriever: GPU-accelerated embedding and retrieval
- NVIDIA RAPIDS cuDF: Tabular data processing from extracted tables
- NVIDIA Triton: Scalable inference for document classification models
See NVIDIA's NIM documentation for self-hosted deployment options.