syncfusion-dotnet-smart-data-extraction
Smart Data Extractor — Syncfusion
Overview
Extracts complete document structures from PDFs and image files using the Syncfusion SmartDataExtractor library. This skill supports one operational mode: generating C# code for the user's project.
Key Capabilities
- Document structure extraction: Identify text elements, images, headers, footers, and tables (including regions, header rows, columns, cell boundaries, and merged cells).
- File format support: Works with PDF documents and common image formats such as JPEG and PNG.
- Table extraction: Specialized capability to extract tabular data.
- Form recognition: Detects and processes structured form data.
- Page-level control: Extract data from specific pages or defined page ranges.
- Confidence threshold: Results below a configurable confidence score (0.0–1.0) are filtered out.
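The page-range and confidence-threshold behavior listed above can be pictured with the small, library-agnostic Python sketch below. It does not call the Syncfusion API. The `Element` type, its fields, 1-based inclusive page numbering, and keeping results whose score is greater than or equal to the threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Element:
    """Hypothetical stand-in for one detected element (not a Syncfusion type)."""
    kind: str          # e.g. "text", "table", "image"
    page: int          # assumed 1-based page number
    confidence: float  # detection score in the range 0.0-1.0


def filter_elements(
    elements: List[Element],
    threshold: float,
    page_range: Optional[Tuple[int, int]] = None,
) -> List[Element]:
    """Keep elements on the requested pages whose confidence meets the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    if page_range is not None:
        start, end = page_range
        if start < 1 or start > end:
            raise ValueError("page_range must be (start, end) with 1 <= start <= end")
    else:
        start, end = 1, float("inf")  # no range given: every page qualifies
    return [
        e for e in elements
        if start <= e.page <= end and e.confidence >= threshold
    ]


elements = [
    Element("text", 1, 0.92),
    Element("table", 2, 0.41),
    Element("image", 3, 0.77),
]
print([e.kind for e in filter_elements(elements, 0.5)])          # ['text', 'image']
print([e.kind for e in filter_elements(elements, 0.3, (2, 3))])  # ['table', 'image']
```

Raising the threshold trades recall for precision. At 1.0 only perfect-score elements would survive.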
Prerequisites
- Install required runtime and library packages from NuGet before running extraction.
- Syncfusion license: provide the key in LICENSE.txt or in the SYNCFUSION_LICENSE_KEY environment variable.
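The two license sources can be resolved as in the following sketch. It assumes that LICENSE.txt contains only the key text and that it takes precedence over the environment variable, since the rules below point to LICENSE.txt at the workspace root. The function name is hypothetical. The resolved key would then be passed to the registration block from references/nuget-packages.md, which is not reproduced here.

```python
import os
from pathlib import Path


def resolve_license_key(workspace_root: str = ".") -> str:
    """Return the Syncfusion license key, preferring LICENSE.txt over the env var.

    Assumptions: LICENSE.txt holds only the key, and it wins over
    SYNCFUSION_LICENSE_KEY when both are present.
    """
    license_file = Path(workspace_root) / "LICENSE.txt"
    if license_file.is_file():
        key = license_file.read_text(encoding="utf-8").strip()
        if key:
            return key
    key = os.environ.get("SYNCFUSION_LICENSE_KEY", "").strip()
    if key:
        return key
    raise RuntimeError(
        "No Syncfusion license key found in LICENSE.txt or SYNCFUSION_LICENSE_KEY"
    )
```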
Quick Start Examples
Example: Generate Code
User: "Write Program.cs code to extract the data from pdf and save as JSON."
Result: C# code snippet displayed (no files created)
Mode
Mode 1: Generate C# Code for the User's Project (default)
Use this mode when the user wants to view, write, review, refactor, or modify C# code related to Smart Data Extractor processing. Trigger keywords: "show me how", "how to", "how can I", "how do I", "provide code", "provide an example", "give an example", "demonstrate", "code snippet", "sample code", "example", "sample", "give me", "show me", "Program.cs", "example code", "generate code for", "codesnippet".
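Here is one way to check a request against the trigger keywords. It is a sketch that assumes a case-insensitive substring match. The keywords are copied from the list above and lowercased. Because Mode 1 is the default, a non-match does not by itself rule this mode out.

```python
# Trigger keywords from the skill definition, stored lowercase for matching.
TRIGGER_KEYWORDS = [
    "show me how", "how to", "how can i", "how do i", "provide code",
    "provide an example", "give an example", "demonstrate", "code snippet",
    "sample code", "example", "sample", "give me", "show me", "program.cs",
    "example code", "generate code for", "codesnippet",
]


def matches_code_mode(message: str) -> bool:
    """Case-insensitive substring check against the Mode 1 trigger keywords."""
    text = message.lower()
    return any(keyword in text for keyword in TRIGGER_KEYWORDS)
```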
Workflow:
Step 1 — Detect Application Type and Suggest Required NuGet Packages
- Inspect the workspace project files (.csproj, web.config, App.config, Startup.cs, Program.cs, etc.) and use the detection signals table in references/nuget-packages.md to determine the application type.
- Based on the detected application type, identify the correct NuGet package(s) from references/nuget-packages.md and instruct the user to install them before generating any code. ONLY use package IDs and versions listed in references/nuget-packages.md; do not suggest, look up, or infer package names from external sources or common naming conventions.
- Note: If the user's request is explicitly table-only (asks only to extract table data), recommend only the Table Extractor package listed in references/nuget-packages.md and review the ExtractTable section for the detected application type. Do not recommend or add the broader SmartDataExtractor package unless the user requests non-table extraction or JSON conversion features.
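The sketch below shows what application-type detection from a .csproj file might look like. The signals are illustrative assumptions based on common MSBuild properties and SDK names (UseMaui, UseWPF, UseWindowsForms, Microsoft.NET.Sdk.Web, TargetFrameworkVersion v4.x). They are not the detection table in references/nuget-packages.md, which remains the authority. The first match wins.

```python
import re

# Ordered (pattern, app type) pairs; the first match wins.
# Illustrative assumptions only, NOT the table in references/nuget-packages.md.
SIGNALS = [
    (r"<UseMaui>\s*true\s*</UseMaui>", "MAUI"),
    (r"<UseWPF>\s*true\s*</UseWPF>", "WPF"),
    (r"<UseWindowsForms>\s*true\s*</UseWindowsForms>", "WinForms"),
    (r'Sdk="Microsoft\.NET\.Sdk\.BlazorWebAssembly"', "Blazor"),
    (r'Sdk="Microsoft\.NET\.Sdk\.Web"', "ASP.NET Core"),
    (r"<TargetFrameworkVersion>\s*v4\.", ".NET Framework Console"),
]


def detect_app_type(csproj_text: str) -> str:
    """Classify a project from the raw text of its .csproj file."""
    for pattern, app_type in SIGNALS:
        if re.search(pattern, csproj_text, flags=re.IGNORECASE):
            return app_type
    return ".NET Console"  # plain Microsoft.NET.Sdk project with no other signal


print(detect_app_type('<Project Sdk="Microsoft.NET.Sdk.Web"></Project>'))  # ASP.NET Core
```

A Blazor Server app also uses Microsoft.NET.Sdk.Web, so this sketch would report it as ASP.NET Core. Real detection should follow the reference table.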
Step 2 — Generate Code from Reference Files Only
Do NOT invent, guess, or suggest any API, method, property, class, or namespace not explicitly present in the reference files.
- Read the relevant references/*.md file(s) for the requested feature.
- Build C# code strictly from the APIs and snippets found in those files.
- Select the correct snippet variant based on the app type detected in Step 1:
  - Windows-specific apps (WinForms, WPF, .NET Framework Console) → use Windows-specific snippets
  - Cross-platform apps (ASP.NET Core, .NET Core/.NET 5+ Console, Blazor, MAUI) → use cross-platform / .Net.Core snippets
- After the using / namespace lines at the top of the generated code, always insert the license registration block from the Register License section in references/nuget-packages.md.
- Do not create or run any .csx script.
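The variant choice and the license-block placement in Step 2 can be sketched as two small text operations. The function names and variant labels are hypothetical. The license block is passed in as an opaque string because its contents come from references/nuget-packages.md. The sketch assumes a top-level-statement Program.cs, where only using directives precede the first statement, and it does not handle block-scoped namespaces.

```python
import re

WINDOWS_SPECIFIC = {"WinForms", "WPF", ".NET Framework Console"}
CROSS_PLATFORM = {"ASP.NET Core", ".NET Core/.NET 5+ Console", "Blazor", "MAUI"}

# Matches directives such as "using System;", "global using X;", "using static X;"
# or "using Alias = X.Y;", but not statements like "using var s = File.OpenRead(p);".
USING_DIRECTIVE = re.compile(r"^(global\s+)?using\s+(static\s+)?[\w.]+(\s*=\s*[\w.]+)?\s*;$")


def snippet_variant(app_type: str) -> str:
    """Map an application type to the snippet family to copy from."""
    if app_type in WINDOWS_SPECIFIC:
        return "windows-specific"
    if app_type in CROSS_PLATFORM:
        return "cross-platform"
    raise ValueError(f"Unknown application type: {app_type!r}")


def insert_license_block(code: str, license_block: str) -> str:
    """Insert license_block right after the leading using directives."""
    lines = code.splitlines()
    insert_at = 0
    for i, line in enumerate(lines):
        stripped = line.strip()
        if USING_DIRECTIVE.match(stripped):
            insert_at = i + 1
        elif stripped:  # first non-blank, non-using line ends the directive block
            break
    merged = lines[:insert_at] + license_block.splitlines() + lines[insert_at:]
    return "\n".join(merged) + ("\n" if code.endswith("\n") else "")


program = 'using System;\nusing System.IO;\n\nvar input = File.OpenRead("input.pdf");\n'
print(insert_license_block(program, "// <license block from references/nuget-packages.md>"))
```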
Code References
All templates and snippets are in the references/ folder:
| File | Contents |
|---|---|
| document-structure.md | Quick extractor setup and usage snippets |
| extract-data.md | Examples: ExtractDataAsJson, ExtractDataAsPdfStream, ExtractDataAsPdfDocument, async variants |
| extract-table.md | Table extraction examples (ExtractTableAsJson) |
| recognize-forms.md | Form field recognition examples: FormRecognizeOptions, RecognizeFormAsPdfDocument, RecognizeFormAsPdfStream, RecognizeFormAsJson, async variants |
| data-options.md | Explanation of TableExtractionOptions, FormRecognizeOptions, ConfidenceThreshold, PageRange |
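A request could be routed to the reference files in the table as shown below. This is a minimal sketch built on keyword rules that are assumptions, not part of the skill. Form wording selects recognize-forms.md, and table wording selects extract-table.md. When neither appears, the request falls back to extract-data.md. Mentions of confidence or pages add data-options.md. The setup snippets in document-structure.md are left out.

```python
import re
from typing import List


def reference_files(request: str) -> List[str]:
    """Pick reference files for a request using illustrative keyword rules."""
    text = request.lower()
    files = []
    if re.search(r"\bforms?\b", text):   # word match, so "format" is not a form
        files.append("recognize-forms.md")
    if re.search(r"\btables?\b", text):
        files.append("extract-table.md")
    if not files:
        # No form/table wording: fall back to general data extraction.
        files.append("extract-data.md")
    if re.search(r"\bconfidence\b|\bpages?\b", text):
        files.append("data-options.md")
    return ["references/" + name for name in files]
```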
Rules
- Output files go in the ./output/ directory.
- Use the license key from LICENSE.txt at the workspace root.
- Don't use any API that is not in the reference files.
- Only use NuGet package IDs and versions defined in references/nuget-packages.md when recommending or adding packages.
- For table-only extraction requests, recommend/install only the table extractor package from references/nuget-packages.md for the detected application type.
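The package and output-path rules can be enforced with the sketch below. The catalog dictionary, its "table"/"full" keys, and the function names are hypothetical. The catalog stands for whatever is read from references/nuget-packages.md, so the sketch never hard-codes or guesses a package ID, and it raises an error when an application type has no entry. Output files are kept directly inside ./output/.

```python
from pathlib import Path
from typing import Dict, List


def packages_to_recommend(
    app_type: str, table_only: bool, catalog: Dict[str, Dict[str, List[str]]]
) -> List[str]:
    """Return package IDs for app_type from a catalog built from references/nuget-packages.md."""
    if app_type not in catalog:
        # Never guess a package ID when the reference file has no entry.
        raise LookupError(f"No packages listed for {app_type!r}")
    entry = catalog[app_type]
    return list(entry["table"] if table_only else entry["full"])


def output_path(file_name: str) -> Path:
    """Place an output file directly inside ./output/, rejecting nested paths."""
    if file_name in ("", ".", "..") or Path(file_name).name != file_name:
        raise ValueError("file_name must be a bare file name")
    return Path("output") / file_name


# Placeholder IDs stand in for the real entries in references/nuget-packages.md.
catalog = {"WPF": {"table": ["<table-extractor-package-id>"],
                   "full": ["<smart-data-extractor-package-id>"]}}
print(packages_to_recommend("WPF", True, catalog))
print(output_path("result.json"))
```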