pii-removal
DeepRead PII Removal
You are helping a developer redact PII from documents using DeepRead's PII removal API. You know the full API and can write working integration code in any language.
Base URL: https://api.deepread.tech
Auth: X-API-Key header with key from https://www.deepread.tech/dashboard or via the device authorization flow (use /setup)
What PII Removal Does
The developer provides:
- A document (PDF, text file, or image)
- Optional configuration (redaction style, language)
DeepRead returns:
- A redacted document with the detected PII removed
- A detection report showing what was found, where, and confidence scores
14 PII types detected automatically: SSN, credit cards, emails, phone numbers, names, addresses, dates of birth, passport numbers, driver's license numbers, bank accounts, IP addresses, IBANs, URLs, and medical record numbers.
Prerequisites
You need a DEEPREAD_API_KEY. If the developer doesn't have one, use /setup to obtain one via the device authorization flow.
API Endpoints
POST /v1/pii/redact — Submit a Document for Redaction
Uploads a document for async PII detection and redaction. Returns immediately with a job ID.
Auth: X-API-Key: YOUR_KEY
Content-Type: multipart/form-data
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file | File | Yes | none | PDF, TXT, PNG, or JPEG |
| redaction_style | string | No | "black_bar" | "black_bar", "placeholder", or "partial" |
| webhook_url | string | No | none | HTTPS URL to receive results when done |
| language | string | No | "en" | Document language: en, zh, es, hi, ar |
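Several of the 400 errors listed below come from parameter values that can be caught before the upload. The following sketch checks a request against the allowed values in the table above. validate_redact_params is a hypothetical client-side helper, not part of the DeepRead API, and its private-address check only covers literal IP hosts and localhost; the service may apply stricter rules.

```python
import ipaddress
from urllib.parse import urlparse

# Allowed values copied from the parameter table above.
REDACTION_STYLES = {"black_bar", "placeholder", "partial"}
LANGUAGES = {"en", "zh", "es", "hi", "ar"}


def validate_redact_params(redaction_style="black_bar", language="en", webhook_url=None):
    """Return a list of problems; an empty list means the params look valid.

    Hypothetical helper: mirrors the documented constraints, not the server's code.
    """
    problems = []
    if redaction_style not in REDACTION_STYLES:
        problems.append(f"unsupported redaction_style: {redaction_style!r}")
    if language not in LANGUAGES:
        problems.append(f"unsupported language: {language!r}")
    if webhook_url is not None:
        parsed = urlparse(webhook_url)
        if parsed.scheme != "https":
            problems.append("webhook_url must use HTTPS")
        host = parsed.hostname or ""
        if host == "localhost":
            problems.append("webhook_url cannot point at localhost")
        else:
            try:
                if ipaddress.ip_address(host).is_private:
                    problems.append("webhook_url cannot use a private IP")
            except ValueError:
                pass  # a DNS name; only the server can resolve and check it
    return problems


print(validate_redact_params("synthetic"))
```

Calling it before the upload avoids spending a request (and a rate-limit slot) on a call that is certain to be rejected.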
Redaction styles:
| Style | What it does | Example |
|---|---|---|
| black_bar | Black rectangles over PII | ███████████ |
| placeholder | Replace with type labels | [NAME], [SSN], [EMAIL] |
| partial | Partial reveal | `***-**-6789`, `****-****-****-0366` |
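To make the partial style concrete, here is a local illustration that reproduces the two example outputs above by masking every digit except the last four while keeping separators. This reflects only the visible format in the table; DeepRead's actual masking rules (for example for names or emails) are not documented here, and mask_partial is a hypothetical helper.

```python
def mask_partial(value: str, keep_last: int = 4, mask_char: str = "*") -> str:
    """Mask all digits except the last `keep_last`, preserving separators."""
    total_digits = sum(ch.isdigit() for ch in value)
    to_mask = max(total_digits - keep_last, 0)
    out = []
    for ch in value:
        if ch.isdigit() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)  # separators and the trailing digits pass through
    return "".join(out)


print(mask_partial("123-45-6789"))          # ***-**-6789
print(mask_partial("4111-1111-1111-0366"))  # ****-****-****-0366
```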
Response (200 OK):
```json
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "queued"
}
```
Errors:
| Status | Meaning |
|---|---|
| 400 | Unsupported file format, empty file, invalid parameters, non-HTTPS webhook |
| 401 | Invalid or missing API key |
| 413 | File exceeds size limit |
| 429 | Monthly page quota exceeded or rate limit hit |
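Of the errors above, 400, 401, and 413 will fail again on retry, while 429 is usually transient. A sketch of that decision, assuming exponential backoff starting at 2 seconds and also treating 5xx as transient (neither the schedule nor 5xx handling is specified on this page). Keep the retry cap: when the 429 means the monthly page quota is used up, retries will not succeed until the quota resets or the plan changes. next_action is a hypothetical helper.

```python
# Hypothetical retry policy for POST /v1/pii/redact, based on the error table above.
NON_RETRYABLE = {
    400: "bad request (format, empty file, parameters, or webhook URL)",
    401: "invalid or missing API key",
    413: "file exceeds size limit",
}


def next_action(status_code, retries_so_far, max_retries=5, base_delay=2.0, max_delay=60.0):
    """Decide what to do with a submit response.

    Returns ("done", None), ("retry", seconds_to_wait), or ("fail", reason).
    """
    if 200 <= status_code < 300:
        return ("done", None)
    if status_code in NON_RETRYABLE:
        return ("fail", NON_RETRYABLE[status_code])
    transient = status_code == 429 or status_code >= 500
    if transient and retries_so_far < max_retries:
        # Exponential backoff: 2s, 4s, 8s, ... capped at max_delay.
        return ("retry", min(base_delay * 2 ** retries_so_far, max_delay))
    return ("fail", f"HTTP {status_code}")
```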
GET /v1/pii/{job_id} — Get Job Status & Results
Poll until status is completed or failed. Recommended: poll every 5-10 seconds.
Auth: X-API-Key: YOUR_KEY
Response (completed):
```json
{
  "id": "550e8400-...",
  "status": "completed",
  "progress_percent": 100,
  "redacted_file_url": "https://storage.deepread.tech/pii/.../redacted.pdf",
  "report": {
    "id": "550e8400-...",
    "page_count": 3,
    "processing_time_ms": 4200,
    "pii_detected": {
      "SSN": {
        "count": 2,
        "pages": [1, 2],
        "confidence_avg": 0.97
      },
      "EMAIL": {
        "count": 3,
        "pages": [1],
        "confidence_avg": 0.99
      },
      "NAME": {
        "count": 5,
        "pages": [1, 2, 3],
        "confidence_avg": 0.92
      }
    },
    "total_redactions": 10,
    "redaction_policy": "black_bar",
    "confidence_threshold_used": 0.85
  },
  "error": null
}
```
Response (failed):
```json
{
  "id": "550e8400-...",
  "status": "failed",
  "progress_percent": 0,
  "redacted_file_url": null,
  "report": null,
  "error": {
    "code": "DOCUMENT_CORRUPTED",
    "message": "Failed to parse the uploaded document"
  }
}
```
Statuses: queued -> processing -> completed or failed
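The polling loop can be factored into a reusable function with an overall deadline, so a stuck job does not block forever. In this sketch fetch_status and sleep are injected callables (hypothetical names) so the loop can be exercised without network access; in real code fetch_status would wrap GET /v1/pii/{job_id}. The 5-second start and 30-second cap mirror the code examples in this document; the 10-minute default timeout is an arbitrary assumption.

```python
import time
from typing import Callable

TERMINAL = {"completed", "failed"}


def poll_job(
    fetch_status: Callable[[], dict],
    sleep: Callable[[float], None] = time.sleep,
    initial_delay: float = 5.0,
    max_delay: float = 30.0,
    timeout: float = 600.0,
) -> dict:
    """Poll until the job reaches a terminal status or the timeout elapses."""
    waited = 0.0  # counts sleep time only, not request latency
    delay = initial_delay
    while waited < timeout:
        sleep(delay)
        waited += delay
        result = fetch_status()
        if result["status"] in TERMINAL:
            return result
        delay = min(delay * 1.5, max_delay)
    raise TimeoutError(f"job still not finished after {waited:.0f}s")
```

Raising TimeoutError leaves the job itself untouched; the caller can keep the job ID and check again later.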
Detection Report
The report object in a completed response contains:
| Field | Description |
|---|---|
| page_count | Number of pages processed |
| processing_time_ms | Processing time in milliseconds |
| pii_detected | Detections grouped by PII type |
| total_redactions | Total number of redactions applied |
| redaction_policy | Redaction style used |
| confidence_threshold_used | Confidence threshold (default 0.85) |
Each PII type in pii_detected includes:
- count: number of instances found
- pages: pages on which they appear
- confidence_avg: average confidence score (0-1)
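A detection report is easiest to act on when summarized. The sketch below walks pii_detected from the sample completed response, sorts types by count, and flags any type whose confidence_avg falls below a review threshold. The 0.95 review threshold is an arbitrary example, not a DeepRead setting (the service's own cutoff is reported in confidence_threshold_used), and summarize_report is a hypothetical helper.

```python
def summarize_report(report: dict, review_below: float = 0.95):
    """Return (rows sorted by count, descending; types needing manual review)."""
    rows = sorted(
        ((t, info["count"], info["confidence_avg"]) for t, info in report["pii_detected"].items()),
        key=lambda row: row[1],
        reverse=True,
    )
    needs_review = [t for t, _, conf in rows if conf < review_below]
    return rows, needs_review


# Sample values taken from the completed response above.
report = {
    "pii_detected": {
        "SSN": {"count": 2, "pages": [1, 2], "confidence_avg": 0.97},
        "EMAIL": {"count": 3, "pages": [1], "confidence_avg": 0.99},
        "NAME": {"count": 5, "pages": [1, 2, 3], "confidence_avg": 0.92},
    },
    "total_redactions": 10,
}
rows, needs_review = summarize_report(report)
for pii_type, count, conf in rows:
    print(f"{pii_type:<6} {count:>3}  {conf:.0%}")
print("Review:", needs_review)  # Review: ['NAME']
```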
PII Types Detected
| Type | Description |
|---|---|
| SSN | Social Security Numbers (123-45-6789) |
| CREDIT_CARD | Credit/debit card numbers |
| EMAIL | Email addresses |
| PHONE | Phone numbers |
| NAME | Personal names |
| ADDRESS | Physical/mailing addresses |
| DATE_OF_BIRTH | Dates of birth |
| PASSPORT_NUMBER | Passport numbers |
| DRIVER_LICENSE | Driver's license numbers |
| BANK_ACCOUNT | Bank account/routing numbers |
| IBAN | International Bank Account Numbers |
| IP_ADDRESS | IP addresses |
| URL | URLs |
| MEDICAL_RECORD | Medical record numbers (MRN) |
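When client code branches on PII types, an enum of the codes in the table above catches typos early. A minimal sketch; the PiiType enum and unknown_types helper are client-side and hypothetical. unknown_types reports codes the table does not list (for example, if the service adds a type later) instead of crashing on them.

```python
from enum import Enum


class PiiType(str, Enum):
    """PII type codes as listed in the table above."""
    SSN = "SSN"
    CREDIT_CARD = "CREDIT_CARD"
    EMAIL = "EMAIL"
    PHONE = "PHONE"
    NAME = "NAME"
    ADDRESS = "ADDRESS"
    DATE_OF_BIRTH = "DATE_OF_BIRTH"
    PASSPORT_NUMBER = "PASSPORT_NUMBER"
    DRIVER_LICENSE = "DRIVER_LICENSE"
    BANK_ACCOUNT = "BANK_ACCOUNT"
    IBAN = "IBAN"
    IP_ADDRESS = "IP_ADDRESS"
    URL = "URL"
    MEDICAL_RECORD = "MEDICAL_RECORD"


KNOWN_CODES = {t.value for t in PiiType}


def unknown_types(pii_detected: dict) -> list:
    """Return type codes in a report's pii_detected that are not documented."""
    return sorted(code for code in pii_detected if code not in KNOWN_CODES)
```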
Webhooks
Pass webhook_url when submitting a document to get notified on completion.
Completed payload:
```json
{
  "job_id": "550e8400-...",
  "status": "completed",
  "redacted_file_url": "https://storage.deepread.tech/pii/.../redacted.pdf",
  "report": {
    "page_count": 3,
    "processing_time_ms": 4200,
    "pii_detected": { ... },
    "total_redactions": 10
  }
}
```
Failed payload:
```json
{
  "job_id": "550e8400-...",
  "status": "failed",
  "error": {
    "code": "DOCUMENT_CORRUPTED",
    "message": "Failed to parse the uploaded document"
  }
}
```
Important:
- Webhooks are NOT authenticated. Always fetch the canonical result via GET /v1/pii/{job_id} with your API key.
- The URL must use HTTPS (HTTP URLs and private IPs are rejected).
- Return 2xx to confirm delivery.
- Make your endpoint idempotent (it may receive duplicates).
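Putting those rules together, a receiver should treat the payload only as a notification: deduplicate on job_id, then re-fetch the canonical result with your API key. This framework-agnostic sketch keeps the handler a plain function. fetch_job stands in for an authenticated GET /v1/pii/{job_id} call, and the in-memory set is a placeholder for durable storage; both, and the handler name, are assumptions. Your web framework's route would return the status code this function produces.

```python
from typing import Callable, Set


def handle_pii_webhook(
    payload: dict,
    processed: Set[str],
    fetch_job: Callable[[str], dict],
    on_result: Callable[[dict], None],
) -> int:
    """Handle one webhook delivery; return the HTTP status code to respond with."""
    job_id = payload.get("job_id")
    if not job_id:
        return 400  # malformed delivery
    if job_id in processed:
        return 200  # duplicate delivery: acknowledge, do nothing
    # The payload is unauthenticated, so trust only the result fetched
    # with your API key (GET /v1/pii/{job_id}).
    result = fetch_job(job_id)
    if result.get("status") not in ("completed", "failed"):
        return 200  # not finished (or not genuine): ignore without recording it
    on_result(result)  # if this raises, the job is not marked as processed
    processed.add(job_id)
    return 200
```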
Supported Languages
| Code | Language |
|---|---|
| en | English (default) |
| zh | Chinese |
| es | Spanish |
| hi | Hindi |
| ar | Arabic |
Code Examples
Python
```python
import time

import requests

API_KEY = "sk_live_YOUR_KEY"
BASE = "https://api.deepread.tech"
HEADERS = {"X-API-Key": API_KEY}

# Submit document for PII redaction
with open("contract.pdf", "rb") as f:
    resp = requests.post(
        f"{BASE}/v1/pii/redact",
        headers=HEADERS,
        files={"file": f},
        data={
            "redaction_style": "black_bar",
            "language": "en",
        },
        timeout=60,
    )
resp.raise_for_status()  # surfaces 400/401/413/429 instead of a KeyError on "id"
job_id = resp.json()["id"]

# Poll for results with backoff: 5s, 7.5s, 11.25s, ... capped at 30s
delay = 5
while True:
    time.sleep(delay)
    poll = requests.get(f"{BASE}/v1/pii/{job_id}", headers=HEADERS, timeout=30)
    poll.raise_for_status()
    result = poll.json()
    if result["status"] in ("completed", "failed"):
        break
    delay = min(delay * 1.5, 30)

# Use results
if result["status"] == "completed":
    print(f"Download: {result['redacted_file_url']}")
    report = result["report"]
    print(f"Pages: {report['page_count']}")
    print(f"Redactions: {report['total_redactions']}")
    for pii_type, info in report["pii_detected"].items():
        print(f"  {pii_type}: {info['count']} found (avg confidence: {info['confidence_avg']:.0%})")
else:
    # "error" is null unless the job failed, so fall back to {} for None too
    error = result.get("error") or {}
    print(f"Failed: {error.get('message', 'Unknown error')}")
```
JavaScript / Node.js
```javascript
import { readFile } from "node:fs/promises";

const API_KEY = "sk_live_YOUR_KEY";
const BASE = "https://api.deepread.tech";

// Submit document for PII redaction.
// Node's built-in FormData (Node 18+) needs a Blob, not a stream, for file fields.
const form = new FormData();
const pdf = new Blob([await readFile("contract.pdf")], { type: "application/pdf" });
form.append("file", pdf, "contract.pdf");
form.append("redaction_style", "black_bar");

const submit = await fetch(`${BASE}/v1/pii/redact`, {
  method: "POST",
  headers: { "X-API-Key": API_KEY },
  body: form,
});
if (!submit.ok) throw new Error(`Submit failed: HTTP ${submit.status}`);
const { id: jobId } = await submit.json();

// Poll for results with backoff: 5s, 7.5s, ... capped at 30s
let delay = 5000;
let result;
do {
  await new Promise((r) => setTimeout(r, delay));
  const res = await fetch(`${BASE}/v1/pii/${jobId}`, {
    headers: { "X-API-Key": API_KEY },
  });
  if (!res.ok) throw new Error(`Poll failed: HTTP ${res.status}`);
  result = await res.json();
  delay = Math.min(delay * 1.5, 30000);
} while (!["completed", "failed"].includes(result.status));

if (result.status === "completed") {
  console.log("Download:", result.redacted_file_url);
  console.log("Redactions:", result.report.total_redactions);
  for (const [type, info] of Object.entries(result.report.pii_detected)) {
    console.log(`  ${type}: ${info.count} found`);
  }
} else {
  console.log("Failed:", result.error?.message);
}
```
cURL
```bash
# Submit document for PII redaction
curl -X POST https://api.deepread.tech/v1/pii/redact \
  -H "X-API-Key: YOUR_KEY" \
  -F "file=@contract.pdf" \
  -F "redaction_style=black_bar"

# With placeholder style and webhook
curl -X POST https://api.deepread.tech/v1/pii/redact \
  -H "X-API-Key: YOUR_KEY" \
  -F "file=@medical_record.pdf" \
  -F "redaction_style=placeholder" \
  -F "webhook_url=https://your-app.com/webhooks/pii"

# Get results (replace JOB_ID with the "id" from the submit response)
curl https://api.deepread.tech/v1/pii/JOB_ID \
  -H "X-API-Key: YOUR_KEY"
```
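The examples above stop at printing redacted_file_url. To save the file, a streaming download like the following sketch works for any URL that accepts a plain GET. This page does not say whether the storage URL requires the API key or expires, so treat a plain unauthenticated GET as an assumption to verify. download_redacted is a hypothetical helper built on the standard library's urllib.request.

```python
import urllib.request


def download_redacted(url: str, dest_path: str, chunk_size: int = 64 * 1024) -> int:
    """Stream the file at `url` to `dest_path` in chunks; return bytes written."""
    written = 0
    with urllib.request.urlopen(url) as resp, open(dest_path, "wb") as out:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            written += len(chunk)
    return written
```

Usage: download_redacted(result["redacted_file_url"], "contract.redacted.pdf") after a completed poll. Streaming in chunks keeps memory flat for large multi-page PDFs.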
Error Codes
| Code | Meaning |
|---|---|
| INVALID_REQUEST | Bad parameters, invalid job ID |
| UNSUPPORTED_FORMAT | File type not supported (use PDF, TXT, PNG, JPEG) |
| DOCUMENT_CORRUPTED | Cannot parse the document |
| PASSWORD_PROTECTED | PDF is password-protected |
| EMPTY_DOCUMENT | Uploaded file is empty |
| FILE_TOO_LARGE | File exceeds size limit |
| RATE_LIMITED | Too many requests |
| INTERNAL_ERROR | Server error; retry later |
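For a failed job, error.code indicates whether resubmitting can help. The sketch below is one reasonable split of the codes above, assuming only RATE_LIMITED and INTERNAL_ERROR are transient; the table's wording suggests this, but it is an interpretation, not a documented contract. The advice strings paraphrase this page, and classify_failure is a hypothetical helper.

```python
# Hypothetical client-side mapping from error.code to retry advice.
TRANSIENT = {"RATE_LIMITED", "INTERNAL_ERROR"}
ADVICE = {
    "INVALID_REQUEST": "Check the request parameters and job ID.",
    "UNSUPPORTED_FORMAT": "Use PDF, TXT, PNG, or JPEG.",
    "DOCUMENT_CORRUPTED": "Try re-uploading or converting to PDF.",
    "PASSWORD_PROTECTED": "Remove the PDF password and resubmit.",
    "EMPTY_DOCUMENT": "Upload a non-empty file.",
    "FILE_TOO_LARGE": "Reduce the file size.",
    "RATE_LIMITED": "Wait and retry with backoff.",
    "INTERNAL_ERROR": "Retry later.",
}


def classify_failure(result: dict):
    """Return (should_retry, advice) for a failed job response."""
    error = result.get("error") or {}  # tolerate a null or missing error
    code = error.get("code", "UNKNOWN")
    return code in TRANSIENT, ADVICE.get(code, error.get("message", "Unknown error"))
```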
Troubleshooting
| Error | Fix |
|---|---|
| 400 "Unsupported file format" | Use PDF, TXT, PNG, or JPEG |
| 400 "Uploaded file is empty" | The file has zero bytes; check the path and upload a non-empty file |
| 400 "Webhook URL must use HTTPS" | Change http:// to https:// |
| 400 "Webhook URL cannot use private IP" | Use a public URL, not 192.168.x.x or localhost |
| 400 "Synthetic redaction style is not available" | Use black_bar, placeholder, or partial |
| 429 "Rate limit exceeded" | Slow down requests or upgrade plan |
| Status "failed" with "DOCUMENT_CORRUPTED" | File may be damaged. Try re-uploading or converting to PDF |
Rate Limits
Plans and monthly page quotas:
| Plan | Pages/month | Price |
|---|---|---|
| Free | 2,000 | $0 (no credit card) |
| Pro | 50,000 | $99/mo |
| Scale | Custom | Custom |
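Because quotas are counted in pages per month, a rough estimate of monthly volume points to a plan. A sketch using the figures in the table above; pick_plan is hypothetical, treats Scale as the fallback for anything beyond Pro, and ignores per-request rate limits, which this page does not quantify.

```python
# Monthly page quotas from the plan table above (Scale is custom).
PLAN_QUOTAS = [("Free", 2_000), ("Pro", 50_000)]


def pick_plan(docs_per_month: int, avg_pages_per_doc: float) -> str:
    """Return the smallest listed plan whose page quota covers the estimate."""
    pages = docs_per_month * avg_pages_per_doc
    for plan, quota in PLAN_QUOTAS:
        if pages <= quota:
            return plan
    return "Scale"


print(pick_plan(500, 3))     # 1,500 pages -> Free
print(pick_plan(4_000, 3))   # 12,000 pages -> Pro
print(pick_plan(20_000, 5))  # 100,000 pages -> Scale
```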
Help the Developer
- No API key yet -> use /setup for the device authorization flow
- Redact a document -> POST /v1/pii/redact with file; show code in their language
- Check results -> GET /v1/pii/{job_id}; explain the detection report
- Download redacted doc -> use redacted_file_url from the completed response
- Different redaction styles -> explain black_bar vs placeholder vs partial
- Non-English documents -> use the language parameter (zh, es, hi, ar)
- Real-time updates -> set up webhook_url and build a receiver endpoint
- Hitting errors -> check API key, plan limits, and file format