duckdb
SKILL.md
DuckDB
DuckDB is "SQLite for Analytics". It is an in-process SQL OLAP database. It runs inside your application process and is blazing fast for analytical queries on local files (Parquet, CSV, JSON).
When to Use
- Local Analytics: Analyze millions of rows on your laptop in seconds.
- Data Engineering: Process data in Python/R pipelines (replacement for Pandas).
- Serverless Data Lake: Query S3 parquet files directly via Lambda without a running warehouse.
Quick Start (Python)
import duckdb
# Query local CSV directly
duckdb.sql("SELECT avg(price) FROM 'sales.csv' WHERE region='US'").show()
# Connect to S3
duckdb.sql("INSTALL httpfs; LOAD httpfs;")
duckdb.sql("SELECT count(*) FROM 's3://my-bucket/data.parquet'")
Core Concepts
Vectorized Execution
Standard DBs process row-by-row. DuckDB processes batches of columns (Vectors), utilizing modern CPU SIMD instructions.
Universal Format Reader
Can query CSV, JSON, Parquet, Arrow, SQLite, and Postgres tables as if they were local tables.
Zero Dependencies
Single binary/library.
Best Practices (2025)
Do:
- Use Parquet: It is the native language of analytics. DuckDB + Parquet is incredible.
- Replace Pandas: For datasets larger than RAM, DuckDB works (Disk spilling) where Pandas crashes.
- Use explicitly typed SQL: DuckDB’s SQL dialect is very friendly and standard (Postgres-compatible).
Don't:
- Don't use for Multi-User OLTP: It handles concurrency poorly (single writer). Use Postgres for that. Use DuckDB for analysis.
References
Weekly Installs
1
Repository
g1joshi/agent-skillsGitHub Stars
7
First Seen
Feb 10, 2026
Security Audits
Installed on
mcpjam1
claude-code1
replit1
junie1
windsurf1
zencoder1