DuckDB CLI
DuckDB is a fast, in-process analytical database. The CLI is a single, dependency-free executable.
Core Usage
duckdb # In-memory database
duckdb my.duckdb # Persistent database
duckdb :memory: "SELECT 42" # One-shot query
duckdb -c "SELECT 42" # One-shot query (in-memory)
duckdb < script.sql # Run SQL file
duckdb -f script.sql # Run SQL file (alternative)
Command Line Arguments
| Argument | Description |
|---|---|
| -c COMMAND | Run SQL command and exit |
| -f FILENAME | Run script file and exit |
| -csv | Set output to CSV |
| -json | Set output to JSON |
| -markdown | Set output to markdown |
| -table | Set output to ASCII table |
| -box | Set output to box-drawing table |
| -line | Set output to one-value-per-line |
| -list | Set output to pipe-delimited list |
| -readonly | Open database read-only |
| -header / -noheader | Toggle column headers |
| -separator SEP | Set column separator |
| -nullvalue TEXT | Set NULL display text |
| -unsigned | Allow unsigned extensions |
| -bail | Stop after first error |
| -batch | Force batch I/O (non-interactive) |
Arguments are processed in order:
duckdb -csv -c 'SELECT 42 AS x' -json -c 'SELECT 84 AS y'
# First outputs CSV, then JSON
Dot Commands
| Command | Description |
|---|---|
| .open FILE | Close current db, open FILE |
| .databases | List attached databases |
| .tables [PATTERN] | List tables (optional LIKE pattern) |
| .schema [PATTERN] | Show CREATE statements |
| .mode MODE | Set output format |
| .headers on/off | Toggle column headers |
| .timer on/off | Toggle query timing |
| .output FILE | Redirect all output to FILE |
| .output | Revert to stdout |
| .once FILE | Redirect next query output only |
| .once -x / .excel | Open next result in spreadsheet app |
| .read FILE | Execute SQL from FILE |
| .import FILE TABLE | Import data from FILE into TABLE |
| .show | Show current settings |
| .quit / .exit | Exit CLI |
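A short interactive session tying several of these together (the database, table, and file names are illustrative):
duckdb analytics.duckdb
.tables
.mode markdown
.headers on
.timer on
SELECT count(*) FROM events;
.mode csv
.once top_events.csv
SELECT event, count(*) AS n FROM events GROUP BY event ORDER BY n DESC;
.quit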
Output Formats
Set via .mode MODE or CLI arguments (-csv, -json, etc.):
duckbox (default), box, table, csv, json, jsonlines, markdown, latex, html, insert, line, list, tabs, column, trash
Querying Files Directly
-- CSV
SELECT * FROM 'data.csv';
SELECT * FROM read_csv('data.csv');
SELECT * FROM read_csv('data/*.csv'); -- glob
-- Parquet
SELECT * FROM 'data.parquet';
SELECT * FROM read_parquet('data.parquet');
SELECT * FROM read_parquet('s3://bucket/data.parquet'); -- remote
-- JSON
SELECT * FROM 'data.json';
SELECT * FROM read_json('data.json');
-- Multiple files
SELECT * FROM read_csv(['file1.csv', 'file2.csv']);
SELECT * FROM 'data/**/*.parquet'; -- recursive glob
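When autodetection guesses wrong, the readers accept explicit options; the parameter values below are illustrative:
-- Override the CSV dialect and column types
SELECT * FROM read_csv('data.csv', delim = ';', header = true, columns = {'id': 'INTEGER', 'name': 'VARCHAR'});
-- Combine files with differing column orders and record the source file of each row
SELECT * FROM read_csv('data/*.csv', union_by_name = true, filename = true);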
Exporting Data
-- To CSV
COPY (SELECT * FROM tbl) TO 'output.csv' (HEADER, DELIMITER ',');
-- To Parquet
COPY (SELECT * FROM tbl) TO 'output.parquet' (FORMAT PARQUET);
-- To JSON
COPY (SELECT * FROM tbl) TO 'output.json' (FORMAT JSON, ARRAY true);
Or via dot commands:
.mode csv
.output results.csv
SELECT * FROM tbl;
.output
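COPY also supports Parquet compression and Hive-style partitioning; the column names here are illustrative:
-- Hive-partitioned, compressed Parquet output (one directory per partition value)
COPY tbl TO 'out_dir' (FORMAT PARQUET, PARTITION_BY (year, month), COMPRESSION ZSTD);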
Piping (stdin/stdout)
# Read from stdin
cat data.csv | duckdb -c "SELECT * FROM read_csv('/dev/stdin')"
# Write to stdout
duckdb -c "COPY (SELECT 1 AS x) TO '/dev/stdout' WITH (FORMAT csv, HEADER)"
# Chain
cat data.csv | duckdb -c "COPY (SELECT * FROM read_csv('/dev/stdin') WHERE x > 10) TO '/dev/stdout' WITH (FORMAT csv, HEADER)" | wc -l
Non-Interactive Patterns
# One-shot query
duckdb :memory: "SELECT 42 AS answer"
# Multiple statements
duckdb :memory: -c "CREATE TABLE t AS SELECT 1 AS x; SELECT * FROM t;"
# From file
duckdb my.duckdb < queries.sql
# JSON output for scripting
duckdb -json -c "SELECT * FROM 'data.csv' LIMIT 5"
# CSV output
duckdb -csv -noheader -c "SELECT col FROM 'data.csv'"
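Two scripting patterns that build on these flags (jq and the file names are illustrative):
# Parse JSON output in a shell pipeline
duckdb -json -c "SELECT 42 AS x" | jq '.[0].x'
# Abort a batch script on the first error and surface the failure to the shell
duckdb -bail my.duckdb < migrate.sql || echo "migration failed"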
Extensions
INSTALL httpfs; -- HTTP/S3 access
LOAD httpfs;
INSTALL spatial; -- Geospatial
INSTALL fts; -- Full-text search
INSTALL vss; -- Vector similarity search
INSTALL postgres; -- PostgreSQL scanner
INSTALL sqlite; -- SQLite scanner
INSTALL excel; -- Excel support
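Extensions must be loaded in each session (recent versions autoload core extensions on first use). For private S3 buckets, credentials can be supplied via a secret; the values below are placeholders:
INSTALL httpfs; LOAD httpfs;
CREATE SECRET my_s3 (
    TYPE S3,
    KEY_ID 'AKIA...',
    SECRET '...',
    REGION 'us-east-1'
);
SELECT * FROM read_parquet('s3://my-bucket/data.parquet');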
Text Analysis Functions
Tier 1: Built-in Text Functions (no extension needed)
Exact matching:
SELECT * FROM docs WHERE CONTAINS(text, 'search term');
SELECT * FROM docs WHERE text ILIKE '%pattern%';
SELECT * FROM docs WHERE text LIKE '%pattern%';
Fuzzy string matching (approximate, not semantic):
SELECT jaccard('duck', 'luck'); -- 0-1, higher = more similar
SELECT jaro_winkler_similarity('duckdb', 'duckbd'); -- 0-1, higher = more similar
SELECT damerau_levenshtein('duck', 'dukc'); -- edit distance with transpositions
SELECT editdist3('kitten', 'sitting'); -- Levenshtein edit distance
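These scalar functions compose into ordinary queries for near-duplicate detection; the table, column, and threshold below are illustrative:
-- Find names that approximately match a target string
SELECT name
FROM people
WHERE jaro_winkler_similarity(name, 'Jon Smith') >= 0.9;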
Regex:
SELECT regexp_matches(text, 'pattern');
SELECT regexp_extract(text, '(\d+)-(\d+)', 1);
SELECT regexp_replace(text, 'old', 'new', 'g');
Tokenization:
SELECT regexp_split_to_table(text, '\s+') AS token FROM docs;
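A simple word-frequency count built from the same tokenizer (assumes a docs table with a text column, as above):
SELECT lower(token) AS word, count(*) AS n
FROM (SELECT regexp_split_to_table(text, '\s+') AS token FROM docs)
GROUP BY word
ORDER BY n DESC
LIMIT 20;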
Tier 2: Full Text Search (fts extension)
Keyword-based search with BM25 ranking. Handles word forms via stemming, but not semantic meaning.
INSTALL fts; LOAD fts;
-- Create a full-text index
PRAGMA create_fts_index('docs', 'id', 'text', stemmer = 'english');
-- Search with BM25 scoring
SELECT d.*, score
FROM docs d
JOIN (
SELECT *, fts_main_docs.match_bm25(id, 'search query') AS score
FROM docs
) s ON d.id = s.id
WHERE score IS NOT NULL
ORDER BY score DESC;
-- Stemming function
SELECT stem('learning', 'english'); -- returns 'learn'
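The FTS index is built from a snapshot of the table and is not updated automatically when the data changes; drop and recreate it after significant modifications:
-- Rebuild the index after the underlying data changes
PRAGMA drop_fts_index('docs');
PRAGMA create_fts_index('docs', 'id', 'text', stemmer = 'english');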
Tier 3: Vector Similarity Search (vss extension)
Semantic search using vector embeddings. DuckDB provides index and distance functions; generate embeddings externally (e.g., sentence-transformers, OpenAI API).
INSTALL vss; LOAD vss;
-- Table with embedding column
CREATE TABLE docs (id INT, text VARCHAR, embedding FLOAT[384]);
-- HNSW index for fast nearest-neighbor search
CREATE INDEX idx ON docs USING HNSW (embedding) WITH (metric = 'cosine');
-- Distance functions
SELECT array_cosine_distance(a.embedding, b.embedding) FROM ...;
SELECT array_distance(a.embedding, b.embedding) FROM ...; -- L2
SELECT array_negative_inner_product(a.embedding, b.embedding) FROM ...;
-- Nearest-neighbor search (uses HNSW index)
SELECT * FROM docs
ORDER BY array_cosine_distance(embedding, [0.1, 0.2, ...]::FLOAT[384])
LIMIT 10;
-- Fuzzy join / top-K macros
SELECT * FROM vss_join(table_a, table_b, a_embed, b_embed, 10, metric := 'cosine');
SELECT * FROM vss_match(docs, embedding, query_vector, 10, metric := 'cosine');
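A self-contained toy run of the full flow, using 3-dimensional vectors purely for illustration (real embeddings would come from an external model):
INSTALL vss; LOAD vss;
-- On a persistent database, HNSW indexes may additionally require:
-- SET hnsw_enable_experimental_persistence = true;
CREATE TABLE toy (id INT, vec FLOAT[3]);
INSERT INTO toy VALUES (1, [1.0, 0.0, 0.0]), (2, [0.0, 1.0, 0.0]), (3, [0.9, 0.1, 0.0]);
CREATE INDEX toy_idx ON toy USING HNSW (vec) WITH (metric = 'cosine');
SELECT id FROM toy
ORDER BY array_cosine_distance(vec, [1.0, 0.0, 0.0]::FLOAT[3])
LIMIT 2; -- nearest neighbors of [1, 0, 0]: ids 1 and 3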
Common Recipes
# Analyze a CSV
duckdb -c "SUMMARIZE SELECT * FROM 'data.csv'"
duckdb -c "DESCRIBE SELECT * FROM 'data.csv'"
duckdb -c "SELECT count(*) FROM 'data.csv'"
# Convert between formats
duckdb -c "COPY (SELECT * FROM 'input.csv') TO 'output.parquet' (FORMAT PARQUET)"
duckdb -c "COPY (SELECT * FROM 'input.parquet') TO 'output.csv' (HEADER)"
duckdb -c "COPY (SELECT * FROM 'input.json') TO 'output.parquet' (FORMAT PARQUET)"
-- Query remote files
INSTALL httpfs; LOAD httpfs;
SELECT * FROM 'https://example.com/data.parquet' LIMIT 10;
-- Attach multiple databases
ATTACH 'other.duckdb' AS other;
SELECT * FROM other.main.my_table;
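The postgres and sqlite extensions attach external databases the same way; the connection details below are placeholders:
-- Attach a SQLite file and a PostgreSQL database
ATTACH 'legacy.db' AS legacy (TYPE SQLITE);
ATTACH 'dbname=mydb host=localhost user=me' AS pg (TYPE POSTGRES);
SELECT * FROM legacy.some_table;
SELECT * FROM pg.public.some_table;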