read-file

Installation
SKILL.md

You are helping the user read and analyze a data file using DuckDB.

Filename given: $0 Question: ${1:-describe the data}

Step 1 — Read it

RESOLVED_PATH is $0. If the user gave a bare filename (no /), resolve it to a full path with find first.

Run a single DuckDB command that defines the read_any macro inline and reads the file.

For remote files, prepend the necessary LOAD/SECRET before the macro:

Protocol Prepend
https:// / http:// LOAD httpfs;
s3:// LOAD httpfs; CREATE SECRET (TYPE S3, PROVIDER credential_chain);
gs:// / gcs:// LOAD httpfs; CREATE SECRET (TYPE GCS, PROVIDER credential_chain);
az:// / azure:// / abfss:// LOAD httpfs; LOAD azure; CREATE SECRET (TYPE AZURE, PROVIDER credential_chain);

For local files, no prefix needed.

duckdb -csv -c "
CREATE OR REPLACE MACRO read_any(file_name) AS TABLE
  WITH json_case AS (FROM read_json_auto(file_name))
     , csv_case AS (FROM read_csv(file_name))
     , parquet_case AS (FROM read_parquet(file_name))
     , avro_case AS (FROM read_avro(file_name))
     , blob_case AS (FROM read_blob(file_name))
     , spatial_case AS (FROM st_read(file_name))
     , excel_case AS (FROM read_xlsx(file_name))
     , sqlite_case AS (FROM sqlite_scan(file_name, (SELECT name FROM sqlite_master(file_name) LIMIT 1)))
     , ipynb_case AS (
         WITH nb AS (FROM read_json_auto(file_name))
         SELECT cell_idx, cell.cell_type,
                array_to_string(cell.source, '') AS source,
                cell.execution_count
         FROM nb, UNNEST(cells) WITH ORDINALITY AS t(cell, cell_idx)
         ORDER BY cell_idx
     )
  FROM query_table(
    CASE
      WHEN file_name ILIKE '%.json' OR file_name ILIKE '%.jsonl' OR file_name ILIKE '%.ndjson' OR file_name ILIKE '%.geojson' OR file_name ILIKE '%.geojsonl' OR file_name ILIKE '%.har' THEN 'json_case'
      WHEN file_name ILIKE '%.csv' OR file_name ILIKE '%.tsv' OR file_name ILIKE '%.tab' OR file_name ILIKE '%.txt' THEN 'csv_case'
      WHEN file_name ILIKE '%.parquet' OR file_name ILIKE '%.pq' THEN 'parquet_case'
      WHEN file_name ILIKE '%.avro' THEN 'avro_case'
      WHEN file_name ILIKE '%.xlsx' OR file_name ILIKE '%.xls' THEN 'excel_case'
      WHEN file_name ILIKE '%.shp' OR file_name ILIKE '%.gpkg' OR file_name ILIKE '%.fgb' OR file_name ILIKE '%.kml' THEN 'spatial_case'
      WHEN file_name ILIKE '%.ipynb' THEN 'ipynb_case'
      WHEN file_name ILIKE '%.db' OR file_name ILIKE '%.sqlite' OR file_name ILIKE '%.sqlite3' THEN 'sqlite_case'
      ELSE 'blob_case'
    END
  );

DESCRIBE FROM read_any('RESOLVED_PATH');
SELECT count(*) AS row_count FROM read_any('RESOLVED_PATH');
FROM read_any('RESOLVED_PATH') LIMIT 20;
"

If this fails:

  • duckdb: command not found → invoke /duckdb-skills:install-duckdb and retry.
  • Missing extension (e.g. spatial files, xlsx, sqlite) → retry with INSTALL spatial; LOAD spatial; or INSTALL sqlite_scanner; LOAD sqlite_scanner; prepended before the macro.
  • Wrong reader / parse error → use the correct read_* function directly instead of read_any.

Step 2 — Answer

Using the schema, row count, and sample rows, answer:

${1:-describe the data: summarize column types, row count, and any notable patterns.}

Weekly Installs
75
GitHub Stars
403
First Seen
Mar 20, 2026
Installed on
codex73
gemini-cli68
opencode68
warp67
kimi-cli67
cursor67