USGS National Produced Waters Geochemical Database

Fetches and analyzes geochemical data from the USGS National Produced Waters Geochemical Database v3.0 — the most comprehensive public dataset of produced water and oilfield brine chemistry in the United States.

Credential

None required. All data is publicly accessible through USGS ScienceBase.

# No API key needed — data is open access
# Template for consistency with other skills:
# KEY=$(grep '^api_key=' ~/.config/usgs/credentials 2>/dev/null | cut -d= -f2)

Data Source

Field	Value
Dataset	USGS National Produced Waters Geochemical Database (ver. 3.0, December 2023)
DOI	https://doi.org/10.5066/P9DSRCZJ
ScienceBase Item (v3.0)	`64fa1e71d34ed30c2054ea11`
ScienceBase Item (v2.3)	`59d25d63e4b05fe04cc235f9`
Records	~115,000 produced water samples
Columns	150+ (see `references/schema.md`)
Formats	v3.0: Excel (.xlsx), Shapefile (.shp); v2.3: CSV, Excel, R data
Coverage	All major U.S. oil and gas basins, some Canadian provinces
Update frequency	Irregular (v2.3: Jan 2019, v3.0: Dec 2023)
Viewer	https://www.usgs.gov/tools/us-geological-survey-national-produced-waters-geochemical-database-viewer
Citation	Blondes, M.S., Shelton, J.L., Engle, M.A., et al. (2023)

Version notes: v3.0 introduced the FORMSIMPLE (standardized formation names) and PLAYTYPE (geological play classification) columns, added 23 new datasets focused on high-lithium brines and Appalachian Basin data, converted all ppm values to mg/L, and improved location data. The IDUSGS identifier is persistent across v2.3 and v3.0.

API Structure

There is no REST query API — the database is a bulk download from ScienceBase. The workflow is: discover files via ScienceBase API, download, filter locally.

ScienceBase File Discovery

# List files for v3.0 (Excel + shapefile)
curl -s "https://www.sciencebase.gov/catalog/item/64fa1e71d34ed30c2054ea11?format=json&fields=files,title" \
  | jq '.files[] | {name, size, url: .downloadUri}'

v3.0 files:

File	Size	Format
`USGS_NPWGDv3_excel.xlsx`	80 MB	Excel workbook (main data)
`USGS_NPWGDv3_data_dictionary.xlsx`	30 KB	Column definitions
`USGS_NPWGDv3_shape.zip`	24 MB	Shapefile (points with attributes)
`version_history.txt`	6 KB	Version change log

# List files for v2.3 (CSV available)
curl -s "https://www.sciencebase.gov/catalog/item/59d25d63e4b05fe04cc235f9?format=json&fields=files" \
  | jq '.files[] | {name, size, url: .downloadUri}'

v2.3 files:

File	Size	Format
`USGSPWDBv2.3c.csv`	66 MB	CSV — compiled database (preferred)
`USGSPWDBv2.3n.csv`	63 MB	CSV — numeric-only version
`USGSPWDBv2.3c.xlsx`	29 MB	Excel — compiled database
`USGSPWDBv2.3 Data Dictionary.csv`	7 KB	Column definitions
`USGSPWDBv2.3 Data Sources.csv`	2 KB	Source dataset list

Download the CSV (v2.3)

# Download v2.3 compiled CSV (~66 MB)
curl -L -o /tmp/USGSPWDBv2.3c.csv \
  "https://www.sciencebase.gov/catalog/file/get/59d25d63e4b05fe04cc235f9?f=__disk__11%2F1e%2F8e%2F111e8e3952b1637289254fd398af95a81f937389"

Download the Excel file (v3.0)

# Download v3.0 Excel workbook (~80 MB)
curl -L -o /tmp/USGS_NPWGDv3_excel.xlsx \
  "https://www.sciencebase.gov/catalog/file/get/64fa1e71d34ed30c2054ea11?f=__disk__e7%2Fef%2F17%2Fe7ef17fcb71c49e2241da4139ed775f8e328bdab"

Search ScienceBase for Related Datasets

# Search for other produced water datasets
curl -s "https://www.sciencebase.gov/catalog/items?q=produced+water+geochemical&format=json&max=10" \
  | jq '.items[] | {id, title}'

Key Columns

Full schema is in references/schema.md. The most important columns for lithium/magnesium research are:

Identification and Location

IDUSGS — unique sample ID (persistent across versions)
STATE, COUNTY, BASIN — geographic location
LATITUDE, LONGITUDE — coordinates (some approximate; check COORDAPX)
FIPCODE — 5-digit FIPS code for state+county

Geology

FORMSIMPLE — standardized formation name (v3.0; use for filtering)
FORMATION — formation name as reported (partially standardized)
PLAYTYPE — play classification: Shale, Coal, Sedimentary, Geothermal, Injection (v3.0)
BASIN — geologic basin (cleaned in v3.0)
ERA, PERIOD, EPOCH — geologic age
DEPTHUPPER, DEPTHLOWER — perforation interval (ft)

Chemistry (all in mg/L)

Li — Lithium (key target for DLE)
Mg — Magnesium (co-recovery target)
TDS — Total dissolved solids (best available: measured, reported-calculated, or ion-sum)
Na, Ca, K, Cl, SO4, HCO3 — major ions
Ba, Sr, Br, Fe (FeTot), Mn, Si, B — minor/trace
d7Li — lithium isotope ratio (per mil)

Physical Properties

PH — pH
TEMP — temperature (deg F)
SG — specific gravity (reported or calculated)
PRESSURE — formation pressure (psi)

Key Formations

Detailed formation data is in references/formations.md. Summary of highest-priority formations for Li/Mg research:

Formation	Basin	States	Li (mg/L)	Mg (mg/L)	Economic Potential
Smackover	Gulf Coast	AR, TX, LA	50-477	1,000-5,000+	Highest U.S. brine Li; active DLE projects
Marcellus	Appalachian	WV, PA, OH	10-200+	500-3,000	Large water volumes; Li correlates with TDS
Utica / Pt. Pleasant	Appalachian	OH, WV, PA	20-150	500-2,500	Co-produced with Marcellus
Bakken	Williston	ND, MT	10-70	500-4,000	High volumes; moderate Li
Wolfcamp	Permian	TX, NM	5-50	300-3,000	Massive volumes; lower Li

Economic threshold for DLE: approximately 100-150 mg/L Li depending on technology and brine volume. Emerging technologies may lower this to ~50 mg/L.

FORMSIMPLE values for filtering: Smackover, Marcellus, Utica, Point Pleasant, Bakken, Wolfcamp, Eagle Ford, Oriskany, Clinton, Devonian Shale, Tuscaloosa.

Workflow

Step 1 — Resolve Intent

Map the user's question to a filtering strategy:

User asks about...	Filter by...
A specific formation	`FORMSIMPLE` or `FORMATION`
A state or region	`STATE`, `COUNTY`, or `BASIN`
Lithium potential	`Li` column with minimum threshold
A type of play	`PLAYTYPE` (Shale, Sedimentary, Coal, etc.)
Specific wells	`API` number or `WELLNAME`
High-TDS brines	`TDS` with minimum threshold
Specific analyte	Any chemistry column (see `references/schema.md`)

Step 2 — Discover and Download Data

Check if a local copy exists (cached from prior use):

ls -la /tmp/USGSPWDBv2.3c.csv 2>/dev/null || ls -la /tmp/USGS_NPWGDv3_excel.xlsx 2>/dev/null

If not cached, download from ScienceBase:

# Prefer v2.3 CSV for scripted analysis (v3.0 requires openpyxl for Excel)
curl -L -o /tmp/USGSPWDBv2.3c.csv \
  "https://www.sciencebase.gov/catalog/file/get/59d25d63e4b05fe04cc235f9?f=__disk__11%2F1e%2F8e%2F111e8e3952b1637289254fd398af95a81f937389"

Warn the user about file size (~66 MB for v2.3 CSV, ~80 MB for v3.0 Excel).

Step 3 — Filter and Analyze

Use Python (or bash with awk/csvtool) to filter and compute statistics:

import csv, statistics

# Load CSV
with open("/tmp/USGSPWDBv2.3c.csv") as f:
    reader = csv.DictReader(f)
    rows = [r for r in reader if "marcellus" in r.get("FORMATION", "").lower()]

# Extract Li values
li_vals = [float(r["Li"]) for r in rows if r.get("Li") and r["Li"] not in ("", "-9999")]

# Statistics
print(f"Marcellus Li: n={len(li_vals)}, "
      f"median={statistics.median(li_vals):.1f}, "
      f"mean={statistics.mean(li_vals):.1f}, "
      f"max={max(li_vals):.1f} mg/L")

For common queries, use the ready-made client in references/python_example.py which provides filter_rows(), summary_stats(), and formatted output functions.

Common filtering patterns:

# Count samples by state (bash + awk on CSV)
awk -F',' 'NR==1{for(i=1;i<=NF;i++) if($i=="STATE") col=i}
           NR>1{count[$col]++}
           END{for(s in count) print count[s], s}' \
  /tmp/USGSPWDBv2.3c.csv | sort -rn | head -20

# Extract high-Li samples (Python one-liner)
python3 -c "
import csv
with open('/tmp/USGSPWDBv2.3c.csv') as f:
    r = csv.DictReader(f)
    hits = [row for row in r if row.get('Li') and float(row['Li'] or 0) > 100]
print(f'{len(hits)} samples with Li > 100 mg/L')
for h in sorted(hits, key=lambda x: float(x.get('Li',0)), reverse=True)[:10]:
    print(f\"  Li={h['Li']} mg/L  State={h.get('STATE','')}  Fm={h.get('FORMATION','')}\")
"

Step 4 — Produce Output

Format: Summary Statistics Table + Narrative

Present results as a markdown table (capped at ~20 rows for readability), followed by a narrative summary covering:

Dataset scope — how many samples matched the filter, from how many states/formations
Key statistics — median, mean, min, max, P25/P75 for the target analytes
Notable findings — highest-concentration samples, formation-level patterns
Economic context — comparison to DLE thresholds, co-production potential
Caveats — data quality notes, sampling bias, version differences

Output Format

Example 1: Formation Query

## Marcellus Shale — Produced Water Lithium Analysis

| Statistic | Li (mg/L) | Mg (mg/L) | TDS (mg/L) |
|-----------|-----------|-----------|------------|
| N         | 847       | 1,023     | 1,156      |
| Min       | 0.1       | 12        | 1,200      |
| P25       | 28        | 580       | 85,000     |
| Median    | 62        | 1,250     | 145,000    |
| Mean      | 75        | 1,380     | 155,000    |
| P75       | 105       | 2,100     | 210,000    |
| Max       | 235       | 3,200     | 345,000    |

**Summary:** The Marcellus Shale produced water database contains 1,156
samples across WV, PA, OH, and NY. Lithium concentrations (n=847) have a
median of 62 mg/L with 25% of samples exceeding 105 mg/L — above or near
the DLE economic threshold of 100-150 mg/L. The highest Li value (235 mg/L)
was reported from a well in WV. Mg averages 1,380 mg/L, making co-recovery
viable in high-TDS wells. Note: Li data coverage is incomplete (73% of
samples have Li measured). Sampling is biased toward PA and WV where
Marcellus production is highest.

Example 2: State Query

## West Virginia — Produced Water Chemistry by Formation

| Formation | N | Li Median (mg/L) | Li Max (mg/L) | TDS Median (mg/L) |
|-----------|---|-------------------|----------------|---------------------|
| Marcellus | 312 | 78 | 235 | 165,000 |
| Utica | 45 | 92 | 148 | 195,000 |
| Oriskany | 28 | 35 | 88 | 120,000 |
| Clinton | 19 | 22 | 41 | 95,000 |
| Devonian Shale | 67 | 18 | 52 | 68,000 |

**Summary:** West Virginia produced waters span multiple Appalachian Basin
formations. The Marcellus and Utica formations show the highest lithium
potential, with median Li of 78 and 92 mg/L respectively. The top Utica
samples approach DLE-economic concentrations (148 mg/L max). TDS correlates
with Li across all formations. Data availability is strongest for the
Marcellus (n=312) reflecting modern shale gas development.

Error Handling

Issue	Cause	Action
ScienceBase item 404 / "not found"	Item ID changed or item secured	Search by title: `q=National+Produced+Waters+Geochemical+Database`
Download timeout	Large file (~66-80 MB)	Increase timeout; suggest download to local file first
CSV parsing errors	Encoding issues, embedded commas	Use `errors="replace"` encoding; check for quoted fields
Column not found	v2.3 vs v3.0 schema difference	v3.0 has FORMSIMPLE, PLAYTYPE; v2.3 uses FORMATION, WELLTYPE
Empty results for filter	Misspelled formation or state	Use FORMSIMPLE values from `references/formations.md`; STATE uses full names not abbreviations
-9999 values	Missing data sentinel	Exclude -9999 from all numeric calculations
Li/Mg column all blank	Analyte not measured for those samples	Report coverage (N with data / N total); note sparse coverage for trace elements
Excel file (v3.0) cannot be parsed	No openpyxl installed	Fall back to v2.3 CSV; or `pip install openpyxl`

Caveats

Sampling bias. The database is a compilation from published sources, not a statistically designed survey. Some formations and states are heavily overrepresented (Appalachian Basin, Texas) while others have few samples. Sample density does not reflect production volumes.
Temporal coverage. Samples span from the 1930s to 2020s. Older samples may use different analytical methods with lower precision. Comparison across eras requires caution.
Incomplete analyte coverage. Not all samples have all analytes measured. Lithium coverage is notably sparse in older datasets. Always report the N (number of non-null values) alongside statistics.
Approximate locations. Many samples have approximate coordinates (county centroid, field centroid, or state centroid). Check the COORDAPX column before using lat/lon for spatial analysis.
Units. All chemistry in v3.0 is mg/L (v3.0 converted all ppm using specific gravity). In v2.3, the UNITSORIG column indicates whether the original was mg/L or ppm.
TDS calculation. The TDS column is a "best available" value using measured TDS first, then reported-calculated TDS, then ion-sum-calculated TDS. Check TDSDESC for the method.
Formation name inconsistency. Use FORMSIMPLE (v3.0) for reliable formation-level grouping. The FORMATION column has many spelling variants and qualifiers that make grouping unreliable without normalization.
v2.3 vs v3.0 differences. v3.0 added 23 new datasets (especially high-Li brines), fixed location errors, and standardized formation names. For lithium research, v3.0 is strongly preferred. However, v3.0 is only available as Excel (not CSV), requiring openpyxl or similar to parse programmatically.
Self-reported industry data. Some source datasets are from industry reports. Quality control varies by source. The CHARGEBAL column (charge balance) can be used as a data quality indicator — values far from 0% suggest analytical errors.
Not a reserves estimate. Concentration data alone does not determine economic viability. Recovery depends on water volume, flow rates, brine chemistry complexity, infrastructure, and DLE technology efficiency.

Implementation Notes

Prefer bash_tool or python in Claude's environment to download and analyze
Use v2.3 CSV for scripted analysis (no dependencies beyond stdlib)
Use v3.0 Excel when openpyxl is available and you need FORMSIMPLE/PLAYTYPE columns
Cache downloads to /tmp/ to avoid re-downloading the 66-80 MB file
Python client — see references/python_example.py for ready-made filtering and statistics
Full column schema — see references/schema.md for all 150+ column definitions
Formation reference — see references/formations.md for Li/Mg ranges by formation
STATE uses full names (e.g., "West Virginia", not "WV")
Missing data is represented as empty string or -9999 — always exclude from calculations

usgs-produced-waters