netl-edx
NETL Energy Data eXchange (EDX)
Searches and retrieves datasets from the DOE NETL Energy Data eXchange (EDX)
using its CKAN v3 REST API at https://edx.netl.doe.gov/api/3/.
EDX hosts thousands of datasets from NETL and DOE-funded research including critical minerals characterization (ClaiMM), produced water geochemistry (NEWTS), carbon storage, unconventional resources, and more.
API Key Handling
Resolution order (stop at first success):
- `~/.config/netl-edx/credentials` (default) -- parse `api_key=VALUE` from this file
- `NETL_EDX_API_KEY` env var -- fallback if the credentials file is absent or has no `api_key=` line
- User-provided in conversation -- fallback if neither of the above is set
- Prompt the user -- "Please provide your NETL EDX API key. Get one free at https://edx.netl.doe.gov/ -- sign up, then go to your profile page and hover to reveal your API key. Store it in `~/.config/netl-edx/credentials` as `api_key=YOUR_KEY` (chmod 600)."
Never hardcode or log the key. Pass it as a request header.
Note on auth: Most read-only operations (package_search, package_show, resource_search, tag_list, resource downloads) work without a key for public datasets. A key is required for write operations and private datasets. Always include the key when available to avoid edge cases.
Reading the credentials file (bash):
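```bash
# cut -f2- keeps everything after the first "=", so keys that contain "=" survive intact
KEY=$(grep -m1 '^api_key=' ~/.config/netl-edx/credentials 2>/dev/null | cut -d= -f2-)
[ -z "$KEY" ] && KEY="${NETL_EDX_API_KEY}"
export NETL_EDX_API_KEY="$KEY"  # the curl examples in this document read this variable
```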
Reading the credentials file (Python):
```python
import os
from pathlib import Path


def resolve_edx_key() -> str | None:
    """Return the key from the credentials file, else from NETL_EDX_API_KEY (or None)."""
    creds = Path.home() / ".config" / "netl-edx" / "credentials"
    if creds.exists():
        for line in creds.read_text().splitlines():
            if line.strip().startswith("api_key="):
                return line.strip().removeprefix("api_key=")
    return os.environ.get("NETL_EDX_API_KEY")
```
Key header options (any one works):
- `EDX-API-Key: YOUR_KEY`
- `X-CKAN-API-Key: YOUR_KEY`
- `Authorization: YOUR_KEY`
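A minimal sketch of building an authenticated request with the Python standard library, assuming the `EDX-API-Key` header from the list above. `build_request` and `call_action` are hypothetical helper names, not part of any EDX client. Note that `urllib` normalizes header names with `str.capitalize()` (the header is stored as `Edx-api-key`); HTTP header names are case-insensitive, so the request is unaffected.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "https://edx.netl.doe.gov/api/3/action/"


def build_request(action: str, params: Optional[dict] = None,
                  api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for a CKAN action; the key travels in a header, never in the URL."""
    query = urllib.parse.urlencode(params or {})
    url = BASE_URL + action + ("?" + query if query else "")
    headers = {"EDX-API-Key": api_key} if api_key else {}
    return urllib.request.Request(url, headers=headers)


def call_action(action: str, params: Optional[dict] = None,
                api_key: Optional[str] = None, timeout: float = 30) -> dict:
    """Send the request and decode the JSON envelope (needs network access)."""
    with urllib.request.urlopen(build_request(action, params, api_key), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `call_action("package_search", {"q": "lithium", "rows": 10}, resolve_edx_key())` mirrors the curl search in Step 2.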
API Structure
Base URL: https://edx.netl.doe.gov/api/3/action/
EDX runs a CKAN v3 instance. All responses are JSON with this envelope:
```json
{
  "success": true,
  "result": { ... }
}
```
On error:
```json
{
  "success": false,
  "error": { "__type": "Not Found Error", "message": "..." }
}
```
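A minimal sketch of unwrapping this envelope in Python: return `result` on success, otherwise raise with the `__type` and `message` fields. `EDXError` and `unwrap` are hypothetical names. CKAN validation errors may carry per-field details instead of a `message`, so the message defaults to an empty string here.

```python
class EDXError(Exception):
    """Raised when a CKAN envelope reports "success": false."""

    def __init__(self, error_type: str, message: str):
        super().__init__(f"{error_type}: {message}")
        self.error_type = error_type
        self.message = message


def unwrap(envelope: dict):
    """Return envelope["result"], or raise EDXError built from envelope["error"]."""
    if envelope.get("success"):
        return envelope["result"]
    err = envelope.get("error") or {}
    raise EDXError(err.get("__type", "Unknown Error"), str(err.get("message", "")))
```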
Key actions:
| Action | Method | Description |
|---|---|---|
| `package_search` | GET | Search datasets by keyword, tags, groups |
| `package_show` | GET | Get full metadata for a single dataset |
| `resource_search` | GET | Search resources (files) across all datasets |
| `group_list` | GET | List available groups/collections |
| `group_show` | GET | Get group details with its datasets |
| `tag_list` | GET | List all tags or filter by prefix |
| `organization_list` | GET | List publishing organizations |
See references/ckan_api.md for full parameter reference and response examples.
package_search Parameters
| Parameter | Example | Notes |
|---|---|---|
| `q` | `q=lithium+produced+water` | Full-text Solr query |
| `fq` | `fq=groups:claimm-datasets` | Filter query (Solr `field:value`) |
| `rows` | `rows=20` | Results per page (default 10, max 1000) |
| `start` | `start=0` | Pagination offset |
| `sort` | `sort=metadata_modified+desc` | Sort field + direction |
| `facet.field` | `facet.field=["groups"]` | Facet on groups, tags, etc. |
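A sketch of assembling these `package_search` parameters into a query string with `urllib.parse.urlencode`. The helper name is hypothetical; capping `rows` at 1000 follows the limit in the table, `facet.field` is sent as a JSON-encoded list as in the example, and `q="*:*"` (match everything) is the standard CKAN default.

```python
import json
import urllib.parse
from typing import List, Optional


def package_search_query(q: str = "*:*", fq: Optional[str] = None, rows: int = 10,
                         start: int = 0, sort: Optional[str] = None,
                         facet_fields: Optional[List[str]] = None) -> str:
    """Encode package_search parameters as a URL query string."""
    # Cap rows at the documented maximum of 1000 and keep offsets non-negative
    params = {"q": q, "rows": max(0, min(rows, 1000)), "start": max(0, start)}
    if fq:
        params["fq"] = fq
    if sort:
        params["sort"] = sort
    if facet_fields:
        # facet.field is a JSON-encoded list, e.g. ["groups"]
        params["facet.field"] = json.dumps(facet_fields)
    return urllib.parse.urlencode(params)
```

Append the result to `https://edx.netl.doe.gov/api/3/action/package_search?` to get a request URL; spaces in `q` become `+`, matching the table's examples.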
Solr Query Syntax
```
q=lithium AND "produced water"                     # Boolean + phrase
q=lithium OR magnesium                             # OR logic
fq=groups:claimm-datasets                          # Filter by group
fq=tags:"Critical Minerals"                        # Filter by tag
fq=metadata_modified:[2024-01-01T00:00:00Z TO *]   # Date range
```
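The queries above are shown unencoded. A minimal sketch, assuming you want to AND together an optional group, tag, and modified-since filter into one `fq` string; `build_fq` is a hypothetical helper. Passing the result through `urlencode` percent-encodes the colon, quotes, spaces, and brackets for use in a URL (with curl, `-G --data-urlencode` does the same job).

```python
import urllib.parse
from typing import Optional


def build_fq(group: Optional[str] = None, tag: Optional[str] = None,
             modified_since: Optional[str] = None) -> str:
    """AND together optional group, tag, and date-range filter clauses."""
    clauses = []
    if group:
        clauses.append(f"groups:{group}")
    if tag:
        # Quote tags so multi-word tags such as "Critical Minerals" match as a phrase
        clauses.append(f'tags:"{tag}"')
    if modified_since:
        # ISO 8601 timestamp with an open-ended upper bound
        clauses.append(f"metadata_modified:[{modified_since} TO *]")
    return " AND ".join(clauses)


# Percent-encode before placing the filter in a URL
encoded = urllib.parse.urlencode({"fq": build_fq(tag="Critical Minerals")})
```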
Key Collections
EDX organizes datasets into groups. These are the most relevant for PNGE research:
ClaiMM (Critical Minerals and Materials)
The ClaiMM platform is the primary collection for critical minerals data on EDX.
| Group | Slug | Datasets | Focus |
|---|---|---|---|
| ClaiMM Datasets | `claimm-datasets` | ~188 | All critical minerals datasets |
| ClaiMM Geochemistry | `claimm-geochemistry` | ~14 | Brine/water geochemistry |
| ClaiMM Mine Waste | `claimm-mine-waste` | ~11 | Produced waters, tailings, coal ash |
| ClaiMM Databases | `claimm-databases` | ~3 | Large compiled databases |
| ClaiMM Tools | `claimm-tools` | ~2 | Software tools (e.g., CM3 Matchmaker) |
Example -- search ClaiMM for lithium data:
```bash
curl -s "https://edx.netl.doe.gov/api/3/action/package_search?q=lithium&fq=groups:claimm-datasets&rows=10" \
  -H "EDX-API-Key: $NETL_EDX_API_KEY"
```
NEWTS (National Energy Water Treatment and Speciation)
Water quality data from energy processes -- produced water, flowback, AMD.
| Group | Slug | Datasets | Focus |
|---|---|---|---|
| NEWTS Database | `newts` | ~12 | Produced water geochemistry, treatment data |
Other Relevant Groups
| Group | Slug | Datasets |
|---|---|---|
| Onshore Storage | `onshore-storage` | ~22 |
| Offshore Storage | `offshore-storage` | ~3 |
| Appalachian Basin | `appalachian-basin-data-group` | ~3 |
| Water Data | `water-data-group` | ~7 |
| Unconventional Resources | `unconventional-resources` | ~4 |
| LCA Unit Process Library | `lca-unit-process-library` | ~12 |
| Archived KMD | `archived-kmd` | ~630 |
Workflow
Step 1 -- Resolve Intent
Map the user's question to:
- A search query (`q=keyword terms`)
- An optional group filter (`fq=groups:GROUP_SLUG`)
- Whether they want to discover datasets, get details for a known dataset, or download a resource
Common intent mappings:
| User says | Action | Parameters |
|---|---|---|
| "Find NETL lithium data" | `package_search` | `q=lithium` |
| "ClaiMM produced water datasets" | `package_search` | `q=produced+water&fq=groups:claimm-datasets` |
| "What's in dataset X?" | `package_show` | `id=dataset-slug` |
| "Download the CSV from that dataset" | Get resource URL from `package_show` | Direct download |
| "NEWTS geochemistry data" | `package_search` | `q=geochemistry&fq=groups:newts` |
| "Critical minerals datasets from DOE" | `package_search` | `q=critical+minerals&fq=groups:claimm-datasets` |
| "What groups exist on EDX?" | `group_list` | (no params) |
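The search and group-listing rows of this table can be approximated with simple keyword rules. This is an illustrative heuristic, not a real language-understanding step: `route_intent` and its stop-word list are hypothetical, and the `package_show` and download rows are not covered because they need a dataset already identified in the conversation.

```python
import re
from typing import Dict, Tuple

# Words that steer routing but add noise to a full-text query (illustrative list)
STOP_WORDS = {"find", "netl", "doe", "edx", "data", "datasets", "from", "the",
              "claimm", "newts"}


def route_intent(question: str) -> Tuple[str, Dict[str, str]]:
    """Map a question to (action, params) with simple keyword rules."""
    text = question.lower()
    if "what groups" in text or "list groups" in text:
        return "group_list", {}
    group = None
    if "claimm" in text or "critical mineral" in text:
        group = "claimm-datasets"
    elif "newts" in text:
        group = "newts"
    # Keep alphanumeric words only, dropping punctuation and stop words
    terms = [w for w in re.findall(r"[a-z0-9]+", text) if w not in STOP_WORDS]
    params = {"q": " ".join(terms) or "*:*"}
    if group:
        params["fq"] = f"groups:{group}"
    return "package_search", params
```

Spaces in `q` turn into the table's `+` once the params are URL-encoded.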
Step 2 -- Search Datasets
```bash
# Search with keyword
curl -s "https://edx.netl.doe.gov/api/3/action/package_search?q=lithium+produced+water&rows=10" \
  -H "EDX-API-Key: $NETL_EDX_API_KEY"
```
Parse the response:
- `result.count` -- total matching datasets
- `result.results[]` -- array of dataset objects
- Each dataset has: `name`, `title`, `notes` (description), `num_resources`, `resources[]`, `tags[]`, `groups[]`, `extras[]`
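A sketch of flattening `result.results[]` into one row per dataset for the Step 5 table, assuming the usual CKAN shapes where each entry in `groups[]` carries `name`/`title` and each entry in `tags[]` carries `name`. `summarize_results` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def summarize_results(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten a package_search result into one dict per dataset."""
    rows = []
    for ds in result.get("results", []):
        rows.append({
            "name": ds.get("name"),
            "title": ds.get("title") or ds.get("name"),
            # Fall back to counting resources[] if num_resources is absent
            "num_resources": ds.get("num_resources", len(ds.get("resources", []))),
            "metadata_modified": ds.get("metadata_modified", ""),
            "groups": [g.get("title") or g.get("name") for g in ds.get("groups", [])],
            "tags": [t.get("name") for t in ds.get("tags", [])],
        })
    return rows
```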
Step 3 -- Get Dataset Details
For a specific dataset, use package_show to get full metadata including all resources:
```bash
curl -s "https://edx.netl.doe.gov/api/3/action/package_show?id=DATASET_SLUG" \
  -H "EDX-API-Key: $NETL_EDX_API_KEY"
```
Key fields in the response:
- `title` -- human-readable name
- `notes` -- full description / abstract
- `resources[]` -- list of files with `name`, `format`, `size`, `url`
- `extras[]` -- key-value pairs including `citation`, `doi`, `program_or_project`
- `tags[]` -- subject tags
- `groups[]` -- collection memberships
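In CKAN, `extras[]` is a list of `{"key": ..., "value": ...}` objects rather than a mapping, so it helps to convert it before looking up `citation` or `doi`. The helpers below are hypothetical, and how the two fields are joined into one line is an arbitrary formatting choice.

```python
from typing import Any, Dict, Optional


def extras_dict(pkg: Dict[str, Any]) -> Dict[str, Any]:
    """Turn CKAN's extras list of {"key": ..., "value": ...} into a plain dict."""
    return {e["key"]: e.get("value") for e in pkg.get("extras", []) if "key" in e}


def citation_line(pkg: Dict[str, Any]) -> Optional[str]:
    """Build a citation string from the citation/doi extras, if present."""
    extras = extras_dict(pkg)
    citation, doi = extras.get("citation"), extras.get("doi")
    if citation and doi and doi not in citation:
        return f"{citation} DOI: {doi}"
    return citation or (f"DOI: {doi}" if doi else None)
```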
Step 4 -- Download Resources
Resources have direct download URLs in resources[].url:
```bash
# Direct download (public resources work without auth)
curl -o output.csv "https://edx.netl.doe.gov/storage/f/edx/2024/.../filename.csv"

# With auth header (for private resources or reliability)
curl -H "EDX-API-Key: $NETL_EDX_API_KEY" \
  -o output.csv "https://edx.netl.doe.gov/storage/f/edx/2024/.../filename.csv"
```
Large file warning: Some datasets contain files over 100 MB. Check `resources[].size` before downloading and warn the user.
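A sketch of that size check, assuming `resources[].size` is a byte count. CKAN does not require resources to report a size, so a missing or unparseable value is flagged as well rather than treated as small. Units are 1024-based; `check_sizes` and `human_size` are hypothetical names.

```python
from typing import Any, Dict, List, Optional, Tuple

LARGE_FILE_BYTES = 100 * 1024 * 1024  # the 100 MB warning threshold


def human_size(n: Optional[float]) -> str:
    """Format a byte count with 1024-based units."""
    if n is None:
        return "unknown"
    for unit in ("B", "KB", "MB"):
        if n < 1024:
            return f"{n:.0f} {unit}"
        n /= 1024
    return f"{n:.1f} GB"


def check_sizes(resources: List[Dict[str, Any]],
                threshold: int = LARGE_FILE_BYTES) -> List[Tuple[str, str, bool]]:
    """Return (name, readable size, needs_warning) for each resource."""
    out = []
    for r in resources:
        raw = r.get("size")
        try:
            n = int(raw) if raw not in (None, "") else None
        except (TypeError, ValueError):
            n = None
        # Unknown sizes are flagged too, since they could be large
        out.append((r.get("name") or r.get("url", "?"), human_size(n),
                    n is None or n > threshold))
    return out
```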
Step 5 -- Produce Output
Format: Dataset Table + Resource Details + Narrative
Present a markdown table of matching datasets (cap at ~15 rows), then detail the most relevant dataset's resources, followed by a narrative summary.
Example output structure:
```markdown
## EDX: Lithium in Produced Water Datasets (36 matches)

| Dataset | Resources | Modified | Groups |
|---------|-----------|----------|--------|
| PA DEP 26r Produced Water Compositions | 19 | 2025-01-12 | ClaiMM, NEWTS |
| Lithium Geochemistry Marcellus PW | 4 | 2024-05-31 | ClaiMM, NEWTS |
| ... | ... | ... | ... |

### Top Result: PA DEP 26r Produced Water Compositions

19 resources (CSV, XLSX, PDF):

| File | Format | Size |
|------|--------|------|
| PA_DEP_26r_processed.csv | CSV | 648 KB |
| PA_OLI_processed_reordered.xlsx | XLSX | 212 KB |
| README_PA_DEP_26r_Produced_Water.pdf | PDF | 187 KB |
| ... | ... | ... |

**Summary:** Found 36 datasets matching "lithium produced water" on NETL EDX.
The most relevant is the PA DEP 26r dataset containing geochemical compositions
of 1000+ produced water streams from Marcellus wells in Pennsylvania, formatted
for OLI Studio and Geochemist Workbench modeling. Data covers mid-2012 to
early-2020. Additional ML-predicted values are included via CoDaRT software.

**Citation:** Mackey et al., DOI: 10.18141/2483335

**Download:** Files can be accessed directly at the URLs above.
No API key required for public resources.
```
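A sketch of rendering the dataset table with the ~15-row cap from Step 5, assuming each row is a dict with `title`, `num_resources`, `metadata_modified`, and `groups` (a list of display names); this row shape and the `render_table` name are assumptions. Pipes in titles are escaped so they do not break the table, and timestamps are truncated to the date.

```python
from typing import Any, Dict, List


def render_table(rows: List[Dict[str, Any]], cap: int = 15) -> str:
    """Render dataset rows as a markdown table, capped at `cap` rows."""
    lines = ["| Dataset | Resources | Modified | Groups |",
             "|---------|-----------|----------|--------|"]
    for r in rows[:cap]:
        title = str(r.get("title", "")).replace("|", "\\|")  # escape table pipes
        modified = str(r.get("metadata_modified", ""))[:10]  # keep YYYY-MM-DD
        groups = ", ".join(r.get("groups", []))
        lines.append(f"| {title} | {r.get('num_resources', '')} | {modified} | {groups} |")
    if len(rows) > cap:
        lines.append("| ... | ... | ... | ... |")
    return "\n".join(lines)
```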
Pagination
For large result sets, paginate using start and rows:
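```python
import json
import urllib.parse
import urllib.request


def search(**params) -> dict:
    # Minimal package_search call; add the EDX-API-Key header for private data
    url = ("https://edx.netl.doe.gov/api/3/action/package_search?"
           + urllib.parse.urlencode(params))
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


start, rows = 0, 100
all_results = []
while True:
    data = search(q="lithium", start=start, rows=rows)
    all_results.extend(data["result"]["results"])
    start += rows
    if start >= data["result"]["count"]:
        break
```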
Warn the user if the result count exceeds 100 and ask if they want to narrow the search or paginate through all results.
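The same pagination can be written as a generator that stops at a result cap, which fits the advice to check in with the user before pulling more than 100 results. In this sketch `fetch(start, rows)` is a hypothetical callable returning the `result` object of a `package_search` call; injecting it keeps the paging logic testable without network access. It also stops on an empty page, guarding against `count` changing mid-iteration.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_datasets(fetch: Callable[[int, int], Dict[str, Any]], rows: int = 100,
                  max_results: Optional[int] = None) -> Iterator[Dict[str, Any]]:
    """Yield datasets page by page until the total (or max_results) is reached."""
    start, yielded = 0, 0
    while True:
        result = fetch(start, rows)
        page = result.get("results", [])
        for ds in page:
            if max_results is not None and yielded >= max_results:
                return
            yield ds
            yielded += 1
        start += rows
        # Stop once the cap is hit, on an empty page, or past the reported total
        capped = max_results is not None and yielded >= max_results
        if capped or not page or start >= result.get("count", 0):
            return
```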
Error Handling
| HTTP Code | CKAN Error | Action |
|---|---|---|
| 200 + `success: false` | Validation Error | Check parameter names and values; fix and retry |
| 200 + `success: false` | Not Found Error | Dataset or group slug is wrong; search for correct name |
| 403 | Authorization Error | API key missing or invalid; prompt user to verify key |
| 404 | Not Found | Action endpoint does not exist; check CKAN action name |
| 500 | Internal Server Error | Server-side issue; retry once, then try simpler query |
| Connection error | Timeout / DNS | EDX may be down; wait and retry |
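A sketch that maps a failed call to the suggested action from the error table above. It checks the envelope's `success` flag and `error.__type` before the HTTP status, so it behaves the same whether an error arrives with HTTP 200 or with a 4xx code. With `urllib`, 4xx/5xx responses raise `urllib.error.HTTPError`, whose `.code` is the status and whose `.read()` returns the body to parse. `classify_error` is a hypothetical helper and its messages are paraphrased from the table.

```python
from typing import Any, Dict, Optional


def classify_error(status: Optional[int], body: Optional[Dict[str, Any]]) -> str:
    """Suggest a next step for a failed EDX call (status None = no response)."""
    if status is None:
        return "EDX may be down; wait and retry"
    if body is not None and body.get("success") is False:
        err_type = (body.get("error") or {}).get("__type", "")
        if err_type == "Validation Error":
            return "Check parameter names and values; fix and retry"
        if err_type == "Not Found Error":
            return "Dataset or group slug is wrong; search for the correct name"
        if err_type == "Authorization Error":
            return "API key missing or invalid; ask the user to verify it"
    if status == 403:
        return "API key missing or invalid; ask the user to verify it"
    if status == 404:
        return "Action endpoint does not exist; check the CKAN action name"
    if status >= 500:
        return "Server-side issue; retry once, then try a simpler query"
    return "Unrecognized error; inspect the response body"
```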
Known quirks:
- `group_list` with `all_fields=true` returns a 500 error on EDX. Use `group_show` with `include_datasets=true` instead to get group details.
- Some dataset slugs are very long; use the `id` (UUID) as an alternative.
- The `resource_search` `query` parameter uses `field:value` syntax, not free text.
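URL builders for the first and last quirks, as a sketch: `group_show` with `include_datasets=true` in place of `group_list` with `all_fields=true`, and a single `field:value` term for the `query` parameter of `resource_search`. The helper names, and `format` as an example resource field, are assumptions.

```python
import urllib.parse

BASE_URL = "https://edx.netl.doe.gov/api/3/action/"


def group_datasets_url(group_slug: str) -> str:
    """group_show with include_datasets=true (avoids group_list all_fields=true)."""
    params = {"id": group_slug, "include_datasets": "true"}
    return BASE_URL + "group_show?" + urllib.parse.urlencode(params)


def resource_search_url(field: str, value: str) -> str:
    """resource_search takes query=field:value rather than free text."""
    return BASE_URL + "resource_search?" + urllib.parse.urlencode({"query": f"{field}:{value}"})
```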
Caveats
- API key recommended: While most reads work without auth, some edge cases (rate limiting, private datasets) require a key. Always include it when available.
- Data formats vary: Resources include CSV, XLSX, PDF, GeoJSON, shapefiles, netCDF, and custom formats. Not all are machine-parseable.
- Large files: Some datasets contain multi-gigabyte files. Always check `resources[].size` before downloading.
- No real-time data: EDX is a repository of research data products, not a live data feed. Datasets are published and occasionally updated.
- Quality varies: Data quality depends on the submitting research team. Check the README/documentation resources within each dataset.
- Citation required: Most datasets have citation requirements in the `extras` array. Always surface the citation when presenting data.
- ClaiMM focus: The ClaiMM platform is actively maintained and growing. Dataset counts are approximate and increase over time.
Implementation Notes
- Prefer `bash_tool` with `curl` + `jq` for API calls in Claude's environment
- Python client -- see `references/python_example.py` (stdlib only: `urllib`, `json`, `pathlib`)
- CKAN API reference -- see `references/ckan_api.md` for full action/parameter documentation
- All responses are JSON; use `jq` for bash or the `json` module for Python
- Resource download URLs are stable and can be shared directly with users
- EDX does not have a formal rate limit, but be courteous with request volume