spatial-transcriptomics-spatial-data-io
Spatial Transcriptomics Data I/O
Load spatial transcriptomics data from Visium, Xenium, MERFISH, Slide-seq, and other platforms using Squidpy and SpatialData. Read Space Ranger outputs, convert formats, and access spatial coordinates. Use when loading Visium, Xenium, MERFISH, or other spatial data.
What it does
- Takes your platform-specific spatial data directory (e.g., Space Ranger output) as input.
- Identifies and parses the correct platform format (Visium, Xenium, MERFISH, Slide-seq, CosMx, Stereo-seq).
- Reads the raw expression matrix (spot-level or single-cell resolution).
- Automatically loads and scales spatial coordinates (x, y pixels or physical distances).
- Ingests accompanying tissue images and spatial scale factors directly into the data object structure.
- Returns a standardized AnnData or SpatialData object, binding expression data, shapes, and images together for immediate downstream analysis.
Why this exists
- Uses squidpy and spatialdata_io, the widely used scverse parsers for spatial data.
- Automatically handles platform-specific quirks (e.g., Xenium single-cell vs. Visium spot resolution).
- Stores coordinates in obsm['spatial'] and images/scale factors in uns['spatial'], the layout that Scanpy and Squidpy plotting functions expect.
- Offers SpatialData (Zarr) representations for multi-modal spatial projects.
Usage
"Load my Visium spatial data" → Read spatial transcriptomics outputs (Visium, Xenium, MERFISH, Slide-seq) into AnnData objects with spatial coordinates and tissue images.
- Python: squidpy.read.visium('spaceranger_out/'), spatialdata.read_zarr()
Load and work with spatial transcriptomics data from various platforms.
Required Imports
import squidpy as sq
import scanpy as sc
import anndata as ad
import spatialdata as sd
import spatialdata_io as sdio
Load 10X Visium Data
Goal: Load Visium spatial transcriptomics data from Space Ranger output into an AnnData object.
Approach: Use Squidpy's read.visium to parse the output directory, which loads expression, spatial coordinates, and tissue images.
# Load Space Ranger output (standard method)
adata = sq.read.visium('path/to/spaceranger/output/')
print(f'Loaded {adata.n_obs} spots, {adata.n_vars} genes')
# Spatial coordinates are in adata.obsm['spatial']
print(f"Spatial coords shape: {adata.obsm['spatial'].shape}")
# Image is in adata.uns['spatial']
library_id = list(adata.uns['spatial'].keys())[0]
print(f'Library ID: {library_id}')
Load Visium with Scanpy
Goal: Load Visium data using Scanpy's built-in reader as an alternative to Squidpy.
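Approach: Use sc.read_visium to parse Space Ranger output, then access images and scale factors from adata.uns['spatial']. Scanpy 1.11 and later mark this reader as deprecated in favor of squidpy.read.visium, so prefer the Squidpy reader in new code.
# Alternative using Scanpy directly
adata = sc.read_visium('path/to/spaceranger/output/')
# The library ID is the first key under adata.uns['spatial']
library_id = list(adata.uns['spatial'].keys())[0]
# Access tissue image
img = adata.uns['spatial'][library_id]['images']['hires']
scale_factor = adata.uns['spatial'][library_id]['scalefactors']['tissue_hires_scalef']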
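Coordinates in obsm['spatial'] are stored in full-resolution pixel units, so overlaying spots on the hires image means multiplying them by tissue_hires_scalef first. The following is a minimal sketch of that step with NumPy only. The helper name to_image_pixels and the coordinate values are illustrative assumptions, not part of Scanpy or Squidpy.

```python
import numpy as np

def to_image_pixels(coords_fullres, scale_factor):
    """Scale full-resolution spot coordinates into the pixel space of a downsampled image."""
    return np.asarray(coords_fullres, dtype=float) * float(scale_factor)

# Illustrative values only. In practice, pass adata.obsm['spatial'] and
# adata.uns['spatial'][library_id]['scalefactors']['tissue_hires_scalef'].
coords_fullres = np.array([[1000.0, 2000.0], [4000.0, 6000.0]])
coords_hires = to_image_pixels(coords_fullres, 0.15)
print(coords_hires)
```

The same helper works for the lowres image if tissue_lowres_scalef is passed instead.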
Load 10X Xenium Data
Goal: Load single-cell resolution Xenium spatial data.
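Approach: Squidpy's read module covers Visium, Vizgen, and Nanostring but has no Xenium reader, so use spatialdata-io's xenium reader and take its AnnData table, which holds per-cell expression and centroid coordinates.
# Load Xenium output with spatialdata-io
sdata = sdio.xenium('path/to/xenium/output/')
adata = sdata.tables['table']
print(f'Loaded {adata.n_obs} cells')
# Xenium has single-cell resolution; cell centroids are stored in obsm['spatial']
print(f"Cell coordinates: {adata.obsm['spatial'].shape}")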
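Xenium reports cell centroids in microns, while image layers are indexed in pixels, so a unit conversion is often needed before overlaying cells on an image. The sketch below assumes a pixel size of 0.2125 µm, which is the value commonly found in the experiment.xenium metadata file; treat the constant as an assumption and read pixel_size from your own run. The helper name microns_to_pixels is hypothetical.

```python
import numpy as np

# Assumption: 0.2125 µm per pixel. Check the `pixel_size` field in your
# run's experiment.xenium file rather than relying on this constant.
PIXEL_SIZE_UM = 0.2125

def microns_to_pixels(coords_um, pixel_size_um=PIXEL_SIZE_UM):
    """Convert micron coordinates to full-resolution pixel coordinates."""
    return np.asarray(coords_um, dtype=float) / float(pixel_size_um)

# Illustrative centroids in microns; in practice use adata.obsm['spatial']
coords_um = np.array([[21.25, 42.5], [106.25, 0.0]])
coords_px = microns_to_pixels(coords_um)
print(coords_px)
```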
Load with SpatialData (Recommended for New Projects)
Goal: Load spatial data into SpatialData objects for unified multi-modal representation.
Approach: Use spatialdata-io readers per platform, which organize expression, shapes, and images into a single object.
import spatialdata_io as sdio
# Load Visium as SpatialData object
sdata = sdio.visium('path/to/spaceranger/output/')
print(sdata)
# Load Xenium
sdata = sdio.xenium('path/to/xenium/output/')
# Access components
table = sdata.tables['table'] # AnnData with expression
shapes = sdata.shapes # Spatial shapes (spots, cells)
images = sdata.images # Tissue images
Load MERFISH Data
Goal: Load MERFISH (Vizgen MERSCOPE) spatial data.
Approach: Use spatialdata-io or Squidpy readers to parse MERSCOPE output with cell-by-gene counts and metadata.
# MERFISH (Vizgen MERSCOPE)
sdata = sdio.merscope('path/to/merscope/output/')
# Or as AnnData
adata = sq.read.vizgen('path/to/vizgen/output/', counts_file='cell_by_gene.csv', meta_file='cell_metadata.csv')
Load Slide-seq Data
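# Slide-seq / Slide-seqV2: Squidpy has no dedicated reader, so build the AnnData by hand (CSV layouts assumed: beads x genes counts, x/y coordinate columns)
import pandas as pd
adata = ad.AnnData(pd.read_csv('beads.csv', index_col=0))
adata.obsm['spatial'] = pd.read_csv('coords.csv', index_col=0).loc[adata.obs_names, ['x', 'y']].to_numpy()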
Load Nanostring CosMx
# CosMx spatial molecular imaging
sdata = sdio.cosmx('path/to/cosmx/output/')
Load Stereo-seq Data
# Stereo-seq (BGI)
sdata = sdio.stereoseq('path/to/stereoseq/output/')
Load from H5AD with Spatial Coordinates
# If you have h5ad with spatial already stored
adata = sc.read_h5ad('spatial_data.h5ad')
# Verify spatial data exists
if 'spatial' in adata.obsm:
    print('Has spatial coordinates')
if 'spatial' in adata.uns:
    print('Has image data')
Create Spatial AnnData from Scratch
Goal: Construct a spatial AnnData object from raw expression and coordinate arrays.
Approach: Build an AnnData with spatial coordinates in obsm['spatial'] and minimal metadata in uns['spatial'] for Squidpy compatibility.
import numpy as np
import pandas as pd
# Expression matrix
X = np.random.poisson(5, size=(1000, 500))
# Spatial coordinates
spatial_coords = np.random.rand(1000, 2) * 1000 # x, y in pixels
# Create AnnData
adata = ad.AnnData(X)
adata.obs_names = [f'spot_{i}' for i in range(1000)]
adata.var_names = [f'gene_{i}' for i in range(500)]
adata.obsm['spatial'] = spatial_coords
# Add minimal spatial metadata for Squidpy
adata.uns['spatial'] = {
    'library_id': {
        'scalefactors': {'tissue_hires_scalef': 1.0, 'spot_diameter_fullres': 50},
    }
}
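When assembling an object by hand, it helps to check the coordinate array before assigning it to obsm['spatial'], since a mismatched row count or non-numeric values only surface later during plotting. The following is a minimal sketch of such a check using NumPy only. The function name validate_spatial_coords is hypothetical and not part of AnnData or Squidpy.

```python
import numpy as np

def validate_spatial_coords(coords, n_obs):
    """Return coords as an array after checking shape, dtype, and finiteness."""
    coords = np.asarray(coords)
    if coords.ndim != 2 or coords.shape[1] < 2:
        raise ValueError(f'Expected shape (n_obs, 2+), got {coords.shape}')
    if coords.shape[0] != n_obs:
        raise ValueError(f'Expected {n_obs} rows, got {coords.shape[0]}')
    if not np.issubdtype(coords.dtype, np.number):
        raise TypeError(f'Coordinates must be numeric, got dtype {coords.dtype}')
    if not np.isfinite(coords).all():
        raise ValueError('Coordinates contain NaN or infinite values')
    return coords

# Example with synthetic coordinates for 1000 spots
rng = np.random.default_rng(0)
checked = validate_spatial_coords(rng.random((1000, 2)) * 1000, n_obs=1000)
print(checked.shape)
```

A typical call would be adata.obsm['spatial'] = validate_spatial_coords(spatial_coords, adata.n_obs).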
Access Spatial Coordinates
# Get coordinates as numpy array
coords = adata.obsm['spatial']
x_coords = coords[:, 0]
y_coords = coords[:, 1]
# Get coordinates as DataFrame
coord_df = pd.DataFrame(adata.obsm['spatial'], index=adata.obs_names, columns=['x', 'y'])
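Before choosing bounds for a region of interest, it is useful to know the extent the coordinates actually cover. The sketch below summarizes that extent with NumPy; the function name coordinate_extent and the sample values are illustrative assumptions.

```python
import numpy as np

def coordinate_extent(coords):
    """Summarize the bounding box of an (n, 2) coordinate array."""
    xy = np.asarray(coords, dtype=float)[:, :2]
    x_min, y_min = xy.min(axis=0)
    x_max, y_max = xy.max(axis=0)
    return {
        'x_min': x_min, 'x_max': x_max,
        'y_min': y_min, 'y_max': y_max,
        'width': x_max - x_min, 'height': y_max - y_min,
    }

# Illustrative coordinates; in practice pass adata.obsm['spatial']
extent = coordinate_extent(np.array([[100, 50], [400, 250], [250, 900]]))
print(extent)
```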
Access Tissue Images
# Get high-resolution image
library_id = list(adata.uns['spatial'].keys())[0]
hires_img = adata.uns['spatial'][library_id]['images']['hires']
lowres_img = adata.uns['spatial'][library_id]['images']['lowres']
# Scale factors
scalef = adata.uns['spatial'][library_id]['scalefactors']
print(f"Hires scale: {scalef['tissue_hires_scalef']}")
print(f"Spot diameter: {scalef['spot_diameter_fullres']}")
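One detail that is easy to get wrong when combining images with coordinates: image arrays are indexed as [row, column], that is [y, x], while obsm['spatial'] stores (x, y). The sketch below crops an image to the bounding box of the spots after scaling them into image pixels. It uses a synthetic image and made-up coordinates, and the helper name crop_to_spots is hypothetical.

```python
import numpy as np

def crop_to_spots(img, coords_fullres, scale_factor, pad=0):
    """Crop an (H, W[, C]) image to the bounding box of the scaled spot coordinates."""
    px = np.asarray(coords_fullres, dtype=float)[:, :2] * float(scale_factor)
    x0, y0 = np.floor(px.min(axis=0)).astype(int) - pad
    x1, y1 = np.ceil(px.max(axis=0)).astype(int) + pad
    height, width = img.shape[:2]
    # Clamp the box to the image so padding cannot run past the edges
    x0, y0 = max(int(x0), 0), max(int(y0), 0)
    x1, y1 = min(int(x1), width), min(int(y1), height)
    # Rows are y and columns are x
    return img[y0:y1, x0:x1]

# Synthetic 100 x 200 RGB image and two spots in full-resolution pixels
img = np.zeros((100, 200, 3))
coords_fullres = np.array([[20.0, 40.0], [80.0, 120.0]])
crop = crop_to_spots(img, coords_fullres, scale_factor=0.5)
print(crop.shape)
```

With real data, img would be hires_img and scale_factor would be scalef['tissue_hires_scalef'].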
Convert Between Formats
Goal: Convert spatial data between SpatialData and AnnData representations.
Approach: Extract tables and coordinate arrays from SpatialData, then save as h5ad or zarr.
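# SpatialData to AnnData
sdata = sdio.visium('path/to/data/')
adata = sdata.tables['table'].copy()
# Shapes are GeoDataFrames keyed by dataset ID, holding point geometries rather than x/y columns
spots = sdata.shapes[list(sdata.shapes.keys())[0]]
# Assumes the shapes rows are in the same order as the table rows
adata.obsm['spatial'] = np.column_stack([spots.geometry.x, spots.geometry.y])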
# Save as h5ad
adata.write_h5ad('spatial_converted.h5ad')
# Save SpatialData
sdata.write('spatial_data.zarr')
Load Multiple Samples
Goal: Load and merge spatial data from multiple Visium samples into a single AnnData.
Approach: Iterate over sample directories, tag each with a sample label, then concatenate with ad.concat.
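# Load and concatenate multiple Visium samples
samples = ['sample1', 'sample2', 'sample3']
adatas = []
for sample in samples:
    # Use the sample name as the library ID so each image keeps a distinct key
    adata = sq.read.visium(f'data/{sample}/', library_id=sample)
    adata.var_names_make_unique()
    adata.obs['sample'] = sample
    adatas.append(adata)
# Concatenate; index_unique avoids duplicate barcodes across samples,
# and uns_merge='unique' keeps each sample's images and scale factors
adata_combined = ad.concat(adatas, label='sample', keys=samples, index_unique='-', uns_merge='unique')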
print(f'Combined: {adata_combined.n_obs} spots')
Subset by Spatial Region
Goal: Extract spots within a rectangular spatial region of interest.
Approach: Apply coordinate-based boolean masking on obsm['spatial'] to filter spots by x/y bounds.
# Select spots in a rectangular region
x_min, x_max = 1000, 2000
y_min, y_max = 1500, 2500
coords = adata.obsm['spatial']
in_region = (coords[:, 0] >= x_min) & (coords[:, 0] <= x_max) & (coords[:, 1] >= y_min) & (coords[:, 1] <= y_max)
adata_region = adata[in_region].copy()
print(f'Selected {adata_region.n_obs} spots')
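The same boolean-mask pattern extends to other region shapes. As an illustration, here is a minimal sketch for a circular region of interest, using NumPy only. The function name in_circle and the coordinate values are assumptions made for the example.

```python
import numpy as np

def in_circle(coords, center, radius):
    """Boolean mask of points within `radius` of `center` (inclusive)."""
    xy = np.asarray(coords, dtype=float)[:, :2]
    squared_dist = ((xy - np.asarray(center, dtype=float)) ** 2).sum(axis=1)
    return squared_dist <= float(radius) ** 2

# Illustrative coordinates; in practice pass adata.obsm['spatial']
coords = np.array([[1500, 2000], [1600, 2000], [3000, 3000]])
mask = in_circle(coords, center=(1500, 2000), radius=150)
print(mask)
# The mask is then applied exactly like the rectangular one:
# adata_region = adata[mask].copy()
```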
Example Output
Spatial Transcriptomics I/O
===========================
Input: path/to/spaceranger/output/
Platform detected: 10x Visium
Loaded AnnData Object:
n_obs (spots): 2,987
n_vars (genes): 31,053
Spatial Data Bindings:
Coordinates: adata.obsm['spatial'] -> (2987, 2)
Images: adata.uns['spatial']['library_id']['images'] -> ['hires', 'lowres']
Scale factors: adata.uns['spatial']['library_id']['scalefactors']
- tissue_hires_scalef: 0.150015
- spot_diameter_fullres: 89.43
Subsetting check:
Successfully filtered to bounding box (X: 1000-2000, Y: 1500-2500)
Selected 412 spots in region of interest.
Data ready for Squidpy spatial statistics or Scanpy plotting.
Requirements
| Requirement | Version |
|---|---|
| Python | 3.9+ |
| squidpy | 1.3+ |
| spatialdata | 0.1+ |
| spatialdata_io | latest |
| scanpy | 1.10+ |
| anndata | 0.10+ |
| numpy | 1.26+ |
| pandas | 2.2+ |
Inputs
| Name | Type | Format | Description |
|---|---|---|---|
| data_dir | directory | - | Path to the platform-specific output directory (e.g., Space Ranger out, Xenium out) |
| platform | string | - | Specify the platform (visium, xenium, merscope, slideseq, cosmx, stereoseq) |
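One way to wire the two inputs together is a small dispatcher that maps the platform string to the matching spatialdata_io reader. This is a sketch under assumptions, not a function provided by any library: the names READERS and load_spatial are hypothetical, only the readers used elsewhere in this document are listed, and Slide-seq is left out because it is not loaded through spatialdata_io here.

```python
from importlib import import_module

# Platform name -> reader function name in spatialdata_io.
# Slide-seq is omitted: this document does not load it through spatialdata_io.
READERS = {
    'visium': 'visium',
    'xenium': 'xenium',
    'merscope': 'merscope',
    'cosmx': 'cosmx',
    'stereoseq': 'stereoseq',
}

def load_spatial(data_dir, platform, **reader_kwargs):
    """Load a platform output directory into a SpatialData object."""
    key = platform.strip().lower()
    if key not in READERS:
        raise ValueError(
            f'Unsupported platform {platform!r}; expected one of {sorted(READERS)}'
        )
    # Imported lazily so the platform check runs even without the package installed
    sdio = import_module('spatialdata_io')
    reader = getattr(sdio, READERS[key])
    return reader(data_dir, **reader_kwargs)
```

A call would look like sdata = load_spatial('path/to/xenium/output/', 'xenium'); any extra keyword arguments are passed through to the underlying reader unchanged.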
Outputs
| Name | Type | Format | Description |
|---|---|---|---|
| adata | object | h5ad | AnnData object containing expression, spatial coordinates, and tissue images |
| sdata | object | zarr | SpatialData object for unified multi-modal representation |