pydicom
Pydicom - Medical Imaging Standards
DICOM is more than an image; it's a rich data structure containing patient info, spatial orientation, and pixel data. Pydicom provides access to all these tags.
When to Use
- Processing medical imaging data (CT, MRI, X-ray, ultrasound).
- Extracting patient metadata and clinical information from DICOM files.
- Building AI models for radiology that require both image and metadata.
- Converting DICOM to other formats for analysis.
- Quality assurance and compliance checking in medical imaging workflows.
Core Principles
Datasets as Dicts
Access tags by name (e.g., ds.PatientName) or ID (ds[0x0010, 0x0010]).
Pixel Data
Raw pixel bytes are stored in PixelData; access them through pixel_array, which decodes them into a NumPy array.
VR (Value Representation)
Every element carries a VR that fixes its type and format, such as DA for dates, AS for ages, and DS for decimal strings. Respecting the VR preserves data integrity.
Quick Reference
Standard Imports
import pydicom
from pydicom.data import get_testdata_files
import matplotlib.pyplot as plt
import numpy as np
Basic Patterns
# 1. Read file
ds = pydicom.dcmread("scan.dcm")
# 2. Access Metadata
print(f"Patient: {ds.PatientName}, ID: {ds.PatientID}")
print(f"Modality: {ds.Modality}") # CT, MR, DX
print(f"Study Date: {ds.StudyDate}")
print(f"Slice Thickness: {ds.get('SliceThickness', 'N/A')}") # optional tag; absent for some modalities
# 3. Access Image
plt.imshow(ds.pixel_array, cmap="gray")
plt.title(f"{ds.Modality} - {ds.PatientName}")
plt.show()
Critical Rules
✅ DO
- Use pixel_array property - Always access pixel data via ds.pixel_array rather than ds.PixelData for proper NumPy integration.
- Check for missing tags - Use hasattr(ds, 'TagName') before accessing optional tags.
- Respect patient privacy - DICOM files contain PHI (Protected Health Information). Always anonymize before sharing.
- Handle different photometric interpretations - Some images may be inverted or use different color spaces.
❌ DON'T
- Don't modify DICOM files in place - Always create a copy when modifying to preserve original data.
- Don't ignore VR types - DICOM has strict data types. Converting incorrectly can corrupt data.
- Don't assume all DICOM files have images - Some contain only metadata (structured reports).
Advanced Patterns
Working with DICOM Series
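import numpy as np
import pydicom
from pathlib import Path
# Load a series of DICOM files
dicom_dir = Path("dicom_series")
files = sorted(dicom_dir.glob("*.dcm"))
# Read every slice, then order by InstanceNumber (assumed present on every slice);
# filename order is not guaranteed to match slice order
slices = [pydicom.dcmread(f) for f in files]
slices.sort(key=lambda s: int(s.InstanceNumber))
# Stack into a 3D volume of shape (num_slices, rows, columns)
volume = np.stack([s.pixel_array for s in slices])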
Anonymization
# Remove patient identifiers (minimal example; full de-identification covers many more tags)
ds.PatientName = "ANONYMOUS"
ds.PatientID = "000000"
ds.PatientBirthDate = ""
ds.PatientSex = ""
Pydicom is the foundation of medical imaging in Python, enabling researchers and clinicians to work with the rich, standardized DICOM format that powers modern radiology.