Pydicom
Specialized Pydicom automation and integration for medical imaging data processing
Category: productivity
Source: K-Dense-AI/claude-scientific-skills
PyDICOM is a community skill for reading and manipulating medical imaging data in DICOM format using the pydicom Python library. It covers file reading, metadata extraction, pixel data access, anonymization, and series management for medical image processing.
What Is This?
Overview
PyDICOM provides tools for working with DICOM medical imaging files that store both image data and patient metadata in a standardized format. It covers file reading that loads DICOM datasets from disk with lazy pixel data loading for memory efficiency, metadata extraction that accesses patient demographics, study descriptions, and acquisition parameters from DICOM tags, pixel data access that converts stored pixel arrays into NumPy arrays for image processing, anonymization that removes or modifies patient-identifying information for research sharing, and series management that organizes multiple DICOM files into coherent imaging series. The skill enables developers to process medical images programmatically.
Who Should Use This
This skill serves medical imaging researchers processing DICOM datasets for analysis, clinical informatics teams building imaging data pipelines, and developers creating medical image viewing or processing applications.
Why Use It?
Problems It Solves
DICOM files contain complex nested metadata structures that are difficult to parse without specialized libraries. Medical images stored in DICOM format require pixel value transformations using rescale slope and intercept before display or analysis. Patient-identifying information embedded in DICOM headers must be removed before sharing data for research. Imaging series spanning hundreds of individual DICOM files need organized loading and sorting.
Core Highlights
File reader loads DICOM datasets with efficient lazy pixel data access. Tag extractor retrieves metadata from nested DICOM data elements. Pixel converter transforms stored values into calibrated arrays for analysis. Anonymizer removes patient-identifying information from DICOM headers.
How to Use It?
Basic Usage
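import numpy as np
import pydicom
from pydicom.multival import MultiValue

# Read the dataset; pixel data is decoded only when pixel_array is accessed.
ds = pydicom.dcmread('scan.dcm')

# Optional attributes may be absent, so fall back to a placeholder.
print(f"Patient: {ds.get('PatientName', 'N/A')}")
print(f"Modality: {ds.get('Modality', 'N/A')}")
print(f"Study: {ds.get('StudyDescription', 'N/A')}")
print(f'Size: {ds.Rows}x{ds.Columns}')

# Convert stored values to calibrated units (e.g. Hounsfield units for CT).
pixels = ds.pixel_array
slope = float(getattr(ds, 'RescaleSlope', 1))
intercept = float(getattr(ds, 'RescaleIntercept', 0))
calibrated = pixels * slope + intercept

# WindowCenter and WindowWidth can be multi-valued; use the first preset.
def first(value):
    return float(value[0] if isinstance(value, MultiValue) else value)

wc = first(ds.WindowCenter)
ww = first(ds.WindowWidth)
low = wc - ww / 2
high = wc + ww / 2

# Clip to the display window; values outside [low, high] saturate.
display = np.clip(calibrated, low, high)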
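The snippet above clips values to the display window but leaves them in calibrated units. The sketch below goes one step further and maps the window onto 0..255 for an 8-bit viewer. The helper names `first_value` and `window_to_uint8` are hypothetical, and the mapping is a simplified linear one rather than the exact VOI LUT function defined by the DICOM standard.

```python
import numpy as np


def first_value(value):
    """Return a float from a scalar or from the first item of a sequence."""
    # WindowCenter / WindowWidth may hold several presets; take the first.
    if isinstance(value, (str, bytes)):
        return float(value)
    try:
        return float(value[0])
    except TypeError:
        # Scalars (int, float, decimal-string subclasses) are not indexable.
        return float(value)


def window_to_uint8(calibrated, center, width):
    """Clip to [center - width/2, center + width/2] and scale to 0..255."""
    if width <= 0:
        raise ValueError("window width must be positive")
    low = center - width / 2
    high = center + width / 2
    clipped = np.clip(np.asarray(calibrated, dtype=np.float64), low, high)
    scaled = (clipped - low) / (high - low) * 255.0
    # astype truncates toward zero, which is acceptable for display purposes.
    return scaled.astype(np.uint8)


# Example with a soft-tissue style window (center 40, width 400).
demo = window_to_uint8([-1000, 40, 1000], center=40, width=400)
print(demo.tolist())  # [0, 127, 255]
```

With a real dataset, the inputs would be the `calibrated` array and `first_value(ds.WindowCenter)`, `first_value(ds.WindowWidth)` from the snippet above.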
Real-World Examples
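from pathlib import Path

import pydicom


class DicomAnonymizer:
    # Keywords whose values are overwritten (illustrative, not exhaustive).
    TAGS_TO_REMOVE = [
        'PatientName',
        'PatientID',
        'PatientBirthDate',
        'InstitutionName',
        'ReferringPhysicianName',
    ]

    def anonymize(self, src: str, dst: str, new_id: str):
        """Write an anonymized copy of src to dst; src is left untouched."""
        ds = pydicom.dcmread(src)
        for tag in self.TAGS_TO_REMOVE:
            if hasattr(ds, tag):
                # Date (DA) values must be YYYYMMDD or empty, so blank them
                # instead of writing the text 'ANONYMIZED'.
                value = '' if tag.endswith('Date') else 'ANONYMIZED'
                setattr(ds, tag, value)
        # Replace the placeholder PatientID with the caller-supplied ID.
        ds.PatientID = new_id
        ds.save_as(dst)

    def batch_anonymize(self, src_dir: str, dst_dir: str, prefix: str):
        """Anonymize every .dcm file in src_dir into dst_dir."""
        src_path = Path(src_dir)
        dst_path = Path(dst_dir)
        dst_path.mkdir(parents=True, exist_ok=True)
        # Sort the files so the generated IDs are reproducible across runs.
        for i, f in enumerate(sorted(src_path.glob('*.dcm'))):
            new_id = f'{prefix}_{i:04d}'
            self.anonymize(str(f), str(dst_path / f.name), new_id)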
Advanced Tips
Pass stop_before_pixels=True to dcmread for metadata-only operations so that large pixel arrays are never read into memory. Sort multi-file series by ImagePositionPatient, or by InstanceNumber when position is unavailable, to reconstruct the correct spatial ordering for volumetric analysis. Install a pixel data decoder such as GDCM or Pillow to read compressed DICOM files that use JPEG or JPEG 2000 transfer syntaxes.
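The sorting tip can be sketched as follows. This is one possible approach, not the only one: it projects ImagePositionPatient onto the slice normal derived from ImageOrientationPatient and falls back to InstanceNumber when geometry is missing. The function names are hypothetical, and the demo uses stand-in objects so it runs without any DICOM files.

```python
from types import SimpleNamespace


def slice_location(ds):
    """Project ImagePositionPatient onto the slice normal."""
    iop = [float(v) for v in ds.ImageOrientationPatient]
    row, col = iop[:3], iop[3:6]
    # Slice normal = row direction x column direction (cross product).
    normal = (
        row[1] * col[2] - row[2] * col[1],
        row[2] * col[0] - row[0] * col[2],
        row[0] * col[1] - row[1] * col[0],
    )
    pos = [float(v) for v in ds.ImagePositionPatient]
    return sum(n * p for n, p in zip(normal, pos))


def sort_slices(datasets):
    """Sort by geometric position; fall back to InstanceNumber."""
    datasets = list(datasets)
    has_geometry = all(
        hasattr(ds, "ImagePositionPatient")
        and hasattr(ds, "ImageOrientationPatient")
        for ds in datasets
    )
    if has_geometry:
        return sorted(datasets, key=slice_location)
    return sorted(datasets, key=lambda ds: int(getattr(ds, "InstanceNumber", 0)))


# Stand-in objects for demonstration only. Real code would pass datasets
# loaded with pydicom.dcmread(path, stop_before_pixels=True).
axial = [1, 0, 0, 0, 1, 0]
slices = [
    SimpleNamespace(ImageOrientationPatient=axial,
                    ImagePositionPatient=[0, 0, z],
                    InstanceNumber=i)
    for i, z in enumerate([5.0, -2.5, 0.0], start=1)
]
ordered = sort_slices(slices)
print([s.ImagePositionPatient[2] for s in ordered])  # [-2.5, 0.0, 5.0]
```

Projecting onto the normal rather than reading the z coordinate directly keeps the ordering correct for oblique or non-axial acquisitions.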
When to Use It?
Use Cases
Extract patient metadata and acquisition parameters from a collection of DICOM files for a research database. Anonymize medical imaging data by removing identifying information before sharing with collaborators. Load CT scan slices and reconstruct a 3D volume array for computational analysis.
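As a rough illustration of the first use case, the sketch below flattens selected attributes into rows and stores them in an in-memory SQLite table. The field list, the table name `scans`, and the helper `metadata_row` are all assumptions chosen for the example; stand-in objects replace real datasets so the block runs on its own.

```python
import sqlite3
from types import SimpleNamespace

# Illustrative selection of keywords; adjust to the study's needs.
FIELDS = ("PatientID", "Modality", "StudyDescription", "SliceThickness")


def metadata_row(ds, fields=FIELDS):
    """Return {keyword: str or None}; missing attributes become None."""
    row = {}
    for keyword in fields:
        value = getattr(ds, keyword, None)
        row[keyword] = None if value is None else str(value)
    return row


# Stand-ins for datasets read with dcmread(..., stop_before_pixels=True).
datasets = [
    SimpleNamespace(PatientID="SUBJ_0000", Modality="CT", SliceThickness=1.25),
    SimpleNamespace(PatientID="SUBJ_0001", Modality="MR",
                    StudyDescription="Brain"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scans (PatientID TEXT, Modality TEXT, "
    "StudyDescription TEXT, SliceThickness TEXT)"
)
conn.executemany(
    "INSERT INTO scans VALUES "
    "(:PatientID, :Modality, :StudyDescription, :SliceThickness)",
    [metadata_row(ds) for ds in datasets],
)
ct_count = conn.execute(
    "SELECT COUNT(*) FROM scans WHERE Modality = 'CT'"
).fetchone()[0]
print(ct_count)  # 1
```

Storing values as text sidesteps type differences between DICOM value representations; a production schema would likely use typed columns.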
Related Topics
pydicom, DICOM, medical imaging, radiology, image processing, clinical informatics, and healthcare data.
Important Notes
Requirements
pydicom Python package for DICOM file reading and manipulation. NumPy for pixel array operations and image processing. Optional GDCM or Pillow for decompressing compressed transfer syntax DICOM files.
Usage Recommendations
Do: always apply RescaleSlope and RescaleIntercept to pixel data before analysis since stored values are not calibrated units. Use lazy loading with stop_before_pixels when processing metadata from large imaging datasets. Verify anonymization by checking all DICOM tags since patient information can appear in unexpected fields.
Don't: assume pixel data represents calibrated values directly, since DICOM stores values that need the rescale transformation first. Modify original DICOM files in place, since this destroys the source data without a backup. Skip transfer syntax handling, since compressed files cannot be decoded into pixel arrays without a suitable decompression package.
Limitations
Compressed DICOM files require additional packages like GDCM for decompression. Large imaging series consume significant memory when loading all pixel data simultaneously. DICOM standard complexity means some vendor-specific private tags may not be parsed automatically.
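To make the memory limitation concrete, a back-of-the-envelope estimate can be computed from the image dimensions before loading anything. The function name below is hypothetical, and the figure covers only the stored pixel values, not metadata or intermediate copies.

```python
def estimate_volume_bytes(rows, columns, n_slices, bits_allocated=16):
    """Approximate bytes needed to hold all stored pixel values."""
    bytes_per_pixel = (bits_allocated + 7) // 8  # round up to whole bytes
    return rows * columns * n_slices * bytes_per_pixel


# A 300-slice series of 512x512 images at 16 bits per pixel.
size = estimate_volume_bytes(512, 512, 300)
print(f"{size / 1024**2:.0f} MiB")  # 150 MiB
```

Applying the rescale transformation with floating-point slope and intercept typically produces a float64 array at 8 bytes per value, so the calibrated volume can be about four times larger than this estimate.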