deepTools
Automate and integrate deepTools for powerful genomic data analysis pipelines
deepTools is a community skill for analyzing high-throughput sequencing data using the deepTools suite, covering BAM file processing, bigWig signal generation, heatmap visualization, correlation analysis, and quality control for genomics workflows.
What Is This?
Overview
deepTools provides patterns for processing and visualizing next-generation sequencing data from ChIP-seq, ATAC-seq, and RNA-seq experiments. It covers BAM file quality assessment, including fragment size distribution and GC bias analysis; signal normalization and bigWig generation for genome browser visualization; reference-point and scale-regions heatmaps around genomic features; sample correlation matrices for replicate and condition comparisons; and genome-wide signal quantification with multiBamSummary. The skill enables bioinformaticians to build reproducible analysis pipelines for epigenomic and transcriptomic data.
Who Should Use This
This skill serves bioinformaticians analyzing ChIP-seq and ATAC-seq datasets, genomics researchers visualizing enrichment patterns around genes and regulatory elements, and core facility staff building quality control pipelines for sequencing experiments.
Why Use It?
Problems It Solves
Visualizing sequencing signal around genomic features requires binning and normalization that manual scripts implement inconsistently. Comparing signal between samples needs normalization methods that account for sequencing depth and library complexity. Quality control of BAM files requires multiple metrics that are tedious to compute individually. Generating publication-ready heatmaps from raw alignment data involves many intermediate processing steps.
Core Highlights
bamCoverage generates normalized bigWig files from BAM alignments with RPKM, CPM, BPM, and RPGC normalization options. computeMatrix builds signal matrices around reference points or scaled regions for heatmap plotting. plotHeatmap creates publication-quality enrichment heatmaps with optional clustering. plotCorrelation computes and visualizes sample similarity matrices.
How to Use It?
Basic Usage
import subprocess
import os


def bam_to_bigwig(bam_path: str,
                  output_path: str,
                  normalize: str = "RPKM",
                  bin_size: int = 10) -> str:
    # Convert a sorted, indexed BAM file into a normalized bigWig track.
    cmd = ["bamCoverage",
           "-b", bam_path,
           "-o", output_path,
           "--normalizeUsing", normalize,
           "--binSize", str(bin_size),
           "-p", "4"]
    subprocess.run(cmd, check=True)
    return output_path


def compute_matrix(bigwigs: list[str],
                   bed_file: str,
                   output: str,
                   mode: str = "reference-point",
                   upstream: int = 3000,
                   downstream: int = 3000) -> str:
    # Build a signal matrix around the regions listed in bed_file.
    # -b sets the distance upstream, -a the distance downstream.
    cmd = ["computeMatrix", mode,
           "-S"] + bigwigs + [
           "-R", bed_file,
           "-o", output,
           "-a", str(downstream),
           "-b", str(upstream),
           "-p", "4"]
    subprocess.run(cmd, check=True)
    return output


bam_to_bigwig("sample.bam", "sample.bw")
compute_matrix(["sample.bw"], "genes.bed",
               "matrix.gz")
Real-World Examples
import os
import subprocess


class ChIPseqPipeline:
    def __init__(self, output_dir: str):
        self.output_dir = output_dir
        os.makedirs(output_dir, exist_ok=True)

    def generate_heatmap(
            self, bigwigs: list[str],
            regions: str,
            labels: list[str]) -> str:
        # Step 1: signal matrix covering 3 kb on either side of each reference point.
        matrix = os.path.join(
            self.output_dir, "matrix.gz")
        cmd = ["computeMatrix", "reference-point",
               "-S"] + bigwigs + [
               "-R", regions,
               "-o", matrix,
               "-a", "3000", "-b", "3000"]
        subprocess.run(cmd, check=True)
        # Step 2: render the matrix as a heatmap, one panel per bigWig.
        plot = os.path.join(
            self.output_dir, "heatmap.png")
        cmd = ["plotHeatmap",
               "-m", matrix,
               "-o", plot,
               "--samplesLabel"] + labels + [
               "--colorMap", "RdBu_r"]
        subprocess.run(cmd, check=True)
        return plot

    def sample_correlation(
            self, bam_files: list[str],
            labels: list[str]) -> str:
        # Step 1: count reads in 10 kb bins across the genome for every BAM file.
        summary = os.path.join(
            self.output_dir, "summary.npz")
        cmd = ["multiBamSummary", "bins",
               "-b"] + bam_files + [
               "-o", summary,
               "--binSize", "10000"]
        subprocess.run(cmd, check=True)
        # Step 2: plot pairwise correlations; plotCorrelation requires --whatToPlot.
        plot = os.path.join(
            self.output_dir, "correlation.png")
        cmd = ["plotCorrelation",
               "-in", summary,
               "-o", plot,
               "--corMethod", "pearson",
               "--whatToPlot", "heatmap",
               "--labels"] + labels
        subprocess.run(cmd, check=True)
        return plot
Advanced Tips
Pass --effectiveGenomeSize to bamCoverage when normalizing with RPGC, so that coverage is scaled to the mappable portion of the genome rather than its full length. Pass several region files to computeMatrix to compare signal across different genomic feature classes in one matrix. Run plotFingerprint before downstream analysis to assess ChIP enrichment quality and identify failed experiments.
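The three tips above can be sketched as small command builders. This is a minimal sketch, not a definitive implementation: the function names are hypothetical, the thread count and bin size are arbitrary defaults, and the genome size in the usage lines is the value commonly quoted for GRCh38, which should be verified against the deepTools documentation for the assembly and read length in use. The builders only assemble argument lists; running them requires deepTools on the PATH.

```python
import shlex


def rpgc_coverage_cmd(bam_path, output_path, effective_genome_size,
                      bin_size=10, threads=4):
    """Build a bamCoverage command using RPGC (1x coverage) normalization."""
    return ["bamCoverage",
            "-b", bam_path,
            "-o", output_path,
            "--normalizeUsing", "RPGC",
            "--effectiveGenomeSize", str(effective_genome_size),
            "--binSize", str(bin_size),
            "-p", str(threads)]


def multi_region_matrix_cmd(bigwigs, region_files, output,
                            upstream=3000, downstream=3000, threads=4):
    """Build a computeMatrix command that takes several BED files at once."""
    return ["computeMatrix", "reference-point",
            "-S", *bigwigs,
            "-R", *region_files,
            "-o", output,
            "-b", str(upstream),
            "-a", str(downstream),
            "-p", str(threads)]


def fingerprint_cmd(bam_files, labels, plot_path, threads=4):
    """Build a plotFingerprint command for enrichment quality control."""
    if len(bam_files) != len(labels):
        raise ValueError("provide exactly one label per BAM file")
    return ["plotFingerprint",
            "-b", *bam_files,
            "--labels", *labels,
            "--plotFile", plot_path,
            "-p", str(threads)]


# Print the commands instead of running them, so they can be inspected first.
# 2913022398 is an assumed effective genome size for GRCh38; verify before use.
print(shlex.join(rpgc_coverage_cmd("sample.bam", "sample.rpgc.bw", 2913022398)))
print(shlex.join(multi_region_matrix_cmd(
    ["sample.bw"], ["genes.bed", "enhancers.bed"], "matrix.gz")))
print(shlex.join(fingerprint_cmd(
    ["chip.bam", "input.bam"], ["ChIP", "Input"], "fingerprint.png")))
```

Each returned list can be passed to subprocess.run(cmd, check=True) in the same way as the earlier examples.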
When to Use It?
Use Cases
Build a ChIP-seq quality control pipeline that assesses enrichment and generates correlation plots between replicates. Create a promoter signal visualization that displays histone modification patterns around transcription start sites. Implement a differential binding analysis workflow that compares ChIP-seq signals between conditions.
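For the condition-comparison use case, one possible starting point is bamCompare, which writes a bigWig of per-bin signal computed from two BAM files. The sketch below assumes a hypothetical helper name and file names, and restricts itself to a subset of bamCompare operations. It produces a comparison track for visualization; it is not a statistical differential binding test.

```python
import shlex

# A subset of the operations bamCompare accepts for --operation.
SUPPORTED_OPERATIONS = {"log2", "ratio", "subtract", "add", "mean"}


def bam_compare_cmd(bam_a, bam_b, output_path,
                    operation="log2", bin_size=50, threads=4):
    """Build a bamCompare command comparing bam_a against bam_b per bin."""
    if operation not in SUPPORTED_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    return ["bamCompare",
            "-b1", bam_a,
            "-b2", bam_b,
            "-o", output_path,
            "--operation", operation,
            "--binSize", str(bin_size),
            "-p", str(threads)]


# Hypothetical file names; print the command for inspection before running it.
print(shlex.join(bam_compare_cmd(
    "treated.bam", "untreated.bam", "treated_vs_untreated.log2.bw")))
```

The resulting bigWig can be fed to computeMatrix and plotHeatmap like any other signal track.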
Related Topics
ChIP-seq analysis, ATAC-seq processing, epigenomics, genome browser visualization, and next-generation sequencing quality control.
Important Notes
Requirements
deepTools installed via pip or conda. Sorted and indexed BAM files as input. BED files defining genomic regions of interest for heatmap generation.
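These requirements can be checked before a pipeline starts. The following is a minimal sketch using only the Python standard library; the helper names and the list of tools are assumptions chosen to match the commands used in this document, and the index check only looks for the two common .bai naming patterns.

```python
import os
import shutil

# Executables used by the examples in this document.
REQUIRED_TOOLS = ("bamCoverage", "computeMatrix", "plotHeatmap",
                  "multiBamSummary", "plotCorrelation")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def has_bam_index(bam_path):
    """Check for an index named sample.bam.bai or sample.bai next to the BAM."""
    candidates = (bam_path + ".bai",
                  os.path.splitext(bam_path)[0] + ".bai")
    return any(os.path.isfile(path) for path in candidates)


absent = missing_tools()
if absent:
    print("Missing deepTools executables:", ", ".join(absent))
if not has_bam_index("sample.bam"):
    print("No index found for sample.bam; run 'samtools index sample.bam' first.")
```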
Usage Recommendations
Do: normalize bigWig files with a method suited to the experiment type (for example, RPKM for ChIP-seq and CPM for ATAC-seq), index BAM files before running deepTools commands, and use consistent bin sizes across samples for fair signal comparison.
Don't: compare bigWig files generated with different normalization methods, use reference-point mode for regions with highly variable lengths where scale-regions mode is appropriate, or skip quality control steps such as plotFingerprint that identify problematic samples early.
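For regions of variable length, such as gene bodies, the scale-regions mode of computeMatrix stretches or shrinks every region to a common length. The sketch below shows how such a command might be assembled; the function name and the default lengths are assumptions, not recommended values.

```python
import shlex


def scale_regions_cmd(bigwigs, bed_file, output,
                      body_length=5000, upstream=3000, downstream=3000,
                      threads=4):
    """Build a computeMatrix scale-regions command.

    Every region in bed_file is rescaled to body_length bases, with
    unscaled flanks of `upstream` and `downstream` bases on either side.
    """
    return ["computeMatrix", "scale-regions",
            "-S", *bigwigs,
            "-R", bed_file,
            "-o", output,
            "--regionBodyLength", str(body_length),
            "-b", str(upstream),
            "-a", str(downstream),
            "-p", str(threads)]


# Hypothetical inputs; the same bigWig list keeps samples in one matrix.
print(shlex.join(scale_regions_cmd(
    ["rep1.bw", "rep2.bw"], "genes.bed", "genes_scaled.gz")))
```

Building all matrices for a comparison through one function like this also helps keep flank and body lengths identical across samples.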
Limitations
deepTools operates on aligned reads and does not perform read mapping. Heatmap rendering time scales with the number of regions and samples. Some normalization methods require knowing the effective genome size for the reference assembly.