PyOpenMS

Advanced PyOpenMS automation and integration for mass spectrometry data processing

Source: K-Dense-AI/claude-scientific-skills

PyOpenMS is a community skill for mass spectrometry data analysis using the pyOpenMS Python bindings, covering spectrum processing, peak detection, feature finding, peptide identification, and quantification workflows for proteomics and metabolomics research.

What Is This?

Overview

PyOpenMS provides tools for processing and analyzing mass spectrometry data through Python bindings to the OpenMS C++ library. The skill covers five areas: spectrum processing, which reads, filters, and transforms raw mass spectra; peak detection, which identifies and centroids peaks in profile-mode data; feature finding, which groups related peaks across the retention time and m/z dimensions into quantifiable features; peptide identification, which matches experimental spectra against protein sequence databases; and quantification workflows, which measure analyte abundances across samples. It lets researchers build reproducible, scriptable analysis pipelines in Python that can be version-controlled and shared across teams.

Who Should Use This

This skill serves proteomics researchers processing LC-MS/MS data for protein identification, metabolomics scientists analyzing small molecule mass spectra, and bioinformaticians building automated mass spectrometry analysis pipelines.

Why Use It?

Problems It Solves

Mass spectrometry vendor software uses proprietary formats that require specialized tools for cross-platform data access. Manual peak picking and feature extraction from raw spectra are time-consuming and hard to reproduce. Peptide identification from tandem mass spectra requires database search algorithms with statistical validation. Quantitative comparison across multiple LC-MS runs needs alignment and normalization to correct for technical variation, particularly when samples were acquired in different instrument sessions or laboratories.

Core Highlights

The spectrum reader loads mass spectrometry data from mzML and other open formats such as mzXML; vendor raw files must first be converted (see Requirements). The peak picker detects and centroids peaks in raw profile spectra. The feature finder groups peaks into quantifiable features across retention time. The database searcher matches spectra against protein sequences for identification.

How to Use It?

Basic Usage

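from pyopenms import MSExperiment, MzMLFile, PeakPickerHiRes

# Load all spectra from an mzML file into memory
exp = MSExperiment()
MzMLFile().load('sample.mzML', exp)
print(f'Spectra: {exp.size()}')

# Inspect the first spectrum: peak arrays, retention time, MS level
spec = exp[0]
mzs, ints = spec.get_peaks()
print(f'Peaks: {len(mzs)}')
print(f'RT: {spec.getRT():.2f}s')
print(f'MS level: {spec.getMSLevel()}')

# Centroid profile-mode spectra into a new experiment.
# Note: some pyOpenMS 3.x releases appear to require a third boolean
# argument (check_spectrum_type), i.e. pickExperiment(exp, picked, True).
picker = PeakPickerHiRes()
picked = MSExperiment()
picker.pickExperiment(exp, picked)
print(f'Picked spectra: {picked.size()}')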
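To make concrete what the centroiding step above does, here is a minimal pure-Python sketch. It is not the PeakPickerHiRes algorithm; it assumes a much simpler rule (a local maximum plus an intensity-weighted mean over three points), and the function name and sample values are hypothetical.

```python
def centroid_profile(mzs, intensities, min_intensity=0.0):
    """Return (mz, intensity) centroids from a profile spectrum.

    A point counts as a peak apex when it is strictly higher than its left
    neighbour and at least as high as its right neighbour. The centroid m/z
    is the intensity-weighted mean of the apex and its two neighbours.
    """
    centroids = []
    for i in range(1, len(mzs) - 1):
        left, apex, right = intensities[i - 1], intensities[i], intensities[i + 1]
        if apex < min_intensity or not (left < apex >= right):
            continue
        total = left + apex + right
        weighted_mz = (
            mzs[i - 1] * left + mzs[i] * apex + mzs[i + 1] * right
        ) / total
        centroids.append((weighted_mz, apex))
    return centroids


# Synthetic profile data (illustrative values only): a symmetric peak at
# m/z 500.01 and an asymmetric peak whose apex sample sits at m/z 500.05
mzs = [500.00, 500.01, 500.02, 500.03, 500.04, 500.05, 500.06]
ints = [10.0, 100.0, 10.0, 0.0, 20.0, 80.0, 60.0]
peaks = centroid_profile(mzs, ints, min_intensity=50.0)
for mz, intensity in peaks:
    print(f"{mz:.4f}\t{intensity:.1f}")
```

The asymmetric peak shows why centroiding matters: its centroid falls between sampled m/z values, toward the heavier shoulder, instead of on the apex sample itself.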

Real-World Examples

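from pyopenms import MSExperiment, MzMLFile, FeatureFinder, FeatureMap

# Note: FeatureFinder with named algorithms is the pyOpenMS 2.x interface.
# It appears to have been removed in 3.x, so check your installed version.


class MSFeatureFinder:
    def __init__(self, mzml_path: str):
        # Load the run, then compute the m/z, RT, and intensity ranges
        # that the feature finder expects to be set
        self.exp = MSExperiment()
        MzMLFile().load(mzml_path, self.exp)
        self.exp.updateRanges()

    def find_features(self, algorithm: str = 'centroided') -> list[dict]:
        ff = FeatureFinder()
        features = FeatureMap()
        seeds = FeatureMap()  # empty seed map: no prior feature positions
        params = ff.getParameters(algorithm)
        ff.run(algorithm, self.exp, features, params, seeds)
        # Flatten each feature into a plain dict for downstream use
        return [
            {
                'mz': f.getMZ(),
                'rt': f.getRT(),
                'intensity': f.getIntensity(),
                'charge': f.getCharge(),
                'quality': f.getOverallQuality(),
            }
            for f in features
        ]

    @staticmethod
    def filter_features(
        features: list[dict], min_intensity: float = 1000.0
    ) -> list[dict]:
        # Keep only features at or above the intensity threshold
        return [f for f in features if f['intensity'] >= min_intensity]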

Advanced Tips

Use one of the map alignment algorithms, such as MapAlignmentAlgorithmPoseClustering, to align retention times across multiple LC-MS runs before quantitative comparison. Smooth profile data with GaussFilter or SavitzkyGolayFilter before peak picking to improve detection sensitivity. Export processed data to pandas DataFrames for downstream statistical analysis and visualization. When processing large sample cohorts, consider writing intermediate results to disk with FeatureXMLFile to avoid reprocessing and reduce memory pressure.
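The export tip can be sketched without any dependency beyond the standard library. The helper names below (features_to_rows, rows_to_csv, StubFeature) are hypothetical; the only assumption about pyOpenMS is that feature objects expose the getters already used in the example class above. The stub stands in for a real FeatureMap so the sketch runs on its own.

```python
import csv
import io


def features_to_rows(features):
    """Convert an iterable of feature-like objects into plain dict rows."""
    return [
        {
            "mz": f.getMZ(),
            "rt": f.getRT(),
            "intensity": f.getIntensity(),
            "charge": f.getCharge(),
            "quality": f.getOverallQuality(),
        }
        for f in features
    ]


def rows_to_csv(rows):
    """Serialise rows to CSV text; returns an empty string for no rows."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


class StubFeature:
    """Hypothetical stand-in for a pyOpenMS Feature, used only for this demo."""

    def __init__(self, mz, rt, intensity, charge, quality):
        self._mz, self._rt = mz, rt
        self._intensity, self._charge, self._quality = intensity, charge, quality

    def getMZ(self):
        return self._mz

    def getRT(self):
        return self._rt

    def getIntensity(self):
        return self._intensity

    def getCharge(self):
        return self._charge

    def getOverallQuality(self):
        return self._quality


rows = features_to_rows([StubFeature(445.12, 312.5, 15000.0, 2, 0.9)])
csv_text = rows_to_csv(rows)
print(csv_text)
```

A list of dicts in this shape can also be handed directly to pandas.DataFrame(rows) when pandas is available.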

When to Use It?

Use Cases

Process raw LC-MS/MS data through peak picking and feature detection for untargeted metabolomics analysis. Build a peptide identification pipeline that searches tandem mass spectra against a protein database. Align and quantify features across multiple samples for differential abundance analysis.
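The core arithmetic behind the peptide identification use case can be illustrated with a simplified precursor-mass filter. This is only a sketch: real search engines also score fragment ions and handle modifications, and the function names and example peptides here are hypothetical. The residue masses are standard monoisotopic values rounded to five decimals.

```python
# Monoisotopic residue masses in daltons for the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added for the peptide termini
PROTON = 1.007276  # mass of a proton, one per charge


def peptide_mass(sequence):
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def precursor_mz(sequence, charge):
    """Theoretical m/z of the protonated peptide at the given charge."""
    return (peptide_mass(sequence) + charge * PROTON) / charge


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def candidates(observed_mz, charge, peptides, tolerance_ppm=10.0):
    """Peptides whose theoretical m/z lies within the ppm tolerance."""
    return [
        p for p in peptides
        if abs(ppm_error(observed_mz, precursor_mz(p, charge))) <= tolerance_ppm
    ]


# Hypothetical database of three peptides and one observed 2+ precursor
peptides = ["PEPTIDE", "PEPTIDEK", "SAMPLER"]
matches = candidates(400.6873, charge=2, peptides=peptides)
print(matches)
```

Only candidates that pass this precursor filter would go on to fragment-level scoring and statistical validation.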

Important Notes

Requirements

PyOpenMS Python package with compiled OpenMS C++ bindings. Mass spectrometry data in mzML format or converted from vendor formats using tools such as MSConvert. Protein sequence database in FASTA format for peptide identification searches.

Usage Recommendations

Do: convert vendor-specific raw files to mzML before processing for maximum compatibility; centroid profile-mode data before feature finding, since most algorithms expect centroided input; and validate feature detection results by inspecting extracted ion chromatograms for key analytes.

Don't: skip spectrum filtering before peak picking, since noise in raw data produces false-positive peaks; use default parameters without tuning them for your instrument and acquisition method; or compare feature intensities across runs without retention time alignment and normalization.
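As an illustration of the normalization point, the sketch below rescales each run so that its median feature intensity matches a common target. Median scaling is only one simple choice, used here as an assumption; the function name and the two sample runs are hypothetical, and intensities are assumed to be positive.

```python
from statistics import median


def median_normalize(runs):
    """Scale each run so its median intensity equals the median of run medians.

    `runs` maps a run name to a list of feature intensities. Assumes every
    run has a non-zero median.
    """
    run_medians = {name: median(values) for name, values in runs.items()}
    target = median(run_medians.values())
    return {
        name: [v * target / run_medians[name] for v in values]
        for name, values in runs.items()
    }


# Two hypothetical runs of the same sample; run_b was acquired with
# roughly twice the overall signal
runs = {
    "run_a": [100.0, 200.0, 300.0],
    "run_b": [200.0, 400.0, 600.0],
}
normalized = median_normalize(runs)
print(normalized)
```

After scaling, both runs report the same intensities, so any remaining differences between real samples are less likely to reflect overall signal drift.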

Limitations

OpenMS installation can be complex due to compiled C++ dependencies across different platforms. Processing large LC-MS datasets consumes significant memory when loading all spectra simultaneously. Some advanced OpenMS algorithms are not yet exposed through the Python bindings.
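One way to limit the memory cost is to read spectra one at a time instead of loading the whole run. The sketch below assumes an object with the interface of pyOpenMS's OnDiscMSExperiment (getNrSpectra and getSpectrum, backed by an indexed mzML file); that interface is an assumption to verify against your installed version. The stub classes are hypothetical and exist only so the sketch runs without pyOpenMS.

```python
def iter_spectra(on_disc_exp, ms_level=None):
    """Yield spectra one at a time from an on-disc experiment-like object."""
    for index in range(on_disc_exp.getNrSpectra()):
        spectrum = on_disc_exp.getSpectrum(index)
        if ms_level is None or spectrum.getMSLevel() == ms_level:
            yield spectrum


def total_ion_current(spectrum):
    """Sum of peak intensities for one spectrum."""
    _, intensities = spectrum.get_peaks()
    return float(sum(intensities))


class StubSpectrum:
    """Hypothetical stand-in for a pyOpenMS MSSpectrum."""

    def __init__(self, ms_level, mzs, intensities):
        self._ms_level, self._mzs, self._intensities = ms_level, mzs, intensities

    def getMSLevel(self):
        return self._ms_level

    def get_peaks(self):
        return self._mzs, self._intensities


class StubOnDiscExperiment:
    """Hypothetical stand-in exposing getNrSpectra and getSpectrum."""

    def __init__(self, spectra):
        self._spectra = spectra

    def getNrSpectra(self):
        return len(self._spectra)

    def getSpectrum(self, index):
        return self._spectra[index]


stub = StubOnDiscExperiment([
    StubSpectrum(1, [400.0, 500.0], [10.0, 20.0]),
    StubSpectrum(2, [150.0], [5.0]),
    StubSpectrum(1, [400.0, 500.0], [30.0, 40.0]),
])
tics = [total_ion_current(s) for s in iter_spectra(stub, ms_level=1)]
print(tics)

# With pyOpenMS installed, the assumed usage would look like:
#   from pyopenms import OnDiscMSExperiment
#   od_exp = OnDiscMSExperiment()
#   od_exp.openFile("sample.mzML")  # needs an indexed mzML file
#   tics = [total_ion_current(s) for s in iter_spectra(od_exp, ms_level=1)]
```

Because the generator holds only one spectrum at a time, peak memory depends on the largest spectrum and not on the size of the run.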
