PyMC

Advanced PyMC automation and integration for Bayesian statistical modeling and inference

PyMC is a community skill for Bayesian statistical modeling using the PyMC Python library, covering probabilistic model definition, MCMC sampling, prior specification, posterior analysis, and model comparison for Bayesian inference.

What Is This?

Overview

PyMC provides tools for building and fitting Bayesian statistical models using Markov chain Monte Carlo (MCMC) methods. It covers probabilistic model definition, which specifies relationships between parameters and observed data using probability distributions; MCMC sampling, which draws from posterior distributions using the NUTS algorithm for efficient exploration; prior specification, which encodes domain knowledge as probability distributions over model parameters; posterior analysis, which summarizes and visualizes parameter estimates with credible intervals; and model comparison, which evaluates competing models using information criteria. The skill enables statisticians to fit complex Bayesian models in Python.

Who Should Use This

This skill serves statisticians building hierarchical and regression models with uncertainty quantification, data scientists who need credible intervals rather than point estimates, and researchers fitting domain-specific probabilistic models to experimental data.

Why Use It?

Problems It Solves

Frequentist methods provide point estimates without natural uncertainty quantification for parameters. Complex hierarchical models with nested group effects are difficult to fit with standard maximum likelihood methods. Incorporating prior domain knowledge into statistical models requires Bayesian frameworks that standard tools do not provide. Model comparison based solely on p-values does not account for model complexity or predictive performance.

Core Highlights

Model builder defines probabilistic relationships using intuitive distribution syntax. NUTS sampler explores posterior distributions efficiently for continuous parameters. Prior encoder translates domain knowledge into informative parameter distributions. Diagnostic toolkit assesses sampling convergence and identifies mixing problems.

How to Use It?

Basic Usage

import pymc as pm
import numpy as np
import arviz as az

# Simulate data from a known linear model: y = 2.5 * x + 1.0 + noise
np.random.seed(42)
X = np.random.randn(100)
y = 2.5 * X + 1.0 + np.random.randn(100) * 0.5

with pm.Model() as model:
    # Weakly informative priors on intercept, slope, and noise scale
    alpha = pm.Normal('alpha', mu=0, sigma=10)
    beta = pm.Normal('beta', mu=0, sigma=10)
    sigma = pm.HalfNormal('sigma', sigma=1)

    # Linear predictor and Gaussian likelihood
    mu = alpha + beta * X
    pm.Normal('y_obs', mu=mu, sigma=sigma, observed=y)

    # NUTS sampling: 2000 draws per chain after 1000 tuning steps
    trace = pm.sample(2000, tune=1000, cores=2)

print(az.summary(trace))

Real-World Examples

import pymc as pm
import numpy as np

class HierarchicalModel:
    """Varying-intercept regression with partial pooling across groups."""

    def __init__(self, groups: np.ndarray, X: np.ndarray, y: np.ndarray):
        self.groups = groups
        self.X = X
        self.y = y
        self.n_groups = len(np.unique(groups))

    def build(self):
        with pm.Model() as m:
            # Hyperpriors for the population of group-level intercepts
            mu_a = pm.Normal('mu_a', mu=0, sigma=5)
            sigma_a = pm.HalfNormal('sigma_a', sigma=2)

            # One intercept per group, partially pooled toward mu_a
            alpha = pm.Normal('alpha', mu=mu_a, sigma=sigma_a,
                              shape=self.n_groups)

            # Shared slope and observation noise
            beta = pm.Normal('beta', mu=0, sigma=5)
            sigma = pm.HalfNormal('sigma', sigma=1)

            # Index each observation's intercept by its group label
            mu = alpha[self.groups] + beta * self.X
            pm.Normal('obs', mu=mu, sigma=sigma, observed=self.y)
        self.model = m
        return m

    def fit(self, draws: int = 2000):
        with self.model:
            self.trace = pm.sample(draws, tune=1000)
        return self.trace
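A hypothetical usage sketch: generating synthetic grouped data with known group intercepts to exercise the class above. The `true_alpha` and `true_beta` values are illustrative assumptions chosen so the fitted posterior can be checked against the ground truth.

```python
import numpy as np

rng = np.random.default_rng(0)
n_groups, n_per = 4, 25

# True group intercepts drawn from a common population distribution,
# matching the partial-pooling structure of the model above
true_alpha = rng.normal(loc=1.0, scale=0.5, size=n_groups)
true_beta = 2.0

groups = np.repeat(np.arange(n_groups), n_per)  # group label per row
X = rng.normal(size=n_groups * n_per)
y = true_alpha[groups] + true_beta * X + rng.normal(scale=0.3, size=groups.size)

# model = HierarchicalModel(groups, X, y)  # class defined above
# model.build()
# trace = model.fit(draws=1000)
print(groups.shape, X.shape, y.shape)
```

With data like this, the posterior means of `alpha` should land near `true_alpha`, with more shrinkage toward `mu_a` for groups whose data are noisier.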

Advanced Tips

Use informative priors based on domain knowledge to improve sampling efficiency and produce more realistic posterior estimates. Check trace plots and R-hat diagnostics to verify that chains have converged before interpreting results. Use WAIC or LOO-CV from ArviZ to compare models with different complexity levels.
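To illustrate what the R-hat diagnostic measures, here is a simplified NumPy sketch of the classic Gelman-Rubin statistic, which compares between-chain and within-chain variance. ArviZ's `az.rhat` uses a more robust rank-normalized split variant, so treat this as a conceptual sketch rather than a replacement.

```python
import numpy as np

def rhat(chains: np.ndarray) -> float:
    """Simplified Gelman-Rubin R-hat for an (m_chains, n_draws) array."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()  # mean within-chain variance
    B = n * chain_means.var(ddof=1)        # between-chain variance
    var_hat = (n - 1) / n * W + B / n      # pooled variance estimate
    return float(np.sqrt(var_hat / W))

rng = np.random.default_rng(42)
# Four well-mixed chains sampling the same distribution: R-hat near 1
good = rng.normal(size=(4, 1000))
# Chains stuck at different locations (poor mixing): R-hat well above 1
bad = good + np.array([0.0, 1.0, 2.0, 3.0])[:, None]

print(round(rhat(good), 3), round(rhat(bad), 3))
```

Values near 1.0 indicate the chains agree; a common rule of thumb is to investigate any parameter with R-hat above about 1.01.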

When to Use It?

Use Cases

Fit a hierarchical regression model that estimates group-level effects with partial pooling across related groups. Estimate treatment effects from experimental data with full posterior uncertainty quantification. Compare competing statistical models using information criteria to select the best predictive model.

Related Topics

PyMC, Bayesian statistics, MCMC, probabilistic programming, ArviZ, hierarchical models, and statistical inference.

Important Notes

Requirements

PyMC Python package with PyTensor backend for model compilation. ArviZ for posterior analysis and diagnostic visualization. NumPy for data preparation and array operations.

Usage Recommendations

Do: check R-hat values and effective sample sizes to verify sampling convergence before interpreting results; use weakly informative priors rather than flat priors to improve sampling stability; run multiple chains so convergence problems can be diagnosed through between-chain comparison.

Don't: interpret posterior samples from unconverged chains, since they do not represent the true posterior distribution; use very tight priors that dominate the likelihood, effectively ignoring the observed data; or compare models using only in-sample fit without penalizing for complexity.

Limitations

MCMC sampling can be slow for models with many parameters or complex posterior geometries. Discrete parameters require specialized samplers since NUTS is designed for continuous distributions. Model specification errors may produce valid-looking posteriors that do not reflect the true data generating process.