Statistical Analyst

Run hypothesis tests, analyze A/B experiment results, calculate sample sizes, and interpret statistical significance with effect sizes. Use when you n

What Is Statistical Analyst?

Statistical Analyst is a Claude Code skill for the statistical analysis of experiments. It helps users run hypothesis tests, analyze A/B experiment results, calculate sample sizes, and interpret statistical significance alongside effect sizes. Use it to check whether an observed difference in experimental data is likely to be real, to size an experiment before launch, and to interpret results rigorously. By distinguishing statistical from practical significance, it supports sound decision-making and reduces the risk of acting on misleading data.

Why Use Statistical Analyst?

In modern product development, marketing optimization, and scientific research, decisions must be based on evidence instead of intuition. However, interpreting data from experiments and A/B tests can be challenging, especially when navigating complex statistical concepts such as p-values, confidence intervals, power, and effect sizes. Misunderstanding these elements can lead to false discoveries, underpowered experiments, or missed opportunities.

Statistical Analyst addresses these challenges by providing clear, reproducible workflows for both post-experiment analysis (e.g., determining whether a new feature improved conversion rate) and pre-experiment planning (e.g., calculating how many users are needed to detect a targeted effect). With built-in guidance on test selection, result interpretation, and experiment sizing, the skill helps teams make statistically grounded decisions.

How to Get Started

To use Statistical Analyst, you need access to the Claude Code environment with the skill installed from this repository. The typical workflow is structured around two primary modes:

Mode 1: Analyze Experiment Results (A/B Test)

  1. Clarify Inputs: Gather the relevant data: metric type (e.g., conversion rate, mean, count), sample sizes, and observed values for each variant.

  2. Choose the Test:

    • For proportions (e.g., conversion rates), use a two-proportion z-test.
    • For continuous means (e.g., average order value), use a two-sample t-test.
    • For categorical outcomes (e.g., distribution across categories), use a chi-square test.

  3. Run the Analysis: Use the provided hypothesis_tester.py script with the appropriate method.

    Example (Python):

    # For comparison of proportions (e.g., conversion rates)
    from hypothesis_tester import run_z_test
    
    # Inputs: successes_A, n_A, successes_B, n_B
    p_value, ci, effect_size = run_z_test(120, 1500, 135, 1480)
    print(f"P-Value: {p_value}, CI: {ci}, Cohen's h: {effect_size}")
  4. Interpret Results: Review the output, which includes the p-value, confidence interval, and effect size (using measures such as Cohen’s d or Cohen’s h).

  5. Decide: Use a structured framework to determine next steps (ship/hold/extend), considering both statistical and practical significance.
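
To make the proportions case concrete, here is a minimal standard-library sketch of what a two-proportion z-test computes. It is not the skill's run_z_test, whose internals are not shown here. The function name two_proportion_z_test, the "B minus A" sign convention, and the choice of a pooled standard error for the test with an unpooled one for the interval are all assumptions made for illustration.

```python
from math import asin, sqrt
from statistics import NormalDist


def two_proportion_z_test(successes_a, n_a, successes_b, n_b, alpha=0.05):
    """Two-sided z-test for the difference of two proportions (B minus A).

    Hypothetical helper for illustration; returns (p_value, ci, cohens_h).
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    diff = p_b - p_a

    # Pooled standard error under the null hypothesis p_a == p_b.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the confidence interval on the difference.
    se_unpooled = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)

    # Cohen's h: difference of arcsine-transformed proportions.
    cohens_h = 2 * asin(sqrt(p_b)) - 2 * asin(sqrt(p_a))
    return p_value, ci, cohens_h


# Same illustrative inputs as the example above: 120/1500 vs. 135/1480.
p_value, ci, h = two_proportion_z_test(120, 1500, 135, 1480)
print(f"p-value: {p_value:.3f}, 95% CI: ({ci[0]:.4f}, {ci[1]:.4f}), Cohen's h: {h:.3f}")
```

With these inputs the p-value is well above 0.05 and the interval spans zero, so the data do not separate the two variants. The sketch does not guard against degenerate inputs such as zero successes in both groups.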

Mode 2: Size an Experiment (Pre-Launch)

  1. Define Parameters: Establish the baseline rate, the minimum detectable effect (stating whether it is an absolute or a relative change), the desired power (commonly 80%), and the significance level (usually 0.05).

  2. Calculate Sample Size: Use the sample size calculator to ensure your experiment will have sufficient power.

    Example (Python):

    from hypothesis_tester import calculate_sample_size
    
    # Inputs: baseline_rate, min_detectable_effect, power, alpha
    n_per_group = calculate_sample_size(0.10, 0.02, power=0.8, alpha=0.05)
    print(f"Required sample size per group: {n_per_group}")
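
For reference, one common normal-approximation formula for sizing a two-sided comparison of two proportions can be sketched as follows. This is an assumption about the kind of calculation involved, not the skill's calculate_sample_size implementation. The function name sample_size_two_proportions is hypothetical, and the minimum detectable effect is treated here as an absolute change (0.10 to 0.12), which the source does not specify.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_two_proportions(baseline_rate, absolute_mde, power=0.8, alpha=0.05):
    """Per-group sample size for a two-sided two-proportion z-test.

    Hypothetical helper; absolute_mde is an absolute difference in rates
    (an assumption made for this sketch).
    """
    p1 = baseline_rate
    p2 = baseline_rate + absolute_mde
    p_bar = (p1 + p2) / 2

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power

    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / absolute_mde ** 2)


n_per_group = sample_size_two_proportions(0.10, 0.02, power=0.8, alpha=0.05)
print(f"Required sample size per group: {n_per_group}")
```

Under these assumptions the result is a little over 3,800 users per group. Because the required size scales with the inverse square of the effect, halving the detectable effect roughly quadruples it.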

Key Features

  • Automated Test Selection: The skill guides you to the correct statistical test based on your metric type, reducing risk of analytical errors.
  • Comprehensive Output: Reports p-values, confidence intervals, and effect sizes (Cohen’s d for means, Cohen’s h for proportions, Cramér’s V for categorical variables).
  • Experiment Sizing: Calculates required sample sizes before you run an experiment, helping avoid inconclusive results.
  • Decision Framework: Provides structured recommendations (ship, hold, extend) based on both statistical (p-value, confidence interval) and practical (effect size) significance.
  • Code-Ready: Includes Python scripts for seamless integration into your analysis workflows.
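
The effect sizes named above follow standard textbook definitions. As a rough sketch using only the standard library, Cohen's d and Cramér's V can be computed as below. The function names cohens_d and cramers_v are hypothetical and are not claimed to match the skill's scripts.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(sample_a, sample_b):
    """Standardized mean difference (B minus A) using the pooled standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    return (mean(sample_b) - mean(sample_a)) / sqrt(pooled_var)


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    return sqrt(chi2 / (n * (min(len(table), len(table[0])) - 1)))


# Toy data chosen so the results are easy to verify by hand.
print(cohens_d([1, 2, 3], [2, 3, 4]))     # means differ by one pooled SD -> 1.0
print(cramers_v([[10, 0], [0, 10]]))      # perfect association -> 1.0
```

Cramér's V ranges from 0 (no association) to 1 (perfect association). Cohen's d is unbounded and is read in units of the pooled standard deviation.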

Best Practices

  • Clarify Metrics Early: Always specify what outcome you’re measuring and its type before selecting a test.
  • Don’t Rely Solely on P-Values: Always consider effect size and confidence intervals to assess practical significance.
  • Pre-Register Analysis: Define your hypotheses, analysis plan, and success criteria before running experiments to avoid bias.
  • Size Experiments Properly: Use the sample size calculator to ensure sufficient power, preventing wasted resources or misleading results.
  • Report with Context: When sharing results, include p-values, effect sizes, confidence intervals, and a clear decision recommendation.
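
The source does not spell out how the ship/hold/extend recommendation is derived. One plausible mapping, assumed purely for illustration, compares the confidence interval on the lift with a minimum practically meaningful effect. The function name recommend and the threshold logic are hypothetical, not the skill's actual rules.

```python
def recommend(ci_low, ci_high, min_practical_effect):
    """Map a confidence interval on the lift to 'ship', 'hold', or 'extend'.

    Hypothetical rule: decide only when the whole interval sits on one side
    of the smallest effect that would matter in practice.
    """
    if ci_low >= min_practical_effect:
        return "ship"    # even the pessimistic end clears the bar
    if ci_high < min_practical_effect:
        return "hold"    # even the optimistic end falls short
    return "extend"      # interval straddles the bar: collect more data


print(recommend(0.02, 0.05, 0.01))     # ship
print(recommend(-0.01, 0.005, 0.01))   # hold
print(recommend(-0.009, 0.031, 0.01))  # extend
```

Framing the decision around the interval rather than the p-value alone keeps practical significance in the loop: a result can be statistically significant and still land in "hold" if the whole interval lies below the effect that matters.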

Important Notes

  • Statistical vs. Practical Significance: A statistically significant result (e.g., p < 0.05) does not guarantee practical or business relevance. Always evaluate effect sizes in context.
  • Assumptions Matter: All statistical tests have assumptions (e.g., normality, independence). Ensure your data meets these or use robust alternatives.
  • Multiple Testing: If running multiple tests, adjust your significance threshold (e.g., Bonferroni correction) to control for false positives.
  • Power and Sample Size: Underpowered experiments can miss real effects; with very large samples, even trivially small differences can reach statistical significance.
  • Transparency: Document your analysis process and decisions, making it easier for others to review or reproduce your findings.
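
As a small illustration of the multiple-testing note, the Bonferroni correction divides the significance level by the number of tests and compares each p-value against that stricter threshold. The helper name bonferroni and the p-values below are made up for the example.

```python
def bonferroni(p_values, alpha=0.05):
    """Return, for each p-value, whether it survives a Bonferroni correction."""
    threshold = alpha / len(p_values)  # stricter per-test threshold
    return [p <= threshold for p in p_values]


# Four hypothetical tests: the per-test threshold becomes 0.05 / 4 = 0.0125.
print(bonferroni([0.004, 0.02, 0.03, 0.2]))  # [True, False, False, False]
```

Bonferroni is simple but conservative. With many tests it can hide real effects, which is one reason to limit the number of metrics tested in the first place.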

Statistical Analyst bridges the gap between raw data and actionable insights, ensuring teams make decisions grounded in statistical rigor and practical relevance. By following its workflows and best practices, you can run smarter experiments and reduce the risk of costly mistakes.