Experiment Designer
Use when planning product experiments, writing testable hypotheses, estimating sample size, prioritizing tests, or interpreting A/B outcomes with practical guidance.
Category: productivity · Source: alirezarezvani/claude-skills

What Is Experiment Designer?
Experiment Designer is a productivity skill developed for Claude that streamlines the end-to-end process of designing, prioritizing, and interpreting product experiments. Tailored for product managers, analysts, and growth teams, it brings statistical rigor and practical guidance to the often intricate world of A/B and multivariate testing. This skill provides structured frameworks for hypothesis generation, experiment planning, sample size estimation, and test prioritization, helping teams make defensible, data-driven decisions. By following best practices in experimental design, Experiment Designer minimizes common pitfalls and elevates the quality of insights derived from product tests.
Why Use Experiment Designer?
Running product experiments without a robust framework risks wasted engineering effort, misleading conclusions, and ultimately, missed opportunities for product improvement. Experiment Designer addresses these challenges by:
- Enforcing statistical validity: Ensures experiments are correctly powered, reducing the risk of false positives.
- Structuring hypothesis development: Encourages clarity in experimental intent and expected outcomes.
- Prioritizing for impact: Helps teams allocate resources to the most promising tests.
- Interpreting results with rigor: Guides users to make correct product decisions based on statistical output.
- Reducing bias and ambiguity: Standardizes experimentation workflows for team alignment.
In a landscape where product iteration speed and accuracy are critical, Experiment Designer is an indispensable tool for embedding scientific discipline into product development cycles.
How to Get Started
Experiment Designer is available as an open-source skill for integration with Claude. To begin, clone the repository from the official GitHub source and review the provided scripts and documentation.
A core utility provided is the sample size calculator, which can be run as follows:
python3 scripts/sample_size_calculator.py --baseline-rate 0.12 --mde 0.02 --mde-type absolute
This command estimates the sample size needed for an experiment targeting a baseline conversion rate of 12%, with a minimum detectable effect (MDE) of 2% (absolute change). Adjust the parameters to fit your experiment’s specifics.
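The script itself is not reproduced here, but the standard two-proportion power calculation it presumably implements can be sketched in a few lines of Python (the function name and defaults below are illustrative, not the skill's actual API, and only the absolute MDE case is shown):

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_arm(baseline_rate, mde, alpha=0.05, power=0.80):
    """Estimate per-arm sample size for a two-proportion z-test
    with an absolute minimum detectable effect (MDE)."""
    p1 = baseline_rate
    p2 = baseline_rate + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# Mirrors the CLI example: 12% baseline, +2pp absolute MDE
print(sample_size_per_arm(0.12, 0.02))  # roughly 4,400-4,500 users per arm
```

Note how sensitive the result is to the MDE: halving the detectable effect roughly quadruples the required sample size, which is why documenting the baseline rate and expected effect up front matters.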
Before launching an experiment, follow these steps:
- Formulate your hypothesis using the If/Then/Because structure.
- Define primary, guardrail, and secondary metrics with clear success criteria.
- Estimate the required sample size with the provided script.
- Prioritize experiments using ICE scoring (Impact, Confidence, Ease).
- Set stopping rules (fixed sample size or duration) before launch.
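The pre-launch steps above can be captured in a lightweight checklist structure. The class below is a hypothetical sketch for your own tooling, not part of the skill itself:

```python
from dataclasses import dataclass, field

@dataclass
class ExperimentPlan:
    hypothesis: str                  # If/Then/Because statement
    primary_metric: str
    guardrail_metrics: list = field(default_factory=list)
    sample_size_per_arm: int = 0     # from the sample size script
    stopping_rule: str = ""          # e.g. "fixed n per arm" or "14 days"

    def ready_to_launch(self):
        """True only when every pre-launch step has been completed."""
        return all([
            "If" in self.hypothesis and "Because" in self.hypothesis,
            bool(self.primary_metric),
            self.sample_size_per_arm > 0,
            bool(self.stopping_rule),
        ])

plan = ExperimentPlan(
    hypothesis="If we simplify signup, Then activation rises, Because friction drops",
    primary_metric="activation_rate",
    guardrail_metrics=["7d_retention"],
)
print(plan.ready_to_launch())  # False: no sample size or stopping rule yet
```

A gate like this makes the "set stopping rules before launch" step enforceable rather than aspirational.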
Key Features
1. Hypothesis Framework:
Experiment Designer enforces a clear If/Then/Because structure for hypothesis writing:
- If we change [intervention]
- Then [metric] will change by [expected direction/magnitude]
- Because [behavioral mechanism]
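A hypothesis can be assembled mechanically from this template. The helper below, with an invented example intervention, is a sketch to show the structure in action:

```python
def write_hypothesis(intervention, metric, expected_change, mechanism):
    """Render the If/Then/Because template as a single statement."""
    return (f"If we change {intervention}, "
            f"then {metric} will change by {expected_change}, "
            f"because {mechanism}.")

print(write_hypothesis(
    "the checkout button from grey to green",                 # intervention
    "checkout conversion rate",                               # metric
    "+1 to +2 percentage points",                             # direction/magnitude
    "higher contrast draws attention to the primary action",  # mechanism
))
```

If you cannot fill in all four slots, the experiment is not ready to design, let alone launch.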
2. Metric Definition:
- Primary metric: The core decision metric for the experiment.
- Guardrail metrics: Metrics that ensure quality or mitigate risk (e.g., retention, churn).
- Secondary metrics: Informational metrics for diagnostic purposes only.
3. Sample Size Planning:
The included Python script allows users to estimate the required sample size for statistical significance:
python3 scripts/sample_size_calculator.py --baseline-rate 0.08 --mde 0.015 --mde-type absolute
Arguments include:
- --baseline-rate: Baseline conversion rate or mean.
- --mde: Minimum Detectable Effect.
- --mde-type: 'absolute' or 'relative' effect.
- Optional: significance level (--alpha) and statistical power (--power).
4. Experiment Prioritization with ICE:
ICE (Impact, Confidence, Ease) scoring helps you rank experiments:
ICE Score = (Impact * Confidence * Ease) / 10
Assign each factor a score from 1 to 10 (e.g., Impact: 8, Confidence: 7, Ease: 5), then compute the score to rank tests for prioritization.
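The scoring reduces to a one-liner plus input validation; a minimal sketch:

```python
def ice_score(impact, confidence, ease):
    """ICE Score = (Impact * Confidence * Ease) / 10, each factor on a 1-10 scale."""
    for factor in (impact, confidence, ease):
        if not 1 <= factor <= 10:
            raise ValueError("each ICE factor must be between 1 and 10")
    return impact * confidence * ease / 10

# Example from the text: Impact 8, Confidence 7, Ease 5
print(ice_score(8, 7, 5))  # 28.0
```

Because the factors multiply, a single low score drags the whole experiment down, which is the point: a high-impact test that is nearly impossible to build should not outrank a solid, shippable one.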
5. Launch and Interpretation Guidance:
- Set fixed sample sizes or durations before starting.
- Review statistical outputs with practical recommendations for decision-making.
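For a conversion-rate primary metric, the statistical review typically boils down to a two-proportion z-test. The sketch below uses the standard pooled-variance form, which may differ from the skill's exact output:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(conversions_a, n_a, conversions_b, n_b):
    """Two-sided p-value for a difference in conversion rates (pooled z-test)."""
    p_a = conversions_a / n_a
    p_b = conversions_b / n_b
    pooled = (conversions_a + conversions_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Control: 100/1000 converted; variant: 150/1000 converted
p = two_proportion_p_value(100, 1000, 150, 1000)
print(f"p = {p:.4f}")  # well below 0.05: statistically significant lift
```

Pair the p-value with the observed effect size (here, +5 percentage points) when making the call; a tiny but significant lift may not justify shipping.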
Best Practices
- Always pre-register hypotheses: Document your If/Then/Because statement before launching.
- Select power and significance thoughtfully: Default values are 80% power and 5% alpha; adjust only with statistical justification.
- Never peek at results early: Stick to predetermined stopping rules to avoid invalidating statistical inference.
- Use guardrail metrics: Protect against unintended negative impacts by monitoring key secondary metrics.
- Prioritize based on rigor, not just intuition: Use ICE scoring objectively, and revisit scores as evidence evolves.
- Document all assumptions: Baseline rates, expected effects, and business context should be explicit for transparency and repeatability.
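The "never peek" rule is easy to demonstrate with an A/A simulation: repeatedly checking a fixed-alpha test as data accumulates inflates the false positive rate well above the nominal 5%. The simulation below is illustrative only (simulated data, not the skill's tooling):

```python
import random
from math import sqrt
from statistics import NormalDist

def z_significant(successes_a, successes_b, n, alpha=0.05):
    """Two-sided pooled z-test for two equal-sized arms of n users each."""
    pooled = (successes_a + successes_b) / (2 * n)
    se = sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return False
    z = abs(successes_a / n - successes_b / n) / se
    return 2 * (1 - NormalDist().cdf(z)) < alpha

random.seed(0)
fixed_hits = peeking_hits = 0
for _ in range(500):                      # 500 A/A experiments: no real effect
    a = [random.random() < 0.10 for _ in range(2000)]
    b = [random.random() < 0.10 for _ in range(2000)]
    # Fixed-horizon: test once, at the predetermined sample size
    fixed_hits += z_significant(sum(a), sum(b), 2000)
    # Peeking: test after every 100 users and stop at the first "win"
    peeking_hits += any(z_significant(sum(a[:n]), sum(b[:n]), n)
                        for n in range(100, 2001, 100))

print(f"fixed-horizon false positive rate: {fixed_hits / 500:.1%}")
print(f"peeking false positive rate:       {peeking_hits / 500:.1%}")
```

With no true effect anywhere, the fixed-horizon test flags roughly its nominal 5% of experiments, while the peeking strategy flags several times that, which is exactly why stopping rules must be set before launch.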
Important Notes
- Skill limitations: Experiment Designer provides guidance and computational tools but does not replace deep statistical expertise for complex cases (e.g., multi-arm bandits, non-parametric tests).
- Data quality is paramount: Garbage-in, garbage-out applies; ensure event tracking and data pipelines are robust before relying on experimental data.
- Interpretation caution: Statistical significance does not guarantee business impact; always consider effect sizes and practical significance when making decisions.
- Open-source updates: Check the GitHub repository for improvements, bug fixes, and community contributions.
Experiment Designer brings a disciplined, repeatable approach to product experimentation, enabling teams to drive product growth with confidence and clarity.