Statistics (inference & tests)

Quick guide

Start with sample size when the question is still about planning precision before data collection.
Use power analysis when the planning question is required n, achieved power, or minimum detectable effect.
Use confidence intervals and hypothesis tests to quantify uncertainty and compare groups.
Use effect size after the inferential step when the next question is practical magnitude rather than only statistical significance.
Use confusion matrix metrics after a classifier run when accuracy alone is not enough and you need precision, recall, specificity, or F1.
Use specificity and sensitivity when the stakeholder language is already about true positive rate versus true negative rate and you need FPR or FNR beside them.
Use likelihood ratio when you want an odds-style explanation for how strongly a positive or negative result changes the evidence after the threshold is fixed.
Use Bayes theorem when you need to explain the full prior → evidence → posterior update rather than only a threshold metric.
Use the odds/probability converter when the missing step is moving between raw probability and the odds that likelihood ratios actually multiply.
Use post-test odds when you want to show the explicit odds step between a prior and a posterior instead of jumping straight to the final probability.
Use pre/post-test probability when LR+ or LR− is already known and the next question is the updated probability for one case.
Use NPV and PPV when the question is how reliable positive or negative calls remain after prevalence changes between training and production.
Use Youden's J when you want one threshold score that rewards both sensitivity and specificity while you compare nearby cutoffs.
Use F-beta when precision and recall are both important but one side should count more than the other, such as recall-heavy screening or precision-heavy review queues.
Use precision and recall when one operating threshold is already fixed and the real trade-off is false alarms versus missed positives on the positive class.
Use MCC when you want one summary metric that reacts to all four confusion-matrix cells and stays useful under class imbalance.
Use balanced accuracy when plain accuracy may flatter the dominant class and you need equal weight on recall and specificity.
Use ROC AUC when the model outputs scores and the next question is how threshold choice changes sensitivity and specificity across the full sweep.
Use diagnostic odds ratio when you want one ratio that summarizes how strongly the threshold separates positive and negative classes. Keep LR+ and LR- beside it for interpretation.
Use ARR / NNT when you are past diagnostic metrics and need absolute effect plus people-needed interpretation for an intervention comparison.
Use number needed to screen when the question is how many people must be screened to detect one target case under a prevalence and detection-yield assumption.
Use NNH when the next question is harm-focused people-needed interpretation from an absolute risk increase rather than a ratio or a broader ARR / NNT summary.
Use risk difference when you want the signed absolute gap in percentage points before converting that gap into NNH or comparing it with ratio metrics.
Use risk difference with confidence interval when the same 2x2 table should report both the signed absolute gap and its 90 / 95 / 99% interval width before you move to ARR / NNT or RR comparison pages.
Use attributable risk when the question is how much of exposed-group risk can be attributed to exposure, not just the raw signed gap.
Use attributable risk percent when the exposed-group question is the share of exposed risk attributable to exposure, not the population burden.
Use population attributable risk when you need the population-level burden after combining attributable risk with exposure prevalence.
Use population attributable fraction when the same population burden should be explained as a fraction or percent of total population risk rather than only an absolute risk value.
Use population attributable risk percent when the reporting language should stay in percent form and you want the same population burden stated directly as a percentage of total population risk.
Use attributable fraction among exposed when the interpretation must stay inside the exposed group and you want the attributable share stated directly in fraction or percent form.
Use risk ratio from table when a simple 2x2 count table is already available and the fastest question is the direct risk multiplier plus the absolute gap.
Use risk ratio with confidence interval when the same 2x2 table should report both the RR point estimate and its 90 / 95 / 99% interval width before you move to broader comparison pages.
Use odds ratio from table when the same 2x2 counts should be reported in odds terms, such as case-control summaries or rare-event interpretations.
Use odds ratio with confidence interval when the same 2x2 table should report both the OR point estimate and its 90 / 95 / 99% interval width before you move to broader comparison pages.
Use Fisher's exact test when the 2x2 table is small or sparse and the first question is an exact p-value before you decide whether RR, OR, or chi-square-style summaries are worth reading next.
Use McNemar test when the same items are measured twice and the paired 2x2 question is whether discordant pairs tilt in one direction. It is the paired-table counterpart to Fisher's exact test for small samples.
Use binomial test when one success count must be compared against one null proportion and the question is an exact one-proportion p-value before you move to sample-size or CI pages.
Use the sign test when paired before/after data should be reduced to direction only and the next question is an exact p-value for more positives than negatives before you move to Wilcoxon-style rank weighting.
Use the Wilcoxon signed-rank test when paired before/after data are ordinal or skewed continuous values and you want a paired nonparametric alternative to the paired t-test.
Use Mann-Whitney U when 2 groups are independent but ordinal or skewed enough that you want a rank-based alternative to the two-sample t-test.
Use Kruskal-Wallis when 3 or more independent groups should be compared with ranks instead of the mean-based assumptions behind one-way ANOVA. It is the natural 3-or-more-group extension after Mann-Whitney U.
Use the Friedman test when the same items are measured across 3 or more conditions and you want a repeated-measures nonparametric alternative to repeated-measures ANOVA. It is the natural 3-or-more-condition extension after the Wilcoxon signed-rank test.
Use relative risk versus odds ratio when two event-rate groups must be compared directly and the team needs to avoid mixing risk ratios with odds-based reporting.
Use Cohen's kappa when 2 raters classify the same items and percent agreement alone would hide how much of the match is expected from chance.
Use entropy and divergence when the question is uncertainty inside one distribution or mismatch between P and Q rather than a mean difference or classifier score.
Use ANOVA when you compare the mean outcome across 3 or more groups.
Use correlation when the first question is strength and direction of association.
Use regression to fit lines and explain relationships.
Need a non-parametric check? Try a permutation test.
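
Most of the threshold metrics in the guide above derive from the same four counts. The following is a minimal Python sketch, assuming TP, FP, TN, and FN are all positive (there is no zero-cell handling). The function names `confusion_metrics` and `post_test_probability` are illustrative and are not taken from any calculator on this page.

```python
from math import sqrt


def confusion_metrics(tp, fp, tn, fn, beta=1.0):
    """Threshold metrics from one binary confusion matrix (hypothetical helper)."""
    sens = tp / (tp + fn)   # sensitivity = recall = true positive rate
    spec = tn / (tn + fp)   # specificity = true negative rate
    prec = tp / (tp + fp)   # precision = PPV at the observed prevalence
    npv = tn / (tn + fn)    # negative predictive value
    b2 = beta ** 2
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": sens,
        "specificity": spec,
        "precision": prec,
        "npv": npv,
        "f_beta": (1 + b2) * prec * sens / (b2 * prec + sens),
        "balanced_accuracy": (sens + spec) / 2,
        "youden_j": sens + spec - 1,
        "mcc": (tp * tn - fp * fn) / mcc_den,
        "lr_pos": sens / (1 - spec),        # LR+ = TPR / FPR
        "lr_neg": (1 - sens) / spec,        # LR- = FNR / TNR
    }


def post_test_probability(pre_test_prob, likelihood_ratio):
    """Prior probability -> prior odds -> posterior odds -> posterior probability."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Made-up counts for illustration only.
m = confusion_metrics(tp=80, fp=10, tn=90, fn=20)
# sensitivity is 0.8, specificity is 0.9, LR+ is about 8
updated = post_test_probability(0.10, m["lr_pos"])  # about 8/17, i.e. roughly 0.47
```

Likelihood ratios multiply odds rather than probabilities, which is why the update goes through the explicit odds step.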

Before you run a test

Write your question in one line. Define the metric and the groups.

Check units, sample size, and missing values before testing.

Use confidence intervals with p-values for clearer reporting.

Share the calculator URL so others can reproduce the same setup.

Keep the analysis plan simple and fixed before you look at results.

Write the null and alternative hypotheses before opening any calculator.

Report effect sizes with uncertainty, not only a single p-value.
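
As one way to report an effect size with its uncertainty, the sketch below computes risk difference, risk ratio, and odds ratio from a single 2x2 table, each with a confidence interval. It assumes all four cells are positive and uses common textbook intervals (Wald for the risk difference, log-scale standard errors for RR and OR). The calculators listed on this page may use other methods, and the function name `two_by_two_summary` is hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def two_by_two_summary(a, b, c, d, level=0.95):
    """a, b = events / non-events in group 1; c, d = events / non-events in group 0.

    Returns (point estimate, lower, upper) for each measure.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    n1, n0 = a + b, c + d
    r1, r0 = a / n1, c / n0

    # Risk difference with a Wald interval.
    rd = r1 - r0
    se_rd = sqrt(r1 * (1 - r1) / n1 + r0 * (1 - r0) / n0)

    # Risk ratio with an interval built on the log scale.
    rr = r1 / r0
    se_log_rr = sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)

    # Odds ratio with an interval built on the log scale.
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    return {
        "risk_difference": (rd, rd - z * se_rd, rd + z * se_rd),
        "risk_ratio": (rr, exp(log(rr) - z * se_log_rr), exp(log(rr) + z * se_log_rr)),
        "odds_ratio": (
            odds_ratio,
            exp(log(odds_ratio) - z * se_log_or),
            exp(log(odds_ratio) + z * se_log_or),
        ),
    }


# Made-up counts: 30 of 100 events in group 1, 15 of 100 in group 0.
summary = two_by_two_summary(30, 70, 15, 85)
# risk difference is 0.15, risk ratio is 2.0, odds ratio is about 2.43
```

Sparse tables are better served by exact methods such as Fisher's exact test, since these normal-approximation intervals become unreliable when cells are small.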

Tools

Permutation test – exact p-value for A/B and paired samples
Run a permutation (randomization) test for two independent groups or paired samples.
Quick charts from pasted data — scatter / box plot
Paste spreadsheet data to generate scatter and box plots instantly.
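
For two small independent groups, an exact permutation test can be sketched by enumerating every relabeling. This is an illustration under stated assumptions (difference in means as the statistic, two-sided p-value, full enumeration), not the tool's actual implementation. The function name `exact_permutation_test` is hypothetical.

```python
from itertools import combinations


def exact_permutation_test(group_a, group_b):
    """Two-sided exact permutation p-value for the difference in means.

    Enumerates every way to assign len(group_a) of the pooled values to
    group A, so it is only practical for small samples.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def mean_diff(sum_a):
        return sum_a / n_a - (total - sum_a) / n_b

    observed = abs(mean_diff(sum(group_a)))
    extreme = 0
    count = 0
    for idx in combinations(range(len(pooled)), n_a):
        stat = abs(mean_diff(sum(pooled[i] for i in idx)))
        # Small tolerance so floating-point ties count as "at least as extreme".
        if stat >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count


# Made-up data: every value in A exceeds every value in B.
p_value = exact_permutation_test([12, 15, 14, 16], [9, 10, 11, 8])
# Only 2 of the 70 possible splits are this extreme, so p = 2/70.
```

The number of splits grows combinatorially, so larger samples usually call for a random subset of relabelings rather than full enumeration.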

Calculators

Sample Size Calculator
Plan sample size for surveys, means, and balanced A/B tests.
Power Analysis Calculator
Plan required sample size, achieved power, or minimum detectable effect for one mean, two means, one proportion, and two proportions using a fast normal-approximation workflow.
Effect Size Calculator
Calculate Cohen's d, Hedges' g, or eta-squared from two-group summaries or one-way ANOVA summaries, then read a cautious magnitude guide beside the value.
Confusion Matrix Calculator
Calculate confusion matrix metrics for binary classification from TP, FP, TN, and FN.
Confidence Interval & Hypothesis Test Wizard
Build confidence intervals and hypothesis tests for means and proportions with t/z workflows, Welch, paired, Wilson, and Newcombe methods.
Sensitivity and Specificity Calculator
Calculate sensitivity, specificity, false positive rate, false negative rate, balanced accuracy, and prevalence from TP, FP, TN, and FN for one binary threshold.
Likelihood Ratio Calculator
Calculate positive and negative likelihood ratios from TP, FP, TN, and FN.
Bayes Theorem Calculator
Update a prior probability with Bayes' theorem. Use either sensitivity/specificity for a diagnostic result or direct conditional probabilities for a general event.
Odds Probability Converter
Convert probability to odds or odds back to probability. Useful for Bayes updates, likelihood ratios, and diagnostic interpretation.
Post-Test Odds Calculator
Start from pre-test probability or prior odds, apply a likelihood ratio, and review post-test odds and posterior probability together.
Pre-Test Post-Test Probability Calculator
Convert pre-test probability into post-test probability with LR+ or LR-.
NPV & PPV Calculator
Calculate PPV, NPV, sensitivity, specificity, and observed prevalence from TP, FP, TN, and FN, then test how PPV and NPV shift when prevalence changes.
Youden’s J Calculator
Calculate Youden’s J from TP, FP, TN, and FN. Review sensitivity, specificity, false-positive rate, false-negative rate, and balanced accuracy at one threshold.
F-Beta Score Calculator
Calculate F-beta from precision and recall, or from TP, FP, and FN.
Precision & Recall Calculator
Calculate precision, recall, F1, specificity, prevalence, and accuracy from TP, FP, TN, and FN.
MCC Calculator
Calculate Matthews correlation coefficient (MCC), balanced accuracy, precision, recall, specificity, prevalence, and accuracy from TP, FP, TN, and FN for binary classification.
Balanced Accuracy Calculator
Calculate balanced accuracy, recall, specificity, precision, prevalence, and accuracy from TP, FP, TN, and FN.
ROC AUC Calculator
Calculate a binary ROC curve, AUC, and threshold table from pasted score,label rows.
Diagnostic Odds Ratio Calculator
Calculate diagnostic odds ratio from TP, FP, TN, and FN. Review DOR together with LR+, LR-, sensitivity, specificity, and prevalence at one binary threshold.
ARR NNT Calculator
Calculate absolute risk reduction, relative risk reduction, and NNT or NNH from control and experimental event rates.
Number Needed to Screen Calculator
Convert screening detection yield into number needed to screen and review prevalence, detection rate, and yield together.
Number Needed to Harm Calculator
Convert absolute risk increase into number needed to harm and review control risk, exposed risk, and signed risk change together.
Risk Difference Calculator
Compare two event-rate groups and review signed risk difference, absolute difference, and each group risk together.
Attributable Risk Calculator
Compare exposed and unexposed groups to review attributable risk and attributable fraction together.
Attributable Risk Percent Calculator
Estimate attributable risk percent from exposed and unexposed risk, or from a direct attributable risk and exposed-group risk.
Population Attributable Risk Calculator
Estimate population attributable risk from exposure prevalence and attributable risk, or from exposed and unexposed risk together.
Population Attributable Risk Percent Calculator
Estimate population attributable risk percent from exposure prevalence with exposed and unexposed risk, or from direct attributable risk with total population risk.
Risk Ratio from 2x2 Table Calculator
Enter event and non-event counts for two groups to read each group risk, the risk ratio, and risk difference from one 2x2 table.
Odds Ratio from 2x2 Table Calculator
Enter event and non-event counts for two groups to read each group odds and the odds ratio from one 2x2 table.
Odds Ratio Confidence Interval Calculator
Enter a 2x2 table to read the odds ratio with a 90%, 95%, or 99% confidence interval.
Relative Risk and Odds Ratio Calculator
Enter event and non-event counts for two groups to compare relative risk, odds ratio, and absolute risk difference in one place.
Cohen's Kappa Calculator
Calculate Cohen's kappa from paired labels or a contingency table.
Entropy and KL Divergence Calculator
Calculate entropy, cross-entropy, KL divergence, and JS divergence from probability vectors P and Q.
Correlation Calculator
Paste XY pairs to calculate Pearson or Spearman correlation, two-sided p-value, squared coefficient, sample count, and a readable scatter plot in your browser.
ANOVA Calculator
Paste grouped raw values to run a one-way ANOVA, inspect the ANOVA table, F statistic, p-value, eta-squared, and a readable grouped summary chart in your browser.
Kruskal-Wallis Test Calculator
Run a Kruskal-Wallis test for 3+ independent groups. Review H, the p-value, tie correction, and rank summaries.
Friedman Test Calculator
Run a Friedman test for 3+ paired groups. Review Q, the p-value, tie correction, Kendall's W, and rank summaries.
Linear Regression & Correlation — Scatter, OLS/WLS, R²
Paste x,y[,w] data or upload a CSV to fit OLS, WLS or Theil–Sen regression.
Error Propagation Calculator with Steps
Propagate uncertainty for sums, products, powers, and custom formulas with gradient×covariance steps, correlation options, and Monte Carlo checks.
Deviation Score (T-score) & Percentile Calculator
Calculate z-score, deviation score (T-score / hensachi), and percentile.
Normal Distribution and z-Score Calculator
Calculate normal probabilities, z-scores, percentiles, and inverse cutoffs from the mean and standard deviation.
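
The z-score, deviation score, and percentile conversions named in the last two entries can be sketched with Python's standard library. This assumes the usual convention of T-score = 50 + 10z and a normal model for the percentile. The function names are illustrative only.

```python
from statistics import NormalDist


def deviation_summary(x, mean, sd):
    """z-score, deviation score (T-score), and normal-model percentile for one value."""
    z = (x - mean) / sd
    return {
        "z": z,
        "t_score": 50 + 10 * z,                    # assumed T-score convention
        "percentile": 100 * NormalDist().cdf(z),   # share of the distribution below x
    }


def inverse_cutoff(percentile, mean, sd):
    """Value below which the given percentile of a normal(mean, sd) falls."""
    return NormalDist(mean, sd).inv_cdf(percentile / 100)


# Made-up example: a score of 70 when the mean is 60 and the SD is 10.
result = deviation_summary(70, mean=60, sd=10)
# z is 1.0, T-score is 60.0, percentile is about 84.1
cutoff = inverse_cutoff(97.5, mean=60, sd=10)  # about 79.6
```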

Related categories

Statistics & Probability (overview): Inference | CalcBE
Overview hub for statistics & probability. For focused lists, open Inference & tests, Probability & simulation, or Data visualization.
Business Finance & Accounting (Margin/Breakeven/NPV/IRR) | CalcBE
Estimate pricing, margin/markup, breakeven, and investment evaluation (NPV/IRR). Compare assumptions and share by URL.
Finance Functions (PV/FV/PMT/NPV/IRR) | CalcBE
Use calculators for PV/FV/PMT, NPV/IRR, and interest-rate conversion. Compare scenarios and share assumptions with URL.
Investing & Wealth Building (NISA/SIP/Compound/Retirement) | CalcBE
Compare long-term investing scenarios for NISA, SIP, compound growth, retirement, and savings goals. Save and share assumptions with URL.
Paint and Coating Calculators | CalcBE
Estimate paint amount (L/gal), coverage, cans, exterior/roof/fence/deck areas, paint cost, and industrial DFT/WFT—browser-only.
