Quick guide
- Start with sample size when the question is still about planning precision before data collection.
- Use power analysis when the planning question is required n, achieved power, or minimum detectable effect.
- Use confidence intervals and hypothesis tests to quantify uncertainty and compare groups.
- Use effect size after the inferential step when the next question is practical magnitude rather than only statistical significance.
- Use confusion matrix metrics after a classifier run when accuracy alone is not enough and you need precision, recall, specificity, or F1.
- Use specificity and sensitivity when the stakeholder language is already about true positive rate versus true negative rate and you need FPR or FNR beside them.
- Use likelihood ratio when you want an odds-style explanation of how strongly a positive or negative result shifts the odds of the condition once the threshold is fixed.
- Use Bayes theorem when you need to explain the full prior → evidence → posterior update rather than only a threshold metric.
- Use the odds/probability converter when the missing step is moving between raw probability and the odds that likelihood ratios actually multiply.
- Use post-test odds when you want to show the explicit odds step between a prior and a posterior instead of jumping straight to the final probability.
- Use pre/post-test probability when LR+ or LR− is already known and the next question is the updated probability for one case.
- Use NPV and PPV when the question is how reliable positive or negative calls remain after prevalence changes between training and production.
- Use Youden's J when you want one threshold score that rewards both sensitivity and specificity while you compare nearby cutoffs.
- Use F-beta when precision and recall are both important but one side should count more than the other, such as recall-heavy screening or precision-heavy review queues.
- Use precision and recall when one operating threshold is already fixed and the real trade-off is false alarms versus missed positives on the positive class.
- Use MCC when you want one summary metric that reacts to all four confusion-matrix cells and stays useful under class imbalance.
- Use balanced accuracy when plain accuracy may flatter the dominant class and you need equal weight on recall and specificity.
- Use ROC AUC when the model outputs scores and the next question is how threshold choice changes sensitivity and specificity across the full sweep.
- Use diagnostic odds ratio when you want one ratio that summarizes how strongly the threshold separates positive and negative classes. Keep LR+ and LR− beside it for interpretation.
- Use ARR / NNT when you are past diagnostic metrics and need absolute effect plus people-needed interpretation for an intervention comparison.
- Use number needed to screen when the question is how many people must be screened to detect one target case under a prevalence and detection-yield assumption.
- Use NNH when the next question is harm-focused people-needed interpretation from an absolute risk increase rather than a ratio or a broader ARR / NNT summary.
- Use risk difference when you want the signed absolute gap in percentage points before converting that gap into NNH or comparing it with ratio metrics.
- Use risk difference with confidence interval when the same 2x2 table should report both the signed absolute gap and its 90 / 95 / 99% interval width before you move to ARR / NNT or RR comparison pages.
- Use attributable risk when the question is how much of exposed-group risk can be attributed to exposure, not just the raw signed gap.
- Use attributable risk percent when the exposed-group question is the share of exposed risk attributable to exposure, not the population burden.
- Use population attributable risk when you need the population-level burden after combining attributable risk with exposure prevalence.
- Use population attributable fraction when the same population burden should be explained as a fraction or percent of total population risk rather than only an absolute risk value.
- Use population attributable risk percent when the reporting language should stay in percent form and you want the same population burden stated directly as a percentage of total population risk.
- Use attributable fraction among exposed when the interpretation must stay inside the exposed group and you want the attributable share stated directly in fraction or percent form.
- Use risk ratio from table when a simple 2x2 count table is already available and the fastest question is the direct risk multiplier plus the absolute gap.
- Use risk ratio with confidence interval when the same 2x2 table should report both the RR point estimate and its 90 / 95 / 99% interval width before you move to broader comparison pages.
- Use odds ratio from table when the same 2x2 counts should be reported in odds terms, such as case-control summaries or rare-event interpretations.
- Use odds ratio with confidence interval when the same 2x2 table should report both the OR point estimate and its 90 / 95 / 99% interval width before you move to broader comparison pages.
- Use Fisher exact test when the 2x2 table is small or sparse and the first question is an exact p-value before you decide whether RR, OR, or chi-square-style summaries are worth reading next.
- Use McNemar test when the same items are measured twice and the paired 2x2 question is whether discordant pairs tilt in one direction. It is the paired-data counterpart to the Fisher exact test, which assumes independent groups.
- Use binomial test when one success count must be compared against one null proportion and the question is an exact one-proportion p-value before you move to sample-size or CI pages.
- Use the sign test when paired before/after data should be reduced to direction only and the next question is an exact p-value for more positives than negatives before you move to Wilcoxon-style rank weighting.
- Use the Wilcoxon signed-rank test when paired before/after data are ordinal or skewed continuous values and you want a paired nonparametric alternative to the paired t-test.
- Use Mann-Whitney U when 2 groups are independent but ordinal or skewed enough that you want a rank-based alternative to the two-sample t-test.
- Use Kruskal-Wallis when 3 or more independent groups should be compared with ranks instead of the mean-based assumptions behind one-way ANOVA. It is the natural 3-or-more-group extension after Mann-Whitney U.
- Use the Friedman test when the same items are measured across 3 or more conditions and you want a repeated-measures nonparametric alternative to repeated-measures ANOVA. It is the natural 3-or-more-condition extension after the Wilcoxon signed-rank test.
- Use relative risk versus odds ratio when two event-rate groups must be compared directly and the team needs to avoid mixing risk ratios with odds-based reporting.
- Use Cohen's kappa when 2 raters classify the same items and percent agreement alone would hide how much of the match is expected from chance.
- Use entropy and divergence when the question is uncertainty inside one distribution or mismatch between P and Q rather than a mean difference or classifier score.
- Use ANOVA when you compare the mean outcome across 3 or more groups.
- Use correlation when the first question is strength and direction of association.
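- Use regression when you need a fitted line to predict one variable from another or to explain how they move together.
- Use a permutation test when you need a nonparametric check on a group difference that does not rely on normality assumptions.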
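For the confusion-matrix bullets above (precision and recall, F-beta, MCC, balanced accuracy, Youden's J), here is a minimal Python sketch of the standard formulas. The helper name `confusion_metrics` is hypothetical, the sketch assumes no denominator is zero, and it is not the calculators' own code.

```python
import math


def confusion_metrics(tp: int, fp: int, tn: int, fn: int, beta: float = 1.0) -> dict:
    """Binary-classification metrics from the four confusion-matrix cells.

    Assumes every denominator is non-zero; a production calculator would
    need explicit handling for empty rows or columns.
    """
    precision = tp / (tp + fp)        # share of positive calls that are correct
    recall = tp / (tp + fn)           # sensitivity / true positive rate
    specificity = tn / (tn + fp)      # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    b2 = beta ** 2
    # F-beta weights recall beta times as much as precision (beta = 1 gives F1)
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    balanced_accuracy = (recall + specificity) / 2
    youden_j = recall + specificity - 1
    # MCC uses all four cells, so it reacts to errors on both classes
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_beta": f_beta,
        "balanced_accuracy": balanced_accuracy,
        "youden_j": youden_j,
        "mcc": mcc,
    }


m = confusion_metrics(tp=80, fp=20, tn=90, fn=10)
print({k: round(v, 4) for k, v in m.items()})
```

Passing `beta=2.0` gives the recall-heavy F2 score; `beta=0.5` leans toward precision.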
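The likelihood-ratio, odds, post-test odds, and pre/post-test probability bullets all reduce to one textbook step: convert probability to odds, multiply by LR+ or LR−, and convert back. A minimal sketch under that formulation; the function names are illustrative only.

```python
def likelihood_ratios(tp, fp, tn, fn):
    """Return (LR+, LR-, diagnostic odds ratio) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos = sensitivity / (1 - specificity)   # how much a positive result raises the odds
    lr_neg = (1 - sensitivity) / specificity   # how much a negative result lowers the odds
    dor = lr_pos / lr_neg                      # equals (tp * tn) / (fp * fn)
    return lr_pos, lr_neg, dor


def prob_to_odds(p):
    return p / (1 - p)


def odds_to_prob(odds):
    return odds / (1 + odds)


def post_test_probability(pre_test_p, lr):
    """Bayes update on the odds scale: posterior odds = prior odds x LR."""
    post_odds = prob_to_odds(pre_test_p) * lr
    return odds_to_prob(post_odds)


lr_pos, lr_neg, dor = likelihood_ratios(tp=90, fp=20, tn=80, fn=10)
print(lr_pos, lr_neg, dor)                    # roughly 4.5, 0.125, 36
print(post_test_probability(0.10, lr_pos))    # roughly 0.333
```

A 10% pre-test probability is odds of 1:9; multiplying by LR+ = 4.5 gives odds of 1:2, which is a post-test probability of one third.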
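To see why PPV and NPV move with prevalence while sensitivity and specificity stay fixed, the sketch below recomputes both from expected cell shares. The helper name and the example values (sensitivity 0.90, specificity 0.80) are assumptions chosen for illustration.

```python
def ppv_npv(sensitivity, specificity, prevalence):
    """PPV and NPV from fixed test characteristics at a chosen prevalence."""
    tp = sensitivity * prevalence                  # expected true-positive share
    fp = (1 - specificity) * (1 - prevalence)      # expected false-positive share
    tn = specificity * (1 - prevalence)            # expected true-negative share
    fn = (1 - sensitivity) * prevalence            # expected false-negative share
    return tp / (tp + fp), tn / (tn + fn)


for prev in (0.50, 0.05):
    ppv, npv = ppv_npv(0.90, 0.80, prev)
    print(f"prevalence={prev:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

At 50% prevalence the sketch gives a PPV near 0.82; at 5% it falls to about 0.19, while NPV rises from about 0.89 to above 0.99.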
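For the risk-ratio and odds-ratio bullets, one common way to attach a 90 / 95 / 99% interval is a Wald interval on the log scale. The sketch assumes that method, the cell layout described in the docstring, and no zero cells; the calculators may use a different interval method or continuity correction, and `ratio_measures` is a hypothetical name.

```python
import math
from statistics import NormalDist


def ratio_measures(a, b, c, d, level=0.95):
    """RR and OR with log-scale Wald intervals from a 2x2 table.

    Layout (assumed): group 1 = a events, b non-events; group 2 = c events,
    d non-events. No zero-cell correction is applied, so every cell must be > 0.
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for 95%
    rr = (a / (a + b)) / (c / (c + d))
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    def interval(estimate, se):
        # build the interval on the log scale, then exponentiate back
        return (math.exp(math.log(estimate) - z * se),
                math.exp(math.log(estimate) + z * se))

    return {
        "rr": rr, "rr_ci": interval(rr, se_log_rr),
        "or": odds_ratio, "or_ci": interval(odds_ratio, se_log_or),
    }


res = ratio_measures(a=20, b=80, c=10, d=90)
print(res)
```

Because the interval is symmetric on the log scale, it is asymmetric around the ratio itself: the upper limit sits farther from the point estimate than the lower one.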
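The risk-difference and ARR / NNT / NNH bullets share one calculation: a signed gap between two event rates and its reciprocal. This sketch assumes a Wald interval for the gap and the sign convention "treatment minus control"; the helper name `risk_difference` is hypothetical.

```python
import math
from statistics import NormalDist


def risk_difference(events_t, n_t, events_c, n_c, level=0.95):
    """Signed risk difference (treatment minus control) with a Wald interval,
    plus NNT or NNH. Wald intervals are one simple choice; they behave poorly
    for small counts or risks near 0 or 1."""
    p_t, p_c = events_t / n_t, events_c / n_c
    rd = p_t - p_c
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    ci = (rd - z * se, rd + z * se)
    if rd < 0:
        people_needed = ("NNT", 1 / -rd)   # treatment lowers risk (ARR = -rd)
    elif rd > 0:
        people_needed = ("NNH", 1 / rd)    # treatment raises risk (ARI = rd)
    else:
        people_needed = ("none", math.inf)
    return rd, ci, people_needed


rd, ci, (label, value) = risk_difference(20, 100, 30, 100)
print(f"RD={rd:+.3f}  95% CI=({ci[0]:+.3f}, {ci[1]:+.3f})  {label}={value:.1f}")
```

NNT is usually reported rounded up to a whole person. The sketch leaves it unrounded because floating-point noise (0.3 − 0.2 is slightly less than 0.1 in binary) would make a naive `math.ceil` report 11 instead of 10 for this example.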
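The attributable-risk bullets differ mainly in which denominator they divide by. A sketch of the textbook risk-scale formulas, assuming the exposed risk, unexposed risk, and exposure prevalence are already known; the function name and example values are illustrative.

```python
def attributable_measures(risk_exposed, risk_unexposed, exposure_prevalence):
    """Exposed-group and population attributable measures on the risk scale."""
    ar = risk_exposed - risk_unexposed              # attributable risk (exposed minus unexposed)
    af_exposed = ar / risk_exposed                  # attributable fraction among the exposed
    total_risk = (exposure_prevalence * risk_exposed
                  + (1 - exposure_prevalence) * risk_unexposed)
    par = exposure_prevalence * ar                  # population attributable risk
    paf = par / total_risk                          # population attributable fraction
    return {
        "ar": ar,
        "ar_percent": 100 * af_exposed,
        "par": par,
        "paf": paf,
        "par_percent": 100 * paf,
    }


print(attributable_measures(risk_exposed=0.30, risk_unexposed=0.10,
                            exposure_prevalence=0.25))
```

The population attributable fraction computed this way agrees with Levin's formula, Pe(RR − 1) / (1 + Pe(RR − 1)); with RR = 3 and Pe = 0.25 both give one third.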
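The binomial test, the sign test, and the exact McNemar test share one engine: an exact binomial p-value. The sketch assumes the two-sided convention that sums every outcome no more likely than the observed one (the approach used, for example, by R's `binom.test`); other conventions, such as doubling one tail, can differ when p0 is not 0.5. All function names are illustrative.

```python
from math import comb


def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binomial_test(k, n, p0=0.5):
    """Exact two-sided binomial test.

    Adds up every outcome whose probability is no larger than the observed
    one; the 1e-7 relative slack keeps floating-point ties from being dropped.
    """
    observed = binom_pmf(k, n, p0)
    p_value = sum(
        binom_pmf(i, n, p0)
        for i in range(n + 1)
        if binom_pmf(i, n, p0) <= observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


def sign_test(differences):
    """Sign test: keep only the direction of each paired difference; drop zeros."""
    positives = sum(d > 0 for d in differences)
    negatives = sum(d < 0 for d in differences)
    return binomial_test(positives, positives + negatives, 0.5)


def exact_mcnemar(b, c):
    """Exact McNemar test: only the discordant pair counts b and c carry information."""
    return binomial_test(b, b + c, 0.5)


print(binomial_test(9, 10))    # 22/1024, about 0.0215
print(exact_mcnemar(9, 1))     # same counts, same p-value
```

Seen this way, the sign test and the exact McNemar test are both "9 of 10 informative pairs went one way" questions under a fair-coin null.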
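For the Fisher exact test bullet, a minimal sketch: with row and column totals fixed, the top-left cell of a 2x2 table follows a hypergeometric distribution. The two-sided p-value is assumed here to sum every table no more probable than the observed one; `fisher_exact_two_sided` is an illustrative name.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probability of every table with the same margins that is no
    more probable than the observed one, with a small relative slack so
    floating-point ties are kept.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x):
        # probability that the top-left cell equals x given the fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)   # feasible top-left values
    p_value = sum(table_prob(x) for x in range(lo, hi + 1)
                  if table_prob(x) <= observed * (1 + 1e-7))
    return min(1.0, p_value)


print(fisher_exact_two_sided(3, 1, 1, 3))   # 34/70, about 0.486
```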
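ROC AUC and Mann-Whitney U are closely linked: AUC equals the share of positive/negative pairs in which the positive case scores higher (ties count as half), and that pair count is U for the positive class. A brute-force sketch of that identity; it is quadratic in sample size, assumes both classes are present, and is meant for illustration rather than as the calculator's method.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Returns (auc, u), where u is the Mann-Whitney U statistic for the
    positive class, so auc = u / (n_pos * n_neg).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    u = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                u += 1.0        # positive ranked above negative
            elif p == q:
                u += 0.5        # tie counts as half a win
    return u / (len(pos) * len(neg)), u


auc, u = roc_auc([0.9, 0.8, 0.7, 0.6, 0.55, 0.4], [1, 1, 0, 1, 0, 0])
print(auc, u)   # 8/9 (about 0.889) and U = 8.0
```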
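For the Cohen's kappa bullet, the sketch below shows why percent agreement alone can mislead: kappa subtracts the agreement expected from each rater's own label frequencies. The function name and example labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    Undefined when chance agreement is 1 (both raters use a single label).
    """
    n = len(rater_a)
    # observed agreement: share of items where both raters chose the same label
    p_observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # chance agreement: product of each rater's label shares, summed over labels
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


# agreement table [[20, 5], [10, 15]] written out as paired labels
rater_a = ["yes"] * 25 + ["no"] * 25
rater_b = ["yes"] * 20 + ["no"] * 5 + ["yes"] * 10 + ["no"] * 15
print(cohens_kappa(rater_a, rater_b))   # about 0.4 (70% raw agreement, 50% by chance)
```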
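For the entropy and divergence bullet, a sketch of the standard definitions in bits (base-2 logarithms are an assumption here; natural logs give nats). Zero-probability terms in P are skipped, and KL divergence is infinite when Q assigns zero probability where P does not. The function names are illustrative.

```python
import math


def entropy(p):
    """Shannon entropy H(P) in bits; zero-probability terms contribute nothing."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def kl_divergence(p, q):
    """KL(P || Q) in bits. Not symmetric, and infinite if Q is zero where P is positive."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log2(pi / qi)
    return total


def cross_entropy(p, q):
    """H(P, Q) = H(P) + KL(P || Q)."""
    return entropy(p) + kl_divergence(p, q)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, finite, and between 0 and 1 bit."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


P, Q = [0.5, 0.5], [0.9, 0.1]
print(entropy(P), cross_entropy(P, Q), kl_divergence(P, Q), js_divergence(P, Q))
```

JS divergence stays finite even when KL does not, because the mixture M is positive wherever either P or Q is.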
Before you run a test
Write your question in one line. Define the metric and the groups.
Check units, sample size, and missing values before testing.
Use confidence intervals with p-values for clearer reporting.
Share the calculator URL so others can reproduce the same setup.
Keep the analysis plan simple and fixed before you look at results.
Write the null and alternative hypotheses before opening any calculator.
Report effect sizes with uncertainty, not only a single p-value.
Tools
- Permutation test – exact p-value for A/B and paired.
Run a permutation (randomization) test for two independent groups or paired samples.
- Quick charts from pasted data — scatter / box plot.
Paste spreadsheet data to generate scatter and box plots instantly.
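The permutation test idea can be illustrated with a small exact version for two independent groups: pool the values, enumerate every relabelling into groups of the original sizes, and count how often the mean difference is at least as extreme as the observed one. This is a sketch under those assumptions, not the tool's code; a paired design would instead flip the signs of within-pair differences, which is not shown.

```python
from itertools import combinations
from statistics import fmean


def permutation_test(group_a, group_b):
    """Exact two-sided permutation test for a difference in means.

    Enumerates every split of the pooled values, so it is only practical
    for small samples; larger inputs are usually handled by random resampling.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = fmean(group_a) - fmean(group_b)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # small tolerance so float noise does not drop ties with the observed split
        if abs(fmean(a) - fmean(b)) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / total


diff, p = permutation_test([12, 14, 15], [8, 9, 11])
print(diff, p)   # mean difference about 4.33, p = 2/20 = 0.1
```

With 3 values per group there are only 20 possible splits, and only the observed split and its mirror image reach a gap of 4.33, so p = 0.1.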
Calculators
- Sample Size Calculator.
Plan sample size for surveys, means, and balanced A/B tests.
- Power Analysis Calculator.
Plan required sample size, achieved power, or minimum detectable effect for one mean, two means, or one proportion.
- Effect Size Calculator.
Calculate Cohen's d, Hedges' g, or eta-squared from two-group summaries or one-way ANOVA summaries.
- Confusion Matrix Calculator.
Calculate confusion matrix metrics for binary classification from TP, FP, TN, and FN.
- Confidence Interval & Hypothesis Test Wizard.
Build confidence intervals and hypothesis tests for means and proportions with t/z workflows, including Welch, paired, and Wilson options.
- Sensitivity and Specificity Calculator.
Calculate sensitivity, specificity, false positive rate, false negative rate, balanced accuracy, and prevalence.
- Likelihood Ratio Calculator.
Calculate positive and negative likelihood ratios from TP, FP, TN, and FN.
- Bayes Theorem Calculator.
Update a prior probability with Bayes' theorem.
- Odds Probability Converter.
Convert probability to odds or odds back to probability. Useful for Bayes updates and likelihood ratios.
- Post-Test Odds Calculator.
Start from pre-test probability or prior odds, apply a likelihood ratio, and review post-test odds and posterior probability.
- Pre-Test Post-Test Probability Calculator.
Convert pre-test probability into post-test probability with LR+ or LR−.
- NPV & PPV Calculator.
Calculate PPV, NPV, sensitivity, specificity, and observed prevalence from TP, FP, TN, and FN, then test how PPV and NPV change with prevalence.
- Youden’s J Calculator.
Calculate Youden’s J from TP, FP, TN, and FN.
- F-Beta Score Calculator.
Calculate F-beta from precision and recall, or from TP, FP, and FN.
- Precision & Recall Calculator.
Calculate precision, recall, F1, specificity, prevalence, and accuracy from TP, FP, TN, and FN.
- MCC Calculator.
Calculate Matthews correlation coefficient (MCC), balanced accuracy, precision, recall, specificity, and prevalence.
- Balanced Accuracy Calculator.
Calculate balanced accuracy, recall, specificity, precision, prevalence, and accuracy from TP, FP, TN, and FN.
- ROC AUC Calculator.
Calculate a binary ROC curve, AUC, and threshold table from pasted score,label rows.
- Diagnostic Odds Ratio Calculator.
Calculate diagnostic odds ratio from TP, FP, TN, and FN.
- ARR NNT Calculator.
Calculate absolute risk reduction, relative risk reduction, and NNT or NNH from control and experimental event rates.
- Number Needed to Screen Calculator.
Convert screening detection yield into number needed to screen and review prevalence, detection rate, and yield.
- Number Needed to Harm Calculator.
Convert absolute risk increase into number needed to harm and review control risk, exposed risk, and signed risk difference.
- Risk Difference Calculator.
Compare two event-rate groups and review signed risk difference, absolute difference, and each group risk together.
- Attributable Risk Calculator.
Compare exposed and unexposed groups to review attributable risk and attributable fraction together.
- Attributable Risk Percent Calculator.
Estimate attributable risk percent from exposed and unexposed risk, or from a direct attributable risk.
- Population Attributable Risk Calculator.
Estimate population attributable risk from exposure prevalence and attributable risk, or from exposed and unexposed risk.
- Population Attributable Risk Percent Calculator.
Estimate population attributable risk percent from exposure prevalence with exposed and unexposed risk, or from direct inputs.
- Risk Ratio from 2x2 Table Calculator.
Enter event and non-event counts for two groups to read each group risk, the risk ratio, and the absolute risk difference.
- Odds Ratio from 2x2 Table Calculator.
Enter event and non-event counts for two groups to read each group odds and the odds ratio from one 2x2 table.
- Odds Ratio Confidence Interval Calculator.
Enter a 2x2 table to read the odds ratio with a 90%, 95%, or 99% confidence interval.
- Relative Risk and Odds Ratio Calculator.
Enter event and non-event counts for two groups to compare relative risk, odds ratio, and absolute risk difference.
- Cohen's Kappa Calculator.
Calculate Cohen's kappa from paired labels or a contingency table.
- Entropy and KL Divergence Calculator.
Calculate entropy, cross-entropy, KL divergence, and JS divergence from probability vectors P and Q.
- Correlation Calculator.
Paste XY pairs to calculate Pearson or Spearman correlation, two-sided p-value, squared coefficient, and sample count.
- ANOVA Calculator.
Paste grouped raw values to run a one-way ANOVA and inspect the ANOVA table, F statistic, p-value, and eta-squared.
- Kruskal-Wallis Test Calculator.
Run a Kruskal-Wallis test for 3+ independent groups. Review H, the p-value, tie correction, and rank summaries.
- Friedman Test Calculator.
Run a Friedman test for 3+ paired groups. Review Q, the p-value, tie correction, Kendall's W, and rank summaries.
- Linear Regression & Correlation — Scatter, OLS/WLS, R².
Paste x,y[,w] data or upload a CSV to fit OLS, WLS or Theil–Sen regression.
- Error Propagation Calculator with Steps.
Propagate uncertainty for sums, products, powers, and custom formulas with gradient×covariance steps and correlation terms.
- Deviation Score (T-score) & Percentile Calculator.
Calculate z-score, deviation score (T-score / hensachi), and percentile.
- Normal Distribution Calculator.
Calculate normal probabilities, z-scores, percentiles, and inverse cutoffs from the mean and standard deviation.
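The Effect Size Calculator entry lists Cohen's d, Hedges' g, and eta-squared. A sketch of the usual summary-statistic formulas, assuming a pooled standard deviation and the common approximation for Hedges' small-sample correction; the function names are illustrative.

```python
import math


def cohens_d_hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d with a pooled SD, plus Hedges' small-sample correction.

    Uses the common approximation J = 1 - 3 / (4 * df - 1) with
    df = n1 + n2 - 2; the exact correction uses gamma functions.
    """
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (mean1 - mean2) / pooled_sd
    j = 1 - 3 / (4 * df - 1)
    return d, j * d


def eta_squared(ss_between, ss_total):
    """Share of total variance explained by group membership in one-way ANOVA."""
    return ss_between / ss_total


print(cohens_d_hedges_g(10.0, 2.0, 20, 8.0, 2.0, 20))   # d = 1.0, g slightly smaller
print(eta_squared(ss_between=30.0, ss_total=120.0))     # 0.25
```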
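The deviation score and normal distribution entries rest on the same conversions: z = (x − mean) / SD, T = 50 + 10z, and percentile = 100 × Φ(z). A sketch using Python's standard `statistics.NormalDist`, assuming a normal model; the helper names are illustrative.

```python
from statistics import NormalDist


def deviation_scores(x, mean, sd):
    """z-score, T-score (deviation score, 50 + 10z), and percentile under a normal model."""
    z = (x - mean) / sd
    t_score = 50 + 10 * z
    percentile = 100 * NormalDist().cdf(z)
    return z, t_score, percentile


def cutoff_for_percentile(pct, mean, sd):
    """Inverse lookup: the raw value at a given percentile of N(mean, sd)."""
    return NormalDist(mean, sd).inv_cdf(pct / 100)


print(deviation_scores(65, mean=50, sd=10))          # z = 1.5, T = 65, about the 93rd percentile
print(cutoff_for_percentile(97.5, mean=50, sd=10))   # about 69.6
```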
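For the sample size and power planning entries, two textbook normal-approximation formulas are sketched below: the n needed to estimate one proportion within a margin, and the per-group n for comparing two means. These method choices are assumptions; the calculators may use t-based or other refinements, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def n_for_proportion(margin, p=0.5, level=0.95):
    """Sample size to estimate one proportion within +/- margin (normal approximation).
    p = 0.5 is the conservative default when no prior estimate is available."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


def n_per_group_two_means(delta, sd, alpha=0.05, power=0.80):
    """Per-group n for a two-sided comparison of two means with equal groups,
    normal approximation (t-based software usually gives a slightly larger n)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)


print(n_for_proportion(0.05))                   # 385
print(n_per_group_two_means(delta=0.5, sd=1))   # 63
```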
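The error propagation entry describes gradient × covariance steps. A minimal first-order sketch of that idea, σ_f = sqrt(gᵀ Σ g), applied to a product f = x · y with and without input correlation; the helper name and the example values are made up for illustration.

```python
import math


def propagate(gradient, covariance):
    """First-order (linear) uncertainty: sigma_f = sqrt(g^T * Sigma * g).

    gradient   -- partial derivatives of f at the input values
    covariance -- input covariance matrix as nested lists
    """
    n = len(gradient)
    variance = sum(gradient[i] * covariance[i][j] * gradient[j]
                   for i in range(n) for j in range(n))
    return math.sqrt(variance)


# f = x * y with x = 10 +/- 0.2 and y = 5 +/- 0.1
x, y, sx, sy = 10.0, 5.0, 0.2, 0.1
grad = [y, x]                      # df/dx = y, df/dy = x
for rho in (0.0, 0.5):             # correlation between the two inputs
    cov = [[sx ** 2, rho * sx * sy],
           [rho * sx * sy, sy ** 2]]
    print(rho, x * y, propagate(grad, cov))
```

Positive correlation between the inputs widens the uncertainty of a product here (from about 1.41 to about 1.73), which is why ignoring covariance can understate error.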
Related categories
- Statistics & Probability (overview): Inference | CalcBE
Overview hub for statistics & probability. For focused lists, open Inference & tests, Probability & simulation, or Data visualization.
- Business Finance & Accounting (Margin/Breakeven/NPV/IRR) | CalcBE
Estimate pricing, margin/markup, breakeven, and investment evaluation (NPV/IRR). Compare assumptions and share by URL.
- Finance Functions (PV/FV/PMT/NPV/IRR) | CalcBE
Use calculators for PV/FV/PMT, NPV/IRR, and interest-rate conversion. Compare scenarios and share assumptions by URL.
- Investing & Wealth Building (NISA/SIP/Compound/Retirement) | CalcBE
Compare long-term investing scenarios for NISA, SIP, compound growth, retirement, and savings goals. Save and share assumptions by URL.
- Paint and Coating Calculators | CalcBE
Estimate paint amount (L/gal), coverage, cans, exterior/roof/fence/deck areas, paint cost, and industrial DFT/WFT (browser-only).