Quick guide
- Start with sample size when the question is still about planning precision before data collection.
- Use power analysis when the planning question is required n, achieved power, or minimum detectable effect.
- Use CI/tests to quantify uncertainty and compare groups.
- Use effect size after the inferential step when the next question is practical magnitude rather than only statistical significance.
- Use confusion matrix metrics after a classifier run when accuracy alone is not enough and you need precision, recall, specificity, or F1.
- Use specificity and sensitivity when the stakeholder language is already about true positive rate versus true negative rate and you need FPR or FNR beside them.
- Use likelihood ratio when you want an odds-style explanation for how strongly a positive or negative result changes the evidence after the threshold is fixed.
- Use Bayes theorem when you need to explain the full prior → evidence → posterior update rather than only a threshold metric.
- Use the odds/probability converter when the missing step is moving between raw probability and the odds that likelihood ratios actually multiply.
- Use post-test odds when you want to show the explicit odds step between a prior and a posterior instead of jumping straight to the final probability.
- Use pre/post-test probability when LR+ or LR− is already known and the next question is the updated probability for one case.
- Use NPV and PPV when the question is how reliable positive or negative calls remain after prevalence changes between training and production.
- Use Youden's J when you want one threshold score that rewards both sensitivity and specificity while you compare nearby cutoffs.
- Use F-beta when precision and recall are both important but one side should count more than the other, such as recall-heavy screening or precision-heavy review queues.
- Use precision and recall when one operating threshold is already fixed and the real trade-off is false alarms versus missed positives on the positive class.
- Use MCC when you want one summary metric that reacts to all four confusion-matrix cells and stays useful under class imbalance.
- Use balanced accuracy when plain accuracy may flatter the dominant class and you need equal weight on recall and specificity.
- Use ROC AUC when the model outputs scores and the next question is how threshold choice changes sensitivity and specificity across the full sweep.
- Use diagnostic odds ratio when you want one ratio that summarizes how strongly the threshold separates positive and negative classes. Keep LR+ and LR- beside it for interpretation.
- Use ARR / NNT when you are past diagnostic metrics and need absolute effect plus people-needed interpretation for an intervention comparison.
- Use number needed to screen when the question is how many people must be screened to detect one target case under a prevalence and detection-yield assumption.
- Use NNH when the next question is harm-focused people-needed interpretation from an absolute risk increase rather than a ratio or a broader ARR / NNT summary.
- Use risk difference when you want the signed absolute gap in percentage points before converting that gap into NNH or comparing it with ratio metrics.
- Use risk difference with confidence interval when the same 2x2 table should report both the signed absolute gap and its 90 / 95 / 99% interval width before you move to ARR / NNT or RR comparison pages.
- Use attributable risk when the question is how much of exposed-group risk can be attributed to exposure, not just the raw signed gap.
- Use attributable risk percent when the exposed-group question is the share of exposed risk attributable to exposure, not the population burden.
- Use population attributable risk when you need the population-level burden after combining attributable risk with exposure prevalence.
- Use population attributable fraction when the same population burden should be explained as a fraction or percent of total population risk rather than only an absolute risk value.
- Use population attributable risk percent when the reporting language should stay in percent form and you want the same population burden stated directly as a percentage of total population risk.
- Use attributable fraction among exposed when the interpretation must stay inside the exposed group and you want the attributable share stated directly in fraction or percent form.
- Use risk ratio from table when a simple 2x2 count table is already available and the fastest question is the direct risk multiplier plus the absolute gap.
- Use risk ratio with confidence interval when the same 2x2 table should report both the RR point estimate and its 90 / 95 / 99% interval width before you move to broader comparison pages.
- Use odds ratio from table when the same 2x2 counts should be reported in odds terms, such as case-control summaries or rare-event interpretations.
- Use odds ratio with confidence interval when the same 2x2 table should report both the OR point estimate and its 90 / 95 / 99% interval width before you move to broader comparison pages.
- Use Fisher exact test when the 2x2 table is small or sparse and the first question is an exact p-value before you decide whether RR, OR, or chi-square-style summaries are worth reading next.
- Use McNemar test when the same items are measured twice and the paired 2x2 question is whether discordant pairs tilt in one direction. It is the small paired-table counterpart to Fisher-style exact reads.
- Use binomial test when one success count must be compared against one null proportion and the question is an exact one-proportion p-value before you move to sample-size or CI pages.
- Use the sign test when paired before/after data should be reduced to direction only and the next question is an exact p-value for more positives than negatives before you move to Wilcoxon-style rank weighting.
- Use the Wilcoxon signed-rank test when paired before/after data are ordinal or skewed continuous values and you want a paired nonparametric alternative to the paired t-test.
- Use Mann-Whitney U when 2 groups are independent but ordinal or skewed enough that you want a rank-based alternative to the two-sample t-test.
- Use Kruskal-Wallis when 3 or more independent groups should be compared with ranks instead of the mean-based assumptions behind one-way ANOVA. It is the natural 3-or-more-group extension after Mann-Whitney U.
- Use the Friedman test when the same items are measured across 3 or more conditions and you want a repeated-measures nonparametric alternative to repeated-measures ANOVA. It is the natural 3-or-more-condition extension after the Wilcoxon signed-rank test.
- Use relative risk versus odds ratio when two event-rate groups must be compared directly and the team needs to avoid mixing risk ratios with odds-based reporting.
- Use Cohen's kappa when 2 raters classify the same items and percent agreement alone would hide how much of the match is expected from chance.
- Use entropy and divergence when the question is uncertainty inside one distribution or mismatch between P and Q rather than a mean difference or classifier score.
- Use ANOVA when you compare the mean outcome across 3 or more groups.
- Use correlation when the first question is strength and direction of association.
- Use regression to fit lines and explain relationships.
- Need a non-parametric check? Try a permutation test.
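Many of the threshold bullets above reduce to arithmetic on the same four confusion-matrix counts. As a minimal sketch in Python (the counts below are illustrative, not from a real classifier):

```python
# Threshold metrics from one confusion matrix.
# Counts are illustrative, not from a real classifier.
tp, fp, tn, fn = 80, 15, 890, 20

sensitivity = tp / (tp + fn)                 # recall / true positive rate
specificity = tn / (tn + fp)                 # true negative rate
precision = tp / (tp + fp)                   # PPV at this threshold
f1 = 2 * precision * sensitivity / (precision + sensitivity)
youden_j = sensitivity + specificity - 1     # Youden's J
balanced_accuracy = (sensitivity + specificity) / 2
lr_pos = sensitivity / (1 - specificity)     # LR+
lr_neg = (1 - sensitivity) / specificity     # LR-
mcc = (tp * tn - fp * fn) / (
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
) ** 0.5                                     # Matthews correlation coefficient
```

Reading these side by side is exactly the comparison the bullets describe: precision and recall move with the threshold, while Youden's J, balanced accuracy, and MCC summarize all four cells at once.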
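The absolute-effect bullets (risk difference, ARR / NNT, NNH) are similarly small arithmetic. A sketch under assumed, illustrative event rates:

```python
# Absolute-effect metrics from assumed, illustrative event rates.
control_risk, treated_risk = 0.12, 0.08          # event rate per group

risk_difference = treated_risk - control_risk    # signed gap in risk
arr = control_risk - treated_risk                # absolute risk reduction
rrr = arr / control_risk                         # relative risk reduction
nnt = 1 / arr                                    # ~25 people treated per event avoided

# When an exposure raises a harm rate, the same step gives NNH.
harm_control, harm_exposed = 0.01, 0.03
ari = harm_exposed - harm_control                # absolute risk increase
nnh = 1 / ari                                    # ~50 exposed per extra harm
# In reports, round NNT / NNH up to the next whole person.
```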
Statistics (inference & tests): how to choose the right calculator path
This topic page works best when you treat it as a decision map rather than a flat list of tools. Start by writing the exact decision you need to make, then pick calculators in sequence so each output becomes an input to the next step. In practice, teams get faster and make fewer errors when they run a baseline model first, pressure-test assumptions second, and only then export a final number. For many workflows in this topic, a reliable sequence is to begin with Confidence Interval & Hypothesis Test Wizard, cross-check with Linear Regression & Correlation — Scatter, OLS/WLS, R², and finalize with Error Propagation Calculator with Steps when you need a publishable result.
How to choose calculators in this topic
- Define the decision question first: estimate, compare, optimize, or validate.
- Run one baseline scenario with conservative assumptions before trying edge cases.
- Separate planning assumptions from reporting assumptions so stakeholders can audit differences.
- Save URLs after each milestone so the same setup can be reproduced in review meetings.
Common mistakes
- Jumping directly to advanced tools without confirming baseline inputs and units.
- Mixing assumptions across calculators (time horizon, rounding rule, or category definition).
- Treating one scenario as a forecast instead of comparing multiple plausible ranges.
- Copying only final numbers and losing the parameter context needed for later audits.
Practical workflow example
Suppose your team must deliver a recommendation by end of day. Use the first 10 minutes to define scope, constraints, and acceptance criteria in plain language. Run a baseline calculation, then a conservative and an optimistic case using the same structure. If outputs diverge materially, capture the sensitivity driver and decide which assumption needs escalation. Only after this pass should you export or share numbers. This process keeps the topic useful for real decisions, not just one-off calculations.
When results will influence spending, policy, or operations, keep a short note beside each output that records source data date, assumptions, and rounding policy. That one step dramatically reduces rework when someone asks for a rerun next week.
Before you run a test
Write your question in one line. Define the metric and the groups.
Check units, sample size, and missing values before testing.
Use confidence intervals with p-values for clearer reporting.
Share the calculator URL so others can reproduce the same setup.
Keep the analysis plan simple and fixed before you look at results.
Write the null and alternative hypotheses before opening any calculator.
Report effect sizes with uncertainty, not only a single p-value.
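The last point, reporting an effect size with uncertainty rather than a lone p-value, can be sketched with the standard library alone. The group summaries below are illustrative, and the p-value uses a normal approximation rather than the exact t distribution:

```python
from statistics import NormalDist

# Illustrative two-group summaries (mean, sd, n); not real data.
m1, s1, n1 = 5.2, 1.1, 40
m2, s2, n2 = 4.6, 1.3, 42

se = (s1**2 / n1 + s2**2 / n2) ** 0.5       # Welch standard error
t = (m1 - m2) / se                          # Welch t statistic
# Normal approximation to the two-sided p-value (reasonable at this n;
# the exact test uses the t distribution with Welch degrees of freedom).
p = 2 * (1 - NormalDist().cdf(abs(t)))

# Pooled-SD Cohen's d, reported beside the p-value.
sp = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
d = (m1 - m2) / sp
# 95% CI for the mean difference (normal approximation again).
ci = ((m1 - m2) - 1.96 * se, (m1 - m2) + 1.96 * se)
```

Reporting "d with its CI, plus p" gives readers both practical magnitude and statistical uncertainty in one line.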
Tools
- Permutation test – exact p-value for A/B and paired.
Run a permutation (randomization) test for two independent groups or paired samples.
- Quick charts from pasted data — scatter / box plot.
Paste spreadsheet data to generate scatter and box plots instantly.
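As a rough illustration of what the permutation test above does internally, here is a minimal two-sample version in pure Python; the data values and the 10,000-shuffle count are illustrative choices:

```python
import random

# Minimal two-sample permutation test on the mean difference.
# Data values are illustrative.
a = [5.1, 4.8, 5.6, 5.3, 4.9, 5.4]
b = [4.2, 4.5, 4.1, 4.6, 4.4, 4.0]

def mean(xs):
    return sum(xs) / len(xs)

observed = mean(a) - mean(b)
pooled = a + b
rng = random.Random(0)           # fixed seed for reproducibility

n_perm = 10_000
extreme = 0
for _ in range(n_perm):
    rng.shuffle(pooled)          # random relabeling of group membership
    diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
    if abs(diff) >= abs(observed):
        extreme += 1

# Add-one correction keeps the estimate away from an impossible zero.
p = (extreme + 1) / (n_perm + 1)
```

A dedicated calculator can enumerate permutations exactly for small samples; the Monte Carlo version above trades exactness for speed on larger inputs.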
Calculators
- Sample Size Calculator | Surveys, Means & A/B Tests.
Plan sample size for surveys, means, and balanced A/B tests.
- Power Analysis Calculator | Sample Size, Power & MDE.
Plan required sample size, achieved power, or minimum detectable effect for one mean, two means, one proportion.
- Effect Size Calculator | Cohen's d, Hedges' g & Eta-Squared.
Calculate Cohen's d, Hedges' g, or eta-squared from two-group summaries or one-way ANOVA summaries.
- Confusion Matrix Calculator | Precision, Recall & F1.
Calculate confusion matrix metrics for binary classification from TP, FP, TN, and FN.
- Confidence Interval & Hypothesis Test Wizard.
Build confidence intervals and hypothesis tests for means and proportions with t/z workflows, including Welch, paired, and Wilson options.
- Sensitivity & Specificity Calculator.
Calculate sensitivity, specificity, false positive rate, false negative rate, balanced accuracy, and prevalence.
- Likelihood Ratio Calculator | LR+ / LR- from Sensitivity & Specificity.
Calculate positive and negative likelihood ratios from TP, FP, TN, and FN.
- Bayes Theorem Calculator | Prior, Posterior & Bayes Factor.
Update a prior probability with Bayes' theorem.
- Odds Probability Converter | Probability to odds and back.
Convert probability to odds or odds back to probability. Useful for Bayes updates and likelihood ratios.
- Post-Test Odds Calculator | Prior odds & LR update.
Start from pre-test probability or prior odds, apply a likelihood ratio, and review post-test odds and posterior.
- Pre-Test / Post-Test Probability Calculator.
Convert pre-test probability into post-test probability with LR+ or LR−.
- NPV & PPV Calculator | Predictive Values and Prevalence Shift.
Calculate PPV, NPV, sensitivity, specificity, and observed prevalence from TP, FP, TN, and FN, then test how PPV and NPV shift when prevalence changes.
- Youden’s J Calculator | Sensitivity + Specificity - 1.
Calculate Youden’s J from TP, FP, TN, and FN.
- F-Beta Score Calculator.
Calculate F-beta from precision and recall, or from TP, FP, and FN.
- Precision & Recall Calculator | F1, Specificity & Prevalence.
Calculate precision, recall, F1, specificity, prevalence, and accuracy from TP, FP, TN, and FN.
- MCC Calculator.
Calculate Matthews correlation coefficient (MCC), balanced accuracy, precision, recall, specificity, and prevalence.
- Balanced Accuracy Calculator.
Calculate balanced accuracy, recall, specificity, precision, prevalence, and accuracy from TP, FP, TN, and FN.
- ROC AUC Calculator | ROC Curve, AUC & Threshold Table.
Calculate a binary ROC curve, AUC, and threshold table from pasted score,label rows.
- Diagnostic Odds Ratio Calculator | DOR from TP, FP, TN, FN.
Calculate diagnostic odds ratio from TP, FP, TN, and FN.
- ARR NNT Calculator | Absolute risk reduction, RRR, NNT, NNH.
Calculate absolute risk reduction, relative risk reduction, and NNT or NNH from control and experimental event rates.
- Number Needed to Screen Calculator | Detection yield to NNS.
Convert screening detection yield into number needed to screen and review prevalence, detection rate, and yield.
- Number Needed to Harm Calculator | Absolute risk increase.
Convert absolute risk increase into number needed to harm and review control risk, exposed risk, and signed risk.
- Risk Difference Calculator | Signed and absolute risk gap.
Compare two event-rate groups and review signed risk difference, absolute difference, and each group's risk together.
- Attributable Risk Calculator | Exposed vs unexposed risk gap.
Compare exposed and unexposed groups to review attributable risk and attributable fraction together.
- Attributable Risk Percent Calculator.
Estimate attributable risk percent from exposed and unexposed risk, or from a direct attributable risk value.
- Population Attributable Risk Calculator.
Estimate population attributable risk from exposure prevalence and attributable risk, or from exposed and unexposed risks.
- Population Attributable Risk Percent Calculator.
Estimate population attributable risk percent from exposure prevalence with exposed and unexposed risk.
- Risk Ratio from 2x2 Table Calculator | Quick relative risk.
Enter event and non-event counts for two groups to read each group's risk, the risk ratio, and the absolute risk difference.
- Odds Ratio from 2x2 Table Calculator | Quick OR from counts.
Enter event and non-event counts for two groups to read each group's odds and the odds ratio from one 2x2 table.
- Odds Ratio Confidence Interval Calculator | 2x2 table OR.
Enter a 2x2 table to read the odds ratio with a 90%, 95%, or 99% confidence interval.
- Relative Risk and Odds Ratio Calculator.
Enter event and non-event counts for two groups to compare relative risk, odds ratio, and absolute risk difference.
- Cohen's Kappa Calculator | Inter-Rater Agreement.
Calculate Cohen's kappa from paired labels or a contingency table.
- Entropy & KL Divergence Calculator.
Calculate entropy, cross-entropy, KL divergence, and JS divergence from probability vectors P and Q.
- Correlation Calculator | Pearson & Spearman from XY Data.
Paste XY pairs to calculate Pearson or Spearman correlation, the two-sided p-value, squared coefficient, and sample count.
- ANOVA Calculator | One-Way F-Test from Grouped Data.
Paste grouped raw values to run a one-way ANOVA and inspect the ANOVA table, F statistic, p-value, and eta-squared.
- Kruskal-Wallis Test Calculator | Rank-based 3+ group p-value.
Run a Kruskal-Wallis test on 3 or more independent groups.
- Friedman Test Calculator | Paired rank-based 3+ group p-value.
Run a Friedman test on 3 or more paired groups.
- Linear Regression & Correlation — Scatter, OLS/WLS, R².
Paste x,y[,w] data or upload a CSV to fit OLS, WLS, or Theil–Sen regression.
- Error Propagation Calculator with Steps.
Propagate uncertainty for sums, products, powers, and custom formulas with gradient×covariance steps and support for correlated inputs.
- Deviation Score (T-score) & Percentile Calculator.
Calculate z-score, deviation score (T-score / hensachi), and percentile.
- Normal Distribution Calculator | z-Score, Percentile & CDF.
Calculate normal probabilities, z-scores, percentiles, and inverse cutoffs from the mean and standard deviation.
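Several of the diagnostic calculators above chain together: sensitivity and specificity give a likelihood ratio, the likelihood ratio multiplies pre-test odds, and the post-test odds convert back to a probability. A sketch of that chain with illustrative numbers:

```python
# Pre-test -> post-test update through a likelihood ratio.
# Numbers are illustrative, not clinical guidance.
sensitivity, specificity = 0.90, 0.80
pre_test_prob = 0.10                                   # prior / prevalence

lr_pos = sensitivity / (1 - specificity)               # LR+ = 4.5
pre_odds = pre_test_prob / (1 - pre_test_prob)         # probability -> odds
post_odds = pre_odds * lr_pos                          # apply the positive result
post_test_prob = post_odds / (1 + post_odds)           # odds -> probability

# Cross-check against Bayes' theorem applied directly.
bayes = (pre_test_prob * sensitivity) / (
    pre_test_prob * sensitivity
    + (1 - pre_test_prob) * (1 - specificity)
)
```

The odds route and the direct Bayes route agree, which is exactly why the converter and post-test odds pages make useful intermediate checks.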
How to use this calculator effectively
This guide helps you use Statistics (inference & tests) in a repeatable way: define a baseline, change one variable at a time, and interpret outputs with explicit assumptions before you share or act on results.
How it works
The page applies deterministic logic to your inputs and shows rounded output for readability. Treat it as a comparison workflow: run one baseline case, adjust a single parameter, and measure both absolute and percentage deltas. If a result seems off, verify units, time basis, and sign conventions before drawing conclusions. This approach keeps your analysis reproducible across teammates and sessions.
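The baseline-then-one-change comparison described here needs only two numbers per scenario; the values below are illustrative:

```python
# One baseline, one changed scenario, two deltas (illustrative values).
baseline = 0.250
scenario = 0.275                        # same setup, one parameter changed

abs_delta = scenario - baseline         # absolute movement
pct_delta = 100 * abs_delta / baseline  # movement relative to the baseline
# Here: +0.025 absolute, +10% against the baseline.
```

Recording both deltas prevents a small absolute change from being dismissed when the baseline itself is small, and vice versa.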
When to use
Use this page when you need a fast estimate, a classroom check, or a practical what-if comparison. It works best for planning and prioritization steps where you need direction and magnitude quickly before investing in deeper modeling, manual spreadsheets, or formal external review.
Common mistakes to avoid
- Changing multiple parameters at once, which hides the true cause of output movement.
- Mixing units (percent vs decimal, monthly vs yearly, gross vs net) across scenarios.
- Comparing with another tool without aligning defaults, constants, and rounding rules.
- Using rounded display values as exact downstream inputs without re-checking precision.
Interpretation and worked example
Run a baseline scenario and keep that result visible. Next, modify one assumption to reflect your realistic alternative and compare direction plus size of change. If the direction matches your domain expectation and the size is plausible, your setup is usually coherent. If not, check hidden defaults, boundary conditions, and interpretation notes before deciding which scenario to adopt.
FAQ
What should I do first on this page?
Start with the minimum required inputs or the first action shown near the primary button. Keep optional settings at defaults for a baseline run, then change one setting at a time so you can explain what caused each output change.
Why does this page differ from another tool?
Different pages often use different defaults, units, rounding rules, or assumptions. Align those settings before comparing outputs. If differences remain, compare each intermediate step rather than only the final number.
How reliable are the displayed values?
Values are computed in the browser and rounded for display. They are good for planning and educational checks, but for regulated or high-stakes decisions you should validate assumptions with official guidance or professional review.
Can I share and reproduce this result?
Yes. Use the share or URL controls when available. Keep a baseline case and one changed case so others can reproduce your reasoning and verify that the direction and scale of change are consistent.
Is my input uploaded somewhere?
Core calculations run locally in your browser. Some pages encode parameters in a shareable URL, but no automatic upload is performed unless you explicitly share that link.
How to use Statistics (inference & tests) effectively
Topic overview
This topic page connects related methods, calculators, and practical contexts. Use it as an entry point: identify your objective first, then jump to the tool that isolates one decision variable and compare outputs with your constraints.
Recommended reading order
Start with the conceptual entry, then open one calculator page, then return here for alternatives. Reusing this loop avoids jumping directly into advanced inputs before you have enough context to interpret outputs correctly.
Cross-page consistency
Keep terminology aligned across calculators in the same topic family. If names or units drift between pages, the same user intent can produce conflicting interpretations even when numerical outcomes appear similar.
Quality checks
After each calculation, confirm input units, baseline assumptions, and edge-case handling. Topic-level consistency checks help your team retain interpretability over time and reduce accidental misinterpretation in future edits.