

Confusion Matrix Calculator


Enter TP, FP, TN, and FN to calculate accuracy, precision, recall, specificity, F1 score, and prevalence from a binary confusion matrix.

This first release is intentionally binary-only. Use it when you need a quick audit of classification metrics without moving into multiclass, threshold tuning, or ROC/PR curves.

How to use

  1. Enter the confusion matrix counts: TP, FP, TN, and FN.
  2. Optionally rename the positive and negative classes so the matrix reads like your own dataset.
  3. Read precision and recall beside accuracy, especially when one class is rare.
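The steps above can be sketched as a small function. This is a minimal illustration of the standard formulas the page computes, not the page's actual implementation; the function name and example counts are made up for the demonstration.

```python
def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute the six metrics this page reports from a binary confusion matrix."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "prevalence": (tp + fn) / total,  # share of actual positives
    }

# Illustrative counts only
print(confusion_metrics(tp=40, fp=10, tn=45, fn=5))
```

Each metric is a simple ratio of the four counts, which is what makes a binary matrix easy to audit by hand.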

Wave 3 statistics expansion

Binary classification metrics from one matrix

Use this page after a model or rule has already produced binary predictions. It is a readout page for classification metrics, not a threshold-tuning workflow.


Accuracy is not the whole story

Accuracy can stay high when the negative class dominates. In that situation, a classifier may look fine overall while still missing many positive cases. That is why this page keeps recall, specificity, precision, and prevalence beside accuracy.
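A hypothetical imbalanced dataset makes this concrete. The counts below are invented for illustration: 10 positives among 1,000 cases, with a classifier that catches only 2 of them.

```python
# Hypothetical imbalanced dataset: 10 positives among 1,000 cases.
tp, fn = 2, 8        # only 2 of the 10 actual positives are caught
tn, fp = 985, 5      # the dominant negative class is mostly handled correctly

accuracy = (tp + tn) / (tp + fp + tn + fn)   # 0.987 — looks excellent
recall = tp / (tp + fn)                      # 0.2 — 80% of positives missed
print(accuracy, recall)
```

Accuracy of 98.7% hides a recall of 20%, which is exactly why this page reports them side by side.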

Precision vs recall

Precision asks, “when the model says positive, how often is it right?” Recall asks, “of all actual positives, how many did it catch?” Improving one can hurt the other, so the right balance depends on the cost of false positives versus false negatives.
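The trade-off can be seen by comparing two hypothetical classifiers on the same data. Both the counts and the "strict"/"loose" framing below are invented for illustration.

```python
# Two hypothetical classifiers evaluated on 100 actual positives.
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    return tp / (tp + fp), tp / (tp + fn)

strict = precision_recall(tp=60, fp=5, fn=40)    # rarely flags, so few false positives
loose = precision_recall(tp=90, fp=60, fn=10)    # flags often, so few false negatives
print(strict)  # high precision (~0.92), lower recall (0.60)
print(loose)   # lower precision (0.60), high recall (0.90)
```

Neither classifier is simply "better": the strict one suits costly false positives, the loose one suits costly false negatives.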

Binary-first release

This first release stays with binary classification so each metric remains easy to audit from TP, FP, TN, and FN. Multiclass matrices, ROC curves, and threshold sweeps belong in a later expansion rather than being mixed into the first version.

Frequently asked questions

Why is accuracy alone not enough?

Accuracy can look high when one class is rare. In imbalanced datasets, precision, recall, specificity, and prevalence often tell you more about model behavior than accuracy alone.

What is the difference between precision and recall?

Precision asks how often predicted positives are correct. Recall asks how many actual positives the model catches. One focuses on false positives, the other on false negatives.

Does this page support multiclass confusion matrices?

No. This first release is intentionally limited to binary classification so the core metrics stay easy to audit and compare.

Does the share URL include my counts or labels?

No. The share URL stores only lightweight settings such as decimal places. Counts and custom labels stay in your browser.
