How to use
- Choose paired labels if each line holds one item with both raters' labels, or contingency table if the counts are already summarized in a matrix.
- Run the calculator and compare observed agreement against expected agreement from the category marginals.
- Interpret kappa with caution when one category dominates, because high percent agreement can still coexist with only modest agreement beyond chance.
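The comparison the calculator makes can be sketched in a few lines. This is a minimal illustration, not the calculator's actual code; the function name and example labels are made up.

```python
from collections import Counter

def cohens_kappa(a, b):
    """Unweighted Cohen's kappa for two raters labeling the same items."""
    n = len(a)
    # Observed agreement: fraction of items where the raters match.
    po = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement from the category marginals alone.
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[c] * cb[c] for c in set(ca) | set(cb)) / (n * n)
    return (po - pe) / (1 - pe)

a = ["yes", "yes", "no", "yes", "no", "yes"]
b = ["yes", "no", "no", "yes", "no", "yes"]
print(round(cohens_kappa(a, b), 3))
```

Here the raters match on 5 of 6 items (83% raw agreement), but because "yes" dominates both raters' totals, expected agreement is 0.5 and kappa lands at 0.667.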
Agreement beyond chance for 2 raters
Use this page when 2 people, systems, or coding passes classify the same items. If your main question is model thresholding, use ROC AUC or confusion matrix instead.
Inputs
Paste the raw ratings or the already-counted matrix. Counts stay in your browser and are not included in the share URL.
Run the calculator to compare observed agreement against chance agreement.
Agreement matrix
Category totals
Compare each category's row marginal, column marginal, and diagonal agreement count. This helps explain why kappa can move even when raw agreement looks similar.
| Category | Rater A total | Rater B total | Diagonal agreement |
|---|---|---|---|
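Each column of that table can be read straight off the agreement matrix. A minimal sketch, using a hypothetical 2x2 matrix with rows as Rater A and columns as Rater B:

```python
# Hypothetical agreement matrix: rows are Rater A, columns are Rater B.
matrix = [
    [40, 5],   # A said "yes": B agreed 40 times, said "no" 5 times
    [10, 45],  # A said "no"
]
row_totals = [sum(row) for row in matrix]              # Rater A totals
col_totals = [sum(col) for col in zip(*matrix)]        # Rater B totals
diagonal = [matrix[i][i] for i in range(len(matrix))]  # agreement counts
print(row_totals, col_totals, diagonal)
```

A shift in the marginals changes expected agreement even if the diagonal counts stay put, which is why kappa can move while raw agreement does not.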
How to read kappa
Percent agreement says how often the raters matched. Cohen's kappa asks how much of that match remains after subtracting the agreement you would expect from the marginals alone.
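That subtraction can be written out directly from a summarized matrix. A sketch under the same 2x2 assumption as above (rows Rater A, columns Rater B); the function name is illustrative:

```python
def kappa_from_matrix(m):
    """Return (observed, expected, kappa) for a square agreement matrix."""
    n = sum(sum(row) for row in m)
    # Observed agreement: diagonal counts over total items.
    po = sum(m[i][i] for i in range(len(m))) / n
    # Expected agreement: product of each category's marginals.
    pe = sum(sum(row) * sum(col) for row, col in zip(m, zip(*m))) / (n * n)
    return po, pe, (po - pe) / (1 - pe)

po, pe, kappa = kappa_from_matrix([[40, 5], [10, 45]])
```

For this matrix, observed agreement is 0.85 and expected agreement is 0.50, so kappa = (0.85 - 0.50) / (1 - 0.50) = 0.70.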
Why high agreement can still mean modest kappa
If one category is very common, raters can agree often just by falling into that category together. In that setting expected agreement becomes large, so kappa may stay modest even when raw agreement feels high.
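A quick numeric illustration with a made-up, heavily skewed matrix shows the effect:

```python
# Hypothetical skewed matrix: one category dominates both raters' totals.
# Rows are Rater A, columns are Rater B.
skewed = [
    [90, 4],  # both raters usually pick the common category
    [4, 2],
]
n = 100
po = (90 + 2) / n                 # 0.92 raw agreement
pe = (94 * 94 + 6 * 6) / (n * n)  # 0.8872 expected from the marginals
kappa = (po - pe) / (1 - pe)      # roughly 0.29
```

Raw agreement is 92%, yet kappa is only about 0.29, because almost all of that agreement was already expected from the lopsided marginals.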
Agreement analysis is not the same as classifier evaluation
Use confusion matrix when one axis is model prediction and the other is truth. Use this page when both sides are raters and the main question is agreement beyond chance.
Frequently asked questions
Why is percent agreement alone not enough?
Percent agreement treats every agreement as equally informative, even when raters would agree often by chance because one category dominates. Cohen's kappa subtracts that chance agreement before reporting agreement beyond chance.
Why can kappa stay low even when agreement looks high?
When one category is much more common than the others, expected agreement by chance becomes large. In that situation the observed agreement may look high while kappa stays modest because much of the agreement was expected from the marginals.
How is this different from a confusion matrix?
A confusion matrix is usually used for model predictions against truth. Cohen's kappa is an agreement analysis between 2 raters, where the main question is how much agreement remains after accounting for chance.
Does this page include weighted kappa?
No. This first release is limited to unweighted Cohen's kappa for nominal categories and 2 raters only.
Related
Comments (optional)
To reduce load, comments are fetched only when needed.