
Entropy & KL Divergence Calculator


Calculate entropy, cross-entropy, KL divergence, and JS divergence from probability vectors. Paste P and Q, normalize when needed, and inspect how uncertainty or mismatch changes with the chosen log base.

This first release stays focused on discrete probability vectors. It is meant for quick inspection of uncertainty and distribution mismatch, not for continuous distributions or dashboard-style comparison across many models.

How to use

  1. Choose the metric first. Entropy needs only P, while cross-entropy, KL divergence, and JS divergence require both P and Q.
  2. Paste values separated by commas, spaces, tabs, or new lines. Turn normalization on if you are entering counts or unscaled weights.
  3. Read the main value beside the normalized vector preview so you can see whether zeros, imbalance, or normalization changed the interpretation.
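The steps above can be sketched in a few lines of Python. The `normalize` and `entropy` helpers here are illustrative, not the page's actual implementation:

```python
import math

def normalize(values):
    """Scale raw counts or weights so they sum to 1 (the normalization toggle)."""
    total = sum(values)
    return [v / total for v in values]

def entropy(p, base=2.0):
    """Shannon entropy of a discrete distribution; zero-probability terms contribute 0."""
    return -sum(x * math.log(x, base) for x in p if x > 0)

counts = [8, 4, 2, 2]   # raw counts, not probabilities
p = normalize(counts)   # -> [0.5, 0.25, 0.125, 0.125]
print(entropy(p))       # 1.75 bits
```

Changing `base` to `math.e` reports the same quantity in nats instead of bits.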

Discrete uncertainty and distribution mismatch

Use entropy to describe uncertainty inside one vector. Use cross-entropy, KL divergence, or JS divergence when you need to compare how one distribution differs from another.

How to read entropy and divergence

Entropy measures uncertainty inside one distribution. If probability mass is spread fairly evenly, entropy rises. If one category dominates, entropy drops.
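A quick sketch makes the contrast concrete; the `entropy` helper below is illustrative, assuming base-2 logs:

```python
import math

def entropy(p, base=2.0):
    """Shannon entropy; zero-probability terms contribute 0."""
    return -sum(x * math.log(x, base) for x in p if x > 0)

even   = [0.25, 0.25, 0.25, 0.25]   # mass spread evenly
skewed = [0.97, 0.01, 0.01, 0.01]   # one category dominates

print(entropy(even))    # 2.0 bits, the maximum for four outcomes
print(entropy(skewed))  # ~0.24 bits
```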

Entropy vs cross-entropy

Cross-entropy compares two distributions. It tells you how costly it would be to encode outcomes from P while pretending the world follows Q. If Q tracks P closely, cross-entropy stays near the entropy of P.
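A minimal sketch of that relationship, with illustrative helpers rather than this page's internals:

```python
import math

def entropy(p, base=2.0):
    return -sum(x * math.log(x, base) for x in p if x > 0)

def cross_entropy(p, q, base=2.0):
    """Expected code length when outcomes follow p but codes are built from q."""
    return -sum(pi * math.log(qi, base) for pi, qi in zip(p, q) if pi > 0)

p       = [0.5, 0.3, 0.2]
q_close = [0.45, 0.35, 0.2]   # tracks p closely
q_far   = [0.1, 0.1, 0.8]     # badly mismatched

print(entropy(p))                 # the lower bound, ~1.49 bits
print(cross_entropy(p, q_close))  # only slightly above H(P)
print(cross_entropy(p, q_far))    # much larger, ~2.72 bits
```

The gap between cross-entropy and the entropy of P is exactly KL(P||Q), which is why the two metrics move together.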

KL divergence is directional

KL divergence is not a distance in the usual symmetric sense. KL(P||Q) and KL(Q||P) ask different questions because the weighting comes from the distribution on the left.
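The asymmetry is easy to see numerically; this `kl` helper is a sketch, not the calculator's code:

```python
import math

def kl(p, q, base=2.0):
    """KL(P||Q): mismatch weighted by p, the distribution on the left."""
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)

p = [0.8, 0.1, 0.1]
q = [0.4, 0.3, 0.3]

print(kl(p, q))  # how surprising q is when outcomes follow p
print(kl(q, p))  # the reverse question, and a different number
```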

Zero probabilities matter

If Q gives zero probability to an event that still appears in P, cross-entropy and KL divergence become infinite. This is why smoothing or explicit small probabilities often matter in practical modeling workflows.
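A sketch of the failure and one common fix, additive smoothing; the epsilon value here is an arbitrary illustration:

```python
import math

def kl(p, q, base=2.0):
    """Return math.inf when q is zero where p still has positive mass."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        total += pi * math.log(pi / qi, base)
    return total

p = [0.5, 0.4, 0.1]
q = [0.6, 0.4, 0.0]    # zero mass where p still has 0.1

print(kl(p, q))        # inf

# Additive smoothing: give every outcome a small floor, then renormalize.
eps = 1e-6
q_smooth = [(qi + eps) / (1 + len(q) * eps) for qi in q]
print(kl(p, q_smooth))  # large but finite
```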

Frequently asked questions

What is the difference between entropy and cross-entropy?

Entropy measures the uncertainty inside one distribution P. Cross-entropy measures how many bits or nats are needed when data follow P but you encode with Q, so it depends on both distributions.

Why is KL divergence not symmetric?

KL divergence weights the mismatch by the reference distribution on the left. KL(P||Q) asks how surprising Q is when outcomes follow P, while KL(Q||P) asks the reverse question. Those weights are usually different.
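JS divergence fixes the asymmetry by comparing both distributions against their midpoint M = (P + Q) / 2. A sketch, with illustrative helpers:

```python
import math

def kl(p, q, base=2.0):
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)

def js(p, q, base=2.0):
    """Jensen-Shannon divergence: average KL of p and q against their midpoint M."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m, base) + 0.5 * kl(q, m, base)

p = [0.8, 0.1, 0.1]
q = [0.4, 0.3, 0.3]

print(js(p, q) == js(q, p))  # True: symmetric, unlike KL
```

Because M always covers every outcome that P or Q covers, JS divergence also stays finite even when KL does not.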

What happens if Q gives zero probability to an outcome that still appears in P?

Cross-entropy and KL divergence become infinite because Q would need an impossible code length for an event that still has positive mass in P. This page shows that case explicitly instead of hiding it.

Does the share URL include my vectors?

No. The share URL stores only lightweight settings such as metric, base, normalization, and decimal places. P and Q stay in your browser.
