What is a Dirichlet distribution?
A Dirichlet distribution is a distribution over probability vectors (x1,…,xK) where each component is non‑negative and the total sums to 1. This space is called a simplex.
- α (alpha) can be interpreted like pseudo‑counts. The relative sizes of α determine the mean vector.
- α0 = Σα_i is the concentration (strength): larger α0 ⇒ tighter around the mean; smaller α0 ⇒ more variability.
- If some α_i < 1, samples tend to be sparse and stick to corners/edges; if all α_i > 1, mass is often inside the simplex.
- K=2 is a special case:
x1 ~ Beta(α1,α2)(this tool shows the Beta overlay and links to the Beta tool).
Common use cases: Bayesian priors for categorical probabilities, topic proportions, mixture weights, and probability‑like test data. You don’t need to enter personal information to use it.
Presets
Pick a practical preset (it regenerates instantly; you can tweak after applying).
Tip: For large K, use profile JSON for sharing instead of long URLs.
Generator
Choose a parameterization, generate samples, then inspect means, marginals, and diagnostics.
Per-component stats
| Component | Theory mean | Sample mean | Theory var | Sample var |
|---|
Samples preview (first 20)
Profile JSON (save/restore settings)
Share URLs contain settings only. For large K, use profile JSON to save/restore without long URLs.
Tip: Don’t include confidential labels (customer names, etc.) in shared profiles.
How to use this tool
Use this page to generate probability vectors that must stay non-negative and sum to 1.
Use in 3 steps
- Start with a small dimension such as
K=3and a preset that is easy to interpret. - Generate the sample, then review theoretical means, marginals, and row previews together.
- Change one
αvalue or the total concentration at a time so you can isolate mean shifts from concentration shifts.
How to read the result
Each row is one probability vector. The means show the expected share of each component, while the concentration controls how tightly samples stay near that mean. Because all components must sum to 1, increases in one component reduce room for others.
Boundary checks
- If any
α_i<1, expect more mass near simplex corners or edges. - Rounded exports can make a displayed row look slightly different from an exact sum of 1.
- When
K=2, compare with the Beta tool because it is the matching special case.