Result
PMF formula
P(X=k)=C(K,k)·C(N−K,n−k)/C(N,n)
Tip: “at least one” is P(X≥1)=1−P(X=0).
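As a minimal sketch of the formula above, the PMF can be evaluated directly with exact integer binomial coefficients. The function names `hypergeom_pmf` and `prob_at_least_one` are illustrative choices here, not names used by the tool.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) when drawing n items without replacement from N items, K of them successes."""
    # Outside the support [max(0, n-(N-K)), min(n, K)] the probability is zero.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    # C(K,k) * C(N-K,n-k) / C(N,n), using exact integers before the final division.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def prob_at_least_one(N: int, K: int, n: int) -> float:
    """Complement trick: P(X >= 1) = 1 - P(X = 0)."""
    return 1.0 - hypergeom_pmf(0, N, K, n)
```

Exact integer arithmetic is convenient for small and moderate inputs; very large populations are usually better served by a log-space evaluation.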
Distribution (PMF table & bar chart)
| k | P(X=k) | CDF |
|---|---|---|
Simulation (Monte Carlo)
Use trials and seed to reproduce runs. For large ranges, the tool bins the histogram to stay fast.
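A seeded simulation of this kind might look like the sketch below. It is an assumption-laden illustration: the tool's actual random generator and binning strategy are not described here, and `simulate_hypergeom` is a hypothetical name.

```python
import random
from collections import Counter


def simulate_hypergeom(N: int, K: int, n: int, trials: int = 10_000, seed: int = 0) -> dict:
    """Estimate the distribution of X by repeatedly drawing n items without replacement."""
    rng = random.Random(seed)              # private generator, so the seed fully fixes the run
    population = [1] * K + [0] * (N - K)   # 1 marks a success, 0 a failure
    counts = Counter()
    for _ in range(trials):
        # rng.sample draws without replacement; the sum is the number of successes drawn.
        counts[sum(rng.sample(population, n))] += 1
    # Relative frequencies, keyed by the observed number of successes.
    return {k: c / trials for k, c in sorted(counts.items())}
```

Calling the function twice with the same `seed` returns identical frequencies, and the estimates approach the exact PMF as `trials` grows.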
Worked examples & interpretation
What the inputs mean
- N: population size (total items).
- K: number of “successes” in the population (items you care about).
- n: number of draws (sample size).
- k: successes in your sample (the random variable X).
When to use this (vs. binomial)
- Hypergeometric: sampling without replacement (the success probability changes after each draw).
- Binomial: independent trials with a fixed success probability (often “with replacement” or a very large population).
- Rule of thumb: if the sampling fraction n/N is small, the hypergeometric distribution and the binomial distribution with p = K/N can be close.
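To see the rule of thumb numerically, the sketch below measures the largest pointwise gap between the two PMFs. The helper `max_abs_gap` is a hypothetical name introduced only for this illustration.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """Hypergeometric P(X = k); zero outside the support."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial P(X = k) for n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def max_abs_gap(N: int, K: int, n: int) -> float:
    """Largest |hypergeometric - binomial| over k = 0..n, with p = K/N."""
    p = K / N
    return max(abs(hypergeom_pmf(k, N, K, n) - binom_pmf(k, n, p)) for k in range(n + 1))


# Same p = 0.1 and n = 10, but very different sampling fractions n/N.
small_fraction = max_abs_gap(N=10_000, K=1_000, n=10)   # n/N = 0.001
large_fraction = max_abs_gap(N=20, K=2, n=10)           # n/N = 0.5
```

With the same p and n, the gap is tiny when n/N = 0.001 and substantial when n/N = 0.5, which is why the binomial shortcut should be reserved for small sampling fractions.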
Worked examples (try the presets)
Cards: draw 5 from 52, how likely to get exactly 2 aces?
Set N=52, K=4, n=5, query “Exactly” with k=2.
Inspection: 10 defectives in 100, sample 8 — probability of at least 1 defective?
Set N=100, K=10, n=8, query “At least” with k=1.
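The two presets can be checked by hand with a short script. This is a sketch under the assumption that “Exactly” means P(X = k) and “At least” means P(X ≥ k); the names `pmf` and `at_least` are illustrative.

```python
from math import comb


def pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) for the hypergeometric distribution; zero outside the support."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def at_least(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k): sum the PMF from k up to the largest possible value min(n, K)."""
    return sum(pmf(j, N, K, n) for j in range(k, min(n, K) + 1))


# Cards: exactly 2 aces in a 5-card hand.
cards = pmf(2, N=52, K=4, n=5)

# Inspection: at least 1 defective in a sample of 8.
inspection = at_least(1, N=100, K=10, n=8)

print(f"Cards, exactly 2 aces: {cards:.4f}")
print(f"Inspection, at least 1 defective: {inspection:.4f}")
```

The card probability comes out at roughly 0.04 and the inspection probability at roughly 0.58, so a sample of 8 catches at least one defective a bit more often than not.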
“At least one” quickly (complement trick)
Compute P(X ≥ 1) = 1 − P(X = 0). You can also use the helper button “P(X≥1)”.
Common pitfalls
- Invalid k: the support is k_min = max(0, n − (N − K)) and k_max = min(n, K). Outside it, P(X=k) = 0.
- Define “success” clearly: “success” is just the label for the item type you are counting (e.g., “defective”, “red”, “ace”).
- Large ranges: the PMF table can be omitted for performance; use the probability result and/or simulation.
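The support check from the first pitfall can be sketched as a small helper. The input validation shown here (0 ≤ K ≤ N and 0 ≤ n ≤ N) is an assumption about what counts as a sensible input, and `support` is a hypothetical name.

```python
def support(N: int, K: int, n: int) -> tuple:
    """Return (k_min, k_max), the range of k with non-zero probability."""
    # Assumed sanity checks: cannot have more successes or draws than items.
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    k_min = max(0, n - (N - K))   # draws left over once all failures are used up
    k_max = min(n, K)             # limited by both sample size and available successes
    return k_min, k_max
```

For example, with N=10, K=7, n=5 there are only 3 failures, so at least 2 of the 5 draws must be successes and the support is 2 to 5.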
FAQ
What is the hypergeometric distribution?
It models the number of successes when you draw n items from a finite population of size N with K successes, without replacement.
How is it different from the binomial distribution?
Hypergeometric sampling is without replacement (changing success probability), while binomial sampling assumes independent trials with a fixed success probability.
How do I find the valid range of k?
The support is k_min = max(0, n − (N − K)) and k_max = min(n, K). Outside this range, P(X=k)=0.
How do I compute “at least one success”?
Use the complement: P(X ≥ 1) = 1 − P(X = 0).
What does the simulation seed do?
A seed makes runs reproducible: the same seed produces the same simulated sequence and histogram.
How it’s calculated
- PMF: C(K,k)·C(N−K,n−k)/C(N,n), computed in log-space for stability.
- Support: k_min = max(0, n − (N − K)), k_max = min(n, K).
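One common way to evaluate the PMF in log-space is through the log-gamma function, as in the sketch below. This is an assumption about how such a computation could be done, not a description of the tool's internals; `log_comb` and `hypergeom_pmf_log` are hypothetical names.

```python
from math import exp, lgamma


def log_comb(a: int, b: int) -> float:
    """log C(a, b) via log-gamma: log a! - log b! - log (a-b)!."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)


def hypergeom_pmf_log(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) computed as exp(log C(K,k) + log C(N-K,n-k) - log C(N,n))."""
    k_min, k_max = max(0, n - (N - K)), min(n, K)
    if k < k_min or k > k_max:
        return 0.0   # outside the support
    log_p = log_comb(K, k) + log_comb(N - K, n - k) - log_comb(N, n)
    return exp(log_p)
```

Working with logarithms keeps the intermediate values modest in size even when the binomial coefficients themselves are far too large for floating point, at the cost of a small rounding error relative to exact integer arithmetic.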