Frequency Test Wizard
We have not tested the accuracy of these calculations. Use these tools cautiously.
Compute confidence intervals and significance tests for corpus frequencies.
Inspired by the SIGIL Corpus Frequency Test Wizard by Baroni & Evert, and the Lancaster Log‑Likelihood Wizard by Paul Rayson.
One sample: frequency estimate (confidence interval)
Wilson CI + optional H₀ significance test
Given a frequency count k in a sample of size n, estimate the underlying proportion p using a Wilson score confidence interval, and optionally test the null hypothesis H₀: p = p₀.
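The one-sample calculation described above can be sketched in Python as follows. This is a minimal illustration, not the wizard's actual code: the function names wilson_interval and one_sample_z_test are hypothetical, the test is assumed to be two-sided, and the optional continuity correction is left out.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(k, n, conf=0.95):
    """Wilson score interval for the proportion k/n (no continuity correction)."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    # Two-sided critical value of the standard normal, e.g. about 1.96 for 95%.
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = k / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def one_sample_z_test(k, n, p0):
    """Two-sided z-test of H0: p = p0 (assumed two-sided, no continuity correction)."""
    if not 0 < p0 < 1:
        raise ValueError("need 0 < p0 < 1")
    z = (k / n - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    low, high = wilson_interval(10, 100)
    print(f"95% Wilson CI for 10/100: [{low:.4f}, {high:.4f}]")
    z, p = one_sample_z_test(60, 100, 0.5)
    print(f"z = {z:.3f}, p = {p:.4f}")
```

For k = 10 and n = 100, the 95% interval is roughly 0.055 to 0.174, which is not symmetric around the observed proportion 0.10; this asymmetry is characteristic of the Wilson interval.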
Two samples: frequency comparison
Wilson CIs + log-likelihood (G²) test
Compare two corpora/subcorpora using the log‑likelihood (G²) test (Dunning 1993; the same statistic Rayson's LL wizard uses). Wilson confidence intervals are also reported for each sample.
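The two-sample comparison can be sketched in Python as follows. This is an illustration under stated assumptions rather than the wizard's implementation: the function name log_likelihood_g2 is hypothetical, and the sum is assumed to run over all four cells of the 2×2 contingency table (word vs. all other tokens, corpus 1 vs. corpus 2). Some tools sum over only the two word-frequency cells, which gives a slightly smaller value.

```python
from math import erfc, log, sqrt


def log_likelihood_g2(k1, n1, k2, n2):
    """G2 statistic and p-value for frequency k1 in n1 tokens vs. k2 in n2 tokens.

    Assumption: all four cells of the 2x2 table contribute to the sum.
    """
    if n1 <= 0 or n2 <= 0 or not (0 <= k1 <= n1 and 0 <= k2 <= n2):
        raise ValueError("need n > 0 and 0 <= k <= n for both samples")
    total = n1 + n2
    col_word = k1 + k2            # occurrences of the word in both corpora
    col_other = total - col_word  # all remaining tokens
    # (observed, expected) pairs; expected = row total * column total / grand total
    cells = [
        (k1, n1 * col_word / total),
        (k2, n2 * col_word / total),
        (n1 - k1, n1 * col_other / total),
        (n2 - k2, n2 * col_other / total),
    ]
    # Cells with an observed count of 0 contribute nothing (0 * ln 0 is taken as 0).
    g2 = 2 * sum(o * log(o / e) for o, e in cells if o > 0)
    g2 = max(g2, 0.0)  # guard against tiny negative rounding error
    # Upper tail of chi-squared with 1 df: P(X > g2) = erfc(sqrt(g2 / 2)).
    p_value = erfc(sqrt(g2 / 2))
    return g2, p_value


if __name__ == "__main__":
    g2, p = log_likelihood_g2(100, 1000, 50, 1000)
    print(f"G2 = {g2:.2f}, p = {p:.2g}")
```

A G² value above 3.84 corresponds to p < 0.05 with 1 df. When both corpora have the same relative frequency, the observed and expected counts coincide and G² is 0.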
About the statistics
- Wilson score interval: gives better coverage than the normal-approximation (Wald) interval, especially for small n or for p close to 0 or 1 (Wilson 1927).
- One-sample test: z-statistic z = (k/n − p₀) / √(p₀(1−p₀)/n), with optional continuity correction.
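- Log-likelihood G² (Dunning 1993): G² = 2 Σ Oᵢ ln(Oᵢ/Eᵢ), where Oᵢ are the observed and Eᵢ the expected counts. Under the null hypothesis that both samples share the same underlying proportion, G² is approximately distributed as χ² with 1 df. Same as the Lancaster LL wizard.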
- References: SIGIL Wizard, Rayson LL Wizard.