Frequency Test Wizard

We have not tested the accuracy of these calculations. Use these tools cautiously.
Compute confidence intervals and significance tests for corpus frequencies. Inspired by the SIGIL Corpus Frequency Test Wizard by Baroni & Evert, and the Lancaster Log‑Likelihood Wizard by Paul Rayson.

One sample: frequency estimate (confidence interval)

Wilson CI + optional H₀ significance test

Given a frequency count k in a sample of size n, estimate the underlying proportion p using a Wilson score confidence interval, and optionally test the null hypothesis H₀: p = p₀.
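As a rough illustration of the interval described above, here is a minimal Python sketch of the Wilson score interval. The function name wilson_interval and the conf_level parameter are hypothetical and chosen only for this example; the sketch is not taken from the wizard's own code and may differ from it in details such as rounding or clamping.

```python
import math
from statistics import NormalDist


def wilson_interval(k, n, conf_level=0.95):
    """Wilson score interval for k hits in a sample of n tokens (illustrative sketch)."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    p_hat = k / n
    denom = 1 + z * z / n
    # The centre is pulled from p_hat towards 0.5; the width stays positive even when k = 0.
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


low, high = wilson_interval(50, 100)
print(f"95% Wilson interval for 50/100: [{low:.4f}, {high:.4f}]")
```

Unlike the normal-approximation interval, this one does not collapse to zero width when k = 0 or k = n.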

Two samples: frequency comparison

Wilson CIs + log-likelihood (G²) test

Compare two corpora/subcorpora using the log‑likelihood (G²) test (Dunning 1993; the same statistic Rayson's LL wizard uses). Wilson confidence intervals are also reported for each sample.
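The comparison can be sketched in Python as follows. This is an illustration under stated assumptions, not the wizard's implementation: the function name g2_test is hypothetical, and the sum is assumed to run over all four cells of the 2×2 table (hits and non-hits in each sample). An implementation that sums only the two hit cells would give a slightly different value.

```python
import math


def g2_test(k1, n1, k2, n2):
    """Log-likelihood (G²) test for k1 hits in n1 tokens vs k2 hits in n2 tokens."""
    if n1 <= 0 or n2 <= 0 or not 0 <= k1 <= n1 or not 0 <= k2 <= n2:
        raise ValueError("need n > 0 and 0 <= k <= n for both samples")
    total = n1 + n2
    hits = k1 + k2
    misses = total - hits
    # Observed counts: hits and non-hits in each sample.
    observed = [k1, n1 - k1, k2, n2 - k2]
    # Expected counts if both samples share the same underlying proportion.
    expected = [n1 * hits / total, n1 * misses / total,
                n2 * hits / total, n2 * misses / total]
    # Cells with an observed count of 0 contribute nothing (0 * ln 0 is taken as 0).
    g2 = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    g2 = max(g2, 0.0)  # guard against tiny negative values from rounding
    # Upper tail of the chi-squared distribution with 1 df: P(X > g2) = erfc(sqrt(g2 / 2)).
    p_value = math.erfc(math.sqrt(g2 / 2))
    return g2, p_value


g2, p = g2_test(100, 1000, 50, 1000)
print(f"G2 = {g2:.2f}, p = {p:.2g}")
```

The p-value uses the closed form for 1 degree of freedom, so the sketch needs nothing beyond the standard library.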

About the statistics

Wilson score interval: its coverage is closer to the nominal confidence level than that of the normal-approximation (Wald) interval, especially for small n or for p close to 0 or 1 (Wilson 1927).
One-sample test: z-statistic z = (k/n − p₀) / √(p₀(1−p₀)/n), compared against the standard normal distribution, with optional continuity correction.
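Log-likelihood G² (Dunning 1993): G² = 2 Σᵢ Oᵢ ln(Oᵢ/Eᵢ), where Oᵢ are the observed counts and Eᵢ the counts expected under H₀ (equal proportions in both samples). Under H₀, G² is approximately distributed as χ² with 1 df. Same as the Lancaster LL wizard.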
References: SIGIL Wizard, Rayson LL Wizard.
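The one-sample z-statistic listed above can be sketched in Python as follows. The function name one_sample_z_test is hypothetical, and the form of the continuity correction (shrinking |k/n − p₀| by 1/(2n), never past zero) is an assumption: the text above only says the correction is optional and does not specify it.

```python
import math


def one_sample_z_test(k, n, p0, continuity=False):
    """Two-sided z-test of H0: p = p0 for k hits in n tokens (illustrative sketch)."""
    if n <= 0 or not 0 <= k <= n or not 0 < p0 < 1:
        raise ValueError("need n > 0, 0 <= k <= n and 0 < p0 < 1")
    diff = k / n - p0
    if continuity:
        # Assumed correction: reduce the absolute difference by 1/(2n), keeping its sign.
        diff = math.copysign(max(abs(diff) - 0.5 / n, 0.0), diff)
    # Standard error of the sample proportion under H0.
    se = math.sqrt(p0 * (1 - p0) / n)
    z = diff / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p = one_sample_z_test(60, 100, 0.5)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With the correction switched on, the statistic moves towards zero and the p-value grows, which makes the test more conservative for small n.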