Confidence Interval Calculator
Build a CI for a mean or proportion, with the margin of error broken out.
Confidence Intervals: What the Number Actually Tells You (And What It Doesn't)
There is arguably no statistic more routinely misread than the confidence interval. Researchers write papers around them, news outlets cite them, and product managers paste them into slide decks. Yet the majority of people reading those numbers hold an interpretation that is technically wrong. Understanding what a confidence interval actually guarantees, how the machinery behind it works, and when to reach for a z-distribution versus a t-distribution separates competent data analysis from statistical theater.
The Frequentist Promise
A 95% confidence interval does not mean "there is a 95% probability the true parameter lies inside this interval." In frequentist statistics, the true population mean μ (or proportion p) is a fixed, unknown constant. It is not a random variable, so it cannot have a probability of being in a range. The interval itself is the random object, constructed fresh from each sample you draw.
The correct statement is procedural: if you repeated your sampling procedure and interval construction an infinite number of times, 95% of the resulting intervals would contain the true parameter. For any single interval you've already built, the true value is either inside it or it isn't. The 95% is a long-run property of the method, not a single-trial probability. This distinction matters enormously if you're making decisions based on whether a parameter is "probably" in a range.
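The long-run reading can be checked by simulation. The sketch below is illustrative only: the population (normal, mean 50, standard deviation 10), the sample size, and the function name `coverage_rate` are all assumptions chosen for the demonstration, and σ is treated as known so that a z critical value applies.

```python
import random
from statistics import NormalDist, fmean


def coverage_rate(true_mean=50.0, sigma=10.0, n=30, trials=2000,
                  confidence=0.95, seed=0):
    """Fraction of simulated z-intervals that contain the true mean."""
    rng = random.Random(seed)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / n ** 0.5  # sigma is treated as known here
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        center = fmean(sample)
        # Each individual interval either contains the true mean or it does not.
        if center - margin <= true_mean <= center + margin:
            hits += 1
    return hits / trials


print(f"Share of intervals covering the true mean: {coverage_rate():.3f}")
```

The printed share should land close to 0.95. No single interval in the loop is "95% likely" to be right; the 95% describes the procedure across all of them.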
The Anatomy of a Confidence Interval
Every confidence interval for a mean or proportion follows the same skeleton:
CI = Point Estimate ± Margin of Error
where the margin of error itself decomposes as:
ME = Critical Value × Standard Error
Breaking it down this way is not just pedagogical bookkeeping. Each component tells you something actionable. The standard error reflects how much natural sampling variability exists in your estimate. It shrinks in proportion to 1/√n, which is why quadrupling your sample size only halves your interval width. The critical value encodes your chosen confidence level: higher confidence demands a larger critical value, which widens the interval. You cannot simultaneously achieve high confidence and a narrow interval without collecting more data.
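A small sketch of that decomposition, assuming a known standard deviation so that a z critical value applies. The function name `margin_of_error` and the value `sd=12.0` are hypothetical, picked only to show the 1/√n behavior.

```python
from statistics import NormalDist


def margin_of_error(sd, n, confidence=0.95):
    """Return (critical value, standard error, margin of error)."""
    critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sd / n ** 0.5
    return critical, standard_error, critical * standard_error


# Quadrupling n halves the standard error, and with it the margin.
for n in (100, 400):
    crit, se, me = margin_of_error(sd=12.0, n=n)
    print(f"n={n}: critical={crit:.3f}, SE={se:.3f}, ME={me:.3f}")

# Raising the confidence level widens the margin at a fixed n.
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: ME={margin_of_error(12.0, 100, level)[2]:.3f}")
```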
Z-Distribution vs. T-Distribution: The Correct Choice
The decision between using a z critical value and a t critical value hinges on two things: whether you know the population standard deviation σ, and your sample size.
In practice, σ is almost never known. When you estimate it from your sample using s, you introduce an additional layer of uncertainty. The t-distribution accounts for this by having heavier tails than the normal; those tails represent the extra uncertainty from estimating σ. The degrees of freedom parameter (df = n − 1) governs how heavy those tails are. With df = 4 (n = 5), the 95% critical value is 2.776, compared to z = 1.960. With df = 29 (n = 30), it's already 2.045, quite close to the z value. By the time df exceeds roughly 120, the t and z distributions are functionally indistinguishable.
The practical takeaway: always use the t-distribution when you're estimating σ from sample data, particularly with smaller samples. Using z when you should use t will produce intervals that are too narrow, noticeably so at small n. They'll contain the true parameter less often than the stated confidence level promises.
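To see the gap between t and z directly, here is a standard-library sketch. Python's `statistics` module has no t-distribution, so the helpers `t_pdf`, `t_cdf`, and `t_critical` below are illustrative stand-ins that integrate the density numerically and invert it by bisection. In real work a library routine such as `scipy.stats.t.ppf` would normally take their place. The five-value sample is made up.

```python
import math
from statistics import NormalDist, fmean, stdev


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF for x >= 0: 0.5 plus Simpson's-rule integral of the pdf on [0, x]."""
    if x <= 0:
        return 0.5 if x == 0 else 1 - t_cdf(-x, df, steps)
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(confidence, df):
    """Two-sided critical value by bisection (assumes the answer is below 100)."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_interval(sample, confidence=0.95, use_t=True):
    """CI for a mean with sigma estimated by the sample standard deviation s."""
    n = len(sample)
    se = stdev(sample) / math.sqrt(n)
    if use_t:
        crit = t_critical(confidence, n - 1)
    else:
        crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    center = fmean(sample)
    return center - crit * se, center + crit * se


for df in (4, 29, 120):
    print(f"df={df}: t*={t_critical(0.95, df):.3f}")

sample = [4.8, 5.1, 5.6, 4.9, 5.3]  # hypothetical measurements, n = 5
print("t interval:", mean_interval(sample))
print("z interval:", mean_interval(sample, use_t=False))
```

With only five observations the z-based interval is visibly narrower than the t-based one, which is exactly the undercoverage described above.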
For proportions, the standard approach uses the z-distribution because the standard error formula √(p̂(1 − p̂)/n) estimates the variability directly from the proportion itself. But this normal approximation only performs well when both np and n(1 − p) are at least 10. If you're studying a rare event (say, a defect rate of 1% with a sample of 50), the normal approximation breaks down and you need an exact method like the Clopper-Pearson interval.
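The rare-event case can be made concrete. The sketch below compares the normal-approximation (Wald) interval with a Clopper-Pearson interval obtained by inverting the binomial tail probabilities with bisection. The function names and the example of 1 defect in 50 units are assumptions for illustration; the rule-of-thumb check plugs the observed counts in for np and n(1 − p).

```python
import math
from statistics import NormalDist


def normal_approx_ok(successes, n, threshold=10):
    """Rule of thumb: at least `threshold` observed successes and failures."""
    return successes >= threshold and (n - successes) >= threshold


def wald_interval(successes, n, confidence=0.95):
    """Normal-approximation interval: p_hat +/- z * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k + 1))


def _solve_decreasing(f, target):
    """Find p in [0, 1] with f(p) = target, for f decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(successes, n, confidence=0.95):
    """Exact interval from inverting the two binomial tail probabilities."""
    alpha = 1 - confidence
    x = successes
    # Lower limit solves P(X >= x | p) = alpha/2, i.e. P(X <= x-1 | p) = 1 - alpha/2.
    lower = 0.0 if x == 0 else _solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper limit solves P(X <= x | p) = alpha/2.
    upper = 1.0 if x == n else _solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


print("Approximation ok?", normal_approx_ok(1, 50))
print("Wald:           ", wald_interval(1, 50))
print("Clopper-Pearson:", clopper_pearson(1, 50))
```

For 1 defect in 50, the Wald lower limit comes out negative, which is impossible for a proportion, while the exact interval stays inside [0, 1].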
Why Margin of Error Is the Number Worth Obsessing Over
Journalists and analysts often report confidence intervals without isolating the margin of error, which is a missed opportunity. The margin of error is what you control through study design. Want a margin of error of ±2 percentage points at 95% confidence for a proportion near 50%? You need approximately 2,401 observations. Cut your sample to a quarter of that, about 600 observations, and your margin doubles to ±4 points, which might render your survey useless for detecting differences between subgroups.
The formula for the required sample size to achieve a target margin of error E for a proportion is:
n = (z* / E)² × p̂(1 − p̂)
When you have no prior estimate of p, using p̂ = 0.5 maximizes the expression p̂(1 − p̂) and gives the most conservative (largest) required n. This is standard practice in pre-study sample-size calculations.
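The sample-size formula translates into a few lines of code. This is a minimal sketch with hypothetical function names: `required_sample_size` rounds up to the next whole observation, and `proportion_margin` runs the same relationship in the other direction.

```python
import math
from statistics import NormalDist


def required_sample_size(margin, confidence=0.95, p=0.5):
    """n = (z*/E)^2 * p(1 - p), rounded up; p = 0.5 is the conservative default."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z / margin) ** 2 * p * (1 - p))


def proportion_margin(n, confidence=0.95, p=0.5):
    """Margin of error for a proportion at sample size n."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)


print(required_sample_size(0.02))        # observations needed for +/-2 points
print(f"{proportion_margin(600):.4f}")   # margin with about 600 observations
```

Passing a prior estimate such as `p=0.1` instead of the default shrinks the required n, because p(1 − p) is largest at 0.5.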
Common Errors That Produce Wrong Intervals
Using the population standard deviation formula (dividing by n) instead of the sample standard deviation (dividing by n − 1) when computing s is a frequent mistake that produces a biased standard error. Bessel's correction (the n − 1 denominator) exists precisely because sample variance computed with n in the denominator systematically underestimates σ².
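Python's `statistics` module exposes both denominators, which makes the mistake easy to demonstrate. The sample values below are made up for illustration.

```python
import math
from statistics import pstdev, stdev

sample = [12.1, 9.8, 11.4, 10.6, 13.0]  # hypothetical data
n = len(sample)

s_population = pstdev(sample)  # divides by n: wrong choice for a sample
s_sample = stdev(sample)       # divides by n - 1: Bessel's correction

print(f"divide by n:     s = {s_population:.4f}, SE = {s_population / math.sqrt(n):.4f}")
print(f"divide by n - 1: s = {s_sample:.4f}, SE = {s_sample / math.sqrt(n):.4f}")
# The two differ by the factor sqrt(n / (n - 1)), which matters most at small n.
print(f"ratio = {s_sample / s_population:.4f}")
```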
A subtler error is applying a two-sided interval critical value when you actually want a one-sided bound. If you're only interested in whether the mean exceeds a threshold (say, whether a drug's effectiveness is above 40%), you want a one-sided lower confidence bound using z = 1.645 (for 95%), not the two-sided 1.960. The two-sided interval is more conservative in both directions simultaneously, which is not always what the decision requires.
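The difference between the two critical values comes down to how the error probability is allocated: split across both tails, or placed entirely in one. A brief sketch, with hypothetical helper names and a made-up estimate of 0.46 with standard error 0.03:

```python
from statistics import NormalDist


def z_critical(confidence, two_sided=True):
    """Critical value with alpha split over two tails, or all in one tail."""
    alpha = 1 - confidence
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)


def lower_bound(estimate, standard_error, confidence=0.95):
    """One-sided lower confidence bound."""
    return estimate - z_critical(confidence, two_sided=False) * standard_error


print(f"two-sided 95%: {z_critical(0.95):.3f}")                   # about 1.960
print(f"one-sided 95%: {z_critical(0.95, two_sided=False):.3f}")  # about 1.645

# Hypothetical numbers: is the true effectiveness above 0.40?
print(f"one-sided lower bound: {lower_bound(0.46, 0.03):.4f}")
print(f"two-sided lower limit: {0.46 - z_critical(0.95) * 0.03:.4f}")
```

The one-sided bound sits closer to the estimate than the lower end of the two-sided interval, so it can clear a threshold that the two-sided interval would not.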
Finally, confusing statistical significance with practical significance is endemic. A confidence interval that excludes zero (or any other null value) tells you the effect is detectable with your sample size. It says nothing about whether the effect is large enough to matter. A drug that reduces blood pressure by 0.3 mmHg might produce an interval of (0.1, 0.5): statistically significant, clinically irrelevant.
Interpreting the Output in Context
When you calculate a confidence interval, the width of that interval is itself informative. A very wide interval (say, a mean income estimate of ($35,000, $85,000)) is honest, not embarrassing. It's telling you the data are too sparse or too variable to pin down the parameter precisely. Reporting such an interval correctly communicates uncertainty. Refusing to calculate the interval because the data are thin, or cherry-picking a different confidence level to make the interval look narrower, is analytical malpractice.
A narrow interval that excludes a practically meaningful threshold is your clearest signal. If you're testing whether the average response time of an API is under 200 ms, and your 99% interval is (145 ms, 178 ms), you've demonstrated with very high confidence that performance meets the target. The margin of error, ±16.5 ms, tells you how precisely you've pinned down the true mean, and the confidence level tells you how reliable that precision is.
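When only the endpoints of an interval are reported, the point estimate and margin of error can be recovered from them. A tiny sketch using the response-time interval above; the helper name `decompose` is hypothetical.

```python
def decompose(lower, upper):
    """Recover (point estimate, margin of error) from a symmetric interval."""
    center = (lower + upper) / 2
    margin = (upper - lower) / 2
    return center, margin


center, margin = decompose(145.0, 178.0)
target_ms = 200.0
# The whole interval lies below the target when its upper limit does.
meets_target = center + margin < target_ms

print(f"estimate = {center} ms, margin = {margin} ms, under target: {meets_target}")
```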
The confidence interval is not magic. It is a rigorous, reproducible summary of what a sample can tell you about a population, given explicit assumptions about distributional form and sampling independence. Use it with those assumptions in mind, decompose the margin of error to understand your precision, and choose z versus t based on what you actually know about σ, and you'll be using one of statistics' most powerful tools correctly.