What Are Confidence Intervals?

There are two broad branches of statistics: descriptive statistics and inferential statistics. Rather sensibly, descriptive statistics is about describing something, and inferential statistics is about making inferences.

Suppose, for instance, you were interested in the heights of children in 4th grade. If you only wanted to know how tall your child was relative to all the children in his or her class, you could use descriptive statistics: you could measure every child, find the average, the standard deviation, the quartiles and so on, and there would be no need for confidence intervals. But suppose you wanted to say something about all schoolchildren in the United States. You couldn’t possibly measure every child! So, in this situation, you take a sample. The most important thing here is to make it a random sample.

Suppose you take a random sample of 100 4th graders (how to do this would be for another article). You can take the mean, the standard deviation, and so on, but these will be true only for your sample. They won’t exactly match the numbers for the whole population, but we want to be able to say something about that population, and, thanks to statistics, we can. If we have a random sample, we can calculate a confidence interval around the sample mean, and be reasonably sure that the mean of the entire population is in that confidence interval.

The method for constructing the confidence interval depends on what statistic you are interested in. Here, we might be interested in the mean. If we were doing a poll about an upcoming election, we would be interested in a proportion. If we were trying to find the difference in income between men and women, we would be interested in the difference between two means, as in a t-test. In all of these cases, and many more, we can find a confidence interval.

I can’t cover all of these in this article, but here is how it is done for the mean.

Step 1: Find the mean of your sample. Let’s say, for your sample of 100 4th graders, it is 48 inches.

Step 2: Find the standard deviation of your sample. Let’s say, here, it is 3.2 inches.

Step 3: Find the standard error. This is just the standard deviation divided by the square root of the sample size. Here, we have a sample size of 100, the square root of 100 is 10, and 3.2/10 = 0.32, and that’s the standard error.
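The arithmetic in Step 3 can be sketched in a few lines of Python. This is only an illustration using the numbers from the example; the function name standard_error is made up for this sketch and does not come from any particular library.

```python
import math

def standard_error(sample_sd, n):
    """Standard error of the mean: sample standard deviation / sqrt(sample size)."""
    return sample_sd / math.sqrt(n)

# Numbers from the example: standard deviation 3.2 inches, 100 children.
se = standard_error(3.2, 100)
print(round(se, 2))  # 0.32
```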

Step 3a: If the sample size (the number of subjects) is above about 30, and you want a 95% confidence interval, you can use this formula:

CI = mean +/- 1.96*standard error

Here, that gives 48 +/- 1.96*0.32 = 48 +/- 0.63, or 47.37 to 48.63.

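That means that we can be 95% confident that the population mean (the average height of all 4th graders in America) is between 47.37 inches and 48.63 inches. Strictly speaking, the 95% describes the method: if we drew many random samples and built an interval from each one in this way, about 95% of those intervals would contain the population mean.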
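Steps 1 through 3a can be put together as a short Python sketch. It assumes you already have the sample mean, standard deviation and size; the function name mean_confidence_interval is invented for this example. Rather than hard-coding 1.96, it looks the multiplier up from the standard normal distribution in Python's statistics module, which gives the same value to two decimal places.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(mean, sd, n, confidence=0.95):
    """Large-sample confidence interval for a mean (normal approximation)."""
    # For 95% confidence this is the 97.5th percentile of the standard
    # normal distribution, roughly 1.96.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sd / math.sqrt(n)  # multiplier times the standard error
    return mean - margin, mean + margin

# Numbers from the example: mean 48, standard deviation 3.2, 100 children.
low, high = mean_confidence_interval(48, 3.2, 100)
print(f"{low:.2f} to {high:.2f}")  # 47.37 to 48.63
```

Passing a different value for confidence (say 0.90 or 0.99) changes the multiplier and so makes the interval narrower or wider.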

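Step 3b: If the sample size is less than about 30, 1.96 is too small a multiplier. You will need to replace it with a somewhat larger value from a t-table, looked up using the sample size minus one as the degrees of freedom (or you can rely on an online calculator).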
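As a rough sketch of the small-sample case, the Python below stores a handful of standard two-sided 95% t-table values and uses them in place of 1.96. The sample of 25 children is a hypothetical variation on the example, the function name small_sample_ci is made up, and the table covers only a few sample sizes; real statistical software computes the value for any sample size.

```python
import math

# A few two-sided 95% critical values from a standard t-table,
# keyed by degrees of freedom (sample size minus one).
T_TABLE_95 = {4: 2.776, 9: 2.262, 14: 2.145, 19: 2.093, 24: 2.064, 29: 2.045}

def small_sample_ci(mean, sd, n):
    """95% confidence interval for a mean, using a t-table value instead of 1.96."""
    t_crit = T_TABLE_95[n - 1]  # raises KeyError for sizes not in this tiny table
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Hypothetical smaller sample: 25 children, same mean and standard deviation.
low, high = small_sample_ci(48, 3.2, 25)
print(f"{low:.2f} to {high:.2f}")  # 46.68 to 49.32
```

The interval is wider than the one from 100 children, both because the multiplier is larger and because the standard error grows as the sample shrinks.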

In general, you will probably rely on some statistical software to do this calculation for you; the important thing is not the exact formula (which can always be looked up in a statistics book, or online) but the concept: you take a random sample, use it to make a guess about the population, and use a confidence interval to say how good that guess is.
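One way to see the concept in action is a small simulation. The Python sketch below invents a population of heights (normally distributed, with a mean of 48 inches and a standard deviation of 3.2; these are assumptions made purely for illustration), repeatedly draws random samples of 100, and counts how often the 95% interval contains the true mean. The function name coverage and all of its parameters are hypothetical.

```python
import math
import random
import statistics

def coverage(trials=1000, n=100, pop_mean=48.0, pop_sd=3.2, seed=0):
    """Fraction of simulated 95% intervals that contain the true population mean."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same answer
    hits = 0
    for _ in range(trials):
        # Draw one random sample of n heights from the made-up population.
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        mean = statistics.mean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        # Check whether this sample's interval captures the true mean.
        if mean - 1.96 * se <= pop_mean <= mean + 1.96 * se:
            hits += 1
    return hits / trials

print(coverage())
```

The printed fraction should come out close to 0.95, which is what the "95%" in a 95% confidence interval refers to.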