Calculating Confidence Intervals in R
Confidence intervals show up everywhere in statistics. They allow us to express estimated values from sample data with some degree of confidence by providing an interval likely to contain the true population parameter we’re trying to estimate. There are several ways to calculate them, depending on the context.
This article covers the general case of confidence intervals for sample estimates and how to calculate them in R. I will not talk about plus-four confidence intervals, confidence intervals for mean or individual responses, and the like. It is also not intended to explain in detail what a confidence interval is or the statistical theory behind it.
Confidence Intervals for Proportions, M.I.A.
Most of the time, you’ll probably write your own code for calculating confidence intervals for proportions since you’ll typically have just two values, a sample size (\(n\)) and sample proportion (\(\hat{p}\)). However, if you have a data frame with a categorical variable, you can leverage built-in R functions.
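If you find yourself doing that often, a small helper is easy to write. Here is a minimal sketch (my own, not from any package; the name prop_ci is just illustrative):
# Normal-approximation CI for a proportion, given n and p-hat
prop_ci <- function(p_hat, n, level = 0.95) {
  z_star <- qnorm(1 - (1 - level) / 2)
  me <- z_star * sqrt(p_hat * (1 - p_hat) / n)
  c(lower = p_hat - me, upper = p_hat + me)
}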
Let’s consider a Gallup poll from October 2010 in which U.S. adults were asked “Generally speaking, do you believe the death penalty is applied fairly or unfairly in this country today?” The sample size is not listed on the website, but according to Stats: Data and Models (De Veaux et al., 2018), it was 510. The poll results were as follows (percentages add up to 101% due to rounding):
| Fairly | Unfairly | No opinion |
| --- | --- | --- |
| 58% | 36% | 7% |
Given the above data, what is the 95% confidence interval for the proportion of U.S. adults that believed the death penalty was applied fairly when that poll was taken?
With appropriate assumptions and conditions, the sampling distribution of a proportion is approximately normal, so we use a critical value (\(z^*\)) of the standard normal distribution to determine how many standard errors to extend on each side of the confidence interval (CI). Ninety-five percent of the standard normal distribution lies between -1.96 and 1.96. This is a well-known approximation, but I will use a more precise value in my calculations in order to compare them with results from some R functions that calculate CIs.
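That more precise value comes straight from `qnorm`:
> qnorm(1 - (1 - 0.95) / 2)
[1] 1.959964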
From the Gallup poll, we have \(n = 510\) and \(\hat{p} = 0.58\). Here is some R code that calculates the interval, using `SE` for the standard error, `ME` for the margin of error, and `z_star` for the z critical value.
> library(glue)
>
> n <- 510
> p <- 0.58
>
> SE <- sqrt(p * (1 - p) / n)
> z_star <- qnorm(1 - (1 - 0.95) / 2)
> ME <- z_star * SE
>
> glue("({p - ME}, {p + ME})")
(0.537164716561037, 0.622835283438962)
Let’s do this using R’s built-in `confint` and `lm`.
sample <- c(rep(1, 0.581 * n), rep(0, 0.42 * n))
confint(lm(sample ~ 1), level = 0.95)
2.5 % 97.5 %
(Intercept) 0.5374182 0.6233661
A few things to note:
You may have noticed that I used `0.581` instead of `0.58`; more on that later. You may have also noticed that the values are close to what we previously calculated “by hand” but not the same. There’s more than one reason for this. First, I’ll explain what I did, then point out the differences with this method.
I used `confint` to calculate the confidence intervals. The first parameter to `confint` is a fitted model object. Since I fitted an `lm` model, R invokes the version of `confint` that’s available for `lm` objects, namely `confint.lm`. This fact is not too important; it just means that the behaviour of `confint` can change depending on the fitted model. So, in order to fit an `lm` model, I created a vector with 510 entries, 58% of them ones and the rest zeros. The formula `sample ~ 1` represents a null model: a response with no predictors, just an intercept. This has some interesting properties.
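For instance, the fitted intercept of a null model is simply the mean of the response, which for a vector of ones and zeros is the sample proportion:
> coef(lm(sample ~ 1))
(Intercept)
  0.5803922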
> summary(lm(sample ~ 1))
Call:
lm(formula = sample ~ 1)
Residuals:
Min 1Q Median 3Q Max
-0.5804 -0.5804 0.4196 0.4196 0.4196
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.58039 0.02187 26.53 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.494 on 509 degrees of freedom
The estimate of this model is the mean of the ones and zeros, `0.58039` (not exactly 58% but close enough). The intercept’s standard error, `0.02187`, equals the residual standard error divided by \(\sqrt{n}\) (\(0.494 / \sqrt{510} \approx 0.0219\)) and is very close to the standard error we previously calculated:
> SE
[1] 0.02185514
So why are the calculations from `confint` and `lm` slightly off? Well, for one, it’s not possible to split 510 elements into exactly 58% and 42%, so our original calculations introduced some rounding error. I also used `0.581` to ensure that I ended up with 510 elements in the `sample` vector and a proportion of ones as close to 58% as possible.
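A quick check shows why (`rep` truncates a non-integer `times` argument toward zero):
> floor(0.58 * 510) + floor(0.42 * 510)   # one element short
[1] 509
> floor(0.581 * 510) + floor(0.42 * 510)  # exactly 510
[1] 510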
Another difference is that the calculations in `lm` are based on the t-distribution. The differences between the normal and t-distributions are negligible for large sample sizes, and even for our sample size of 510, it didn’t affect our confidence interval in the first 3 decimal places.
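To see how small that difference is here, compare the two critical values directly (the values in the comments are approximate):
# Normal vs. t critical values at our sample size
qnorm(1 - (1 - 0.95) / 2)          # about 1.9600
qt(1 - (1 - 0.95) / 2, df = 509)   # about 1.9646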
Let’s try to reproduce what `confint` and `lm` did.
> p <- mean(sample)
> SE <- sd(sample) / sqrt(n)
> t_star <- qt(1 - (1 - 0.95)/2, df = n - 1)
> ME <- t_star * SE
> glue("({p - ME}, {p + ME})")
(0.537418167397406, 0.623366146328085)
This is the same confidence interval that `confint` returned. Finally, let’s redo our original calculations based on the more precise value of \(\hat{p}\) from the data (as calculated above).
SE <- sqrt(p * (1-p) / n)
z_star <- qnorm(1 - (1 - 0.95) / 2)
ME <- z_star * SE
glue("({p - ME}, {p + ME})")
(0.537562403935917, 0.623221909789574)
Still only the same up to 3 decimal places. Something to think about if you decide to use `confint` with `lm` when dealing with proportions.
broom::tidy
The `tidy` function from the `broom` package can also calculate confidence intervals. It functions very similarly to `confint` in that it can handle different types of objects. We’ll use `lm` again to compare.
(summary <- broom::tidy(lm(sample ~ 1), conf.int = TRUE, conf.level = 0.95))
# A tibble: 1 x 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.580 0.0219 26.5 4.78e-98 0.537 0.623
Since the tibble only displays 3 decimal places, it’s not obvious whether the calculations are based on the normal or the t-distribution, but it’s the latter:
summary$conf.low; summary$conf.high
[1] 0.5374182
[1] 0.6233661
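These are exactly the values `confint` gave us earlier; a quick check (reusing the same null model) confirms they agree:
> fit <- lm(sample ~ 1)
> all.equal(unname(confint(fit)[1, ]), c(summary$conf.low, summary$conf.high))
[1] TRUE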
The End
This blog post was originally intended to define confidence intervals and their nuances, discuss different types of confidence intervals, and cover bootstrapping confidence intervals for non-normally distributed data. However, as I came across many articles on the Internet that did a good job of explaining confidence intervals, I decided to focus on R. Over the last eight months of doing analyses in R, I had it in my mind that I had used many different packages and functions for CI calculations. After going through all my past notebooks while writing this article, I realized that while I had calculated many CIs, they were mostly based on `confint` and `broom::tidy`. The `conf.int` parameter showed up in two other places, `quantile` and `autoplot`, but only when using specific packages that dealt with survival analysis and time series. I thought I’d share this, in case you felt the article was a bit short and less than satisfying. :)
Reference
De Veaux, R.D., P.F. Velleman, D.E. Bock, A.M. Vukov, and A.C.M. Wong. 2018. Stats: Data and Models, Third Canadian Edition. Pearson Education Canada. https://books.google.ca/books?id=muxnswEACAAJ.