ConfidenceIntervals

Confidence Intervals: calculating and plotting them with R

I think it is sound to be honest and clear about the limits of the precision in the statistics you publish. One way of doing this is to include information about the confidence intervals. This is mostly a note to help myself to recall how to do this in R.

Dichotomous variables

Below is the code that produces figure 1.

misstankta.summary.function <- function () {
  load("/home/hans/text/datamängder/misstankta.RData")
  prop.fire.setters.by.age <- 100*prop.table(as.vector(table(misstankta$Ålder[misstankta$brand == T])))
  prop.non.fire.setters.by.age <- 100*prop.table(as.vector(table(misstankta$Ålder[misstankta$brand == F])))
  no.of.fire.setters <- length(which(misstankta$brand == T))
  no.of.non.fire.setters <- length(which(misstankta$brand == F))
  list(no.of.non.fire.setters=no.of.non.fire.setters, no.of.fire.setters=no.of.fire.setters, prop.non.fire.setters.by.age=prop.non.fire.setters.by.age, prop.fire.setters.by.age=prop.fire.setters.by.age)
}
misstankta.summary.results <- misstankta.summary.function()

plot(15:20, misstankta.summary.results$prop.fire.setters.by.age, type = "l", ylab = "Proportion of suspects", xlab = "Age", col = "blue", ylim = c(0,31))
lines(15:20, misstankta.summary.results$prop.non.fire.setters.by.age, col = "red")

This figure is based on proportions, thus requiring a bit extra effort to calculating confidence intervals which are based on absolute numbers. Here are the absolute numbers involved:

as.vector(table(misstankta$Ålder[misstankta$brand == T]))
[1] 483 402 198 215 164 117
as.vector(table(misstankta$Ålder[misstankta$brand == F]))
[1] 425 388 392 345 281 269

For the first year, we have 483 and 425 respectively, for the second year 402 and 388 and so on. For a dichotomous variable, one can use the binom.test() to get the confidence intervals:

binom.test(c(483,425), p = 483/(483+425))

	Exact binomial test

data:  c(483, 425)
number of successes = 483, number of trials = 908, p-value = 1
alternative hypothesis: true probability of success is not equal to 0.5319383
95 percent confidence interval:
 0.4988732 0.5647956
sample estimates:
probability of success
             0.5319383

While the terminology in the output of binom.test() might confuse some, my interpretation is like this: If you get 483 fire-setters out of a sample 908 juvenile delinquents, then the confidence intervals for that measurement is 0.4988732 * 908 and 0.5647956 * 908, respectively (all this at a 0.95 confidence level).

To get absolut integers rather then probabilities, use:

test.result <- binom.test(c(483,425), p = 483/(483+425))
round(test.result$conf.int[1] * (483+425))
[1] 453
> round(test.result$conf.int[2] * (483+425))
[1] 513

But in this case, the probabilities was exactly what we wanted, since they are on the same scale as the proportions already displayed.

for(i in 1:length(a)){ tmp <- binom.test(c(a[i], b[i])) ; ci.low[i] <- tmp$conf.int¹ ; ci.high[i] <- tmp$conf.int² } library(gplots)

plotCI(x = 1:length(success), y = success, li=my.conf.int.vector[1,], ui=my.conf.int.vector[2,], ylab = my.ylab, xlab = my.xlab, xaxt="n")

Continous variables

Let us say that we would like a confidence interval for the total number of delinquents in various ages. The measured data are as follows:

## success <- table(week.map[skolbränder$"Tid vecka"])
## my.ylab = "Antal Bränder"
## my.xlab = "Två-veckorsperiod"

success <- c(908, 790, 590, 560, 445, 386)
names(success) <- c(15, 20, 25, 30, 35, 40) # must be numerical
my.ylab = "Antal mordbrandsdömda"
my.xlab = "Ålder"

Here, the variable is not dichotomous so binom.test is not appropriate. Rather, prop.test() should be used. For each measurement, the confidence interval is based on the relation between that number and the total number of cases. The following function can be plotted with the function plotCI from the package gplots. A limitation here is that the names must be numerical, unlike when plot.table is used.

## usage:
## success <- c(34, 55, 19, 82)
## names(success) <- c(15, 20, 25, 30)
## my.plot.conf.int.continous(success)

my.plot.conf.int.continous <- function(success, ...) {
  require(gplots)
  n <- sum(success)
  my.conf.int.vector <- sapply(1:length(success), function(x) {
    prop.test(success[x], n)$conf.int*n
  })
  plotCI(x = as.numeric(names(success)), y = success, li=my.conf.int.vector[1,], ui=my.conf.int.vector[2,], ...)
}

my.plot.conf.int.continous(success, barcol="red", lwd="2", type = "b", xlab = "", ylab = "Self-reported fire-setters, percent",  main = paste("Prevalence of fire-setting during 1995-2005,", "\n", "general population at age 15"), col.main = "blue", font.main = 1)

comments powered by Disqus

Back to the index

Blog roll

R-bloggers, Debian Weekly

Last modified: oktober 17, 2019