I think it is sound to be honest and clear about the limits of the precision in the statistics you publish. One way of doing this is to include information about the confidence intervals. This is mostly a note to help myself to recall how to do this in R.
Below is the code that produces figure 1.
misstankta.summary.function <- function () { load("/home/hans/text/datamängder/misstankta.RData") prop.fire.setters.by.age <- 100*prop.table(as.vector(table(misstankta$Ålder[misstankta$brand == T]))) prop.non.fire.setters.by.age <- 100*prop.table(as.vector(table(misstankta$Ålder[misstankta$brand == F]))) no.of.fire.setters <- length(which(misstankta$brand == T)) no.of.non.fire.setters <- length(which(misstankta$brand == F)) list(no.of.non.fire.setters=no.of.non.fire.setters, no.of.fire.setters=no.of.fire.setters, prop.non.fire.setters.by.age=prop.non.fire.setters.by.age, prop.fire.setters.by.age=prop.fire.setters.by.age) } misstankta.summary.results <- misstankta.summary.function() plot(15:20, misstankta.summary.results$prop.fire.setters.by.age, type = "l", ylab = "Proportion of suspects", xlab = "Age", col = "blue", ylim = c(0,31)) lines(15:20, misstankta.summary.results$prop.non.fire.setters.by.age, col = "red")
This figure is based on proportions, thus requiring a bit extra effort to calculating confidence intervals which are based on absolute numbers. Here are the absolute numbers involved:
as.vector(table(misstankta$Ålder[misstankta$brand == T])) [1] 483 402 198 215 164 117 as.vector(table(misstankta$Ålder[misstankta$brand == F])) [1] 425 388 392 345 281 269
For the first year, we have 483 and 425 respectively, for the second year 402 and 388 and so on. For a dichotomous variable, one can use the binom.test()
to get the confidence intervals:
binom.test(c(483,425), p = 483/(483+425)) Exact binomial test data: c(483, 425) number of successes = 483, number of trials = 908, p-value = 1 alternative hypothesis: true probability of success is not equal to 0.5319383 95 percent confidence interval: 0.4988732 0.5647956 sample estimates: probability of success 0.5319383
While the terminology in the output of binom.test()
might confuse some, my interpretation is like this: If you get 483 fire-setters out of a sample 908 juvenile delinquents, then the confidence intervals for that measurement is 0.4988732 * 908 and 0.5647956 * 908, respectively (all this at a 0.95 confidence level).
To get absolut integers rather then probabilities, use:
test.result <- binom.test(c(483,425), p = 483/(483+425)) round(test.result$conf.int[1] * (483+425)) [1] 453 > round(test.result$conf.int[2] * (483+425)) [1] 513
But in this case, the probabilities was exactly what we wanted, since they are on the same scale as the proportions already displayed.
for(i in 1:length(a)){ tmp <- binom.test(c(a[i], b[i])) ; ci.low[i] <- tmp$conf.int1 ; ci.high[i] <- tmp$conf.int2 } library(gplots)
plotCI(x = 1:length(success), y = success, li=my.conf.int.vector[1,], ui=my.conf.int.vector[2,], ylab = my.ylab, xlab = my.xlab, xaxt="n")
Let us say that we would like a confidence interval for the total number of delinquents in various ages. The measured data are as follows:
## success <- table(week.map[skolbränder$"Tid vecka"]) ## my.ylab = "Antal Bränder" ## my.xlab = "Två-veckorsperiod" success <- c(908, 790, 590, 560, 445, 386) names(success) <- c(15, 20, 25, 30, 35, 40) # must be numerical my.ylab = "Antal mordbrandsdömda" my.xlab = "Ålder"
Here, the variable is not dichotomous so binom.test is not appropriate. Rather, prop.test()
should be used. For each measurement, the confidence interval is based on the relation between that number and the total number of cases. The following function can be plotted with the function plotCI
from the package gplots
. A limitation here is that the names must be numerical, unlike when plot.table
is used.
## usage: ## success <- c(34, 55, 19, 82) ## names(success) <- c(15, 20, 25, 30) ## my.plot.conf.int.continous(success) my.plot.conf.int.continous <- function(success, ...) { require(gplots) n <- sum(success) my.conf.int.vector <- sapply(1:length(success), function(x) { prop.test(success[x], n)$conf.int*n }) plotCI(x = as.numeric(names(success)), y = success, li=my.conf.int.vector[1,], ui=my.conf.int.vector[2,], ...) } my.plot.conf.int.continous(success, barcol="red", lwd="2", type = "b", xlab = "", ylab = "Self-reported fire-setters, percent", main = paste("Prevalence of fire-setting during 1995-2005,", "\n", "general population at age 15"), col.main = "blue", font.main = 1)