A colleague presented a crosstable on the willingness of kindergarten staff to report suspected child abuse:
Kindergarten (percentages)  Urban  Rural
Would report                 67.7   37.5
Would not report             32.3   62.5
I wanted to enhance the analysis by adding confidence intervals, so I obtained the absolute numbers:
              Urban  Rural  Total
Kindergarten     31     24     55
To create R objects that represent this data, use:
K.urban <- round(c(.677, .323) * 31)
K.rural <- round(c(.375, .625) * 24)
my.matrix <- matrix(c(K.urban, K.rural), byrow = TRUE, ncol = 2)
rownames(my.matrix) <- c("Urban", "Rural")
colnames(my.matrix) <- c("Would report", "Would not report")
Inspect the resulting table:
my.matrix
      Would report Would not report
Urban           21               10
Rural            9               15
Ensure that the original table of percentages can be recovered from the row proportions of my.matrix:
sapply(1:2, function(x) {
  100 * round(my.matrix[x, ] / sum(my.matrix[x, ]), 3)
})
[,1] [,2]
Would report 67.7 37.5
Would not report 32.3 62.5
So, we have a good object to work on.
Now, for each estimate, what is the confidence interval at a confidence level of 0.95?
First, the confidence interval for the first item, "Would report", is given by binom.test:
binom.test(K.urban)$conf.int[1:2]
[1] 0.4862702 0.8331764
binom.test(K.rural)$conf.int[1:2]
[1] 0.1879929 0.5940636
Since 0.48 is less than 0.59, the two confidence intervals overlap, so this sample cannot show a statistically significant difference between the willingness to report cases of suspected child abuse in rural and urban kindergartens. Or can it?
chisq.test(my.matrix)
Pearson's Chi-squared test with Yates' continuity correction
data: my.matrix
X-squared = 3.8447, df = 1, p-value = 0.0499
Here, p < 0.05, thus we have a significant difference.
The difference between the binom.test and the chisq.test lies in the size of n. In the first case, n is defined as the number of cases in each category: rural (n = 24) and urban (n = 31). In the chisq.test, n is defined as the total number of cases (n = 55).
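The point about n can be made concrete; a minimal sketch re-using the counts from above:

```r
# Re-create the counts from the percentages (same as above)
K.urban <- round(c(.677, .323) * 31)
K.rural <- round(c(.375, .625) * 24)
my.matrix <- matrix(c(K.urban, K.rural), byrow = TRUE, ncol = 2)

sum(K.urban)    # n for the urban binom.test: 31
sum(K.rural)    # n for the rural binom.test: 24
sum(my.matrix)  # n for the chisq.test: all 55 cases at once
```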
So, my idea of providing confidence intervals was a bad one. The chi-square test is better since it does not require slicing the data into small (less useful) pieces.
prop.test
While the chi-square test shows that there is (most likely) a non-random association between type of location and willingness to report, it does not specifically test the hypothesis that the willingness to report is lower in rural than in urban kindergartens (or, stated differently, higher in urban than in rural kindergartens). To test that hypothesis we need a one-sided prop.test.
Since the first row of my.matrix happens to be urban, and reporting rather than not reporting is the first column (the column representing "success" in test terminology), the question tested could be stated like this:
Is the willingness to report (proportion of "success") higher ("greater") in urban (first row) than in rural (second row) kindergartens?
prop.test(my.matrix, alternative = c("greater"))
2-sample test for equality of proportions with continuity correction
data: my.matrix
X-squared = 3.8447, df = 1, p-value = 0.02495
alternative hypothesis: greater
95 percent confidence interval:
0.05216612 1.00000000
sample estimates:
prop 1 prop 2
0.6774194 0.3750000
Indeed it is: the p-value now drops to about 0.025.
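The components of the prop.test result can also be extracted programmatically, which is handy when reporting; a small sketch (re-creating the urban-first matrix from above):

```r
K.urban <- round(c(.677, .323) * 31)
K.rural <- round(c(.375, .625) * 24)
my.matrix <- matrix(c(K.urban, K.rural), byrow = TRUE, ncol = 2)

res <- prop.test(my.matrix, alternative = "greater")
res$p.value                        # about 0.025
res$estimate                       # the two proportions, 0.677 and 0.375
res$estimate[1] - res$estimate[2]  # point estimate of the difference, about 0.30
res$conf.int                       # one-sided lower bound, about 0.052
```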
If we, for some reason, want to express the hypothesis the other way around, we must exchange the order of the rows:
my.matrix <- matrix(c(K.rural, K.urban), byrow = TRUE, ncol = 2)
rownames(my.matrix) <- c("Rural", "Urban")
prop.test(my.matrix, alternative = c("less"))

        2-sample test for equality of proportions with continuity correction

data:  my.matrix
X-squared = 3.8447, df = 1, p-value = 0.02495
alternative hypothesis: less
95 percent confidence interval:
 -1.00000000 -0.05216612
sample estimates:
   prop 1    prop 2
0.3750000 0.6774194
I knew something was not quite right with prop.test, and the help page of binom.test has it: prop.test is an approximate test. The help page of prop.test is not optimal here; it refers to binom.test if you want "an exact test of a binomial hypothesis". However, if you have two categories of units (two groups) and want to compare the "proportion of success" between these categories (or groups), what you really want is fisher.test (see http://en.wikipedia.org/wiki/Fisher%27s_exact_test for a background).
fisher.test(my.matrix, alternative = c("less"))

        Fisher's Exact Test for Count Data

data:  my.matrix
p-value = 0.02461
alternative hypothesis: true odds ratio is less than 1
95 percent confidence interval:
 0.0000000 0.8414671
sample estimates:
odds ratio
 0.2927832
Reading up more on Fisher's exact test, I realized that there exists another test that for small n is probably even better than Fisher's exact test: Barnard's test. For an explanation of the advantages of Barnard's test, see Conditional versus Unconditional Exact Tests for Comparing Two Binomials and Analysis of 2 x 2 tables of frequencies: matching test to experimental design.
Strangely enough, there is no implementation of Barnard's test on CRAN, but Tal Galili provides one here.
Barnardextest(my.matrix)
$contingency.table
   A  B
A 21 10
B  9 15

$Wald.Statistic
[1] 2.233813

$Nuisance.parameter
[1] 0.8901

$p.value.one.tailed
[1] 0.01558175
EDIT 2010-09-21 START: After reading "Testing the Equality of Two Independent Binomial Proportions" by Roderick J. A. Little, I have reverted back to Fisher's test again; Barnard's test seems too dependent on ancillary statistics for my taste. Little retells an example originally given by Fisher in a correspondence with Barnard, where the colour of flowering plants was of interest. The plants could be red or white, or not develop flowers at all. Barnard's test takes into account both the number of plants that actually flowered and the number of plants that did not flower, while Fisher's test is applied only to the plants that did flower. Fisher's point is that Barnard's test rewards the inferior horticulturist, who only brings 3/4 of all plants to flowering, with a lower p-value, but that reward is not earned (or persuasive). Consider a good horticulturist who brings all plants to flowering: if he brings up four plants and all of them become red, the p-value of the hypothesis that the likelihood for red is higher than the likelihood for white is (1/2)*(1/2)*(1/2)*(1/2) = 0.0625. For a horticulturist who only succeeds with 3 out of 4 plants, the p-value (of Barnard's test) is (1/2)*(1/2)*(1/2)*(1/2)*(3/4)*(3/4)*(3/4)*(3/4) = 0.0198.
Barnard himself was persuaded by this argument of Fisher's.
I will no longer hold "lacking Barnard's test" against R. EDIT 2010-09-21 STOP
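Fisher's arithmetic in the flowering-plant example above is easy to check:

```r
# Good horticulturist: four plants, all flower, all turn out red
p.good <- (1/2)^4

# Inferior horticulturist: the same four red flowers, but Barnard-style
# accounting also multiplies in the 3/4 flowering rate for each plant
p.inferior <- (1/2)^4 * (3/4)^4

round(p.good, 4)      # 0.0625
round(p.inferior, 4)  # 0.0198
```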
In this case, demonstrating a significant difference between rural and urban areas was quite sufficient; the data was from a pilot study.
But in other cases one would rather report some kind of effect measure than just whether p is below 0.05 or not. The output of the Fisher test includes a confidence interval for the odds ratio, which is a relevant measure of effect.
fisher.test(my.matrix, alternative = c("less"))

        Fisher's Exact Test for Count Data

data:  my.matrix
p-value = 0.02461
alternative hypothesis: true odds ratio is less than 1
95 percent confidence interval:
 0.0000000 0.8414671
sample estimates:
odds ratio
 0.2927832
The odds for reporting in rural and urban areas, respectively, are defined as the proportion reporting divided by the proportion not reporting:
odds.rural <- .375/.625
odds.urban <- .677/.323
round(odds.rural, 2)
[1] 0.6
round(odds.urban, 2)
[1] 2.1
The odds ratio is, of course, defined as the ratio of these odds:
odds.ratio <- odds.rural/odds.urban
odds.ratio
[1] 0.2862629
But the value given for the odds ratio by Fisher's exact test is a more elaborate measure, an estimate that is not so sensitive to small numbers in one cell of the table; see http://en.wikipedia.org/wiki/Odds_ratio#Alternative_estimators_of_the_odds_ratio.
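The difference between the two estimates can be seen directly; a sketch comparing the plain cross-product odds ratio from the counts with the conditional maximum-likelihood estimate that fisher.test reports (rural-first matrix, as above):

```r
K.urban <- round(c(.677, .323) * 31)
K.rural <- round(c(.375, .625) * 24)
my.matrix <- matrix(c(K.rural, K.urban), byrow = TRUE, ncol = 2)

# Plain (cross-product) sample odds ratio: (9/15) / (21/10)
or.sample <- (my.matrix[1, 1] / my.matrix[1, 2]) /
  (my.matrix[2, 1] / my.matrix[2, 2])
or.sample                        # about 0.286

# Conditional MLE reported by fisher.test
fisher.test(my.matrix)$estimate  # about 0.293
```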
TODO: Interpretation of the odds ratio from prop.test() or fisher.test().

The differences in p-values between Barnard's test (0.01558) and the other two, Fisher's test (0.02461) and prop.test (0.02495), are considerable. There was also another data set, where the differences were much smaller.
School (percentages)  Urban  Rural
Would report           69.2   58.6
Would not report       30.8   41.4
And the absolute numbers were as follows:
        Urban  Rural  Total
School     39     29     68
S.urban <- round(c(.692, .308) * 39)
S.rural <- round(c(.586, .414) * 29)
my.matrix <- matrix(c(S.rural, S.urban), byrow = TRUE, ncol = 2)
fisher.test(my.matrix, alternative = c("less"))

        Fisher's Exact Test for Count Data

data:  my.matrix
p-value = 0.2577
alternative hypothesis: true odds ratio is less than 1
95 percent confidence interval:
 0.000000 1.651135
sample estimates:
odds ratio
 0.6340203
So, for schools this sample does not indicate any difference in willingness to report between rural and urban areas.