Using a chi-square to see if two or more proportions are equal

The default chi-square analyses test whether two variables are independent. In particular, the expected frequency in the (i, j)-th cell for a test of independence is $$\frac{N_{i+}N_{+j}}{N}$$ where $$N_{i+}$$ observations are in the i-th row and $$N_{+j}$$ observations are in the j-th column, with a total number of observations (obtained by summing either the row totals or the column totals) of N.
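As a rough illustration, the expected frequencies under independence can be computed from the row and column totals as in the Python sketch below. The function name `expected_independence` is an assumption made for this illustration, and the counts are those of the 2x2 table in the Example section of this page (rows Young and Old, columns Brain-damaged and Normal).

```python
def expected_independence(table):
    """Expected counts N_i+ * N_+j / N for every cell of a two-way table."""
    row_totals = [sum(row) for row in table]          # N_i+
    col_totals = [sum(col) for col in zip(*table)]    # N_+j
    n = sum(row_totals)                               # N
    return [[r * c / n for c in col_totals] for r in row_totals]


# Counts from the Example section: rows Young, Old; columns Brain-damaged, Normal.
observed = [[23, 12], [13, 11]]
expected = expected_independence(observed)

for label, row in zip(["Young", "Old"], expected):
    print(label, [round(value, 2) for value in row])
```

The expected counts keep the same row and column totals as the observed table, which is a quick way to check the calculation.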

Now suppose instead that we wish to test whether the counts in each row (or column) are spread equally across the columns (or rows).

Expected values here are the average cell counts in each row or column. Hence, for example, the expected value in each cell in row i is $$\sum_{j}N_{ij}/J$$, the row total divided by J, assuming equal numbers occur in each of the J columns.

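We can test both the independence and equality hypotheses by computing Pearson's chi-square, equal to $$ \sum_{i,j} (O_{ij}-E_{ij})^{2}/E_{ij} $$ where $$O_{ij}$$ and $$E_{ij}$$ are the observed and expected frequencies in the (i, j)-th cell, and comparing it to a chi-square distribution with (I-1)(J-1) degrees of freedom for a table with I rows and J columns.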

Example

| | Brain-damaged | Normal |
|---|---|---|
| Young | 23 | 12 |
| Old | 13 | 11 |

Suppose we wish to see if similar proportions of young and old people are either brain damaged or have normal healthy brains.

Assuming that, within each age group, a person is equally likely (probability 0.5) to be brain-damaged or to have an undamaged brain, the expected values are (23+12)/2 = 17.5 for both 'Young' cells and (13+11)/2 = 12 for both 'Old' cells. This gives a Pearson chi-square of $$\frac{2 \times 5.5^{2}}{17.5} + \frac{2 \times 1^{2}}{12} = 3.62$$, which is less than 3.84, the 5% critical value of a chi-square on 1 df.

So we conclude there is no evidence of unequal proportions of brain-injured and normal people in either the young or the old group.
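The arithmetic of this example can be checked with a short Python sketch. The function names are assumptions made for this illustration. The p-value line relies on the fact that a chi-square variable on 1 df is a squared standard normal, so its upper tail area equals `erfc(sqrt(x / 2))`; it does not apply to other degrees of freedom.

```python
import math


def expected_equal_split(table):
    """Each cell in row i gets the row total divided by the number of columns."""
    return [[sum(row) / len(row)] * len(row) for row in table]


def pearson_chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Rows: Young, Old; columns: Brain-damaged, Normal.
observed = [[23, 12], [13, 11]]
expected = expected_equal_split(observed)
chi_sq = pearson_chi_square(observed, expected)

# Upper tail probability of a chi-square on 1 df (valid for 1 df only).
p_value = math.erfc(math.sqrt(chi_sq / 2))

CRITICAL_5PCT_1DF = 3.84  # 5% critical value quoted in the text
print(f"expected = {expected}")
print(f"chi-square = {chi_sq:.2f}, p = {p_value:.3f}")
print("exceeds critical value:", chi_sq > CRITICAL_5PCT_1DF)
```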

You can also obtain a 95% confidence interval for the difference in two independent proportions using an asymptotic method here. This calculator uses confidence intervals of the form $$p_{1} - p_{2} \pm z(\alpha_{0.025}) \sqrt{se^{2}(p_{1}) + se^{2}(p_{2})}$$ with $$se(p_{k}) = \sqrt{p_{k}(1-p_{k})/n_{k}}$$, where $$z(\alpha_{0.025})$$ is the upper 2.5% point of the standard normal distribution (1.96).
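A minimal Python sketch of this asymptotic interval is given below. It is not the calculator's own code: the function name `diff_proportions_ci` is hypothetical, and treating the proportion brain-damaged in each age group of the example (23 of 35 young, 13 of 24 old) as the two independent proportions is an assumption made for illustration.

```python
import math
from statistics import NormalDist


def diff_proportions_ci(x1, n1, x2, n2, level=0.95):
    """Asymptotic CI for p1 - p2 from counts x1 out of n1 and x2 out of n2."""
    p1, p2 = x1 / n1, x2 / n2
    se1_sq = p1 * (1 - p1) / n1   # squared standard error of p1
    se2_sq = p2 * (1 - p2) / n2   # squared standard error of p2
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    half_width = z * math.sqrt(se1_sq + se2_sq)
    diff = p1 - p2
    return diff, diff - half_width, diff + half_width


# Assumed use of the example table: brain-damaged among Young vs among Old.
diff, lower, upper = diff_proportions_ci(23, 35, 13, 24)
print(f"difference = {diff:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Newcombe (1998), listed in the references, compares this simple form with ten other methods.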

Alternatively, an odds ratio may be computed using the above website with a confidence interval of the form $$OR \pm z(\alpha_{0.025}) \, se(OR)$$ where se(OR) is approximately equal to the odds ratio times the square root of the sum of the reciprocals of the four frequencies in the 2x2 table, i.e. $$ OR \sqrt{ \sum_{ij} 1/n_{ij}}$$. This uses a result in Everitt BS (1996) and a mathematical procedure called the delta method. The delta method may be used to evaluate the standard error of a transformed statistic, such as $$1/\bar{x}$$, using the standard error of the untransformed version, which would be $$\bar{x}$$ in this case. For some examples see here.
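The odds ratio interval can be sketched in Python as follows. The function name `odds_ratio_ci` and the cell labels a, b, c, d are assumptions made for this illustration, and the counts are those of the example table. The delta-method step is that the standard error of the log odds ratio is the square root of the sum of reciprocal frequencies, so transforming back with the exponential multiplies it by the odds ratio itself.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio for the 2x2 table [[a, b], [c, d]] with a symmetric CI.

    se(OR) is approximated by OR * sqrt(1/a + 1/b + 1/c + 1/d).
    """
    or_hat = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    se_or = or_hat * se_log_or    # delta-method approximation
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    return or_hat, se_or, or_hat - z * se_or, or_hat + z * se_or


# Example table: rows Young, Old; columns Brain-damaged, Normal.
or_hat, se_or, lower, upper = odds_ratio_ci(23, 12, 13, 11)
print(f"OR = {or_hat:.3f}, se = {se_or:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

With small tables such as this one, the lower limit of a symmetric interval on the odds ratio scale can fall below zero, which is worth bearing in mind when interpreting it.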

Some SPSS syntax for 95% CIs is available here taken from here.

References

Everitt BS (1996) Making sense of statistics in psychology. OUP. The method used to calculate a confidence interval for the difference

Newcombe RG (1998). Interval estimation for the difference between independent proportions: Comparison of eleven methods. Statistics in Medicine, 17, 873-890.