2-way Contingency Table Analysis

Revised: 07/23/2013 -- added NNM -- the Number Needed to Mis-diagnose (thanks to Farrokh Habibzadeh)

This page computes various statistics from a 2-by-2 table. It calculates the Pearson (uncorrected), Yates-corrected, and Mantel-Haenszel chi-square tests, the Fisher Exact Test, and other indices relevant to various special kinds of 2-by-2 tables:

1. analysis of risk factors for unfavorable outcomes (odds ratio, relative risk, difference in proportions,  absolute and relative reduction in risk, number needed to treat)
2. analysis of the effectiveness of a diagnostic criterion for some condition (sensitivity, specificity, pos & neg predictive values, pos & neg likelihood ratios, diagnostic  and error odds ratios)
3. measures of inter-rater reliability (% correct or consistent, mis-classification rate, kappa, Forbes' NMI)
4. other measures of association (contingency coefficient, Cramer's phi coefficient, Yule's Q)

Confidence intervals for the estimated parameters are computed by a general method (based on "constant chi-square boundaries") given in: Statistical Methods for Rates and Proportions (2nd Ed.) Section 5.6,  by Joseph L. Fleiss (Pub: John Wiley & Sons, New York, 1981). This method is also described in Numerical Recipes in C (2nd Ed.) Section 15.6, by William H. Press et al. (Pub: Cambridge University Press, Cambridge UK, 1992)

Enter numbers into the four cells below. Make sure that the row and column totals add up correctly. Then click the Compute button.

Warning: Do not enter cell counts with a leading zero! That is, if a cell count is 34, enter it as 34, not as 034.  Some browsers will mis-interpret some numbers entered with leading zeros, and will produce wrong results (with no warning message). For more information about this, and for other things to be aware of before using this page for the first time, make sure you read the JavaStat user interface guidelines.
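A small illustration of why a leading zero can matter. This sketch assumes the browser treats the leading zero as an octal (base 8) prefix, which is how legacy JavaScript number parsing behaved; the page's own input handling is not shown here, so treat this as a plausible explanation rather than a description of its code.

```python
# Assumption: a leading zero is read as an octal (base 8) prefix.
entered = "034"

as_decimal = int(entered, 10)  # what the user meant
as_octal = int(entered, 8)     # 3*8 + 4, what an octal reading produces

print(as_decimal, as_octal)
```

Under that assumption the count 34 would silently be analyzed as 28.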

Observed Contingency Table

                                           Outcome Occurred   Outcome did not Occur   Totals
Risk Factor Present or Dx Test Positive    a                  b                       r1
Risk Factor Absent or Dx Test Negative     c                  d                       r2
Totals                                     c1                 c2                      t

Confidence Level: %

Chi-Square Tests
Type of Test           Chi Square   d.f.   p-value
Pearson Uncorrected
Yates Corrected
Mantel-Haenszel
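For readers who want to check the three chi-square values by hand, here is a minimal sketch using the textbook formulas for a single 2-by-2 table (1 degree of freedom). The function names are hypothetical, and the page's own code may differ in details such as how it handles zero margins.

```python
import math

def chi2_p_value(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 d.f."""
    return math.erfc(math.sqrt(chi2 / 2))

def chi_square_tests(a, b, c, d):
    """Pearson, Yates-corrected and Mantel-Haenszel chi-square for a 2x2 table."""
    r1, r2 = a + b, c + d          # row totals
    c1, c2 = a + c, b + d          # column totals
    t = r1 + r2                    # grand total
    denom = r1 * r2 * c1 * c2
    if denom == 0:
        raise ValueError("every row and column total must be positive")
    diff = abs(a * d - b * c)
    stats = {
        "pearson": t * diff ** 2 / denom,
        # Yates: shrink |ad - bc| by t/2, but never below zero
        "yates": t * max(diff - t / 2, 0) ** 2 / denom,
        # Mantel-Haenszel (single stratum): Pearson scaled by (t - 1)/t
        "mantel_haenszel": (t - 1) * diff ** 2 / denom,
    }
    return {name: (value, chi2_p_value(value)) for name, value in stats.items()}

for name, (chi2, p) in chi_square_tests(10, 20, 30, 40).items():
    print(f"{name}: chi-square = {chi2:.4f}, p = {p:.4f}")
```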

Fisher Exact Test
Type of comparison (Alternate Hypothesis)   p-value

Two-tailed (to test if the Odds Ratio is significantly different from 1):
    If you don't know which Fisher Exact p-value to use, use this one. This is the p-value produced by SAS, SPSS, R, and other software.
Left-tailed (to test if the Odds Ratio is significantly less than 1):
Right-tailed (to test if the Odds Ratio is significantly greater than 1):
Two-tailed p-value calculated as described in Rosner's book:
    (2 times whichever is smallest: left-tail, right-tail, or 0.5) It tends to agree closely with Yates Chi-Square p-value.
Probability of getting exactly the observed table:
    (This is not really a p-value; don't use this as a significance test.)
Verification of computational accuracy:
    (This number should be very close to 1.0; the closer, the better.)
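The Fisher Exact quantities listed above can be sketched from the hypergeometric distribution of the top-left cell with all margins held fixed. This is an illustrative implementation, not the page's own code; the function name and the small tie tolerance in the two-tailed sum are assumptions.

```python
from math import comb

def fisher_exact(a, b, c, d):
    """Fisher Exact p-values for a 2x2 table, with all margins held fixed."""
    r1, r2, c1 = a + b, c + d, a + c
    t = r1 + r2
    x_min, x_max = max(0, c1 - r2), min(r1, c1)   # feasible top-left cells
    total = comb(t, c1)
    # Hypergeometric probability of each table sharing the observed margins
    probs = {x: comb(r1, x) * comb(r2, c1 - x) / total
             for x in range(x_min, x_max + 1)}
    p_obs = probs[a]
    left = sum(p for x, p in probs.items() if x <= a)
    right = sum(p for x, p in probs.items() if x >= a)
    return {
        # sum over tables no more probable than the observed one
        # (small relative tolerance so floating-point ties are included)
        "two_tailed": sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)),
        "left_tailed": left,
        "right_tailed": right,
        "rosner_two_tailed": 2 * min(left, right, 0.5),
        "p_observed_table": p_obs,
        "check_sum": sum(probs.values()),   # should be very close to 1.0
    }

for name, value in fisher_exact(3, 1, 1, 3).items():
    print(f"{name}: {value:.6f}")
```

The `check_sum` entry plays the role of the "verification of computational accuracy" line: it adds up the probabilities of every possible table and should come out very close to 1.0.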

Quantities derived from a 2-by-2 table
Quantities Derived from the 2-by-2 Contingency Table   Value

Odds Ratio (OR) = (a/b)/(c/d)
Relative Risk (RR) = (a/r1)/(c/r2)
Kappa
Overall Fraction Correct = (a+d)/t (often referred to simply as "Accuracy")
Mis-classification Rate = 1 - Overall Fraction Correct
Sensitivity = a/c1 (use exact Binomial confidence intervals instead of these)
Specificity = d/c2 (use exact Binomial confidence intervals instead of these)
Positive Predictive Value (PPV) = a/r1 (use exact Binomial confidence intervals instead of these)
Negative Predictive Value (NPV) = d/r2 (use exact Binomial confidence intervals instead of these)
Difference in Proportions (DP) = a/r1 - c/r2
Number Needed to Treat (NNT) = 1 / absolute value of DP, which = 1 / absolute value of ARR
Absolute Risk Reduction (ARR) = c/r2 - a/r1, which = - DP
Relative Risk Reduction (RRR) = ARR/(c/r2)
Positive Likelihood Ratio (+LR) = Sensitivity / (1 - Specificity)
Negative Likelihood Ratio (-LR) = (1 - Sensitivity) / Specificity
Diagnostic Odds Ratio = (Sensitivity/(1-Sensitivity))/((1-Specificity)/Specificity)
Error Odds Ratio = (Sensitivity/(1-Sensitivity))/(Specificity/(1-Specificity))
Youden's J = Sensitivity + Specificity - 1
Number Needed to Diagnose (NND) = 1 / (Sensitivity - (1 - Specificity)) = 1 / (Youden's J)
Number Needed to Mis-diagnose (NNM) = 1 / (1 - Accuracy)
Forbes' NMI Index
Contingency Coefficient
Adjusted Contingency Coefficient
Tetrachoric Correlation Coefficient = Cos( Pi / (1 + Sqrt( OR ) ) )
Phi Coefficient (= Cramer's Phi, and = Cohen's w Index, for 2x2 table)
Yule's Q = (a*d-b*c)/(a*d+b*c) = (OR - 1) / (OR + 1)
Equitable Threat Score = (a-e)/(a+b+c-e), where e = r1*c1/t
Entropy H(r) = - ( (r1/t)log2(r1/t) + (r2/t)log2(r2/t) )
Entropy H(c) = - ( (c1/t)log2(c1/t) + (c2/t)log2(c2/t) )
Entropy H(r,c) = - ( (a/t)log2(a/t) + (b/t)log2(b/t) + (c/t)log2(c/t) + (d/t)log2(d/t) )
Information shared by descriptors r and c: B = H(r) + H(c) - H(r,c)
A = H(r,c) - H(r)
C = H(r,c) - H(c)
Similarity of descriptors r and c: S(r,c) = B / (A + B + C)
Distance between r and c: D(r,c) = (A + C) / (A + B + C)
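The formulas in this list translate directly into code. The sketch below covers only the quantities whose formulas are spelled out above (Kappa, Forbes' NMI, and the contingency and phi coefficients are left out because their formulas are not given here). The function and key names are hypothetical, and the code assumes no denominator is zero.

```python
from math import cos, log2, pi, sqrt

def entropy(*probs):
    """Shannon entropy in bits; a zero probability contributes nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

def derived_quantities(a, b, c, d):
    """Evaluate the listed formulas. Assumes every denominator is non-zero."""
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    t = r1 + r2
    sens, spec = a / c1, d / c2
    accuracy = (a + d) / t
    dp = a / r1 - c / r2
    arr = -dp
    odds_ratio = (a / b) / (c / d)
    youden_j = sens + spec - 1
    e = r1 * c1 / t                       # chance-expected value of cell a
    h_r = entropy(r1 / t, r2 / t)
    h_c = entropy(c1 / t, c2 / t)
    h_rc = entropy(a / t, b / t, c / t, d / t)
    info_b = h_r + h_c - h_rc             # B: information shared by r and c
    info_a = h_rc - h_r                   # A
    info_c = h_rc - h_c                   # C
    return {
        "odds_ratio": odds_ratio,
        "relative_risk": (a / r1) / (c / r2),
        "accuracy": accuracy,
        "misclassification_rate": 1 - accuracy,
        "sensitivity": sens,
        "specificity": spec,
        "ppv": a / r1,
        "npv": d / r2,
        "dp": dp,
        "nnt": 1 / abs(dp),
        "arr": arr,
        "rrr": arr / (c / r2),
        "lr_pos": sens / (1 - spec),
        "lr_neg": (1 - sens) / spec,
        "diagnostic_or": (sens / (1 - sens)) / ((1 - spec) / spec),
        "error_or": (sens / (1 - sens)) / (spec / (1 - spec)),
        "youden_j": youden_j,
        "nnd": 1 / youden_j,
        "nnm": 1 / (1 - accuracy),
        "tetrachoric": cos(pi / (1 + sqrt(odds_ratio))),
        "yule_q": (a * d - b * c) / (a * d + b * c),
        "ets": (a - e) / (a + b + c - e),
        "similarity": info_b / (info_a + info_b + info_c),
        "distance": (info_a + info_c) / (info_a + info_b + info_c),
    }

for name, value in derived_quantities(10, 20, 30, 40).items():
    print(f"{name}: {value:.4f}")
```

Note that A + B + C equals H(r,c), so the similarity and distance always sum to 1.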

If you don't see your favorite "quantity" in this list,
drop me a line and let me know how that quantity is calculated from the four cell counts,
and I'll add it to the collection!

Or you can calculate the limits for any derived quantity yourself!  Here's how...

This is the lower limiting table...

And this is the upper limiting table...

If you use these numbers, instead of your observed numbers, in the formula for any derived quantity, you'll get the lower and upper confidence limits for that quantity.

(The row and column sums for these tables are the same as for your observed table.)
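To make the idea concrete, here is a rough sketch of how limiting tables of this kind could be found and used. It assumes that "constant chi-square boundaries" means: among tables with the observed margins, find the two whose chi-square distance from the observed table equals the critical value (3.8415 for 95% confidence, 1 d.f.), with no continuity correction. That reading, the function names, and the bisection search are all assumptions; the page's actual algorithm may differ. The code also assumes all four observed cells are positive.

```python
def odds_ratio(a, b, c, d):
    return (a / b) / (c / d)

def limiting_tables(a, b, c, d, chi2_crit=3.841458821):
    """Lower and upper limiting tables (generally non-integer cells)."""
    if min(a, b, c, d) <= 0:
        raise ValueError("this sketch needs four positive cell counts")
    r1, r2, c1 = a + b, c + d, a + c
    x_min, x_max = max(0, c1 - r2), min(r1, c1)   # range of the top-left cell

    def table(x):
        # The margins are fixed, so the top-left cell determines the rest
        return (x, r1 - x, c1 - x, r2 - c1 + x)

    def excess(x):
        # Chi-square distance of the observed a from table(x), minus the target
        return (a - x) ** 2 * sum(1 / cell for cell in table(x)) - chi2_crit

    def solve(edge):
        # Bisection: excess is negative at x = a and grows without bound
        # as x approaches the edge of the feasible range
        outer, inner = float(edge), float(a)
        for _ in range(100):
            mid = (outer + inner) / 2
            if excess(mid) > 0:
                outer = mid
            else:
                inner = mid
        return table((outer + inner) / 2)

    return solve(x_min), solve(x_max)

lower, upper = limiting_tables(10, 20, 30, 40)
print("observed OR:", odds_ratio(10, 20, 30, 40))
print("lower limit:", odds_ratio(*lower))
print("upper limit:", odds_ratio(*upper))
```

The same two tables can be passed to any other formula built from the four cell counts to get limits for that quantity.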

Reference: Bernard Rosner, Fundamentals of Biostatistics, 6th Ed., 2006