JavaStat -- 2-way Contingency Table Analysis

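**Observed Contingency Table**

| Test result | Condition/disease present | Condition/disease absent | Totals |
|---|---|---|---|
| Positive | a (TP) | b (FP) | r1 = a + b |
| Negative | c (FN) | d (TN) | r2 = c + d |
| Totals | c1 = a + c | c2 = b + d | t = a + b + c + d |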

Chi-Square Tests

| Type of Test | Chi-Square | d.f. | p-value |
|---|---|---|---|
| Pearson Uncorrected | | | |
| Yates Corrected | | | |
| Mantel-Haenszel | | | |
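The three chi-square statistics above can be sketched in Python as follows. This is a minimal illustration, not the JavaStat code: the function names are hypothetical, it assumes the a/b/c/d layout of the observed table, and it uses the standard single-table forms: Pearson's t(ad - bc)²/(r1·r2·c1·c2), the Yates continuity correction, and the Mantel-Haenszel statistic, which for one 2x2 table is (t - 1)/t times Pearson's. All three have 1 d.f.

```python
import math


def chi2_p_value_1df(x: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    For 1 d.f., P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


def chi_square_tests(a: int, b: int, c: int, d: int) -> dict:
    """Pearson, Yates-corrected, and Mantel-Haenszel chi-square for a 2x2 table.

    Hypothetical helper. Cells follow the observed table: a=TP, b=FP, c=FN, d=TN.
    Returns {test name: (chi-square, d.f., p-value)}.
    """
    r1, r2 = a + b, c + d          # row totals
    c1, c2 = a + c, b + d          # column totals
    t = r1 + r2                    # grand total
    denom = r1 * r2 * c1 * c2
    if denom == 0:
        raise ValueError("a row or column total is zero")
    diff = abs(a * d - b * c)
    pearson = t * diff ** 2 / denom
    # Yates continuity correction, clamped at 0 so a tiny |ad - bc| is not over-corrected
    yates = t * max(0.0, diff - t / 2) ** 2 / denom
    # Mantel-Haenszel chi-square for a single 2x2 table: (t - 1) / t times Pearson
    mh = (t - 1) * diff ** 2 / denom
    return {
        name: (x, 1, chi2_p_value_1df(x))
        for name, x in (("Pearson Uncorrected", pearson),
                        ("Yates Corrected", yates),
                        ("Mantel-Haenszel", mh))
    }


if __name__ == "__main__":
    for name, (x, df, p) in chi_square_tests(10, 20, 30, 40).items():
        print(f"{name:20s} chi2={x:.4f} d.f.={df} p={p:.4f}")
```

Because the Yates correction shrinks |ad - bc| before squaring, its chi-square is never larger than Pearson's, so its p-value is never smaller.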

Fisher Exact Test

| Type of comparison (alternative hypothesis) | p-value |
|---|---|
| Two-tailed (tests whether the odds ratio is significantly different from 1). If you don't know which Fisher exact p-value to use, use this one; it is the p-value produced by SAS, SPSS, R, and other software. | |
| Left-tailed (tests whether the odds ratio is significantly less than 1) | |
| Right-tailed (tests whether the odds ratio is significantly greater than 1) | |
| Two-tailed p-value calculated as described in Rosner's book: 2 times whichever is smallest of the left-tail p-value, the right-tail p-value, or 0.5. It tends to agree closely with the Yates-corrected chi-square p-value. | |
| Probability of getting exactly the observed table (this is not really a p-value; don't use it as a significance test) | |
| Verification of computational accuracy (this number should be very close to 1.0; the closer, the better) | |
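With all row and column totals held fixed, cell a follows a hypergeometric distribution, and every Fisher exact quantity above can be read off that distribution. The Python sketch below is a hypothetical illustration, not the page's own code. The function name and the 1e-7 tie tolerance are assumptions. The two-tailed value uses the common rule "sum the probabilities of all tables no more probable than the observed one", and the Rosner value follows the 2 × min(left, right, 0.5) description above.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> dict:
    """Fisher exact test quantities for the 2x2 table [[a, b], [c, d]].

    Hypothetical helper. With fixed margins, a smaller a means a smaller
    odds ratio, so the left tail tests OR < 1 and the right tail OR > 1.
    """
    r1, r2, c1 = a + b, c + d, a + c
    t = r1 + r2
    lo, hi = max(0, c1 - r2), min(r1, c1)   # feasible range of cell a
    total = comb(t, c1)
    # Hypergeometric probability of each feasible table, indexed by its cell a
    probs = {k: comb(r1, k) * comb(r2, c1 - k) / total for k in range(lo, hi + 1)}
    p_obs = probs[a]
    left = sum(p for k, p in probs.items() if k <= a)    # H1: OR < 1
    right = sum(p for k, p in probs.items() if k >= a)   # H1: OR > 1
    # Two-tailed: all tables no more likely than the observed one
    # (a small relative tolerance guards against floating-point ties)
    two_tailed = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    return {
        "two_tailed": min(1.0, two_tailed),
        "left_tailed": min(1.0, left),
        "right_tailed": min(1.0, right),
        "rosner_two_tailed": 2 * min(left, right, 0.5),
        "p_observed_table": p_obs,
        "accuracy_check": sum(probs.values()),   # should be very close to 1.0
    }


if __name__ == "__main__":
    for key, value in fisher_exact_2x2(3, 1, 1, 3).items():
        print(f"{key:18s} {value:.6f}")
```

`math.comb` returns exact integers, so the only rounding happens in the final divisions; that is why the accuracy check should land very close to 1.0.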


Quantities derived from a 2-by-2 table


Odds Ratio (OR) = (a/b)/(c/d);

Relative Risk (RR) = (a/r1)/(c/r2);

Kappa

Overall Fraction Correct = (a+d)/t ; (often referred to simply as "Accuracy")

Misclassification Rate = 1 - Overall Fraction Correct;

Sensitivity = a/c1; (use exact binomial confidence intervals instead of the intervals shown here)

Specificity = d/c2; (use exact binomial confidence intervals instead of the intervals shown here)

Prevalence (estimated from sample) = c1/t

Positive Predictive Value (PPV) = a/r1; (use exact binomial confidence intervals instead of the intervals shown here)

Negative Predictive Value (NPV) = d/r2; (use exact binomial confidence intervals instead of the intervals shown here)

Difference in Proportions (DP) = a/r1 - c/r2;

Number Needed to Treat (NNT) = 1 / |DP|, which equals 1 / |ARR|;

Absolute Risk Reduction (ARR) = c/r2 - a/r1, which equals -DP;

Relative Risk Reduction (RRR) = ARR/(c/r2);

Positive Likelihood Ratio (+LR) = Sensitivity / (1 - Specificity);

Negative Likelihood Ratio (-LR) = (1 - Sensitivity) / Specificity;

Diagnostic Odds Ratio = (Sensitivity/(1-Sensitivity))/((1-Specificity)/Specificity);

Error Odds Ratio = (Sensitivity/(1-Sensitivity))/(Specificity/(1-Specificity));

Youden's J = Sensitivity + Specificity - 1;

Number Needed to Diagnose (NND) = 1 / (Sensitivity - (1 - Specificity)) = 1 / (Youden's J);

Number Needed to Misdiagnose (NNM) = 1 / (1 - Accuracy);

Forbes' NMI Index;

Contingency Coefficient;

Adjusted Contingency Coefficient;

Tetrachoric (terachoric) Correlation Coefficient ≈ Cos( Pi / (1 + Sqrt( OR ) ) ); (the "cosine-pi" approximation)

Phi Coefficient (for a 2x2 table, this equals Cramer's Phi and Cohen's w Index);

Yule's Q = (a*d-b*c)/(a*d+b*c) = (OR - 1) / (OR + 1);

Equitable Threat Score = (a-e)/(a+b+c-e), where e = r1*c1/t;

Entropy H(r) = - ( (r1/t)log₂(r1/t) + (r2/t)log₂(r2/t))

Entropy H(c) = - ( (c1/t)log₂(c1/t) + (c2/t)log₂(c2/t))

Entropy H(r,c) = - ( (a/t)log₂(a/t) + (b/t)log₂(b/t) + (c/t)log₂(c/t) + (d/t)log₂(d/t))

Information shared by descriptors r and c: B = H(r) + H(c) - H(r,c)

A = H(r,c) - H(r)

C = H(r,c) - H(c)

Similarity of descriptors r and c: S(r,c) = B / (A + B + C)

Distance between r and c: D(r,c) = (A + C) / (A + B + C)

Relative Improvement Over Chance (RIOC)
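The formulas above translate almost line for line into code. The Python sketch below is hypothetical: it computes point estimates only (no confidence limits) and assumes every cell and margin is nonzero. Where the list gives no formula, it assumes standard definitions: Cohen's kappa with chance agreement (r1·c1 + r2·c2)/t², Pearson's contingency coefficient sqrt(χ²/(χ² + t)) from the uncorrected chi-square, and a signed phi = (ad - bc)/sqrt(r1·r2·c1·c2). Forbes' NMI, the adjusted contingency coefficient, and RIOC are left out.

```python
import math
from typing import Dict, Iterable


def entropy_bits(counts: Iterable[float], t: float) -> float:
    """Shannon entropy (base 2) of the proportions counts / t."""
    return -sum(n / t * math.log2(n / t) for n in counts if n > 0)


def derived_quantities(a: float, b: float, c: float, d: float) -> Dict[str, float]:
    """Point estimates of the 2x2 quantities listed above (hypothetical helper).

    Rows: test positive/negative (r1, r2); columns: condition present/absent
    (c1, c2); t = grand total. Assumes no zero cells or margins.
    """
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    t = r1 + r2
    odds_ratio = (a / b) / (c / d)
    sens, spec = a / c1, d / c2
    accuracy = (a + d) / t
    dp = a / r1 - c / r2
    arr = c / r2 - a / r1                       # = -DP
    youden = sens + spec - 1
    # Assumed: Cohen's kappa, chance agreement taken from the margins
    p_chance = (r1 * c1 + r2 * c2) / t ** 2
    # Assumed: uncorrected Pearson chi-square for the contingency coefficient
    chi2 = t * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)
    e = r1 * c1 / t                             # chance-expected count in cell a
    h_r = entropy_bits((r1, r2), t)
    h_c = entropy_bits((c1, c2), t)
    h_rc = entropy_bits((a, b, c, d), t)
    info_b = h_r + h_c - h_rc                   # information shared by r and c
    info_a, info_c = h_rc - h_r, h_rc - h_c
    return {
        "OR": odds_ratio,
        "RR": (a / r1) / (c / r2),
        "Kappa": (accuracy - p_chance) / (1 - p_chance),
        "Accuracy": accuracy,
        "Misclassification": 1 - accuracy,
        "Sensitivity": sens,
        "Specificity": spec,
        "Prevalence": c1 / t,
        "PPV": a / r1,
        "NPV": d / r2,
        "DP": dp,
        "NNT": 1 / abs(dp),
        "ARR": arr,
        "RRR": arr / (c / r2),
        "+LR": sens / (1 - spec),
        "-LR": (1 - sens) / spec,
        "DOR": (sens / (1 - sens)) / ((1 - spec) / spec),
        "Error OR": (sens / (1 - sens)) / (spec / (1 - spec)),
        "Youden J": youden,
        "NND": 1 / youden,
        "NNM": 1 / (1 - accuracy),
        "Contingency Coefficient": math.sqrt(chi2 / (chi2 + t)),
        "Tetrachoric (approx.)": math.cos(math.pi / (1 + math.sqrt(odds_ratio))),
        "Phi": (a * d - b * c) / math.sqrt(r1 * r2 * c1 * c2),
        "Yule Q": (odds_ratio - 1) / (odds_ratio + 1),
        "ETS": (a - e) / (a + b + c - e),
        "H(r)": h_r,
        "H(c)": h_c,
        "H(r,c)": h_rc,
        "B": info_b,
        "A": info_a,
        "C": info_c,
        "S(r,c)": info_b / (info_a + info_b + info_c),
        "D(r,c)": (info_a + info_c) / (info_a + info_b + info_c),
    }


if __name__ == "__main__":
    for name, value in derived_quantities(10, 20, 30, 40).items():
        print(f"{name:24s} {value: .4f}")
```

Two identities make handy sanity checks. With this table layout the diagnostic odds ratio equals the odds ratio ad/bc. Also, because A + B + C = H(r,c), S(r,c) + D(r,c) = 1.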

If you don't see your favorite "quantity" in this list, drop me a line and let me know how that quantity is calculated from the four cell counts, and I'll add it to the collection!

Or you can calculate the confidence limits for any derived quantity yourself! Here's how...

This is the lower limiting table...

And this is the upper limiting table...

If you use these numbers, instead of your observed numbers, in the formula for any derived quantity, you'll get the lower and upper confidence limits for that quantity.

(The row and column sums for these tables are the same as for your observed table.)
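Because each limiting table keeps your row and column sums, it is fully determined by its cell a. The hypothetical Python sketch below uses `table_from_a` to rebuild the other three cells from a and the margins, and `limits` to apply any derived-quantity formula to both limiting tables. The limiting a values in the example (5 and 15) are made up for illustration; use the cells from the limiting tables the calculator displays. Sorting the two results is an added safeguard for quantities that decrease as a increases, such as ARR.

```python
from typing import Callable, Tuple

Table = Tuple[float, float, float, float]


def table_from_a(a: float, r1: float, r2: float, c1: float) -> Table:
    """Rebuild (a, b, c, d) from cell a and the fixed margins r1, r2, c1."""
    b = r1 - a
    c = c1 - a
    d = r2 - c
    return a, b, c, d


def limits(formula: Callable[[float, float, float, float], float],
           lower_table: Table, upper_table: Table) -> Tuple[float, float]:
    """Evaluate a derived-quantity formula on both limiting tables, sorted."""
    low, high = formula(*lower_table), formula(*upper_table)
    return min(low, high), max(low, high)


def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    return (a * d) / (b * c)


if __name__ == "__main__":
    # Observed margins r1=30, r2=70, c1=40; limiting a values 5 and 15 are hypothetical
    lower = table_from_a(5, 30, 70, 40)
    upper = table_from_a(15, 30, 70, 40)
    print(limits(odds_ratio, lower, upper))   # → (0.2, 1.8)
```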

Reference: Bernard Rosner, Fundamentals of Biostatistics, 6th ed., 2006

Send e-mail to statpages.org@gmail.com

Prevalence (e.g., 0.1): enter your population prevalence estimate here (A) if you want to run a diagnostic test, or (B) if it turns out that the sample prevalence differs substantially from the actual population prevalence. In either case, consider reporting the prevalence-adjusted PPV and NPV.
Sensitivity (e.g., 0.8)
Specificity (e.g., 0.8)
Total sample size
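When the population prevalence differs from the sample's, PPV and NPV can be recomputed from sensitivity, specificity, and prevalence using Bayes' theorem. The Python sketch below is a minimal illustration. The function name is hypothetical, and it assumes that sensitivity and specificity carry over unchanged from the sample to the population.

```python
from typing import Tuple


def adjusted_predictive_values(sensitivity: float, specificity: float,
                               prevalence: float) -> Tuple[float, float]:
    """PPV and NPV at a given population prevalence (Bayes' theorem)."""
    tp = sensitivity * prevalence                 # P(test +, disease present)
    fp = (1 - specificity) * (1 - prevalence)     # P(test +, disease absent)
    tn = specificity * (1 - prevalence)           # P(test -, disease absent)
    fn = (1 - sensitivity) * prevalence           # P(test -, disease present)
    return tp / (tp + fp), tn / (tn + fn)


if __name__ == "__main__":
    ppv, npv = adjusted_predictive_values(0.8, 0.8, 0.1)
    print(f"adjusted PPV = {ppv:.3f}, adjusted NPV = {npv:.3f}")
```

With the example inputs above (sensitivity 0.8, specificity 0.8, prevalence 0.1), the adjusted PPV is 0.08/0.26 ≈ 0.308 and the adjusted NPV is 0.72/0.74 ≈ 0.973. Even a fairly good test yields a low PPV when the condition is rare.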
