Differences between revisions 2 and 19 (spanning 17 versions)

Checking for outliers in regression

According to Hoaglin and Welsch (1978), leverage values above 2(p+1)/n, where the regression has p predictors fitted to n observations (items), are influential values. If the sample size is below 30, a stiffer criterion such as 3(p+1)/n is suggested.
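The cutoff above can be sketched in Python with NumPy. This is a minimal illustration, not code from the original page: the function names (`leverage_values`, `leverage_cutoff`, `flag_high_leverage`) are hypothetical, and it assumes the model includes an intercept, so that the leverages are the diagonal of the hat matrix for a design matrix with p + 1 columns.

```python
import numpy as np


def _as_2d(X):
    """Return predictors as a float array of shape (n, p)."""
    X = np.asarray(X, dtype=float)
    return X[:, None] if X.ndim == 1 else X


def leverage_values(X):
    """Diagonal of the hat matrix for predictors X, with an intercept added."""
    X = _as_2d(X)
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])
    # Thin QR: H = Q Q^T, so each leverage is the squared norm of a row of Q.
    q, _ = np.linalg.qr(design)
    return np.sum(q ** 2, axis=1)


def leverage_cutoff(n, p):
    """2(p+1)/n, or the stiffer 3(p+1)/n when n is below 30."""
    multiplier = 3 if n < 30 else 2
    return multiplier * (p + 1) / n


def flag_high_leverage(X):
    """Return the leverages and a boolean mask of values above the cutoff."""
    X = _as_2d(X)
    n, p = X.shape
    h = leverage_values(X)
    return h, h > leverage_cutoff(n, p)
```

For example, `flag_high_leverage([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])` uses the 3(p+1)/n rule because n = 10, and the leverages sum to p + 1 as expected for a full-rank design.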

Leverage is also related to the i-th observation's Mahalanobis distance, $$\mbox{MD}_{i}$$ (the squared distance), such that for sample size N (the n above)

Leverage for observation i = $$\frac{\mbox{MD}_{i}}{N-1} + \frac{1}{N}$$

so

Critical $$\mbox{MD}_{i} = \left(\frac{2(p+1)}{N} - \frac{1}{N}\right)(N-1)$$
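The relation between leverage and Mahalanobis distance can be checked numerically. The sketch below is illustrative only and rests on two assumptions not spelled out in the original: the distance is the squared Mahalanobis distance of the predictors from their centroid, and the covariance matrix uses the N - 1 divisor. The names `squared_mahalanobis` and `critical_md` are hypothetical.

```python
import numpy as np


def squared_mahalanobis(X):
    """Squared Mahalanobis distance of each row of X from the predictor means.

    Assumes the sample covariance (N - 1 divisor), which is what makes
    leverage = MD / (N - 1) + 1 / N hold for a model with an intercept.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    centered = X - X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    solved = np.linalg.solve(cov, centered.T).T
    return np.sum(centered * solved, axis=1)


def critical_md(n, p, multiplier=2):
    """Distance at which leverage equals multiplier * (p + 1) / n."""
    return (multiplier * (p + 1) / n - 1 / n) * (n - 1)
```

As a worked case, with N = 100 and p = 3 the critical value is (8/100 - 1/100) x 99 = 6.93.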

(See Tabachnick and Fidell)

Other outlier detection methods include boxplots, which are covered in the Exploratory Data Analysis Graduate talk (wiki page StatsCourse2009), and z-score-based tests such as Grubbs' test; further details and an on-line calculator are located at http://www.graphpad.com/quickcalcs/Grubbs1.cfm.

Hair, Anderson, Tatham and Black (1998) suggest that observations with Cook's distances greater than 1 are influential.
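A Cook's distance check against the cutoff of 1 might look like the following. This is a sketch under stated assumptions rather than the method of any cited source: it uses the standard closed form D_i = e_i^2 h_i / (k s^2 (1 - h_i)^2) with k = p + 1 coefficients (intercept included) and s^2 the residual mean square, and the function name `cooks_distance` is hypothetical.

```python
import numpy as np


def cooks_distance(X, y):
    """Cook's distance for each observation in an OLS fit with an intercept."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])
    k = design.shape[1]  # number of fitted coefficients, p + 1

    # Ordinary least squares fit and residuals.
    beta = np.linalg.lstsq(design, y, rcond=None)[0]
    residuals = y - design @ beta

    # Leverages from the thin QR decomposition of the design matrix.
    q, _ = np.linalg.qr(design)
    h = np.sum(q ** 2, axis=1)

    # Residual mean square on n - k degrees of freedom.
    mse = residuals @ residuals / (n - k)
    return residuals ** 2 / (k * mse) * h / (1 - h) ** 2
```

Observations could then be flagged with `cooks_distance(X, y) > 1`.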

References

Hair, J., Anderson, R., Tatham, R. and Black, W. (1998). Multivariate Data Analysis (fifth edition). Englewood Cliffs, NJ: Prentice-Hall.

Hoaglin, D. C. and Welsch, R. E. (1978). The hat matrix in regression and ANOVA. The American Statistician 32, 17-22.

These pages are maintained by Ian Nimmo-Smith and Peter Watson

-  ⇤ ← Revision 2 as of 2006-06-30 22:55:30 → 
  Size: 974
  Editor: Scripting Subsystem
  Comment:
+   ← Revision 19 as of 2013-03-08 10:17:44 → ⇥
  Size: 1666
  Editor: localhost
  Comment: converted to 1.6 markup
-Deletions are marked like this.
+Additions are marked like this.
 Line 3:
+Leverage is also related to the i-th observation's [[FAQ/mahal|Mahalanobis distance]], $$\mbox{MD}_text{i}$$, such that for sample size, N

Leverage for observation i = $$\frac{\mbox{MD}_text{i}}{\mbox{N-1}} + \frac{\mbox{1}}{\mbox{N}}$$

so 

Critical $$\mbox{MD}_text{i} = (\frac{\mbox{2(p+1)}}{\mbox{N}} - \frac{1}{\mbox{N}})(\mbox{N-1}) $$

(See Tabachnick and Fidell)

Other outlier detection methods using boxplots are in the Exploratory Data Analysis Graduate talk located [[StatsCourse2009|here]] or by using z-scores using tests such as Grubb's test - further details and an on-line calculator are located [[http://www.graphpad.com/quickcalcs/Grubbs1.cfm|here.]]
-Line 8:
+Line 20:
-'''Hair, J., Anderson, R., and Tatham, R. (1992).''' Multivariate Data Analysis (third edition). Englewood Cliffs, NJ: Prentice-Hall.
+'''Hair, J., Anderson, R., Tatham, R. and Black W. (1998).''' Multivariate Data Analysis (fifth edition). Englewood Cliffs, NJ: Prentice-Hall.
-Line 12:
+Line 24:
-[wiki:FAQ Return to Statistics FAQ page]
+[[FAQ|Return to Statistics FAQ page]]
-Line 14:
+Line 26:
-[wiki:CbuStatistics Return to Statistics main page]
+[[CbuStatistics|Return to Statistics main page]]
-Line 16:
+Line 28:
-[http://www.mrc-cbu.cam.ac.uk/ Return to CBU main page]
+[[http://www.mrc-cbu.cam.ac.uk/|Return to CBU main page]]
-Line 18:
+Line 30:
-These pages are maintained by [mailto:ian.nimmo-smith@mrc-cbu.cam.ac.uk Ian Nimmo-Smith] and [mailto:peter.watson@mrc-cbu.cam.ac.uk Peter Watson]
+These pages are maintained by [[mailto:ian.nimmo-smith@mrc-cbu.cam.ac.uk|Ian Nimmo-Smith]] and [[mailto:peter.watson@mrc-cbu.cam.ac.uk|Peter Watson]]
