FAQ/RegressionOutliers - CBU statistics Wiki


Revision 8 as of 2007-10-03 16:32:36


Checking for outliers in regression

According to Hoaglin and Welsch (1978), observations with leverage above 2(p+1)/N, where the regression has p predictors and N observations (items), are high-leverage points that may be influential. If the sample size is below 30, a stiffer criterion such as 3(p+1)/N is suggested.
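A minimal sketch of this screen, assuming NumPy and a predictor matrix `X` with one row per observation and one column per predictor (the intercept column is added inside). The helper names `leverages`, `leverage_cutoff` and `flag_high_leverage` are hypothetical. Leverages are taken as the diagonal of the hat matrix.

```python
import numpy as np

def leverages(X):
    """Diagonal of the hat matrix H = Z (Z'Z)^-1 Z', where Z is X plus an intercept column."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                      # treat a 1-D input as a single predictor
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])    # design matrix with intercept
    # Solve instead of forming (Z'Z)^-1 explicitly; h_i = z_i' (Z'Z)^-1 z_i
    M = np.linalg.solve(Z.T @ Z, Z.T)       # shape (p+1, n)
    return np.einsum("ij,ji->i", Z, M)

def leverage_cutoff(n, p):
    """2(p+1)/n, or the stiffer 3(p+1)/n when n < 30."""
    multiplier = 3 if n < 30 else 2
    return multiplier * (p + 1) / n

def flag_high_leverage(X):
    """Indices of observations whose leverage exceeds the cutoff."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n, p = X.shape
    return np.flatnonzero(leverages(X) > leverage_cutoff(n, p))
```

The hat-matrix diagonal always sums to p + 1, so the average leverage is (p+1)/N and the 2(p+1)/N rule flags observations with more than twice the average leverage.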

Leverage is also related to the i-th observation's squared [:FAQ/mahal:Mahalanobis distance], $$\mbox{MD}_i$$, measured from the centroid of the predictors, such that for a sample of size N

Leverage for observation i = $$\frac{\mbox{MD}_i}{N-1} + \frac{1}{N}$$

so

Critical $$\mbox{MD}_i = \left(\frac{2(p+1)}{N} - \frac{1}{N}\right)(N-1)$$

(See Tabachnick and Fidell)
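The relation between leverage and Mahalanobis distance, and the critical value obtained by inverting it, can be sketched as follows. This assumes NumPy and uses the sample covariance matrix (N-1 denominator), under which the relation holds exactly; the function names and the `multiplier` parameter are hypothetical. Collecting terms, the critical value also equals (2p+1)(N-1)/N.

```python
import numpy as np

def squared_mahalanobis(X):
    """Squared Mahalanobis distance of each row of X from the predictor means,
    using the sample covariance matrix (N-1 denominator)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                      # treat a 1-D input as a single predictor
    centred = X - X.mean(axis=0)
    S = np.atleast_2d(np.cov(X, rowvar=False))
    # MD_i = (x_i - xbar)' S^-1 (x_i - xbar), one value per row
    return np.einsum("ij,ji->i", centred, np.linalg.solve(S, centred.T))

def leverage_from_md(md, n):
    """Leverage for observation i = MD_i/(N-1) + 1/N."""
    return np.asarray(md) / (n - 1) + 1.0 / n

def critical_md(p, n, multiplier=2):
    """MD whose leverage equals the cutoff multiplier*(p+1)/N.

    Inverts the relation above: MD = (h - 1/N)(N-1).
    Pass multiplier=3 for the stiffer small-sample (N < 30) criterion.
    """
    h_crit = multiplier * (p + 1) / n
    return (h_crit - 1.0 / n) * (n - 1)
```

For example, with p = 3 predictors and N = 50 the default critical value is (8/50 - 1/50)(49) = 6.86, and observations can be screened with `np.flatnonzero(squared_mahalanobis(X) > critical_md(3, 50))`.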

Hair, Anderson, Tatham and Black (1998) suggest that observations with a Cook's distance greater than 1 be regarded as influential.
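A minimal sketch of computing Cook's distances for an ordinary least squares fit, assuming NumPy, a predictor matrix `X` (intercept added inside) and a response `y`. It uses the usual closed form D_i = e_i^2 h_i / (k s^2 (1 - h_i)^2), where k = p + 1 coefficients and s^2 is the residual mean square; the function name `cooks_distance` is hypothetical.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for each observation in an OLS fit of y on X (with intercept)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                      # treat a 1-D input as a single predictor
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])    # design matrix with intercept
    k = p + 1                               # number of estimated coefficients
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    s2 = resid @ resid / (n - k)            # residual mean square
    h = np.einsum("ij,ji->i", Z, np.linalg.solve(Z.T @ Z, Z.T))  # leverages
    return resid**2 / (k * s2) * h / (1 - h) ** 2
```

Observations to review under the Hair et al. guideline are then `np.flatnonzero(cooks_distance(X, y) > 1)`. The closed form gives the same value as refitting without each observation and measuring the scaled change in fitted values, without needing N separate refits.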

References

Hair, J., Anderson, R., Tatham, R. and Black, W. (1998). Multivariate Data Analysis (fifth edition). Englewood Cliffs, NJ: Prentice-Hall.

Hoaglin, D. C. and Welsch, R. E. (1978). The hat matrix in regression and ANOVA. The American Statistician 32, 17-22.

[wiki:FAQ Return to Statistics FAQ page]

[wiki:CbuStatistics Return to Statistics main page]

[http://www.mrc-cbu.cam.ac.uk/ Return to CBU main page]

These pages are maintained by [mailto:ian.nimmo-smith@mrc-cbu.cam.ac.uk Ian Nimmo-Smith] and [mailto:peter.watson@mrc-cbu.cam.ac.uk Peter Watson]