Checking for outliers in regression
According to Hoaglin and Welsch (1978), leverage values above 2(p+1)/N, where p predictors are in the regression on N observations (items), are influential values. If the sample size is below 30, a stiffer criterion such as 3(p+1)/N is suggested.
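As a rough illustration of the rule above, the following sketch computes leverage values as the diagonal of the hat matrix and compares them with the cutoff. The function names `leverage_values` and `leverage_cutoff` are hypothetical, an intercept is assumed to be in the model, and the switch to the multiplier 3 below 30 observations simply encodes the suggestion in the text.

```python
import numpy as np

def leverage_values(X):
    """Return the diagonal of the hat matrix for predictors X plus an intercept."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                      # treat a vector as a single predictor
    Z = np.column_stack([np.ones(len(X)), X])
    Q, _ = np.linalg.qr(Z)                  # H = Q Q', so h_ii is the squared row norm of Q
    return np.sum(Q ** 2, axis=1)

def leverage_cutoff(n, p, small_sample_n=30):
    """2(p+1)/n, or the stiffer 3(p+1)/n when n is below small_sample_n."""
    multiplier = 3 if n < small_sample_n else 2
    return multiplier * (p + 1) / n

# Made-up data: nine ordinary values and one extreme predictor value.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 50], dtype=float)
h = leverage_values(x)
cutoff = leverage_cutoff(n=len(x), p=1)
flagged = np.flatnonzero(h > cutoff)        # indices of observations above the cutoff
print(cutoff, flagged)
```

The QR decomposition is used here only to avoid forming the full N by N hat matrix; any method that yields the same diagonal would do.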
Leverage is also related to the i-th observation's Mahalanobis distance, MD(i), such that, for sample size N,
Leverage for observation i = MD(i)/(N-1) + 1/N
so
Critical MD(i) = (2(p+1)/N - 1/N)(N-1)
(See Tabachnick and Fidell)
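The relation between leverage and MD(i) can be checked numerically. The sketch below assumes MD(i) is the squared Mahalanobis distance of the predictors from their mean, computed with the sample covariance matrix (N-1 denominator), which is the reading under which the identity holds exactly. The function names `squared_mahalanobis` and `critical_md` are hypothetical, and the data are simulated.

```python
import numpy as np

def squared_mahalanobis(X):
    """Squared Mahalanobis distance of each row of X from the column means."""
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))   # sample covariance, N-1 denominator
    inv_cov = np.linalg.inv(cov)
    return np.einsum("ij,jk,ik->i", centred, inv_cov, centred)

def critical_md(n, p):
    """Critical MD(i) = (2(p+1)/N - 1/N)(N-1), from the leverage cutoff 2(p+1)/N."""
    return (2 * (p + 1) / n - 1 / n) * (n - 1)

# Simulated predictors: 40 observations on 2 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
n, p = X.shape

md = squared_mahalanobis(X)
leverage_from_md = md / (n - 1) + 1 / n            # Leverage = MD(i)/(N-1) + 1/N

# Leverage computed directly from the hat matrix, for comparison.
Z = np.column_stack([np.ones(n), X])
Q, _ = np.linalg.qr(Z)
leverage_direct = np.sum(Q ** 2, axis=1)

print(np.allclose(leverage_from_md, leverage_direct))
print(critical_md(n, p))
```

Observations whose MD(i) exceeds `critical_md(n, p)` are the same ones whose leverage exceeds 2(p+1)/N, since one threshold is an algebraic rearrangement of the other.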
Other outlier detection methods include boxplots, which are covered in the Exploratory Data Analysis Graduate talk located here, and z-score-based tests such as Grubbs' test; further details and an on-line calculator are located here.
Hair, Anderson, Tatham and Black (1998) suggest that observations with Cook's distances greater than 1 are influential.
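A minimal sketch of this check follows, using the usual textbook expression D(i) = e(i)^2 h(i) / (k s^2 (1 - h(i))^2), where e(i) is the residual, h(i) the leverage, k = p+1 the number of fitted coefficients and s^2 the residual mean square. This formula is not quoted from the references above, the function name `cooks_distances` is hypothetical, and the data are made up.

```python
import numpy as np

def cooks_distances(X, y):
    """Cook's distance for each observation in an OLS fit of y on X plus an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                       # treat a vector as a single predictor
    Z = np.column_stack([np.ones(len(X)), X])
    n, k = Z.shape                           # k = p + 1 fitted coefficients
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    Q, _ = np.linalg.qr(Z)
    h = np.sum(Q ** 2, axis=1)               # leverage values
    s2 = resid @ resid / (n - k)             # residual mean square
    return resid ** 2 * h / (k * s2 * (1 - h) ** 2)

# Made-up data: nine points on the line y = 2x and one far-off point at x = 50.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 50], dtype=float)
y = np.append(2 * x[:-1], 0.0)
d = cooks_distances(x, y)
influential = np.flatnonzero(d > 1)          # indices with Cook's distance above 1
print(influential)
```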
References
Hair, J., Anderson, R., Tatham, R. and Black, W. (1998). Multivariate Data Analysis (fifth edition). Englewood Cliffs, NJ: Prentice-Hall.
Hoaglin, D. C. and Welsch, R. E. (1978). The hat matrix in regression and ANOVA. The American Statistician 32, 17-22.
These pages are maintained by Ian Nimmo-Smith and Peter Watson