= Checking for outliers in regression =

According to Hoaglin and Welsch (1978), leverage values above 2(p+1)/n, where p is the number of predictors in a regression on n observations (items), indicate influential values. If the sample size is less than 30, a stricter criterion such as 3(p+1)/n is suggested.

Leverage is also related to the i-th observation's [[FAQ/mahal|Mahalanobis distance]], MD(i). For a sample of size N,

Leverage for observation i = MD(i)/(N-1) + 1/N

so the critical value is

$$\mbox{MD}_\text{i} = (2(p+1)/N - 1/N)(N-1)$$

(see Tabachnick and Fidell).

Other outlier detection methods using boxplots are described in the Exploratory Data Analysis graduate talk located [[StatsCourse2009|here]]. Outliers may also be screened using z-scores with tests such as Grubbs' test; further details and an on-line calculator are located [[http://www.graphpad.com/quickcalcs/Grubbs1.cfm|here.]]

Hair, Anderson, Tatham and Black (1998) suggest that Cook's distances greater than 1 indicate influential observations.

'''References'''

'''Hair, J., Anderson, R., Tatham, R. and Black, W. (1998).''' Multivariate Data Analysis (fifth edition). Englewood Cliffs, NJ: Prentice-Hall.

'''Hoaglin, D. C. and Welsch, R. E. (1978).''' The hat matrix in regression and ANOVA. The American Statistician 32, 17-22.

[[FAQ|Return to Statistics FAQ page]]

[[CbuStatistics|Return to Statistics main page]]

[[http://www.mrc-cbu.cam.ac.uk/|Return to CBU main page]]

These pages are maintained by [[mailto:ian.nimmo-smith@mrc-cbu.cam.ac.uk|Ian Nimmo-Smith]] and [[mailto:peter.watson@mrc-cbu.cam.ac.uk|Peter Watson]]
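The cut-offs above can be sketched in code. This is a minimal illustration, not part of the original FAQ: the simulated data, variable names (X, y) and the planted outlier are assumptions made for the example. It computes leverages from the hat matrix, applies the Hoaglin and Welsch 2(p+1)/n rule, recovers the Mahalanobis form via MD(i) = (leverage - 1/N)(N-1), and screens Cook's distances against the value 1 suggested by Hair et al.

```python
# Sketch (assumed example data, not from the FAQ): flagging influential
# observations with the leverage and Cook's distance cut-offs above.
import numpy as np

rng = np.random.default_rng(0)
n, p = 40, 2                               # n observations, p predictors
X = rng.normal(size=(n, p))
X[0] = [4.0, 4.0]                          # plant one high-leverage point
y = X @ np.array([1.0, -2.0]) + rng.normal(size=n)

Xd = np.column_stack([np.ones(n), X])      # design matrix with intercept
H = Xd @ np.linalg.inv(Xd.T @ Xd) @ Xd.T   # hat matrix
h = np.diag(H)                             # leverages h_i

cutoff = 2 * (p + 1) / n                   # Hoaglin & Welsch (1978) rule
high_leverage = np.where(h > cutoff)[0]

# Mahalanobis form given in the FAQ: MD(i) = (h_i - 1/N)(N-1)
md = (h - 1 / n) * (n - 1)

# Cook's distance: D_i = e_i^2 / (s^2 (p+1)) * h_i / (1 - h_i)^2;
# values > 1 flagged as influential (Hair et al., 1998)
beta = np.linalg.lstsq(Xd, y, rcond=None)[0]
resid = y - Xd @ beta
s2 = resid @ resid / (n - p - 1)
cooks = resid**2 / (s2 * (p + 1)) * h / (1 - h) ** 2
influential = np.where(cooks > 1)[0]

print("leverage cut-off:", cutoff)
print("high-leverage observations:", high_leverage)
print("Cook's distance > 1:", influential)
```

With the planted point at (4, 4), observation 0 exceeds the leverage cut-off; in practice the same diagnostics are available directly from regression output in packages such as R or SPSS.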