FAQ/Rcodeg2 - CBU statistics Wiki
location: FAQ / Rcodeg2

R code for multivariate normality testing

The R code below performs Mardia's test of multivariate kurtosis:

mardiatest <- function(x,alpha=0.05) {
# MARDIATEST Mardia multivariate normality test using skewness & kurtosis.
#   MARDIATEST(X,ALPHA) performs Mardia's test of multivariate
#   normality to determine if the null hypothesis of multivariate 
#   normality is a reasonable assumption regarding the population
#   distributions of a random samples contained within the columns of X. X
#   must be a N-values by M-samples array. The desired significance level,
#   ALPHA, is an optional scalar input (default = 0.05).
#
#   The function performs three tests: of the multivariate skewness; the
#   multivariate skewness corrected for small samples; and the multivariate
#   kurtosis. H is a 1-by-3 array indicating the results of the hypothesis
#   tests:
#     H(i) = 0 => Do not reject the null hypothesis at significance level
#     ALPHA.
#     H(i) = 1 => Reject the null hypothesis at significance level ALPHA.
#
#   MARDIATEST(...) also returns the following fields:
#     stats.Hs  - logical scalar indicating whether to reject the null
#                 hypothesis that the multivariate skewness is consistent
#                 with a multivariate normal distribution.  
#     stats.Ps  - the P value for the multivariate skewness.
#     stats.Ms  - the Mardia test statistic for the multivariate skewness.
#     stats.CVs - the critical value for the multivariate skewness.
#     stats.Hsc - logical scalar indicating whether to reject the null
#                 hypothesis that the multivariate skewness (corrected for
#                 small samples) is consistent with a multivariate normal
#                 distribution.
#     stats.Psc - the P value for the multivariate skewness (corrected for
#                 small samples).
#     stats.Msc - the Mardia test statistic for the multivariate skewness
#                 (corrected for small samples).
#     stats.Hk  - logical scalar indicating whether to reject the null
#                 hypothesis that the multivariate kurtosis is consistent
#                 with a multivariate normal distribution.  
#     stats.Pk  - the P value for the multivariate kurtosis.
#     stats.Mk  - the Mardia test statistic for the multivariate kurtosis.
#     stats.CVk - the critical value for the multivariate kurtosis.
#
#   Notes:
#   For multivariate data, tests of normality of the individual variables
#   are not sufficent to determine multivariate normality. Even if every
#   variable in a set is normally distributed, it is still possible that
#   the combined distribution is not multivariate normal. Mardia's test is
#   one means testing for multivariate normality. The test is based on
#   independent tests of multivariate skewness and kurtosis. Data can be
#   assumed to conform to multivariate normality only if the null
#   hypothesis is not rejected for both tests.
#
#   Example:
#   >> X = [2.4  2.1  2.4; ...
#           3.5  1.8  3.9; ...
#           6.7  3.6  5.9; ...
#           5.3  3.3  6.1; ...
#           5.2  4.1  6.4; ...
#           3.2  2.7  4.0; ...
#           4.5  4.9  5.7; ...
#           3.9  4.7  4.7; ...
#           4.0  3.6  2.9; ...
#           5.7  5.5  6.2; ...
#           2.4  2.9  3.2; ...
#           2.7  2.6  4.1];
# 
#   >> [H stats] = mardiatest(X, 0.05)
# 
#   H =
#        0     0     1
# 
#   stats = 
#        Hs: 0
#        Ps: 0.9660
#        Ms: 3.5319
#       CVs: 18.3070
#       Hsc: 0
#       Psc: 0.8918
#       Msc: 4.9908
#        Hk: 1
#        Pk: 0.0452
#        Mk: -1.6936
#       CVk: 1.6449
#   
#   The magnitude of the Mardia kurtosis test statistic (Mk) is greater
#   than critical value (CVk) for a 5% level test, so the hypothesis of
#   multivariate normality is rejected. 
#
#
#   This program is based upon the MATCLAB version which is:
#   Copyright 2006 David Graham, Loughborough University
#   (D.J.Graham@lboro.ac.uk)
#
# This function is based on MSKEKUR by A. Trujillo-Ortiz and R.
# Hernandez-Walls. Modifications include:
#   - additional checking of inputs
#   - modification of help text for consistency with Matlab conventions
#   - the output of the test results to variables
#   - removal of the code sending test results as a string to the command
#     window
#   - removed call to 'eval'.
#
# Statements from the original code:
#
#  Created by A. Trujillo-Ortiz and R. Hernandez-Walls
#         Facultad de Ciencias Marinas        
#         Universidad Autonoma de Baja California 
#         Apdo. Postal 453  
#         Ensenada, Baja California
#         Mexico  
#         atrujo@uabc.mx
#
#  May 22, 2003.
#
#  To cite this file, this would be an appropriate format:
#  Trujillo-Ortiz, A. and R. Hernandez-Walls. (2003). Mskekur: Mardia's multivariate skewness
#    and kurtosis coefficients and its hypotheses testing. A MATLAB file. [WWW document]. URL 
#    http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=3519&objectType=FILE
#
#  References:
# 
#  Mardia, K. V. (1970), Measures of multivariate skewnees and kurtosis with
#         applications. Biometrika, 57(3):519-530.
#  Mardia, K. V. (1974), Applications of some measures of multivariate skewness
#         and kurtosis for testing normality and robustness studies. Sankhyâ A,
#         36:115-128
#  Stevens, J. (1992), Applied Multivariate Statistics for Social Sciences. 2nd. ed.
#         New-Jersey:Lawrance Erlbaum Associates Publishers. pp. 247-248.



# Check for validity of ALPHA
if  (is.numeric(alpha) == FALSE || alpha>0.5 || alpha <0)
    cat('Input ALPHA must be a scalar between 0 and 0.5.')
end

n <- nrow(x)
p <- ncol(x)

# Check for validity of X
if (p < 2)
    cat('Input X must be an array with at least 2 columns.')
end


covxinv <- solve(cov(x))

dift <- matrix(0,nrow(x),ncol(x))

 for (j in 1:ncol(x)) {
dift[,j] <- (x[,j] - mean(x[,j]))
 }
 
D <- dift %*% covxinv %*% t(dift) # Mahalanobis Distances

b1p <- (sum(sum(D^3))) / n^2   # Multivariate skewness coefficient
b2p <- sum(diag(D^2)) / n          # Multivariate kurtosis coefficient

k <- ((p+1)*(n+1)*(n+3)) / (n*(((n+1)*(p+1))-6))      # Small sample correction
v <- (p*(p+1)*(p+2)) / 6        # Degrees of freedom
g1c <- (n*b1p*k) / 6            # Skewness test statistic corrected for small sample (approximates to a chi-square distribution)
g1 <- (n*b1p) / 6               # Skewness test statistic (approximates to a chi-square distribution)
P1 <- 1 - pchisq(g1,v)         # Significance value of skewness
P1c <- 1 - pchisq(g1c,v)       # Significance value of skewness corrected for small sample

g2 <- (b2p-(p*(p+2))) / (sqrt((8*p*(p+2))/n))      # Kurtosis test statistic (approximates to a unit-normal distribution)
P2 <- 1-pnorm(abs(g2))        # Significance value of kurtosis

stats.Hs  <- (P1 < alpha);
stats.Ps  <- P1;
stats.Ms  <- g1;
stats.CVs <- qchisq(1-alpha,v);
stats.Hsc <- (P1c < alpha);
stats.Psc <- P1c;
stats.Msc <- g1c;
stats.Hk  <- (P2 < alpha);
stats.Pk  <- P2;
stats.Mk  <- g2;
stats.CVk <- qnorm(1-alpha,0,1);

cat("Hs = "); print(stats.Hs)
cat("Ps = "); print(stats.Ps)
cat("Ms = "); print(stats.Ms)
cat("CVs = "); print(stats.CVs)
cat("Hsc = "); print(stats.Hsc)
cat("Psc = "); print(stats.Psc)
cat("Msc = "); print(stats.Msc)
cat("Hk = "); print(stats.Hk)
cat("Mk = "); print(stats.Mk)
cat("CVk = "); print(stats.CVk)
}

which may be copied into R. When a data matrix, x, is then inputted the program may be run by typing:

mardiatest(x,0.05)

None: FAQ/Rcodeg2 (last edited 2013-03-08 10:17:13 by localhost)