R code for multivariate normality testing
The R code below performs Mardia's test of multivariate kurtosis:
mardiatest <- function(x,alpha=0.05) {
# MARDIATEST Mardia multivariate normality test using skewness & kurtosis.
# MARDIATEST(X,ALPHA) performs Mardia's test of multivariate
# normality to determine if the null hypothesis of multivariate
# normality is a reasonable assumption regarding the population
# distributions of a random samples contained within the columns of X. X
# must be a N-values by M-samples array. The desired significance level,
# ALPHA, is an optional scalar input (default = 0.05).
#
# The function performs three tests: of the multivariate skewness; the
# multivariate skewness corrected for small samples; and the multivariate
# kurtosis. H is a 1-by-3 array indicating the results of the hypothesis
# tests:
# H(i) = 0 => Do not reject the null hypothesis at significance level
# ALPHA.
# H(i) = 1 => Reject the null hypothesis at significance level ALPHA.
#
# MARDIATEST(...) also returns the following fields:
# stats.Hs - logical scalar indicating whether to reject the null
# hypothesis that the multivariate skewness is consistent
# with a multivariate normal distribution.
# stats.Ps - the P value for the multivariate skewness.
# stats.Ms - the Mardia test statistic for the multivariate skewness.
# stats.CVs - the critical value for the multivariate skewness.
# stats.Hsc - logical scalar indicating whether to reject the null
# hypothesis that the multivariate skewness (corrected for
# small samples) is consistent with a multivariate normal
# distribution.
# stats.Psc - the P value for the multivariate skewness (corrected for
# small samples).
# stats.Msc - the Mardia test statistic for the multivariate skewness
# (corrected for small samples).
# stats.Hk - logical scalar indicating whether to reject the null
# hypothesis that the multivariate kurtosis is consistent
# with a multivariate normal distribution.
# stats.Pk - the P value for the multivariate kurtosis.
# stats.Mk - the Mardia test statistic for the multivariate kurtosis.
# stats.CVk - the critical value for the multivariate kurtosis.
#
# Notes:
# For multivariate data, tests of normality of the individual variables
# are not sufficent to determine multivariate normality. Even if every
# variable in a set is normally distributed, it is still possible that
# the combined distribution is not multivariate normal. Mardia's test is
# one means testing for multivariate normality. The test is based on
# independent tests of multivariate skewness and kurtosis. Data can be
# assumed to conform to multivariate normality only if the null
# hypothesis is not rejected for both tests.
#
# Example:
# >> X = [2.4 2.1 2.4; ...
# 3.5 1.8 3.9; ...
# 6.7 3.6 5.9; ...
# 5.3 3.3 6.1; ...
# 5.2 4.1 6.4; ...
# 3.2 2.7 4.0; ...
# 4.5 4.9 5.7; ...
# 3.9 4.7 4.7; ...
# 4.0 3.6 2.9; ...
# 5.7 5.5 6.2; ...
# 2.4 2.9 3.2; ...
# 2.7 2.6 4.1];
#
# >> [H stats] = mardiatest(X, 0.05)
#
# H =
# 0 0 1
#
# stats =
# Hs: 0
# Ps: 0.9660
# Ms: 3.5319
# CVs: 18.3070
# Hsc: 0
# Psc: 0.8918
# Msc: 4.9908
# Hk: 1
# Pk: 0.0452
# Mk: -1.6936
# CVk: 1.6449
#
# The magnitude of the Mardia kurtosis test statistic (Mk) is greater
# than critical value (CVk) for a 5% level test, so the hypothesis of
# multivariate normality is rejected.
#
#
# This program is based upon the MATCLAB version which is:
# Copyright 2006 David Graham, Loughborough University
# (D.J.Graham@lboro.ac.uk)
#
# This function is based on MSKEKUR by A. Trujillo-Ortiz and R.
# Hernandez-Walls. Modifications include:
# - additional checking of inputs
# - modification of help text for consistency with Matlab conventions
# - the output of the test results to variables
# - removal of the code sending test results as a string to the command
# window
# - removed call to 'eval'.
#
# Statements from the original code:
#
# Created by A. Trujillo-Ortiz and R. Hernandez-Walls
# Facultad de Ciencias Marinas
# Universidad Autonoma de Baja California
# Apdo. Postal 453
# Ensenada, Baja California
# Mexico
# atrujo@uabc.mx
#
# May 22, 2003.
#
# To cite this file, this would be an appropriate format:
# Trujillo-Ortiz, A. and R. Hernandez-Walls. (2003). Mskekur: Mardia's multivariate skewness
# and kurtosis coefficients and its hypotheses testing. A MATLAB file. [WWW document]. URL
# http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=3519&objectType=FILE
#
# References:
#
# Mardia, K. V. (1970), Measures of multivariate skewnees and kurtosis with
# applications. Biometrika, 57(3):519-530.
# Mardia, K. V. (1974), Applications of some measures of multivariate skewness
# and kurtosis for testing normality and robustness studies. Sankhyâ A,
# 36:115-128
# Stevens, J. (1992), Applied Multivariate Statistics for Social Sciences. 2nd. ed.
# New-Jersey:Lawrance Erlbaum Associates Publishers. pp. 247-248.
# Check for validity of ALPHA
if (is.numeric(alpha) == FALSE || alpha>0.5 || alpha <0)
cat('Input ALPHA must be a scalar between 0 and 0.5.')
end
n <- nrow(x)
p <- ncol(x)
# Check for validity of X
if (p < 2)
cat('Input X must be an array with at least 2 columns.')
end
covxinv <- solve(cov(x))
dift <- matrix(0,nrow(x),ncol(x))
for (j in 1:ncol(x)) {
dift[,j] <- (x[,j] - mean(x[,j]))
}
D <- dift %*% covxinv %*% t(dift) # Mahalanobis Distances
b1p <- (sum(sum(D^3))) / n^2 # Multivariate skewness coefficient
b2p <- sum(diag(D^2)) / n # Multivariate kurtosis coefficient
k <- ((p+1)*(n+1)*(n+3)) / (n*(((n+1)*(p+1))-6)) # Small sample correction
v <- (p*(p+1)*(p+2)) / 6 # Degrees of freedom
g1c <- (n*b1p*k) / 6 # Skewness test statistic corrected for small sample (approximates to a chi-square distribution)
g1 <- (n*b1p) / 6 # Skewness test statistic (approximates to a chi-square distribution)
P1 <- 1 - pchisq(g1,v) # Significance value of skewness
P1c <- 1 - pchisq(g1c,v) # Significance value of skewness corrected for small sample
g2 <- (b2p-(p*(p+2))) / (sqrt((8*p*(p+2))/n)) # Kurtosis test statistic (approximates to a unit-normal distribution)
P2 <- 1-pnorm(abs(g2)) # Significance value of kurtosis
stats.Hs <- (P1 < alpha);
stats.Ps <- P1;
stats.Ms <- g1;
stats.CVs <- qchisq(1-alpha,v);
stats.Hsc <- (P1c < alpha);
stats.Psc <- P1c;
stats.Msc <- g1c;
stats.Hk <- (P2 < alpha);
stats.Pk <- P2;
stats.Mk <- g2;
stats.CVk <- qnorm(1-alpha,0,1);
cat("Hs = "); print(stats.Hs)
cat("Ps = "); print(stats.Ps)
cat("Ms = "); print(stats.Ms)
cat("CVs = "); print(stats.CVs)
cat("Hsc = "); print(stats.Hsc)
cat("Psc = "); print(stats.Psc)
cat("Msc = "); print(stats.Msc)
cat("Hk = "); print(stats.Hk)
cat("Mk = "); print(stats.Mk)
cat("CVk = "); print(stats.CVk)
}which may be copied into R. When a data matrix, x, is then inputted the program may be run by typing:
mardiatest(x,0.05)
