FAQ/Rcodeg2 - CBU statistics Wiki

Upload page content

You can upload content for the page named below. If you change the page name, you can also upload content for another page. If the page name is empty, we derive the page name from the file name.

File to load page content from
Page name
Comment
WHat wOrd is made by the captiaL lettErs?

location: FAQ / Rcodeg2

R code for multivariate normality testing

The R code below performs Mardia's test of multivariate kurtosis:

mardiatest <- function(x,alpha=0.05) {
# MARDIATEST Mardia multivariate normality test using skewness & kurtosis.
#   MARDIATEST(X,ALPHA) performs Mardia's test of multivariate
#   normality to determine if the null hypothesis of multivariate 
#   normality is a reasonable assumption regarding the population
#   distributions of a random samples contained within the columns of X. X
#   must be a N-values by M-samples array. The desired significance level,
#   ALPHA, is an optional scalar input (default = 0.05).
#
#   The function performs three tests: of the multivariate skewness; the
#   multivariate skewness corrected for small samples; and the multivariate
#   kurtosis. H is a 1-by-3 array indicating the results of the hypothesis
#   tests:
#     H(i) = 0 => Do not reject the null hypothesis at significance level
#     ALPHA.
#     H(i) = 1 => Reject the null hypothesis at significance level ALPHA.
#
#   MARDIATEST(...) also returns the following fields:
#     stats.Hs  - logical scalar indicating whether to reject the null
#                 hypothesis that the multivariate skewness is consistent
#                 with a multivariate normal distribution.  
#     stats.Ps  - the P value for the multivariate skewness.
#     stats.Ms  - the Mardia test statistic for the multivariate skewness.
#     stats.CVs - the critical value for the multivariate skewness.
#     stats.Hsc - logical scalar indicating whether to reject the null
#                 hypothesis that the multivariate skewness (corrected for
#                 small samples) is consistent with a multivariate normal
#                 distribution.
#     stats.Psc - the P value for the multivariate skewness (corrected for
#                 small samples).
#     stats.Msc - the Mardia test statistic for the multivariate skewness
#                 (corrected for small samples).
#     stats.Hk  - logical scalar indicating whether to reject the null
#                 hypothesis that the multivariate kurtosis is consistent
#                 with a multivariate normal distribution.  
#     stats.Pk  - the P value for the multivariate kurtosis.
#     stats.Mk  - the Mardia test statistic for the multivariate kurtosis.
#     stats.CVk - the critical value for the multivariate kurtosis.
#
#   Notes:
#   For multivariate data, tests of normality of the individual variables
#   are not sufficent to determine multivariate normality. Even if every
#   variable in a set is normally distributed, it is still possible that
#   the combined distribution is not multivariate normal. Mardia's test is
#   one means testing for multivariate normality. The test is based on
#   independent tests of multivariate skewness and kurtosis. Data can be
#   assumed to conform to multivariate normality only if the null
#   hypothesis is not rejected for both tests.
#
#   Example:
#   >> X = [2.4  2.1  2.4; ...
#           3.5  1.8  3.9; ...
#           6.7  3.6  5.9; ...
#           5.3  3.3  6.1; ...
#           5.2  4.1  6.4; ...
#           3.2  2.7  4.0; ...
#           4.5  4.9  5.7; ...
#           3.9  4.7  4.7; ...
#           4.0  3.6  2.9; ...
#           5.7  5.5  6.2; ...
#           2.4  2.9  3.2; ...
#           2.7  2.6  4.1];
# 
#   >> [H stats] = mardiatest(X, 0.05)
# 
#   H =
#        0     0     1
# 
#   stats = 
#        Hs: 0
#        Ps: 0.9660
#        Ms: 3.5319
#       CVs: 18.3070
#       Hsc: 0
#       Psc: 0.8918
#       Msc: 4.9908
#        Hk: 1
#        Pk: 0.0452
#        Mk: -1.6936
#       CVk: 1.6449
#   
#   The magnitude of the Mardia kurtosis test statistic (Mk) is greater
#   than critical value (CVk) for a 5% level test, so the hypothesis of
#   multivariate normality is rejected. 
#
#
#   This program is based upon the MATCLAB version which is:
#   Copyright 2006 David Graham, Loughborough University
#   (D.J.Graham@lboro.ac.uk)
#
# This function is based on MSKEKUR by A. Trujillo-Ortiz and R.
# Hernandez-Walls. Modifications include:
#   - additional checking of inputs
#   - modification of help text for consistency with Matlab conventions
#   - the output of the test results to variables
#   - removal of the code sending test results as a string to the command
#     window
#   - removed call to 'eval'.
#
# Statements from the original code:
#
#  Created by A. Trujillo-Ortiz and R. Hernandez-Walls
#         Facultad de Ciencias Marinas        
#         Universidad Autonoma de Baja California 
#         Apdo. Postal 453  
#         Ensenada, Baja California
#         Mexico  
#         atrujo@uabc.mx
#
#  May 22, 2003.
#
#  To cite this file, this would be an appropriate format:
#  Trujillo-Ortiz, A. and R. Hernandez-Walls. (2003). Mskekur: Mardia's multivariate skewness
#    and kurtosis coefficients and its hypotheses testing. A MATLAB file. [WWW document]. URL 
#    http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=3519&objectType=FILE
#
#  References:
# 
#  Mardia, K. V. (1970), Measures of multivariate skewnees and kurtosis with
#         applications. Biometrika, 57(3):519-530.
#  Mardia, K. V. (1974), Applications of some measures of multivariate skewness
#         and kurtosis for testing normality and robustness studies. Sankhyâ A,
#         36:115-128
#  Stevens, J. (1992), Applied Multivariate Statistics for Social Sciences. 2nd. ed.
#         New-Jersey:Lawrance Erlbaum Associates Publishers. pp. 247-248.



# Check for validity of ALPHA
if  (is.numeric(alpha) == FALSE || alpha>0.5 || alpha <0)
    cat('Input ALPHA must be a scalar between 0 and 0.5.')
end

n <- nrow(x)
p <- ncol(x)

# Check for validity of X
if (p < 2)
    cat('Input X must be an array with at least 2 columns.')
end


covxinv <- solve(cov(x))

dift <- matrix(0,nrow(x),ncol(x))

 for (j in 1:ncol(x)) {
dift[,j] <- (x[,j] - mean(x[,j]))
 }
 
D <- dift %*% covxinv %*% t(dift) # Mahalanobis Distances

b1p <- (sum(sum(D^3))) / n^2   # Multivariate skewness coefficient
b2p <- sum(diag(D^2)) / n          # Multivariate kurtosis coefficient

k <- ((p+1)*(n+1)*(n+3)) / (n*(((n+1)*(p+1))-6))      # Small sample correction
v <- (p*(p+1)*(p+2)) / 6        # Degrees of freedom
g1c <- (n*b1p*k) / 6            # Skewness test statistic corrected for small sample (approximates to a chi-square distribution)
g1 <- (n*b1p) / 6               # Skewness test statistic (approximates to a chi-square distribution)
P1 <- 1 - pchisq(g1,v)         # Significance value of skewness
P1c <- 1 - pchisq(g1c,v)       # Significance value of skewness corrected for small sample

g2 <- (b2p-(p*(p+2))) / (sqrt((8*p*(p+2))/n))      # Kurtosis test statistic (approximates to a unit-normal distribution)
P2 <- 1-pnorm(abs(g2))        # Significance value of kurtosis

stats.Hs  <- (P1 < alpha);
stats.Ps  <- P1;
stats.Ms  <- g1;
stats.CVs <- qchisq(1-alpha,v);
stats.Hsc <- (P1c < alpha);
stats.Psc <- P1c;
stats.Msc <- g1c;
stats.Hk  <- (P2 < alpha);
stats.Pk  <- P2;
stats.Mk  <- g2;
stats.CVk <- qnorm(1-alpha,0,1);

cat("Hs = "); print(stats.Hs)
cat("Ps = "); print(stats.Ps)
cat("Ms = "); print(stats.Ms)
cat("CVs = "); print(stats.CVs)
cat("Hsc = "); print(stats.Hsc)
cat("Psc = "); print(stats.Psc)
cat("Msc = "); print(stats.Msc)
cat("Hk = "); print(stats.Hk)
cat("Mk = "); print(stats.Mk)
cat("CVk = "); print(stats.CVk)
}

which may be copied into R. When a data matrix, x, is then inputted the program may be run by typing:

mardiatest(x,0.05)