FAQ/CombiningPvalues - CBU statistics Wiki

Revision 28 as of 2008-05-15 12:16:40

Combining p-values by Stouffer's (preferred) and Fisher's (legacy) methods

Combining p-values by Stouffer's method

function pcomb = stouffer(p)
% Stouffer et al.'s (1949) unweighted method for combining
% independent one-sided p-values via z-scores
    if isempty(p)
        error('stouffer was passed an empty array of p-values')
    else
        % sqrt(2)*erfinv(1-2*p) is the z-score of each p-value; the combined
        % p-value is the upper normal tail of sum(z)/sqrt(n)
        pcomb = (1-erf(sum(sqrt(2) * erfinv(1-2*p))/sqrt(2*length(p))))/2;
    end
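The one-line formula in `stouffer` is the usual Stouffer statistic written with `erf` and `erfinv`. Writing $$\Phi$$ for the standard normal CDF (notation introduced here for the explanation), the code computes:

```latex
\begin{aligned}
z_i &= \sqrt{2}\,\operatorname{erf}^{-1}(1 - 2p_i) = \Phi^{-1}(1 - p_i), \\
Z &= \frac{1}{\sqrt{n}} \sum_{i=1}^{n} z_i, \\
p_{\mathrm{comb}} &= \tfrac{1}{2}\left[1 - \operatorname{erf}\!\left(\frac{Z}{\sqrt{2}}\right)\right] = 1 - \Phi(Z),
\end{aligned}
```

using $$\Phi(x) = \tfrac{1}{2}[1 + \operatorname{erf}(x/\sqrt{2})]$$. Under the null hypothesis each $$z_i$$ is standard normal, so $$Z$$ is too.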

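The R code below performs Stouffer's method, assuming the p-values are held in a vector p, e.g. p <- c(0.1, 0.2, 0.01). Keep every p-value strictly between 0 and 1: values of exactly 0 or 1 map to infinite z-scores, and a vector containing both returns NaN.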

# error function and its inverse, expressed via the standard normal
erf <- function(x) 2 * pnorm(x * sqrt(2)) - 1
erfinv <- function(x) qnorm((x + 1)/2) / sqrt(2)
pcomb <- function(p) (1 - erf(sum(sqrt(2) * erfinv(1 - 2*p)) / sqrt(2*length(p))))/2
# length(p) is 0 (not NA) for an empty vector, so test for 0
if (length(p) == 0) {
  res <- "There was an empty array of p-values"
} else {
  res <- pcomb(p)
}
print(res)
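The `erf`/`erfinv` helpers above only re-express the standard normal CDF and quantile function. A minimal Python sketch (standard library `statistics.NormalDist`, Python 3.8+; the name `stouffer` mirrors the MATLAB function but this version is illustrative) works with the normal distribution directly. As an added assumption, it rejects p-values of exactly 0 or 1 instead of letting them become infinite z-scores:

```python
from statistics import NormalDist


def stouffer(p):
    """Unweighted Stouffer combination of independent one-sided p-values.

    Each p-value becomes a z-score z_i = Phi^{-1}(1 - p_i); under the null
    Z = sum(z_i) / sqrt(n) is standard normal, and the combined p-value is
    its upper tail 1 - Phi(Z).
    """
    if len(p) == 0:
        raise ValueError("stouffer was passed an empty list of p-values")
    if any(not 0.0 < pi < 1.0 for pi in p):
        # p = 0 or p = 1 would give an infinite z-score
        raise ValueError("p-values must lie strictly between 0 and 1")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # -Phi^{-1}(p) equals Phi^{-1}(1 - p) but keeps precision for tiny p
    z = [-std_normal.inv_cdf(pi) for pi in p]
    big_z = sum(z) / len(z) ** 0.5
    # Phi(-Z) equals 1 - Phi(Z), computed without cancellation
    return std_normal.cdf(-big_z)


print(stouffer([0.1, 0.01, 0.01, 0.7, 0.3, 0.1]))
```

With a single p-value the function returns that p-value unchanged, and a pair such as 0.3 and 0.7 cancels to 0.5, which are quick sanity checks on any implementation.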

A [attachment:combinedp.xls spreadsheet] can also be used to compute Fisher's and Stouffer's combined p.

Combining p-values by Fisher's method

The basic idea is that if $$p_i (i=1 \ldots n)$$ are the one-sided $$p$$-values for $$n$$ independent statistics, then under the null hypothesis $$-2 \sum\ln(p_i)$$ follows a $$\chi^2(2n)$$ distribution. Large values of this statistic indicate that the $$p$$-values are smaller than would be expected if they were Uniform(0,1) variates.
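As a cross-check of the statement above, here is a minimal Python sketch (standard library only; the function name `fisher` is hypothetical). It sums logarithms rather than multiplying the p-values, which avoids underflow of the product when there are many small p-values, and evaluates the $$\chi^2(2n)$$ upper tail with the closed form that holds for an even number of degrees of freedom:

```python
import math


def fisher(p):
    """Fisher's combined p-value for independent one-sided p-values.

    X2 = -2 * sum(ln p_i) is chi-squared with 2n degrees of freedom under
    the null. For even degrees of freedom the upper tail is
    exp(-x) * sum_{i=0}^{n-1} x**i / i!, where x = X2 / 2.
    """
    n = len(p)
    if n == 0:
        raise ValueError("fisher was passed an empty list of p-values")
    if any(pi <= 0.0 for pi in p):
        return 0.0  # a zero p-value forces the combined p-value to zero
    # x = X2 / 2, accumulated in logs so the product never underflows
    x = -sum(math.log(pi) for pi in p)
    term = 1.0   # holds x**i / i! for the current i
    total = 1.0  # running sum of the series, starting with i = 0
    for i in range(1, n):
        term *= x / i
        total += term
    return math.exp(-x) * total


print(fisher([0.1, 0.01, 0.01, 0.7, 0.3, 0.1]))
```

For the six p-values used in the MATLAB example further down, this returns about 0.0021, agreeing with `pfast`.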

The following MATLAB code evaluates this statistic and its p-value.

function p = pfast(p)
% Fisher's (1925) method for combination of independent p-values
% Code adapted from Bailey and Gribskov (1998)
    product=prod(p);
    n=length(p);
    if n<=0
        error('pfast was passed an empty array of p-values')
    elseif n==1
        p = product;
        return
    elseif product == 0
        p = 0;
        return
    else
        x = -log(product);
        t=product;
        p=product;
        for i = 1:n-1
            t = t * x / i;
            p = p + t;
        end
    end  
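The loop in `pfast` avoids calling a general $$\chi^2$$ routine by using the closed form of the $$\chi^2$$ upper tail for an even number of degrees of freedom (half of a $$\chi^2(2n)$$ variable is Gamma(n, 1)). With $$x = -\ln\prod p_i$$, so that the variable `product` equals $$e^{-x}$$:

```latex
P\left(\chi^2_{2n} > 2x\right) = e^{-x} \sum_{i=0}^{n-1} \frac{x^i}{i!},
\qquad x = -\ln \prod_{i=1}^{n} p_i, \qquad e^{-x} = \prod_{i=1}^{n} p_i
```

Each pass of the loop multiplies `t` by $$x/i$$, so after step $$i$$ it holds $$e^{-x}x^i/i!$$, and `p` accumulates the sum.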

Let's try it out:

>> pvals=[0.1 0.01 0.01 0.7 0.3 0.1];
>> pfast(pvals)

ans =

    0.0021

I.e. the combined p-value is 0.0021 for this array of 6 $$p$$-values.
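The same number can be checked by hand from the closed form of the $$\chi^2(12)$$ tail:

```latex
\begin{aligned}
\prod_{i=1}^{6} p_i &= 0.1 \times 0.01 \times 0.01 \times 0.7 \times 0.3 \times 0.1 = 2.1 \times 10^{-7}, \\
x &= -\ln(2.1 \times 10^{-7}) \approx 15.376, \qquad -2\sum \ln p_i = 2x \approx 30.75 \text{ on } 12 \text{ df}, \\
p_{\mathrm{comb}} &= e^{-x}\sum_{i=0}^{5} \frac{x^i}{i!} \approx 2.1 \times 10^{-7} \times 10232 \approx 0.0021.
\end{aligned}
```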

Further investigations suggest that Fisher's method can behave inappropriately, which is why Stouffer's method is preferred on this page. [examples to be included]

This method may also be performed using [:FAQ/Rfishp: R code.]

References

Bailey TL, Gribskov M (1998). Combining evidence using p-values: application to sequence homology searches. Bioinformatics, 14 (1) 48-54.

Fisher RA (1925). Statistical methods for research workers (13th edition). London: Oliver and Boyd.

Stouffer SA, Suchman EA, DeVinney LC, Star SA, Williams RM Jr (1949). Studies in Social Psychology in World War II: The American Soldier. Vol. 1, Adjustment During Army Life. Princeton: Princeton University Press.