How do I choose between different logistic regression models?

In multiple linear regression, R-squared summarises the fit of a model and so helps inform the choice between models.

SPSS produces two R-squared measures (Nagelkerke and Cox-Snell) for binary logistic regressions but the use of these is controversial (see below). They compare the fit of the model with the predictors to one without them (just like fit indices in structural equation models, for those familiar with this area).

What follows is an extract of an e-mail from Dietrich Alte on the choice of R-squared in logistic regression. The journal article it recommends is available from the University library.

Menard (2000), referred to below, suggests using

$$R^{2} = \frac{\mbox{Difference between the -2 log likelihoods of the null model and the model with the covariates of interest}}{\mbox{-2 log likelihood of the null model}} $$

where the null model is a binary logistic regression whose single predictor (covariate) is a column of 1's. To fit this particular model you first need to click the options button and ask for no constant to be in the regression. SPSS and other packages routinely output -2 log likelihoods, which index the goodness of fit of a particular model.
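As a minimal sketch of the ratio above, the Python below computes the measure from two -2 log likelihood values read off the package output. The function names and the numbers in the example are hypothetical and chosen only for illustration. The helper `null_neg2ll` relies on the fact that an intercept-only logistic model fits the overall sample proportion, so its -2 log likelihood can be obtained from the outcome counts alone.

```python
import math


def null_neg2ll(n_events, n_total):
    """-2 log likelihood of the intercept-only (null) logistic model.

    The null model predicts the same probability for every case, and its
    maximum likelihood estimate is the observed proportion of events.
    """
    if not 0 < n_events < n_total:
        raise ValueError("need at least one event and one non-event")
    p = n_events / n_total
    log_lik = n_events * math.log(p) + (n_total - n_events) * math.log(1 - p)
    return -2.0 * log_lik


def menard_r2(neg2ll_null, neg2ll_model):
    """(null -2LL minus model -2LL) divided by null -2LL."""
    return (neg2ll_null - neg2ll_model) / neg2ll_null


# Hypothetical values: null model -2LL = 120, model with covariates -2LL = 90.
print(menard_r2(120.0, 90.0))  # 0.25
```

A value of 0 means the covariates do not reduce the -2 log likelihood at all, and values closer to 1 mean a larger proportional reduction.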

The first measure (Cox-Snell) is used to assess models where all the independent variables are continuous and the second (Nagelkerke) is used where there are one or more binary independent variables in the model.

These statistics have at their core the ratio of the likelihood function of the fitted model to the likelihood function of an intercept only model. What they are actually measuring is the proportion of change in the likelihood function of the specified model vs. no model at all.
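To make the likelihood-ratio core concrete, here is a small sketch that assumes the usual textbook definitions: Cox-Snell is 1 - (L0/L1)^(2/n), and Nagelkerke divides that by its maximum attainable value, 1 - L0^(2/n). Both are written in terms of -2 log likelihoods, since that is what packages report. The function and argument names are hypothetical.

```python
import math


def cox_snell_r2(neg2ll_null, neg2ll_model, n):
    """1 - (L0 / L1)^(2/n), with L0 and L1 the null and fitted likelihoods.

    In -2 log likelihood terms, (L0 / L1)^(2/n) = exp(-(D0 - D1) / n).
    """
    return 1.0 - math.exp(-(neg2ll_null - neg2ll_model) / n)


def nagelkerke_r2(neg2ll_null, neg2ll_model, n):
    """Cox-Snell rescaled by its maximum, 1 - L0^(2/n), so it can reach 1."""
    max_cox_snell = 1.0 - math.exp(-neg2ll_null / n)
    return cox_snell_r2(neg2ll_null, neg2ll_model, n) / max_cox_snell


# Hypothetical values: n = 100 cases, null -2LL = 120, model -2LL = 90.
print(cox_snell_r2(120.0, 90.0, 100))
print(nagelkerke_r2(120.0, 90.0, 100))
```

Both functions depend on the data only through the two likelihoods and the sample size, which is the sense in which they measure change in the likelihood rather than variance explained.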

I don't like these statistics very much, and like them even less because their names suggest they are analogous to the "variance explained" measures used in linear models, but they are actually measuring something else.

There was a very good article in the Feb 2000 issue of The American Statistician by Scott Menard called "Coefficients of Determination for Multiple Logistic Regression Analysis," which may be of use.

Note that R^2 cannot be used to compare two logistic models if one or more suffers from overdispersion. In this case information criteria should be used (see here).

Reference

Scott Menard, “Coefficients of Determination for Multiple Logistic Regression Analysis,” The American Statistician 54:17-24, 2000.