An example using the offset command of glm() in R to obtain a 95% Confidence Interval for a infinite logistic regression estimate

In this example we wish to predict whether there was evidence of the onset of depression, mdddic, in a person. We wish to see if this is related to a genotype, r6265, which can take two values and the binary variable recording whether that person suffered stress at follow-up, stressfu. In particular the interaction, r6265 x stressfu, is considered.

We are going to use R to do the model fitting because, unlike other packages such as SPSS, this software allows the fitting of an offset term in a logistic regression model using the glm function.

We first of all read the data into R from a SPSS spreadsheet file and then fit two models which assess whether there is a statistically significant r6265 x stressfu interaction. The models differ only in the presence of the interaction term so the difference in their ability to predict the onset of depression signifies the importance of the interaction in predicting depression. The anova command computes twice the difference in model fit indices, the likelihood ratio, which follows a chi-square assuming there is no evidence of an interaction.

library(foreign)
x <- read.spss("data.sav")
x <- attach(x)
r6265n <- as.numeric(r6265_dich)
stressn <- as.numeric(stressfu)-1
model <- glm(mdddic ~  r6265n + stressn + r6265n:stressn, family=binomial(link=logit))
summary(model)
model1 <- glm(mdddic ~  r6265n + stressn, family=binomial(link=logit))
anova(model,model1)

The Likelihood ratio chi-square equals 5.02, p=0.025 but the regression estimate for the stressfu x r6265 interaction but infinite mles (large s.es)

The regression estimates and standard errors are undefined so use the offset method of Rindskopf (2002) to obtain confidence intervals for them.

int <- r6265n*stressn

model <- glm(mdddic ~  r6265n + stressn + offset(1.50*int), family=binomial(link=logit))
summary(model)
model1 <- glm(mdddic ~  r6265n + stressn, family=binomial(link=logit))
anova(model,model1)

The change in deviance (-2 log likelihoods) between the models with the r6265 x stress interaction having a regression estimate of 1.50 equals 3.57; For a 95% confidence interval for this regression estimate we need this difference to equal 3.84 (The value of chi-square(1) at 5%) so we increase the offset until we obtain this value. This is found through trial and error as 1.735.

model <- glm(mdddic ~  r6265n + stressn + offset(1.735*int), family=binomial(link=logit))
summary(model)
model1 <- glm(mdddic ~  r6265n + stressn, family=binomial(link=logit))
anova(model,model1)

So the 95% CI for the interaction term is (1.74, +infinity).

This compares with the more conservative exact 95% CI using SAS LOGISTIC for r6265 x stressfu interaction of (-0.57, +infinity) which contains zero so unlike the ML likelihood ratio there is no evidence of an interaction (p=0.17). Rindskopf (2002) criticises the exact approach as being too conservative.

Reference

Rindskopf D. (2002) Infinite Parameter Estimates in Logistic Regression: Opportunities, Not Problems. Journal of Educational and Behavioral Statistics 27(2), 147-161.