FAQ/dummycoding - CBU statistics Wiki

Alternative coding for dummy variables when looking at interactions

[Some advice below from Thom Baguley]

Generally one forms the product terms for the interaction in the same way, by multiplying the two predictors. If you have a categorical predictor with k > 2 values, this means (k-1) product terms are needed to capture the interaction effect. Collinearity should not be an issue here. The test of the interaction (the change in model F when you add all the product terms) is unaffected by the collinearity with the main effects.
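As a minimal sketch of forming the product terms, assuming a hypothetical three-level factor `group` and a continuous predictor `x` (the data and the column names are invented for illustration), the k - 1 indicator columns are each multiplied by the continuous predictor:

```python
# Hypothetical data: one three-level factor and one continuous predictor.
group = ["a", "b", "c", "a", "b", "c"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

levels = sorted(set(group))   # ["a", "b", "c"], so k = 3
reference = levels[0]         # "a" is taken as the reference level here

# k - 1 indicator (dummy) columns, one per non-reference level.
dummies = {
    lev: [1.0 if g == lev else 0.0 for g in group]
    for lev in levels[1:]
}

# k - 1 product terms: each indicator multiplied by the continuous predictor.
products = {
    lev + ":x": [d * xi for d, xi in zip(col, x)]
    for lev, col in dummies.items()
}

for name, col in products.items():
    print(name, col)
```

The interaction test then compares the model with and without all of these product columns together.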

Where it does have an impact is on the interpretation of the coefficients for the indicator variables representing the main effects. This interpretation depends on the parameterisation of the model. For example, with a dummy coding parameterisation one group (the reference category) provides the intercept and the other indicator coefficients are differences from that intercept; likewise, the coefficient for the continuous predictor is the slope for the reference group and the product term coefficients are differences in slope for the other categories. You can change the parameterisation using effect coding (which works a bit like centring in this context), and it is generally sensible to centre continuous predictors prior to calculating product terms.
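To illustrate why centring matters for interpretation, here is a small sketch with invented data for two groups. It relies on the fact that a model with a two-level dummy, a continuous predictor and their product has the same coefficients as fitting a separate straight line in each group. The helper `simple_ols` and all values are hypothetical.

```python
from statistics import mean

def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical data: reference group (dummy = 0) and other group (dummy = 1).
x_ref, y_ref = [1.0, 2.0, 3.0], [2.0, 3.0, 4.0]
x_oth, y_oth = [3.0, 4.0, 5.0], [7.0, 9.0, 11.0]

def dummy_model_coefs(xr, yr, xo, yo):
    """Coefficients of y ~ dummy + x + dummy*x via per-group fits."""
    a_ref, b_ref = simple_ols(xr, yr)
    a_oth, b_oth = simple_ols(xo, yo)
    return {
        "intercept": a_ref,        # reference group at x = 0
        "x": b_ref,                # slope in the reference group
        "dummy": a_oth - a_ref,    # group difference at x = 0
        "dummy:x": b_oth - b_ref,  # difference in slopes
    }

raw = dummy_model_coefs(x_ref, y_ref, x_oth, y_oth)

# Centre x on its overall mean before forming the product term.
x_bar = mean(x_ref + x_oth)
centred = dummy_model_coefs(
    [xi - x_bar for xi in x_ref], y_ref,
    [xi - x_bar for xi in x_oth], y_oth,
)

print("raw:    ", raw)
print("centred:", centred)
```

In this example the slope and product-term coefficients are the same in both fits, but the dummy coefficient changes: uncentred it is the group difference at x = 0, whereas after centring it is the group difference at the mean of x.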

However, my advice is generally to focus on interpreting effects by looking at the adjusted means or, equivalently, the model predictions, ideally graphically if it looks like there might be interactions.

If all the interactions are with categorical predictors like gender I would consider effect coding these. This gives you a parameterisation similar to an ANOVA. Some software will more-or-less do that for you if you run the model as an ANCOVA (it isn't quite the same but has a similar interpretation). To do this manually:

          dummy   effect
male      0       -0.5
female    1       +0.5

Classically, effect coding uses -1 and +1, but I usually prefer -0.5 and +0.5. The latter means there is a 1 unit difference between the groups, and thus the slope represents the difference between groups as it would for dummy coding (not half the difference between groups). In a balanced design the intercept is now the grand mean.
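The comparison between the codings can be checked with a small sketch. The data are invented and balanced (two observations per group), and `simple_ols` is a hypothetical helper for a one-predictor least-squares fit:

```python
from statistics import mean

def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical balanced data: group means are 11 (male) and 16 (female).
sex = ["male", "male", "female", "female"]
y = [10.0, 12.0, 15.0, 17.0]

codings = {
    "dummy":       {"male": 0.0,  "female": 1.0},
    "effect_half": {"male": -0.5, "female": 0.5},
    "effect_unit": {"male": -1.0, "female": 1.0},
}

fits = {}
for name, code in codings.items():
    x = [code[s] for s in sex]
    fits[name] = simple_ols(x, y)   # (intercept, slope)
    print(name, fits[name])
```

With these data the dummy and -0.5/+0.5 codings both give a slope of 5 (the difference in group means), while -1/+1 gives 2.5. The intercept is the male mean (11) under dummy coding and the grand mean (13.5) under either effect coding.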

With more than two categories it gets messier but you can extend effect coding like this:

        effect 1   effect 2
cat1    -0.5       -0.5
cat2    0          +0.5
cat3    +0.5       0
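As a rough sketch of what this three-category coding implies, assume a model with only this one categorical predictor, so that the least-squares fitted value for each category is its observed mean. The coefficients can then be solved directly from the group means. The data are invented, and the names `coding`, `b0`, `b1` and `b2` are hypothetical.

```python
from statistics import mean

# The coding from the table above: (effect 1, effect 2) per category.
coding = {
    "cat1": (-0.5, -0.5),
    "cat2": (0.0, 0.5),
    "cat3": (0.5, 0.0),
}

# Hypothetical observations per category.
data = {
    "cat1": [2.0, 4.0],
    "cat2": [5.0, 7.0],
    "cat3": [8.0, 10.0],
}
group_means = {cat: mean(vals) for cat, vals in data.items()}

# The two code columns each sum to zero over categories, so adding the three
# equations  mean_cat = b0 + e1*b1 + e2*b2  gives b0 as the unweighted mean
# of the group means.
b0 = mean(group_means.values())
# cat3 has codes (+0.5, 0):  mean_cat3 = b0 + 0.5*b1
b1 = 2.0 * (group_means["cat3"] - b0)
# cat2 has codes (0, +0.5):  mean_cat2 = b0 + 0.5*b2
b2 = 2.0 * (group_means["cat2"] - b0)

# Check that the coefficients reproduce every group mean.
fitted = {
    cat: b0 + e1 * b1 + e2 * b2 for cat, (e1, e2) in coding.items()
}
print(b0, b1, b2)
print(fitted)
```

Under these assumptions the intercept is the unweighted mean of the category means, and each coefficient is twice the deviation of the corresponding category mean (cat3 for effect 1, cat2 for effect 2) from that intercept.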

Depending on the software, running this as an ANCOVA might be the simplest option (this is effectively a regression where the categorical predictors are parameterised in a particular way). In either case it is probably a very good idea to centre any continuous covariates before you do anything else.