Optimal cut-offs for splitting a variable into groups for correlating with an outcome variable

Gelman and Park (2009) suggest that, when a continuous predictor is to be split into groups, it is optimal (in the sense of giving accurately estimated regression coefficients with small variances) to split it into three segments. The lowest values (those in the lowest quarter or third) are coded as -0.5, the middle values (between the lower and upper quartiles or thirds) as 0, and the highest values (those in the upper quarter or third) as 0.5 (see page 2 of the paper). With this coding, the regression coefficient for the coded variable represents the difference between the outcome means in the highest and lowest segments. Such a split is shown to be efficient if the predictor is uniformly or normally distributed.
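The coding described above can be sketched in a few lines of Python. This is a minimal illustration rather than code from the paper: the function name three_way_code, the simulated data and the choice of thirds as cut-offs are all assumptions made for the example. It checks that, when the lowest and highest groups contain the same number of cases, the slope from a simple regression on the coded variable equals the difference between the two group means.

```python
import numpy as np

def three_way_code(x, fraction=1/3):
    """Code x as -0.5 (lowest fraction), 0 (middle) and 0.5 (highest fraction).

    Hypothetical helper for illustration; fraction is 1/3 or 1/4 in the text.
    """
    x = np.asarray(x, dtype=float)
    lower, upper = np.quantile(x, [fraction, 1 - fraction])
    z = np.zeros_like(x)
    z[x <= lower] = -0.5
    z[x >= upper] = 0.5
    return z

# Simulated data (an assumption for the example, not taken from the paper)
rng = np.random.default_rng(0)
x = rng.normal(size=300)
y = 2.0 * x + rng.normal(size=300)

z = three_way_code(x)

# Simple regression of the outcome on the coded predictor
slope, intercept = np.polyfit(z, y, 1)

# Difference between the outcome means of the highest and lowest groups
diff_in_means = y[z == 0.5].mean() - y[z == -0.5].mean()

print(slope, diff_in_means)
```

The two printed values agree up to rounding error because the extreme groups are of equal size here; with unequal group sizes (for example because of tied values at a cut-off) the slope is no longer exactly the difference in means.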

When other predictors are included in the model, semi-partial correlations and regression coefficients can be obtained in the usual way. The regression coefficient for the coded variable then corresponds to the difference between the means of the highest and lowest segments with the values of the other predictors held constant (see page 6 of the paper).
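As a rough sketch of the adjusted case, the example below adds one further predictor and obtains both the adjusted coefficient and a semi-partial correlation. The covariate w, the simulated data and the split into thirds are assumptions made for illustration. The semi-partial correlation is computed in the standard way, as the correlation between the outcome and the part of the coded variable that is not explained by the other predictor.

```python
import numpy as np

# Simulated data (assumed for the example): w is a hypothetical covariate
rng = np.random.default_rng(1)
n = 300
w = rng.normal(size=n)
x = 0.5 * w + rng.normal(size=n)            # continuous predictor to be split
y = 1.5 * x + 1.0 * w + rng.normal(size=n)  # outcome

# Code x into thirds: -0.5 (lowest), 0 (middle), 0.5 (highest)
lower, upper = np.quantile(x, [1/3, 2/3])
z = np.where(x <= lower, -0.5, np.where(x >= upper, 0.5, 0.0))

# Multiple regression of y on an intercept, the coded variable and w
X = np.column_stack([np.ones(n), z, w])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
adjusted_diff = coef[1]  # high-minus-low difference with w held constant

# Semi-partial correlation: remove w from z, then correlate with y
W = np.column_stack([np.ones(n), w])
z_resid = z - W @ np.linalg.lstsq(W, z, rcond=None)[0]
semi_partial = np.corrcoef(y, z_resid)[0, 1]

print(adjusted_diff, semi_partial)
```

In practice the same quantities would usually be read off the output of a statistics package; the explicit least-squares calls are used here only to keep the sketch self-contained.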

Reference

Gelman A and Park D (2009) Splitting a predictor at the upper quarter or third and the lower quarter or third. The American Statistician 63(1), 1-8.