<?xml version="1.0" encoding="utf-8"?><!DOCTYPE article  PUBLIC '-//OASIS//DTD DocBook XML V4.4//EN'  'http://www.docbook.org/xml/4.4/docbookx.dtd'><article><articleinfo><title>FAQ/alexR</title><revhistory><revision><revnumber>4</revnumber><date>2018-04-17 11:15:27</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>3</revnumber><date>2018-04-17 11:15:15</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>2</revnumber><date>2018-04-17 11:12:50</date><authorinitials>PeterWatson</authorinitials></revision><revision><revnumber>1</revnumber><date>2018-04-17 11:09:28</date><authorinitials>PeterWatson</authorinitials></revision></revhistory></articleinfo><section><title>Reproduction of Alex Quent's simulations to assess accuracy of lme4 procedures in R fitting mixed fixed and random effect models</title><para>(The below is reproduced from Alex Quent's webpage at <ulink url="https://jaquent.github.io/2018/0314_mixedModels.html"/>). </para><para>14 March 2018 </para><para>•Preface ◦Libraries &amp; functions ◦Model 1: Simple regression model with 1 level ◦Model 2: A random intercept for each subject ◦Model 3: Random intercept and slope for each subject ◦Model 4: Random intercepts for each object and each subject ◦Model 5: Random intercepts for each object, each subject, quadratic term and random slopes for subjects ◾Model 5.1 Model 5.2 Inaccurate estimations of the random intercepts for each subject </para><para>Conclusion </para><para>Preface </para><para>This work documents my attempt to understand the model structure of mixed linear models or multilevel models. I adapted a formula that I found on Wikipedia and work myself through the different models and test whether I am able to retrieve the different parameters with lmerTest. Bodo Winter’s tutorial on the lme was very useful for this. The main motivation for doing all this is to use basically model 5 in my analysis of my first experiment. This work is also part of the process of pre-registering the study with a complete analysis script, which I will upload soon. </para><para>The design of the fictive experiment that 100 subjects see 20 objects and provide a continuous measure of memory for each objects. The expectancy of the objects (X  X ) servers as a predictor for the memory performance (Y  Y ). </para><para>I start with very easy models and add more effects in order to finally arrive at my target model (model 5). </para><screen><![CDATA[Libraries & functions
library(lmerTest)
library(ggplot2)
]]><![CDATA[
pValue <-function(x, sign = '='){
  if (inherits(x, "lm")){
    s <- summary.lm(x)
    x <- pf(s$fstatistic[1L], s$fstatistic[2L], s$fstatistic[3L], lower.tail = FALSE)
    if(x > 1){
      stop("There is no p-value greater than 1")
    } else if(x < 0.001){
      x.converted <- '< .001'
    } else{
      x.converted <- paste(sign,substr(as.character(round(x, 3)), 2,5))
    } 
  } else {
    if(x > 1){
      stop("There is no p-value greater than 1")
    } else if(x < 0.001){
      x.converted <- '< .001'
    } else{
      x.converted <- paste(sign,substr(as.character(round(x, 3)), 2,5))
    } 
  }
  return(x.converted)
}
]]><![CDATA[
rValue <-function(x){
  if (inherits(x, "lm")){
    r.squared <- summary(x)$r.squared
    x.converted <- paste('=',substr(as.character(round(r.squared, 3)), 2,5)) 
  } else {
    if (x < 0){
      x.converted <- paste('= -',substr(as.character(abs(round(x, 3))), 2,5), sep = '') 
    } else {
      x.converted <- paste('=',substr(as.character(abs(round(x, 3))), 2,5)) 
    }
  }
  return(x.converted) 
}]]></screen><para>Model 1: Simple regression model with 1 level </para><para>Assume that expectancy and memory performance was aggregated across subjects.In this model, objects’ memorability is predicted by their average expectancy values. The corresponding formula looks like this:  Y i =β 0 +β 1 X i +e i   Yi=β0+β1Xi+ei </para><para>•Y i   Yi </para><itemizedlist><listitem override="none"><para>refers to memorability of object i. </para></listitem></itemizedlist><para>•X i   Xi </para><itemizedlist><listitem override="none"><para>refers to average expectancy of object i. </para></listitem></itemizedlist><para>•β 0   β0 </para><itemizedlist><listitem override="none"><para>refers to the intercept. </para></listitem></itemizedlist><para>•β 1   β1 </para><itemizedlist><listitem override="none"><para>refers to the linear slope. </para></listitem></itemizedlist><para>•e i   ei </para><itemizedlist><listitem override="none"><para>refers to the random errors of prediction. </para></listitem></itemizedlist><screen><![CDATA[# Generating data set
# General values and variables
numObj <- 20
e      <- rnorm(numObj, mean = 0, sd = 0.1)
x      <- scale(runif(numObj, min = -100, max = 100))
y      <- c()
]]><![CDATA[
# Coefficients
beta0  <- 8
beta1  <- 0.3
]]><![CDATA[
for(i in 1:numObj){
  y[i] <- beta0 + beta1*x[i] + e[i]
}
]]><![CDATA[
dataFrame1 <- data.frame(y = y, x = x, objNum = factor(1:20)) 
]]><![CDATA[
model1 <- lm(y ~  x, data = dataFrame1)
summary(model1)]]></screen><screen><![CDATA[## 
## Call:
## lm(formula = y ~ x, data = dataFrame1)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.238310 -0.071472  0.007047  0.064316  0.237211 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.99868    0.02806 285.055  < 2e-16 ***
## x            0.28629    0.02879   9.944 9.73e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1255 on 18 degrees of freedom
## Multiple R-squared:  0.846,  Adjusted R-squared:  0.8375 
## F-statistic: 98.89 on 1 and 18 DF,  p-value: 9.731e-09]]></screen><para>Unsurprisingly, lm returns estimates that are very close the the true values β 0  ^ =  β0^= </para><itemizedlist><listitem override="none"><para>7.9986752 (vs 8) and β 1  ^ =  </para></listitem></itemizedlist><para>β1^= </para><itemizedlist><listitem override="none"><para>0.2862898 (vs 0.3). </para></listitem></itemizedlist><para>Model 2: A random intercept for each subject </para><para>Now, I don’t aggregated anymore and there is one value for each subject and each object and I allow for different intercepts for each subject, which basically means that I assume that my subjects’ general memory performance varies. </para><para>Level 1:  Y ij =β 0j +β 1 X ij +e ij   Yij=β0j+β1Xij+eij </para><itemizedlist><listitem override="none"><para>Level 2:  </para></listitem></itemizedlist><para>β 0j =γ 00 +γ 01 W j +u 0j   β0j=γ00+γ01Wj+u0j </para><para>•Y ij   Yij </para><itemizedlist><listitem override="none"><para>refers to memory score for object i and subject j. </para></listitem></itemizedlist><para>•β 0j   β0j </para><itemizedlist><listitem override="none"><para>refers to the intercept of the dependent variable in subject j (level 2) in other words the individual difference in general performance across subjects. </para></listitem></itemizedlist><para>•β 1   β1 </para><itemizedlist><listitem override="none"><para>refers to the linear slope. </para></listitem></itemizedlist><para>•X ij   Xij </para><itemizedlist><listitem override="none"><para>refers to the level 1 predictor. </para></listitem></itemizedlist><para>•e ij   eij </para><itemizedlist><listitem override="none"><para>refers to the random errors of prediction for the level 1 equation. </para></listitem></itemizedlist><para>•γ 00   γ00 </para><itemizedlist><listitem override="none"><para>refers to the overall intercept. This is the grand mean of the scores on the dependent variable across all the subjects when all the predictors are equal to 0. </para></listitem></itemizedlist><para>•γ 01   γ01 </para><itemizedlist><listitem override="none"><para>refers to the overall regression coefficient, or the slope, between the dependent variable and the level 2 predictor. </para></listitem></itemizedlist><para>•W j   Wj </para><itemizedlist><listitem override="none"><para>refers to the level 2 predictor. </para></listitem></itemizedlist><para>•u 0j   u0j </para><itemizedlist><listitem override="none"><para>refers to the random error component for the deviation of the intercept of a subject from the overall intercept. </para></listitem></itemizedlist><screen><![CDATA[# Generating data set
# General values and variables
numObj <- 20
numSub <- 100
e      <- rnorm(numObj * numSub, mean = 0, sd = 0.1)
x      <- scale(runif(numObj * numSub, min = -100, max = 100))
y      <- c()
index  <- 1
]]><![CDATA[
# Coefficients
gamma00 <- 18
gamma01 <- 0.5
beta1   <- -100
w       <- runif(numSub, min = -3, max = 3)
u0      <- rnorm(numSub, mean = 0, sd = 0.1)
meanBeta0 <- mean(gamma00 + gamma01*w + u0) # I should be able to retrieve that parameter. 
]]><![CDATA[
for(j in 1:numSub){
  for(i in 1:numObj){
    y[index] <- gamma00 + gamma01*w[j]+ u0[j] + beta1*x[index] + e[index]
    index <- index + 1
  } 
}
]]><![CDATA[
dataFrame2 <- data.frame(y = y, x = x, subNo = factor(rep(1:numSub, each = numObj)), objNum = factor(rep(1:numObj, numSub))) 
]]><![CDATA[
model2 <- lmer(y ~  x + 
                 (1 | subNo), data = dataFrame2)
summary(model2)]]></screen><screen><![CDATA[## Linear mixed model fit by REML t-tests use Satterthwaite approximations
##   to degrees of freedom [lmerMod]
## Formula: y ~ x + (1 | subNo)
##    Data: dataFrame2
## 
## REML criterion at convergence: -2903.6
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -3.06286 -0.66375  0.00169  0.65728  2.99181 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subNo    (Intercept) 0.656088 0.80999 
##  Residual             0.009494 0.09744 
## Number of obs: 2000, groups:  subNo, 100
## 
## Fixed effects:
##               Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)  1.799e+01  8.103e-02  9.900e+01     222   <2e-16 ***
## x           -1.000e+02  2.236e-03  1.899e+03  -44729   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##   (Intr)
## x 0.000
anova(model2)
## Analysis of Variance Table of type III  with  Satterthwaite 
## approximation for degrees of freedom
##     Sum Sq  Mean Sq NumDF  DenDF    F.value    Pr(>F)    
## x 18995109 18995109     1 1899.2 2000646776 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Extract intercept for each subject by coef(model2). ]]></screen><para>As you can see above (fixed effects), β 0.  ^ =  β0.^= </para><itemizedlist><listitem override="none"><para>17.986858 is approximating the true (17.9836611). The same is true for β 1  ^ =  </para></listitem></itemizedlist><para>β1^= </para><itemizedlist><listitem override="none"><para>-99.9995396 (vs -100). The estimated level 1 intercept for subject 1 was also close to the true value (17.8598629 vs 17.8225987). </para></listitem></itemizedlist><para>Model 3: Random intercept and slope for each subject </para><para>In the next model, I am additionally allowing the slope randomly different for each subject. This means that the strength of the relationship between the level 1 predictor X  X </para><itemizedlist><listitem override="none"><para>and the memory scores Y  </para></listitem></itemizedlist><para>Y </para><itemizedlist><listitem override="none"><para>varies across subjects. </para></listitem></itemizedlist><para>Level 1:  Y ij =β 0j +β 1j X ij +e ij   Yij=β0j+β1jXij+eij </para><para>Level 2:  β 0j =γ 00 +γ 01 W j +u 0j   β0j=γ00+γ01Wj+u0j </para><para>β 1j =γ 10 +u 1j   β1j=γ10+u1j </para><para>•Y ij   Yij </para><itemizedlist><listitem override="none"><para>refers to memory score for object i and subject j. </para></listitem></itemizedlist><para>•β 0j   β0j </para><itemizedlist><listitem override="none"><para>refers to the intercept of the dependent variable in subject j (level 2) in other words the individual difference in general performance across subjects. </para></listitem></itemizedlist><para>•β 1j   β1j </para><itemizedlist><listitem override="none"><para>refers to the linear slope that varies across subjects in other words the relationship X  </para></listitem></itemizedlist><para>X </para><itemizedlist><listitem override="none"><para>and Y  </para></listitem></itemizedlist><para>Y </para><itemizedlist><listitem override="none"><para>is different across subjects. </para></listitem></itemizedlist><para>•X ij   Xij </para><itemizedlist><listitem override="none"><para>refers to the level 1 predictor. </para></listitem></itemizedlist><para>•e ij   eij </para><itemizedlist><listitem override="none"><para>refers to the random errors of prediction for the level 1 equation. </para></listitem></itemizedlist><para>•γ 00   γ00 </para><itemizedlist><listitem override="none"><para>refers to the overall intercept. This is the grand mean of the scores on the dependent variable across all the subjects when all the predictors are equal to 0. </para></listitem></itemizedlist><para>•γ 01   γ01 </para><itemizedlist><listitem override="none"><para>refers to the overall regression coefficient, or the slope, between the dependent variable and the level 2 predictor. </para></listitem></itemizedlist><para>•W j   Wj </para><itemizedlist><listitem override="none"><para>refers to the level 2 predictor. </para></listitem></itemizedlist><para>•u 0j   u0j </para><itemizedlist><listitem override="none"><para>refers to the random error component for the deviation of the intercept of a subject from the overall intercept. </para></listitem></itemizedlist><para>•γ 10   γ10 </para><itemizedlist><listitem override="none"><para>overall slope of the level 2. </para></listitem></itemizedlist><para>•u 1j   u1j </para><itemizedlist><listitem override="none"><para>refers to the random error component for the deviation of the slope of a subject from the overall intercept. </para></listitem></itemizedlist><screen><![CDATA[# Generating data set
# General values and variables
numObj <- 20
numSub <- 100
e      <- rnorm(numObj * numSub, mean = 0, sd = 0.1)
x      <- scale(runif(numObj * numSub, min = -100, max = 100))
y      <- c()
index  <- 1
]]><![CDATA[
# Coefficients
gamma00 <- 18
gamma01 <- 0.5
gamma10 <- -100
w       <- runif(numSub, min = -3, max = 3)
u0      <- rnorm(numSub, mean = 0, sd = 0.1)
u1      <- rnorm(numSub, mean = 0, sd = 0.1)
meanBeta0 <- mean(gamma00 + gamma01*w + u0) # I should be able to retrieve that parameter. 
]]><![CDATA[
for(j in 1:numSub){
  for(i in 1:numObj){
    y[index] <- gamma00 + gamma01*w[j]+ u0[j] + (gamma10 + u1[j])*x[index] + e[index]
    index <- index + 1
  } 
}
]]><![CDATA[
dataFrame3 <- data.frame(y = y, x = x, subNo = factor(rep(1:numSub, each = numObj)), objNum = factor(rep(1:numObj, numSub))) 
]]><![CDATA[
model3 <- lmer(y ~  x + 
                 (1 + x | subNo), data = dataFrame3)
summary(model3)]]></screen><screen><![CDATA[## Linear mixed model fit by REML t-tests use Satterthwaite approximations
##   to degrees of freedom [lmerMod]
## Formula: y ~ x + (1 + x | subNo)
##    Data: dataFrame3
## 
## REML criterion at convergence: -2448.8
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -3.07599 -0.66806 -0.03042  0.68046  2.93575 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  subNo    (Intercept) 0.81640  0.9035        
##           x           0.01158  0.1076   -0.11
##  Residual             0.01016  0.1008        
## Number of obs: 2000, groups:  subNo, 100
## 
## Fixed effects:
##               Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)   17.95924    0.09039   99.04000   198.7   <2e-16 ***
## x           -100.01190    0.01102   98.83000 -9073.0   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##   (Intr)
## x -0.106]]></screen><screen><![CDATA[anova(model3)]]></screen><screen><![CDATA[## Analysis of Variance Table of type III  with  Satterthwaite 
## approximation for degrees of freedom
##   Sum Sq Mean Sq NumDF  DenDF  F.value    Pr(>F)    
## x 836595  836595     1 98.831 82319857 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
}}
]]><![CDATA[
Again, β 0.  ^ = 
β0.^=
 17.986858 and β 1.  ^ = 
β1.^=
 -99.9995396 were correctly estimated (i.e. 18 vs -100) for level 1. Level 2 estimates for subject 1 were also correctly estimated: β 01  ^  
β01^
 = 17.5757378 vs β 11  
β11
 = 17.5971511 and β 11  ^  
β11^
 = -100.0583105 vs β 11  
β11
 = -100.0599873.]]></screen><para>Model 4: Random intercepts for each object and each subject </para><para>As explained, above each subject rates each of twenty objects. In addition to the varying performance among the participants, one can assume that there are objects, which can be remembered more easily than others. Therefore, I add a random intercept for each object representing at varying baseline memorability. </para><para>Level 1:  Y ij =β 0ij +β 1 X ij +e ij   Yij=β0ij+β1Xij+eij </para><para>Level 2:  β 0ij =γ 00 +γ 01 W 1i +u 01i +γ 02 W 2j +u 02j   β0ij=γ00+γ01W1i+u01i+γ02W2j+u02j </para><para>•Y ij   Yij </para><itemizedlist><listitem override="none"><para>refers to memory score for object i and subject j. </para></listitem></itemizedlist><para>•β 0ij   β0ij </para><itemizedlist><listitem override="none"><para>refers to the intercept of the dependent variable that is different for each subject and object. </para></listitem></itemizedlist><para>•β 1   β1 </para><itemizedlist><listitem override="none"><para>refers to the linear slope. </para></listitem></itemizedlist><para>•X ij   Xij </para><itemizedlist><listitem override="none"><para>refers to the level 1 predictor. </para></listitem></itemizedlist><para>•e ij   eij </para><itemizedlist><listitem override="none"><para>refers to the random errors of prediction for the level 1 equation. </para></listitem></itemizedlist><para>•γ 00   γ00 </para><itemizedlist><listitem override="none"><para>refers to the overall intercept. This is the grand mean of the scores on the dependent variable across all the subjects when all the predictors are equal to 0. </para></listitem></itemizedlist><para>•γ 01   γ01 </para><itemizedlist><listitem override="none"><para>refers to the overall regression coefficient, or the slope, between the dependent variable and the level 2 predictor. </para></listitem></itemizedlist><para>•W 1i   W1i </para><itemizedlist><listitem override="none"><para>refers to the level 2 predictor for objects. </para></listitem></itemizedlist><para>•u 01i   u01i </para><itemizedlist><listitem override="none"><para>refers to the random error component for the deviation of the intercept of a object from the overall intercept. </para></listitem></itemizedlist><para>•γ 02   γ02 </para><itemizedlist><listitem override="none"><para>refers to the overall regression coefficient, or the slope, between the dependent variable and the level 2 predictor. </para></listitem></itemizedlist><para>•W 2j   W2j </para><itemizedlist><listitem override="none"><para>refers to the level 2 predictor for subjects. </para></listitem></itemizedlist><para>•u 02j   u02j </para><itemizedlist><listitem override="none"><para>refers to the random error component for the deviation of the intercept of a subject from the overall intercept. </para></listitem></itemizedlist><screen><![CDATA[# Generating data set
# General values and variables
numObj <- 20
numSub <- 100
e      <- rnorm(numObj * numSub, mean = 0, sd = 0.1)
x      <- scale(runif(numObj * numSub, min = -100, max = 100))
y      <- c()
index  <- 1
]]><![CDATA[
# Coefficients
gamma00 <- 18
gamma01 <- 0.5
gamma02 <- -10
beta1    <- -100
w1       <- runif(numObj, min = 0, max = 3)
w2       <- runif(numSub, min = -3, max = 3)
u01       <- rnorm(numObj, mean = 0, sd = 0.1)
u02       <- rnorm(numSub, mean = 0, sd = 0.1)
]]><![CDATA[
for(j in 1:numSub){
  for(i in 1:numObj){
    y[index] <-gamma00 + gamma01*w1[i] + u01[i] + gamma02*w2[j] + u02[j] + beta1*x[index] + e[index]
    index <- index + 1
  } 
}
]]><![CDATA[
dataFrame4 <- data.frame(y = y, x = x, subNo = factor(rep(1:numSub, each = numObj)), objNum = factor(rep(1:numObj, numSub))) 
]]><![CDATA[
model4 <- lmer(y ~  x + 
                 (1 | subNo) + 
                 (1 | objNum), data = dataFrame4)
summary(model4)]]></screen><screen><![CDATA[## Linear mixed model fit by REML t-tests use Satterthwaite approximations
##   to degrees of freedom [lmerMod]
## Formula: y ~ x + (1 | subNo) + (1 | objNum)
##    Data: dataFrame4
## 
## REML criterion at convergence: -2014.7
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.4633 -0.6605  0.0095  0.6576  3.3734 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev.
##  subNo    (Intercept) 309.20491 17.5842 
##  objNum   (Intercept)   0.27440  0.5238 
##  Residual               0.01016  0.1008 
## Number of obs: 2000, groups:  subNo, 100; objNum, 20
## 
## Fixed effects:
##               Estimate Std. Error         df   t value Pr(>|t|)    
## (Intercept)  1.931e+01  1.762e+00  1.237e+03     10.96   <2e-16 ***
## x           -1.000e+02  2.316e-03  1.976e+03 -43184.92   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##   (Intr)
## x 0.000
anova(model4)
## Analysis of Variance Table of type III  with  Satterthwaite 
## approximation for degrees of freedom
##     Sum Sq  Mean Sq NumDF  DenDF    F.value    Pr(>F)    
## x 18955582 18955582     1 1975.6 1864937637 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1]]></screen><para>Firstly, on level 1 estimates were close to the true values: β 0..  ^ =  β0..^= </para><itemizedlist><listitem override="none"><para>19.3131619 (vs 19.3127994) as well as β 1  ^ =  </para></listitem></itemizedlist><para>β1^= </para><itemizedlist><listitem override="none"><para>-100.0025395 (vs -100). Now looking at the level 2 estimates: the intercept for the object 1 across all subjects is β 01.  ^   </para></listitem></itemizedlist><para>β01.^ </para><itemizedlist><listitem override="none"><para>= 19.1269314 (vs 19.1249384), while the intercept for subject 1 across all objects is β 0.1  ^   </para></listitem></itemizedlist><para>β0.1^ </para><itemizedlist><listitem override="none"><para>= -2.8745026 (vs -2.8654707). </para></listitem></itemizedlist><para>Model 5: Random intercepts for each object, each subject, quadratic term and random slopes for subjects </para><para>Model 5.1 </para><para>I am finally arriving at the model, which I want to use to model my data in the upcoming analysis. This models contains random intercepts for each object and each subject. A second quadratic term is added to test for non-linearities. Both the slope of the linear term and the slope of the quadratic term are random for each subject. At this point it is crucial to note that the predictor X  X </para><itemizedlist><listitem override="none"><para>is random for each subject and object just as if the subjects had to rate the expectancy of each object after their memory was tested. This will be the case of my experiment, but I also collected normative data, where X  </para></listitem></itemizedlist><para>X </para><itemizedlist><listitem override="none"><para>varies only across objects representing aggregate values (see model 5.2). </para></listitem></itemizedlist><para>Level 1:  Y ij =β 0ij +β 1j X ij +β 2j X 2 ij +e ij   Yij=β0ij+β1jXij+β2jXij2+eij </para><itemizedlist><listitem override="none"><para>Level 2:  </para></listitem></itemizedlist><para>β 0ij =γ 00 +γ 01 W 1i +u 01i +γ 02 W 2j +u 02j   β0ij=γ00+γ01W1i+u01i+γ02W2j+u02j </para><para>β 1j =γ 10 +u 1j   β1j=γ10+u1j </para><para>β 2j =γ 20 +u 2j   β2j=γ20+u2j </para><para>•Y ij   Yij </para><itemizedlist><listitem override="none"><para>refers to memory score for object i and subject j. </para></listitem></itemizedlist><para>•β 0ij   β0ij </para><itemizedlist><listitem override="none"><para>refers to the intercept of the dependent variable that is different for each subject and object. </para></listitem></itemizedlist><para>•β 1j   β1j </para><itemizedlist><listitem override="none"><para>refers to the linear slope for the relationship in subject j (level 2) between the level 1 predictor and the dependent variable. </para></listitem></itemizedlist><para>•X ij   Xij </para><itemizedlist><listitem override="none"><para>refers to the linear level 1 predictor that varies for across objects and subjects. </para></listitem></itemizedlist><para>•β 2j   β2j </para><itemizedlist><listitem override="none"><para>refers to the quadratic slope for the relationship in subject j (level 2) between the level 1 predictor and the dependent variable. </para></listitem></itemizedlist><para>•X 2 ij   Xij2 </para><itemizedlist><listitem override="none"><para>refers to the quadratic level 1 predictor that varies for across objects and subjects. </para></listitem></itemizedlist><para>•e ij   eij </para><itemizedlist><listitem override="none"><para>refers to the random errors of prediction for the level 1 equation. </para></listitem></itemizedlist><para>•γ 00   γ00 </para><itemizedlist><listitem override="none"><para>refers to the overall intercept. This is the grand mean of the scores on the dependent variable across all the subjects when all the predictors are equal to 0. </para></listitem></itemizedlist><para>•γ 01   γ01 </para><itemizedlist><listitem override="none"><para>refers to the overall regression coefficient, or the slope, between the dependent variable and the level 2 predictor. </para></listitem></itemizedlist><para>•W 1i   W1i </para><itemizedlist><listitem override="none"><para>refers to the level 2 predictor for objects. </para></listitem></itemizedlist><para>•u 01i   u01i </para><itemizedlist><listitem override="none"><para>refers to the random error component for the deviation of the intercept of a object from the overall intercept. </para></listitem></itemizedlist><para>•γ 02   γ02 </para><itemizedlist><listitem override="none"><para>refers to the overall regression coefficient, or the slope, between the dependent variable and the level 2 predictor. </para></listitem></itemizedlist><para>•W 2j   W2j </para><itemizedlist><listitem override="none"><para>refers to the level 2 predictor for subjects. </para></listitem></itemizedlist><para>•u 02j   u02j </para><itemizedlist><listitem override="none"><para>refers to the random error component for the deviation of the intercept of a subject from the overall intercept. </para></listitem></itemizedlist><para>•γ 10   γ10 </para><itemizedlist><listitem override="none"><para>overall linear slope of the level 2. </para></listitem></itemizedlist><para>•u 1j   u1j </para><itemizedlist><listitem override="none"><para>refers to the random error component for the deviation from the linear slope of a subject from the overall intercept. </para></listitem></itemizedlist><para>•γ 20   γ20 </para><itemizedlist><listitem override="none"><para>overall quadratic slope of the level 2. </para></listitem></itemizedlist><para>•u 2j   u2j </para><itemizedlist><listitem override="none"><para>refers to the random error component for the deviation from the quadratic slope of a subject from the overall intercept. </para></listitem></itemizedlist><screen><![CDATA[# Generating data set
# General values
numObj <- 20
numSub <- 100
index  <- 1
y      <- c()
x      <- scale(runif(numObj * numSub, min = -100, max = 100)) # randomly varies across object and subjects as post ratings, but the problem here could be that this assumes that the ratings for the objects across subjects are completely uncorrelated
x2     <- x*x
e      <- rnorm(numObj * numSub, mean = 0, sd = 0.1) 
]]><![CDATA[
# Coefficients
gamma00 <- 0.5
gamma01 <- 2
gamma02 <- 8
gamma10 <- 0
gamma20 <- 10
]]><![CDATA[
# Varying stuff
w1 <- runif(numObj, min = -4, max = 4)
w2 <- runif(numSub, min =  0, max = 4)
u01 <- rnorm(numObj, mean = 0, sd = 0.1)
u02 <- rnorm(numSub, mean = 0, sd = 0.1)
u1 <- rnorm(numSub, mean = 0, sd = 0.1)
u2 <- rnorm(numSub, mean = 0, sd = 0.1)
]]><![CDATA[
for(j in 1:numSub){
  for(i in 1:numObj){
    # Long format
    y[index] <- gamma00 + gamma01 * w1[i]  + u01[i] + gamma02 * w2[j] + u02[j] + (gamma10 + u1[j]) * x[index] + (gamma20 + u2[j]) * x2[index] + e[index]
    index <- index + 1
  }
}
]]><![CDATA[
dataFrame5 <- data.frame(y = y, subNo = factor(rep(1:numSub, each = numObj)), objNum = factor(rep(1:numObj, numSub)), x = x, x2 = x2)
]]><![CDATA[
model5 <- lmer(y ~  x + 
                    x2 + 
                    (1 | subNo) + 
                    (1 | objNum) + 
                    (x2|subNo) +
                    (x|subNo), data = dataFrame5)
]]><![CDATA[
summary(model5)]]></screen><screen><![CDATA[## Linear mixed model fit by REML t-tests use Satterthwaite approximations
##   to degrees of freedom [lmerMod]
## Formula: y ~ x + x2 + (1 | subNo) + (1 | objNum) + (x2 | subNo) + (x |  
##     subNo)
##    Data: dataFrame5
## 
## REML criterion at convergence: -1561.9
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -3.03566 -0.62912 -0.02829  0.58142  2.71312 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev. Corr
##  subNo    (Intercept) 17.696956 4.20678      
##           x            0.007460 0.08637  0.14
##  subNo.1  (Intercept) 50.097104 7.07793      
##           x2           0.011510 0.10729  0.14
##  subNo.2  (Intercept) 23.614826 4.85951      
##  objNum   (Intercept) 22.855930 4.78079      
##  Residual              0.009808 0.09903      
## Number of obs: 2000, groups:  subNo, 100; objNum, 20
## 
## Fixed effects:
##               Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)  16.745775   1.434189 284.140000  11.676   <2e-16 ***
## x            -0.006660   0.008967  99.950000  -0.743    0.459    
## x2           10.011245   0.011079  99.070000 903.628   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##    (Intr) x    
## x  0.038       
## x2 0.067  0.000
anova(model5)
## Analysis of Variance Table of type III  with  Satterthwaite 
## approximation for degrees of freedom
##    Sum Sq Mean Sq NumDF  DenDF F.value Pr(>F)    
## x     0.0     0.0     1 99.955       1 0.4594    
## x2 8008.6  8008.6     1 99.073  816544 <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
corrResult <- cor.test(coef(model5)$subNo$`(Intercept)`, gamma00 + mean(gamma01*w1 + u01) + gamma02*w2 + u02)]]></screen><para>The intercept of object 1 across all subjects is β 01.  ^ =  β01.^= </para><itemizedlist><listitem override="none"><para>14.4912111 (vs 14.4971756), while subject’s 1 performance across all objects was β 0.1  ^ =  </para></listitem></itemizedlist><para>β0.1^= </para><itemizedlist><listitem override="none"><para>18.9779272 (vs 20.0973103). Compared to the other estimates, the last estimate is quite far off. However the estimates are still highly correlated with the true values, r = .969, p &lt; .001. However on some simulations they are not, which I realised after running this script a couple of times. I have no idea why this is the case. On level 2, the linear slope for subject 1 was β 11  ^ =  </para></listitem></itemizedlist><para>β11^= </para><itemizedlist><listitem override="none"><para>0.035072 (vs 0.0042065), while the quadratic slope was β 21  ^ =  </para></listitem></itemizedlist><para>β21^= </para><itemizedlist><listitem override="none"><para>9.9665142 (vs 9.988436). </para></listitem></itemizedlist><para>Model 5.2 </para><para>This model is basically the same than the previous except one different. X  X </para><itemizedlist><listitem override="none"><para>varies only across objects like as if it was aggregated values from a different population. This is case in the experiment, in which one could use the normative data instead of participants’ ratings to predict the memory score. </para></listitem></itemizedlist><para>Level 1:  Y ij =β 0ij +β 1j X i +β 2j X 2 i +e ij   Yij=β0ij+β1jXi+β2jXi2+eij </para><itemizedlist><listitem override="none"><para>Level 2:  </para></listitem></itemizedlist><para>β 0ij =γ 00 +γ 01 W 1i +u 01i +γ 02 W 2j +u 02j   β0ij=γ00+γ01W1i+u01i+γ02W2j+u02j </para><para>β 1j =γ 10 +u 1j   β1j=γ10+u1j </para><para>β 2j =γ 20 +u 2j   β2j=γ20+u2j </para><para>•Y ij   Yij </para><itemizedlist><listitem override="none"><para>refers to memory score for object i and subject j. </para></listitem></itemizedlist><para>•β 0ij   β0ij </para><itemizedlist><listitem override="none"><para>refers to the intercept of the dependent variable that is different for each subject and object. </para></listitem></itemizedlist><para>•β 1j   β1j </para><itemizedlist><listitem override="none"><para>refers to the linear slope for the relationship in subject j (level 2) between the level 1 predictor and the dependent variable. </para></listitem></itemizedlist><para>•X i   Xi </para><itemizedlist><listitem override="none"><para>refers to the linear level 1 predictor that varies for across objects but not across subjects. </para></listitem></itemizedlist><para>•β 2j   β2j </para><itemizedlist><listitem override="none"><para>refers to the quadratic slope for the relationship in subject j (level 2) between the level 1 predictor and the dependent variable. </para></listitem></itemizedlist><para>•X 2 i   Xi2 </para><itemizedlist><listitem override="none"><para>refers to the quadratic level 1 predictor that varies for across objects but not across subjects. </para></listitem></itemizedlist><para>•e ij   eij </para><itemizedlist><listitem override="none"><para>refers to the random errors of prediction for the level 1 equation. </para></listitem></itemizedlist><para>•γ 00   γ00 </para><itemizedlist><listitem override="none"><para>refers to the overall intercept. This is the grand mean of the scores on the dependent variable across all the subjects when all the predictors are equal to 0. </para></listitem></itemizedlist><para>•γ 01   γ01 </para><itemizedlist><listitem override="none"><para>refers to the overall regression coefficient, or the slope, between the dependent variable and the level 2 predictor. </para></listitem></itemizedlist><para>•W 1i   W1i </para><itemizedlist><listitem override="none"><para>refers to the level 2 predictor for objects. </para></listitem></itemizedlist><para>•u 01i   u01i </para><itemizedlist><listitem override="none"><para>refers to the random error component for the deviation of the intercept of a object from the overall intercept. </para></listitem></itemizedlist><para>•γ 02   γ02 </para><itemizedlist><listitem override="none"><para>refers to the overall regression coefficient, or the slope, between the dependent variable and the level 2 predictor. </para></listitem></itemizedlist><para>•W 2j   W2j </para><itemizedlist><listitem override="none"><para>refers to the level 2 predictor for subjects. </para></listitem></itemizedlist><para>•u 02j   u02j </para><itemizedlist><listitem override="none"><para>refers to the random error component for the deviation of the intercept of a subject from the overall intercept. </para></listitem></itemizedlist><para>•γ 10   γ10 </para><itemizedlist><listitem override="none"><para>overall linear slope of the level 2. </para></listitem></itemizedlist><para>•u 1j   u1j </para><itemizedlist><listitem override="none"><para>refers to the random error component for the deviation from the linear slope of a subject from the overall intercept. </para></listitem></itemizedlist><para>•γ 20   γ20 </para><itemizedlist><listitem override="none"><para>overall quadratic slope of the level 2. </para></listitem></itemizedlist><para>•u 2j   u2j </para><itemizedlist><listitem override="none"><para>refers to the random error component for the deviation from the quadratic slope of a subject from the overall intercept. </para></listitem></itemizedlist><screen><![CDATA[# Generating data set
# General values
numObj <- 20
numSub <- 100
index  <- 1
y      <- c()
x      <- scale(runif(numObj, min = -100, max = 100)) # randomly varies across object and subjects as post ratings, but the problem here could be that this assumes that the ratings for the objects across subjects are completely uncorrelated
x2     <- x*x
e      <- rnorm(numObj * numSub, mean = 0, sd = 0.1) 
]]><![CDATA[
# Coefficients
gamma00 <- 0.5
gamma01 <- 2
gamma02 <- 8
gamma10 <- 0
gamma20 <- 10
]]><![CDATA[
# Varying stuff
w1 <- runif(numObj, min = -4, max = 4)
w2 <- runif(numSub, min =  0, max = 4)
u01 <- rnorm(numObj, mean = 0, sd = 0.1)
u02 <- rnorm(numSub, mean = 0, sd = 0.1)
u1 <- rnorm(numSub, mean = 0, sd = 0.1)
u2 <- rnorm(numSub, mean = 0, sd = 0.1)
]]><![CDATA[
for(j in 1:numSub){
  for(i in 1:numObj){
    # Long format
    y[index] <- gamma00 + gamma01 * w1[i]  + u01[i] + gamma02 * w2[j] + u02[j] + (gamma10 + u1[j]) * x[i] + (gamma20 + u2[j]) * x2[i] + e[index]
    index <- index + 1
  }
}
]]><![CDATA[
dataFrame5 <- data.frame(y = y, subNo = factor(rep(1:numSub, each = numObj)), objNum = factor(rep(1:numObj, numSub)), x = x, x2 = x2)
]]><![CDATA[
model5 <- lmer(y ~  x + 
                    x2 + 
                    (1 | subNo) + 
                    (1 | objNum) + 
                    (x2|subNo) +
                    (x|subNo), data = dataFrame5)
]]><![CDATA[
summary(model5)]]></screen><screen><![CDATA[## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ x + x2 + (1 | subNo) + (1 | objNum) + (x2 | subNo) + (x |  
##     subNo)
##    Data: dataFrame5
## 
## REML criterion at convergence: -1582.8
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -3.08286 -0.61913  0.01853  0.62207  2.69470 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev. Corr 
##  subNo    (Intercept) 26.932979 5.18970       
##           x            0.008787 0.09374  0.19 
##  subNo.1  (Intercept)  0.055574 0.23574       
##           x2           0.010822 0.10403  -0.28
##  subNo.2  (Intercept) 61.928370 7.86946       
##  objNum   (Intercept) 18.814142 4.33753       
##  Residual              0.009716 0.09857       
## Number of obs: 2000, groups:  subNo, 100; objNum, 20
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 16.89157    1.70828   9.888
## x           -0.06481    1.06457  -0.061
## x2           9.09754    1.09819   8.284
## 
## Correlation of Fixed Effects:
##    (Intr) x     
## x  -0.216       
## x2 -0.611  0.355
## convergence code: 0
## unable to evaluate scaled gradient
## Model failed to converge: degenerate  Hessian with 2 negative eigenvalues
anova(model5)
## Analysis of Variance Table
##    Df  Sum Sq Mean Sq F value
## x   1 0.10030 0.10030  10.323
## x2  1 0.66676 0.66676  68.627]]></screen><screen><![CDATA[corrResult <- cor.test(coef(model5)$subNo$`(Intercept)`, gamma00 + mean(gamma01*w1 + u01) + gamma02*w2 + u02)]]></screen><para>The intercept of object 1 across all subjects is β 01.  ^ =  β01.^= </para><itemizedlist><listitem override="none"><para>18.1767803 (vs 18.1734038), while subject’s 1 performance across all objects was β 0.1  ^ =  </para></listitem></itemizedlist><para>β0.1^= </para><itemizedlist><listitem override="none"><para>20.1355938 (vs 17.1431682). Compared to the other estimates, the last estimate is quite far off. However the estimates are still highly correlated with the true values, r = .973, p &lt; .001. However on some simulations they are not, which I realised after running this script a couple of times. I have no idea why this is the case. On level 2, the linear slope for subject 1 was β 11  ^ =  </para></listitem></itemizedlist><para>β11^= </para><itemizedlist><listitem override="none"><para>0.0929432 (vs 0.0373702), while the quadratic slope was β 21  ^ =  </para></listitem></itemizedlist><para>β21^= </para><itemizedlist><listitem override="none"><para>9.8407853 (vs 8.9302937). </para></listitem></itemizedlist><para>Inaccurate estimations of the random intercepts for each subject </para><para>There seems to be a problem in both simulation above that the estimate of β 0.j  ^   β0.j^ </para><itemizedlist><listitem override="none"><para>was quite inaccurate sometimes, but this could be due the fact that I used 16 subjects in previous analyses. To test this, I run number of simulation with the same parameters. </para></listitem></itemizedlist><screen><![CDATA[n <- 100
# Fixed values:
  numObj <- 20
  numSub <- 100
  gamma00 <- 0.5
  gamma01 <- 2
  gamma02 <- 8
  gamma10 <- 0
  gamma20 <- 10
  corrResults <- c()
  
# Loop for n simulations
for(run in 1:n){
  # Generating data set
  # General values
  index  <- 1
  y      <- c()
  x      <- scale(runif(numObj, min = -100, max = 100)) # randomly varies across object and subjects as post ratings, but the problem here could be that this assumes that the ratings for the objects across subjects are completely uncorrelated
  x2     <- x*x
  e      <- rnorm(numObj * numSub, mean = 0, sd = 0.1) 
  # Varying stuff
  w1 <- runif(numObj, min = -4, max = 4)
  w2 <- runif(numSub, min =  0, max = 4)
  u01 <- rnorm(numObj, mean = 0, sd = 0.1)
  u02 <- rnorm(numSub, mean = 0, sd = 0.1)
  u1 <- rnorm(numSub, mean = 0, sd = 0.1)
  u2 <- rnorm(numSub, mean = 0, sd = 0.1)
  
  for(j in 1:numSub){
    for(i in 1:numObj){
      # Long format
      y[index] <- gamma00 + gamma01 * w1[i]  + u01[i] + gamma02 * w2[j] + u02[j] + (gamma10 + u1[j]) * x[i] + (gamma20 + u2[j]) * x2[i] + e[index]
      index <- index + 1
    }
  }
  
  dataFrame5 <- data.frame(y = y, subNo = factor(rep(1:numSub, each = numObj)), objNum = factor(rep(1:numObj, numSub)), x = x, x2 = x2)
  
  try(model5 <- lmer(y ~  x + 
                      x2 + 
                      (1 | subNo) + 
                      (1 | objNum) + 
                      (x2|subNo) +
                      (x|subNo), data = dataFrame5))
  
  corrResults[run] <- cor.test(coef(model5)$subNo$`(Intercept)`, gamma00 + mean(gamma01*w1 + u01) + gamma02*w2 + u02)$estimate
}]]></screen><para>Even when increasing the number of subjects to 100, there were models that couldn’t be estimated or had every low correlations coefficients. See the histogram below for the distribution. </para><screen><![CDATA[ggplot(data.frame(corrResults), aes(corrResults)) + geom_histogram() + theme(panel.margin = unit(1, "cm"), text = element_text(size = 12), plot.margin = margin(10, 10, 10, 10)) + 
        labs(y = 'Frequency', x = 'Correlation coefficients') + 
        coord_cartesian(xlim = c(0, 1), expand = FALSE)]]></screen></section></article>