These packages are needed to perform the analyses in this notebook, a brief description of what each does is listed below.
library(nlme) # for fitting traditional growth models
library(GLMMadaptive) # for fitting two part outcomes
library(MplusAutomation) # for using Mplus through R (Mplus must be installed on machine)
library(psych) # for tables of descriptive statistics
library(knitr) # for creating the pdf output of the R notebook and formatting tables
library(haven) # for reading in dataset from other programs
library(tidyverse) # for data wrangling
The haven package is useful for reading in a variety of datasets – here, our data is stored as a SAS file. We also change the type of our ID variable (NEWID) from numeric (the type as stored in SAS) to factor (an unordered character variable that uniquely identifies each child in the study). This can be accomplished using the mutate function in dplyr (part of the tidyverse), a function that allows new variables to be created or existing variables to be changed.
This datafile is in a long format, that is, each child (identified as NEWID) has multiple rows of data (identified as G3AGE) – one for each year of observation (3 in our example).
d <- read_sas("rdata.sas7bdat") %>%
mutate(NEWID = as.factor(NEWID))
Linear growth models are defined by an intercept (predicted outcome when time = 0) and a slope (rate of change in the outcome over time). For all of the growth models in this notebook, the intercept will be defined at age 15. Thus, it is necessary to center the time variable (G3AGE in our example) at age 15 by subtracting 15 from each score. We can use mutate to accomplish this task.
d1 <- d %>%
mutate(G3AGE15 = G3AGE - 15) # center G3's age at 15 years
The lme function is from the nlme package (https://cran.r-project.org/web/packages/nlme/nlme.pdf), which is a package for fitting multilevel models in R.
G3DEP is the outcome, it is regressed on a vector of 1’s (to define the intercept) and the time variable (G3AGE, which is centered at age 15 in our example). The intercept and slope are allowed to be random (vary across children). The random effects (intercept and slope) covary by default in the lme function. A maximum likelihood (ML) estimator is used.
The first line of the lme function defines the fixed effects part of the model, G3DEP ~ 1 + G3AGE15, which means that the dependent variable is regressed on an intercept and on the centered age variable. The second line defines the random effects part of the model, which means that there is a random intercept and a random slope for centered age.
demo_cont.1 <- lme(fixed = G3DEP ~ 1 + G3AGE15,
random = ~ 1 + G3AGE15 | NEWID,
data = d1, method = "ML")
summary(demo_cont.1) # request output
## Linear mixed-effects model fit by maximum likelihood
## Data: d1
## AIC BIC logLik
## 1255.232 1283.203 -621.616
##
## Random effects:
## Formula: ~1 + G3AGE15 | NEWID
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 0.5388819 (Intr)
## G3AGE15 0.1419065 0.179
## Residual 0.3640428
##
## Fixed effects: G3DEP ~ 1 + G3AGE15
## Value Std.Error DF t-value p-value
## (Intercept) 1.1040030 0.03527507 508 31.296978 0.000
## G3AGE15 0.0364751 0.01840105 508 1.982228 0.048
## Correlation:
## (Intr)
## G3AGE15 0.092
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.0298849 -0.4812134 -0.0147335 0.4864610 3.1120097
##
## Number of Observations: 782
## Number of Groups: 273
VarCorr(demo_cont.1) # request variance/covariance matrix for random effects
## NEWID = pdLogChol(1 + G3AGE15)
## Variance StdDev Corr
## (Intercept) 0.29039366 0.5388819 (Intr)
## G3AGE15 0.02013747 0.1419065 0.179
## Residual 0.13252718 0.3640428
All of the Mplus models can be fit directly in Mplus or in R using the MplusAutomation package (https://cran.r-project.org/web/packages/MplusAutomation/MplusAutomation.pdf). MplusAutomation finds Mplus on the local computer and executes the provided commands. The benefits of MplusAutomation are that the likelihood of all findings being reproducible is increased (since the entire R notebook can be executed and rendered at once) and more flexibility in collating and presenting the results is provided.
If one were not using MplusAutomation, the code to run the same program directly in Mplus would be:
DATA: FILE = “df.dat”;
VARIABLE: NAMES = G3AGE NEWID G3DEP G3AGE15;
MISSING = .;
USEV = G3DEP G3AGE15;
CLUSTER = NEWID;
WITHIN = G3AGE15;
ANALYSIS:
ESTIMATOR = ML;
TYPE = twolevel random;
MODEL:
%within%
S | G3DEP on G3AGE15;
%between%
G3DEP;
S;
G3DEP with S;
Below, we explain each part of the Mplus code in detail, to orient the reader to the key components of the syntax for fitting growth models as multilevel models in Mplus.
First, in the DATA section, one provides the name of the data set to be read into Mplus.
In the VARIABLE section, first the NAMES of the variables in the data set are provided (ensuring the exact order). The MISSING argument denotes the indicator for missing values. The USEV argument, short for usevariables, lists the variables used in the corresponding model. The CLUSTER argument defines the clustering variable (the multiple measurement occasions are nested in G3 child – NEWID). CLUSTER is a necessary argument for multilevel models. Our multilevel model has two levels – level 2 represents the children, and level 1 represents the repeated measures. The repeated measures are nested in child. In a multilevel model, some variables vary within cluster (i.e., at level 1), and others vary between cluster (i.e., at level 2). For now, we include only a within cluster variable – the age of the child. Therefore, the WITHIN argument indicates which variables are within the clustering unit (G3AGE15 – each child has three years of data indexed by the age of the child). Later, we will also define variables that are between clustering unit and will add a “BETWEEN =” argument (e.g., child sex is a between level variable).
The ANALYSIS section defines the type of analysis used. A maximum likelihood (ML) estimator is requested (ESTIMATOR = ML), and TYPE indicates that our data are structured in a multilevel structure (twolevel – as defined by the cluster variable listed in the VARIABLE section) and that random effects (in our case a random intercept and slope) are included.
The MODEL section defines the desired model. For a two-level model, there must be a level 1 (i.e., within cluster) model, see %within% below, and a level 2 (i.e., between cluster) model, see %between% below. The within cluster model defines the variables and effects that vary within cluster, and the between cluster model defines variables and effects that vary between clusters. In the within cluster model, a random intercept is implicit, but random slopes are denoted by the bar notation, for example, S | G3DEP on G3AGE15 indicates a random slope (defined as the rate of change, and referred to as S). The between cluster model indicates that the intercept (G3DEP) and slope (S) are both allowed to vary across children. Further, G3DEP with S indicates that the random intercept and slope covary.
Further details and examples for fitting two level models in Mplus can be found here: https://www.statmodel.com/usersguide/chapter9.shtml.
The R code below shows how to run the same example by calling Mplus remotely from within R using the MplusAutomation package.
# the following lines define the information to be sent from R to Mplus:
setup.mplus.demo_cont.1 <- mplusObject(
VARIABLE =
"USEV = G3DEP G3AGE15;
CLUSTER = NEWID;
WITHIN = G3AGE15;",
ANALYSIS =
"ESTIMATOR = ML;
TYPE = twolevel random;",
MODEL =
"%within%
S | G3DEP on G3AGE15;
%between%
G3DEP;
S;
G3DEP with S;",
rdata = d1)
# the following command lines use MplusAutomation
# to fit the Mplus model and read the results back into R
model.mplus.demo_cont.1 <- mplusModeler(
setup.mplus.demo_cont.1, modelout = "mplus.demo_cont.1.inp", run = TRUE)
##
## Running model: mplus.demo_cont.1.inp
## System command: cd "." && "mplus" "mplus.demo_cont.1.inp"
## Reading model: mplus.demo_cont.1.out
output.mplus.demo_cont.1 <- readModels("mplus.demo_cont.1.out")
## Reading model: mplus.demo_cont.1.out
output.mplus.demo_cont.1$parameters
## $unstandardized
## paramHeader param est se est_se pval BetweenWithin
## 1 Residual.Variances G3DEP 0.133 0.012 11.068 0.000 Within
## 2 G3DEP.WITH S 0.014 0.011 1.283 0.200 Between
## 3 Means G3DEP 1.104 0.035 31.334 0.000 Between
## 4 Means S 0.036 0.018 1.984 0.047 Between
## 5 Variances G3DEP 0.290 0.029 9.937 0.000 Between
## 6 Variances S 0.020 0.010 2.020 0.043 Between
The following R code runs the example growth model with predictors in R, again using the lme function from the nlme package. Notice that covariates are separated by + and the interactions between them are represented as products with *.
demo_cont.2 <- lme(G3DEP ~ 1 + G3AGE15 + G3MALE + G3MALE*G3AGE15 + G3EVENT,
random = ~ 1 + G3AGE15 | NEWID, data = d1, method = "ML")
summary(demo_cont.2)
## Linear mixed-effects model fit by maximum likelihood
## Data: d1
## AIC BIC logLik
## 1194.885 1236.842 -588.4425
##
## Random effects:
## Formula: ~1 + G3AGE15 | NEWID
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 0.4607899 (Intr)
## G3AGE15 0.1173843 0.145
## Residual 0.3724890
##
## Fixed effects: G3DEP ~ 1 + G3AGE15 + G3MALE + G3MALE * G3AGE15 + G3EVENT
## Value Std.Error DF t-value p-value
## (Intercept) 1.0714291 0.05527784 506 19.382614 0.0000
## G3AGE15 0.0537044 0.02525642 506 2.126367 0.0340
## G3MALE -0.3668816 0.06223915 271 -5.894707 0.0000
## G3EVENT 0.0476098 0.00752327 506 6.328342 0.0000
## G3AGE15:G3MALE -0.0393223 0.03618959 506 -1.086563 0.2777
## Correlation:
## (Intr) G3AGE15 G3MALE G3EVEN
## G3AGE15 0.077
## G3MALE -0.557 -0.046
## G3EVENT -0.613 -0.042 0.005
## G3AGE15:G3MALE -0.056 -0.698 0.067 0.032
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.014334757 -0.525721822 0.009252965 0.509876944 3.417510051
##
## Number of Observations: 782
## Number of Groups: 273
VarCorr(demo_cont.2)
## NEWID = pdLogChol(1 + G3AGE15)
## Variance StdDev Corr
## (Intercept) 0.21232732 0.4607899 (Intr)
## G3AGE15 0.01377907 0.1173843 0.145
## Residual 0.13874805 0.3724890
The R code below shows how to run the same example by calling Mplus remotely from within R using the MplusAutomation package. Notice that G3MALE is a between cluster variable. In the VARIABLE section, we indicate that G3MALE is a between cluster variable. Notice also that G3EVENT is a time varying covariate, it varies within cluster, therefore we note that it is a within cluster variable in the VARIABLE section.
setup.mplus.demo_cont.2 <- mplusObject(
VARIABLE =
"USEV = G3DEP G3AGE15 G3MALE G3EVENT;
CLUSTER = NEWID;
WITHIN = G3AGE15 G3EVENT;
BETWEEN = G3MALE;",
ANALYSIS =
"ESTIMATOR = ML;
TYPE = twolevel random;",
MODEL =
"%within%
S | G3DEP on G3AGE15;
G3DEP on G3EVENT;
%between%
G3DEP;
S;
G3DEP with S;
G3DEP S on G3MALE;",
rdata = d1)
model.mplus.demo_cont.2 <- mplusModeler(
setup.mplus.demo_cont.2, modelout = "mplus.demo_cont.2.inp", run = TRUE)
##
## Running model: mplus.demo_cont.2.inp
## System command: cd "." && "mplus" "mplus.demo_cont.2.inp"
## Reading model: mplus.demo_cont.2.out
output.mplus.demo_cont.2 <- readModels("mplus.demo_cont.2.out")
## Reading model: mplus.demo_cont.2.out
output.mplus.demo_cont.2$parameters
## $unstandardized
## paramHeader param est se est_se pval BetweenWithin
## 1 G3DEP.ON G3EVENT 0.048 0.008 6.097 0.000 Within
## 2 Residual.Variances G3DEP 0.139 0.013 10.932 0.000 Within
## 3 S.ON G3MALE -0.039 0.036 -1.090 0.276 Between
## 4 G3DEP.ON G3MALE -0.367 0.062 -5.914 0.000 Between
## 5 G3DEP.WITH S 0.008 0.009 0.849 0.396 Between
## 6 Intercepts G3DEP 1.071 0.056 19.147 0.000 Between
## 7 Intercepts S 0.054 0.025 2.133 0.033 Between
## 8 Residual.Variances G3DEP 0.212 0.024 9.014 0.000 Between
## 9 Residual.Variances S 0.014 0.010 1.395 0.163 Between
We utilize the GLMMadaptive package’s (https://cran.r-project.org/web/packages/GLMMadaptive/GLMMadaptive.pdf) mixed_model command to fit a binary growth model in R. The syntax is similar to the lme function used above for the continuous outcome; however, the arguments for the syntax are slightly different. In this example, the outcome is G3SUBS, a binary indicator of the child’s self-reported substance use at each interview: 1 = use alcohol or cannabis at least once per month, 0 = do not use alcohol or cannabis or use less frequently than once per month.
In this growth model for a binary outcome, no random slope is included (see description in manuscript). We also specify family = binomial() to indicate that we desire a logit link to fit the model (i.e., a logistic regression). A multilevel model for a binary outcome produces unit specific effects, as described in the manuscript, we may also desire population-averaged effects. These can be obtained with the marginal_coefs command in GLMMadaptive.
demo_cat.1 <- mixed_model(fixed = G3SUBS ~ 1 + G3AGE15,
random = ~ 1 | NEWID,
data = d1,
family = binomial())
summary(demo_cat.1)
##
## Call:
## mixed_model(fixed = G3SUBS ~ 1 + G3AGE15, random = ~1 | NEWID,
## data = d1, family = binomial())
##
## Data Descriptives:
## Number of Observations: 782
## Number of Groups: 273
##
## Model:
## family: binomial
## link: logit
##
## Fit statistics:
## log.Lik AIC BIC
## -163.1173 332.2346 343.063
##
## Random effects covariance matrix:
## StdDev
## (Intercept) 2.867195
##
## Fixed effects:
## Estimate Std.Err z-value p-value
## (Intercept) -5.1523 0.7581 -6.7967 < 1e-04
## G3AGE15 1.1764 0.2903 4.0518 < 1e-04
##
## Integration:
## method: adaptive Gauss-Hermite quadrature rule
## quadrature points: 11
##
## Optimization:
## method: hybrid EM and quasi-Newton
## converged: TRUE
marginal_coefs(demo_cat.1, std_errors = TRUE) # requests marginal coefficients
## Estimate Std.Err z-value p-value
## (Intercept) -2.7081 0.2198 -12.3217 < 1e-04
## G3AGE15 0.7185 0.1536 4.6786 < 1e-04
We can fit the same model in Mplus, below is the syntax to fit the Mplus model via R. Because the outcome, G3SUBS, is a categorical variable, we must define it as such in the VARIABLE section (i.e., CATEGORICAL = G3SUBS). S@0 excludes the random slope (that is it constrains the random slope variance to be zero), so that we estimate only a random intercept.
Also note that for a categorical outcome, Mplus reports the threshold rather than the intercept. To obtain the intercept, the threshold is multiplied by -1.
setup.mplus.demo_cat.1 <- mplusObject(
VARIABLE =
"USEV = G3SUBS G3AGE15;
CLUSTER = NEWID;
WITHIN = G3AGE15;
CATEGORICAL = G3SUBS;",
ANALYSIS =
"ESTIMATOR = ML;
TYPE = twolevel random;",
MODEL =
"%within%
S | G3SUBS on G3AGE15;
%between%
G3SUBS;
S@0;",
rdata = d1)
model.mplus.demo_cat.1 <- mplusModeler(
setup.mplus.demo_cat.1, modelout = "mplus.demo_cat.1.inp", run = TRUE)
##
## Running model: mplus.demo_cat.1.inp
## System command: cd "." && "mplus" "mplus.demo_cat.1.inp"
## Reading model: mplus.demo_cat.1.out
output.mplus.demo_cat.1 <- readModels("mplus.demo_cat.1.out")
## Reading model: mplus.demo_cat.1.out
output.mplus.demo_cat.1$parameters
## $unstandardized
## paramHeader param est se est_se pval BetweenWithin
## 1 Means S 1.229 0.294 4.175 0.000 Between
## 2 Thresholds G3SUBS$1 5.541 0.852 6.502 0.000 Between
## 3 Variances G3SUBS 9.985 4.139 2.413 0.016 Between
## 4 Variances S 0.000 0.000 999.000 999.000 Between
The code below fits the growth model with predictors in R, using the mixed_model function of the the GLMMadaptive package.
demo_cat.2 <- mixed_model(fixed = G3SUBS ~ 1 + G3AGE15 + G3MALE +
G3MALE*G3AGE15 + G3EVENT,
random = ~ 1 | NEWID, data = d1,
family = binomial())
summary(demo_cat.2)
##
## Call:
## mixed_model(fixed = G3SUBS ~ 1 + G3AGE15 + G3MALE + G3MALE *
## G3AGE15 + G3EVENT, random = ~1 | NEWID, data = d1, family = binomial())
##
## Data Descriptives:
## Number of Observations: 782
## Number of Groups: 273
##
## Model:
## family: binomial
## link: logit
##
## Fit statistics:
## log.Lik AIC BIC
## -152.8909 317.7817 339.4386
##
## Random effects covariance matrix:
## StdDev
## (Intercept) 2.662432
##
## Fixed effects:
## Estimate Std.Err z-value p-value
## (Intercept) -6.8456 1.1398 -6.0061 < 1e-04
## G3AGE15 0.8950 0.3606 2.4820 0.013065
## G3MALE -0.4880 0.6396 -0.7629 0.445496
## G3EVENT 0.3929 0.0999 3.9332 < 1e-04
## G3AGE15:G3MALE 0.7094 0.5391 1.3158 0.188243
##
## Integration:
## method: adaptive Gauss-Hermite quadrature rule
## quadrature points: 11
##
## Optimization:
## method: hybrid EM and quasi-Newton
## converged: TRUE
marginal_coefs(demo_cat.2, std_errors = TRUE)
## Estimate Std.Err z-value p-value
## (Intercept) -4.0119 0.5260 -7.6277 < 1e-04
## G3AGE15 0.5865 0.2376 2.4683 0.013576
## G3MALE -0.3610 0.4159 -0.8679 0.385449
## G3EVENT 0.2585 0.0657 3.9358 < 1e-04
## G3AGE15:G3MALE 0.5061 0.3555 1.4236 0.154548
This code fits the same model with Mplus code run remotely from R.
setup.mplus.demo_cat.2 <- mplusObject(
VARIABLE =
"USEV = G3SUBS G3AGE15 G3MALE G3EVENT;
CLUSTER = NEWID;
WITHIN = G3AGE15 G3EVENT;
BETWEEN = G3MALE;
CATEGORICAL = G3SUBS;",
ANALYSIS =
"ESTIMATOR = ML;
TYPE = twolevel random;",
MODEL =
"%within%
S | G3SUBS on G3AGE15;
G3SUBS on G3EVENT;
%between%
G3SUBS;
S@0;
G3SUBS S on G3MALE;",
rdata = d1)
model.mplus.demo_cat.2 <- mplusModeler(
setup.mplus.demo_cat.2, modelout = "mplus.demo_cat.2.inp", run = TRUE)
##
## Running model: mplus.demo_cat.2.inp
## System command: cd "." && "mplus" "mplus.demo_cat.2.inp"
## Reading model: mplus.demo_cat.2.out
output.mplus.demo_cat.2 <- readModels("mplus.demo_cat.2.out")
## Reading model: mplus.demo_cat.2.out
output.mplus.demo_cat.2$parameters
## $unstandardized
## paramHeader param est se est_se pval BetweenWithin
## 1 G3SUBS.ON G3EVENT 0.395 0.098 4.050 0.000 Within
## 2 S.ON G3MALE 0.724 0.389 1.860 0.063 Between
## 3 G3SUBS.ON G3MALE -0.491 0.626 -0.784 0.433 Between
## 4 Intercepts S 0.901 0.328 2.742 0.006 Between
## 5 Thresholds G3SUBS$1 6.946 1.054 6.593 0.000 Between
## 6 Residual.Variances G3SUBS 7.437 3.001 2.478 0.013 Between
## 7 Residual.Variances S 0.000 0.000 999.000 999.000 Between
As described in the manuscript, G3WARM is left skewed. We elect to normalize the distribution of the variable by exponentiating all G3WARM scores. The mutate function in the dplyr package (part of the tidyverse) can be used to define new variables. Here we create a variable call G3WRMEXP. which exponentiates each observed G3WARM score. In the second line of code, we create a new variable called G3WRM2PRT, which will be used in the two-part model fit in GLMMadaptive. If the child had no contact with their father, a score of 0 will be recorded for G3WRM2PRT (i.e., if G3CON equals 0, record 0 for G3WRM2PRT), otherwise, we record the exponentiated warmth score (G3WRMEXP). Thus, for the new variable called G3WRM2PRT, all children without contact with their father have a score of 0, and all children with contact with their father have the G3WARM score observed (but exponentiated to deal with the skew) – that is, a positive score greater than 0. This is the expected format of the variable used as a two-part outcome with GLMMadaptive.
d2 <- d1 %>%
mutate(G3WRMEXP = exp(G3WARM),
G3WRM2PRT = if_else(G3CON == 0, 0, G3WRMEXP))
GLMMadaptive has several built in functions for modeling the continuous portion of the two-part growth model. This includes, for example, hurdle.poisson for situations in which the continuous part is a count or hurdle.lognormal for situations in which the continuous part is a log-normal distribution. In our example, the warmth scores are left skewed. To normalize our continuous distribution of warmth scores, we choose to exponentiate the warmth scores and then specify a normal distribution for the continuous part when fitting the two-part growth model. Therefore, we create a custom function called hurdle_normal() to model our semi-continuous data. This custom function is defined in the code chunk below and then can be utilized when fitting the GLMMadaptive model in Script Set #3. If a user encounters a model in which the continuous part can be transformed to a normal distribution, then executing the code chunk below will allow the user to also make use of our hurdle_normal custom function we provide here. Otherwise, the built in functions of GLMMadaptive can be used to define the most appropriate distribution given one’s data.
This R code describes the different parts of the log-likelihood function defining the hurdle normal distribution. The resulting hurdle_normal object will later be used by the mixed_model function to solve for the model parameter estimates using the data. This function should be used if the distribution of the continuous part of the two-part variable is normally distributed (initially, or after applying a transformation prior to fitting the model).
is.zero <- function(x) sapply(x, function(u) isTRUE(all.equal(u, 0)))
# The "hurdle normal" family
hurdle_normal <- function() {
link <- make.link("identity")
log.f <- function(y, eta, mu, phi, zeta) {
s <- exp(phi)
i <- !is.zero(y)
m <- as.matrix(eta)
z <- as.matrix(zeta)
f <- m
f[i, ] <- plogis(z[i, ], lower.tail=FALSE, log.p=TRUE) +
dnorm((y[i] - m[i, ])/s, log=TRUE) - log(s)
f[!i, ] <- plogis(z[!i, ], log.p=TRUE)
attr(f, "mu_y") <- m
f
}
S.eta <- function(y, mu, phi, zeta) {
s <- exp(phi)
i <- !is.zero(y)
m <- as.matrix(mu)
S <- m
S[!i, ] <- 0
S[i, ] <- (y[i] - m[i, ])/s^2
S
}
S.z <- function(y, mu, phi, zeta) {
i <- !is.zero(y)
p <- plogis(as.matrix(zeta))
S <- 1 - p
S[i, ] <- -p[i, ]
S
}
S.phis <- function(y, mu, phi, zeta) {
s <- exp(phi)
i <- !is.zero(y)
m <- as.matrix(mu)
S <- m
S[!i, ] <- 0
S[i, ] <- (y[i] - m[i, ])^2/s^2 - 1
S
}
structure(
list(family="User hurdle normal family",
link=link$name,
linkfun=link$linkfun,
linkinv=link$linkinv,
log_dens=log.f,
score_eta_fun=S.eta,
score_eta_zi_fun=S.z,
score_phis_fun=S.phis
),
class="family")
}
The first two lines of the mixed_model function define the fixed and random effects for the continuous growth model, while the third and fourth lines (i.e., starting with zi_) define the fixed and random effects for the binary growth model.
Adaptive quadrature is a numerical integration method, and is used to fit computationally demanding models. The argument nAGQ below defines 15 points of quadrature to be used. A user can refit the model, varying points of quadrature depending on the complexity of the fitted model.
n_phis defines the number of extra parameters that the distribution of the outcome needs, in the case of a two-part growth model, there is one extra parameter.
out_mod.1 <- mixed_model(fixed = G3WRM2PRT ~ 1 + G3AGE15,
random = ~ 1 + G3AGE15 | NEWID,
zi_fixed = ~ 1 + G3AGE15,
zi_random = ~ 1 | NEWID,
data = d2, family = hurdle_normal(), n_phis = 1, nAGQ = 15)
summary(out_mod.1)
##
## Call:
## mixed_model(fixed = G3WRM2PRT ~ 1 + G3AGE15, random = ~1 + G3AGE15 |
## NEWID, data = d2, family = hurdle_normal(), zi_fixed = ~1 +
## G3AGE15, zi_random = ~1 | NEWID, n_phis = 1, nAGQ = 15)
##
## Data Descriptives:
## Number of Observations: 782
## Number of Groups: 273
##
## Model:
## family: User hurdle normal family
## link: identity
##
## Fit statistics:
## log.Lik AIC BIC
## -2651.767 5325.534 5365.238
##
## Random effects covariance matrix:
## StdDev Corr
## (Intercept) 11.5665 (Intr) G3AGE1
## G3AGE15 0.9294 -0.2758
## zi_(Intercept) 4.0524 -0.6947 0.2245
##
## Fixed effects:
## Estimate Std.Err z-value p-value
## (Intercept) 27.9745 0.851 32.8716 < 1e-04
## G3AGE15 -0.9827 0.492 -1.9975 0.045773
##
## Zero-part coefficients:
## Estimate Std.Err z-value p-value
## (Intercept) -3.9207 0.5516 -7.1075 < 1e-04
## G3AGE15 0.2238 0.1819 1.2308 0.21839
##
## phi parameters:
## Estimate Std.Err
## phi_1 1.9291 0.0403
##
## Integration:
## method: adaptive Gauss-Hermite quadrature rule
## quadrature points: 15
##
## Optimization:
## method: hybrid EM and quasi-Newton
## converged: TRUE
# output residual standard deviation (to match with Mplus model)
exp(out_mod.1$phis)
## phi_1
## 6.883637
The code below executes Mplus through R to fit the same two-part growth model. Notice that with Mplus, the two separate variables (G3CON and G3WRMEXP) are used, negating the need to create one variable (i.e., G3WRM2PRT that was needed for GLMMdaptive).
setup.mplus.out_mod.1 <- mplusObject(
VARIABLE =
"USEV = G3CON G3AGE15 G3WRMEXP;
CLUSTER = NEWID;
WITHIN = G3AGE15;
CATEGORICAL = G3CON;",
ANALYSIS =
"ESTIMATOR = ML;
TYPE = twolevel random;",
DEFINE =
"G3WRMEXP = exp(G3WARM);",
MODEL =
"%within%
SBIN | G3CON on G3AGE15;
SCONT | G3WRMEXP on G3AGE15;
%between%
G3CON;
G3WRMEXP;
SCONT;
SBIN@0;
G3CON with G3WRMEXP SCONT;
G3WRMEXP with SCONT;",
rdata = d2)
model.mplus.out_mod.1 <- mplusModeler(
setup.mplus.out_mod.1, modelout = "mplus.out_mod.1.inp", run = TRUE)
##
## Running model: mplus.out_mod.1.inp
## System command: cd "." && "mplus" "mplus.out_mod.1.inp"
## Reading model: mplus.out_mod.1.out
output.mplus.out_mod.1 <- readModels("mplus.out_mod.1.out")
## Reading model: mplus.out_mod.1.out
output.mplus.out_mod.1$parameters
## $unstandardized
## paramHeader param est se est_se pval BetweenWithin
## 1 Residual.Variances G3WRMEXP 48.044 4.716 10.187 0.000 Within
## 2 G3CON.WITH SCONT -1.215 4.560 -0.266 0.790 Between
## 3 G3WRMEXP.WITH SCONT -3.478 5.825 -0.597 0.551 Between
## 4 G3CON.WITH G3WRMEXP 32.545 7.231 4.501 0.000 Between
## 5 Means G3WRMEXP 27.964 0.847 33.019 0.000 Between
## 6 Means SBIN -0.090 0.163 -0.549 0.583 Between
## 7 Means SCONT -0.884 0.524 -1.687 0.092 Between
## 8 Thresholds G3CON$1 -3.878 0.521 -7.444 0.000 Between
## 9 Variances G3CON 15.908 4.517 3.522 0.000 Between
## 10 Variances G3WRMEXP 134.464 16.616 8.092 0.000 Between
## 11 Variances SBIN 0.000 0.000 999.000 999.000 Between
## 12 Variances SCONT 0.190 3.527 0.054 0.957 Between
In the code below, we add predictors to our two-part growth model. Here, we add the same predictors to both parts, but that is not necessary. One can build a model with different predictors if desired.
out_mod.2 <- mixed_model(fixed = G3WRM2PRT ~ 1 + G3AGE15 + INTACT +
G3AGE15*INTACT + G3EVENT,
random = ~ 1 + G3AGE15 | NEWID,
zi_fixed = ~ 1 + G3AGE15 + INTACT +
G3AGE15*INTACT + G3EVENT,
zi_random = ~ 1 | NEWID,
data = d2, family = hurdle_normal(), n_phis = 1, nAGQ = 15)
summary(out_mod.2)
##
## Call:
## mixed_model(fixed = G3WRM2PRT ~ 1 + G3AGE15 + INTACT + G3AGE15 *
## INTACT + G3EVENT, random = ~1 + G3AGE15 | NEWID, data = d2,
## family = hurdle_normal(), zi_fixed = ~1 + G3AGE15 + INTACT +
## G3AGE15 * INTACT + G3EVENT, zi_random = ~1 | NEWID, n_phis = 1,
## nAGQ = 15)
##
## Data Descriptives:
## Number of Observations: 782
## Number of Groups: 273
##
## Model:
## family: User hurdle normal family
## link: identity
##
## Fit statistics:
## log.Lik AIC BIC
## -2632.189 5298.377 5359.738
##
## Random effects covariance matrix:
## StdDev Corr
## (Intercept) 10.9211 (Intr) G3AGE1
## G3AGE15 0.8665 -0.3205
## zi_(Intercept) 3.8081 -0.6792 0.3109
##
## Fixed effects:
## Estimate Std.Err z-value p-value
## (Intercept) 29.1141 1.2806 22.7339 < 1e-04
## G3AGE15 -0.8820 0.5990 -1.4724 0.14091215
## INTACT 4.7396 1.6248 2.9171 0.00353307
## G3EVENT -0.5438 0.1628 -3.3399 0.00083812
## G3AGE15:INTACT -0.1726 0.7877 -0.2191 0.82656600
##
## Zero-part coefficients:
## Estimate Std.Err z-value p-value
## (Intercept) -2.5789 0.5866 -4.3963 < 1e-04
## G3AGE15 0.2412 0.1924 1.2538 0.20992
## INTACT -3.8844 0.9864 -3.9381 < 1e-04
## G3EVENT -0.0794 0.0753 -1.0540 0.29188
## G3AGE15:INTACT -0.0282 0.6325 -0.0445 0.96449
##
## phi parameters:
## Estimate Std.Err
## phi_1 1.9315 0.0407
##
## Integration:
## method: adaptive Gauss-Hermite quadrature rule
## quadrature points: 15
##
## Optimization:
## method: hybrid EM and quasi-Newton
## converged: TRUE
# output residual standard deviation
exp(out_mod.2$phis)
## phi_1
## 6.89976
The code below fits the same model in Mplus through R using the MplusAutomation package.
setup.mplus.out_mod.2 <- mplusObject(
VARIABLE =
"USEV = G3CON G3AGE15 INTACT G3EVENT G3WRMEXP;
CLUSTER = NEWID;
WITHIN = G3AGE15 G3EVENT;
BETWEEN = INTACT;
CATEGORICAL = G3CON;",
DEFINE =
"G3WRMEXP = exp(G3WARM);",
ANALYSIS =
"ESTIMATOR = ML;
TYPE = twolevel random;",
MODEL =
"%within%
SBIN | G3CON on G3AGE15;
SCONT | G3WRMEXP on G3AGE15;
G3CON G3WRMEXP on G3EVENT;
%between%
G3CON;
G3WRMEXP;
SCONT;
SBIN@0;
G3CON with G3WRMEXP SCONT;
G3WRMEXP with SCONT;
G3CON SBIN G3WRMEXP SCONT on INTACT;",
rdata = d2)
model.mplus.out_mod.2 <- mplusModeler(
setup.mplus.out_mod.2, modelout = "mplus.out_mod.2.inp", run = TRUE)
##
## Running model: mplus.out_mod.2.inp
## System command: cd "." && "mplus" "mplus.out_mod.2.inp"
## Reading model: mplus.out_mod.2.out
output.mplus.out_mod.2 <- readModels("mplus.out_mod.2.out")
## Reading model: mplus.out_mod.2.out
output.mplus.out_mod.2$parameters
## $unstandardized
## paramHeader param est se est_se pval BetweenWithin
## 1 G3CON.ON G3EVENT 0.077 0.074 1.038 0.299 Within
## 2 G3WRMEXP.ON G3EVENT -0.542 0.163 -3.325 0.001 Within
## 3 Residual.Variances G3WRMEXP 48.177 4.769 10.102 0.000 Within
## 4 SBIN.ON INTACT 0.013 0.495 0.027 0.978 Between
## 5 SCONT.ON INTACT -0.213 0.802 -0.266 0.791 Between
## 6 G3CON.ON INTACT 3.941 0.970 4.062 0.000 Between
## 7 G3WRMEXP.ON INTACT 4.857 1.628 2.983 0.003 Between
## 8 G3CON.WITH SCONT -1.733 4.318 -0.401 0.688 Between
## 9 G3WRMEXP.WITH SCONT -3.567 5.266 -0.677 0.498 Between
## 10 G3CON.WITH G3WRMEXP 28.185 6.710 4.200 0.000 Between
## 11 Intercepts G3WRMEXP 29.038 1.282 22.652 0.000 Between
## 12 Intercepts SBIN -0.131 0.176 -0.745 0.456 Between
## 13 Intercepts SCONT -0.765 0.641 -1.195 0.232 Between
## 14 Thresholds G3CON$1 -2.532 0.570 -4.438 0.000 Between
## 15 Residual.Variances G3CON 14.046 4.029 3.486 0.000 Between
## 16 Residual.Variances G3WRMEXP 119.699 15.259 7.845 0.000 Between
## 17 Residual.Variances SBIN 0.000 0.000 999.000 999.000 Between
## 18 Residual.Variances SCONT 0.216 3.542 0.061 0.951 Between
The code below fits a traditional growth model for G3DEP, child depressive symptoms, but we consider a two-part predictor. First, we create a new variable, G3WARMC, which centers G3WARM at a score of 2 if the child had contact with their father, otherwise, a 0 is recorded for G3WARMC. Second, we create a new variable called G3WARMX which is the product of our newly created, centered warmth variable (G3WARMC multiplied with G3CON – a binary indicator of whether or not the child had contact with their father).
When we fit the lme model (i.e, multilevel model for a continuous variable) we regress G3DEP on the intercept and time variable (G3AGE15) to specify growth, and also the time-varying predictors of G3CON (binary indicator of contact versus no contact) and G3WARMX created earlier.
d3 <- d1 %>%
mutate(G3WARMC = if_else(G3CON == 1, G3WARM - 2, 0),
G3WARMX = G3WARMC*G3CON)
pred_mod.1 <- lme(G3DEP ~ 1 + G3AGE15 + G3CON + G3WARMX,
random = ~ 1 + G3AGE15 | NEWID,
data = d3, method = "ML")
summary(pred_mod.1)
## Linear mixed-effects model fit by maximum likelihood
## Data: d3
## AIC BIC logLik
## 1226.406 1263.701 -605.203
##
## Random effects:
## Formula: ~1 + G3AGE15 | NEWID
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 0.4999530 (Intr)
## G3AGE15 0.1377766 0.133
## Residual 0.3656518
##
## Fixed effects: G3DEP ~ 1 + G3AGE15 + G3CON + G3WARMX
## Value Std.Error DF t-value p-value
## (Intercept) 1.0618985 0.05823829 506 18.233683 0.0000
## G3AGE15 0.0301774 0.01843092 506 1.637322 0.1022
## G3CON 0.4208940 0.08063701 506 5.219613 0.0000
## G3WARMX -0.2821255 0.05010854 506 -5.630287 0.0000
## Correlation:
## (Intr) G3AGE1 G3CON
## G3AGE15 -0.005
## G3CON -0.491 -0.025
## G3WARMX -0.133 0.080 -0.695
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.96453959 -0.50600825 -0.03187427 0.50402709 3.18908409
##
## Number of Observations: 782
## Number of Groups: 273
VarCorr(pred_mod.1)
## NEWID = pdLogChol(1 + G3AGE15)
## Variance StdDev Corr
## (Intercept) 0.2499530 0.4999530 (Intr)
## G3AGE15 0.0189824 0.1377766 0.133
## Residual 0.1337012 0.3656518
Because the effect of contact (G3CON) depends on the centering point of G3WARMX (see manuscript for further description), it may be of interest to consider the effect of contact at different centering points. Here we fit the same model as above, but instead centering warmth at a score of 4.
d3 <- d1 %>%
mutate(G3WARMC = ifelse(G3CON == 1, G3WARM - 4, 0),
G3WARMX = G3WARMC*G3CON)
pred_mod.2 <- lme(G3DEP ~ 1 + G3AGE15 + G3CON + G3WARMX,
random = ~ 1 + G3AGE15 | NEWID,
data = d3, method = "ML")
summary(pred_mod.2)
## Linear mixed-effects model fit by maximum likelihood
## Data: d3
## AIC BIC logLik
## 1226.406 1263.701 -605.203
##
## Random effects:
## Formula: ~1 + G3AGE15 | NEWID
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 0.4999530 (Intr)
## G3AGE15 0.1377766 0.133
## Residual 0.3656518
##
## Fixed effects: G3DEP ~ 1 + G3AGE15 + G3CON + G3WARMX
## Value Std.Error DF t-value p-value
## (Intercept) 1.0618985 0.05823829 506 18.233683 0.0000
## G3AGE15 0.0301774 0.01843092 506 1.637322 0.1022
## G3CON -0.1433569 0.07291013 506 -1.966214 0.0498
## G3WARMX -0.2821255 0.05010854 506 -5.630287 0.0000
## Correlation:
## (Intr) G3AGE1 G3CON
## G3AGE15 -0.005
## G3CON -0.726 0.082
## G3WARMX -0.133 0.080 0.606
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.96453959 -0.50600825 -0.03187427 0.50402709 3.18908409
##
## Number of Observations: 782
## Number of Groups: 273
VarCorr(pred_mod.2)
## NEWID = pdLogChol(1 + G3AGE15)
## Variance StdDev Corr
## (Intercept) 0.2499530 0.4999530 (Intr)
## G3AGE15 0.0189824 0.1377766 0.133
## Residual 0.1337012 0.3656518
These last two code sections fit the same models as above (using lme to consider two-part predictors), except here we use Mplus to fit the models through R
setup.mplus.pred_mod.1 <- mplusObject(
VARIABLE =
"USEV = G3DEP G3AGE15 G3CON G3WARMX;
CLUSTER = NEWID;
WITHIN = G3AGE15 G3CON G3WARMX;",
DEFINE =
"G3WARMC = G3WARM - 2;
IF G3CON == 0 THEN G3WARMC = 0;
G3WARMX = G3WARMC*G3CON;",
ANALYSIS =
"ESTIMATOR = ML;
TYPE = twolevel random;",
MODEL =
"%within%
S | G3DEP on G3AGE15;
G3DEP on G3CON G3WARMX;
%between%
G3DEP;
S;
G3DEP with S;",
rdata = d1)
model.mplus.pred_mod.1 <- mplusModeler(
setup.mplus.pred_mod.1, modelout = "mplus.pred_mod.1.inp", run = TRUE)
##
## Running model: mplus.pred_mod.1.inp
## System command: cd "." && "mplus" "mplus.pred_mod.1.inp"
## Reading model: mplus.pred_mod.1.out
output.mplus.pred_mod.1 <- readModels("mplus.pred_mod.1.out")
## Reading model: mplus.pred_mod.1.out
output.mplus.pred_mod.1$parameters
## $unstandardized
## paramHeader param est se est_se pval BetweenWithin
## 1 G3DEP.ON G3CON 0.421 0.081 5.172 0.000 Within
## 2 G3DEP.ON G3WARMX -0.282 0.052 -5.433 0.000 Within
## 3 Residual.Variances G3DEP 0.134 0.012 11.024 0.000 Within
## 4 G3DEP.WITH S 0.009 0.010 0.913 0.361 Between
## 5 Means G3DEP 1.062 0.058 18.191 0.000 Between
## 6 Means S 0.030 0.018 1.641 0.101 Between
## 7 Variances G3DEP 0.250 0.027 9.383 0.000 Between
## 8 Variances S 0.019 0.010 1.920 0.055 Between
setup.mplus.pred_mod.2 <- mplusObject(
VARIABLE =
"USEV = G3DEP G3AGE15 G3CON G3WARMX;
CLUSTER = NEWID;
WITHIN = G3AGE15 G3CON G3WARMX;",
DEFINE =
"G3WARMC = G3WARM - 4;
IF G3CON == 0 THEN G3WARMC = 0;
G3WARMX = G3WARMC*G3CON;",
ANALYSIS =
"ESTIMATOR = ML;
TYPE = twolevel random;",
MODEL =
"%within%
S | G3DEP on G3AGE15;
G3DEP on G3CON G3WARMX;
%between%
G3DEP;
S;
G3DEP with S;",
rdata = d1)
model.mplus.pred_mod.2 <- mplusModeler(
setup.mplus.pred_mod.2, modelout = "mplus.pred_mod.2.inp", run = TRUE)
##
## Running model: mplus.pred_mod.2.inp
## System command: cd "." && "mplus" "mplus.pred_mod.2.inp"
## Reading model: mplus.pred_mod.2.out
output.mplus.pred_mod.2 <- readModels("mplus.pred_mod.2.out")
## Reading model: mplus.pred_mod.2.out
output.mplus.pred_mod.2$parameters
## $unstandardized
## paramHeader param est se est_se pval BetweenWithin
## 1 G3DEP.ON G3CON -0.143 0.075 -1.923 0.055 Within
## 2 G3DEP.ON G3WARMX -0.282 0.052 -5.433 0.000 Within
## 3 Residual.Variances G3DEP 0.134 0.012 11.024 0.000 Within
## 4 G3DEP.WITH S 0.009 0.010 0.913 0.361 Between
## 5 Means G3DEP 1.062 0.058 18.191 0.000 Between
## 6 Means S 0.030 0.018 1.641 0.101 Between
## 7 Variances G3DEP 0.250 0.027 9.383 0.000 Between
## 8 Variances S 0.019 0.010 1.920 0.055 Between