Load packages
Import data
Format data
Growth model primer
- Script Set #1: Growth model for a continuous outcome (G3 depressive symptoms)
  - Growth model without predictors
    - Example in R
    - Example in Mplus
  - Growth model with predictors
    - Example in R
    - Example in Mplus
- Script Set #2: Growth model for a binary outcome (G3 substance use)
  - Growth model without predictors
    - Example in R
    - Example in Mplus
  - Growth model with predictors
    - Example in R
    - Example in Mplus
Two-part outcome
Two-part predictor
- Script Set #4: Fit two-part predictor model
  - Example in R
    - Center G3WARM at a score of 2
    - Center G3WARM at a score of 4
  - Example in Mplus
    - Center G3WARM at a score of 2
    - Center G3WARM at a score of 4

Load packages

These packages are needed to perform the analyses in this notebook, a brief description of what each does is listed below.

library(nlme) # for fitting traditional growth models
library(GLMMadaptive) # for fitting two part outcomes
library(MplusAutomation) # for using Mplus through R (Mplus must be installed on machine)
library(psych) # for tables of descriptive statistics
library(knitr) # for creating the pdf output of the R notebook and formatting tables
library(haven) # for reading in dataset from other programs
library(tidyverse) # for data wrangling

Import data

The haven package is useful for reading in a variety of datasets – here, our data is stored as a SAS file. We also change the type of our ID variable (NEWID) from numeric (the type as stored in SAS) to factor (an unordered character variable that uniquely identifies each child in the study). This can be accomplished using the mutate function in dplyr (part of the tidyverse), a function that allows new variables to be created or existing variables to be changed.

This datafile is in a long format, that is, each child (identified as NEWID) has multiple rows of data (identified as G3AGE) – one for each year of observation (3 in our example).

d <- read_sas("rdata.sas7bdat") %>% 
  mutate(NEWID = as.factor(NEWID))

Format data

Linear growth models are defined by an intercept (predicted outcome when time = 0) and a slope (rate of change in the outcome over time). For all of the growth models in this notebook, the intercept will be defined at age 15. Thus, it is necessary to center the time variable (G3AGE in our example) at age 15 by subtracting 15 from each score. We can use mutate to accomplish this task.

d1 <- d %>% 
  mutate(G3AGE15 = G3AGE - 15)  # center G3's age at 15 years

Growth model primer

Script Set #1: Growth model for a continuous outcome (G3 depressive symptoms)

Growth model without predictors

Example in R

The lme function is from the nlme package (https://cran.r-project.org/web/packages/nlme/nlme.pdf), which is a package for fitting multilevel models in R.

G3DEP is the outcome, it is regressed on a vector of 1’s (to define the intercept) and the time variable (G3AGE, which is centered at age 15 in our example). The intercept and slope are allowed to be random (vary across children). The random effects (intercept and slope) covary by default in the lme function. A maximum likelihood (ML) estimator is used.

The first line of the lme function defines the fixed effects part of the model, G3DEP ~ 1 + G3AGE15, which means that the dependent variable is regressed on an intercept and on the centered age variable. The second line defines the random effects part of the model, which means that there is a random intercept and a random slope for centered age.

demo_cont.1 <- lme(fixed = G3DEP ~ 1 + G3AGE15, 
                   random = ~ 1 + G3AGE15 | NEWID, 
                   data = d1, method = "ML")

summary(demo_cont.1) # request output

## Linear mixed-effects model fit by maximum likelihood
##  Data: d1 
##        AIC      BIC   logLik
##   1255.232 1283.203 -621.616
## 
## Random effects:
##  Formula: ~1 + G3AGE15 | NEWID
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept) 0.5388819 (Intr)
## G3AGE15     0.1419065 0.179 
## Residual    0.3640428       
## 
## Fixed effects: G3DEP ~ 1 + G3AGE15 
##                 Value  Std.Error  DF   t-value p-value
## (Intercept) 1.1040030 0.03527507 508 31.296978   0.000
## G3AGE15     0.0364751 0.01840105 508  1.982228   0.048
##  Correlation: 
##         (Intr)
## G3AGE15 0.092 
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -3.0298849 -0.4812134 -0.0147335  0.4864610  3.1120097 
## 
## Number of Observations: 782
## Number of Groups: 273

VarCorr(demo_cont.1) # request variance/covariance matrix for random effects

## NEWID = pdLogChol(1 + G3AGE15) 
##             Variance   StdDev    Corr  
## (Intercept) 0.29039366 0.5388819 (Intr)
## G3AGE15     0.02013747 0.1419065 0.179 
## Residual    0.13252718 0.3640428

Example in Mplus

All of the Mplus models can be fit directly in Mplus or in R using the MplusAutomation package (https://cran.r-project.org/web/packages/MplusAutomation/MplusAutomation.pdf). MplusAutomation finds Mplus on the local computer and executes the provided commands. The benefits of MplusAutomation are that the likelihood of all findings being reproducible is increased (since the entire R notebook can be executed and rendered at once) and more flexibility in collating and presenting the results is provided.

If one were not using MplusAutomation, the code to run the same program directly in Mplus would be:

DATA: FILE = “df.dat”;

VARIABLE: NAMES = G3AGE NEWID G3DEP G3AGE15;
MISSING = .;
USEV = G3DEP G3AGE15;
CLUSTER = NEWID;
WITHIN = G3AGE15;

ANALYSIS:
ESTIMATOR = ML;
TYPE = twolevel random;

MODEL:
%within%
S | G3DEP on G3AGE15;

%between%
G3DEP;
S;
G3DEP with S;

Below, we explain each part of the Mplus code in detail, to orient the reader to the key components of the syntax for fitting growth models as multilevel models in Mplus.

First, in the DATA section, one provides the name of the data set to be read into Mplus.

In the VARIABLE section, first the NAMES of the variables in the data set are provided (ensuring the exact order). The MISSING argument denotes the indicator for missing values. The USEV argument, short for usevariables, lists the variables used in the corresponding model. The CLUSTER argument defines the clustering variable (the multiple measurement occasions are nested in G3 child – NEWID). CLUSTER is a necessary argument for multilevel models. Our multilevel model has two levels – level 2 represents the children, and level 1 represents the repeated measures. The repeated measures are nested in child. In a multilevel model, some variables vary within cluster (i.e., at level 1), and others vary between cluster (i.e., at level 2). For now, we include only a within cluster variable – the age of the child. Therefore, the WITHIN argument indicates which variables are within the clustering unit (G3AGE15 – each child has three years of data indexed by the age of the child). Later, we will also define variables that are between clustering unit and will add a “BETWEEN =” argument (e.g., child sex is a between level variable).

The ANALYSIS section defines the type of analysis used. A maximum likelihood (ML) estimator is requested (ESTIMATOR = ML), and TYPE indicates that our data are structured in a multilevel structure (twolevel – as defined by the cluster variable listed in the VARIABLE section) and that random effects (in our case a random intercept and slope) are included.

The MODEL section defines the desired model. For a two-level model, there must be a level 1 (i.e., within cluster) model, see %within% below, and a level 2 (i.e., between cluster) model, see %between% below. The within cluster model defines the variables and effects that vary within cluster, and the between cluster model defines variables and effects that vary between clusters. In the within cluster model, a random intercept is implicit, but random slopes are denoted by the bar notation, for example, S | G3DEP on G3AGE15 indicates a random slope (defined as the rate of change, and referred to as S). The between cluster model indicates that the intercept (G3DEP) and slope (S) are both allowed to vary across children. Further, G3DEP with S indicates that the random intercept and slope covary.

Further details and examples for fitting two level models in Mplus can be found here: https://www.statmodel.com/usersguide/chapter9.shtml.

The R code below shows how to run the same example by calling Mplus remotely from within R using the MplusAutomation package.

# the following lines define the information to be sent from R to Mplus:

setup.mplus.demo_cont.1 <- mplusObject(
   VARIABLE = 
      "USEV = G3DEP G3AGE15;
       CLUSTER = NEWID;
       WITHIN = G3AGE15;",
  
   ANALYSIS = 
      "ESTIMATOR = ML;
       TYPE = twolevel random;",
           
   MODEL = 
      "%within%
       S | G3DEP on G3AGE15; 
                                 
       %between%
       G3DEP;
       S;
       G3DEP with S;",
                         
rdata = d1)

# the following command lines use MplusAutomation 
# to fit the Mplus model and read the results back into R
model.mplus.demo_cont.1 <- mplusModeler(
  setup.mplus.demo_cont.1, modelout = "mplus.demo_cont.1.inp", run = TRUE)

## 
## Running model: mplus.demo_cont.1.inp 
## System command: cd "." && "mplus" "mplus.demo_cont.1.inp" 
## Reading model:  mplus.demo_cont.1.out

output.mplus.demo_cont.1 <- readModels("mplus.demo_cont.1.out")

## Reading model:  mplus.demo_cont.1.out

output.mplus.demo_cont.1$parameters

## $unstandardized
##          paramHeader param   est    se est_se  pval BetweenWithin
## 1 Residual.Variances G3DEP 0.133 0.012 11.068 0.000        Within
## 2         G3DEP.WITH     S 0.014 0.011  1.283 0.200       Between
## 3              Means G3DEP 1.104 0.035 31.334 0.000       Between
## 4              Means     S 0.036 0.018  1.984 0.047       Between
## 5          Variances G3DEP 0.290 0.029  9.937 0.000       Between
## 6          Variances     S 0.020 0.010  2.020 0.043       Between

Growth model with predictors

Example in R

The following R code runs the example growth model with predictors in R, again using the lme function from the nlme package. Notice that covariates are separated by + and the interactions between them are represented as products with *.

demo_cont.2 <- lme(G3DEP ~ 1 + G3AGE15 + G3MALE + G3MALE*G3AGE15 + G3EVENT, 
                   random = ~ 1 + G3AGE15 | NEWID, data = d1, method = "ML")
summary(demo_cont.2)

## Linear mixed-effects model fit by maximum likelihood
##  Data: d1 
##        AIC      BIC    logLik
##   1194.885 1236.842 -588.4425
## 
## Random effects:
##  Formula: ~1 + G3AGE15 | NEWID
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept) 0.4607899 (Intr)
## G3AGE15     0.1173843 0.145 
## Residual    0.3724890       
## 
## Fixed effects: G3DEP ~ 1 + G3AGE15 + G3MALE + G3MALE * G3AGE15 + G3EVENT 
##                     Value  Std.Error  DF   t-value p-value
## (Intercept)     1.0714291 0.05527784 506 19.382614  0.0000
## G3AGE15         0.0537044 0.02525642 506  2.126367  0.0340
## G3MALE         -0.3668816 0.06223915 271 -5.894707  0.0000
## G3EVENT         0.0476098 0.00752327 506  6.328342  0.0000
## G3AGE15:G3MALE -0.0393223 0.03618959 506 -1.086563  0.2777
##  Correlation: 
##                (Intr) G3AGE15 G3MALE G3EVEN
## G3AGE15         0.077                      
## G3MALE         -0.557 -0.046               
## G3EVENT        -0.613 -0.042   0.005       
## G3AGE15:G3MALE -0.056 -0.698   0.067  0.032
## 
## Standardized Within-Group Residuals:
##          Min           Q1          Med           Q3          Max 
## -3.014334757 -0.525721822  0.009252965  0.509876944  3.417510051 
## 
## Number of Observations: 782
## Number of Groups: 273

VarCorr(demo_cont.2)

## NEWID = pdLogChol(1 + G3AGE15) 
##             Variance   StdDev    Corr  
## (Intercept) 0.21232732 0.4607899 (Intr)
## G3AGE15     0.01377907 0.1173843 0.145 
## Residual    0.13874805 0.3724890

Example in Mplus

The R code below shows how to run the same example by calling Mplus remotely from within R using the MplusAutomation package. Notice that G3MALE is a between cluster variable. In the VARIABLE section, we indicate that G3MALE is a between cluster variable. Notice also that G3EVENT is a time varying covariate, it varies within cluster, therefore we note that it is a within cluster variable in the VARIABLE section.

setup.mplus.demo_cont.2 <- mplusObject(
   VARIABLE = 
      "USEV = G3DEP G3AGE15 G3MALE G3EVENT;
       CLUSTER = NEWID;
       WITHIN = G3AGE15 G3EVENT;
       BETWEEN = G3MALE;",
  
   ANALYSIS = 
      "ESTIMATOR = ML;
       TYPE = twolevel random;",
           
   MODEL = 
      "%within%
       S | G3DEP on G3AGE15;
       G3DEP on G3EVENT;
                                 
       %between%
       G3DEP;
       S;
       G3DEP with S;
   
       G3DEP S on G3MALE;",
                         
rdata = d1)

model.mplus.demo_cont.2 <- mplusModeler(
  setup.mplus.demo_cont.2, modelout = "mplus.demo_cont.2.inp", run = TRUE)

## 
## Running model: mplus.demo_cont.2.inp 
## System command: cd "." && "mplus" "mplus.demo_cont.2.inp" 
## Reading model:  mplus.demo_cont.2.out

output.mplus.demo_cont.2 <- readModels("mplus.demo_cont.2.out")

## Reading model:  mplus.demo_cont.2.out

output.mplus.demo_cont.2$parameters

## $unstandardized
##          paramHeader   param    est    se est_se  pval BetweenWithin
## 1           G3DEP.ON G3EVENT  0.048 0.008  6.097 0.000        Within
## 2 Residual.Variances   G3DEP  0.139 0.013 10.932 0.000        Within
## 3               S.ON  G3MALE -0.039 0.036 -1.090 0.276       Between
## 4           G3DEP.ON  G3MALE -0.367 0.062 -5.914 0.000       Between
## 5         G3DEP.WITH       S  0.008 0.009  0.849 0.396       Between
## 6         Intercepts   G3DEP  1.071 0.056 19.147 0.000       Between
## 7         Intercepts       S  0.054 0.025  2.133 0.033       Between
## 8 Residual.Variances   G3DEP  0.212 0.024  9.014 0.000       Between
## 9 Residual.Variances       S  0.014 0.010  1.395 0.163       Between

Script Set #2: Growth model for a binary outcome (G3 substance use)

Growth model without predictors

Example in R

We utilize the GLMMadaptive package’s (https://cran.r-project.org/web/packages/GLMMadaptive/GLMMadaptive.pdf) mixed_model command to fit a binary growth model in R. The syntax is similar to the lme function used above for the continuous outcome; however, the arguments for the syntax are slightly different. In this example, the outcome is G3SUBS, a binary indicator of the child’s self-reported substance use at each interview: 1 = use alcohol or cannabis at least once per month, 0 = do not use alcohol or cannabis or use less frequently than once per month.

In this growth model for a binary outcome, no random slope is included (see description in manuscript). We also specify family = binomial() to indicate that we desire a logit link to fit the model (i.e., a logistic regression). A multilevel model for a binary outcome produces unit specific effects, as described in the manuscript, we may also desire population-averaged effects. These can be obtained with the marginal_coefs command in GLMMadaptive.

demo_cat.1 <- mixed_model(fixed = G3SUBS ~ 1 + G3AGE15, 
                          random = ~ 1 | NEWID, 
                          data = d1, 
                          family = binomial()) 

summary(demo_cat.1)

## 
## Call:
## mixed_model(fixed = G3SUBS ~ 1 + G3AGE15, random = ~1 | NEWID, 
##     data = d1, family = binomial())
## 
## Data Descriptives:
## Number of Observations: 782
## Number of Groups: 273 
## 
## Model:
##  family: binomial
##  link: logit 
## 
## Fit statistics:
##    log.Lik      AIC     BIC
##  -163.1173 332.2346 343.063
## 
## Random effects covariance matrix:
##               StdDev
## (Intercept) 2.867195
## 
## Fixed effects:
##             Estimate Std.Err z-value p-value
## (Intercept)  -5.1523  0.7581 -6.7967 < 1e-04
## G3AGE15       1.1764  0.2903  4.0518 < 1e-04
## 
## Integration:
## method: adaptive Gauss-Hermite quadrature rule
## quadrature points: 11
## 
## Optimization:
## method: hybrid EM and quasi-Newton
## converged: TRUE

marginal_coefs(demo_cat.1, std_errors = TRUE) # requests marginal coefficients

##             Estimate Std.Err  z-value p-value
## (Intercept)  -2.7081  0.2198 -12.3217 < 1e-04
## G3AGE15       0.7185  0.1536   4.6786 < 1e-04

Example in Mplus

We can fit the same model in Mplus, below is the syntax to fit the Mplus model via R. Because the outcome, G3SUBS, is a categorical variable, we must define it as such in the VARIABLE section (i.e., CATEGORICAL = G3SUBS). S@0 excludes the random slope (that is it constrains the random slope variance to be zero), so that we estimate only a random intercept.

Also note that for a categorical outcome, Mplus reports the threshold rather than the intercept. To obtain the intercept, the threshold is multiplied by -1.

setup.mplus.demo_cat.1 <- mplusObject(
   VARIABLE = 
      "USEV = G3SUBS G3AGE15;
       CLUSTER = NEWID;
       WITHIN = G3AGE15;
       CATEGORICAL = G3SUBS;",
  
   ANALYSIS = 
      "ESTIMATOR = ML;
       TYPE = twolevel random;",
           
   MODEL = 
      "%within%
       S | G3SUBS on G3AGE15;
                                 
       %between%
       G3SUBS;
       S@0;",
                         
rdata = d1)

model.mplus.demo_cat.1 <- mplusModeler(
  setup.mplus.demo_cat.1, modelout = "mplus.demo_cat.1.inp", run = TRUE)

## 
## Running model: mplus.demo_cat.1.inp 
## System command: cd "." && "mplus" "mplus.demo_cat.1.inp" 
## Reading model:  mplus.demo_cat.1.out

output.mplus.demo_cat.1 <- readModels("mplus.demo_cat.1.out")

## Reading model:  mplus.demo_cat.1.out

output.mplus.demo_cat.1$parameters

## $unstandardized
##   paramHeader    param   est    se  est_se    pval BetweenWithin
## 1       Means        S 1.229 0.294   4.175   0.000       Between
## 2  Thresholds G3SUBS$1 5.541 0.852   6.502   0.000       Between
## 3   Variances   G3SUBS 9.985 4.139   2.413   0.016       Between
## 4   Variances        S 0.000 0.000 999.000 999.000       Between

Growth model with predictors

Example in R

The code below fits the growth model with predictors in R, using the mixed_model function of the the GLMMadaptive package.

demo_cat.2 <- mixed_model(fixed = G3SUBS ~ 1 + G3AGE15 + G3MALE + 
                          G3MALE*G3AGE15 + G3EVENT, 
                          random = ~ 1 | NEWID, data = d1,
                          family = binomial())

summary(demo_cat.2)

## 
## Call:
## mixed_model(fixed = G3SUBS ~ 1 + G3AGE15 + G3MALE + G3MALE * 
##     G3AGE15 + G3EVENT, random = ~1 | NEWID, data = d1, family = binomial())
## 
## Data Descriptives:
## Number of Observations: 782
## Number of Groups: 273 
## 
## Model:
##  family: binomial
##  link: logit 
## 
## Fit statistics:
##    log.Lik      AIC      BIC
##  -152.8909 317.7817 339.4386
## 
## Random effects covariance matrix:
##               StdDev
## (Intercept) 2.662432
## 
## Fixed effects:
##                Estimate Std.Err z-value  p-value
## (Intercept)     -6.8456  1.1398 -6.0061  < 1e-04
## G3AGE15          0.8950  0.3606  2.4820 0.013065
## G3MALE          -0.4880  0.6396 -0.7629 0.445496
## G3EVENT          0.3929  0.0999  3.9332  < 1e-04
## G3AGE15:G3MALE   0.7094  0.5391  1.3158 0.188243
## 
## Integration:
## method: adaptive Gauss-Hermite quadrature rule
## quadrature points: 11
## 
## Optimization:
## method: hybrid EM and quasi-Newton
## converged: TRUE

marginal_coefs(demo_cat.2, std_errors = TRUE)

##                Estimate Std.Err z-value  p-value
## (Intercept)     -4.0119  0.5260 -7.6277  < 1e-04
## G3AGE15          0.5865  0.2376  2.4683 0.013576
## G3MALE          -0.3610  0.4159 -0.8679 0.385449
## G3EVENT          0.2585  0.0657  3.9358  < 1e-04
## G3AGE15:G3MALE   0.5061  0.3555  1.4236 0.154548

Example in Mplus

This code fits the same model with Mplus code run remotely from R.

setup.mplus.demo_cat.2 <- mplusObject(
   VARIABLE = 
      "USEV = G3SUBS G3AGE15 G3MALE G3EVENT;
       CLUSTER = NEWID;
       WITHIN = G3AGE15 G3EVENT;
       BETWEEN = G3MALE;
       CATEGORICAL = G3SUBS;",
  
   ANALYSIS = 
      "ESTIMATOR = ML;
       TYPE = twolevel random;",
           
   MODEL = 
      "%within%
       S | G3SUBS on G3AGE15;
       G3SUBS on G3EVENT;
                                 
       %between%
       G3SUBS;
       S@0;
       G3SUBS S on G3MALE;",
                         
rdata = d1)

model.mplus.demo_cat.2 <- mplusModeler(
  setup.mplus.demo_cat.2, modelout = "mplus.demo_cat.2.inp", run = TRUE)

## 
## Running model: mplus.demo_cat.2.inp 
## System command: cd "." && "mplus" "mplus.demo_cat.2.inp" 
## Reading model:  mplus.demo_cat.2.out

output.mplus.demo_cat.2 <- readModels("mplus.demo_cat.2.out")

## Reading model:  mplus.demo_cat.2.out

output.mplus.demo_cat.2$parameters

## $unstandardized
##          paramHeader    param    est    se  est_se    pval BetweenWithin
## 1          G3SUBS.ON  G3EVENT  0.395 0.098   4.050   0.000        Within
## 2               S.ON   G3MALE  0.724 0.389   1.860   0.063       Between
## 3          G3SUBS.ON   G3MALE -0.491 0.626  -0.784   0.433       Between
## 4         Intercepts        S  0.901 0.328   2.742   0.006       Between
## 5         Thresholds G3SUBS$1  6.946 1.054   6.593   0.000       Between
## 6 Residual.Variances   G3SUBS  7.437 3.001   2.478   0.013       Between
## 7 Residual.Variances        S  0.000 0.000 999.000 999.000       Between

Two-part outcome

Create new variables for a two-part outcome for model estimation in GLMMadaptive

As described in the manuscript, G3WARM is left skewed. We elect to normalize the distribution of the variable by exponentiating all G3WARM scores. The mutate function in the dplyr package (part of the tidyverse) can be used to define new variables. Here we create a variable call G3WRMEXP. which exponentiates each observed G3WARM score. In the second line of code, we create a new variable called G3WRM2PRT, which will be used in the two-part model fit in GLMMadaptive. If the child had no contact with their father, a score of 0 will be recorded for G3WRM2PRT (i.e., if G3CON equals 0, record 0 for G3WRM2PRT), otherwise, we record the exponentiated warmth score (G3WRMEXP). Thus, for the new variable called G3WRM2PRT, all children without contact with their father have a score of 0, and all children with contact with their father have the G3WARM score observed (but exponentiated to deal with the skew) – that is, a positive score greater than 0. This is the expected format of the variable used as a two-part outcome with GLMMadaptive.

d2 <- d1 %>% 
  mutate(G3WRMEXP = exp(G3WARM),
         G3WRM2PRT = if_else(G3CON == 0, 0, G3WRMEXP))

Create hurdle_normal function for GLMMadaptive

GLMMadaptive has several built in functions for modeling the continuous portion of the two-part growth model. This includes, for example, hurdle.poisson for situations in which the continuous part is a count or hurdle.lognormal for situations in which the continuous part is a log-normal distribution. In our example, the warmth scores are left skewed. To normalize our continuous distribution of warmth scores, we choose to exponentiate the warmth scores and then specify a normal distribution for the continuous part when fitting the two-part growth model. Therefore, we create a custom function called hurdle_normal() to model our semi-continuous data. This custom function is defined in the code chunk below and then can be utilized when fitting the GLMMadaptive model in Script Set #3. If a user encounters a model in which the continuous part can be transformed to a normal distribution, then executing the code chunk below will allow the user to also make use of our hurdle_normal custom function we provide here. Otherwise, the built in functions of GLMMadaptive can be used to define the most appropriate distribution given one’s data.

This R code describes the different parts of the log-likelihood function defining the hurdle normal distribution. The resulting hurdle_normal object will later be used by the mixed_model function to solve for the model parameter estimates using the data. This function should be used if the distribution of the continuous part of the two-part variable is normally distributed (initially, or after applying a transformation prior to fitting the model).

is.zero <- function(x) sapply(x, function(u) isTRUE(all.equal(u, 0)))

# The "hurdle normal" family
hurdle_normal <- function() {
    link <- make.link("identity")

    log.f <- function(y, eta, mu, phi, zeta) {
        s <- exp(phi)
        i <- !is.zero(y)

        m <- as.matrix(eta)
        z <- as.matrix(zeta)

        f <- m
        f[i, ] <- plogis(z[i, ], lower.tail=FALSE, log.p=TRUE) +
            dnorm((y[i] - m[i, ])/s, log=TRUE) - log(s)
        f[!i, ] <- plogis(z[!i, ], log.p=TRUE)

        attr(f, "mu_y") <- m
        f
    }

    S.eta <- function(y, mu, phi, zeta) {
        s <- exp(phi)
        i <- !is.zero(y)
        m <- as.matrix(mu)
        S <- m
        S[!i, ] <- 0
        S[i, ] <- (y[i] - m[i, ])/s^2
        S
    }

    S.z <- function(y, mu, phi, zeta) {
        i <- !is.zero(y)
        p <- plogis(as.matrix(zeta))
        S <- 1 - p
        S[i, ] <- -p[i, ]
        S
    }

    S.phis <- function(y, mu, phi, zeta) {
        s <- exp(phi)
        i <- !is.zero(y)
        m <- as.matrix(mu)
        S <- m
        S[!i, ] <- 0
        S[i, ] <- (y[i] - m[i, ])^2/s^2 - 1
        S
    }

    structure(
        list(family="User hurdle normal family",
             link=link$name,
             linkfun=link$linkfun,
             linkinv=link$linkinv,
             log_dens=log.f,
             score_eta_fun=S.eta,
             score_eta_zi_fun=S.z,
             score_phis_fun=S.phis
         ),
        class="family")
}

Script Set #3: Fit two-part outcome model

Growth model without predictors

Example in R

The first two lines of the mixed_model function define the fixed and random effects for the continuous growth model, while the third and fourth lines (i.e., starting with zi_) define the fixed and random effects for the binary growth model.

Adaptive quadrature is a numerical integration method, and is used to fit computationally demanding models. The argument nAGQ below defines 15 points of quadrature to be used. A user can refit the model, varying points of quadrature depending on the complexity of the fitted model.

n_phis defines the number of extra parameters that the distribution of the outcome needs, in the case of a two-part growth model, there is one extra parameter.

out_mod.1 <- mixed_model(fixed = G3WRM2PRT ~ 1 + G3AGE15, 
                    random = ~ 1 + G3AGE15 | NEWID, 
                    zi_fixed = ~ 1 + G3AGE15, 
                    zi_random = ~ 1 | NEWID, 
                    data = d2, family = hurdle_normal(), n_phis = 1, nAGQ = 15)

summary(out_mod.1)

## 
## Call:
## mixed_model(fixed = G3WRM2PRT ~ 1 + G3AGE15, random = ~1 + G3AGE15 | 
##     NEWID, data = d2, family = hurdle_normal(), zi_fixed = ~1 + 
##     G3AGE15, zi_random = ~1 | NEWID, n_phis = 1, nAGQ = 15)
## 
## Data Descriptives:
## Number of Observations: 782
## Number of Groups: 273 
## 
## Model:
##  family: User hurdle normal family
##  link: identity 
## 
## Fit statistics:
##    log.Lik      AIC      BIC
##  -2651.767 5325.534 5365.238
## 
## Random effects covariance matrix:
##                  StdDev    Corr        
## (Intercept)     11.5665  (Intr)  G3AGE1
## G3AGE15          0.9294 -0.2758        
## zi_(Intercept)   4.0524 -0.6947  0.2245
## 
## Fixed effects:
##             Estimate Std.Err z-value  p-value
## (Intercept)  27.9745   0.851 32.8716  < 1e-04
## G3AGE15      -0.9827   0.492 -1.9975 0.045773
## 
## Zero-part coefficients:
##             Estimate Std.Err z-value p-value
## (Intercept)  -3.9207  0.5516 -7.1075 < 1e-04
## G3AGE15       0.2238  0.1819  1.2308 0.21839
## 
## phi parameters:
##       Estimate Std.Err
## phi_1   1.9291  0.0403
## 
## Integration:
## method: adaptive Gauss-Hermite quadrature rule
## quadrature points: 15
## 
## Optimization:
## method: hybrid EM and quasi-Newton
## converged: TRUE

# output residual standard deviation (to match with Mplus model)
exp(out_mod.1$phis)

##    phi_1 
## 6.883637

Example in Mplus

The code below executes Mplus through R to fit the same two-part growth model. Notice that with Mplus, the two separate variables (G3CON and G3WRMEXP) are used, negating the need to create one variable (i.e., G3WRM2PRT that was needed for GLMMdaptive).

setup.mplus.out_mod.1 <- mplusObject(
   VARIABLE = 
      "USEV = G3CON G3AGE15 G3WRMEXP;
       CLUSTER = NEWID;
       WITHIN = G3AGE15;
       CATEGORICAL = G3CON;",
   
   ANALYSIS = 
      "ESTIMATOR = ML;
       TYPE = twolevel random;",
   
   DEFINE = 
     "G3WRMEXP = exp(G3WARM);",
           
   MODEL = 
      "%within%
       SBIN | G3CON on G3AGE15;
       SCONT | G3WRMEXP on G3AGE15;
                                 
       %between%
       G3CON;
       G3WRMEXP;
       SCONT;
       SBIN@0;
       G3CON with G3WRMEXP SCONT;
       G3WRMEXP with SCONT;",
                         
rdata = d2)

model.mplus.out_mod.1 <- mplusModeler(
  setup.mplus.out_mod.1, modelout = "mplus.out_mod.1.inp", run = TRUE)

## 
## Running model: mplus.out_mod.1.inp 
## System command: cd "." && "mplus" "mplus.out_mod.1.inp" 
## Reading model:  mplus.out_mod.1.out

output.mplus.out_mod.1 <- readModels("mplus.out_mod.1.out")

## Reading model:  mplus.out_mod.1.out

output.mplus.out_mod.1$parameters

## $unstandardized
##           paramHeader    param     est     se  est_se    pval BetweenWithin
## 1  Residual.Variances G3WRMEXP  48.044  4.716  10.187   0.000        Within
## 2          G3CON.WITH    SCONT  -1.215  4.560  -0.266   0.790       Between
## 3       G3WRMEXP.WITH    SCONT  -3.478  5.825  -0.597   0.551       Between
## 4          G3CON.WITH G3WRMEXP  32.545  7.231   4.501   0.000       Between
## 5               Means G3WRMEXP  27.964  0.847  33.019   0.000       Between
## 6               Means     SBIN  -0.090  0.163  -0.549   0.583       Between
## 7               Means    SCONT  -0.884  0.524  -1.687   0.092       Between
## 8          Thresholds  G3CON$1  -3.878  0.521  -7.444   0.000       Between
## 9           Variances    G3CON  15.908  4.517   3.522   0.000       Between
## 10          Variances G3WRMEXP 134.464 16.616   8.092   0.000       Between
## 11          Variances     SBIN   0.000  0.000 999.000 999.000       Between
## 12          Variances    SCONT   0.190  3.527   0.054   0.957       Between

Growth model with predictors

Example in R

In the code below, we add predictors to our two-part growth model. Here, we add the same predictors to both parts, but that is not necessary. One can build a model with different predictors if desired.

out_mod.2 <- mixed_model(fixed = G3WRM2PRT ~ 1 + G3AGE15 + INTACT + 
                            G3AGE15*INTACT + G3EVENT,
                         random = ~ 1 + G3AGE15 | NEWID, 
                         zi_fixed = ~ 1 + G3AGE15 + INTACT + 
                            G3AGE15*INTACT + G3EVENT,
                         zi_random = ~ 1 | NEWID,
                    data = d2, family = hurdle_normal(), n_phis = 1, nAGQ = 15)

summary(out_mod.2)

## 
## Call:
## mixed_model(fixed = G3WRM2PRT ~ 1 + G3AGE15 + INTACT + G3AGE15 * 
##     INTACT + G3EVENT, random = ~1 + G3AGE15 | NEWID, data = d2, 
##     family = hurdle_normal(), zi_fixed = ~1 + G3AGE15 + INTACT + 
##         G3AGE15 * INTACT + G3EVENT, zi_random = ~1 | NEWID, n_phis = 1, 
##     nAGQ = 15)
## 
## Data Descriptives:
## Number of Observations: 782
## Number of Groups: 273 
## 
## Model:
##  family: User hurdle normal family
##  link: identity 
## 
## Fit statistics:
##    log.Lik      AIC      BIC
##  -2632.189 5298.377 5359.738
## 
## Random effects covariance matrix:
##                  StdDev    Corr        
## (Intercept)     10.9211  (Intr)  G3AGE1
## G3AGE15          0.8665 -0.3205        
## zi_(Intercept)   3.8081 -0.6792  0.3109
## 
## Fixed effects:
##                Estimate Std.Err z-value    p-value
## (Intercept)     29.1141  1.2806 22.7339    < 1e-04
## G3AGE15         -0.8820  0.5990 -1.4724 0.14091215
## INTACT           4.7396  1.6248  2.9171 0.00353307
## G3EVENT         -0.5438  0.1628 -3.3399 0.00083812
## G3AGE15:INTACT  -0.1726  0.7877 -0.2191 0.82656600
## 
## Zero-part coefficients:
##                Estimate Std.Err z-value p-value
## (Intercept)     -2.5789  0.5866 -4.3963 < 1e-04
## G3AGE15          0.2412  0.1924  1.2538 0.20992
## INTACT          -3.8844  0.9864 -3.9381 < 1e-04
## G3EVENT         -0.0794  0.0753 -1.0540 0.29188
## G3AGE15:INTACT  -0.0282  0.6325 -0.0445 0.96449
## 
## phi parameters:
##       Estimate Std.Err
## phi_1   1.9315  0.0407
## 
## Integration:
## method: adaptive Gauss-Hermite quadrature rule
## quadrature points: 15
## 
## Optimization:
## method: hybrid EM and quasi-Newton
## converged: TRUE

# output residual standard deviation
exp(out_mod.2$phis)

##   phi_1 
## 6.89976

Example in Mplus

The code below fits the same model in Mplus through R using the MplusAutomation package.

setup.mplus.out_mod.2 <- mplusObject(
   VARIABLE = 
      "USEV = G3CON G3AGE15 INTACT G3EVENT G3WRMEXP;
       CLUSTER = NEWID;
       WITHIN = G3AGE15 G3EVENT;
       BETWEEN = INTACT;
       CATEGORICAL = G3CON;",
   
   DEFINE = 
     "G3WRMEXP = exp(G3WARM);",
  
   ANALYSIS = 
      "ESTIMATOR = ML;
       TYPE = twolevel random;",
           
   MODEL = 
      "%within%
       SBIN | G3CON on G3AGE15;
       SCONT | G3WRMEXP on G3AGE15;
       G3CON G3WRMEXP on G3EVENT;
                                 
       %between%
       G3CON;
       G3WRMEXP;
       SCONT;
       SBIN@0;
       G3CON with G3WRMEXP SCONT;
       G3WRMEXP with SCONT;
   
       G3CON SBIN G3WRMEXP SCONT on INTACT;",
                         
rdata = d2)

model.mplus.out_mod.2 <- mplusModeler(
  setup.mplus.out_mod.2, modelout = "mplus.out_mod.2.inp", run = TRUE)

## 
## Running model: mplus.out_mod.2.inp 
## System command: cd "." && "mplus" "mplus.out_mod.2.inp" 
## Reading model:  mplus.out_mod.2.out

output.mplus.out_mod.2 <- readModels("mplus.out_mod.2.out")

## Reading model:  mplus.out_mod.2.out

output.mplus.out_mod.2$parameters

## $unstandardized
##           paramHeader    param     est     se  est_se    pval BetweenWithin
## 1            G3CON.ON  G3EVENT   0.077  0.074   1.038   0.299        Within
## 2         G3WRMEXP.ON  G3EVENT  -0.542  0.163  -3.325   0.001        Within
## 3  Residual.Variances G3WRMEXP  48.177  4.769  10.102   0.000        Within
## 4             SBIN.ON   INTACT   0.013  0.495   0.027   0.978       Between
## 5            SCONT.ON   INTACT  -0.213  0.802  -0.266   0.791       Between
## 6            G3CON.ON   INTACT   3.941  0.970   4.062   0.000       Between
## 7         G3WRMEXP.ON   INTACT   4.857  1.628   2.983   0.003       Between
## 8          G3CON.WITH    SCONT  -1.733  4.318  -0.401   0.688       Between
## 9       G3WRMEXP.WITH    SCONT  -3.567  5.266  -0.677   0.498       Between
## 10         G3CON.WITH G3WRMEXP  28.185  6.710   4.200   0.000       Between
## 11         Intercepts G3WRMEXP  29.038  1.282  22.652   0.000       Between
## 12         Intercepts     SBIN  -0.131  0.176  -0.745   0.456       Between
## 13         Intercepts    SCONT  -0.765  0.641  -1.195   0.232       Between
## 14         Thresholds  G3CON$1  -2.532  0.570  -4.438   0.000       Between
## 15 Residual.Variances    G3CON  14.046  4.029   3.486   0.000       Between
## 16 Residual.Variances G3WRMEXP 119.699 15.259   7.845   0.000       Between
## 17 Residual.Variances     SBIN   0.000  0.000 999.000 999.000       Between
## 18 Residual.Variances    SCONT   0.216  3.542   0.061   0.951       Between

Two-part predictor

Script Set #4: Fit two-part predictor model

Example in R

Center G3WARM at a score of 2

The code below fits a traditional growth model for G3DEP, child depressive symptoms, but we consider a two-part predictor. First, we create a new variable, G3WARMC, which centers G3WARM at a score of 2 if the child had contact with their father, otherwise, a 0 is recorded for G3WARMC. Second, we create a new variable called G3WARMX which is the product of our newly created, centered warmth variable (G3WARMC multiplied with G3CON – a binary indicator of whether or not the child had contact with their father).

When we fit the lme model (i.e, multilevel model for a continuous variable) we regress G3DEP on the intercept and time variable (G3AGE15) to specify growth, and also the time-varying predictors of G3CON (binary indicator of contact versus no contact) and G3WARMX created earlier.

d3 <- d1 %>% 
  mutate(G3WARMC = if_else(G3CON == 1, G3WARM - 2, 0), 
         G3WARMX = G3WARMC*G3CON)

pred_mod.1 <- lme(G3DEP ~ 1 + G3AGE15 + G3CON + G3WARMX, 
                  random = ~ 1 + G3AGE15 | NEWID, 
                  data = d3, method = "ML")

summary(pred_mod.1)

## Linear mixed-effects model fit by maximum likelihood
##  Data: d3 
##        AIC      BIC   logLik
##   1226.406 1263.701 -605.203
## 
## Random effects:
##  Formula: ~1 + G3AGE15 | NEWID
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept) 0.4999530 (Intr)
## G3AGE15     0.1377766 0.133 
## Residual    0.3656518       
## 
## Fixed effects: G3DEP ~ 1 + G3AGE15 + G3CON + G3WARMX 
##                  Value  Std.Error  DF   t-value p-value
## (Intercept)  1.0618985 0.05823829 506 18.233683  0.0000
## G3AGE15      0.0301774 0.01843092 506  1.637322  0.1022
## G3CON        0.4208940 0.08063701 506  5.219613  0.0000
## G3WARMX     -0.2821255 0.05010854 506 -5.630287  0.0000
##  Correlation: 
##         (Intr) G3AGE1 G3CON 
## G3AGE15 -0.005              
## G3CON   -0.491 -0.025       
## G3WARMX -0.133  0.080 -0.695
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.96453959 -0.50600825 -0.03187427  0.50402709  3.18908409 
## 
## Number of Observations: 782
## Number of Groups: 273

VarCorr(pred_mod.1)

## NEWID = pdLogChol(1 + G3AGE15) 
##             Variance  StdDev    Corr  
## (Intercept) 0.2499530 0.4999530 (Intr)
## G3AGE15     0.0189824 0.1377766 0.133 
## Residual    0.1337012 0.3656518

Center G3WARM at a score of 4

Because the effect of contact (G3CON) depends on the centering point of G3WARMX (see manuscript for further description), it may be of interest to consider the effect of contact at different centering points. Here we fit the same model as above, but instead centering warmth at a score of 4.

d3 <- d1 %>% 
  mutate(G3WARMC = ifelse(G3CON == 1, G3WARM - 4, 0),
         G3WARMX = G3WARMC*G3CON)


pred_mod.2 <- lme(G3DEP ~ 1 + G3AGE15 + G3CON + G3WARMX, 
                  random = ~ 1 + G3AGE15 | NEWID, 
                  data = d3, method = "ML")

summary(pred_mod.2)

## Linear mixed-effects model fit by maximum likelihood
##  Data: d3 
##        AIC      BIC   logLik
##   1226.406 1263.701 -605.203
## 
## Random effects:
##  Formula: ~1 + G3AGE15 | NEWID
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept) 0.4999530 (Intr)
## G3AGE15     0.1377766 0.133 
## Residual    0.3656518       
## 
## Fixed effects: G3DEP ~ 1 + G3AGE15 + G3CON + G3WARMX 
##                  Value  Std.Error  DF   t-value p-value
## (Intercept)  1.0618985 0.05823829 506 18.233683  0.0000
## G3AGE15      0.0301774 0.01843092 506  1.637322  0.1022
## G3CON       -0.1433569 0.07291013 506 -1.966214  0.0498
## G3WARMX     -0.2821255 0.05010854 506 -5.630287  0.0000
##  Correlation: 
##         (Intr) G3AGE1 G3CON 
## G3AGE15 -0.005              
## G3CON   -0.726  0.082       
## G3WARMX -0.133  0.080  0.606
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.96453959 -0.50600825 -0.03187427  0.50402709  3.18908409 
## 
## Number of Observations: 782
## Number of Groups: 273

VarCorr(pred_mod.2)

## NEWID = pdLogChol(1 + G3AGE15) 
##             Variance  StdDev    Corr  
## (Intercept) 0.2499530 0.4999530 (Intr)
## G3AGE15     0.0189824 0.1377766 0.133 
## Residual    0.1337012 0.3656518

Example in Mplus

Center G3WARM at a score of 2

These last two code sections fit the same models as above (using lme to consider two-part predictors), except here we use Mplus to fit the models through R

setup.mplus.pred_mod.1 <- mplusObject(
   VARIABLE = 
      "USEV = G3DEP G3AGE15 G3CON G3WARMX;
       CLUSTER = NEWID;
       WITHIN = G3AGE15 G3CON G3WARMX;",
   
   DEFINE =  
      "G3WARMC = G3WARM - 2;
       IF G3CON == 0 THEN G3WARMC = 0;
       G3WARMX = G3WARMC*G3CON;",
  
   ANALYSIS = 
      "ESTIMATOR = ML;
       TYPE = twolevel random;",
           
   MODEL = 
      "%within%
       S | G3DEP on G3AGE15;
       G3DEP on G3CON G3WARMX;
                                 
       %between%
       G3DEP;
       S;
       G3DEP with S;",
                         
rdata = d1)

model.mplus.pred_mod.1 <- mplusModeler(
  setup.mplus.pred_mod.1, modelout = "mplus.pred_mod.1.inp", run = TRUE)

## 
## Running model: mplus.pred_mod.1.inp 
## System command: cd "." && "mplus" "mplus.pred_mod.1.inp" 
## Reading model:  mplus.pred_mod.1.out

output.mplus.pred_mod.1 <- readModels("mplus.pred_mod.1.out")

## Reading model:  mplus.pred_mod.1.out

output.mplus.pred_mod.1$parameters

## $unstandardized
##          paramHeader   param    est    se est_se  pval BetweenWithin
## 1           G3DEP.ON   G3CON  0.421 0.081  5.172 0.000        Within
## 2           G3DEP.ON G3WARMX -0.282 0.052 -5.433 0.000        Within
## 3 Residual.Variances   G3DEP  0.134 0.012 11.024 0.000        Within
## 4         G3DEP.WITH       S  0.009 0.010  0.913 0.361       Between
## 5              Means   G3DEP  1.062 0.058 18.191 0.000       Between
## 6              Means       S  0.030 0.018  1.641 0.101       Between
## 7          Variances   G3DEP  0.250 0.027  9.383 0.000       Between
## 8          Variances       S  0.019 0.010  1.920 0.055       Between

Center G3WARM at a score of 4

setup.mplus.pred_mod.2 <- mplusObject(
   VARIABLE = 
      "USEV = G3DEP G3AGE15 G3CON G3WARMX;
       CLUSTER = NEWID;
       WITHIN = G3AGE15 G3CON G3WARMX;",
   
   DEFINE =  
      "G3WARMC = G3WARM - 4;
       IF G3CON == 0 THEN G3WARMC = 0;
       G3WARMX = G3WARMC*G3CON;",
  
   ANALYSIS = 
      "ESTIMATOR = ML;
       TYPE = twolevel random;",
           
   MODEL = 
      "%within%
       S | G3DEP on G3AGE15;
       G3DEP on G3CON G3WARMX;
                                 
       %between%
       G3DEP;
       S;
       G3DEP with S;",
                         
rdata = d1)

model.mplus.pred_mod.2 <- mplusModeler(
  setup.mplus.pred_mod.2, modelout = "mplus.pred_mod.2.inp", run = TRUE)

## 
## Running model: mplus.pred_mod.2.inp 
## System command: cd "." && "mplus" "mplus.pred_mod.2.inp" 
## Reading model:  mplus.pred_mod.2.out

output.mplus.pred_mod.2 <- readModels("mplus.pred_mod.2.out")

## Reading model:  mplus.pred_mod.2.out

output.mplus.pred_mod.2$parameters

## $unstandardized
##          paramHeader   param    est    se est_se  pval BetweenWithin
## 1           G3DEP.ON   G3CON -0.143 0.075 -1.923 0.055        Within
## 2           G3DEP.ON G3WARMX -0.282 0.052 -5.433 0.000        Within
## 3 Residual.Variances   G3DEP  0.134 0.012 11.024 0.000        Within
## 4         G3DEP.WITH       S  0.009 0.010  0.913 0.361       Between
## 5              Means   G3DEP  1.062 0.058 18.191 0.000       Between
## 6              Means       S  0.030 0.018  1.641 0.101       Between
## 7          Variances   G3DEP  0.250 0.027  9.383 0.000       Between
## 8          Variances       S  0.019 0.010  1.920 0.055       Between

R Notebook: Two-Part Models for Father-Child Relationship Variables: Presence in the Child’s Life and Quality of the Relationship Conditional on Some Presence

supplementary material for the Journal of the Society for Social Work and Research

Load packages

Import data

Format data

Growth model primer

Script Set #1: Growth model for a continuous outcome (G3 depressive symptoms)

Growth model without predictors

Example in R

Example in Mplus

Growth model with predictors

Example in R

Example in Mplus

Script Set #2: Growth model for a binary outcome (G3 substance use)

Growth model without predictors

Example in R

Example in Mplus

Growth model with predictors

Example in R

Example in Mplus

Two-part outcome

Create new variables for a two-part outcome for model estimation in GLMMadaptive

Create hurdle_normal function for GLMMadaptive

Script Set #3: Fit two-part outcome model

Growth model without predictors

Example in R

Example in Mplus

Growth model with predictors

Example in R

Example in Mplus

Two-part predictor

Script Set #4: Fit two-part predictor model

Example in R

Center G3WARM at a score of 2

Center G3WARM at a score of 4

Example in Mplus

Center G3WARM at a score of 2

Center G3WARM at a score of 4