PSY350, Module 12

Telling Your Story With Data

Module road map

Learning objectives

  1. Describe the data science workflow
  2. Replicate and describe the analyses in Blair et al. (2004)
  3. Practice using American Psychological Association (APA style) for reporting study findings

Introduction

Chapter 12 of your textbook presents a data science pipeline – that is, the workflow that an analyst completes in order to better understand a pertinent issue using data and/or to answer a salient research question. In your last Apply and Practice exercise, you read the Psychological Science paper by Irene Blair and colleagues entitled “The Influence of Afrocentric Facial Features in Criminal Sentencing”.

You also explored the papaja package to create an APA style report of study findings using the Blair paper as a template. In this module we are going to dig deeper into the Blair paper and walk through the workflow that the authors would have taken to produce the paper.

As a refresher, please watch the brief video below, which outlines the rationale for the study and the study hypotheses.



Now, let’s walk through the analyses described in the paper. We will take the following steps to replicate the study analyses:

  1. Load the necessary libraries
  2. Import the data
  3. Conduct exploratory data analyses
  4. Wrangle the data to create a final dataframe for model fitting
  5. Fit the 5 models described by Blair and colleagues
  6. Produce the Figures and Tables presented by Blair and colleagues
  7. Check model adequacy (i.e., evaluate linear regression assumptions)

Load libraries

As always, we must start a data analysis by loading the needed libraries.

library(here)
library(skimr)
library(moderndive)
library(sjPlot)
library(magrittr)
library(correlation)
library(GGally)
library(performance)
library(see)
library(tidyverse)

Import and examine the data

Next, we need to import the data. Once imported, we’ll use the glimpse() function to examine the included variables.

df <- (here("data", "sentence.csv")) %>% 
  read_csv() 
df %>% 
  glimpse()
## Rows: 216
## Columns: 13
## $ id       <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
## $ years    <dbl> 3.83, 11.50, 4.33, 17.17, 14.17, 4.42, 99.00, 4.67, 9.17, 9.1…
## $ black    <dbl> 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1…
## $ afro     <dbl> 1.94286, 6.20000, 5.00000, 7.85714, 5.50000, 5.88235, 2.94286…
## $ primlev  <dbl> 7, 9, 9, 10, 6, 7, 11, 5, 9, 11, 11, 9, 8, 8, 9, 7, 9, 7, 4, …
## $ seclev   <dbl> 5.00, 8.00, 0.00, 4.33, 6.00, 7.00, 6.33, 3.86, 4.67, 0.00, 0…
## $ nsecond  <dbl> 1, 1, 0, 3, 1, 3, 3, 7, 3, 0, 0, 2, 1, 2, 3, 1, 3, 0, 0, 1, 9…
## $ anysec   <dbl> 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1…
## $ priorlev <dbl> 0.00, 0.00, 0.00, 0.00, 4.25, 7.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ nprior   <dbl> 0, 0, 0, 0, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 6, …
## $ anyprior <dbl> 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0…
## $ attract  <dbl> 2.00000, 2.22857, 5.26471, 3.54286, 2.00000, 2.08824, 3.57143…
## $ babyface <dbl> 2.77143, 2.65714, 4.26471, 3.62857, 4.94118, 5.67647, 4.22857…

Most data sets will come to you with something called a codebook. A codebook describes the variables in the dataframe. A codebook can come in multiple formats – for example, a Word document, an Excel table, or an R Markdown HTML file. The codebook for these data came to me as an Excel table; the table below reproduces the information it contains.

variable_name | label | notes
id | inmate ID | NA
years | sentence length in years | NA
black | race (1 = black, 0 = white) | NA
afro | rating for Afrocentric features | inmate photo rated by ~35 CU undergraduate students; raters assigned a single, global assessment of the degree to which each face had features that are typical of African Americans, using a scale from 1 (not at all) to 9 (very much)
primlev | seriousness of primary offense | based on Florida’s 10-point system; lower numbers indicate less serious felonies
seclev | seriousness of secondary offense | based on Florida’s 10-point system; lower numbers indicate less serious felonies
nsecond | number of secondary offenses | NA
anysec | indicator for any secondary offenses (1 = yes, 0 = no) | NA
priorlev | seriousness of prior offenses | based on Florida’s 10-point system; lower numbers indicate less serious felonies
nprior | number of prior offenses | NA
anyprior | indicator for any prior offenses (1 = yes, 0 = no) | NA
attract | rating for attractiveness | inmate photo rated by ~35 CU undergraduate students
babyface | rating for babyface features | inmate photo rated by ~35 CU undergraduate students

Exploratory data analysis

An initial scan of the variables

I like to begin by looking at descriptive statistics for the variables so that I can get a sense of the mean, minimum, and maximum values. It’s also useful to determine whether there are missing values. This helps ensure that the values in the dataframe are what you’d expect. The skim() function does a good job with this. Scan through the output and note the descriptive statistics for each variable in the dataframe.

df %>% 
  skim()
Data summary
Name Piped data
Number of rows 216
Number of columns 13
_______________________
Column type frequency:
numeric 13
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
id 0 1 108.50 62.50 1.00 54.75 108.50 162.25 216.00 ▇▇▇▇▇
years 0 1 6.84 15.54 0.42 1.56 2.67 4.67 99.00 ▇▁▁▁▁
black 0 1 0.46 0.50 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▇
afro 0 1 4.53 1.77 1.49 2.83 4.47 6.09 7.86 ▇▇▆▇▆
primlev 0 1 6.55 2.07 1.00 5.00 7.00 8.00 11.00 ▁▇▇▆▁
seclev 0 1 3.40 2.57 0.00 0.00 3.65 5.00 9.00 ▇▆▇▃▂
nsecond 0 1 2.41 3.82 0.00 1.00 1.00 3.00 41.00 ▇▁▁▁▁
anysec 0 1 0.75 0.44 0.00 0.00 1.00 1.00 1.00 ▃▁▁▁▇
priorlev 0 1 1.43 2.29 0.00 0.00 0.00 3.00 9.00 ▇▂▂▁▁
nprior 0 1 0.95 1.90 0.00 0.00 0.00 1.00 13.00 ▇▁▁▁▁
anyprior 0 1 0.33 0.47 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▃
attract 0 1 3.21 0.91 1.43 2.57 3.16 3.71 6.51 ▃▇▅▁▁
babyface 0 1 4.04 1.10 1.66 3.18 3.91 4.82 7.09 ▂▇▇▅▁

Explore the outcome variable

The outcome variable of interest in the Blair and colleagues paper is sentence length in years. Let’s create a histogram of this variable.

df %>% 
  ggplot(aes(x = years)) +
  geom_histogram(binwidth = 1) +
  labs(title = "Histogram of sentence length in years", 
       x = "Sentence length in years") +
  theme_bw()


This histogram illustrates that sentence length has extreme positive skew. Most of the inmates have relatively short sentence lengths. In the paper, Blair and colleagues indicate:

Because sentence length was positively skewed, a log-transformation was performed on this variable prior to analysis.

Often when you encounter a variable with extreme skew of this type, some sort of down transformation (i.e., a transformation that pulls in the tail of the distribution) is needed in order to avoid violating the assumptions of your modeling procedure. For a linear regression model (the type of model used in the paper), extremely skewed variables can cause a violation of several assumptions – including normality of the residuals, homogeneity of variance of the residuals, and, perhaps most importantly, linearity. Let’s explore this last issue of linearity a bit.

In the following code chunk I will create a new version of years, called lnyears, which computes the natural log of years (ln is commonly used to stand for natural log). Then, I will create two scatter plots: one which assesses the relationship between the key predictor of interest (Afrocentric features) and years, and one which assesses the relationship between the key predictor and our newly transformed outcome (lnyears). Let’s see which produces a more linear relationship.

df <-
  df %>% 
  mutate(lnyears = log(years))

df %>% ggplot(aes(x = afro, y = years)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x,  se = FALSE) +
  labs(title = "Scatterplot of Afrocentric features and sentence length in years",
       x = "Afrocentric features",
       y = "Sentence length in years") +
  theme_bw()

df %>% ggplot(aes(x = afro, y = lnyears)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Scatterplot of Afrocentric features and sentence length in ln(years)",
       x = "Afrocentric features",
       y = "Sentence length in ln(years)") +
  theme_bw()


The second plot, which includes lnyears, is preferable. With the first plot, the linear model does a poor job of modeling the data; we have many extremely large residuals (i.e., differences between the observed and predicted values). To view this in another way, let’s plot the residuals from fitting a simple linear regression where each outcome is regressed on Afrocentric features to see which produces more normally distributed residuals.

# fit model with years as outcome
df %>% 
  lm(years ~ afro, data = .) %>% 
  get_regression_points() %>% 
  ggplot(aes(x = residual)) +
  geom_histogram(bins = 25) +
  labs(title = "Residuals for years") +
  theme_bw()

# fit model with lnyears as outcome
df %>% 
  lm(lnyears ~ afro, data = .) %>% 
  get_regression_points() %>% 
  ggplot(aes(x = residual)) +
  geom_histogram(bins = 25) +
  labs(title = "Residuals for ln(years)") +
  theme_bw()


The model with lnyears as the outcome also produces more normally distributed residuals. Since lnyears works better both in terms of the linearity of the model and the distribution of the residuals, the authors used lnyears as the outcome rather than years.

Explore the key predictor

The key predictor in the Blair investigation is Afrocentric features (a variable called afro). As described in the paper, the 216 facial photographs associated with the selected cases were randomly divided into two sets, each with approximately equal numbers of Black and White inmates. Each set was given to a group of undergraduate research participants (n = 34 and n = 35) who were asked to make a single, global assessment of the degree to which each face had features that are typical of African Americans, using a scale from 1 (not at all) to 9 (very much). To get a better sense of the distribution of the variable for Black and White inmates, let’s create a density plot of Afrocentric features, grouped by race.

I’ll first create a factor version of the binary indicator for race (it’s called black in the dataframe). Then I will produce the plot. I will also use skim() to calculate descriptive statistics of afro by race.

df <-
  df %>% 
  mutate(black.f = factor(black, levels = c(0,1), labels = c("White", "Black")))

df %>% 
  ggplot(aes(x = afro, grouping = black.f, fill = black.f)) +
  geom_density(alpha = .5) +
  labs(title = "Distribution of Afrocentric features by race of inmate",
       x = "Afrocentric features",
       fill = "Race") +
  theme_bw()

df %>% 
  select(black.f, afro) %>% 
  group_by(black.f) %>% 
  skim()
Data summary
Name Piped data
Number of rows 216
Number of columns 2
_______________________
Column type frequency:
numeric 1
________________________
Group variables black.f

Variable type: numeric

skim_variable black.f n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
afro White 0 1 3.33 1.27 1.49 2.42 2.97 3.92 6.94 ▆▇▂▂▁
afro Black 0 1 5.92 1.11 2.31 5.37 6.03 6.75 7.86 ▁▂▃▇▅

Here we see that, as would be expected, Black inmates have a higher average score on afro than White inmates. But there is substantial variation in Afrocentric features for both White and Black individuals. Can you find the section in the Blair paper where these means (from the skim() output) are presented? In the same section, the authors report an independent-samples t-test to determine whether the difference in Afrocentric features across the two groups is significant; they allowed the variance of afro to differ by race (var.equal = FALSE). The code below replicates this. Can you match up Blair et al.’s report of this t-test to the results presented here? Note that the sign of the test statistic is opposite, but that is just because they used a different statistical program with different defaults for assigning the comparison group.

df %$% t.test(afro ~ black.f, alternative = "two.sided", var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  afro by black.f
## t = -16.026, df = 213.97, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group White and group Black is not equal to 0
## 95 percent confidence interval:
##  -2.916197 -2.277401
## sample estimates:
## mean in group White mean in group Black 
##            3.327209            5.924008

Explore control variables

One of the aspects of the Blair paper that I quite like is the detailed explanation for how they chose their control variables. The main set of control variables pertain to the inmate’s criminal history. Blair and colleagues explain:

Our first analysis used multiple regression to determine the degree to which sentence length (years) was influenced by only those factors that should lawfully predict sentencing: seriousness of the primary offense (primlev), the number and seriousness of additional concurrent offenses (nsecond and seclev), and the number and seriousness of prior offenses (nprior and priorlev). We also included quadratic terms for seriousness of the primary offense (primlev2), seriousness of additional offenses (seclev2), and seriousness of prior offenses (priorlev2), because the Florida Criminal Punishment Code specifies that for more serious offenses, the length of the sentence ought to increase dramatically as the seriousness of the offense increases. Because sentence length was positively skewed, a log-transformation was performed on this variable prior to analysis (lnyears).

I added the variable names to the passage presented above so that you could relate the description to the variable. The last sentence describes something that perhaps is not familiar to you – i.e., the use of quadratic terms. The addition of quadratic terms for predictors, also referred to as polynomial regression, is a bit advanced for this intro course. But, I’d like to offer a brief explanation for interested students. You can ignore this section if you are not interested.

When including a quadratic term for a predictor of interest, one creates a new version of the predictor that is the square of itself – for example, \(primlev2 = primlev^2\) – and then both the original predictor and its squared version are included in the regression model as predictors. This allows for a curvilinear effect of the variable of interest on the outcome. If interested, you can read more about quadratic/polynomial regression here. By allowing each of the seriousness of offense variables to have a quadratic effect, Blair and colleagues allowed for their assertion that “the length of the sentence ought to increase dramatically as the seriousness of the offense increases”. That is, seriousness should not be linearly related to sentence length, but rather non-linearly related, such that as seriousness increases, sentence length ramps upward. You can see this kind of effect in the example graph below: at higher levels of seriousness (e.g., above about a score of 5), each one-unit increase in seriousness is associated with a larger increase in sentence length than at lower levels of seriousness.
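If you’d like to generate a similar example graph yourself, here is a minimal sketch using simulated data (the variable names, coefficients, and seed are invented for illustration – this is not the sentencing data):

# Simulated example of a quadratic (curvilinear) effect: at higher levels of
# "seriousness", each one-unit increase produces a larger jump in the outcome.
set.seed(350)
sim <- tibble(seriousness = runif(200, 1, 10)) %>% 
  mutate(sentence = 1 + 0.2 * seriousness + 0.15 * seriousness^2 + rnorm(200, sd = 2))

sim %>% 
  ggplot(aes(x = seriousness, y = sentence)) +
  geom_point(alpha = .4) +
  geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = FALSE) + # fit with a squared term
  labs(title = "Example of a quadratic effect (simulated data)",
       x = "Seriousness of offense (simulated)",
       y = "Sentence length (simulated)") +
  theme_bw()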

In the code chunk below, I will create the squared terms for each of the seriousness control variables. It’s good practice to mean center the variables before forming the quadratic terms, so I will do that as well. To mean center a variable simply means to subtract the sample mean from each individual’s score.

df <- df %>% 
  mutate(primlev_m = primlev - mean(primlev),
         primlev2 = primlev_m^2,
         seclev_m = seclev - mean(seclev),
         seclev2 = seclev_m^2,
         priorlev_m = priorlev - mean(priorlev),
         priorlev2 = priorlev_m^2) 

In addition to these control variables that pertain to criminal histories, Blair and colleagues also consider two alternative explanations that could potentially explain away the effect of Afrocentric features – attractiveness and babyish features. They argue that these two variables, also rated by the undergraduate students who rated Afrocentric features, could be correlated with the predictor of interest (afro) – and could potentially reduce the effect of Afrocentric features when controlled (i.e., added as additional predictors to the regression model). Let’s examine the extent to which Afrocentric features is correlated with these two variables. To create a scatterplot correlation matrix, I like to use the ggpairs() function from the GGally package. I will also use the correlation() function from the correlation package (part of the easystats collection of packages) to calculate the correlation between all pairs of variables.

df %>% 
  select(afro, attract, babyface) %>% 
  ggpairs()

df %>% 
  select(afro, attract, babyface) %>%
  correlation()
Parameter1 Parameter2 r CI CI_low CI_high t df_error p Method n_Obs
afro attract 0.2461009 0.95 0.1164331 0.3675199 3.7143854 214 0.0007798 Pearson correlation 216
afro babyface 0.0180025 0.95 -0.1157687 0.1511323 0.2633967 214 0.7924983 Pearson correlation 216
attract babyface 0.0854471 0.95 -0.0486002 0.2164709 1.2545720 214 0.4220053 Pearson correlation 216

We can see here that Afrocentric features and attractiveness are positively correlated (r = .246), while Afrocentric features and babyish features are not correlated (r = .018). In the Blair paper, they present the correlation between Afrocentric features and these two control variables adjusting for race – this is called a partial correlation. We won’t cover that here, but you can read more about partial correlations here. If you’re curious about what a partial correlation captures, take a look at the code below. It’s just the correlation between the residuals of afro and the two competing controls (attract and babyface), after pulling out the variance in these variables that can be predicted by race. Note that the resulting estimate is not exactly the same as that obtained by Blair and colleagues.

# obtain residuals for afro, attract and babyface -- controlling for race
afro_r <- lm(afro ~ black, data = df) %>% get_regression_points(ID = "id") %>% select(id, residual) %>% rename(afro_r = residual)
attract_r <- lm(attract ~ black, data = df) %>% get_regression_points(ID = "id") %>% select(id, residual) %>% rename(attract_r = residual)
babyface_r <- lm(babyface ~ black, data = df) %>% get_regression_points(ID = "id") %>% select(id, residual) %>% rename(babyface_r = residual)

# merge the residuals together
all_resids <- 
  afro_r %>% 
  left_join(attract_r, by = "id") %>% 
  left_join(babyface_r, by = "id") 

# re-estimate the correlation martix with the residuals
all_resids %>% 
  select(-id) %>% 
  correlation()
Parameter1 Parameter2 r CI CI_low CI_high t df_error p Method n_Obs
afro_r attract_r 0.1987805 0.95 0.0670672 0.3236842 2.9671194 214 0.0100437 Pearson correlation 216
afro_r babyface_r -0.0169016 0.95 -0.1500560 0.1168550 -0.2472849 214 0.8049248 Pearson correlation 216
attract_r babyface_r 0.0803706 0.95 -0.0536985 0.2115934 1.1795367 214 0.4789883 Pearson correlation 216
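If you’d like to see partial correlations computed directly, the correlation() function accepts a partial argument. A minimal sketch is below; note that with partial = TRUE each pairwise correlation is adjusted for all of the other selected variables, which is close to – but not identical to – the residual-on-race approach above, so the estimates may differ slightly from both the residual approach and the values in the paper.

# Partial correlations computed directly by the correlation package.
# Each pairwise correlation is adjusted for all other selected variables.
df %>% 
  select(black, afro, attract, babyface) %>% 
  correlation(partial = TRUE)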

Perform all needed data wrangling

Make sure all predictors have a meaningful 0

As you have learned this semester, the intercept of a regression model is interpreted as the predicted value of the outcome when all predictors are 0. Sometimes, we have predictors that don’t have a meaningful 0 point. For example, in the Blair study, afro, attract, and babyface do not have a meaningful 0 – all three range from 1 to 9. We can easily remedy this situation by re-centering these variables. Moreover, if you center all of your predictors at the mean, the intercept represents the predicted score when all predictors are at their sample average. This can be very useful for a variety of purposes, some of which you’ll see later on.

Let’s center all of the variables at the mean; to do this, we simply subtract the mean from each score. In this way, a score of 0 for a centered variable represents the mean score – which is, of course, meaningful since the mean surely falls within the observed range of data in the sample. Note that the seriousness of offense variables were already mean centered, as we accomplished this when we created the quadratic variables.

Besides centering, there is one additional thing that we need to do, and that is to create a different type of indicator for race. In the third paragraph of the results section, the authors indicated that they coded a variable to represent race as -1 if the individual was White and +1 if the individual was Black. This is a slightly different coding scheme than the dummy variable approach we studied in Module 6. It’s essentially the same, but the regression coefficient associated with this variable (called black_m below) will represent half the expected difference in the outcome for Black compared to White individuals. This seemingly strange coding method was used because later in their modeling strategy, the authors interact the race variable with the Afrocentric features variable. The coding style that the authors chose has some advantages when estimating interaction terms; you can read more here if interested.

df <- df %>% 
  mutate(nsecond_m = nsecond - mean(nsecond),
         nprior_m = nprior - mean(nprior),
         afro_m = afro - mean(afro),
         babyface_m = babyface - mean(babyface),
         attract_m = attract - mean(attract))  %>% 
  mutate(black_m = case_when(black == 0 ~ -1, black == 1 ~ 1))

Create final dataframe for analysis

Now, we have all of our variables coded properly and centered, and we can fit the models. I will create a final version of the dataframe that includes just the variables we need to fit our models.

sentence <- df %>% 
  select(id, 
         lnyears,
         primlev_m, primlev2, seclev_m, seclev2, nsecond_m, priorlev_m, priorlev2, nprior_m,
         black_m,
         afro_m,
         attract_m, babyface_m)

sentence %>% skim()
Data summary
Name Piped data
Number of rows 216
Number of columns 14
_______________________
Column type frequency:
numeric 14
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
id 0 1 108.50 62.50 1.00 54.75 108.50 162.25 216.00 ▇▇▇▇▇
lnyears 0 1 1.13 1.06 -0.87 0.44 0.98 1.54 4.60 ▃▇▅▁▁
primlev_m 0 1 0.00 2.07 -5.55 -1.55 0.45 1.45 4.45 ▁▇▇▆▁
primlev2 0 1 4.28 4.99 0.20 0.30 2.41 6.00 30.81 ▇▂▁▁▁
seclev_m 0 1 0.00 2.57 -3.40 -3.40 0.25 1.60 5.60 ▇▆▇▃▂
seclev2 0 1 6.59 7.37 0.00 0.36 2.56 11.57 31.35 ▇▅▁▁▁
nsecond_m 0 1 0.00 3.82 -2.41 -1.41 -1.41 0.59 38.59 ▇▁▁▁▁
priorlev_m 0 1 0.00 2.29 -1.43 -1.43 -1.43 1.57 7.57 ▇▂▂▁▁
priorlev2 0 1 5.23 8.66 0.16 2.05 2.05 2.46 57.30 ▇▁▁▁▁
nprior_m 0 1 0.00 1.90 -0.95 -0.95 -0.95 0.05 12.05 ▇▁▁▁▁
black_m 0 1 -0.07 1.00 -1.00 -1.00 -1.00 1.00 1.00 ▇▁▁▁▇
afro_m 0 1 0.00 1.77 -3.04 -1.70 -0.06 1.56 3.33 ▇▇▆▇▆
attract_m 0 1 0.00 0.91 -1.78 -0.64 -0.05 0.50 3.31 ▃▇▅▁▁
babyface_m 0 1 0.00 1.10 -2.38 -0.86 -0.12 0.78 3.05 ▂▇▇▅▁

Fit the 5 models

As you discovered in the last Apply and Practice Activity, Blair and colleagues outlined five models to examine their hypotheses. Let’s fit each of those now.

Model 1

For Model 1, the authors regress lnyears on all of the criminal record variables. They describe this step by stating:

Our first analysis used multiple regression to determine the degree to which sentence length was influenced by only those factors that should lawfully predict sentencing…”

mod1 <- sentence %>% 
  lm(lnyears ~ primlev_m + primlev2 + seclev_m + seclev2 + nsecond_m + priorlev_m + priorlev2 + nprior_m, data = .)

mod1 %>% get_regression_table()
term estimate std_error statistic p_value lower_ci upper_ci
intercept 0.798 0.097 8.225 0.000 0.607 0.989
primlev_m 0.295 0.028 10.413 0.000 0.239 0.351
primlev2 0.039 0.010 3.790 0.000 0.019 0.059
seclev_m 0.036 0.021 1.710 0.089 -0.006 0.078
seclev2 0.021 0.008 2.619 0.009 0.005 0.037
nsecond_m 0.058 0.014 4.244 0.000 0.031 0.086
priorlev_m -0.015 0.056 -0.269 0.788 -0.125 0.095
priorlev2 0.005 0.012 0.372 0.710 -0.020 0.029
nprior_m 0.022 0.036 0.615 0.539 -0.049 0.093
mod1 %>% get_regression_summaries()
r_squared adj_r_squared mse rmse sigma statistic p_value df nobs
0.576 0.559 0.4761154 0.6900112 0.705 35.131 0 8 216

I’ve printed the results of the regression model – including the parameter estimates for each slope and the overall model summary. There is no need to interpret the slope coefficients for the control variables, as these are not of substantive interest to the hypotheses of the paper – they are simply in the model to control for all of the factors that should lawfully predict sentence length before looking at our effects of interest. However, it is of interest to take a look at the \(R^2\) presented in the get_regression_summaries() output. It is labeled r_squared in the table above. The \(R^2\) is .576 – which indicates that roughly 58% of the variability in (the natural log of) sentence length can be explained by these criminal record variables. The authors describe this model by stating:

The results of the analysis showed, as expected, that criminal record accounted for a substantial amount of the variance (57%) in sentence length.

Model 2

In Model 2, the authors add race as an additional predictor. They describe the rationale for this model by stating:

We turn next to the question of race differences in sentencing. We estimated a second model (Model 2) in which inmate race (-1 if White, +1 if Black) was entered as a predictor along with the variables from the previous model.

Thus, the null hypothesis is that there is no difference in sentence length between Black and White inmates holding constant criminal record, and the alternative hypothesis is that there is a difference in sentence length between Black and White inmates.

The df for testing this effect is 206, calculated as n - 1 - the number of predictors, or 216 - 1 - 9 = 206. Using the technique from Module 11, we can calculate the critical values of t for alpha of .05 (two-tailed), and thus our rejection region, as follows:

qt(p = c(.025, .975), df = 206)
## [1] -1.971547  1.971547

Therefore, the boundaries of our rejection region for Model 2 are -1.97 and 1.97; if our test statistic falls outside this range (i.e., less than -1.97 or greater than 1.97), we can reject the null hypothesis. In this case the null hypothesis is that there is no difference in sentence length between Black and White inmates, holding constant criminal record.

Now, let’s fit the model to test the hypothesis.

mod2 <- sentence %>% 
  lm(lnyears ~ primlev_m + primlev2 + seclev_m + seclev2 + nsecond_m + priorlev_m + priorlev2 + nprior_m + 
       black_m, 
     data = .)

mod2 %>% get_regression_table()
term estimate std_error statistic p_value lower_ci upper_ci
intercept 0.793 0.097 8.162 0.000 0.602 0.985
primlev_m 0.299 0.029 10.427 0.000 0.242 0.355
primlev2 0.039 0.010 3.804 0.000 0.019 0.059
seclev_m 0.038 0.021 1.779 0.077 -0.004 0.079
seclev2 0.021 0.008 2.584 0.010 0.005 0.037
nsecond_m 0.057 0.014 4.128 0.000 0.030 0.084
priorlev_m -0.016 0.056 -0.282 0.779 -0.125 0.094
priorlev2 0.005 0.012 0.407 0.685 -0.019 0.030
nprior_m 0.024 0.036 0.671 0.503 -0.047 0.095
black_m -0.044 0.050 -0.884 0.377 -0.141 0.054
mod2 %>% get_regression_summaries()
r_squared adj_r_squared mse rmse sigma statistic p_value df nobs
0.577 0.559 0.4743143 0.6887048 0.705 31.282 0 9 216

The regression coefficient for black_m is -.044. Because of the coding of the predictor (-1 for White, +1 for Black), we need to do a bit of work to interpret this value as the difference in lnyears for Black versus White inmates. One way is to calculate the predicted value of lnyears for White and Black inmates, holding constant all of the criminal record variables at their mean. This simplifies the calculation because all of the criminal record regression slopes are multiplied by zero (since by centering the predictors we shifted their means to 0) and thus drop out – then we’re left simply with:

\(\hat{y}\) = intercept + (slope for black_m \(\times\) black_m); that is, \(\hat{y}\) = .793 + (-.044 \(\times\) black_m).

For White inmates: \(\hat{y} = .793 + (-.044\times-1) = .837\)

For Black inmates: \(\hat{y} = .793 + (-.044\times1) = .749\)

Notice that the difference between these two scores is -.088 (Black - White), which is twice the slope for black_m from our regression model results table (\(-.044\times2\)). Had we fit the regression model using the dummy code approach (where black = 1 for Black inmates, and black = 0 for White inmates), then the regression slope would equal -.088. This value represents the expected difference in sentence length in log years between Black and White inmates, holding constant criminal record.

We can back transform these predicted values from ln(years) to years by exponentiating them:

# predicted years for White inmates
exp(.837)
## [1] 2.309428
# predicted years for Black inmates
exp(.749)
## [1] 2.114884

Therefore, holding constant all criminal record variables at the mean in the sample, our model predicts White convicted offenders will receive 2.3 years in prison and Black convicted offenders will receive 2.1 years.
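If you’d like to verify the claim that the dummy-coding approach would yield a slope of about -.088 (twice the slope for black_m), here is a quick check; it is a sketch that reuses the original 0/1 black indicator still stored in df alongside the centered predictors.

# Refit Model 2 using the 0/1 dummy-coded race indicator (0 = White, 1 = Black).
# The slope for black should be roughly twice the slope for black_m (about -.088);
# everything else about the model fit is unchanged.
df %>% 
  lm(lnyears ~ primlev_m + primlev2 + seclev_m + seclev2 + nsecond_m + 
       priorlev_m + priorlev2 + nprior_m + black, 
     data = .) %>% 
  get_regression_table()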

The test statistic for the regression slope for black_m is -.884 and the p-value is .377. The test statistic is not in the rejection region; thus, we cannot reject the null hypothesis – in other words, we do not have evidence of a difference in sentence length by race. These estimates are just slightly different from what Blair and colleagues report in their paper (they report a test statistic of 0.90 and a p-value of .37).

It’s also of interest to compare the change in \(R^2\) with the addition of race. In Model 1 the \(R^2\) = .576, and in Model 2 the \(R^2\) = .577 – so, essentially the \(R^2\) is unchanged, and this corresponds with the non-significant regression slope for black_m.

The authors describe their results as follows:

The results of this analysis were consistent with the findings of Florida’s Race Neutrality in Sentencing report (Bales, 1997): The race of the offender did not account for a significant amount of variance in sentence length over and above the effects of seriousness and number of offenses, t(206) = 0.90, p = .37, proportional reduction in error (PRE) = .00.

The PRE in the passage above refers to how much the unexplained variance in the model (i.e., the residual or error) was reduced with the addition of race (i.e., how much error was reduced in Model 2 compared to Model 1). We haven’t studied how to compare two nested models (i.e., where one model is a subset of another model) in this course. If you are interested, I include code below to calculate this using the R objects that hold the regression model results (mod1 and mod2). This exercise demonstrates that there is essentially no reduction in error when black_m is added to the model.

# model 1 error
mod1_error <- sum(mod1$residuals^2) 

# model 2 error
mod2_error <- sum(mod2$residuals^2) 

PRE = (mod1_error - mod2_error)/mod1_error

PRE
## [1] 0.003783074
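For interested students, base R’s anova() function performs the formal F-test that compares two nested models; a minimal sketch:

# F-test comparing the nested models (Model 1 vs. Model 2).
# With a single added predictor, the F statistic equals the square of the
# t statistic for black_m reported above.
anova(mod1, mod2)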

Model 3

In Model 3, the authors test their primary hypothesis of interest; that is, they determine whether Afrocentric features predicts sentence length, holding constant race and all of the criminal record variables. The authors describe this step as follows:

In a third model, we added the degree to which the inmates manifested Afrocentric features as a predictor of sentence length, controlling for the race of the inmates and the seriousness and number of offenses they had committed.

Here, the null hypothesis is that, holding constant criminal record and race, Afrocentric features is not related to sentence length. The alternative hypothesis is that, holding constant criminal record and race, Afrocentric features is related to sentence length.

Let’s calculate the rejection region for this hypothesis test. Because we’ll add an additional predictor, the df for testing this effect is 205 (we lose one additional df as compared to Model 2; n - 1 - 10 predictors).

qt(p = c(.025, .975), df = 205)
## [1] -1.971603  1.971603

If the test statistic for the afro_m slope is outside of this range (i.e., in the rejection region), then we’ll reject the null hypothesis.

Let’s fit the model.

mod3 <- sentence %>% 
  lm(lnyears ~ primlev_m + primlev2 + seclev_m + seclev2 + nsecond_m + priorlev_m + priorlev2 + nprior_m + 
       black_m + afro_m, 
     data = .)

mod3 %>% get_regression_table()
term estimate std_error statistic p_value lower_ci upper_ci
intercept 0.787 0.096 8.181 0.000 0.597 0.977
primlev_m 0.294 0.028 10.357 0.000 0.238 0.351
primlev2 0.039 0.010 3.831 0.000 0.019 0.059
seclev_m 0.036 0.021 1.725 0.086 -0.005 0.077
seclev2 0.021 0.008 2.676 0.008 0.006 0.037
nsecond_m 0.058 0.014 4.233 0.000 0.031 0.085
priorlev_m -0.012 0.055 -0.221 0.825 -0.121 0.097
priorlev2 0.004 0.012 0.325 0.746 -0.020 0.028
nprior_m 0.021 0.036 0.588 0.557 -0.049 0.091
black_m -0.161 0.071 -2.280 0.024 -0.300 -0.022
afro_m 0.092 0.040 2.306 0.022 0.013 0.171
mod3 %>% get_regression_summaries()
r_squared adj_r_squared mse rmse sigma statistic p_value df nobs
0.588 0.568 0.462319 0.6799404 0.698 29.276 0 10 216

Take a look at the coefficient for Afrocentric features (afro_m), the slope estimate is .092. The test statistic is 2.306 and the p-value is .022. The test statistic is in the rejection region (i.e., > 1.97), and therefore we reject the null hypothesis. In other words, we have evidence that Afrocentric features is related to sentence length in years. Notice that the \(R^2\) for Model 3 is .588. This represents an increase in \(R^2\) of .011 over Model 2 (.588 - .577).
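As a quick check on the link between the test statistic and the p-value, you can recover the two-tailed p-value for the afro_m slope from the reported t statistic and df using pt(); the result should match the p-value in the table, up to rounding of the test statistic.

# two-tailed p-value implied by t = 2.306 with df = 205
# (abs() is used so the same line works for negative test statistics)
2 * pt(abs(2.306), df = 205, lower.tail = FALSE)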

If interested, we can also calculate the PRE using the same approach as earlier with the code below. Essentially, there is about a 2.5% reduction in error when Afrocentric features is added to the model.

# model 2 error
mod2_error <- sum(mod2$residuals^2) 

# model 3 error
mod3_error <- sum(mod3$residuals^2) 

PRE = (mod2_error - mod3_error)/mod2_error

PRE
## [1] 0.02528978

The authors describe the effect as follows (notice that their values are just slightly different than ours here):

This analysis showed that Afrocentric features were a significant predictor of sentence length over and above the effects of the other factors, t(205) = 2.29, p < .025, PRE = .02.

The authors go on to revisit the effect of race, holding constant Afrocentric features. Notice that the test statistic for this slope (labeled black_m) is -2.280, which is in the rejection region (i.e., less than the -1.97 boundary we calculated for Model 3), and the p-value is therefore less than alpha (p = .024) – constituting evidence of a significant effect of race. The slope estimate is negative (-.161), indicating that Black inmates are given shorter sentences than White inmates, holding constant criminal record and Afrocentric features. As we did for Model 2, we can calculate the predicted sentence lengths for Black and White inmates, holding constant all other predictors (including Afrocentric features) at the mean in the sample. This time I will automate this process:

y_hat_white = .787 + (-.161*-1)
y_hat_white
## [1] 0.948
exp(y_hat_white)
## [1] 2.580543
y_hat_black = .787 + (-.161*1)
y_hat_black
## [1] 0.626
exp(y_hat_black)
## [1] 1.870115

Holding constant all predictors at the mean in the sample, our model predicts White inmates to have a sentence length of 2.6 years and Black inmates to have a sentence length of 1.9 years. The authors describe the effect as follows:

… with Afrocentric features in the model, race significantly predicted sentence length, t(205) = 2.28, p < .025, but in the direction opposite to what one might expect — with White inmates serving longer sentences than Black inmates.

Model 4

Models 4 and 5 constitute a type of sensitivity analysis. Here, the authors test how sensitive Model 3 (the primary model of interest) is to model assumptions and potential alternative explanations. Specifically, in Model 4 the authors seek to determine whether Model 3’s assumption that the effect of Afrocentric features on sentence length is the same for White and Black inmates is tenable. That is, Model 3 asserts a parallel slopes model – postulating that a higher score on Afrocentric features is related to a greater sentence length for both Black and White inmates. Recall from Chapter 6 of your textbook that the assumption of parallel slopes can be relaxed by including an interaction term – here, we can add an interaction between black_m and afro_m to determine if the slopes differ.

The authors explain their rationale as follows:

In a fourth model, we examined whether the impact of Afrocentric features was the same for Black and White inmates by testing the interaction between Afrocentric features and race.

By adding an interaction term to our model, we lose an additional df; therefore, we need to calculate the new rejection region:

qt(p = c(.025, .975), df = 204)
## [1] -1.971661  1.971661

Now, let’s fit the model.

mod4 <- sentence %>% 
  lm(lnyears ~ primlev_m + primlev2 + seclev_m + seclev2 + nsecond_m + priorlev_m + priorlev2 + nprior_m +
       black_m*afro_m, 
     data = .)

mod4 %>% get_regression_table()
term estimate std_error statistic p_value lower_ci upper_ci
intercept 0.807 0.114 7.057 0.000 0.581 1.032
primlev_m 0.293 0.029 10.166 0.000 0.236 0.350
primlev2 0.039 0.010 3.836 0.000 0.019 0.059
seclev_m 0.036 0.021 1.733 0.085 -0.005 0.078
seclev2 0.021 0.008 2.621 0.009 0.005 0.037
nsecond_m 0.058 0.014 4.198 0.000 0.031 0.085
priorlev_m -0.012 0.055 -0.220 0.826 -0.121 0.097
priorlev2 0.004 0.012 0.304 0.762 -0.021 0.028
nprior_m 0.021 0.036 0.586 0.559 -0.049 0.091
black_m -0.156 0.073 -2.143 0.033 -0.299 -0.012
afro_m 0.090 0.041 2.191 0.030 0.009 0.170
black_m:afro_m -0.014 0.042 -0.321 0.748 -0.097 0.070

The test statistic for the interaction term (labeled black_m:afro_m) is -.321, and the p-value is .748. The test statistic is not in the rejection region; thus, there is no evidence that different slopes are needed for the effect of Afrocentric features on sentence length. That is, the parallel slopes model appears to be sufficient. The authors describe this as follows:

This interaction did not approach significance (p > .70), thus suggesting that the plotted lines in Figure 1 really are parallel: The effects of Afrocentric features on residual sentence length within the two racial groups were statistically equivalent.
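If you’d like to see this visually, sjPlot’s plot_model() can plot the model-implied interaction; here is a sketch (the exact appearance of the plot depends on sjPlot’s defaults):

# Plot the model-implied effect of Afrocentric features separately by race.
# Consistent with the non-significant interaction, the two lines should look
# essentially parallel.
plot_model(mod4, type = "int")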

Model 5

In the final model, the authors consider two alternative variables that could explain away the effect of Afrocentric features – attractiveness and babyish features.

In this model, 12 predictors are included, leaving 203 df (216 - 1 - 12) and so the rejection region is calculated as:

qt(p = c(.025, .975), df = 203)
## [1] -1.971719  1.971719
mod5 <- sentence %>% 
  lm(lnyears ~ primlev_m + primlev2 + seclev_m + seclev2 + nsecond_m + priorlev_m + priorlev2 + nprior_m +
       black_m + 
       afro_m +  
       attract_m +
       babyface_m, 
     data = .)

mod5 %>% get_regression_table()
term estimate std_error statistic p_value lower_ci upper_ci
intercept 0.788 0.097 8.152 0.000 0.597 0.978
primlev_m 0.291 0.029 10.085 0.000 0.234 0.348
primlev2 0.039 0.010 3.776 0.000 0.018 0.059
seclev_m 0.036 0.021 1.732 0.085 -0.005 0.078
seclev2 0.022 0.008 2.677 0.008 0.006 0.038
nsecond_m 0.059 0.014 4.256 0.000 0.031 0.086
priorlev_m -0.012 0.055 -0.212 0.833 -0.121 0.097
priorlev2 0.004 0.012 0.321 0.748 -0.020 0.028
nprior_m 0.022 0.036 0.604 0.547 -0.049 0.092
black_m -0.163 0.071 -2.304 0.022 -0.303 -0.024
afro_m 0.096 0.041 2.328 0.021 0.015 0.177
attract_m -0.017 0.055 -0.309 0.758 -0.126 0.092
babyface_m 0.033 0.044 0.743 0.458 -0.055 0.121

The test statistics for the two alternative variables (attractiveness and babyish features) are not in the rejection region (test statistic = -.309 for attract_m and .743 for babyface_m); therefore, we do not find evidence that these variables are related to sentence length when holding all other variables constant. Moreover, Afrocentric features remains significantly related to sentence length (i.e., the test statistic, 2.328, is in the rejection region) even after we account for attractiveness and babyish features. This provides evidence that these alternative variables cannot explain away the effect of Afrocentric features.

Produce figures and tables of the results

As a last step, let’s create tables and figures to describe the results.

First, we can produce a figure that presents the results from the primary model (Model 3). In the Blair and colleagues paper, they use a table; here we will look at a dot-and-whisker plot of regression estimates using the plot_model() function from the sjPlot package. This is a great way to show results in a way that depicts both the point estimate and the confidence interval for each effect. If the confidence interval does not intersect the vertical line at 0 on the x-axis (denoting a null effect), then the effect is statistically significant (i.e., the null hypothesis for the effect can be rejected; in other words, 0 is not a plausible value for the effect).


Figure 1
Regression estimates for prediction of sentence length in Model 3.

plot_model(mod3, ci.lvl = .95, show.values = TRUE, show.p = FALSE, 
            axis.title = "Regression estimate and 95% confidence interval") 

Note. 95% confidence intervals that do not cross 0 are statistically significant (p < .05). Colors denote direction of effect, with red lines denoting predictors associated with a shorter sentence length and blue lines denoting predictors associated with a longer sentence length.


If you prefer the more traditional table, the tab_model() function from sjPlot can also create the table for you.


tab_model(mod1, mod3, 
          title = "Regression estimate and 95% confidence interval (CI) for Models 1 and 3") 
Regression estimate and 95% confidence interval (CI) for Models 1 and 3
  lnyears lnyears
Predictors Estimates CI p Estimates CI p
(Intercept) 0.80 0.61 – 0.99 <0.001 0.79 0.60 – 0.98 <0.001
primlev m 0.30 0.24 – 0.35 <0.001 0.29 0.24 – 0.35 <0.001
primlev2 0.04 0.02 – 0.06 <0.001 0.04 0.02 – 0.06 <0.001
seclev m 0.04 -0.01 – 0.08 0.089 0.04 -0.01 – 0.08 0.086
seclev2 0.02 0.01 – 0.04 0.009 0.02 0.01 – 0.04 0.008
nsecond m 0.06 0.03 – 0.09 <0.001 0.06 0.03 – 0.09 <0.001
priorlev m -0.01 -0.12 – 0.09 0.788 -0.01 -0.12 – 0.10 0.825
priorlev2 0.00 -0.02 – 0.03 0.710 0.00 -0.02 – 0.03 0.746
nprior m 0.02 -0.05 – 0.09 0.539 0.02 -0.05 – 0.09 0.557
black m -0.16 -0.30 – -0.02 0.024
afro m 0.09 0.01 – 0.17 0.022
Observations 216 216
R2 / R2 adjusted 0.576 / 0.559 0.588 / 0.568

Last, Blair and colleagues include a figure that presents the results of Models 2 (i.e., the model with race) and 3 (i.e., the model with race and Afrocentric features). We can create a similar figure using ggplot. This is somewhat advanced, and so if you aren’t ready to digest it or aren’t interested you can ignore the code. However, if you are interested, you can take a look at the code needed to reproduce the plot.


Figure 2
Model fitted results for Models 2 and 3

# create residual for lnyears
r.lnyears <- sentence %>% 
  lm(lnyears ~ primlev_m + primlev2 + seclev_m + seclev2 + nsecond_m + priorlev_m + priorlev2 + nprior_m, data = .) %>% 
  get_regression_points(ID = "id") %>% 
  select(id, residual) %>% 
  rename(resid.lnyears = residual) 

# create residual for Afrocentric features
r.afro <- sentence %>% 
  lm(afro_m ~ primlev_m + primlev2 + seclev_m + seclev2 + nsecond_m + priorlev_m + priorlev2 + nprior_m, data = .) %>% 
  get_regression_points(ID = "id") %>% 
  select(id, residual) %>% 
  rename(resid.afro = residual) 

# merge residuals together
plot_df <- sentence %>% 
  left_join(r.lnyears, by = "id") %>% 
  left_join(r.afro, by = "id") %>% 
  mutate(black.f = factor(black_m, levels = c(-1, 1), labels = c("White", "Black")))

# get hex codes for chosen palette
# RColorBrewer::brewer.pal(3, "Dark2")

# create plot
plot_df %>% 
  ggplot(aes(x = resid.afro, y = resid.lnyears)) +
  geom_point(aes(color = black.f)) +
  geom_hline(yintercept = -.044*-1, color = "#1B9E77", linetype = "dashed") + # line for whites in model 2
  geom_hline(yintercept = -.044*1, color = "#D95F02", linetype = "dashed") + # line for blacks in model 2
  geom_abline(intercept = -.161*-1, slope = .092, color = "#1B9E77") + # line for whites in model 3
  geom_abline(intercept = -.161*1, slope = .092, color = "#D95F02") + # line for blacks in model 3
  theme_bw() +
  labs(x = "Afrocentric features residual", y = "Total prison time residual",
       color = "Race") +
  scale_color_brewer(palette = "Dark2")

Note. Residualized sentence length and residualized Afrocentric features for Black (in orange) and White (in green) inmates. Dashed lines denote difference in sentence length between Black and White inmates. Solid lines denote fitted regression line for effect of Afrocentric features on sentence length from Model 3.


In this plot, the vertical difference in the dashed lines represents the difference in log sentence length between White and Black inmates adjusting for criminal record, but not adjusting for Afrocentric features. The difference in the solid lines represents the difference in log sentence length between White and Black inmates adjusting for criminal record and Afrocentric features. The solid lines representing the effect of Afrocentric features are parallel (i.e., a parallel slopes model), emphasizing that possessing more Afrocentric features is associated with a longer sentence length for both White and Black offenders.

Concluding remarks

The authors summarize the results as follows:

The results we have reported confirm both earlier research on the role of race in sentencing and our own work on stereotyping. As found previously, we observed little effect of race on sentencing in Florida: Black and White offenders, given equivalent criminal histories, were given roughly equivalent sentences. We suggest that the state’s efforts to ensure race neutrality in sentencing over the past 20 years have largely been successful. Our results are also consistent with the psychological literature showing that people can effectively reduce category-based stereotyping (Blair et al., in press; Wyer et al., 1998, 2000); it appears that judges have effectively learned to give sentences of the same length when Black and White offenders with equivalent criminal histories come before them. However, Afrocentric facial features were associated with sentence length, such that offenders who had equivalent criminal histories and came from the same racial group (Black or White) were given longer sentences the more Afrocentric their features. These findings are consistent with the results of our laboratory research showing that people use Afrocentric features to infer traits that are stereotypic of African Americans. It is important to remember that this form of stereotyping appears to occur without people’s awareness and outside their immediate control (Blair et al., 2002, in press). We suspect that, like our laboratory participants, judges were unaware of the fact that Afrocentric features might be influencing their decisions and were not effectively controlling the impact of such features.

The authors go on to say:

How large were the effects of Afrocentric features? One way to calibrate them is to derive predicted sentence lengths (for the mean levels of the criminal-history variables) for individuals within each race who were 1 standard deviation above and below the mean level of Afrocentric features for their racial group. These calculations indicate that individuals 1 standard deviation above their group mean would receive sentences 7 to 8 months longer than individuals 1 standard deviation below their group mean (for the same typical criminal record). This is clearly a meaningful difference.
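Below is one way to reconstruct that calculation from our Model 3 estimates (a sketch, not necessarily the authors’ exact method): compute the predicted sentence at 1 standard deviation below and above each group’s mean level of Afrocentric features, holding the criminal record variables at their means, and convert the difference to months.

# Group-specific mean and SD of Afrocentric features (on the original 1-9 scale)
afro_stats <- df %>% 
  group_by(black.f) %>% 
  summarise(black_m = first(black_m), m = mean(afro), s = sd(afro))

b <- coef(mod3)  # Model 3 estimates

# Predicted ln(years) at the group mean -/+ 1 SD of afro, all other predictors at 0,
# then the difference converted from years to months
afro_stats %>% 
  mutate(lnyears_lo = b["(Intercept)"] + b["black_m"] * black_m + 
           b["afro_m"] * ((m - s) - mean(df$afro)),
         lnyears_hi = b["(Intercept)"] + b["black_m"] * black_m + 
           b["afro_m"] * ((m + s) - mean(df$afro)),
         months_diff = (exp(lnyears_hi) - exp(lnyears_lo)) * 12)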

Rather than assessing the difference in years at just specific values of Afrocentric features (e.g., at 1 standard deviation below and above the mean as Blair and colleagues did), let’s create a graph that displays the predicted sentence length across the range of Afrocentric features for White and Black inmates. Some of the following code is new and a bit advanced, so if you aren’t interested or ready to dive deeper, you can ignore the code and just take a look at the resultant graphs.

First, I will create a prediction grid that allows afro_m to range from the lowest value to the highest value within each race (producing 10 scores between the lowest and highest value), and holds all prior record variables at the mean. The data_grid() function from the modelr package (installed with the tidyverse, but not attached by library(tidyverse), so we load it here) is useful for this. Then, using this prediction grid that contains a series of prototypical scores for the predictors, I will use our mod3 regression model to obtain the predicted lnyears (i.e., y-hat) for each set of prototypical scores. I will use the add_predictions() function to accomplish this part – it is also from modelr and works similarly to get_regression_points() in moderndive.

library(modelr)  # provides data_grid() and add_predictions(); not attached by library(tidyverse)

pred_grid <- 
  sentence %>% 
  group_by(black_m) %>% 
  data_grid(.model = mod3, 
            afro_m = seq_range(afro_m, 10),
            primlev_m = 0, primlev2 = 0, 
            seclev_m = 0, seclev2 = 0, nsecond_m = 0, 
            priorlev_m = 0, priorlev2 = 0, nprior_m = 0) %>% 
  ungroup() %>% 
  mutate(black.f = factor(black_m, levels = c(-1, 1), labels = c("White", "Black"))) %>% 
  add_predictions(mod3) %>% 
  rename(lnyears_hat = pred)

pred_grid
black_m afro_m primlev_m primlev2 seclev_m seclev2 nsecond_m priorlev_m priorlev2 nprior_m black.f lnyears_hat
-1 -3.0437211 0 0 0 0 0 0 0 0 White 0.6674528
-1 -2.4373711 0 0 0 0 0 0 0 0 White 0.7233644
-1 -1.8310211 0 0 0 0 0 0 0 0 White 0.7792759
-1 -1.2246711 0 0 0 0 0 0 0 0 White 0.8351875
-1 -0.6183211 0 0 0 0 0 0 0 0 White 0.8910991
-1 -0.0119711 0 0 0 0 0 0 0 0 White 0.9470107
-1 0.5943789 0 0 0 0 0 0 0 0 White 1.0029223
-1 1.2007289 0 0 0 0 0 0 0 0 White 1.0588339
-1 1.8070789 0 0 0 0 0 0 0 0 White 1.1147454
-1 2.4134289 0 0 0 0 0 0 0 0 White 1.1706570
1 -2.2151411 0 0 0 0 0 0 0 0 Black 0.4219884
1 -1.5992689 0 0 0 0 0 0 0 0 Black 0.4787780
1 -0.9833967 0 0 0 0 0 0 0 0 Black 0.5355677
1 -0.3675244 0 0 0 0 0 0 0 0 Black 0.5923573
1 0.2483478 0 0 0 0 0 0 0 0 Black 0.6491469
1 0.8642200 0 0 0 0 0 0 0 0 Black 0.7059365
1 1.4800922 0 0 0 0 0 0 0 0 Black 0.7627262
1 2.0959644 0 0 0 0 0 0 0 0 Black 0.8195158
1 2.7118367 0 0 0 0 0 0 0 0 Black 0.8763054
1 3.3277089 0 0 0 0 0 0 0 0 Black 0.9330950

Then, we can use the prediction grid and fitted y-hats to create a plot:

pred_grid %>% 
  ggplot(aes(x = afro_m, y = lnyears_hat, group = black.f, color = black.f)) +
  geom_line() +
  theme_bw() +
  labs(title = "Predicted log sentence length by Afrocentric features for White and Black inmates",
       subtitle = "All criminal record variables are held at the mean in the sample",
       color = "Race",
       x = "Afrocentric features (centered at the mean)",
       y = "Predicted sentence length in ln(years)")



Rather than presenting the predicted scores for log sentence length, it is probably more useful for a viewer of our graph to exponentiate the predicted scores in ln(years) to put them back on the metric of actual years. Or better yet, we can convert years to months. We can accomplish this with a small modification to our code (notice the two mutate steps):

pred_grid %>% 
  mutate(years_hat = exp(lnyears_hat)) %>% 
  mutate(months_hat = years_hat*12) %>% 
  ggplot(aes(x = afro_m, months_hat, group = black.f, color = black.f)) +
  geom_line() +
  theme_bw() +
  scale_y_continuous(breaks = seq(0, 48, 2)) +
  scale_x_continuous(breaks = seq(-3, 3, 1)) +
  labs(title = "Predicted sentence length by Afrocentric features for White and Black inmates",
       subtitle = "All criminal record variables are held at the mean in the sample",
       color = "Race",
       x = "Afrocentric features (centered at the mean)",
       y = "Predicted sentence length in months")



This graph allows us to see how sentence length varied for White and Black offenders depending on their Afrocentric features, holding constant all criminal record variables at the mean in the sample. For example, let’s consider Black offenders (the blue line). If a Black offender had a low score for Afrocentric features (e.g., 2 units below the mean), our model predicts they will receive a sentence length of about 19 months. But, if they had a high score for Afrocentric features (e.g., 2 units above the mean), our model predicts they will receive a sentence length of about 27 months. Use the graph to find these points along the blue line to help yourself understand. Similar results are observed for White offenders. If a White offender had a low score for Afrocentric features (e.g., 2 units below the mean), our model predicts they will receive a sentence length of about 26 months. But, if they had a high score for Afrocentric features (e.g., 2 units above the mean), our model predicts they will receive a sentence length of about 37 months.
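If you want to confirm the values read off the graph, here is a quick spot check (a sketch that reuses add_predictions() from modelr): compute the predicted months at 2 units below and above the mean of Afrocentric features for each group, with the criminal record variables held at their means.

# Predicted sentence length in months at afro_m = -2 and +2 for each race,
# with all criminal record predictors held at their (centered) mean of 0
tibble(black_m = c(-1, -1, 1, 1),
       afro_m  = c(-2, 2, -2, 2),
       primlev_m = 0, primlev2 = 0, seclev_m = 0, seclev2 = 0, nsecond_m = 0,
       priorlev_m = 0, priorlev2 = 0, nprior_m = 0) %>% 
  add_predictions(mod3) %>% 
  mutate(months_hat = exp(pred) * 12)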

In concluding the paper, the authors provide an excellent commentary on how the results of their study impact society:

Taking the results as a whole, some readers might be tempted to say that the picture is fairly positive. Race is not being used in sentencing decisions, and, if anything, the minority group is coming out ahead (i.e., when Afrocentric features are equated). But such a conclusion is a serious misinterpretation of our results. Racial stereotyping in sentencing decisions is still going on, but it is not a function of the racial category of the individual. Instead, there is perhaps an equally pernicious and less controllable process at work. The racial stereotyping in sentencing that is now occurring is based on the facial appearance of offenders. Be they White or Black, offenders who possess more Afrocentric features are receiving harsher sentences for the same crimes, compared with less Afrocentric-looking offenders. Our research shows that addressing one form of bias does not guarantee that the other will also be eliminated. Both must be considered to achieve a fair and equitable society.

I think this paper by Irene Blair and colleagues is an excellent example of how to tell your story with data. They conducted a study that addressed a meaningful research question, and their work was effective in uncovering a hidden bias in the judicial system.

Check model adequacy

In Module 11, I briefly introduced techniques to check model assumptions and model adequacy. Recall that a linear regression model makes four primary assumptions:

  1. The relationship between the predictor(s) (x) and the outcome (y) is assumed to be linear. That is, a straight line drawn through the scatterplot provides a good representation of the relationship.

  2. The residuals resulting from the fitted model are assumed to be normally distributed.

  3. The residuals resulting from the fitted model are assumed to have a constant variance – also called homoscedasticity.

  4. The residuals are independent of one another.

I also mentioned that evaluating model fit and any potential violation of assumptions takes a good deal of practice, and that it is best to begin by working alongside a mentor. To further expose you to these steps, let’s take a look at the model assumptions figure produced by the check_model() function in the performance package. We’ll do this for Model 3, the primary model of interest.

mod3 %>% check_model()

There are 6 panels in the output.

The bottom right panel shows diagnostics for the assumption that the residuals are normally distributed. We want our model residuals to follow the shape of the green line. If they do, then this indicates that the residuals are normally distributed. This plot looks pretty good – the residuals (i.e., the dots) follow the green line quite closely. In the top left panel, we need to compare the density of the observed outcome scores (lnyears) in green with the model-fitted scores in blue. We hope that the blue lines will resemble the green line – and here we see that is the case.

The middle left panel shows diagnostics for homogeneity of variance. As we scan across the fitted values (i.e., y-hat, or the predicted values), we hope to see that the residuals are evenly distributed above and below the horizontal line – across the whole distribution of the fitted values. For our model, this plot also looks pretty good. A problem would be apparent if the data points formed a funnel or some other distinct shape rather than an even band.

The top right panel can be used to check the linearity assumption. A flat green line overlaying the dashed horizontal line at 0 is what we’re hoping for if the linearity assumption is met. If the green line is curved, then we may have a problem with our model assumptions. There does seem to be some curve to the green line in this model; however, there are few points at the high end of the fitted values (i.e., toward the right of the plot), and most of the data points fall in the region where the line is flat and straight. The linearity assumption is likely met in these data.

The middle right panel checks for influential observations. On occasion, one or more cases (e.g., people) in your data will have very unusual scores. These cases can have a very large influence (i.e., leverage) on the regression model estimates. We should investigate these sorts of cases to determine if there is a mistake in their scores. The plot considers two measures – leverage and residuals. In these data, inmate 204 has a very large leverage score. Let’s take a look at their scores:

df %>% filter(row_number() == 204)
id years black afro primlev seclev nsecond anysec priorlev nprior anyprior attract babyface lnyears black.f primlev_m primlev2 seclev_m seclev2 priorlev_m priorlev2 nsecond_m nprior_m afro_m babyface_m attract_m black_m
204 9.42 0 3.20588 5 5.33 41 1 0 0 0 2.94118 3.32353 2.242835 White -1.550926 2.405371 1.929088 3.72138 -1.430278 2.045695 38.59259 -0.9490741 -1.323551 -0.7141646 -0.2663515 -1

Scanning through this person’s data, we see that they have an extremely large number of secondary offenses (the variable nsecond equals 41 for this individual, meaning 41 secondary offenses were committed) – the next highest score for nsecond in the dataframe is 18. If we had access to the records, we could verify whether this score is correct, or whether it is perhaps an error. If it is indeed correct, we could log transform this variable (like we did for sentence length) and refit the model to determine if the results change with this modification, as sketched below.
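For interested students, here is a sketch of that refit (assuming a log(x + 1) transformation because nsecond contains zeros; this is one reasonable choice, not necessarily what the authors would have done):

# Sensitivity check: refit Model 3 with a log-transformed (and re-centered)
# count of secondary offenses in place of nsecond_m.
mod3_sens <- df %>% 
  mutate(lnnsecond_m = log1p(nsecond) - mean(log1p(nsecond))) %>% 
  lm(lnyears ~ primlev_m + primlev2 + seclev_m + seclev2 + lnnsecond_m + 
       priorlev_m + priorlev2 + nprior_m + black_m + afro_m, 
     data = .)

mod3_sens %>% get_regression_table()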

The bottom left panel presents the variance inflation factor (VIF) for each predictor – this is a measure of potential multicollinearity. If the predictors are too highly correlated, modeling issues can occur. In this panel, we’re hoping to find low (green) or moderate (blue) VIFs for each predictor. In the model we are checking, no predictors have a high VIF.
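The performance package also offers each of these diagnostics as an individual function, which can be handy when you want the numeric test results rather than the combined figure; a brief sketch:

# Individual diagnostic checks for Model 3
check_normality(mod3)          # normality of residuals
check_heteroscedasticity(mod3) # constant variance of residuals
check_collinearity(mod3)       # variance inflation factors
check_outliers(mod3)           # influential observations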

Essentially, at this stage of model fitting, we are trying to determine if the model fit is adequate, and if modifications to the model assumptions would change our conclusions. Surely, we don’t want to publish work unless we feel confident that the model is tenable and as accurate as possible. Therefore, it is important to carefully consider the assumptions of any model before sharing the results with your audience, and determine if they are likely met.

Wrap up

This concludes Module 12 – and in fact, all of the PSY350 content. Congratulations! I hope that you enjoyed learning about statistical modeling in Psychology, and that you now see the utility of Data Science and the potential role that you could play in this field.