In this vignette, we will perform a small simulation study to test whether the model fitting engines implemented in the thurstonianIRT package are able to recover known parameter values from a simulated data set. This also extends the automated unit tests implemented in the package to check the correctness of the software. For a more extensive simulation study using thurstonianIRT, see Bürkner, Schulte, and Holling (2019).
First, we will load a bunch of packages required in the vignette.
Next, we will set up the simulation condition.
Now, we will choose a set of true parameter values.
Finally, we put all of this together and simulate a data set based on the condion and true parameter values.
We know that the chosen simulation condition and parameter values are well behaved (this may not be the case in all situations; see Bürkner, Schulte, & Holling, 2019). Accordingly, the model fitting engines should show good parameter recovery given that they are correctly implemented. We will now go ahead and fit the model with all three engines.
We can now predict the trait scores of all persons and compare them to the true scores, which are stored in the simulated data set.
eta <- as_tibble(as.data.frame(attributes(sdata)$eta)) names(eta) <- paste0("trait", 1:ncol(eta)) true_scores <- eta %>% mutate(id = 1:n()) %>% gather(key = "trait", value = "truth", -id) true_summaries <- true_scores %>% group_by(trait) %>% summarise(true_mean = mean(truth), true_sd = sd(truth)) pred <- predict(fit_stan) %>% bind_rows(predict(fit_lavaan), predict(fit_mplus), .id = "source") %>% mutate( source = as.character(factor( source, levels = 1:3, labels = c("stan", "lavaan", "mplus") )), trait = tolower(trait) ) %>% inner_join(true_scores, by = c("id", "trait")) pred <- pred %>% inner_join( pred %>% group_by(trait, source) %>% summarise(cor_est_truth = cor(estimate, truth)), by = c("trait", "source") ) %>% mutate( sign = sign(cor_est_truth), estimate = ifelse(sign %in% -1, -estimate, estimate) ) %>% inner_join(true_summaries, by = "trait") %>% group_by(trait, source) %>% mutate( est_mean = mean(estimate), est_sd = sd(estimate) ) %>% ungroup() %>% mutate( ztruth = (truth - true_mean) / true_sd, zestimate = (estimate - est_mean) / est_sd )
Various measures of model predictive accuracy can be computed, for instance, the reliability. For the present simulation condition, we would expect the reliability to be roughly between 0.85 and 0.9 for all traits. By looking at the results below, we can verify that this is indeed that case.
# A tibble: 15 x 3 # Groups: trait  trait source rel <chr> <chr> <dbl> 1 trait1 lavaan 0.780 2 trait1 mplus 0.821 3 trait1 stan 0.821 4 trait2 lavaan 0.817 5 trait2 mplus 0.855 6 trait2 stan 0.857 7 trait3 lavaan 0.793 8 trait3 mplus 0.840 9 trait3 stan 0.840 10 trait4 lavaan 0.773 11 trait4 mplus 0.802 12 trait4 stan 0.803 13 trait5 lavaan 0.831 14 trait5 mplus 0.857 15 trait5 stan 0.858
Lastly, we can also compute and investigate the trait correlations between different engines so check whether they really estimate equivalent trait scores.
We would expect the correlations of the estimates of the same trait across engines to be very high, that is, higher than 0.95 at least. We can verify that this is indeed the case as exemplified for
stan_trait1 lavaan_trait1 mplus_trait1 stan_trait1 1.00 0.98 1.00 lavaan_trait1 0.98 1.00 0.97 mplus_trait1 1.00 0.97 1.00
Taken together, we have seen how to set up and perform a simple simulation study to test the parameter recovery of Thurstonian IRT models and, at the same time, test the correctness of the thurstonianIRT software.
Bürkner P. C., Schulte N., & Holling H. (2019). On the Statistical and Practical Limitations of Thurstonian IRT Models. Educational and Psychological Measurement. 79(5). 827–854.