**mecor** is an R package for Measurement Error CORrection. **mecor** implements measurement error correction methods for linear models with continuous outcomes. The measurement error can either occur in a continuous covariate or in the continuous outcome. This vignette discusses covariate measurement error correction by means of *standard regression calibration* in **mecor**.

*Regression calibration* is one of the most popular measurement error correction methods for covariate measurement error. This vignette shows how *standard regression calibration* is applied in an internal validation study, a replicates study, a calibration study and an external validation study. Each of the four studies is introduced in the subsequent sections, along with examples of how *standard regression calibration* can be applied in each of them using **mecor**'s function `mecor()`.

In all four studies, our interest lies in estimating the association between a continuous reference exposure \(X\) and a continuous outcome \(Y\), given covariates \(Z\). Instead of \(X\), the error-prone substitute exposure \(X^*\) is measured.
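To fix ideas, the setting in which *standard regression calibration* is typically motivated is a linear outcome model combined with a measurement error model for the substitute measure; one common special case (assumed in the illustrations in this vignette) is classical ('random') error:

\[
Y = \beta_0 + \beta_X X + \beta_Z Z + \varepsilon, \qquad X^* = X + U,
\]

where the error \(U\) has mean zero and is independent of \(X\), \(Z\) and \(\varepsilon\). Regression calibration replaces the unobserved \(X\) by an estimate of \(E[X \mid X^*, Z]\) from a calibration model, and then regresses \(Y\) on that prediction and \(Z\).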

The simulated data set `icvs` in **mecor** is an internal covariate-validation study. An internal validation study includes a subset of individuals for whom the reference exposure \(X\) is observed. The data set `icvs` contains 1000 observations of the outcome \(Y\), the error-prone exposure \(X^*\) and the covariate \(Z\). The reference exposure \(X\) is observed in approximately 25% of the individuals in the study.

```
# load internal covariate validation study
data("icvs", package = "mecor")
head(icvs)
#>           Y     X_star          Z          X
#> 1 -3.473164 -0.2287010 -1.5858049         NA
#> 2 -3.327934 -1.3320494 -0.6077454         NA
#> 3  1.314735  2.0305727  0.4461727  2.2256377
#> 4  1.328727  0.3027101  0.1739813         NA
#> 5  1.240446 -0.8465389  1.5480392 -0.7521792
#> 6  3.183868  0.1081888  1.1230232         NA
```

When ignoring the measurement error in \(X^*\), one would naively regress \(Y\) on \(X^*\) and \(Z\). This results in a biased estimate of the exposure-outcome association:

```
# naive estimate of the exposure-outcome association
lm(Y ~ X_star + Z, data = icvs)
#>
#> Call:
#> lm(formula = Y ~ X_star + Z, data = icvs)
#>
#> Coefficients:
#> (Intercept)    X_star         Z
#>    -0.03947   0.41372   2.08457
```

Alternatively, one could perform an analysis restricted to the internal validation set:

```
# analysis restricted to the internal validation set
lm(Y ~ X + Z, data = subset(icvs, !is.na(X)))
#>
#> Call:
#> lm(formula = Y ~ X + Z, data = subset(icvs, !is.na(X)))
#>
#> Coefficients:
#> (Intercept)         X         Z
#>    -0.07297   0.44316   1.87832
```

Although the above yields an unbiased estimate of the exposure-outcome association, approximately 75% of the data are discarded. Instead of restricting the analysis to the internal validation set, you could use *standard regression calibration* to correct for the measurement error in \(X^*\). The following code chunk shows *standard regression calibration* with `mecor()`:

```
mecor(formula = Y ~ MeasError(substitute = X_star, reference = X) + Z,
      data = icvs,
      method = "standard", # defaults to "standard"
      B = 0) # defaults to 0
#>
#> Call:
#> mecor(formula = Y ~ MeasError(substitute = X_star, reference = X) +
#>     Z, data = icvs, method = "standard", B = 0)
#>
#> Coefficients Corrected Model:
#> (Intercept)            X            Z
#> -0.03730093   0.49290832   2.00122724
#>
#> Coefficients Uncorrected Model:
#> (Intercept)       X_star            Z
#> -0.03946702   0.41371614   2.08457045
```
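For intuition about what the correction does, the point estimates of *standard regression calibration* can be reproduced with a "predict-and-regress" sketch: estimate the calibration model in the validation subset, predict \(X\) for everyone, and regress \(Y\) on the predictions. The code below is an illustration only; it uses simulated data with hypothetical variable names (not the `icvs` data set) and assumes classical measurement error:

```r
# Sketch: standard regression calibration "by hand" on simulated data
set.seed(1)
n <- 1000
Z <- rnorm(n)
X <- rnorm(n) + Z                  # true exposure, correlated with Z
Y <- 0.5 * X + 2 * Z + rnorm(n)    # outcome model, true slope 0.5
X_star <- X + rnorm(n)             # substitute with classical error
in_val <- seq_len(n) <= 250        # internal validation subset (~25%)

# step 1: calibration model, estimated in the validation subset only
calmod <- lm(X ~ X_star + Z, subset = in_val)
# step 2: predict X for everyone, then regress Y on the predictions
X_hat <- predict(calmod, newdata = data.frame(X_star = X_star, Z = Z))
coef(lm(Y ~ X_hat + Z))            # X_hat slope approximates the true 0.5
```

Note that, unlike `mecor()`, this sketch gives no valid standard errors: the uncertainty in the calibration model is ignored.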

As shown in the above code chunk, the `mecor()` function needs a *formula* argument, a *data* argument, a *method* argument and a *B* argument. Presumably, you are familiar with the structure of a formula in R. The only thing that differs here is the use of a `MeasError()` object in the formula. A `MeasError()` object is used to declare the substitute measure, in our case \(X^*\), and the reference measure, in our case \(X\). The *B* argument of `mecor()` is used to calculate bootstrap confidence intervals for the corrected coefficients of the model. Let us construct 95% confidence intervals using the bootstrap with 999 replicates:

```
# save corrected fit
rc_fit <-
  mecor(formula = Y ~ MeasError(substitute = X_star, reference = X) + Z,
        data = icvs,
        method = "standard", # defaults to "standard"
        B = 999) # defaults to 0
```

Print the confidence intervals to the console using `summary()`:

```
summary(rc_fit)
#>
#> Call:
#> mecor(formula = Y ~ MeasError(substitute = X_star, reference = X) +
#>     Z, data = icvs, method = "standard", B = 999)
#>
#> Coefficients Corrected Model:
#>              Estimate       SE SE (btstr)
#> (Intercept) -0.037301 0.035943   0.035086
#> X            0.492908 0.036959   0.037753
#> Z            2.001227 0.051999   0.053961
#>
#> 95% Confidence Intervals:
#>              Estimate       LCI      UCI LCI (btstr) UCI (btstr)
#> (Intercept) -0.037301 -0.107748 0.033147   -0.103395    0.033592
#> X            0.492908  0.420470 0.565347    0.423385    0.571515
#> Z            2.001227  1.899311 2.103143    1.889610    2.104471
#> Bootstrap Confidence Intervals are based on 999 bootstrap replicates using percentiles
#>
#> The measurement error is corrected for by application of regression calibration
#>
#> Coefficients Uncorrected Model:
#>              Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.039467   0.033355 -1.1832    0.237
#> X_star       0.413716   0.028883 14.3239   <2e-16
#> Z            2.084570   0.044323 47.0316   <2e-16
#>
#> 95% Confidence Intervals:
#>              Estimate       LCI      UCI
#> (Intercept) -0.039467 -0.104921 0.025987
#> X_star       0.413716  0.357038 0.470394
#> Z            2.084570  1.997594 2.171547
#>
#> Residual standard error: 1.052587 on 997 degrees of freedom
```

Two types of 95% confidence intervals are shown in the output of the `summary()` object: bootstrap confidence intervals and delta method confidence intervals. The delta method is the default method for constructing confidence intervals in **mecor**. In addition, Fieller method and zero-variance method confidence intervals can be constructed with `summary()`:

```
# fieller method ci and zero variance method ci and se's for 'rc_fit'
summary(rc_fit, zerovar = TRUE, fieller = TRUE)
#>
#> Call:
#> mecor(formula = Y ~ MeasError(substitute = X_star, reference = X) +
#>     Z, data = icvs, method = "standard", B = 999)
#>
#> Coefficients Corrected Model:
#>              Estimate       SE SE (btstr) SE (zerovar)
#> (Intercept) -0.037301 0.035943   0.035086     0.033365
#> X            0.492908 0.036959   0.037753     0.034412
#> Z            2.001227 0.051999   0.053961     0.048431
#>
#> 95% Confidence Intervals:
#>              Estimate       LCI      UCI LCI (btstr) UCI (btstr) LCI (zerovar)
#> (Intercept) -0.037301 -0.107748 0.033147   -0.103395    0.033592     -0.102695
#> X            0.492908  0.420470 0.565347    0.423385    0.571515      0.425463
#> Z            2.001227  1.899311 2.103143    1.889610    2.104471      1.906303
#>             UCI (zerovar) LCI (fieller) UCI (fieller)
#> (Intercept)      0.028093            NA            NA
#> X                0.560354      0.421772      0.566887
#> Z                2.096151            NA            NA
#> Bootstrap Confidence Intervals are based on 999 bootstrap replicates using percentiles
#>
#> The measurement error is corrected for by application of regression calibration
#>
#> Coefficients Uncorrected Model:
#>              Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.039467   0.033355 -1.1832    0.237
#> X_star       0.413716   0.028883 14.3239   <2e-16
#> Z            2.084570   0.044323 47.0316   <2e-16
#>
#> 95% Confidence Intervals:
#>              Estimate       LCI      UCI
#> (Intercept) -0.039467 -0.104921 0.025987
#> X_star       0.413716  0.357038 0.470394
#> Z            2.084570  1.997594 2.171547
#>
#> Residual standard error: 1.052587 on 997 degrees of freedom
```

Fieller method confidence intervals are only constructed for the corrected covariate (in this case \(X\)).

The simulated data set `rs` in **mecor** is a replicates study. A replicates study includes a subset of individuals in whom the error-prone substitute exposure is measured repeatedly. The data set `rs` contains 1000 observations of the outcome \(Y\), three replicate measures \(X_1^*\), \(X_2^*\) and \(X_3^*\) of the error-prone exposure, and two covariates \(Z_1\) and \(Z_2\). It is assumed that the measurement error in the replicate measures is 'random', i.e., classical measurement error.

```
# load replicates study
data("rs", package = "mecor")
head(rs)
#>            Y  X_star_1  X_star_2   X_star_3         Z1         Z2
#> 1  4.3499125 2.3579436  2.109616  2.5476139  1.3052590  1.0955135
#> 2  1.5012541 2.4388260  2.951810  1.1754237  0.5207398  0.1326150
#> 3  0.8249127 2.1473954  1.640624  1.1649830 -0.2035013 -0.2137111
#> 4  5.5258514 2.8678562  2.547374  3.7509709  2.2994651  0.3441749
#> 5  1.9177871 1.1964137  2.280420  2.1699911 -0.2930064  1.9122899
#> 6 -1.0732459 0.8809592 -1.246339 -0.0832355 -0.8107538  0.7414892
```

When ignoring the measurement error in \(X_1^*\), one would naively regress \(Y\) on \(X_1^*\), \(Z_1\) and \(Z_2\), which results in a biased estimate of the exposure-outcome association:

```
# naive estimate of the exposure-outcome association
lm(Y ~ X_star_1 + Z1 + Z2, data = rs)
#>
#> Call:
#> lm(formula = Y ~ X_star_1 + Z1 + Z2, data = rs)
#>
#> Coefficients:
#> (Intercept)  X_star_1        Z1        Z2
#>    -0.03821   0.44014   2.04397   1.08130
```

Alternatively, one could use the mean of the three replicate measures as the exposure. Yet, this would still lead to a biased estimate of the exposure-outcome association:

```
## calculate the mean of the three replicate measures
rs$X_star_123 <- with(rs, rowMeans(cbind(X_star_1, X_star_2, X_star_3)))
# naive estimate of the exposure-outcome association version 2
lm(Y ~ X_star_123 + Z1 + Z2, data = rs)
#>
#> Call:
#> lm(formula = Y ~ X_star_123 + Z1 + Z2, data = rs)
#>
#> Coefficients:
#> (Intercept)  X_star_123        Z1        Z2
#>    -0.05795     0.48892   2.00379   1.03905
```

For an unbiased estimate of the exposure-outcome association, one could apply regression calibration using `mecor()`:

```
mecor(formula = Y ~ MeasError(X_star_1, replicate = cbind(X_star_2, X_star_3)) + Z1 + Z2,
      data = rs)
#>
#> Call:
#> mecor(formula = Y ~ MeasError(X_star_1, replicate = cbind(X_star_2,
#>     X_star_3)) + Z1 + Z2, data = rs)
#>
#> Coefficients Corrected Model:
#> (Intercept)  cor_X_star_1           Z1           Z2
#> -0.07416626    0.54994384   1.95079727   0.98285575
#>
#> Coefficients Uncorrected Model:
#> (Intercept)     X_star_1           Z1           Z2
#> -0.03821319   0.44013564   2.04396587   1.08129753
```

Instead of the *reference* argument of the `MeasError()` object, the *replicate* argument is now used. Standard errors of the regression calibration estimator and confidence intervals can be constructed in the same way as shown for the internal validation study.
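For intuition about what the replicate information buys you: with classical error, the within-person variation of the replicates identifies the error variance, and in a model without covariates the correction reduces to dividing the naive slope by an estimated attenuation factor. The code below is an illustration only, on simulated data with hypothetical variable names and no covariates; `mecor()` handles the general case with covariates:

```r
# Sketch: moment-based correction using two replicates, no covariates
set.seed(2)
n <- 1000
X <- rnorm(n)                      # true exposure
Y <- 0.5 * X + rnorm(n)            # outcome model, true slope 0.5
X_star_1 <- X + rnorm(n)           # replicate 1, classical error
X_star_2 <- X + rnorm(n)           # replicate 2, independent error

X_bar <- (X_star_1 + X_star_2) / 2
sigma2_u <- var(X_star_1 - X_star_2) / 2      # per-replicate error variance
lambda <- 1 - (sigma2_u / 2) / var(X_bar)     # attenuation of the mean of 2 replicates
coef(lm(Y ~ X_bar))["X_bar"] / lambda         # corrected slope, near the true 0.5
```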

The simulated data set `ccs` in **mecor** is a covariate calibration study. In a calibration study, two types of measurements are used to measure the exposure: a measurement method prone to 'systematic' error and a measurement method prone to 'random' error. The measurement prone to 'systematic' error is observed in the full study; the measurement prone to 'random' error is observed in a subset of the study and is measured repeatedly. The data set `ccs` contains 1000 observations of the outcome \(Y\), the substitute exposure \(X^{*^s}\) prone to 'systematic' error, the covariate \(Z\), and two replicates \(X^*_1\) and \(X^*_2\) of the substitute exposure prone to 'random' error. The two replicate measures are observed in the first 500 individuals of the study population.

```
# load calibration study
data("ccs", package = "mecor")
head(ccs)
#>            Y      X_star          Z    X_star_1   X_star_2
#> 1 -0.3406228 -1.53874595 -0.0936835 0.031424096 -1.1768822
#> 2 -2.3789052  0.05586177 -0.3031960 0.008015595  0.1158643
#> 3  0.9907555  2.24113988  1.1151805 0.774927602  0.9707306
#> 4  3.0693600  3.71006822  1.0004028 2.406897833  1.2369752
#> 5  5.1141197  3.20337227  1.6738982 1.399716377  0.8444616
#> 6  2.7177172  2.10226409  1.1320830 1.141724967  1.3529544
```

When ignoring the measurement error in \(X^{*^s}\), one would naively regress \(Y\) on \(X^{*^s}\) and \(Z\), which results in a biased estimate of the exposure-outcome association:

```
## uncorrected regression
lm(Y ~ X_star + Z, data = ccs)
#>
#> Call:
#> lm(formula = Y ~ X_star + Z, data = ccs)
#>
#> Coefficients:
#> (Intercept)   X_star        Z
#>     0.02212  0.22726  2.10578
```

Alternatively, one could restrict the analysis to the first half of the study population and use the mean of the two replicate measures as the exposure. Yet, this would still lead to a biased estimate of the exposure-outcome association:

```
## calculate the mean of the two replicate measures
ccs$X_star_12 <- with(ccs, rowMeans(cbind(X_star_1, X_star_2)))
## uncorrected regression version 2
lm(Y ~ X_star_12 + Z, data = ccs)
#>
#> Call:
#> lm(formula = Y ~ X_star_12 + Z, data = ccs)
#>
#> Coefficients:
#> (Intercept)  X_star_12         Z
#>    -0.06003    0.46183   2.09969
```

For an unbiased estimate of the exposure-outcome association, one could apply *standard regression calibration* using `mecor()`:

```
mecor(formula = Y ~ MeasError(X_star, replicate = cbind(X_star_1, X_star_2)) + Z,
      data = ccs)
#>
#> Call:
#> mecor(formula = Y ~ MeasError(X_star, replicate = cbind(X_star_1,
#>     X_star_2)) + Z, data = ccs)
#>
#> Coefficients Corrected Model:
#> (Intercept)  cor_X_star           Z
#>  0.03148803  0.47565243  2.10394044
#>
#> Coefficients Uncorrected Model:
#> (Intercept)      X_star           Z
#>  0.02212343  0.22725887  2.10577516
```

The attentive reader may have noticed that the subset of 500 individuals in whom \(X_1^*\) and \(X_2^*\) were observed resembles a replicates study. *Standard regression calibration* could therefore alternatively be applied as follows:

```
mecor(formula = Y ~ MeasError(X_star_1, replicate = X_star_2) + Z,
      data = subset(ccs, !is.na(X_star_2)))
#>
#> Call:
#> mecor(formula = Y ~ MeasError(X_star_1, replicate = X_star_2) +
#>     Z, data = subset(ccs, !is.na(X_star_2)))
#>
#> Coefficients Corrected Model:
#> (Intercept)  cor_X_star_1            Z
#> -0.05777153    0.51851033   2.04627765
#>
#> Coefficients Uncorrected Model:
#> (Intercept)     X_star_1            Z
#> -0.06288496   0.43184108   2.12827829
```

Standard errors of the regression calibration estimator and confidence intervals can be constructed in the same way as shown for the internal validation study.

The simulated data set `ecvs` in **mecor** is an external covariate-validation study. An external validation study is used when the main study contains no information that can be used to correct for the measurement error in the error-prone exposure \(X^*\). Suppose, for example, that \(X\) is not observed in the internal covariate-validation study `icvs`. An external validation study is a (usually small) sub-study containing observations of the reference measure \(X\), the error-prone substitute measure \(X^*\) and the covariate \(Z\) of the original study. The external validation study is used to estimate the calibration model, which is subsequently used to correct for the measurement error in the main study.

```
# load external covariate validation study
data("ecvs", package = "mecor")
head(ecvs)
#>            X     X_star           Z
#> 1 -1.8177022 -1.0599121 -0.90610225
#> 2 -0.6256011 -1.1090224 -2.02664356
#> 3  1.9752408  1.3199730  1.37740119
#> 4  0.3140878 -0.1107001  0.24907466
#> 5  1.3191880  1.6079029 -0.52663051
#> 6 -0.6149009 -0.7632676  0.03778671
data("icvs", package = "mecor")
```

Suppose the reference measure \(X\) is not observed in the data set `icvs`. To correct the bias in the naive association between exposure \(X^*\) and outcome \(Y\) given \(Z\) using the external validation study, one can proceed as follows with `mecor()`:

```
# Estimate the calibration model in the external validation study
calmod <- lm(X ~ X_star + Z, data = ecvs)
# Use the calibration model for measurement error correction:
mecor(Y ~ MeasErrorExt(substitute = X_star, model = calmod) + Z,
      data = icvs)
#>
#> Call:
#> mecor(formula = Y ~ MeasErrorExt(substitute = X_star, model = calmod) +
#>     Z, data = icvs)
#>
#> Coefficients Corrected Model:
#> (Intercept)   cor_X_star            Z
#> -0.01428791   0.47721827   1.99464677
#>
#> Coefficients Uncorrected Model:
#> (Intercept)       X_star            Z
#> -0.03946702   0.41371614   2.08457045
```

In the above, a `MeasErrorExt()` object is used, indicating that external information is used for the measurement error correction. The *model* argument of a `MeasErrorExt()` object takes a linear model of class `lm` (in the above, `calmod`). Alternatively, a named list containing the coefficients of the calibration model can be used as follows:

```
# Use coefficients for measurement error correction:
mecor(Y ~ MeasErrorExt(X_star, model = list(coef = c(0, 0.8, 2))) + Z,
      data = icvs)
#>
#> Call:
#> mecor(formula = Y ~ MeasErrorExt(X_star, model = list(coef = c(0,
#>     0.8, 2))) + Z, data = icvs)
#>
#> Coefficients Corrected Model:
#> (Intercept)   cor_X_star            Z
#> -0.03946702   0.51714517   1.05028010
#>
#> Coefficients Uncorrected Model:
#> (Intercept)       X_star            Z
#> -0.03946702   0.41371614   2.08457045
```
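As a consistency check on the output above, the corrected coefficients follow from the uncorrected ones by simple algebra. If the fixed calibration model is \(\hat{X} = 0 + 0.8\,X^* + 2\,Z\), then regressing \(Y\) on \(\hat{X}\) and \(Z\) amounts to dividing the naive \(X^*\) coefficient by \(0.8\) and subtracting the part of \(Z\)'s naive coefficient that is absorbed through the calibration model. The sketch below reproduces this with the coefficients printed in the two code chunks (copied by hand):

```r
# Consistency check: derive the corrected coefficients from the
# uncorrected fit and the fixed calibration coefficients by hand
naive <- c(-0.03946702, 0.41371614, 2.08457045)  # (Intercept), X_star, Z
cal   <- c(0, 0.8, 2)                            # calibration: X = 0 + 0.8 X* + 2 Z

cor_X <- naive[2] / cal[2]          # divide the X* coefficient by 0.8
cor_Z <- naive[3] - cor_X * cal[3]  # remove the part absorbed via Z
cor_0 <- naive[1] - cor_X * cal[1]  # adjust the intercept (here unchanged)
c(cor_0, cor_X, cor_Z)              # matches the corrected model above
```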