---
title: "Scale linking with PROsetta package"
output:
  html_document:
    number_sections: true
    toc: true
    toc_float:
      smooth_scroll: false
    toc_depth: 1
    css: styles.css
vignette: >
  %\VignetteIndexEntry{Scale linking with PROsetta package}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---

# Introduction

This vignette explains how to perform scale linking with the **PROsetta** package. For the purpose of illustration, we replicate the linking between two scales studied by Choi, Schalet, Cook, and Cella (2014):

- **Scale to be linked**: The Center for Epidemiologic Studies Depression scale (CES-D); 20 items, 1-4 points each, raw score range = 20-80.
- **Anchoring scale**: The PROMIS Depression scale; 28 items, 1-5 points each, raw score range = 28-140.

```{r, echo = FALSE, message = FALSE, results = "hide"}
library(knitr)
library(kableExtra)
library(PROsetta)
library(dplyr)
```

# Load datasets

The first step is to load the input dataset using `loadData()`. Scale linking requires three input parts: (1) response data, (2) item map, and (3) anchor data. In this vignette, we use the example datasets `response_dep`, `itemmap_dep`, and `anchor_dep`, which are included in the **PROsetta** package.

```{r}
d <- loadData(
  response = response_dep,
  itemmap  = itemmap_dep,
  anchor   = anchor_dep
)
```

The three arguments of `loadData()` are described in detail below.

## Response data

The response data contains individual item responses on both scales. The data must include the following columns:

* **Person ID column.** The response data must have a column of unique participant IDs.
    * The example data `response_dep` uses `prosettaid` as the column name, but it can be named differently as long as the same column name does not appear in other data parts. For example, see how the item map `itemmap_dep` and the anchor data `anchor_dep` do not have a column named `prosettaid`.
* **Item response columns.** The response data must include item response columns.
    * The item response columns must be named using their item IDs.
    * The item IDs must match the item IDs appearing in the item map and the anchor data. For example, see how the item IDs appearing in `response_dep` exactly match the item IDs appearing in `itemmap_dep`. Also, see how the item IDs appearing in `anchor_dep` are a subset of the item IDs appearing in `response_dep`.

Below is an example of response data.

```{r}
head(response_dep)
```
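These alignment requirements can be checked directly before calling `loadData()`. Below is a minimal sketch (not run) using base R and the example data parts; the item map and anchor data are described in the next sections, and the sketch assumes the column names shown above.

```{r, eval = FALSE}
# Consistency checks (sketch): response item columns should match the
# item map, and anchor item IDs should be a subset of the item map IDs.
response_items <- setdiff(names(response_dep), "prosettaid")
all(response_items %in% itemmap_dep$item_id)      # expect TRUE
all(anchor_dep$item_id %in% itemmap_dep$item_id)  # expect TRUE
```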
## Item map data

The item map data contains information on which items belong to which scale. The following columns are required:

* **Scale ID column.** The item map data must have a scale ID column indicating which scale each item belongs to.
    * The example data `itemmap_dep` uses `scale_id` as the column name, but it can be named differently as long as the same column name does not appear in other data parts.
    * Also, the example data `itemmap_dep` codes the anchor scale as Scale 1 and the scale to be linked as Scale 2, but these can be flipped if the user wants to do so.
* **Item ID column.** The item map data must have an item ID column.
    * The item IDs must match the item IDs appearing in other data parts. For example, see how the item IDs appearing in `itemmap_dep` exactly match the item IDs appearing in `response_dep`. Also, see how the item IDs appearing in `anchor_dep` are a subset of the item IDs appearing in `itemmap_dep`.
    * The example data `itemmap_dep` uses `item_id` as the column name, but it can be named differently as long as item ID columns use the same name across data parts. For example, see how `itemmap_dep` and `anchor_dep` both use `item_id`.
* **Item model column.** The item map data must have an item model column.
    * Accepted values are `GR` for the graded response model and `GPC` for the generalized partial credit model.
    * The example data `itemmap_dep` uses `item_model` as the column name, but it can be named differently as long as item model columns use the same name across data parts. For example, see how `itemmap_dep` and `anchor_dep` both use `item_model`.
* **`ncat` column.** The item map data must have a column named `ncat` containing the number of response categories of each item.

Below is an example of item map data.

```{r}
head(itemmap_dep)
```

## Anchor data

The anchor data contains item parameters for the anchoring scale. The following columns are required:

* **Item ID column.** The anchor data must have an item ID column.
    * The item IDs must match the item IDs appearing in other data parts. For example, see how the item IDs appearing in `anchor_dep` are a subset of the item IDs appearing in `itemmap_dep`.
    * The example data `anchor_dep` uses `item_id` as the column name, but it can be named differently as long as item ID columns use the same name across data parts. For example, see how `itemmap_dep` and `anchor_dep` both use `item_id`.
* **Item model column.** The anchor data must have an item model column.
    * Accepted values are `GR` for the graded response model and `GPC` for the generalized partial credit model.
    * The example data `anchor_dep` uses `item_model` as the column name, but it can be named differently as long as item model columns use the same name across data parts. For example, see how `itemmap_dep` and `anchor_dep` both use `item_model`.
* **Item parameter columns.** The anchor data must have item parameter columns named `a`, `cb1`, `cb2`, and so on. These must be in the A/B parameterization (i.e., the traditional IRT parameterization for unidimensional models), so that item probabilities are computed from $a(\theta - b)$.

Below is an example of anchor data.

```{r}
head(anchor_dep)
```

# Preliminary analysis

Preliminary analyses are commonly conducted before the main scale linking to check various statistical aspects of the data. The **PROsetta** package offers a set of helper functions for preliminary analyses.

## Basic descriptive statistics

The frequency distribution of each item in the response data is obtained by `runFrequency()`.

```{r freq}
freq_table <- runFrequency(d)
head(freq_table)
```

The frequency distribution of the summed scores for the combined scale can be plotted as a histogram with `plot()`. The required argument is a `PROsetta_data` object created with `loadData()`. The optional `scale` argument specifies the scale for which the summed scores should be computed. Setting `scale = "combined"` plots the summed score distribution for the combined scale.

```{r, fig.align = "center", fig.width = 7, fig.height = 7}
plot(d, scale = "combined", title = "Combined scale")
```

The user can also generate the summed score distribution for the first or second scale by specifying `scale = 1` or `scale = 2`.

```{r eval = FALSE}
plot(d, scale = 1, title = "Scale 1") # not run
plot(d, scale = 2, title = "Scale 2") # not run
```
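The frequency distribution also offers a quick check of the `ncat` column in the item map: each item should show responses in exactly `ncat` categories. Below is a minimal sketch (not run) that performs this check directly on the raw example data, assuming every category was observed at least once.

```{r, eval = FALSE}
# Sketch: count the observed response categories of each item and
# compare against the `ncat` column of the item map.
n_observed <- sapply(
  response_dep[itemmap_dep$item_id],
  function(x) length(unique(na.omit(x)))
)
all(n_observed == itemmap_dep$ncat)  # expect TRUE if all categories are observed
```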
Basic descriptive statistics are obtained for each item by `runDescriptive()`.

```{r desc}
desc_table <- runDescriptive(d)
head(desc_table)
```

## Classical reliability analysis

Classical reliability statistics can be obtained by `runClassical()`. By default, the analysis is performed for the combined scale.

```{r alpha}
classical_table <- runClassical(d)
summary(classical_table$alpha$combined)
```

From the above summary, we see that the combined scale (48 items) has good reliability (alpha = 0.98).

The user can set `scalewise = TRUE` to request an analysis for each scale separately, in addition to the combined scale.

```{r alpha2}
classical_table <- runClassical(d, scalewise = TRUE)
summary(classical_table$alpha$`2`)
```

From the above summary, we see that the scale to be linked (Scale 2; CES-D) has good reliability (alpha = 0.93).

Specifying `omega = TRUE` returns McDonald's $\omega$ coefficients as well.

```{r omega, eval = FALSE}
classical_table <- runClassical(d, scalewise = TRUE, omega = TRUE)
classical_table$omega$combined # omega values for combined scale
classical_table$omega$`1`      # omega values for each scale, created when scalewise = TRUE
classical_table$omega$`2`      # omega values for each scale, created when scalewise = TRUE
```

Additional arguments can be supplied to `runClassical()` to be passed onto `psych::omega()`.

```{r eval = FALSE}
classical_table <- runClassical(d, scalewise = TRUE, omega = TRUE, nfactors = 5) # not run
```

## Dimensionality analysis

A key assumption in item response theory is unidimensionality. Dimensionality analysis is performed with confirmatory factor analysis by `runCFA()`, which fits a one-factor model to the data. Setting `scalewise = TRUE` performs the dimensionality analysis for each scale separately, in addition to the combined scale.

```{r cfa, results = "hide"}
out_cfa <- runCFA(d, scalewise = TRUE)
```

`runCFA()` calls `lavaan::cfa()` internally and can pass additional arguments onto it.

```{r, eval = FALSE}
out_cfa <- runCFA(d, scalewise = TRUE, std.lv = TRUE) # not run
```

The CFA result for the combined scale is stored in the `combined` slot. If `scalewise = TRUE`, the result for each scale is also stored in the corresponding numbered slot.

```{r}
out_cfa$combined
out_cfa$`1`
out_cfa$`2`
```

CFA fit indices can be obtained by using `summary()` from the *lavaan* package. For the combined scale:

```{r}
lavaan::summary(out_cfa$combined, fit.measures = TRUE, standardized = TRUE, estimates = FALSE)
```

and also for each scale separately:

```{r, eval = FALSE}
lavaan::summary(out_cfa$`1`, fit.measures = TRUE, standardized = TRUE, estimates = FALSE) # not run
lavaan::summary(out_cfa$`2`, fit.measures = TRUE, standardized = TRUE, estimates = FALSE) # not run
```

## Item parameter calibration

`runCalibration()` can be used to perform free IRT calibration for diagnostic purposes. It calls `mirt::mirt()` internally, and additional arguments can be supplied to be passed onto `mirt::mirt()`, e.g., to increase the number of EM cycles to 1000, as follows:

```{r calib, results = "hide", error = TRUE, message = FALSE}
out_calib <- runCalibration(d, technical = list(NCYCLES = 1000))
```
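Because the returned object is a fitted *mirt* model, the freely estimated item parameters can be inspected with *mirt*'s own `coef()` method. A sketch (not run); `IRTpars = TRUE` requests the traditional a/b parameterization used elsewhere in this vignette.

```{r, eval = FALSE}
# Inspect the freely calibrated item parameters on the traditional
# IRT (a/b) metric; out_calib is the mirt model object from above.
library(mirt)
ipar_free <- coef(out_calib, IRTpars = TRUE, simplify = TRUE)$items
head(ipar_free)
```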
As a safeguard, if the model fitting process does not converge, `runCalibration()` explicitly raises an error and does not return its results. For example, the following call does not converge within the allowed 10 EM cycles:

```{r calib2, results = "hide", error = TRUE, message = FALSE}
out_calib <- runCalibration(d, technical = list(NCYCLES = 10))
```

The output object from `runCalibration()` can be used to generate diagnostic output with functions from the *mirt* package:

```{r mirt_plot, eval = FALSE, results = "hide"}
mirt::itemfit(out_calib, empirical.plot = 1)
mirt::itemplot(out_calib, item = 1, type = "info")
mirt::itemfit(out_calib, "S_X2", na.rm = TRUE)
```

Scale information functions can be plotted with `plotInfo()`. The two required arguments are an output object from `runCalibration()` and a `PROsetta_data` object from `loadData()`. The additional arguments specify the labels, colors, and line types for each scale and the combined scale. The last values in the `scale_label`, `color`, and `lty` arguments represent the combined scale.

```{r, fig.align = "center", fig.width = 7, fig.height = 7}
plotInfo(
  out_calib, d,
  scale_label = c("PROMIS Depression", "CES-D", "Combined"),
  color = c("blue", "red", "black"),
  lty = c(1, 2, 3)
)
```

# Scale linking: IRT-based

Scale linking can be performed in multiple ways. This section describes the workflow for performing an IRT-based linking.

## Obtain linked item parameters

The first step of IRT-based linking is to obtain linked item parameters. This is done by `runLinking()`, which performs scale linking based on the supplied anchor item parameters. A variety of scale linking methods are available in the **PROsetta** package.

### Fixed parameter calibration method

Fixed-parameter calibration constrains the item parameters of the anchor items to their anchor data values and freely estimates the item parameters of the non-anchor items. The mean and variance of $\theta$ are freely estimated to capture the difference of the current participant group relative to the anchor participant group, which is assumed to follow $\theta \sim N(0, 1)$.

Scale linking through fixed parameter calibration is performed by setting `method = "FIXEDPAR"`.

```{r fixedpar, results = "hide", message = FALSE}
link_fixedpar <- runLinking(d, method = "FIXEDPAR")
```

From the output, the linked parameters are stored in the `$ipar_linked` slot.

```{r}
head(link_fixedpar$ipar_linked)
```

The group characteristics are stored in the `$mu_sigma` slot.

```{r}
link_fixedpar$mu_sigma
```

From the above output, we see that the participant group in `response_dep` had a slightly lower mean $\theta$ (`-0.060`) and a slightly smaller variance (`0.950`) compared to the anchor group.

### Linear transformation methods

Linear transformation methods determine transformation constants, i.e., a slope and an intercept, to transform freely estimated item parameters onto the metric defined by the anchor items (a numeric sketch of this transformation appears below). Available linear transformation methods include Mean-Mean, Mean-Sigma, Haebara, and Stocking-Lord. Scale linking through linear transformation is performed by setting the `method` argument to one of the following options:

* `MM` (Mean-Mean)
* `MS` (Mean-Sigma)
* `HB` (Haebara)
* `SL` (Stocking-Lord)

```{r sl, results = "hide", error = TRUE, message = FALSE}
link_sl <- runLinking(d, method = "SL", technical = list(NCYCLES = 1000))
link_sl
```

The item parameter estimates linked to the anchor metric are stored in the `$ipar_linked` slot.

```{r}
head(link_sl$ipar_linked)
```
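To make the role of the constants concrete: a slope $A$ and intercept $B$ place a freely estimated $\theta$ onto the anchor metric via $\theta^* = A\theta + B$; discrimination parameters are divided by $A$, and boundary parameters are transformed like $\theta$. Below is a minimal sketch with hypothetical values (the actual constants come from `link_sl$constants`, shown next), not the package's internal code.

```{r, eval = FALSE}
# Hypothetical transformation constants (A = slope, B = intercept);
# in practice, take these from link_sl$constants.
A <- 1.05
B <- -0.10

# Freely estimated parameters of one illustrative graded response item
a  <- 1.8                  # discrimination
cb <- c(-1.2, -0.3, 0.6)   # category boundary (b) parameters

# Place the parameters onto the anchor metric
a_linked  <- a / A
cb_linked <- A * cb + B
```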
The transformation constants (A = slope; B = intercept) for the specified linear transformation method are stored in the `$constants` slot.

```{r}
link_sl$constants
```

### Calibrated projection (using fixed-parameter calibration)

Calibrated projection is an IRT-based multidimensional scale linking method. It uses a multidimensional model in which the items of one scale measure their own $\theta$ dimension and the items of the other scale measure another $\theta$ dimension. In this vignette, the anchor scale (PROMIS Depression) was coded as Scale 1, and the scale to be linked (CES-D) was coded as Scale 2. This leads to Scale 1 items having non-zero $a$-parameters in dimension 1, and Scale 2 items having non-zero $a$-parameters in dimension 2.

For the purpose of scale linking, calibrated projection can be performed in conjunction with the fixed-parameter calibration technique. The anchor item parameters are constrained to their anchor data values, and the item parameters of the non-anchor items are freely estimated. The mean and variance of $\theta$ in the anchor dimension are freely estimated to capture the difference of the current participant group relative to the anchor participant group, which is assumed to follow $\theta \sim N(0, 1)$. The mean and variance of $\theta$ in the scale-to-be-linked dimension are constrained to 0 and 1. In this vignette, this means that the mean and variance of dimension 1 (the anchor dimension) are freely estimated, and the mean and variance of dimension 2 are constrained to 0 and 1.

Scale linking through calibrated projection (with fixed parameter calibration) is performed by setting `method = "CP"`.

```{r, results = "hide", message = FALSE}
link_cp <- runLinking(d, method = "CP")
```

From the output, the linked parameters are stored in the `$ipar_linked` slot. See how there are two $a$-parameters, with the items of the anchor scale (PROMIS Depression) loaded onto dimension 1.

```{r}
head(link_cp$ipar_linked)
```

The group characteristics are stored in the `$mu_sigma` slot.

```{r}
link_cp$mu_sigma
```

From the above output, we see that the participant group in `response_dep` had a slightly lower mean $\theta$ (`-0.070`) and a slightly smaller variance (`0.971`) compared to the anchor group. Also, we see that the constructs represented by the two scales had a correlation of `0.919`.

## Making crosswalk tables

The second step of IRT-based scale linking is to generate raw-score-to-scale-score (RSSS) crosswalk tables. This is done by `runRSSS()`, which requires the dataset object created by `loadData()` and the output object from `runLinking()`.

```{r}
rsss_fixedpar <- runRSSS(d, link_fixedpar)
```

The output from `runRSSS()` includes three crosswalk tables (labeled `1`, `2`, and `combined`): one for each scale, and a third for the combined scale. In this vignette, the anchor scale (PROMIS Depression) was coded as Scale 1 and the scale to be linked (CES-D) was coded as Scale 2. The crosswalk table for the scale to be linked (CES-D; Scale 2) is shown here as an example.

```{r}
head(round(rsss_fixedpar$`2`, 3))
```

The columns in the crosswalk tables include:

* `raw_2`: a raw score level in Scale 2 (CES-D; 20 items, 1-4 points each, raw score range = 20-80).
* `tscore`: the corresponding T-score in the anchor group metric.
* `tscore_se`: the standard error associated with the T-score.
* `eap`: the corresponding $\theta$ in the anchor group metric.
* `eap_se`: the standard error associated with the $\theta$ value.
* `escore_1`: the expected Scale 1 (PROMIS Depression) raw score derived from $\theta$.
* `escore_2`: the expected Scale 2 (CES-D) raw score derived from $\theta$.
* `escore_combined`: the expected combined scale raw score derived from $\theta$.

For example, row 6 in the above table shows:

- A raw score of 25 points in CES-D corresponds to a T-score of 46.2 in the anchor group.
- A raw score of 25 points in CES-D corresponds to $\theta = -0.382$ in the anchor group, assuming the anchor group metric is $\theta \sim N(0, 1)$.
- A raw score of 25 points in CES-D corresponds to an expected raw score of 35.853 in the PROMIS Depression scale (raw score range = 28-140).
- A raw score of 25 points in CES-D corresponds to an expected raw score of 60.630 in the combined scale (raw score range = 48-220).
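In practice, the crosswalk table is used as a simple lookup. Below is a sketch (not run) that converts a few hypothetical CES-D raw scores to T-scores, assuming the column layout shown above (`raw_2`, `tscore`).

```{r, eval = FALSE}
# Sketch: convert a few CES-D raw scores to T-scores by looking them
# up in the Scale 2 crosswalk table produced by runRSSS().
raw_scores <- c(20, 25, 40)
crosswalk  <- rsss_fixedpar$`2`
crosswalk$tscore[match(raw_scores, crosswalk$raw_2)]
```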
# Scale linking: score-based

This section describes the workflow for performing a score-based scale linking. This is done by `runEquateObserved()`, which performs equipercentile linking using observed raw summed scores. The function removes cases with missing responses so that correct summed scores can be generated for the concordance tables.

This function requires four arguments:

* `scale_from`: the scale ID of the scale to be linked.
* `scale_to`: the scale ID of the anchor scale.
* `eq_type`: the type of equating to be performed, `equipercentile` for this example. See `?equate::equate` for details.
* `smooth`: the type of presmoothing to perform.

## Equipercentile linking: raw to raw

By default, `runEquateObserved()` performs raw-to-raw equipercentile linking. In this example, each raw summed score in the scale to be linked (Scale 2; CES-D, raw score range 20-80) is linked to a corresponding raw summed score in the anchor scale (Scale 1; PROMIS Depression, raw score range 28-140) with loglinear presmoothing.

```{r eqp_raw, results = "hide", message = FALSE}
rsss_equate_raw <- runEquateObserved(
  d,
  scale_from = 2, # CES-D (scale to be linked)
  scale_to = 1,   # PROMIS Depression (anchor scale)
  eq_type = "equipercentile",
  smooth = "loglinear"
)
```

The crosswalk table can be obtained from the `concordance` slot:

```{r}
head(rsss_equate_raw$concordance)
```

From row 6 in the above output, we see that:

- A raw score of 25 points in CES-D corresponds to an expected raw score of 36.882 in the PROMIS Depression scale (raw score range = 28-140).

## Equipercentile linking: raw to T-score

Alternatively, raw summed scores can be linked to T-scores by specifying `type_to = "tscore"` in `runEquateObserved()`. In the following example, the raw summed scores from the scale to be linked (Scale 2; CES-D, raw score range 20-80) are linked to T-score equivalents in the anchor scale (Scale 1; PROMIS Depression, mean = 50 and SD = 10). This requires a separate RSSS table to be supplied for converting anchor scale raw scores to T-scores.

```{r eqp_dir, results = "hide", message = FALSE}
rsss_equate_tscore <- runEquateObserved(
  d,
  scale_from = 2,        # CES-D (scale to be linked)
  scale_to = 1,          # PROMIS Depression (anchor scale)
  type_to = "tscore",
  rsss = rsss_fixedpar,  # used to convert PROMIS Depression (anchor scale) raw to T
  eq_type = "equipercentile",
  smooth = "loglinear"
)
```

Again, the crosswalk table can be retrieved from the `concordance` slot:

```{r}
head(rsss_equate_tscore$concordance)
```

From row 6 in the above output, we see that:

- A raw score of 25 points in CES-D corresponds to a T-score of 46.961 in the PROMIS Depression scale.
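Before moving to a graphical comparison, the IRT-based and equipercentile crosswalks can also be compared numerically by merging them on the raw score. A sketch (not run), using the objects created above:

```{r, eval = FALSE}
# Sketch: merge the two crosswalks on the CES-D raw score and
# summarize how far apart the two sets of T-scores are.
crosswalks <- merge(
  data.frame(
    raw_2      = rsss_fixedpar$`2`$raw_2,
    t_fixedpar = rsss_fixedpar$`2`$tscore
  ),
  data.frame(
    raw_2 = rsss_equate_tscore$concordance$raw_2,
    t_eqp = rsss_equate_tscore$concordance$tscore_1
  ),
  by = "raw_2"
)
summary(crosswalks$t_fixedpar - crosswalks$t_eqp)
```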
# Comparison between linking methods

The plot below shows the scale link produced by the equipercentile method (red dotted line) and the link produced by the fixed-parameter calibration method (blue solid line).

```{r, fig.align = "center", fig.width = 7, fig.height = 7}
plot(
  rsss_fixedpar$`2`$raw_2,
  rsss_fixedpar$`2`$tscore,
  xlab = "CES-D (Scale to be linked), raw score",
  ylab = "PROMIS Depression (Anchor scale), T-score",
  type = "l", col = "blue")
lines(
  rsss_equate_tscore$concordance$raw_2,
  rsss_equate_tscore$concordance$tscore_1,
  lty = 2, col = "red")
grid()
legend(
  "topleft",
  c("Fixed-Parameter Calibration", "Equipercentile Linking"),
  lty = 1:2, col = c("blue", "red"), bg = "white"
)
```

To better understand the performance of the two methods, we add a best-case benchmark in which pattern scoring is applied to the response data to obtain $\theta$ estimates.

## Raw scores from Scale 2

To begin, we create an object `scores` using `getScaleSum()` to contain raw summed scores on Scale 2 (i.e., CES-D). `NA` results for any respondent with one or more missing responses on Scale 2. A summed score variable for Scale 1 could be created with the same function, e.g., `getScaleSum(d, 1)`.

```{r}
scores <- getScaleSum(d, 2)
head(scores)
```

## PROMIS T-scores based on item response patterns

To establish a best-case scenario, we obtain $\theta$ estimates on the anchor scale (Scale 1; PROMIS Depression) based on item response patterns using `getTheta()`. The `getTheta()` function requires three arguments:

- The `data` argument requires a data object from `loadData()`. In this example, we use the object `d` created earlier.
- The `ipar` argument requires item parameter estimates for all items. Here, we use the item parameter estimates previously obtained from fixed-parameter calibration, `link_fixedpar$ipar_linked`.
- The `scale` argument requires the ID of the scale on which to perform $\theta$ estimation. Here, we use Scale 1 (the anchor scale; PROMIS Depression).

The function returns participant-wise EAP $\theta$ estimates and their associated standard errors:

```{r, message = FALSE}
theta_promis <- getTheta(data = d, ipar = link_fixedpar$ipar_linked, scale = 1)$theta
head(theta_promis)
```

The $\theta$ estimates for PROMIS Depression are then converted to T-scores.

```{r}
t_promis_pattern <- data.frame(
  prosettaid = theta_promis$prosettaid,
  t_promis_pattern = round(theta_promis$theta_eap * 10 + 50, 1)
)
head(t_promis_pattern)
```

These T-scores will serve as the best-case values at a later stage, when we compare the RSSS tables from the different methods. We then merge the PROMIS Depression T-scores with the CES-D raw scores.

```{r}
scores <- merge(scores, t_promis_pattern, by = "prosettaid")
head(scores)
```

From the above output, we see that participant `100048` (row 1) scored 21 raw points on CES-D and obtained a PROMIS Depression T-score of 45.8. Note how participants with the same raw score of 21 have different T-scores in the above output, because pattern scoring uses the full item response pattern rather than the summed score alone.
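As a quick sanity check, the pattern-scored T-scores should be strongly and positively related to the CES-D raw scores they will be compared against. A sketch (not run):

```{r, eval = FALSE}
# Sanity check: CES-D raw scores and pattern-scored PROMIS T-scores
# should be strongly positively correlated; complete.obs drops the
# respondents with missing CES-D responses.
cor(scores$raw_2, scores$t_promis_pattern, use = "complete.obs")
```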
## PROMIS T-scores based on RSSS table from fixed-parameter calibration

Second, we use the raw-score-to-scale-score (RSSS) crosswalk table obtained above to map raw scores on the scale to be linked (Scale 2; CES-D) onto T-scores on the anchor scale (Scale 1; PROMIS Depression).

```{r, message = FALSE}
# store the lookup in a separate object so that the runRSSS() output
# in rsss_fixedpar is not overwritten
rsss_fixedpar_tbl <- data.frame(
  raw_2 = rsss_fixedpar[["2"]]$raw_2,
  t_promis_rsss_fixedpar = round(rsss_fixedpar[["2"]]$tscore, 1)
)
scores <- merge(scores, rsss_fixedpar_tbl, by = "raw_2")
head(scores)
```

From the above output, we see that CES-D raw scores of 20 were mapped to T-scores of 34.5 using this RSSS table.

## PROMIS T-scores based on RSSS table from equipercentile linking

Next, we use the concordance table from equipercentile linking to map each raw summed score on Scale 2 onto a T-score on the PROMIS Depression metric (stored as `t_promis_rsss_eqp`).

```{r, message = FALSE}
rsss_eqp <- data.frame(
  raw_2 = rsss_equate_tscore$concordance$raw_2,
  t_promis_rsss_eqp = round(rsss_equate_tscore$concordance$tscore_1, 1)
)
scores <- merge(scores, rsss_eqp, by = "raw_2")
head(scores)
```

From the above output, we see that CES-D raw scores of 20 were mapped to T-scores of 33.6 using this RSSS table.

## Comparison of RSSS tables

Finally, we use `compareScores()` to see which RSSS table produces scores closer to those from pattern scoring.

```{r}
c_fixedpar <- compareScores(
  scores$t_promis_pattern, scores$t_promis_rsss_fixedpar)
c_eqp <- compareScores(
  scores$t_promis_pattern, scores$t_promis_rsss_eqp)

stats <- rbind(c_fixedpar, c_eqp)
rownames(stats) <- c("Fixed-parameter calibration", "Equipercentile")
stats
```
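For reference, similar agreement statistics can be computed by hand. The sketch below (not run) is for illustration only and is not the internal code of `compareScores()`.

```{r, eval = FALSE}
# Illustrative agreement statistics between pattern scoring and the
# fixed-parameter RSSS table: mean difference, RMSD, and correlation.
diff_fixedpar <- scores$t_promis_rsss_fixedpar - scores$t_promis_pattern
c(
  mean_diff = mean(diff_fixedpar, na.rm = TRUE),
  rmsd      = sqrt(mean(diff_fixedpar^2, na.rm = TRUE)),
  corr      = cor(
    scores$t_promis_rsss_fixedpar, scores$t_promis_pattern,
    use = "complete.obs"
  )
)
```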