---
title: "Item Analysis of a Scale or an Index"
author: "Daniel Lüdecke"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Item Analysis of a Scale or an Index}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r echo = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")

if (!requireNamespace("dplyr", quietly = TRUE) ||
    !requireNamespace("sjmisc", quietly = TRUE) ||
    !requireNamespace("parameters", quietly = TRUE) ||
    !requireNamespace("psych", quietly = TRUE)) {
  knitr::opts_chunk$set(eval = FALSE)
} else {
  knitr::opts_chunk$set(eval = TRUE)
}
```

This document shows examples for using the `tab_itemscale()` function of the sjPlot package.

## Performing an item analysis of a scale or index

This function performs an item analysis with certain statistics that are useful for scale or index development. Following statistics are computed for each variable (column) of a data frame:

* percentage of missing values
* mean value
* standard deviation
* skew
* item difficulty
* item discrimination
* Cronbach's Alpha if item was removed from scale
* mean (or average) inter-item-correlation

Optional, following statistics can be computed as well:

* kurstosis
* Shapiro-Wilk Normality Test

If the argument `factor.groups` is _not_ `NULL`, the data frame df will be splitted into groups, assuming that `factor.groups` indicate those columns (variables) of the data frame that belong to a certain factor (see, for instance, return value of function `tab_pca()` or `parameters::principal_components()` as example for retrieving factor groups for a scale). This is useful when you have perfomed a principal component analysis or factor analysis as first step, and now want to see whether the found factors / components represent a scale or index score.

To demonstrate this function, we first need some data:

```{r, echo=FALSE, message=FALSE, warning=FALSE}
library(sjPlot)
library(sjmisc)
library(dplyr)
data(efc)
# create data frame with COPE-index scale
mydf <- dplyr::select(efc, contains("cop"))
```

## Index score with one component

The simplest function call is just passing the data frame as argument. In this case, the function assumes that all variables of the data frame belong to one factor only.

```{r}
tab_itemscale(mydf)
```

To interprete the output, we may consider following values as rule-of-thumbs for indicating a reliable scale:

* item difficulty should range between 0.2 and 0.8. Ideal value is p+(1-p)/2 (which mostly is between 0.5 and 0.8)
* for item discrimination, acceptable values are 0.2 or higher; the closer to 1 the better
* in case the total Cronbach's Alpha value is below the acceptable cut-off of 0.7 (mostly if an index has few items), the mean inter-item-correlation is an alternative measure to indicate acceptability; satisfactory range lies between 0.2 and 0.4

## Index score with more than one component

The items of the COPE index used for our example do not represent a single factor. We can check this, for instance, with a principle component analysis. If you know, which variable belongs to which factor (i.e. which variable is part of which component), you can pass a numeric vector with these group indices to the argument `factor.groups`. In this case, the data frame is divided into the components specified by `factor.groups`, and each component (or factor) is analysed.

```{r}
library(parameters)
# Compute PCA on Cope-Index, and retrieve 
# factor indices for each COPE index variable
pca <- parameters::principal_components(mydf)
factor.groups <- parameters::closest_component(pca)
```

The PCA extracted two components. Now `tab_itemscale()` ...

1. performs an item analysis on both components, showing whether each of them is a reliable and useful scale or index score
2. builds an index of each component, by standardizing each scale
3. and adds a component-correlation-matrix, to see whether the index scores (which are based on the components) are highly correlated or not.

```{r}
tab_itemscale(mydf, factor.groups)
```

## Adding further statistics

```{r}
tab_itemscale(mydf, factor.groups, show.shapiro = TRUE, show.kurtosis = TRUE)
```