---
title: "Summary of Regression Models as HTML Table"
author: "Daniel Lüdecke"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Summary of Regression Models as HTML Table}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r echo = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", warning = FALSE, message = FALSE)
if (!requireNamespace("sjlabelled", quietly = TRUE) ||
    !requireNamespace("sjmisc", quietly = TRUE) ||
    !requireNamespace("lme4", quietly = TRUE) ||
    !requireNamespace("pscl", quietly = TRUE) ||
    !requireNamespace("glmmTMB", quietly = TRUE)) {
  knitr::opts_chunk$set(eval = FALSE)
} else {
  knitr::opts_chunk$set(eval = TRUE)
  library(sjPlot)
}
```
`tab_model()` is the counterpart to `plot_model()`: instead of creating plots, `tab_model()` creates HTML tables that are displayed in your IDE's viewer pane, in a web browser, or in a knitr markdown document (like this vignette).

HTML is the only output format; you can't (directly) create LaTeX or PDF output from `tab_model()` and related table functions. However, the tables can easily be exported to Microsoft Word or LibreOffice Writer.

This vignette shows how to create tables from regression models with `tab_model()`. There's a dedicated vignette that demonstrates how to change the [table layout and appearance with CSS](table_css.html).

**Note!** Due to the custom CSS, the layout of the table inside a knitr document differs from the output in the viewer pane and web browser!
```{r}
# load package
library(sjPlot)
library(sjmisc)
library(sjlabelled)
# sample data
data("efc")
efc <- as_factor(efc, c161sex, c172code)
```
## A simple HTML table from regression results
First, we fit two linear models to demonstrate the `tab_model()`-function.
```{r, results='hide'}
m1 <- lm(barthtot ~ c160age + c12hour + c161sex + c172code, data = efc)
m2 <- lm(neg_c_7 ~ c160age + c12hour + c161sex + e17age, data = efc)
```
The simplest way of producing the table output is to pass the fitted model as an argument. By default, estimates, confidence intervals (_CI_) and p-values (_p_) are reported. As a summary, the number of observations and the R-squared values are shown.
```{r}
tab_model(m1)
```
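As noted in the introduction, tables can be exported for use in a word processor. The following is a minimal sketch, assuming the `file` argument of `tab_model()` is used to save the table as an HTML file that is then opened in Word or Writer. The model `fit` and the file name `model_table.html` are hypothetical and only serve as illustration; the chunk is not evaluated here.

```{r eval = FALSE}
library(sjPlot)

# hypothetical example model, based on a built-in data set
fit <- lm(mpg ~ cyl + hp + wt, data = mtcars)

# hypothetical output path; the table is saved as an HTML file
out_file <- file.path(tempdir(), "model_table.html")
tab_model(fit, file = out_file)
```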
## Automatic labelling
Because the **sjPlot** package supports [labelled data](https://strengejacke.github.io/sjlabelled/), the coefficients in the table are already labelled in this example. The name of the dependent variable(s) is used as the main column header for each model. For non-labelled data, the coefficient names are shown.
```{r}
data(mtcars)
m.mtcars <- lm(mpg ~ cyl + hp + wt, data = mtcars)
tab_model(m.mtcars)
```
If factors are involved and `auto.label = TRUE`, "pretty" parameter names are used (see [`format_parameters()`](https://easystats.github.io/parameters/reference/format_parameters.html)).
```{r}
set.seed(2)
dat <- data.frame(
  y = runif(100, 0, 100),
  drug = as.factor(sample(c("nonsense", "useful", "placebo"), 100, TRUE)),
  group = as.factor(sample(c("control", "treatment"), 100, TRUE))
)
pretty_names <- lm(y ~ drug * group, data = dat)
tab_model(pretty_names)
```
### Turn off automatic labelling
To turn off automatic labelling, use `auto.label = FALSE`, or provide an empty character vector for `pred.labels` and `dv.labels`.
```{r}
tab_model(m1, auto.label = FALSE)
```
Same for models with non-labelled data and factors.
```{r}
tab_model(pretty_names, auto.label = FALSE)
```
## More than one model
`tab_model()` can print multiple models at once, which are then printed side-by-side. Identical coefficients are matched in a row.
```{r}
tab_model(m1, m2)
```
## Generalized linear models
For generalized linear models, the output is slightly adapted. Instead of _Estimates_, the column is named _Odds Ratios_, _Incidence Rate Ratios_ etc., depending on the model. In this case, the coefficients are automatically transformed (exponentiated). Furthermore, pseudo R-squared statistics are shown in the summary.
```{r}
m3 <- glm(
  tot_sc_e ~ c160age + c12hour + c161sex + c172code,
  data = efc,
  family = poisson(link = "log")
)
efc$neg_c_7d <- ifelse(efc$neg_c_7 < median(efc$neg_c_7, na.rm = TRUE), 0, 1)
m4 <- glm(
  neg_c_7d ~ c161sex + barthtot + c172code,
  data = efc,
  family = binomial(link = "logit")
)
tab_model(m3, m4)
```
### Untransformed estimates on the linear scale
To show the estimates on the linear scale, use `transform = NULL`.
```{r}
tab_model(m3, m4, transform = NULL, auto.label = FALSE)
```
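The relation between the two scales can be checked by hand. The following sketch uses a hypothetical logistic regression `fit` on the built-in `mtcars` data (not one of the models above): the coefficients returned by `coef()` are on the linear (log-odds) scale, and exponentiating them gives the odds ratios.

```{r}
library(sjPlot)

# hypothetical example model, based on a built-in data set
fit <- glm(am ~ wt, data = mtcars, family = binomial())

# coefficients on the linear (log-odds) scale
log_odds <- coef(fit)

# exponentiated coefficients, i.e. odds ratios
odds_ratios <- exp(log_odds)
cbind(log_odds, odds_ratios)

# table with untransformed estimates, for comparison with log_odds
tab_model(fit, transform = NULL)
```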
## More complex models
Other models, like hurdle- or zero-inflated models, also work with `tab_model()`. In this case, the zero inflation model is indicated in the table. Use `show.zeroinf = FALSE` to hide this part from the table.
```{r}
library(pscl)
data("bioChemists")
m5 <- zeroinfl(art ~ fem + mar + kid5 + ment | kid5 + phd + ment, data = bioChemists)
tab_model(m5)
```
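As mentioned above, the zero-inflation part can be hidden with `show.zeroinf = FALSE`. A short sketch, refitting the same model under the hypothetical name `zi_fit` so that the chunk is self-contained:

```{r}
library(sjPlot)
library(pscl)
data("bioChemists")

# same model as above, refitted under a hypothetical name
zi_fit <- zeroinfl(art ~ fem + mar + kid5 + ment | kid5 + phd + ment, data = bioChemists)

# only the count component is shown
tab_model(zi_fit, show.zeroinf = FALSE)
```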
You can combine different types of models in one table.
```{r}
tab_model(m1, m3, m5, auto.label = FALSE, show.ci = FALSE)
```
## Show or hide further columns
`tab_model()` has several arguments that show or hide specific columns in the output:

* `show.est` to show/hide the column with model estimates.
* `show.ci` to show/hide the column with confidence intervals.
* `show.se` to show/hide the column with standard errors.
* `show.std` to show/hide the column with standardized estimates (and their standard errors).
* `show.p` to show/hide the column with p-values.
* `show.stat` to show/hide the column with the coefficients' test statistics.
* `show.df` to show/hide the degrees of freedom for linear mixed models, when p-values are based on degrees of freedom with the Kenward-Roger approximation.
### Adding columns
In the following example, standard errors, standardized coefficients and test statistics are also shown.
```{r}
tab_model(m1, show.se = TRUE, show.std = TRUE, show.stat = TRUE)
```
### Removing columns
In the following example, default columns are removed.
```{r}
tab_model(m3, m4, show.ci = FALSE, show.p = FALSE, auto.label = FALSE)
```
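Rows can be hidden in a similar way. The following sketch assumes the `show.intercept` and `show.r2` arguments described in `?tab_model`, which are not covered elsewhere in this vignette; the model `fit` is a hypothetical example on the built-in `mtcars` data.

```{r}
library(sjPlot)

# hypothetical example model, based on a built-in data set
fit <- lm(mpg ~ wt + hp, data = mtcars)

# hide the intercept row and the R-squared rows of the summary
tab_model(fit, show.intercept = FALSE, show.r2 = FALSE)
```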
### Removing and sorting columns
Another way to remove columns, which also lets you reorder them, is the `col.order` argument. This is a character vector, where each element indicates a column in the output. The value `"est"`, for instance, indicates the estimates, while `"std.est"` is the column for standardized estimates, and so on.

By default, `col.order` contains all possible columns. All columns that should be shown (see the previous tables, for example using `show.se = TRUE` to show standard errors, or `show.std = TRUE` to show standardized estimates) are then printed by default. Columns that are _excluded_ from `col.order` are _not shown_, regardless of whether the `show.*` arguments are `TRUE` or `FALSE`. So if `show.se = TRUE`, but `col.order` does not contain the element `"se"`, standard errors are not shown. Conversely, if `show.est = FALSE`, the column with estimates is not shown even if `col.order` _does include_ the element `"est"`.

In summary, `col.order` can be used to _exclude_ columns from the table and to change the order of columns.
```{r}
tab_model(
  m1, show.se = TRUE, show.std = TRUE, show.stat = TRUE,
  col.order = c("p", "stat", "est", "std.se", "se", "std.est")
)
```
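The interplay between the `show.*` arguments and `col.order` can be pictured with a few lines of base R. This is only an illustration of the rule described above, not the code that `tab_model()` actually runs; the names `show_flags`, `col_order`, `requested` and `shown` are made up for this sketch.

```{r}
# hypothetical stand-ins for the show.* arguments
show_flags <- c(est = TRUE, se = TRUE, ci = TRUE, p = TRUE, stat = FALSE)

# hypothetical col.order: "se" and "ci" are left out, "stat" is included
col_order <- c("p", "est", "stat")

# columns switched on via the show.* flags
requested <- names(show_flags)[show_flags]

# a column appears only if it is both requested and listed in col_order,
# and the columns follow the order given in col_order
shown <- col_order[col_order %in% requested]
shown
```

Here, `"se"` and `"ci"` are dropped because they are missing from `col_order`, and `"stat"` is dropped because its flag is `FALSE`.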
### Collapsing columns
With `collapse.ci` and `collapse.se`, the columns for confidence intervals and standard errors can be collapsed into one column together with the estimates. Sometimes this table layout is required.
```{r}
tab_model(m1, collapse.ci = TRUE)
```
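Standard errors can be collapsed in the same way. A short sketch, assuming that `show.se = TRUE` is needed so that there are standard errors to collapse; the model `fit` is a hypothetical example on the built-in `mtcars` data.

```{r}
library(sjPlot)

# hypothetical example model, based on a built-in data set
fit <- lm(mpg ~ wt + hp, data = mtcars)

# standard errors are placed in the same column as the estimates
tab_model(fit, show.se = TRUE, collapse.se = TRUE)
```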
## Defining own labels
There are different options to change the labels of the column headers or coefficients, e.g. with:

* `pred.labels` to change the names of the coefficients in the _Predictors_ column. Note that the length of `pred.labels` must exactly match the number of predictors in the _Predictors_ column.
* `dv.labels` to change the names of the model columns, which are labelled with the variable labels / names from the dependent variables.
* Furthermore, there are various `string.*` arguments to change the names of the column headings.
```{r}
tab_model(
  m1, m2,
  pred.labels = c("Intercept", "Age (Carer)", "Hours per Week", "Gender (Carer)",
                  "Education: middle (Carer)", "Education: high (Carer)",
                  "Age (Older Person)"),
  dv.labels = c("First Model", "M2"),
  string.pred = "Coefficient",
  string.ci = "Conf. Int (95%)",
  string.p = "P-Value"
)
```
## Including reference level of categorical predictors
By default, for categorical predictors, the variable names and the categories for regression coefficients are shown in the table output.
```{r}
library(glmmTMB)
data("Salamanders")
model <- glm(
  count ~ spp + Wtemp + mined + cover,
  family = poisson(),
  data = Salamanders
)
tab_model(model)
```
You can include the reference level for categorical predictors by setting `show.reflvl = TRUE`.
```{r}
tab_model(model, show.reflvl = TRUE)
```
To show variable names, categories and include the reference level, also set `prefix.labels = "varname"`.
```{r}
tab_model(model, show.reflvl = TRUE, prefix.labels = "varname")
```
## Style of p-values
You can change how p-values are displayed with the argument `p.style`. With `p.style = "stars"`, significance levels are indicated by asterisks (`*`) in the table instead of numeric p-values.
```{r}
tab_model(m1, m2, p.style = "stars")
```
Another option would be scientific notation, using `p.style = "scientific"`, which also can be combined with `digits.p`.
```{r}
tab_model(m1, m2, p.style = "scientific", digits.p = 2)
```
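Numeric p-values and asterisks can also be shown together. The sketch below assumes the `"numeric_stars"` option of `p.style` listed in `?tab_model`; the model `fit` is a hypothetical example on the built-in `mtcars` data.

```{r}
library(sjPlot)

# hypothetical example model, based on a built-in data set
fit <- lm(mpg ~ wt + hp, data = mtcars)

# numeric p-values with asterisks added
tab_model(fit, p.style = "numeric_stars")
```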
## Automatic matching for named vectors
Another easy way to assign labels is to pass a _named vector_ to `pred.labels`. In this case, it doesn't matter if `pred.labels` has more labels than there are coefficients in the model(s), or in which order the labels are passed to `tab_model()`. The only requirement is that the names of the labels match the coefficient names as they appear in the `summary()` output.
```{r}
# example, coefficients are "c161sex2" or "c172code3"
summary(m1)
pl <- c(
  `(Intercept)` = "Intercept",
  e17age = "Age (Older Person)",
  c160age = "Age (Carer)",
  c12hour = "Hours per Week",
  barthtot = "Barthel-Index",
  c161sex2 = "Gender (Carer)",
  c172code2 = "Education: middle (Carer)",
  c172code3 = "Education: high (Carer)",
  a_non_used_label = "We don't care"
)
tab_model(
  m1, m2, m3, m4,
  pred.labels = pl,
  dv.labels = c("Model1", "Model2", "Model3", "Model4"),
  show.ci = FALSE,
  show.p = FALSE,
  transform = NULL
)
```
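Instead of reading the coefficient names off the `summary()` output, they can be taken from `names(coef())` and used to build the named vector. A minimal sketch with hypothetical objects (`dat_cars`, `fit`, `coef_names`, `labels`) based on the built-in `mtcars` data:

```{r}
library(sjPlot)

# hypothetical example data and model; cyl is converted to a factor
dat_cars <- transform(mtcars, cyl = factor(cyl))
fit <- lm(mpg ~ wt + cyl, data = dat_cars)

# coefficient names as they appear in the summary() output
coef_names <- names(coef(fit))
coef_names

# named vector: names are coefficient names, values are the labels
labels <- setNames(
  c("Intercept", "Weight", "6 cylinders", "8 cylinders"),
  coef_names
)
tab_model(fit, pred.labels = labels)
```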
## Keep or remove coefficients from the table
Using the `terms`- or `rm.terms`-argument allows us to explicitly show or remove specific coefficients from the table output.
```{r}
tab_model(m1, terms = c("c160age", "c12hour"))
```
Note that the names of the terms to keep or remove should match the coefficient names. For categorical predictors, one example would be:
```{r}
tab_model(m1, rm.terms = c("c172code2", "c161sex2"))
```
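For a factor with many levels, the coefficient names to remove can be collected by pattern rather than typed out. A sketch with hypothetical objects (`dat_cars`, `fit`, `drop_terms`) based on the built-in `mtcars` data:

```{r}
library(sjPlot)

# hypothetical example data and model; cyl is converted to a factor
dat_cars <- transform(mtcars, cyl = factor(cyl))
fit <- lm(mpg ~ wt + cyl, data = dat_cars)

# all coefficient names that belong to the factor cyl
drop_terms <- grep("^cyl", names(coef(fit)), value = TRUE)
drop_terms

# remove these coefficients from the table
tab_model(fit, rm.terms = drop_terms)
```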