beginR Extra: Regression Tables

Author

University of North Carolina at Chapel Hill

Necessary Packages

#install.packages(c("modelsummary", "tinytable", "gt", "kableExtra"))
library(modelsummary)

Formula notation in R

lm is R’s function for fitting Ordinary Least Squares (OLS) regressions. There are many other specialized regression models available in R, but fortunately the vast majority use similar notation for formulas.

The arguments given to lm are:

  • Formula: y~x
    • R uses ~ in place of the equals sign in formulas

    • R automatically includes an intercept term

    • This formula is therefore equivalent to:

      \[y=intercept + slope * x\]

  • Data:
    • The dataframe containing the variables of interest.
Symbol Role Example Equivalent
+ Add variable mpg~vs+disp \[mpg = intercept + \beta_1 vs + \beta_2 disp\]
* Interactions mpg~vs*disp \[mpg = intercept + \beta_1 vs + \beta_2 disp + \beta_3 vs*disp\]
. Include all variables in dataframe mpg~. \[mpg = intercept + \beta_1 cyl + \beta_2 disp + ... + \beta_{10} carb\]
- Exclude variable mpg~.-disp-hp \[mpg = intercept + \beta_1 cyl + \beta_2 drat + ... + \beta_{8} carb\]

Let’s fit a simple model with the iris built-in data:

Formatting lm style model results

Once you’ve fit a linear or some other model, you may want to report results. The modelsummary package makes this relatively simple to do, especially in an Quarto document. The code below will produce a common model summary format for a journal or presentation.

data(iris)
mod <- lm(data=iris,Sepal.Length~Species)
mod1 <- lm(data=iris,Sepal.Length~Petal.Width+Species)
mod2 <- lm(data=iris,Sepal.Length~Petal.Length+Petal.Width+Species)

model_list <- list("Species"=mod,
                  "Species+Width" = mod1,
                  "Species+Width+Length" = mod2)

modelsummary(model_list)
Species Species+Width Species+Width+Length
(Intercept) 5.006 4.780 3.683
(0.073) (0.083) (0.107)
Speciesversicolor 0.930 -0.060 -1.598
(0.103) (0.230) (0.206)
Speciesvirginica 1.582 -0.050 -2.113
(0.103) (0.358) (0.304)
Petal.Width 0.917 -0.006
(0.194) (0.156)
Petal.Length 0.906
(0.074)
Num.Obs. 150 150 150
R2 0.619 0.669 0.837
R2 Adj. 0.614 0.663 0.832
AIC 231.5 212.1 108.2
BIC 243.5 227.1 126.3
Log.Lik. -111.726 -101.034 -48.116
F 119.265 98.525 185.769
RMSE 0.51 0.47 0.33

The list("name"=object) notation creates a named list in R. This is useful here to provide names for your models. Names are not required!

Remember, you can use ?modelsummary to get a full list of parameters to change the output table. Here are some commonly changed parameters:

modelsummary(model_list, 
             stars=T, #add significance stars
             fmt = 2, #2 digits
             statistic = "[{conf.low}, {conf.high}]", #replace standard errors with confidence intervals
             coef_rename = c("Petal.Width" = "Petal Width", #rename variables
                             "Petal,Length" = "Petal Length",
                             "Speciesversicolor" = "Versicolor",
                             "Speciesvirginica" = "Virginica"),
             notes = "These models are fit to Fisher's or Anderson's iris dataset.")
Species Species+Width Species+Width+Length
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
These models are fit to Fisher's or Anderson's iris dataset.
(Intercept) 5.01*** 4.78*** 3.68***
[4.86, 5.15] [4.62, 4.94] [3.47, 3.90]
Versicolor 0.93*** -0.06 -1.60***
[0.73, 1.13] [-0.52, 0.40] [-2.00, -1.19]
Virginica 1.58*** -0.05 -2.11***
[1.38, 1.79] [-0.76, 0.66] [-2.71, -1.51]
Petal Width 0.92*** -0.01
[0.53, 1.30] [-0.31, 0.30]
Petal.Length 0.91***
[0.76, 1.05]
Num.Obs. 150 150 150
R2 0.619 0.669 0.837
R2 Adj. 0.614 0.663 0.832
AIC 231.5 212.1 108.2
BIC 243.5 227.1 126.3
Log.Lik. -111.726 -101.034 -48.116
F 119.265 98.525 185.769
RMSE 0.51 0.47 0.33

Output Formats

modelsummary can output a wide variety of filetypes with the output argument. Some of these outputs may require installing additional packages.

modelsummary(model_list, output = "my_models.xlsx")
modelsummary(model_list, output = "my_models.docx")
modelsummary(model_list, output = "my_models.html")
modelsummary(model_list, output = "my_models.png")

Sometimes it can be useful to output a table to one of the more general table packages in R for formatting. Some of the most common packages for general tables are:

  • gt
  • kableExtra
  • flextable
  • huxtable
#install.packages("gt")
library(gt)
modelsummary(model_list, output = "gt") %>% 
  gt::tab_header("My Regression Models") %>% 
  gt::tab_spanner(label = "Simple Regression", columns = 2) %>% 
  gt::tab_spanner(label = "Multiple Regression", columns = 3:4)
My Regression Models
Simple Regression Multiple Regression
Species Species+Width Species+Width+Length
(Intercept) 5.006 4.780 3.683
(0.073) (0.083) (0.107)
Speciesversicolor 0.930 -0.060 -1.598
(0.103) (0.230) (0.206)
Speciesvirginica 1.582 -0.050 -2.113
(0.103) (0.358) (0.304)
Petal.Width 0.917 -0.006
(0.194) (0.156)
Petal.Length 0.906
(0.074)
Num.Obs. 150 150 150
R2 0.619 0.669 0.837
R2 Adj. 0.614 0.663 0.832
AIC 231.5 212.1 108.2
BIC 243.5 227.1 126.3
Log.Lik. -111.726 -101.034 -48.116
F 119.265 98.525 185.769
RMSE 0.51 0.47 0.33

Data Summaries

The datasummary family of functions can be useful for overview tables within modelsummary.

datasummary_skim(iris)
Unique Missing Pct. Mean SD Min Median Max Histogram
Sepal.Length 35 0 5.8 0.8 4.3 5.8 7.9
Sepal.Width 23 0 3.1 0.4 2.0 3.0 4.4
Petal.Length 43 0 3.8 1.8 1.0 4.3 6.9
Petal.Width 22 0 1.2 0.8 0.1 1.3 2.5
Species N %
setosa 50 33.3
versicolor 50 33.3
virginica 50 33.3

Here’s an example with another sample dataset:

#install.packages("palmerpenguins")
library(palmerpenguins)
data(penguins)
datasummary_skim(penguins)
Unique Missing Pct. Mean SD Min Median Max Histogram
bill_length_mm 165 1 43.9 5.5 32.1 44.5 59.6
bill_depth_mm 81 1 17.2 2.0 13.1 17.3 21.5
flipper_length_mm 56 1 200.9 14.1 172.0 197.0 231.0
body_mass_g 95 1 4201.8 802.0 2700.0 4050.0 6300.0
year 3 0 2008.0 0.8 2007.0 2008.0 2009.0
N %
species Adelie 152 44.2
Chinstrap 68 19.8
Gentoo 124 36.0
island Biscoe 168 48.8
Dream 124 36.0
Torgersen 52 15.1
sex female 165 48.0
male 168 48.8

Exercise

  • Pick a sample dataset (call data() to get a list of available datasets, or use data(penguins) from library(palmerpenguins)).
  • Pick some variables and use lm to fit an OLS model.
  • Create a modelsummary table output. Embed in a Quarto document or output html.
  • Use ?modelsummary to learn about and then change one or more default settings, or use a table package to further format your table.