Data and other downloads

US Cheese Consumption

Cleaned data courtesy of the Tidy Tuesday project

Today

This workshop aims to cover:

  • Review of Previous Topics
    • Starting a New Project
    • Loading the tidyverse
    • Importing Data
    • Graphics using ggplot
    • Data transformations
    • Summarizing Data
  • Data Transformations
    • Reproducible Reports using R Markdown
  • Chapter 27

 

Start a New Project in R

It is best practice to set up a new directory each time we start a new project in R. To do so, complete the following steps:

  1. Go to File > New Project > New Directory > New Project.
  2. Type in a name for your directory and click Browse. Be sure to pick a place for your directory that you will be able to find later.
  3. Go to Finder on Mac or File Explorer on PC and find the directory you just created.
  4. Inside your project directory, create a new folder called data.
  5. Download or copy the data file (clean_cheese.csv) into the data folder.

 

Motivation

R Markdown provides a straightforward way to create reports that combine code and the output from that code with text commentary. This allows for the creation of automated, reproducible reports. R Markdown can knit together your analysis results with text and output it directly into HTML, PDF, or Word documents. In fact, we have been using R Markdown to generate the webpage for all of our R Open Labs workshops!

 

RMarkdown Structure

R Markdown has three components.

  1. An (optional) header in a language called YAML. This allows you to specify the type of output file and configure other options.
  2. R code chunks wrapped by ```
  3. Text mixed with simple formatting markup.

To create a new R Markdown document (.Rmd), select File -> New File -> R Markdown.

You will have the option to select the output: we’ll use the default HTML for this workshop. Give the document a title and enter your name as author: this will create the header for you at the top of your new .html page! RStudio will create a new R Markdown document filled with examples of code chunks and text.

Code Chunks

R code chunks are surrounded by ```. Inside the curly braces, it specifies that this code chunk will use R code (other programming languages are supported), then it names this chunk “setup”. Names are optional.

After the name, you specify options on whether you want the code or its results to be displayed in the final document. For this chunk, the include=FALSE options tells R Markdown that we want this code to run, but we do not want it to be displayed in the final HTML document. The R code inside the chunk knitr::opts_chunk$set(echo = TRUE) tells R Markdown to display the R code along with the results of the code in the HTML output for all code chunks below.

Use CTRL+ALT+i (PC) or CMD+OPTION+i (Mac) to insert R code blocks.

Formatted Text

This is plain text with simple formatting added. The ## tells R Markdown that “R Markdown” is a section header. The ** around “Knit” tells R Markdown to make that word bold.

The RStudio team has helpfully condensed these code chunk and text formatting options into a cheatsheet.

You can get pretty far with options in the R Markdown cheatsheet, but R Markdown is a very powerful, flexible language that we do not have time to fully cover. More detailed references are:

https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf

https://bookdown.org/yihui/rmarkdown

Visual R Markdown

Newer versions of R Studio provide a visual editor for R markdown documents. This can be accessed by toggling between the “Source” and “Visual” options in the top left corner of your Rmd script editor pane. (Note: In some older versions of R Studio, this is available as a compass-shaped icon in the top right corner instead).

Once activated, this interface is similar to a word processing software like Microsoft Word - shortcuts for bolding, italics, etc. are usually the same and there are icons and drop down menus available for lists, bullets, links, and more.

You can still use CTRL+ALT+i (PC) or CMD+OPTION+i (Mac) to insert R code blocks, or use the Insert>Code Chunk>R menu in the visual editor.

Read more about the visual editor here:
https://rstudio.github.io/visual-markdown-editing/

Generating the HTML document

Click the Knit button, and R Studio will generate an HTML report based on your R Markdown document.

Let’s try creating an R Markdown document to explore the US cheese consumption data and review what we learned in weeks 1-3.

Data Import

consumption <- read_csv("data/clean_cheese.csv")
## Rows: 48 Columns: 17
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## dbl (17): Year, Cheddar, American Other, Mozzarella, Italian other, Swiss, B...
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.

Useful functions for exploring dataframes

Include one of the following in your document. We’ve used eval = FALSE here to prevent this code chunk from running!

head(consumption)
tail(consumption)
summary(consumption)

Tables and knitr::kable

By default, R Markdown will display tables the way they appear in the R console. We can use knitr::kable function to get cleaner tables.

knitr::kable(head(consumption), caption = "The first six rows of the cheese consumption data")
The first six rows of the cheese consumption data
Year Cheddar American Other Mozzarella Italian other Swiss Brick Muenster Cream and Neufchatel Blue Other Dairy Cheese Processed Cheese Foods and spreads Total American Chese Total Italian Cheese Total Natural Cheese Total Processed Cheese Products
1970 5.79 1.20 1.19 0.87 0.88 0.10 0.17 0.61 0.15 0.41 3.32 2.20 7.00 2.05 11.37 5.53
1971 5.91 1.42 1.38 0.92 0.94 0.11 0.19 0.62 0.15 0.41 3.55 2.31 7.32 2.29 12.03 5.86
1972 6.01 1.67 1.57 1.02 1.06 0.10 0.22 0.63 0.17 0.56 3.51 2.62 7.68 2.59 13.01 6.13
1973 6.07 1.76 1.76 1.03 1.06 0.11 0.21 0.66 0.18 0.65 3.31 2.68 7.83 2.80 13.49 5.99
1974 6.31 2.16 1.86 1.09 1.18 0.11 0.23 0.70 0.16 0.61 3.42 2.92 8.47 2.95 14.41 6.34
1975 6.04 2.11 2.11 1.12 1.09 0.09 0.24 0.74 0.16 0.57 3.35 3.35 8.15 3.23 14.27 6.69

Adding a new variable

We’ve covered two ways to add a new variable to a dataframe.

Note: R allows non-standard variable names that include spaces, parentheses, and other special characters. The way to refer to variable names that contain wonky symbols is to use the backtick symbol `, found at the top left of your keyboard with the tilde ~.

The base R way covered in lesson 1 using the $ operator and with() function

#Base R way, covered in lesson 1
consumption$amer_ital_ratio <- with(consumption, `Total American Cheese` / `Total Italian Cheese`)
## Error in eval(substitute(expr), data, enclos = parent.frame()): object 'Total American Cheese' not found

Oops. Better check the variable names.

consumption$amer_ital_ratio1 <- with(consumption, `Total American Chese` / `Total Italian Cheese`)

The tidyverse way covered in lesson 3 using the mutate() function

#Tidyverse way, covered in lesson 3
consumption <- mutate(consumption, amer_ital_ratio2 = `Total American Chese` / `Total Italian Cheese`)

Selecting Columns

consumption <- select(consumption, Year, Cheddar, Mozzarella, `Cream and Neufchatel`)

Renaming Columns

consumption <- rename(consumption, Cream_and_Neufchatel = `Cream and Neufchatel`)

Plotting

ggplot(consumption, aes(x = Year)) + 
    geom_point(aes(y = Cheddar, col = "Cheddar")) + 
    geom_point(aes(y = Mozzarella, col = "Mozzarella")) + 
    geom_point(aes(y = Cream_and_Neufchatel, col = "Cream and Neufchatel")) +
    ylab("Consumption in Pounds Per Person")

Hmm. I don’t love that legend title. Time for google!

ggplot(consumption, aes(x = Year)) + 
    geom_point(aes(y = Cheddar, col = "Cheddar")) + 
    geom_point(aes(y = Mozzarella, col = "Mozzarella")) + 
    geom_point(aes(y = Cream_and_Neufchatel, col = "Cream and Neufchatel")) +
    ylab("Consumption in Pounds Per Person") + 
    guides(col=guide_legend(title="Cheese Type"))

Bibliography

R Markdown also provides a nifty way to incorporate a bibliography and references. We’ll haven an example of this in the exercises, but here’s a brief summary of the steps required to use a BibTex bibliography.

  1. Create a plain-text .bib file in the same directory as your .Rmd R Markdown document.
  2. Fill that .bib file with BibTex citations. The Citation Machine can generate the BibTex citations for you.
  3. Make sure each citation has a unique citation-key, the first entry
  4. Add a bibliography field to the YAML header that tells R Markdown the name of your bibliography file, for example bibliography: references.bib
  5. Add citations throughout the document using square brackets.

Review

We have now covered R Studio projects and R Markdown. With these two tools, you have everything you need to create portable and reproducible reports. We’ve also introduced tidyverse tools for data import (readr), graphics (ggplot2), and data transformation (dplyr). These are foundational tools that you will need in every data analysis project.

Exercises:

  1. Download the cheese RStudio Project file and extract the R Project contained within. Then, knit the cheeseConsumption.Rmd report. It should generate an HTML report for you.

  2. In the cheeseConsumption.Rmd file, find the code chunk named setup. change echo=FALSE to echo=TRUE. Try knitting the document again. What changed? Did this affect the whole document?

  3. In the cheeseConsumption.Rmd file, find the code chunk named import. change message=FALSE to message=TRUE. Try knitting the document again. What changed? Did this affect the whole document?

  4. Create another R Markdown document analyzing cheese production data contained in the state_milk_productions.csv file. You can use the data dictionary found here to make sense of the different variables. Hint: you’ll need to use the group-by %>% summarize idiom we learned in Week 3 to sum up all the state level data within each year. You’ll probably want to feed the output from that group-by %>% summarize step into knitr::kable() to get a prettier table for your report.

  5. Once you have created an R Markdown report analyzing cheese production, send the entire R Project to a friend (or us!) and ask them to knit that .Rmd document. If they have RStudio and the tidyverse installed, they should be able to seamlessly generate the exact report you generated, without having to make any changes.