Note: This version uses |> pipes instead of %>%. In most places they should be interchangeable.
Start a New Project in R
It is best practice to set up a new directory each time we start a new project in R. To do so, complete the following steps:
Go to File > New Project > New Directory > New Project.
Type in a name for your directory and click Browse. Be sure to pick a place for your directory that you will be able to find later.
Go to Finder on Mac or File Explorer on PC and find the directory you just created.
Inside your project directory, create a new folder called data.
Download or copy the data file (clean_cheese.csv) into the data folder.
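The last two steps can also be done or checked from the R console once the project is open. This is a minimal sketch, assuming the working directory is the project directory; the variable name cheese_path is only illustrative.

```r
# Create the data folder inside the project directory if it is not there yet.
if (!dir.exists("data")) {
  dir.create("data")
}

# Path to the data file, relative to the project directory.
cheese_path <- file.path("data", "clean_cheese.csv")

# TRUE once clean_cheese.csv has been downloaded or copied into data/.
file.exists(cheese_path)
```

Because the path is relative to the project directory, the same code should work on any computer where the project folder is opened as an RStudio project.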
Motivation
R Markdown provides a straightforward way to create reports that combine code and the output from that code with text commentary. This allows for the creation of automated, reproducible reports. R Markdown can knit together your analysis results with text and output it directly into HTML, PDF, or Word documents. In fact, we have been using R Markdown to generate the webpage for all of our R Open Labs workshops!
R Markdown Structure
R Markdown has three components.
An (optional) header in a language called YAML. This allows you to specify the type of output file and configure other options.
R code chunks wrapped by ```
Text mixed with simple formatting markup.
To create a new R Markdown document (.Rmd), select File -> New File -> R Markdown.
You will have the option to select the output: we’ll use the default HTML for this workshop. Give the document a title and enter your name as author: this will create the header for you at the top of your new .html page! RStudio will create a new R Markdown document filled with examples of code chunks and text.
Header
At the top of the page is the optional YAML header (YAML originally stood for “Yet Another Markup Language” and now officially stands for “YAML Ain't Markup Language”). This header is a powerful way to control the formatting of your report (e.g. figure dimensions, whether a table of contents is included, the location of a bibliography file).
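To make the header options concrete, here is a rough sketch that writes a small header to a temporary .Rmd file and reads it back with rmarkdown::yaml_front_matter(). The title, author, and option values are placeholders chosen for illustration, and references.bib is a hypothetical file name that does not need to exist for this check.

```r
library(rmarkdown)

# A hypothetical header; the lines between the two "---" markers are YAML.
header_lines <- c(
  "---",
  'title: "Cheese Consumption"',
  'author: "Your Name"',
  "output:",
  "  html_document:",
  "    toc: true",
  "    fig_width: 7",
  "    fig_height: 5",
  "bibliography: references.bib",
  "---",
  "",
  "Report text starts here."
)

# Write the document to a temporary file so nothing in the project changes.
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(header_lines, rmd_file)

# Read the header back as a named list to see how it is interpreted.
header <- yaml_front_matter(rmd_file)
header$output$html_document$toc
header$bibliography
```

In a real report you would type the YAML lines directly at the top of the .Rmd file; indentation matters, because it is what nests toc and the figure sizes under html_document.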
Code Chunks
R code chunks are surrounded by ```. In the example document, the first chunk opens with ```{r setup, include=FALSE}. Inside the curly braces, r specifies that this code chunk will use R code (other programming languages are supported), and setup is the name of the chunk. Names are optional.
After the name, you specify options that control whether the code or its results are displayed in the final document. For this chunk, the include=FALSE option tells R Markdown that we want this code to run, but we do not want the code or its output to be displayed in the final HTML document. The R code inside the chunk, knitr::opts_chunk$set(echo = TRUE), tells R Markdown to display the R code along with the results of the code in the HTML output for all code chunks below.
Use CTRL+ALT+i (PC) or CMD+OPTION+i (Mac) to insert R code blocks.
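As a rough sketch of how these chunk options behave, the following builds a three-chunk document as a character vector and knits it with knitr::knit(text = ...). The chunk names shown and results-only and the variable hidden_value are made up for illustration.

```r
library(knitr)

# Each element is one line of a small R Markdown document.
rmd_lines <- c(
  "```{r setup, include=FALSE}",
  "knitr::opts_chunk$set(echo = TRUE)",
  "hidden_value <- 21 * 2",
  "```",
  "",
  "```{r shown}",
  "hidden_value",
  "```",
  "",
  "```{r results-only, echo=FALSE}",
  "hidden_value + 1",
  "```"
)

# knit() runs every chunk and returns the resulting Markdown as one string.
knitted <- knit(text = rmd_lines, quiet = TRUE)
cat(knitted)
```

In the printed result, the setup chunk leaves no trace even though it ran, the shown chunk appears with both its code and its result, and the results-only chunk contributes only its result because echo=FALSE overrides the document-wide default for that one chunk.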
Formatted Text
This is plain text with simple formatting added. The ## tells R Markdown that “R Markdown” is a section header. The ** around “Knit” tells R Markdown to make that word bold.
The RStudio team has helpfully condensed these code chunk and text formatting options into a cheatsheet.
You can get pretty far with options in the R Markdown cheatsheet, but R Markdown is a very powerful, flexible language that we do not have time to fully cover. More detailed references are:
Newer versions of RStudio provide a visual editor for R Markdown documents. This can be accessed by toggling between the “Source” and “Visual” options in the top left corner of your Rmd script editor pane. (Note: In some older versions of RStudio, this is available as a compass-shaped icon in the top right corner instead.)
Once activated, this interface is similar to a word processor like Microsoft Word: shortcuts for bolding, italics, etc. are usually the same, and there are icons and drop-down menus available for lists, bullets, links, and more.
You can still use CTRL+ALT+i (PC) or CMD+OPTION+i (Mac) to insert R code blocks, or use the Insert>Code Chunk>R menu in the visual editor.
Click the Knit button, and RStudio will generate an HTML report based on your R Markdown document.
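Knitting can also be approximated from the console with rmarkdown::render(). This is a minimal sketch, assuming the rmarkdown package and pandoc (which is bundled with RStudio) are available; the file name knit_demo.Rmd and its contents are placeholders.

```r
library(rmarkdown)

# A tiny stand-in document with a header, formatted text, and one code chunk.
rmd_file <- file.path(tempdir(), "knit_demo.Rmd")
writeLines(c(
  "---",
  'title: "Knit demo"',
  "output: html_document",
  "---",
  "",
  "## R Markdown",
  "",
  "Click the **Knit** button to generate a report.",
  "",
  "```{r}",
  "1 + 1",
  "```"
), rmd_file)

# render() needs pandoc; skip the step if it cannot be found.
if (pandoc_available()) {
  html_file <- render(rmd_file, quiet = TRUE)  # returns the output file path
  file.exists(html_file)
}
```

The HTML file is written next to the .Rmd file, which here is the session's temporary directory.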
Let’s try creating an R Markdown document to explore the US cheese consumption data and review what we learned in weeks 1-3.
Data Import
library(tidyverse)  # loads readr, dplyr, and ggplot2, all used below
consumption <- read_csv("data/clean_cheese.csv")
Rows: 48 Columns: 17
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (17): Year, Cheddar, American Other, Mozzarella, Italian other, Swiss, B...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Useful functions for exploring dataframes
Include one of the following in your document. We’ve used eval = FALSE here to prevent this code chunk from running!
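The list of functions is not reproduced here, so the following is only a sketch of some commonly used base R choices, not necessarily the ones the workshop had in mind. It uses a small stand-in data frame named cheese (an illustrative name) built from the first rows of the data so that it runs on its own.

```r
# Small stand-in for the consumption data (first three years, three columns).
cheese <- data.frame(
  Year = c(1970, 1971, 1972),
  Cheddar = c(5.79, 5.91, 6.01),
  Mozzarella = c(1.19, 1.38, 1.57)
)

head(cheese, n = 2)  # first rows
dim(cheese)          # number of rows and columns
names(cheese)        # column names
str(cheese)          # column types with a preview of the values
summary(cheese)      # summary statistics for each column
```

In a report you would usually keep only one or two of these calls. Adding the chunk option eval = FALSE shows the code in the knitted document without running it.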
By default, R Markdown will display tables the way they appear in the R console. We can use the knitr::kable() function to get cleaner tables.
knitr::kable(head(consumption), caption = "The first six rows of the cheese consumption data")
The first six rows of the cheese consumption data
| Year | Cheddar | American Other | Mozzarella | Italian other | Swiss | Brick | Muenster | Cream and Neufchatel | Blue | Other Dairy Cheese | Processed Cheese | Foods and spreads | Total American Chese | Total Italian Cheese | Total Natural Cheese | Total Processed Cheese Products |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 1970 | 5.79 | 1.20 | 1.19 | 0.87 | 0.88 | 0.10 | 0.17 | 0.61 | 0.15 | 0.41 | 3.32 | 2.20 | 7.00 | 2.05 | 11.37 | 5.53 |
| 1971 | 5.91 | 1.42 | 1.38 | 0.92 | 0.94 | 0.11 | 0.19 | 0.62 | 0.15 | 0.41 | 3.55 | 2.31 | 7.32 | 2.29 | 12.03 | 5.86 |
| 1972 | 6.01 | 1.67 | 1.57 | 1.02 | 1.06 | 0.10 | 0.22 | 0.63 | 0.17 | 0.56 | 3.51 | 2.62 | 7.68 | 2.59 | 13.01 | 6.13 |
| 1973 | 6.07 | 1.76 | 1.76 | 1.03 | 1.06 | 0.11 | 0.21 | 0.66 | 0.18 | 0.65 | 3.31 | 2.68 | 7.83 | 2.80 | 13.49 | 5.99 |
| 1974 | 6.31 | 2.16 | 1.86 | 1.09 | 1.18 | 0.11 | 0.23 | 0.70 | 0.16 | 0.61 | 3.42 | 2.92 | 8.47 | 2.95 | 14.41 | 6.34 |
| 1975 | 6.04 | 2.11 | 2.11 | 1.12 | 1.09 | 0.09 | 0.24 | 0.74 | 0.16 | 0.57 | 3.35 | 3.35 | 8.15 | 3.23 | 14.27 | 6.69 |
Adding a new variable
We’ve covered two ways to add a new variable to a dataframe.
Note: R allows non-standard variable names that include spaces, parentheses, and other special characters. The way to refer to variable names that contain wonky symbols is to use the backtick symbol `, found at the top left of your keyboard with the tilde ~.
The base R way covered in lesson 1 using the $ operator and with() function
# Base R way, covered in lesson 1
consumption$amer_ital_ratio <- with(consumption, `Total American Cheese` / `Total Italian Cheese`)
Error in eval(substitute(expr), data, enclos = parent.frame()): object 'Total American Cheese' not found
Oops. Better check the variable names: in this dataset the column is spelled `Total American Chese`.
consumption$amer_ital_ratio1 <- with(consumption, `Total American Chese` / `Total Italian Cheese`)
The tidyverse way covered in lesson 3 using the mutate() function
# Tidyverse way, covered in lesson 3
consumption <- mutate(consumption, amer_ital_ratio2 = `Total American Chese` / `Total Italian Cheese`)
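To see that the two approaches give the same result, here is a small self-contained sketch on a two-row stand-in data frame. The names cheese, ratio_base, and ratio_dplyr are illustrative, and the native |> pipe assumes R 4.1 or later.

```r
library(dplyr)

# check.names = FALSE keeps the spaces in the column names,
# so the columns must be referred to with backticks.
cheese <- data.frame(
  Year = c(1970, 1971),
  `Total American Chese` = c(7.00, 7.32),
  `Total Italian Cheese` = c(2.05, 2.29),
  check.names = FALSE
)

# Base R: $ creates the column, with() looks up the names inside cheese.
cheese$ratio_base <- with(cheese, `Total American Chese` / `Total Italian Cheese`)

# Tidyverse: mutate() adds the column; the data frame is piped in with |>.
cheese <- cheese |>
  mutate(ratio_dplyr = `Total American Chese` / `Total Italian Cheese`)

# TRUE when both columns hold the same values.
all.equal(cheese$ratio_base, cheese$ratio_dplyr)
```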
Selecting Columns
consumption <- select(consumption, Year, Cheddar, Mozzarella, `Cream and Neufchatel`)
Renaming Columns
consumption <- rename(consumption, Cream_and_Neufchatel = `Cream and Neufchatel`)
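Selecting and renaming can also be chained into one step with the pipe. This is a sketch on a small stand-in data frame; the names cheese and cheese_small are illustrative, and |> assumes R 4.1 or later.

```r
library(dplyr)

# Stand-in data with one column name that contains spaces.
cheese <- data.frame(
  Year = c(1970, 1971),
  Cheddar = c(5.79, 5.91),
  Swiss = c(0.88, 0.94),
  `Cream and Neufchatel` = c(0.61, 0.62),
  check.names = FALSE
)

# Keep three columns, then give the awkward one a name without spaces.
cheese_small <- cheese |>
  select(Year, Cheddar, `Cream and Neufchatel`) |>
  rename(Cream_and_Neufchatel = `Cream and Neufchatel`)

names(cheese_small)
```

Note that rename() takes the new name on the left and the old name on the right.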
Plotting
ggplot(consumption, aes(x = Year)) +
  geom_point(aes(y = Cheddar, col = "Cheddar")) +
  geom_point(aes(y = Mozzarella, col = "Mozzarella")) +
  geom_point(aes(y = Cream_and_Neufchatel, col = "Cream and Neufchatel")) +
  ylab("Consumption in Pounds Per Person")
ggplot(consumption, aes(x = Year)) +
  geom_point(aes(y = Cheddar, col = "Cheddar")) +
  geom_point(aes(y = Mozzarella, col = "Mozzarella")) +
  geom_point(aes(y = Cream_and_Neufchatel, col = "Cream and Neufchatel")) +
  ylab("Consumption in Pounds Per Person") +
  guides(col = guide_legend(title = "Cheese Type"))
Bibliography
R Markdown also provides a nifty way to incorporate a bibliography and references. We’ll have an example of this in the exercises, but here’s a brief summary of the steps required to use a BibTeX bibliography.
Create a plain-text .bib file in the same directory as your .Rmd R Markdown document.
Fill that .bib file with BibTeX citations. The Citation Machine can generate the BibTeX citations for you.
Make sure each citation has a unique citation key, which is the first entry after the opening brace of each citation.
Add a bibliography field to the YAML header that tells R Markdown the name of your bibliography file, for example bibliography: references.bib
Add citations throughout the document using square brackets and the @ symbol, for example [@citation-key].
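One way to try these steps without typing citations by hand is knitr::write_bib(), which writes BibTeX entries for installed R packages. This is only a sketch: it writes to a temporary directory rather than next to an .Rmd file, and the package list is an arbitrary choice.

```r
library(knitr)

# Write BibTeX entries for two packages; packages that are not installed are skipped.
bib_file <- file.path(tempdir(), "references.bib")
write_bib(c("knitr", "rmarkdown"), file = bib_file)

# Each entry starts with "@"; the citation key is the text right after the brace.
bib_lines <- readLines(bib_file)
grep("^@", bib_lines, value = TRUE)
```

write_bib() prefixes package keys with R-, so with bibliography: references.bib in the YAML header, a sentence in the report could cite knitr as [@R-knitr].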
Review
We have now covered RStudio projects and R Markdown. With these two tools, you have everything you need to create portable and reproducible reports. We’ve also introduced tidyverse tools for data import (readr), graphics (ggplot2), and data transformation (dplyr). These are foundational tools that you will need in every data analysis project.
Exercises:
Download the cheese RStudio Project file and extract the R Project contained within. Then, knit the cheeseConsumption.Rmd report. It should generate an HTML report for you.
In the cheeseConsumption.Rmd file, find the code chunk named setup. Change echo=FALSE to echo=TRUE. Try knitting the document again. What changed? Did this affect the whole document?
In the cheeseConsumption.Rmd file, find the code chunk named import. Change message=FALSE to message=TRUE. Try knitting the document again. What changed? Did this affect the whole document?
Create another R Markdown document analyzing cheese production data contained in the state_milk_productions.csv file. You can use the data dictionary found here to make sense of the different variables. Hint: you’ll need to use the group_by() |> summarize() idiom we learned in Week 3 to sum up all the state level data within each year. You’ll probably want to feed the output from that group_by() |> summarize() step into knitr::kable() to get a prettier table for your report.
Once you have created an R Markdown report analyzing cheese production, send the entire R Project to a friend (or us!) and ask them to knit that .Rmd document. If they have RStudio and the tidyverse installed, they should be able to seamlessly generate the exact report you generated, without having to make any changes.
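For the exercise on production data, the group-then-summarize step might look roughly like the sketch below. It runs on made-up toy data: the column names year, state, and milk_produced and all values are hypothetical, so check the data dictionary for the real variable names before adapting it.

```r
library(dplyr)
library(knitr)

# Hypothetical stand-in for the state-level file: two states, two years.
state_milk <- data.frame(
  year = c(1970, 1970, 1971, 1971),
  state = c("A", "B", "A", "B"),
  milk_produced = c(10, 20, 12, 22)
)

# One row per year, summing over all states within that year.
yearly_totals <- state_milk |>
  group_by(year) |>
  summarize(total_milk = sum(milk_produced))

# kable() turns the summary into a cleaner table for the report.
kable(yearly_totals, caption = "Total milk production by year (toy data)")
```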