This workshop aims to cover:
It is best practice to set up a new directory each time we start a new project in R. To do so, complete the following steps:
R Markdown provides a straightforward way to create reports that combine code and the output from that code with text commentary. This allows for the creation of automated, reproducible reports. R Markdown can knit together your analysis results with text and output it directly into HTML, PDF, or Word documents. In fact, we have been using R Markdown to generate the webpage for all of our R Open Labs workshops!
R Markdown has three components.
To create a new R Markdown document (.Rmd), select File -> New File -> R Markdown.
You will have the option to select the output: we’ll use the default HTML for this workshop. Give the document a title and enter your name as author: this will create the header for you at the top of your new .html page! RStudio will create a new R Markdown document filled with examples of code chunks and text.
At the top of the page is the optional Yet Another Markup Language (YAML) header. This header is a powerful way to edit the formatting of your report (e.g. figure dimensions, presence of a table of contents, identifying the location of a bibliography file).
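As a rough illustration of what such a header can hold, the sketch below writes a small .Rmd file and reads its header back with `rmarkdown::yaml_front_matter()`. The title, author, figure dimensions, and the `references.bib` file name are placeholder values chosen for this example, not settings from the workshop files.

```r
library(rmarkdown)

# Write a small .Rmd file: a YAML header between two "---" lines, then one line of text.
# All field values below are placeholders for illustration.
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Cheese consumption\"",
  "author: \"Your Name\"",
  "output:",
  "  html_document:",
  "    toc: true",
  "    fig_width: 7",
  "    fig_height: 5",
  "bibliography: references.bib",
  "---",
  "",
  "Report text goes here."
), rmd_path)

# Parse the header into a named list so each setting can be inspected.
header <- yaml_front_matter(rmd_path)
header$title
header$output$html_document$toc
header$bibliography
```

Here `toc: true` asks for a table of contents, `fig_width` and `fig_height` set default figure dimensions in inches, and `bibliography` points at a bibliography file.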
R code chunks are surrounded by ``` (three backticks) on the lines before and after the code. Inside the curly braces that follow the opening backticks, the chunk header specifies that this code chunk will use R code (other programming languages are supported), then names this chunk “setup”. Names are optional.
After the name, you specify options on whether you want the code or its results to be displayed in the final document. For this chunk, the `include=FALSE` option tells R Markdown that we want this code to run, but we do not want it to be displayed in the final HTML document. The R code inside the chunk, `knitr::opts_chunk$set(echo = TRUE)`, tells R Markdown to display the R code along with the results of the code in the HTML output for all code chunks below.
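A setup chunk can set more than one default at a time. The sketch below is one possible configuration, not the workshop's own: the `message` and `warning` settings are choices made for this example.

```r
library(knitr)

# Document-wide defaults, typically placed in the first ("setup") chunk.
# Individual chunks can still override any of these in their own header.
opts_chunk$set(
  echo = TRUE,      # show the R code in the knitted report
  message = FALSE,  # hide messages such as readr's column specification
  warning = FALSE   # hide warnings
)

# Check what the current default is for a given option.
opts_chunk$get("echo")
```

A chunk whose header includes, say, `echo=FALSE` would then hide its own code while every other chunk keeps the default.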
Use CTRL+ALT+i (PC) or CMD+OPTION+i (Mac) to insert R code blocks.
This is plain text with simple formatting added. The `##` tells R Markdown that “R Markdown” is a section header. The `**` around “Knit” tells R Markdown to make that word bold.
The RStudio team has helpfully condensed these code chunk and text formatting options into a cheatsheet.
You can get pretty far with options in the R Markdown cheatsheet, but R Markdown is a very powerful, flexible language that we do not have time to fully cover. More detailed references are:
https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf
Newer versions of RStudio provide a visual editor for R Markdown documents. This can be accessed by toggling between the “Source” and “Visual” options in the top left corner of your Rmd script editor pane. (Note: In some older versions of RStudio, this is available as a compass-shaped icon in the top right corner instead.)
Once activated, this interface is similar to word processing software like Microsoft Word: shortcuts for bold, italics, etc. are usually the same, and there are icons and drop-down menus available for lists, bullets, links, and more.
You can still use CTRL+ALT+i (PC) or CMD+OPTION+i (Mac) to insert R code blocks, or use the Insert>Code Chunk>R menu in the visual editor.
Read more about the visual editor here:
https://rstudio.github.io/visual-markdown-editing/
Click the Knit button, and RStudio will generate an HTML report based on your R Markdown document.
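Knitting can also be triggered from the console with `rmarkdown::render()`. The sketch below builds a throwaway .Rmd file containing a YAML header, formatted text, and one code chunk, then renders it. The file contents and the chunk name `cars-summary` are made up for this example, and rendering is skipped if pandoc (which RStudio bundles) cannot be found.

```r
library(rmarkdown)

fence <- strrep("`", 3)  # three backticks, built here to keep this example readable

# A minimal document with all three components: YAML header, text, and a code chunk.
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Knit test\"",
  "output: html_document",
  "---",
  "",
  "## A section header",
  "",
  "Plain text with one **bold** word.",
  "",
  paste0(fence, "{r cars-summary, echo=TRUE}"),
  "summary(cars)",
  fence
), rmd_path)

# Rendering to HTML needs pandoc; skip the step if it is not available.
html_path <- NULL
if (pandoc_available()) {
  html_path <- render(rmd_path, quiet = TRUE)
}
html_path
```

When rendering succeeds, `html_path` holds the location of the generated HTML file, which can be opened in any browser.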
Let’s try creating an R Markdown document to explore the US cheese consumption data and review what we learned in weeks 1-3.
library(tidyverse)  # provides read_csv(), mutate(), select(), and ggplot()
consumption <- read_csv("data/clean_cheese.csv")
## Rows: 48 Columns: 17
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## dbl (17): Year, Cheddar, American Other, Mozzarella, Italian other, Swiss, B...
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Include one of the following in your document. We’ve used `eval = FALSE` here to prevent this code chunk from running!
head(consumption)
tail(consumption)
summary(consumption)
knitr::kable
By default, R Markdown will display tables the way they appear in the R console. We can use the `knitr::kable()` function to get cleaner tables.
knitr::kable(head(consumption), caption = "The first six rows of the cheese consumption data")
Year | Cheddar | American Other | Mozzarella | Italian other | Swiss | Brick | Muenster | Cream and Neufchatel | Blue | Other Dairy Cheese | Processed Cheese | Foods and spreads | Total American Chese | Total Italian Cheese | Total Natural Cheese | Total Processed Cheese Products |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1970 | 5.79 | 1.20 | 1.19 | 0.87 | 0.88 | 0.10 | 0.17 | 0.61 | 0.15 | 0.41 | 3.32 | 2.20 | 7.00 | 2.05 | 11.37 | 5.53 |
1971 | 5.91 | 1.42 | 1.38 | 0.92 | 0.94 | 0.11 | 0.19 | 0.62 | 0.15 | 0.41 | 3.55 | 2.31 | 7.32 | 2.29 | 12.03 | 5.86 |
1972 | 6.01 | 1.67 | 1.57 | 1.02 | 1.06 | 0.10 | 0.22 | 0.63 | 0.17 | 0.56 | 3.51 | 2.62 | 7.68 | 2.59 | 13.01 | 6.13 |
1973 | 6.07 | 1.76 | 1.76 | 1.03 | 1.06 | 0.11 | 0.21 | 0.66 | 0.18 | 0.65 | 3.31 | 2.68 | 7.83 | 2.80 | 13.49 | 5.99 |
1974 | 6.31 | 2.16 | 1.86 | 1.09 | 1.18 | 0.11 | 0.23 | 0.70 | 0.16 | 0.61 | 3.42 | 2.92 | 8.47 | 2.95 | 14.41 | 6.34 |
1975 | 6.04 | 2.11 | 2.11 | 1.12 | 1.09 | 0.09 | 0.24 | 0.74 | 0.16 | 0.57 | 3.35 | 3.35 | 8.15 | 3.23 | 14.27 | 6.69 |
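`kable()` also takes arguments for rounding values and relabeling columns. The sketch below uses a small hand-typed data frame (two rows and three columns copied from the table above) so it runs without the CSV file. The new column labels and caption are choices made for this example.

```r
library(knitr)

# Two rows and three columns copied from the table above.
cheese <- data.frame(
  Year = c(1970, 1971),
  Cheddar = c(5.79, 5.91),
  Mozzarella = c(1.19, 1.38)
)

cheese_table <- kable(
  cheese,
  digits = 1,  # round numeric columns to one decimal place
  col.names = c("Year", "Cheddar (lbs)", "Mozzarella (lbs)"),
  caption = "Cheddar and mozzarella consumption, rounded to one decimal place"
)
cheese_table
```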
We’ve covered two ways to add a new variable to a dataframe.
Note: R allows non-standard variable names that include spaces, parentheses, and other special characters. To refer to a variable name that contains such symbols, wrap it in backticks. The backtick symbol (`) is found at the top left of your keyboard, on the same key as the tilde (~).
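A minimal sketch of backtick quoting, using a hand-typed data frame rather than the workshop data (the values are the first two `Total Italian Cheese` entries from the table above):

```r
# check.names = FALSE keeps the space-containing name instead of converting it
# to Total.Italian.Cheese.
cheese <- data.frame(
  Year = c(1970, 1971),
  `Total Italian Cheese` = c(2.05, 2.29),
  check.names = FALSE
)

names(cheese)

# Backticks are needed wherever the name is used as a symbol...
with_backticks <- cheese$`Total Italian Cheese`

# ...but not when the name is passed as a character string.
with_string <- cheese[["Total Italian Cheese"]]

identical(with_backticks, with_string)
```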
The base R way, covered in lesson 1, uses the `$` operator and the `with()` function.
#Base R way, covered in lesson 1
consumption$amer_ital_ratio <- with(consumption, `Total American Cheese` / `Total Italian Cheese`)
## Error in eval(substitute(expr), data, enclos = parent.frame()): object 'Total American Cheese' not found
Oops. Better check the variable names.
consumption$amer_ital_ratio1 <- with(consumption, `Total American Chese` / `Total Italian Cheese`)
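One way to track down an "object not found" error like the one above is to list or search the column names before using them. The sketch below uses a one-row stand-in data frame built by hand; the search pattern is just an example.

```r
# One-row stand-in for the cheese data, keeping the misspelled column name.
cheese <- data.frame(
  `Total American Chese` = 7.00,
  `Total Italian Cheese` = 2.05,
  check.names = FALSE
)

# Is the name we typed actually a column?
name_exists <- "Total American Cheese" %in% names(cheese)
name_exists

# Search for a fragment of the name to find the real spelling.
matches <- grep("American", names(cheese), value = TRUE)
matches
```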
The tidyverse way, covered in lesson 3, uses the `mutate()` function.
#Tidyverse way, covered in lesson 3
consumption <- mutate(consumption, amer_ital_ratio2 = `Total American Chese` / `Total Italian Cheese`)
consumption <- select(consumption, Year, Cheddar, Mozzarella, `Cream and Neufchatel`)
consumption <- rename(consumption, Cream_and_Neufchatel = `Cream and Neufchatel`)
ggplot(consumption, aes(x = Year)) +
  geom_point(aes(y = Cheddar, col = "Cheddar")) +
  geom_point(aes(y = Mozzarella, col = "Mozzarella")) +
  geom_point(aes(y = Cream_and_Neufchatel, col = "Cream and Neufchatel")) +
  ylab("Consumption in Pounds Per Person")
Hmm. I don’t love that legend title. Time for Google!
ggplot(consumption, aes(x = Year)) +
  geom_point(aes(y = Cheddar, col = "Cheddar")) +
  geom_point(aes(y = Mozzarella, col = "Mozzarella")) +
  geom_point(aes(y = Cream_and_Neufchatel, col = "Cream and Neufchatel")) +
  ylab("Consumption in Pounds Per Person") +
  guides(col = guide_legend(title = "Cheese Type"))
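An alternative that should give the same legend title is `labs()`, which can set the axis label and the legend title in one call. The sketch below uses a small hand-typed data frame (three years of values copied from the table earlier) in place of the full data set.

```r
library(ggplot2)

# Three years of values copied from the table earlier in this lesson.
cheese <- data.frame(
  Year = c(1970, 1971, 1972),
  Cheddar = c(5.79, 5.91, 6.01),
  Mozzarella = c(1.19, 1.38, 1.57)
)

# labs() names the y axis and the colour legend together.
p <- ggplot(cheese, aes(x = Year)) +
  geom_point(aes(y = Cheddar, colour = "Cheddar")) +
  geom_point(aes(y = Mozzarella, colour = "Mozzarella")) +
  labs(y = "Consumption in Pounds Per Person", colour = "Cheese Type")
p
```

Whether to use `guides()` or `labs()` is mostly a matter of taste; `labs()` keeps all the plot's text labels in one place.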
R Markdown also provides a nifty way to incorporate a bibliography and references. We’ll have an example of this in the exercises, but here’s a brief summary of the steps required to use a BibTeX bibliography.
bibliography: references.bib
We have now covered RStudio projects and R Markdown. With these two tools, you have everything you need to create portable and reproducible reports. We’ve also introduced tidyverse tools for data import (readr), graphics (ggplot2), and data transformation (dplyr). These are foundational tools that you will need in every data analysis project.
Download the cheese RStudio Project file and extract the R Project contained within. Then, knit the cheeseConsumption.Rmd report. It should generate an HTML report for you.
In the cheeseConsumption.Rmd file, find the code chunk named `setup`. Change `echo=FALSE` to `echo=TRUE`. Try knitting the document again. What changed? Did this affect the whole document?
In the cheeseConsumption.Rmd file, find the code chunk named `import`. Change `message=FALSE` to `message=TRUE`. Try knitting the document again. What changed? Did this affect the whole document?
Create another R Markdown document analyzing cheese production data contained in the state_milk_productions.csv file. You can use the data dictionary found here to make sense of the different variables. Hint: you’ll need to use the group-by %>% summarize idiom we learned in Week 3 to sum up all the state level data within each year. You’ll probably want to feed the output from that group-by %>% summarize step into knitr::kable() to get a prettier table for your report.
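As a reminder of the shape of that idiom, here is a sketch on made-up data. The column names `state`, `year`, and `milk_produced` and all the values are hypothetical; check the data dictionary for the real variable names in the CSV file.

```r
library(dplyr)

# Made-up state-level data; the column names are assumptions for illustration.
milk <- data.frame(
  state = c("A", "B", "A", "B"),
  year = c(1970, 1970, 1971, 1971),
  milk_produced = c(10, 20, 30, 40)
)

# Sum the state-level values within each year.
yearly <- milk %>%
  group_by(year) %>%
  summarize(total_milk = sum(milk_produced))

# Feed the summary into kable() for a cleaner table in the report.
knitr::kable(yearly, caption = "Total milk produced per year (made-up data)")
```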
Once you have created an R Markdown report analyzing cheese production, send the entire R Project to a friend (or us!) and ask them to knit that .Rmd document. If they have RStudio and the tidyverse installed, they should be able to seamlessly generate the exact report you generated, without having to make any changes.