# Setup -----------------------------------------------------------
library(tidyverse)
athletes <- read_csv("data/olympics_2016.csv")
beginR: Scripting 2 - Loops and Error Handling
Data and other downloads
Today
This workshop aims to cover:
- Getting started with loops
- Output
- While loops
- Loops with conditionals and functions
- Error handling
Getting Started With Loops
In Scripting 1, we learned that we ought to avoid copying and pasting chunks of code over and over again, i.e., we should reduce duplication. Last week, we reduced duplication in our code by creating our own functions. This week, we’ll learn another tool that helps avoid duplication: iteration, a.k.a., loops. Below is a very simple loop.
for (number in 1:5) {
  print(number)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
The for function tells R this loop will iterate through each possible value supplied in the parentheses, known as a sequence. The sequence supplies what we are looping through. In this case, it’s the vector 1:5, so the loop is going to iterate 5 times. Our sequence also has a variable called “number”. In each iteration of the loop, “number” will be set to one of the values in the vector:
- number = 1
- number = 2
- number = 3
- etc.
“Number” is a completely arbitrary name. The variable used in our sequence can be called anything we want. The examples below will all work the same.
for (i in 1:5) {
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
for (x in 1:5) {
  print(x)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
for (fobbywobble in 1:5) {
  print(fobbywobble)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
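The sequence does not have to be a run of numbers. As a minimal sketch, the loop below iterates over a character vector; the vector `sports` and its values are made up for illustration and are not part of the workshop data. The second loop uses `seq_along()` to iterate over positions instead of values.

```r
# Hypothetical example data: any vector can serve as the sequence.
sports <- c("rowing", "judo", "fencing")

# Loop directly over the values.
for (sport in sports) {
  print(toupper(sport))
}

# Loop over the positions instead, which is handy when we need the index.
for (i in seq_along(sports)) {
  print(paste(i, sports[i]))
}
```

Looping over positions is the form used in the rest of this workshop, because it lets us write results back to a specific element.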
The code between the curly brackets, or the body of the loop, tells R what to do each time our sequence is iterated. The body can hold whatever kind of code we want. In the above case, we are only printing each value in the vector, but often we may want to execute many lines of code over many iterations. When that happens, we need to worry about efficiency. One way to be more efficient when writing loops is to assign an object to hold our output.
Output
Before we start the loop, we should allocate sufficient space for the output by creating an output object. If we want to store our output in a vector, we can use the vector() function to create one. vector() takes two arguments: the type of the vector and its length. For more information on object classes and types, review Week 2.3: Objects and classes.
Below, we’re creating an empty numeric vector called “output” with 5 elements.
output <- vector("numeric", 5)
output
[1] 0 0 0 0 0
Now, each time we iterate through the loop, we can store the results in our output vector.
for (i in 1:5) {
  output[i] <- i
}
output
[1] 1 2 3 4 5
Although this step may seem unnecessary with such a simple example, it is very important for efficiency when using loops with large amounts of data. If you neglect to create an output object and simply grow the output at each iteration, your loop will be very slow.
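To make the difference concrete, here is a small sketch that builds the same vector both ways. The names `grown` and `prealloc` and the size `n` are chosen for illustration only. Both loops produce identical results; wrapping each loop in `system.time()` is one way to compare how long they take on your machine.

```r
n <- 10000

# Growing: the vector is extended (and copied) on every iteration.
grown <- c()
for (i in 1:n) {
  grown <- c(grown, i^2)
}

# Preallocating: the vector has its final length before the loop starts.
prealloc <- vector("numeric", n)
for (i in 1:n) {
  prealloc[i] <- i^2
}

# Both approaches give the same values.
identical(grown, prealloc)
```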
If we are using loops to modify existing data, our output object already exists, so we don’t need to create one. For this example, let’s use our dataset of Olympic athletes. We have a column for each athlete’s height in centimeters. Let’s convert the heights to inches using a loop.
In this case, our output object is the column athletes$height. Note that we can determine the length of the object by using the length() function.
for (i in 1:length(athletes$height)) {
  # 1 inch = 2.54 centimeters
  athletes$height[i] <- athletes$height[i] / 2.54
}
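The same in-place pattern can be tried on a small made-up data frame. This is only a sketch: `toy` and its values are hypothetical, and the conversion here is centimeters to meters to keep the arithmetic easy to check. It also shows why `seq_along()` is often a safer sequence than `1:length()` when a column might be empty.

```r
# Hypothetical toy data; the real athletes data is not reproduced here.
toy <- data.frame(
  name = c("A", "B", "C"),
  height = c(180, NA, 165)  # centimeters, with one missing value
)

# Overwrite each element in place; a missing value simply stays missing.
for (i in seq_along(toy$height)) {
  toy$height[i] <- toy$height[i] / 100
}

toy$height

# With an empty vector, 1:length() counts down from 1 to 0,
# while seq_along() gives an empty sequence.
empty <- numeric(0)
1:length(empty)
seq_along(empty)
```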
While Loops
While loops are used when we only want to run a loop while a certain condition is met. The condition is checked before each iteration; once it is no longer met, the loop stops.
i <- 1
while (i < 6) {
  print(i)
  i <- i + 1
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
This type of loop is not used very often in R, but it can be helpful in situations where we do not know how many times the loop should iterate. For example, we might use a while loop when random numbers or random sampling is involved, or when taking user input as part of an interactive application created in R Shiny.
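As a sketch of the "unknown number of iterations" case, the loop below simulates rolling a die until a 6 comes up. The variable names and the seed are arbitrary choices for illustration. We cannot know in advance how many rolls it will take, which is why a while loop fits better than a for loop here.

```r
set.seed(42)  # arbitrary seed so reruns give the same draws

rolls <- 0  # how many times we have rolled so far
roll <- 0   # the most recent result (0 means "not rolled yet")

# Keep rolling a six-sided die until a 6 comes up.
while (roll != 6) {
  roll <- sample(1:6, 1)
  rolls <- rolls + 1
}

rolls
```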
Loops with Conditionals and Functions
Loops and conditionals are a powerful combination that allows us to perform many kinds of transformations. Let’s say we want to see how many of our athletes are at, above or below the average for height in our dataset, and we want to store that information in a new column. First, we need to determine the average heights for both men and women in our dataset.
athletes %>%
  group_by(sex) %>%
  summarize(avg_height = mean(height, na.rm = TRUE))
# A tibble: 2 × 2
sex avg_height
<chr> <dbl>
1 F 70.6
2 M 75.6
Next, let’s create a new column in our dataset to store the output. Since each column is itself a vector, we’ll use the vector() function to create a new one. It will be a character vector that’s the same length as all the other columns in the dataset.
athletes$height_class <- vector("character", length(athletes$ID))
Next, let’s work on the conditional statements. We’ll start out by focusing on only the first record in our dataset. We’ll need to use nested conditionals in this case, because we’re checking two different columns: sex and height.
if (athletes$sex[1] == "F") {
  if (athletes$height[1] > 70.6) {
    athletes$height_class[1] <- "above average"
  } else if (athletes$height[1] == 70.6) {
    athletes$height_class[1] <- "average"
  } else {
    athletes$height_class[1] <- "below average"
  }
} else {
  if (athletes$height[1] > 75.6) {
    athletes$height_class[1] <- "above average"
  } else if (athletes$height[1] == 75.6) {
    athletes$height_class[1] <- "average"
  } else {
    athletes$height_class[1] <- "below average"
  }
}
When we run the code above, we see that it works correctly on our first record. Of course, we don’t want to copy and paste the code over and over again for each record, which is why we’ll use it in the body of a loop. But what if we want to use it on multiple datasets? What if we want to loop this code at some point in our script, but run it again without looping at another point? This code will be less bulky and more flexible if we turn it into a function.
getHeightClass <- function(s, h, c) {
  # s: sex, h: height, c: the class value to overwrite
  if (s == "F") {
    if (h > 70.6) {
      c <- "above average"
    } else if (h == 70.6) {
      c <- "average"
    } else {
      c <- "below average"
    }
  } else {
    if (h > 75.6) {
      c <- "above average"
    } else if (h == 75.6) {
      c <- "average"
    } else {
      c <- "below average"
    }
  }
  c  # return the assigned class explicitly
}
Now let’s see if our function works on the second row.
athletes$height_class[2] <- getHeightClass(athletes$sex[2], athletes$height[2], athletes$height_class[2])
Great! So we just need to loop through the dataset and apply the function to all of our records. In fact, we can use the same sequence from the loop we used to convert the height to inches!
for (i in 1:length(athletes$height)) {
  athletes$height_class[i] <- getHeightClass(athletes$sex[i], athletes$height[i], athletes$height_class[i])
}
Error in if (h > 75.6) {: missing value where TRUE/FALSE needed
Uh oh! It looks like we’ve got an error. Unfortunately, we are missing height information for some of the athletes. Because it’s not uncommon to encounter an error when iterating through loops, there are special functions in R for dealing with them.
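The error message comes from how `if` treats missing values. The sketch below reproduces the cause on a single made-up value `h` rather than on the athletes data. It also shows one possible guard with `is.na()`, which is an alternative to the error-handling functions covered next, not the approach this workshop takes.

```r
h <- NA_real_  # a missing height

# Comparing NA with a number gives NA, not TRUE or FALSE.
h > 75.6

# if() cannot decide on NA, which is what triggers the error.
# Checking is.na() first avoids evaluating the comparison at all.
if (is.na(h)) {
  height_class <- NA_character_
} else if (h > 75.6) {
  height_class <- "above average"
} else {
  height_class <- "not above average"
}

height_class
```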
Error handling
There are multiple ways of dealing with errors in loops. One of the easier ways is to ignore them and continue moving through the loop. This is accomplished with the try() function, which simply wraps around the entire body of the loop.
By default, try() will continue the loop even if there’s an error, but will still show the error message. We can suppress the error messages by using silent = TRUE.
for (i in 1:length(athletes$height)) {
  try(athletes$height_class[i] <- getHeightClass(athletes$sex[i], athletes$height[i], athletes$height_class[i]), silent = TRUE)
}
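Silencing errors makes it easy to lose track of which iterations failed. As a minimal sketch, the loop below uses the value returned by `try()` to record failures. The list `inputs` and the vector `failed` are hypothetical names used only for this illustration.

```r
inputs <- list(4, "a", 9)  # hypothetical inputs; sqrt("a") will fail
failed <- vector("logical", length(inputs))

for (i in seq_along(inputs)) {
  result <- try(sqrt(inputs[[i]]), silent = TRUE)
  # try() returns an object of class "try-error" when the expression fails.
  failed[i] <- inherits(result, "try-error")
}

failed
```

Afterwards, `which(failed)` gives the positions of the items that raised an error, so they can be inspected separately.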
That works for our purposes above, but there may be times when you want to handle errors differently. For example, you may want to report a specific message when an error occurs. In that case, you can use the tryCatch() function.
Let’s test this by creating a list of both numbers and characters.
stuff <- list(12, 9, 2, "cat", 25, 10, "bird")
Now, we’ll loop over the list and try to get the log of each item. When indexing lists, we need to use double square brackets.
for (i in 1:length(stuff)) {
  try(print(log(stuff[[i]])))
}
[1] 2.484907
[1] 2.197225
[1] 0.6931472
Error in log(stuff[[i]]): non-numeric argument to mathematical function
[1] 3.218876
[1] 2.302585
Error in log(stuff[[i]]): non-numeric argument to mathematical function
Ok, let’s change the error message to include both the index number of the item in the list that’s giving us the error as well as the contents of that item.
for (i in 1:length(stuff)) {
  tryCatch(
    print(log(stuff[[i]])),
    error = function(e) {
      message(paste("An error occurred for item", i, stuff[[i]], ":\n"), e)
    }
  )
}
[1] 2.484907
[1] 2.197225
[1] 0.6931472
An error occurred for item 4 cat :
Error in log(stuff[[i]]): non-numeric argument to mathematical function
[1] 3.218876
[1] 2.302585
An error occurred for item 7 bird :
Error in log(stuff[[i]]): non-numeric argument to mathematical function
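The handler can also supply a fallback value, which combines well with a preallocated output vector. The following is a sketch under that assumption: the vector `logs` and the wording of the message are illustrative choices, and `conditionMessage()` is used to pull the text out of the error object.

```r
stuff <- list(12, 9, 2, "cat", 25, 10, "bird")
logs <- vector("numeric", length(stuff))

for (i in seq_along(stuff)) {
  logs[i] <- tryCatch(
    log(stuff[[i]]),
    # The handler's return value becomes the value of tryCatch().
    error = function(e) {
      message("Item ", i, " could not be logged: ", conditionMessage(e))
      NA_real_
    }
  )
}

logs
```

Items that cannot be logged end up as NA in `logs`, so the output stays the same length as the input and the failures remain visible in the result.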
Exercises
- Using a loop, convert the weight of all the athletes from kilograms to pounds.
- Write a loop that prints random numbers, but stops after printing a number greater than 1. Hint: use rnorm().
- Create a new column in athletes called “under_21”. Use a loop to add TRUE to a record if the athlete’s age is less than 21 and FALSE if it isn’t.
- Use tryCatch() to loop through every column in the athletes dataset and print the maximum value for each numeric column. If a column is not numeric, print the error message “Column x is not numeric.” where x is the column number. Hint: don’t forget to use na.rm = TRUE.
- Very often in R, we will want to apply a function to multiple parts of an object. While we can often accomplish this using a for loop, we can also use certain functions that will provide the same output with fewer lines of code. There is a family of apply functions in base R that focus on this, as well as a family of map functions included in the tidyverse package that do similar things. Read about the apply functions on the DataCamp website. Then, read about the map functions in R For Data Science Chapter 21.5. Which ones would you prefer to use? Why?