Data and other downloads

Weather in Austin, TX

Today

This workshop aims to cover:

  • When you should write a function
  • Steps to writing a function
  • Naming conventions
  • Arguments
  • Returns
  • Conditionals
  • Environment

  • Suggested reading: R for Data Science, Chapter 19 (Functions)

 

When you should write a function

So far, you have been using functions that are either available in base R or written by third parties and distributed in packages that you have been loading into R. With so many functions available from so many different places, why would you ever need to write your own?

You should consider writing a function when you find yourself copying and pasting chunks of code over and over again.

Take a look at the example below. I have a dataset that includes temperatures for the city of Austin, TX over time. All of the temperatures are in Fahrenheit, and I want to change them to Celsius.

# Setup -----------------------------------------------------------
library(tidyverse)

# Data Import ------------------------------------------------------
weather <- read_csv("data/austin_weather.csv")

# Convert temperature columns from Fahrenheit to Celsius
weather$TempHigh <- (weather$TempHigh - 32) * 5/9
weather$TempAvg <- (weather$TempAvg - 32) * 5/9
weather$TempLow <- (weather$TempLow - 32) * 5/9
weather$DewPointHigh <- (weather$TempHigh - 32) * 5/9
weather$DewPointAvg <- (weather$DewPointAvg - 32) * 5/9
weather$DewPointLow <- (weather$DewPointLow - 32) * 5/9

Can you see the mistake I made above? On the DewPointHigh line, I forgot to change the second column reference, so the right-hand side still reads weather$TempHigh. That column will end up with the wrong values if I run the code. Copying and pasting like this leaves our code prone to human errors that can be difficult to notice. In the next example, I’ve written a function that does the same work in a more concise and less error-prone way.

# Function for converting Fahrenheit to Celsius
toCelcius <- function(x) {
  (x - 32) * 5/9
}

# Convert the six temperature and dew point columns to Celsius
weather[2:7] <- toCelcius(weather[2:7])

Note that we can now use our toCelcius function to convert any other Fahrenheit value to Celsius!

toCelcius(350)
## [1] 176.6667
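
Because the arithmetic inside toCelcius is vectorised, the same function should also accept a whole vector of temperatures at once. Here is a minimal sketch, repeating the definition so it runs on its own. The input values are illustrative and do not come from the weather data.

```r
# Same definition as above, repeated so this snippet is self-contained
toCelcius <- function(x) {
  (x - 32) * 5/9
}

# Illustrative inputs: the freezing and boiling points of water in Fahrenheit
toCelcius(c(32, 212))
## [1]   0 100
```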

Steps to writing a function

  1. Pick a name.
  2. List inputs, or arguments.
  3. Add code to the body of the function inside the curly brackets.
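
As a rough sketch, the three steps map onto the pieces of a function definition like this. The name toKilograms and the conversion it performs are hypothetical, chosen only for illustration.

```r
# Step 1: pick a name (toKilograms is a made-up example)
# Step 2: list the inputs, here a single argument called pounds
toKilograms <- function(pounds) {
  # Step 3: put the code that does the work inside the curly brackets
  pounds * 0.45359237
}

toKilograms(10)
## [1] 4.535924
```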

Let’s go over each of these steps in detail.

Naming conventions

R doesn’t (usually) care what your functions are called, but it’s important to give them names that are meaningful to other people. A name should be short and easy to type, but also descriptive enough to make clear what the function does. That can be difficult!

One of the more important things to remember is that your function name should not override functions that already exist in base R. You should try not to override names of functions in third-party packages either, but it is impossible to know of every function that does or doesn’t already exist. You can always type ? followed by a name (for example, ?mean) to see whether a function with that name is already documented in base R or in a package you have loaded.
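
Another way to check for a clash before settling on a name is the base R function exists(), which reports whether an object with that name is already visible in your session. Here is a small sketch. The candidate name toFurlongsPerFortnight is hypothetical, and the second result assumes nothing by that name has been defined or loaded.

```r
# mean is a base R function, so this name is already taken
exists("mean")
## [1] TRUE

# A made-up name that is very unlikely to be in use
exists("toFurlongsPerFortnight")
## [1] FALSE
```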

Arguments

Arguments supply the information our function needs in order to work. In the toCelcius function, we use an argument called x to represent any data that needs to be converted to Celsius. Many functions have an argument for the data, and many also include additional arguments that control the details of how the function runs.

For example, the log() function uses the argument x to represent the data it’s computing on, and an argument called base to specify the base of the logarithm (base 2, base 10, etc.).

Likewise, the mean() function uses argument x for data as well as argument na.rm to specify how it should handle missing values.
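
To see these extra arguments in action, here is a short sketch using log() and mean(). The vector temps and its values are made up for illustration.

```r
# base tells log() which logarithm to compute
log(8, base = 2)
## [1] 3
log(100, base = 10)
## [1] 2

# Illustrative values with one missing entry (not from the weather data)
temps <- c(70, NA, 80)

# By default, a missing value makes the mean missing too
mean(temps)
## [1] NA

# na.rm = TRUE tells mean() to drop missing values before averaging
mean(temps, na.rm = TRUE)
## [1] 75
```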

Let’s make a function that counts the total number of seconds in any given combination of hours, minutes and seconds. Our function needs three arguments: one for hours, one for minutes and one for seconds.

toSeconds <- function(h, m, s) {
  (h * 3600) + (m * 60) + s
}

# Count the total number of seconds in 2 hours, 8 minutes and 36 seconds.
toSeconds(h = 2, m = 8, s = 36)
## [1] 7716
# Naming each argument is optional when we supply the values in the order the function defines them.
toSeconds(2, 8, 36)
## [1] 7716
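
The difference between named and positional arguments can be sketched as follows. The definition is repeated so the snippet runs on its own.

```r
toSeconds <- function(h, m, s) {
  (h * 3600) + (m * 60) + s
}

# Named arguments can appear in any order
toSeconds(s = 36, h = 2, m = 8)
## [1] 7716

# Unnamed arguments are matched by position, so reordering them changes the answer
toSeconds(36, 8, 2)
## [1] 130082
```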

Returns

In R, a function automatically returns the value of the last expression it evaluates, so an explicit return is optional, but we can include it if we want to. Below is an example of the toSeconds function with the return value explicitly written in.

toSeconds <- function(h, m, s) {
  total <- (h * 3600) + (m * 60) + s
  return(total)
}
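
As a quick check that the two styles behave the same, the sketch below defines both versions side by side and compares their results. The names toSecondsImplicit and toSecondsExplicit are hypothetical, used only to keep the two versions apart.

```r
# Implicit return: the value of the last expression is returned
toSecondsImplicit <- function(h, m, s) {
  (h * 3600) + (m * 60) + s
}

# Explicit return: the same calculation, with return() spelled out
toSecondsExplicit <- function(h, m, s) {
  total <- (h * 3600) + (m * 60) + s
  return(total)
}

identical(toSecondsImplicit(2, 8, 36), toSecondsExplicit(2, 8, 36))
## [1] TRUE
```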

Specifying the return value can be helpful if we are writing a very long and complicated function and we want it to be easy for someone else to read and understand. It’s also important to think about our return values when we want to make our function pipeable.

With pipes, we pass the return value of one function to the first argument of the next. Below, we’ll use a pipe to pass the values from the toCelcius function we wrote earlier to the arrange function, so we can sort the rows from the lowest average temperature in Celsius to the highest.

weather %>%
  select(TempHigh:DewPointLow) %>%
  toCelcius() %>%
  arrange(TempAvg)
##       TempHigh   TempAvg   TempLow DewPointHigh DewPointAvg DewPointLow
## 1    -26.96845 -28.16872 -29.36900    -32.76025   -31.76955   -32.79835
## 2    -27.65432 -28.16872 -28.68313    -33.14129   -30.56927   -30.91221
## 3    -27.31139 -28.16872 -29.19753    -32.95077   -30.56927   -30.91221
## 4    -26.96845 -27.99726 -29.19753    -32.76025   -31.08368   -32.28395
## 5    -26.79698 -27.65432 -28.68313    -32.66499   -31.08368   -31.42661
## 6    -25.59671 -27.65432 -29.88340    -31.99817   -31.25514   -32.11248
## 7    -26.28258 -27.48285 -28.68313    -32.37921   -30.22634   -31.42661
## 8    -26.45405 -27.48285 -28.51166    -32.47447   -30.74074   -31.94102
## 9    -26.62551 -27.31139 -28.16872    -32.56973   -29.02606   -30.22634
## 10   -26.96845 -27.31139 -27.65432    -32.76025   -27.82579   -29.88340
## 11   -26.79698 -27.31139 -27.99726    -32.66499   -28.16872   -28.68313
## 12   -26.79698 -27.31139 -27.99726    -32.66499   -28.16872   -28.68313
## 13   -26.96845 -27.31139 -27.82579    -32.76025   -28.85460   -29.88340
## 14   -26.96845 -27.31139 -27.82579    -32.76025   -27.82579   -28.68313
## 15   -25.42524 -27.31139 -29.19753    -31.90291   -31.08368   -31.76955
## 16   -24.91084 -27.13992 -29.36900    -31.61713   -31.25514   -32.45542
##  [ reached getOption("max.print") -- omitted 1303 rows ]

The values printed above are far colder than Austin’s real temperatures because the columns had already been converted to Celsius by the earlier code, so this pipeline converts them again. Run the pipeline on freshly imported data to get sensible values.

Notice that we don’t need to specify the x argument in our toCelcius function above because it is supplied by the return value of the select function. Likewise, we don’t need to specify the data argument in our arrange function because it is supplied by the return value of toCelcius.
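
To make the hand-off concrete, the sketch below writes the same computation as a nested call and as a pipeline. It assumes R 4.1 or later and uses the native pipe |> so that it runs without loading any packages. For a simple chain like this, %>% behaves the same way. The vector temps is made up for illustration.

```r
toCelcius <- function(x) {
  (x - 32) * 5/9
}

# Illustrative Fahrenheit values (not from the weather data)
temps <- c(32, 212)

# Nested call: read from the inside out
round(toCelcius(temps))
## [1]   0 100

# Piped call: read from left to right; each result becomes the next first argument
temps |> toCelcius() |> round()
## [1]   0 100
```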

Another context in which we may want to specify our return values is when we’re working with conditionals.

Conditionals

Conditionals are often called “if/else statements” because they work like this:

if (condition) {
  # If the condition is TRUE, run the code that is written here.
} else {
  # If the condition is FALSE, run the code that is written here.
}

Explicit return statements are especially useful when the return value depends on a condition, because they make each possible result easy to spot. For example, let’s write a function that returns the larger of two numbers:

isBigger <- function(x, y) {
  if (x > y) {
    return(x)
  } else {
    return(y)
  }
}

isBigger(11, 200)
## [1] 200
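
For comparison, R treats if/else as an expression, so a function whose body ends in an if/else returns the value of whichever branch ran, even without return(). Here is a sketch of the same logic written that way. The name isBiggerImplicit is hypothetical.

```r
isBiggerImplicit <- function(x, y) {
  if (x > y) {
    x   # value of the function when x is larger
  } else {
    y   # value of the function otherwise
  }
}

isBiggerImplicit(11, 200)
## [1] 200
```

Both styles give the same result, so the choice between them is mostly about readability.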

But wait! What if both numbers are the same? Should we make our function respond to that? We can add more than two conditions using the following syntax:

isBigger <- function(x, y) {
  if (x > y) {
    return(x)
  } else if (x == y) {
    return("They are equal.")
  } else {
    return(y)
  }
}

isBigger(80, 80)
## [1] "They are equal."

If a function uses many different conditions, it can be cumbersome to read and write. Luckily, R provides a function called switch that can make a long chain of conditionals easier to read.

doThis <- function(x, y, operation) {
  switch(operation,
    add = x + y,
    subtract = x - y,
    multiply = x * y,
    divide = x / y
  )
}

doThis(120, 2, "divide")
## [1] 60
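
One thing to watch for: when the operation matches none of the names, switch returns NULL invisibly, so a typo can pass unnoticed. A possible safeguard, sketched below, is to add an unnamed final argument, which switch uses as the fallback. The name doThisSafely and the wording of the error message are assumptions made for this example.

```r
doThisSafely <- function(x, y, operation) {
  switch(operation,
    add = x + y,
    subtract = x - y,
    multiply = x * y,
    divide = x / y,
    # Unnamed final argument: evaluated when operation matches none of the names
    stop("Unknown operation: ", operation)
  )
}

doThisSafely(120, 2, "divide")
## [1] 60

# Catch the error so the message can be shown without halting the script
tryCatch(
  doThisSafely(120, 2, "modulo"),
  error = function(e) conditionMessage(e)
)
## [1] "Unknown operation: modulo"
```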

Environment

What’s wrong with the function below?

getBirthYr <- function(age) {
    crntYear - age
}

That was a trick question! In some programming languages, the above function would always produce an error because crntYear isn’t defined inside the function. R behaves differently.

If a variable isn’t defined inside the function, R will look for it in the environment where the function was created, which for functions you write in a script or at the console is usually the global environment. One good way to see what’s in that environment is to look at the “Environment” tab in the upper right of RStudio. Any function you define there can use the values listed in it.

Below, we do NOT have crntYear defined either within our function or within our environment, so we still get an error.

getBirthYr(21)
## Error in getBirthYr(21): object 'crntYear' not found

However, if we add crntYear to our environment, we can use the function.

crntYear <- 2018

getBirthYr(21)
## [1] 1997
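
The lookup also works in one direction only: a variable created inside a function takes precedence there, and it does not change the value outside. Here is a minimal sketch, where getBirthYrLocal and the year 2000 are hypothetical and chosen only for illustration.

```r
crntYear <- 2018

# No crntYear inside the function, so R finds the one defined outside
getBirthYr <- function(age) {
  crntYear - age
}
getBirthYr(21)
## [1] 1997

# A variable created inside the function is used instead of the outer one
getBirthYrLocal <- function(age) {
  crntYear <- 2000
  crntYear - age
}
getBirthYrLocal(21)
## [1] 1979

# The outer value is untouched
crntYear
## [1] 2018
```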

Exercises

  1. Take a look at the function below. What does it do? What would be a better name for it?
f <- function(x, y) {
  x[is.na(x)] <- y
  return(x)
}
  2. What arguments does the head function take and how are they used?
  3. You want to write a function that calculates any given exponential. How many arguments do you need?
  4. Write the function described in #3.
  5. There are three columns in our weather dataset that indicate visibility in miles. Write a function to determine the equivalent visibility in kilometers.
  6. Write a function that accepts a latitude and longitude and returns which hemispheres those coordinates belong to. For example, inputs of 35.911434 and -79.048106 would return “north west”. HINT: You can use logical operators with conditionals.