R: Intro to functions
A function has a name, which is used to call it, followed by zero or more arguments in parentheses ( ), e.g., function_name(argument = TRUE). The code chunk below shows some basic functions that come with R.
Functions: the basics
#get the square root of 4
4^(0.5)
#or using the function: sqrt()
sqrt(4)
#get the sum of 2 + 3
2 + 3
#or using the function: sum()
sum(2,3)
#find max value in a series of numbers
max(1,2,3,4)
#or minimum
min(1,2,3,4)
#round a number
round(3.142)
# 3.142 is itself an argument: an unnamed first argument is matched to the first parameter, here x
round(x = 3.142)
#provide additional argument to specify format
round(3.142, digits = 1) # rounding to 1 decimal place
round(3.142, digits = 2) # rounding to 2 decimal places
# notice where every function starts and ends:
# in RStudio, place the cursor next to a parenthesis to highlight its matching one
round(
  sqrt(
    sum(1, 1)
  ),
  1
)
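The nested call above is read from the inside out. As a sketch, the same computation can also be written step by step with intermediate variables, or with R's native pipe |>, which passes the result on its left as the first argument of the function on its right. The variable names are illustrative, and |> assumes R 4.1.0 or later.

```r
# nested: read from the inside out
round(sqrt(sum(1, 1)), 1)

# step by step with intermediate (illustrative) variables
total <- sum(1, 1)         # 2
root <- sqrt(total)        # 1.414214
rounded <- round(root, 1)  # 1.4

# native pipe (R >= 4.1.0): read from left to right
piped <- sum(1, 1) |> sqrt() |> round(1)
piped                      # 1.4
```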
# you can print text
print("Functions make life easier")
# or paste text chunks together
paste("Functions", "make", "life", "easier")
# separate each string by a space (" " is the default value of sep)
paste("Functions", "make", "life", "easier", sep = " ")
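The sep argument controls what goes between the pieces. A short sketch of two related options: paste0() is a shortcut for paste(..., sep = ""), and the collapse argument joins the elements of a single vector into one string. The vector name words is just an example.

```r
paste("Functions", "make", "life", "easier", sep = "_")  # "Functions_make_life_easier"
paste0("Function", "s")                                   # "Functions" (same as sep = "")

words <- c("Functions", "make", "life", "easier")
paste(words)                  # still four separate strings
paste(words, collapse = " ")  # "Functions make life easier"
```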
ifelse() function
If statements
The logic of a simple if statement: if a test expression (condition) evaluates to TRUE, the code within the if statement (body expression) is executed.
grade <- 2.3
# if the condition (grade is smaller than 3) is met,
# then execute the code between the curly brackets
if (grade < 3) {
  print("Good Job")
}
Else statements
What if the condition is not met? Running the code above with a grade that does not meet the condition does nothing at all. You can add an else statement to declare code that is executed when the condition is not met.
grade <- 2.3
# if the condition (grade is smaller than 3) is met,
# then execute the code between the first pair of curly brackets,
# otherwise execute the code after else
if (grade < 3) {
  print("Good Job")
} else {
  print("Life goes on")
}
grade <- 1.0
if (grade == 1.0) {
  print("Perfect")
} else if (grade < 2.0) {
  print("Amazing")
} else if (grade < 3.0) {
  print("Good Job")
} else {
  print("Life goes on")
}
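We can chain if and else if statements, e.g., if (test expr) {body expr} else if (test expr2) {body expr2} else {...}, to evaluate a set of conditions. R checks the conditions from top to bottom and executes only the body of the first condition that is TRUE; the final else runs if none of them is.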
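Because only the first TRUE condition runs, the order of the conditions matters. In this illustrative reordering of the example above, the broader check grade < 3.0 comes first, so a grade of 1.0 never reaches the "Perfect" branch:

```r
grade <- 1.0
# same kind of conditions as above, but in a different order
if (grade < 3.0) {
  result <- "Good Job"      # 1.0 < 3.0 is TRUE, so this branch runs...
} else if (grade == 1.0) {
  result <- "Perfect"       # ...and this branch is never reached for 1.0
} else {
  result <- "Life goes on"
}
print(result)  # "Good Job"
```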
ifelse() function
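The ifelse() function takes a condition (test), a value to return where the condition is TRUE (yes), and a value to return where it is FALSE (no). Unlike if, it is vectorized: it evaluates the condition element by element for a whole vector. Calls to ifelse() can be nested as well, e.g.: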
ifelse(
  grade == 1.0,       # if grade is equal to 1.0
  "Perfect",          # then: perfect
  ifelse(             # else: if...
    grade < 2,        # ...grade is better (lower) than 2
    "Amazing",        # then: amazing
    ifelse(           # else: if...
      grade < 3,      # ...grade is better (lower) than 3
      "Good Job",     # then: good job
      "Life goes on"  # else (grade >= 3): life goes on
    )
  )
)
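The practical difference between if and ifelse() shows up with vectors: if() expects a single TRUE or FALSE (since R 4.2.0, a longer condition is an error), while ifelse() returns one result per element. A small sketch applying the nested ifelse() from above to a hypothetical vector of grades:

```r
grades <- c(1.0, 1.7, 2.3, 3.7)  # hypothetical grades

labels <- ifelse(grades == 1.0, "Perfect",
            ifelse(grades < 2, "Amazing",
              ifelse(grades < 3, "Good Job", "Life goes on")))
labels
# "Perfect" "Amazing" "Good Job" "Life goes on"
```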
Packages provide more function(ality)
Packages provide additional functions. Simply put, most packages are collections of user-written functions. To learn how to declare your own function, see the section "Defining your own functions" below.
Installing and loading packages
To install a package, use the install.packages() function in R.
A package has to be installed before you can load it, but this step is only required once. For example, to install the viridis package, which provides useful color palettes for visualizations:
install.packages("viridis") # Put the name of the package in "quotation marks"
library(viridis) # Load installed package (no "quotation marks" needed)
While you only need to install a package once, you need to load it every time you start a new R session. You can do that with the library() function in R.
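A common pattern, sketched here as a hypothetical helper named use_package(), installs a package only if it is missing and then loads it. requireNamespace() returns FALSE when a package is not installed, and character.only = TRUE lets library() take the package name from a variable:

```r
# hypothetical helper: install a package only if needed, then load it
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)               # runs only if pkg is not installed yet
  }
  library(pkg, character.only = TRUE)  # load (attach) the package
}

# use_package("viridis")
```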
Loading a package (pkg) to load data
Base R does not provide a straightforward function to read Excel files. Unless you want to manually select and copy data in Excel and use read.table() to import the copied data from the clipboard, you rely on packages to read Excel files, e.g., readxl or xlsx.
install.packages("readxl")      # install pkg once
library(readxl)                 # load pkg every time you open R (or an R project)
df <- read_excel("data.xlsx")   # load data (first sheet by default)
install.packages("xlsx")        # install pkg once (xlsx requires Java)
library(xlsx)                   # load pkg every time you open R (or an R project)
df <- read.xlsx("data.xlsx", sheetIndex = 1)   # load first sheet of Excel file
# read.xlsx2() is faster on large files than read.xlsx()
df <- read.xlsx2("data.xlsx", sheetIndex = 1)  # load first sheet of Excel file
The same goes for writing from R to Excel files, with packages like writexl, xlsx, and r2excel.
Different functions for different data
In base R, many functions for loading data, such as read.csv(), are calls to read.table() with different preset default values for its arguments. Files from other programs, e.g., Excel (as shown above) or Stata, commonly require additional functions to load and write.
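To illustrate that read.csv() is read.table() with presets, the sketch below reads the same small CSV text (a made-up example passed via the text argument instead of a file) both ways. For simple data like this, read.table() with header = TRUE and sep = "," gives the same data frame:

```r
# made-up CSV content, passed as text instead of a file path
csv_text <- "name,score\nAnna,1.7\nBen,2.3"

df_csv   <- read.csv(text = csv_text)                              # presets: header = TRUE, sep = ","
df_table <- read.table(text = csv_text, header = TRUE, sep = ",")  # same settings, declared by hand

identical(df_csv, df_table)  # TRUE
```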
There are many different types of data files; below are some of the most common ones, especially in social science research.
| File type | File extension | Function(s) |
|---|---|---|
| Excel | .xlsx; .xls | readxl::read_excel(), xlsx::read.xlsx() |
| CSV | .csv | read.csv() (semicolon-separated, e.g. German CSV: read.csv2()) |
| Stata | .dta | haven::read_dta() |
| RData | .RData; .rds | load() (.RData), readRDS() (.rds) |
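For R's own formats, a short sketch of the difference between load() and readRDS(), using a made-up data frame and temporary files: readRDS() returns a single object that you assign to any name, whereas load() restores the saved objects under their original names (and returns those names).

```r
scores_df <- data.frame(name = c("Anna", "Ben"), score = c(1.7, 2.3))  # made-up data

# .rds: one object per file; readRDS() returns it, so you choose the name
rds_path <- tempfile(fileext = ".rds")
saveRDS(scores_df, rds_path)
df_from_rds <- readRDS(rds_path)

# .RData: one or more objects; load() restores them under their original names
rdata_path <- tempfile(fileext = ".RData")
save(scores_df, file = rdata_path)
rm(scores_df)                      # remove the object from the workspace...
loaded_names <- load(rdata_path)   # ...and bring it back; load() returns the names
loaded_names                       # "scores_df"
identical(df_from_rds, scores_df)  # TRUE
```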
CSV is short for comma-separated values. By default, read.csv() treats commas "," as separators between data entries,
whereas read.csv2() expects semicolons ";" as separators and commas as decimal marks, as in German-language CSV files.
An alternative to read.csv2() is to declare the separator explicitly with
read.csv(file, sep = ";"), adding dec = "," if the file uses decimal commas. For further information, consult the documentation
in R with the command ?read.csv.
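A sketch with made-up German-style data (semicolons as separators, commas as decimal marks), passed via the text argument instead of a file. read.csv2() and read.csv() with sep and dec set explicitly give the same result; with sep = ";" alone, the scores are not read as numbers:

```r
# made-up German-style CSV content: ";" separates columns, "," marks decimals
german_csv <- "name;score\nAnna;1,7\nBen;2,3"

df1 <- read.csv2(text = german_csv)                       # defaults: sep = ";", dec = ","
df2 <- read.csv(text = german_csv, sep = ";", dec = ",")  # same settings, declared explicitly
df3 <- read.csv(text = german_csv, sep = ";")             # decimal commas not handled

identical(df1, df2)    # TRUE
df1$score              # 1.7 2.3 (numeric)
is.numeric(df3$score)  # FALSE: "1,7" is not read as a number
```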
Hint: the prefix readxl:: in readxl::read_excel() names the namespace the function comes from, i.e., the package readxl, which provides read_excel(). Calling a function this way works without loading the package via library(), as long as the package is installed.
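A quick sketch of the :: notation using packages that ship with R: median() lives in stats, which is attached by default, while tools is installed with R but not attached, so its file_ext() function can still be called with tools:: without a library() call.

```r
x <- c(3, 1, 2)

stats::median(x)                  # 2: called via the stats namespace
median(x)                         # 2: same function, found because stats is attached
identical(stats::median, median)  # TRUE

tools::file_ext("data.xlsx")      # "xlsx": tools is installed but not attached
```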
If you work with APIs or databases, you may also encounter .json files, which can store more complex structures, including nested data, than the plain tables stored as .csv.
Defining your own functions
Sometimes, you need to declare your own function to complete a task or solve a problem. Once declared, you can use your function as many times as you need it.
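Every function follows the same pattern: name <- function(arguments) { body }. As a minimal sketch, the hypothetical function percent() below takes two required arguments and one optional argument, digits, whose default value is used unless you pass another one:

```r
# hypothetical example function: share of part in total, in percent
percent <- function(part, total, digits = 1) {
  round(part / total * 100, digits = digits)
}

percent(45, 60)            # 75
percent(1, 3)              # 33.3 (default digits = 1)
percent(1, 3, digits = 3)  # 33.333
```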
Step 1: Declare function
Hint: install.packages() may be a boring name, but it clearly communicates the purpose of the function. A good function name communicates its purpose as briefly as possible. The function grade_score() below converts (test) scores into letter grades.
# Solution 2: define the function using repeated calls to ifelse()
grade_score <- function(score, threshold_A, threshold_B, threshold_C, threshold_D) {
  # store letter grades in a separate vector so that 'score' stays numeric
  # (comparing a character vector with numbers would compare text, not numbers)
  grade <- rep(NA_character_, length(score))
  # assign "A" if score is at least threshold_A
  grade <- ifelse(score >= threshold_A, "A", grade)
  # else assign "B" if score is below threshold_A and at least threshold_B
  grade <- ifelse(score < threshold_A & score >= threshold_B, "B", grade)
  # else assign "C" if score is below threshold_B and at least threshold_C
  grade <- ifelse(score < threshold_B & score >= threshold_C, "C", grade)
  # else assign "D" if score is below threshold_C and at least threshold_D
  grade <- ifelse(score < threshold_C & score >= threshold_D, "D", grade)
  # else assign "F" if score is below threshold_D (if not, keep grade)
  grade <- ifelse(score < threshold_D, "F", no = grade)
  # note: if (score < threshold_D) { ... } would not work here, because
  # if() expects a single TRUE/FALSE value, not a vector of them
  # return the 'grade' vector as output
  return(grade)
}
If no return is declared, R returns the last evaluated expression.
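A minimal sketch with two hypothetical functions that return the same value, one with an explicit return() and one relying on the last evaluated expression:

```r
square_explicit <- function(x) {
  return(x^2)  # explicit return
}

square_implicit <- function(x) {
  x^2          # last evaluated expression is returned
}

square_explicit(3)  # 9
square_implicit(3)  # 9
```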
Step 2: Call function
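# Apply function to example test scores (hypothetical values)
scores <- c(98, 85, 74, 61, 45)
grade_score(scores,
  threshold_A = 90,
  threshold_B = 80,
  threshold_C = 70,
  threshold_D = 60
)
# returns "A" "B" "C" "D" "F"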
Did you know?
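If, in the line that assigns "D", you declare the value for the "else" (no) argument as "F" instead of keeping the current values, e.g.:
ifelse(score < threshold_C & score >= threshold_D, "D", "F"),
then you would replace all values that do not meet the condition, i.e. all values that are not both
- smaller than threshold_C AND
- greater than OR equal to threshold_D,
with "F", including the previously assigned grades "A", "B" and "C": the no value applies to every element where the test is FALSE, whatever that element already contains.
A related pitfall: once a vector contains a letter, R stores the whole vector as character, and comparing it with a number compares text, not numbers. In common locales digits sort before letters, so "A" < 70 is FALSE, and "9" < 10 is FALSE as well, because "9" is compared with "10" character by character. Further rules for comparing text:
- "Z" > "Y" and "Y" > "A" (larger to smaller in reverse alphabetical order),
- "B" > "AA" (the first character decides, not the length of the string),
- "B" > "b" depends on the locale: it is TRUE in many locales, where lowercase letters sort first, but FALSE in the C locale.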
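The comparisons above can be checked directly in the console. A short sketch (the locale-dependent "B" > "b" case is left out on purpose):

```r
9 < 10             # TRUE: both are numbers
"9" < "10"         # FALSE: text is compared character by character, "9" vs "1"
9 < "10"           # FALSE: 9 is converted to "9", so this is a text comparison
"A" < 70           # FALSE: digits sort before letters
"B" > "AA"         # TRUE: the first character decides
class(c(85, "A"))  # "character": one letter turns the whole vector into text
```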