R basics

Author: Benjamin Delory

Affiliation: Environmental sciences group, Copernicus institute of sustainable development, Utrecht University

Installing R packages

An R package is like a toolbox, except that instead of containing tools, it contains functions for performing specific tasks such as filtering data or fitting a statistical model. Most of the R packages you will need for these tutorials are freely available from CRAN (The Comprehensive R Archive Network) or GitHub. You can install CRAN R packages using install.packages().

#Install an R package to import data from Excel files
install.packages("readxl")

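If you want to install an R package stored in a GitHub repository, use install_github() from the devtools R package. The devtools package must itself be installed first, for example with install.packages("devtools").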

#Install tbi R package to calculate Tea Bag Index parameters
devtools::install_github("BenjaminDelory/tbi/tbi")
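Before installing anything, it can be handy to check which packages are still missing on your computer. The lines below are a minimal sketch of one way to do this. The helper name missing_packages() is hypothetical (it is not part of R or of these tutorials); it relies on requireNamespace(), which returns FALSE when a package cannot be loaded.

```r
# Hypothetical helper (not part of base R): returns the names of the
# packages in 'pkgs' that cannot be loaded on this computer
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages we would like to use
to_check <- c("stats", "readxl")

# "stats" ships with R; "readxl" is listed only if it is not installed yet
missing_packages(to_check)
```

If you wish, the result can then be passed to install.packages() to install only what is missing.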

Commenting

You can add comments to your R script using the hash symbol: #

Everything that follows # on a line is ignored by R and is not executed.

#This line of code is a comment and will be ignored by R when running the code

Always comment your R script!

We strongly advise you to add comments to your R code. At the very least, these comments should indicate why you have written each section of your code. It can also be useful to add information about the what and how. These comments can save a lot of time if you need to go back over your code after a while or, even more difficult, if someone else needs to go through your code and understand what you’ve done.
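As an illustration, here is a short sketch of a commented piece of code. The biomass values are made up for the example, and the stated reason for the transformation is only an assumption chosen to show what a "why" comment can look like.

```r
# Why: biomass data are often right-skewed, so we log-transform them
# before analysis (assumption made for this example)
# What: made-up shoot biomass values (g) for three plants
biomass <- c(1.2, 3.4, 10.5)

# How: natural logarithm with log()
log_biomass <- log(biomass)

log_biomass   # comments can also follow code on the same line
```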

Creating R objects

The assignment operator

New R objects are created using the assignment operator: <-

You can think of this assignment operator as an arrow that puts what’s on its right side into an R object located on its left side.

For instance, let’s create an object called x that contains the value 2.

#Create object
x <- 2

#Show content of R object
x
[1] 2

RStudio shortcut

Press Alt and the minus sign on your keyboard (Alt+-) to quickly write the assignment operator.
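Once an object exists, it can be reused in other expressions or overwritten by a new assignment. The following lines are a small sketch of this behaviour; the object name y is only used for illustration.

```r
# Create an object and reuse it in a calculation
x <- 2
y <- x * 3
y   # 6

# Assigning a new value to x overwrites the old one
x <- 10
x   # 10

# y was computed earlier and is not updated automatically
y   # 6
```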

Scalars

A scalar is a quantity that holds a single value (in R, a scalar is simply a vector of length one). Here are the most common data types in R:

  • Numeric: numbers with a decimal value (e.g., 17.8). This is also how R stores a number typed without a decimal, such as 18, by default.
  • Integer: whole numbers. To store a number as an integer, add the suffix L (e.g., 18L).
  • Character: text, i.e., any combination of letters, digits, and symbols. Character strings must be enclosed in quotes in your R code.
  • Factor: data type for categorical variables with a fixed set of levels, used in statistical modelling to represent the factors in a model (e.g., a treatment)
  • Logical: a logical variable can be either TRUE or FALSE

You can check the data type of an R object using the class() function.

x <- 2
class(x)
[1] "numeric"
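To make the data types listed above more concrete, the sketch below applies class() to one value of each type. The values are arbitrary examples. Note the L suffix, which asks R to store a whole number as an integer.

```r
# One example value per data type
class(17.8)                # "numeric"
class(18)                  # "numeric" (no L suffix)
class(18L)                 # "integer"
class("control")           # "character"
class(factor("control"))   # "factor"
class(TRUE)                # "logical"
```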

Vectors

A vector is a sequence of data elements of the same type. Vectors can be created using the c() function.

#Numeric vector
x1 <- c(1, 2, 3, 4, 5)
#Same values written as a sequence (1:5 already returns a vector, so c() is optional here)
x1 <- c(1:5)

#Character vector
x <- c("control", "treatment")

#Logical vector
x <- c(TRUE, TRUE, FALSE)

You can check how many elements there are in a vector using the length() function.

length(x1)
[1] 5
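Because all the elements of a vector must be of the same type, R silently converts values when different types are mixed inside c(). The sketch below illustrates this with arbitrary values; the object names are chosen for the example only.

```r
# Mixing numbers, text, and logical values: everything becomes character
mixed <- c(1, "control", TRUE)
mixed          # "1" "control" "TRUE"
class(mixed)   # "character"

# Mixing numbers and logical values: TRUE becomes 1 and FALSE becomes 0
num_log <- c(1.5, TRUE, FALSE)
num_log        # 1.5 1.0 0.0
class(num_log) # "numeric"
```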

Matrices

A matrix is a collection of data elements of the same type arranged in a 2D layout (i.e., like a table). Matrices can be created using the matrix() function.

#Generate 25 random numbers between 0 and 1 from a uniform distribution
x2 <- runif(25)

#Arrange these random numbers into a matrix with 5 rows and 5 columns
x2 <- matrix(x2,
            ncol = 5,
            nrow = 5)

#View matrix
x2
          [,1]       [,2]      [,3]      [,4]       [,5]
[1,] 0.6445135 0.03635731 0.3081061 0.4967868 0.71427079
[2,] 0.5644063 0.50678207 0.2357141 0.9245398 0.72489181
[3,] 0.4417767 0.94662147 0.5808922 0.6914159 0.03381332
[4,] 0.3135167 0.22928658 0.4377485 0.4733986 0.32219592
[5,] 0.8875961 0.60270605 0.8314840 0.5604592 0.35591664

You can check the size of a matrix using the dim() function. The first element of the output is the number of rows. The second element of the output is the number of columns.

dim(x2)
[1] 5 5

You can also extract the number of rows and columns using nrow() and ncol(), respectively.

nrow(x2)
[1] 5
ncol(x2)
[1] 5
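By default, matrix() fills a matrix column by column. The sketch below uses the numbers 1 to 6 instead of random values so that the filling order is easy to see, and shows the byrow argument of matrix() as an alternative. The object names are chosen for the example only.

```r
# Default behaviour: values are filled column by column
m_col <- matrix(1:6, nrow = 2, ncol = 3)
m_col
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# With byrow = TRUE, values are filled row by row
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
m_row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

dim(m_col)   # 2 3
```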

Data frames

A data frame is a collection of data elements arranged in a 2D layout (i.e., like a table). Unlike in a matrix, different columns of a data frame can contain different types of data (character, logical, numeric, etc.), but all the values within a column are of the same type. It is probably the most common data structure used when analysing ecological data. Data frames can be created using the data.frame() function.

#Create data frame
x3 <- data.frame(Var1=c(1:6),
                 Var2=c("R", "i", "s", "f", "u", "n"),
                 Var3=c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE))

#View data frame
x3
  Var1 Var2  Var3
1    1    R  TRUE
2    2    i  TRUE
3    3    s FALSE
4    4    f FALSE
5    5    u  TRUE
6    6    n FALSE

The functions dim(), ncol(), and nrow() can also be used on data frames.
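As a quick check, the sketch below builds a small data frame and inspects its size and the data type of each column. The column names and values are invented for the example.

```r
# Small example data frame (made-up values)
df <- data.frame(plot = 1:4,
                 treatment = c("control", "control", "warming", "warming"),
                 flowering = c(TRUE, FALSE, TRUE, TRUE))

# Size of the data frame
dim(df)    # 4 3
nrow(df)   # 4
ncol(df)   # 3

# Data type of each column
# (text columns are stored as character in R 4.0 or later)
sapply(df, class)
```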

Lists

A list is a special type of vector whose elements can be any R objects (vectors, matrices, data frames, other lists, etc.). Unlike the vectors described above, its elements do not have to be of the same type. Lists can be created using the list() function.

#Create a list
x4 <- list(x1, x2, x3)

#View list
x4
[[1]]
[1] 1 2 3 4 5

[[2]]
          [,1]       [,2]      [,3]      [,4]       [,5]
[1,] 0.6445135 0.03635731 0.3081061 0.4967868 0.71427079
[2,] 0.5644063 0.50678207 0.2357141 0.9245398 0.72489181
[3,] 0.4417767 0.94662147 0.5808922 0.6914159 0.03381332
[4,] 0.3135167 0.22928658 0.4377485 0.4733986 0.32219592
[5,] 0.8875961 0.60270605 0.8314840 0.5604592 0.35591664

[[3]]
  Var1 Var2  Var3
1    1    R  TRUE
2    2    i  TRUE
3    3    s FALSE
4    4    f FALSE
5    5    u  TRUE
6    6    n FALSE

The length() function can be used to check how many data elements there are in a list.

length(x4)
[1] 3
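List elements can also be given names when the list is created, which often makes them easier to retrieve later. The sketch below shows this with invented names and values.

```r
# Create a list with named elements (made-up content)
results <- list(counts = c(3, 5, 8),
                site = "Utrecht",
                sampled = TRUE)

# Names and number of elements
names(results)    # "counts" "site" "sampled"
length(results)   # 3

# A named element can be extracted with the dollar symbol
results$site      # "Utrecht"
```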

Indexing

One of the main advantages of R is that it is very easy to extract any given value from a data set. This is called indexing. Let’s have a look at a few examples.

Vectors

To extract the ith value of a vector object called x, you should write x[i]. Note that indexing in R starts at 1: the first value of x is x[1].

#Extract the third value of the x1 object 
#x1 is a vector

x1[3]
[1] 3
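Square brackets also accept several positions at once, negative positions, or a logical condition. The lines below are a brief sketch using an arbitrary vector called v.

```r
# Example vector
v <- c(10, 20, 30, 40, 50)

# Extract the first and third values
v[c(1, 3)]   # 10 30

# A negative index removes the corresponding value
v[-1]        # 20 30 40 50

# A logical condition keeps only the values for which it is TRUE
v[v > 25]    # 30 40 50
```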

Matrices and data frames

To extract the value located at the intersection of the ith row and the jth column of a matrix or a data frame object called x, you should write x[i,j].

#Extract the value at the intersection of row 2 and column 3 in the x2 object
#x2 is a matrix

x2[2,3]
[1] 0.2357141
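Leaving one side of the comma empty extracts an entire row or an entire column. The sketch below uses a matrix filled with the numbers 1 to 25 (rather than random values) so that the results are easy to verify by eye.

```r
# 5 x 5 matrix filled column by column with the numbers 1 to 25
m <- matrix(1:25, nrow = 5, ncol = 5)

# Single value: row 2, column 3
m[2, 3]   # 12

# Entire second row
m[2, ]    # 2 7 12 17 22

# Entire third column
m[, 3]    # 11 12 13 14 15
```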

With a data frame, there are a couple of other options to extract data from specific columns. One option is to use the dollar symbol ($) followed by the column name.

#Extract all the values stored in the second column of the x3 object
#x3 is a data frame

x3$Var2
[1] "R" "i" "s" "f" "u" "n"

Note that the following code would also work and would produce the same result. To extract all the values from a specific column, simply leave the space before the comma empty and give the name of the column (in quotes) after the comma. If you leave both sides of the comma empty (x3[,]), you will simply extract all the values from your data frame.

#Extract all the values stored in the second column of the x3 object
#x3 is a data frame

x3[,"Var2"]
[1] "R" "i" "s" "f" "u" "n"
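There are a few other equivalent ways of extracting a column. The sketch below compares them on a small made-up data frame; it also shows the drop = FALSE argument, which keeps the result as a one-column data frame instead of a vector.

```r
# Small example data frame (made-up values)
df <- data.frame(Var1 = 1:3,
                 Var2 = c("a", "b", "c"))

# Three ways of extracting the second column as a vector
by_dollar <- df$Var2
by_name   <- df[["Var2"]]
by_number <- df[, 2]

# Keep the result as a data frame with one column
one_col <- df[, "Var2", drop = FALSE]
is.data.frame(one_col)   # TRUE
```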

If you want to subset a matrix or a data frame called x (i.e., select only specific rows and columns), you should write:

x[rows to select, columns to select]

#Extract only the values located in rows 2 to 4
#of the second column of the x3 object
#x3 is a data frame

x3[2:4, 2]
[1] "i" "s" "f"

Note that writing 2:4 means “from index 2 to index 4”. For indexing purposes, it is equivalent to writing c(2,3,4).
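Rows can also be selected with a logical condition instead of row numbers, which is a common way of filtering data. The lines below are a small sketch on a made-up data frame; the column names and the threshold are assumptions chosen for the example.

```r
# Small example data frame (made-up values)
df <- data.frame(plot = 1:6,
                 biomass = c(2.1, 3.5, 1.8, 4.2, 3.9, 2.7))

# Keep only the rows for which biomass is greater than 3
heavy <- df[df$biomass > 3, ]
heavy$plot   # 2 4 5

# Same condition, but keep only the "plot" column
df[df$biomass > 3, "plot"]   # 2 4 5
```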

Lists

To extract the ith element of a list object called x, you should write x[[i]].

#Extract the second element of the x4 object
#x4 is a list

x4[[2]]
          [,1]       [,2]      [,3]      [,4]       [,5]
[1,] 0.6445135 0.03635731 0.3081061 0.4967868 0.71427079
[2,] 0.5644063 0.50678207 0.2357141 0.9245398 0.72489181
[3,] 0.4417767 0.94662147 0.5808922 0.6914159 0.03381332
[4,] 0.3135167 0.22928658 0.4377485 0.4733986 0.32219592
[5,] 0.8875961 0.60270605 0.8314840 0.5604592 0.35591664
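Single and double square brackets behave differently on lists: x[i] returns a shorter list, whereas x[[i]] returns the element itself. Indexing can also be chained to reach values stored inside an element. The sketch below illustrates both points on a small list built for the example.

```r
# Example list containing a vector, a matrix, and a data frame
lst <- list(c(1, 2, 3),
            matrix(1:4, nrow = 2),
            data.frame(id = 1:2))

# Single brackets return a list of length 1
class(lst[1])     # "list"

# Double brackets return the element itself
class(lst[[1]])   # "numeric"

# Chained indexing: row 1, column 2 of the matrix stored in element 2
lst[[2]][1, 2]    # 3

# Chained indexing: column "id" of the data frame stored in element 3
lst[[3]]$id       # 1 2
```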