All course materials at tinyurl.com/introRDatasite
What part of your education are you in? (Bachelor, PhD, prof…)
What is your faculty/background? (Economics, Medicine, Biology…)
What is your motivation for learning R?
What is your experience with R?
9:30 Introductions
10:00 Base R + Exercises 1-6
11:25 Recap & questions
11:30 Coffee break
11:45 Programming + Exercises 7-9
12:45 Lunch break
13:30 Reconvene for afternoon program
Part 1: Basics of R
Download the course materials.
Store them in a local (i.e. not on a mounted drive), accessible location.
Unzip the download to create a single folder. What animal is displayed in animal.png?
Double-click the course-materials.Rproj file. Or: Go to File > Open Project > select course-materials.Rproj > Open
In the ‘Files’ pane (bottom right), click baseR_exercises.Rmd.
You can execute the Exercise 0 chunk as a whole with the green triangle at the top right of the chunk.
You can assign both numbers and text to a variable:
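A minimal sketch of both kinds of assignment. The variable names my_number and my_text are illustrative, not taken from the exercises:

```r
# Assign a number to a variable
my_number <- 42

# Assign text (a character string) to a variable; text goes in quotes
my_text <- "Hello R"

# Typing the name of a variable prints its value
my_number
my_text
```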
You will see your variable (R object) appear in your Environment (top right panel).
See the cheatsheets folder. Or download it.
Saving information as an R object:
Asking for information to be returned:
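The two cases can be put side by side in a short sketch. The variable name participants is an assumption made for illustration:

```r
# Saving information as an R object: nothing is printed
participants <- 4

# Asking for information to be returned: the value is printed
participants

# Without <- the result is printed, but not stored anywhere
participants + 1
```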
Note the difference in syntax:
<- operator: storing information = no immediate ‘answer’
Functions: code that performs a specific task based on the arguments provided.
Examples:
mean(x): calculate the mean of x
mean(x, na.rm = TRUE): calculate the mean of x by leaving out the NAs
getwd(): print the working directory to the screen (requires no arguments)
You can perform math with your variables:
and store the results as new variables:
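A small sketch of both steps, with illustrative variable names (x, y, total and ratio are assumptions):

```r
x <- 10
y <- 4

# Math with variables: the result is printed
x + y
x / y

# Storing the results as new variables: nothing is printed
total <- x + y
ratio <- x / y

# Ask for the stored values
total
ratio
```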
Check “Maths Functions” on the Base R cheatsheet:
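A few function calls to try out, as a sketch. The vector scores is made up for illustration; the functions themselves are all part of base R:

```r
# A vector with one missing value
scores <- c(4, 8, NA, 6)

mean(scores)                # NA, because one value is missing
mean(scores, na.rm = TRUE)  # 6, the NA is left out

# Some maths functions
sqrt(16)                    # 4
round(3.14159, digits = 2)  # 3.14
sum(1, 2, 3)                # 6

# A function that requires no arguments
getwd()
```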
A logical is TRUE or FALSE, and can also be written as T or F.
Logicals are mostly used as tests:
operator | meaning |
---|---|
== | is equal to |
!= | is not equal to |
>= | larger than or equal to |
< | smaller than |
For example:
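A sketch of such tests, using a made-up variable my_age:

```r
my_age <- 35

my_age == 35   # TRUE
my_age != 35   # FALSE
my_age >= 40   # FALSE
my_age < 40    # TRUE

# The outcome of a test can be stored like any other value
is_adult <- my_age >= 18
is_adult       # TRUE
```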
Vectors are created with the function c().
A numeric vector:
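For instance (the name ages and its values are chosen for illustration):

```r
# Combine four numbers into one vector
ages <- c(35, 22, 50, 51)

ages
class(ages)   # "numeric"
length(ages)  # 4
```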
What is this vector?
Yep, a character vector!
Vector type defaults to the “lowest common denominator”: everything can be a character, but not everything can be a number or a logical.
Order:
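The conversion can be observed directly with class(). This is a sketch with made-up values, showing that logicals are converted to numbers when mixed with numbers, and that everything is converted to character as soon as one element is text:

```r
# Only logicals: stays logical
class(c(TRUE, FALSE))   # "logical"

# Logical + numeric: TRUE becomes 1
c(1, TRUE)              # 1 1
class(c(1, TRUE))       # "numeric"

# Anything + character: everything becomes character
mixed <- c(1, "two", TRUE)
mixed                   # "1" "two" "TRUE"
class(mixed)            # "character"
```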
Vectors can be used in mathematical operations
Operations with multiple vectors are performed by aligning the index
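A sketch of both cases. The age values match the participants used elsewhere in this course; years_to_60 is a hypothetical second vector:

```r
age <- c(35, 22, 50, 51)

# An operation with a single number is applied to every element
age + 1   # 36 23 51 52
age * 2   # 70 44 100 102

# With two vectors, the first element is combined with the first,
# the second with the second, and so on
years_to_60 <- c(25, 38, 10, 9)
age + years_to_60   # 60 60 60 60
```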
[…] c() function.
We have two vectors: name and age.
How do we combine them?
How about combining name and age in a two-dimensional table structure?
Or: in a multi-dimensional list.
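The three options can be compared in one sketch. The object names combined and my_list are assumptions; the names and ages are those of the course participants table:

```r
name <- c("Ann", "Bob", "Chloe", "Dan")
age <- c(35, 22, 50, 51)

# One dimension: c() glues the vectors together,
# and the numbers are converted to character
combined <- c(name, age)
combined

# Two dimensions: each vector becomes a column
df <- data.frame(name, age)
df

# A list can hold objects of any type and size, including a data frame
my_list <- list(name = name, age = age, df = df)
str(my_list)
```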
structure | number of dimensions | function |
---|---|---|
vector | 1 | c() |
data frame | 2 | data.frame() |
list | any number | list() |
NB: data frames and lists appear under Data in the Environment (top right panel in RStudio), vectors under Values.
Factors: a special type of vector, defined by levels. Usually used as a categorical variable in a data frame.
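A sketch of how a character vector becomes a factor. The education values are hypothetical and not part of the course data:

```r
education <- c("Bachelor", "PhD", "Bachelor", "Master")

# factor() finds the unique values and stores them as levels
education_factor <- factor(education)

levels(education_factor)      # "Bachelor" "Master" "PhD"

# Internally, each element is stored as the number of its level
as.integer(education_factor)  # 1 3 1 2

# Count how often each category occurs
table(education_factor)
```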
[…] age.
By position:
By position:
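Indexing by position can be sketched for a vector and for a data frame. The data frame df is rebuilt here from the participants table in this course; the exact indexing examples on the original slides are not preserved, so these calls are illustrative:

```r
name <- c("Ann", "Bob", "Chloe", "Dan")
age <- c(35, 22, 50, 51)
country <- c("UK", "US", "NL", "BE")
df <- data.frame(name, age, country, stringsAsFactors = FALSE)

# Vector: [position]
age[2]         # 22
age[c(1, 3)]   # 35 50

# Data frame: [row, column]
df[2, ]        # the complete second row
df[, 2]        # the complete second column (as a vector)
df[2, 3]       # "US"
```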
[…] df, return complete rows for everyone living in a country of your choice.
[…] df […] under 40.
Let’s add a column to our data:
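A sketch that rebuilds the data and adds the column with $. The values are read off the printed table on this slide; the original code chunk is not preserved, so the exact way the column was added is an assumption:

```r
df <- data.frame(
  name = c("Ann", "Bob", "Chloe", "Dan"),
  age = c(35, 22, 50, 51),
  country = c("UK", "US", "NL", "BE"),
  stringsAsFactors = FALSE
)

# A new column is created by assigning a vector to a new name.
# Note the four different kinds of entry: a value, "none", "" and NA
df$pet <- c("cat", "none", "", NA)

df
```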
name age country pet
1 Ann 35 UK cat
2 Bob 22 US none
3 Chloe 50 NL
4 Dan 51 BE <NA>
Notice that:
[1] TRUE
[1] NA
[1] NA
[1] TRUE
So: want to test if a value is NA? Use is.na()!
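The original tests are not preserved; the following sketch uses the pet values from the table and gives the same pattern of results. Any comparison that involves NA returns NA, because R cannot know whether a missing value is equal to something:

```r
pet <- c("cat", "none", "", NA)

pet[1] == "cat"   # TRUE
pet[4] == "cat"   # NA: the value is unknown, so the answer is unknown
NA == NA          # NA: even comparing NA to NA gives NA
is.na(pet[4])     # TRUE
```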
Do we know about our participants’ jobs?
value | meaning |
---|---|
NA | Information is Not Available |
NULL | Information does not exist |
none or 0 | Data entry specifying content of 0 |
"" | Empty character value |
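The difference between these kinds of missing data can be sketched with the participants table. The column job does not exist in the data, which is exactly the point of the first call:

```r
df <- data.frame(
  name = c("Ann", "Bob", "Chloe", "Dan"),
  pet = c("cat", "none", "", NA),
  stringsAsFactors = FALSE
)

# A column that does not exist returns NULL
df$job
is.null(df$job)   # TRUE

# NA and "" are different things
is.na(df$pet)     # FALSE FALSE FALSE TRUE
df$pet == ""      # FALSE FALSE TRUE NA
```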
An if statement tests if a condition is TRUE or FALSE and executes code depending on the outcome of that test.
To build an if-statement, start with the function if():
Within the { }, insert the code that should be executed if the condition is met:
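A complete if statement could look like this sketch. The temperature example is made up, and the else part is an optional addition that runs when the condition is not met:

```r
temperature <- 25

# The condition goes between ( ), the code to execute between { }
if (temperature > 20) {
  advice <- "no coat needed"
} else {
  advice <- "bring a coat"
}

advice   # "no coat needed"
```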
Make an if statement that tests if a number is larger than 18. Assign the result to the variable age_category.
Functions consist of (multiple) instruction(s) that form a cohesive unit:
Functions can also be used to make a complex line of code easier to write/read:
You write the function once:
Now, every time you want to find Bob’s age you use:
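The original code is not preserved; a sketch of the idea, with a hypothetical function name find_age, could be:

```r
df <- data.frame(
  name = c("Ann", "Bob", "Chloe", "Dan"),
  age = c(35, 22, 50, 51),
  stringsAsFactors = FALSE
)

# The complex line: select the age where the name is "Bob"
df$age[df$name == "Bob"]

# Written once as a function (find_age is an illustrative name)
find_age <- function(person) {
  df$age[df$name == person]
}

# Used as often as needed
find_age("Bob")     # 22
find_age("Chloe")   # 50
```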
Functions are the bread and butter of programming!
A good script will consist mostly of functions, with a minimal amount of code that applies the functions.
To make a function, use the function function():
The sequence of operations is in the body of the function (between { }):
First, run the code with the function itself. It will appear in your environment:
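As a sketch, with a made-up function to_fahrenheit that is not part of the course materials:

```r
# Define the function: arguments between ( ), body between { }
to_fahrenheit <- function(celsius) {
  fahrenheit <- celsius * 9 / 5 + 32
  return(fahrenheit)
}

# After running the definition, the function can be called
to_fahrenheit(20)          # 68

# It also works on a whole vector at once
to_fahrenheit(c(0, 100))   # 32 212
```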
Turn the if-statement from the last exercise into a function. Let the user provide the value for number, and return the age_category.
A loop starts with the iterable object (in this case the vector 1:5), and the temporary name for each item (in this case a_number):
Within { }, you place the instructions:
Note that a_number is 1 in the first iteration of the loop, 2 in the second, etc. Unlike in some other languages, a_number is not removed when the loop ends: in R it keeps the value of the last iteration (here 5).
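Put together, a for loop over 1:5 could look like this sketch. The instructions inside { } and the vector squares are illustrative:

```r
# Print the double of each number
for (a_number in 1:5) {
  print(a_number * 2)
}

# Collect a result for each item in a new vector
squares <- numeric(0)
for (a_number in 1:5) {
  squares <- c(squares, a_number^2)
}

squares   # 1 4 9 16 25
```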
Go over the age column in your data frame df, and for each age: print() the age category using the test_age function from the previous exercise.
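[ ] for indexing (selecting elements of a vector or data frame)
( ) for function calls and their arguments
{ } for the body of an if-statement, function or loop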
What data types have you encountered so far?
logical
numeric
character
How can data be missing?
NA (not available)
NULL (non-existent)
"" (empty)
What data structures have you encountered?
vector (one dimension)
data frame (two dimensions)
list (any number of dimensions)
What functions have you encountered so far?
c()
data.frame()
is.na()
mean()
summary()
Programming basics
How does a function work? Type in your console: ?mean
Use a search engine (often useful: Stack Overflow)
(Generative AI)