Reading Guide
Week 1
Online material
If you are unfamiliar with Spyder, explore the program:
To do and read
Practical 1 and 2
2.2.4 pandas DataFrames
2.4.1 Seaborn: Data Visualization
2.4.3 Statsmodels—Tools for Statistical Modeling
3.1 Text
Chapter 5 Basic Statistical Concepts
Additional Reading
Optional but strongly recommended as a quick refresher:
2.2.2 Indexing and Slicing
2.2.3 Numpy Vectors and Arrays
Week 2
Online material
To do and read
Practical 3 and 4
4.5 Displaying Statistical Datasets
4.6 Exercises
Chapter 6 Distributions of One Variable
7.1 Typical Analysis Procedure
7.2 Hypothesis Tests and Power Analyses
7.3 Sensitivity and Specificity
Week 3
Online material
Already seen, but as we go deeper into data handling, keep this at hand:
https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
Background reading on tidy data with lots of code examples:
Python for Data Science ch 12: https://byuidatascience.github.io/python4ds/tidy-data.html
To do and read
Do: Practical 5 and 6
8.1 Distribution of a Sample Mean
8.2 Comparison of two Groups
(8.4) Summary: Selecting the right test
Do: 8.5 Exercises: 8.1
9.1 One Proportion
9.2 Frequency Tables
Do: 9.3 Exercises: 9.1 and 9.2
Week 4
Online material
Familiarize yourself with the handouts and cheat sheets for visualization with matplotlib
To do and read
Do: Practical 7 and 8
ANOVA:
8.3 Comparison of Multiple Groups
8.4 Summary: Selecting the right test
Do: 8.5 Exercises: 8.2
Covariance:
11.1 Cross Correlation
11.2 Correlation Coefficient
Additional Reading
Optional
Matplotlib website (with reference, tutorials, examples, etc.).
Already seen… make more fun plots!
Week 5
To do and read
Practical 9 and 10
11.2 Correlation Coefficient
11.3 Coefficient of Determination
11.4 Scatterplot Matrix
12.1 – 12.5 Linear Regression Models (excluding 12.4.2 e and f)
Week 6
To do and read
Practical 11 and 12
11.6 Autocorrelation
11.7 Time-Series Analysis
Week 7
Online material
To do and read
Practical 13 and 14
Same as for Week 5, but now also including 12.4.2 e and f.
In particular, these parts, which you read before, were only covered this week:
12.2.3 Multilinear Regression
12.2.5 Design Matrix
12.3.2 Noisy quadratic polynomial
12.4.2 e and f
12.5.2 Interpreting Multilinear Regression Models
Week 8
Online material
For the multivariate part of the course, we will use the book An Introduction to Statistical Learning with Applications in Python by Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, and Jonathan Taylor, which you can find here: An Introduction to Statistical Learning: with Applications in Python | SpringerLink
To do and read
Practical 15 and 16
Chapters 12.1, 12.4 and 12.5.3
Additional Reading
Chapter 7 Ecological resemblance from the book Numerical Ecology by Pierre Legendre and Louis Legendre
- the book is available online: ProQuest Ebook Central - Book Details
- Browse through the chapter for more information on dissimilarity measures
Chapter Cluster Analysis from the book Multivariate Analysis by Klaus Backhaus, Bernd Erichson, Sonja Gensler, Rolf Weiber, Thomas Weiber
- the book is available online: Multivariate Analysis: An Application-Oriented Introduction | SpringerLink
- easy to read and well-illustrated introduction, but using the commercial software SPSS
Week 9
To do and read
Practical 17 and 18
Chapters 12.2 and 12.5.1 from the book An Introduction to Statistical Learning with Applications in Python
Additional Reading
If you want to know more about PCA, search the web, which is full of explanations. A nice overview extending beyond the scope of this course can be found here: https://www.nature.com/articles/s43586-022-00184-w