Reading Guide

Week 1

Online material

If you are unfamiliar with Spyder, explore the program:

https://docs.spyder-ide.org/current/index.html

To do and read

Practical 1 and 2

2.2.4 pandas DataFrames

2.4.1 Seaborn: Data Visualization

2.4.3 Statsmodels—Tools for Statistical Modeling

3.1 Text

Chapter 5 Basic Statistical Concepts

Additional Reading

Optional but strongly recommended as a quick refresher:

2.2.2 Indexing and Slicing

2.2.3 Numpy Vectors and Arrays

Week 2

Online material

https://seaborn.pydata.org/examples/index.html

To do and read

Practical 3 and 4

4.5 Displaying Statistical Datasets

4.6 Exercises

Chapter 6 Distributions of One Variable

7.1 Typical Analysis Procedure

7.2 Hypothesis Tests and Power Analyses

7.3 Sensitivity and Specificity

Week 3

Online material

Already seen, but as we go deeper into data handling, keep this at hand:

https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf

Background reading on tidy data with lots of code examples:

Python for Data Science ch 12: https://byuidatascience.github.io/python4ds/tidy-data.html

To do and read

Do: Practical 5 and 6

8.1 Distribution of a Sample Mean

8.2 Comparison of two Groups

(8.4) Summary: Selecting the right test

Do: 8.5 Exercises: 8.1

9.1 One Proportion

9.2 Frequency Tables

Do: 9.3 Exercises: 9.1 and 9.2

Week 4

Online material

Familiarize yourself with the handouts and cheat sheets for visualization with matplotlib:

https://matplotlib.org/cheatsheets/

To do and read

Do: Practical 7 and 8

ANOVA:

8.3 Comparison of Multiple Groups

8.4 Summary: Selecting the right test

Do: 8.5 Exercises: 8.2

Covariance:

11.1 Cross Correlation

11.2 Correlation Coefficient

Additional Reading

Optional

Matplotlib website (with reference, tutorials, examples, etc.):

https://matplotlib.org/

Already seen… make more fun plots!

https://seaborn.pydata.org/examples/index.html

Week 5

To do and read

Practical 9 and 10

11.2 Correlation Coefficient

11.3 Coefficient of Determination

11.4 Scatterplot Matrix

12.1 – 12.5 Linear Regression Models (excluding 12.4.2 e and f)

Week 6

To do and read

Practical 11 and 12

11.6 Autocorrelation

11.7 Time-Series Analysis

Week 7

Online material

To do and read

Practical 13 and 14

Same as for Week 5, but now also including 12.4.2 e and f.

In particular, these parts, which you read before, are only covered this week:

12.2.3 Multilinear Regression

12.2.5 Design Matrix

12.3.2 Noisy quadratic polynomial

12.4.2 e and f

12.5.2 Interpreting Multilinear Regression Models

Week 8

Online material

For the multivariate part of the course, we will use the book An Introduction to Statistical Learning with Applications in Python by Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, and Jonathan Taylor, which you can find here: An Introduction to Statistical Learning: with Applications in Python | SpringerLink

To do and read

Practical 15 and 16

Chapters 12.1, 12.4 and 12.5.3

Additional Reading

Chapter 7 Ecological resemblance from the book Numerical Ecology by Pierre Legendre and Louis Legendre

- the book is available online: ProQuest Ebook Central - Book Details

- Browse through the chapter for more information on dissimilarity measures

Chapter Cluster Analysis from the book Multivariate Analysis by Klaus Backhaus, Bernd Erichson, Sonja Gensler, Rolf Weiber, Thomas Weiber

- the book is available online: Multivariate Analysis: An Application-Oriented Introduction | SpringerLink

- easy to read and well-illustrated introduction, but using the commercial software SPSS

Week 9

To do and read

Practical 17 and 18

From the book An Introduction to Statistical Learning with Applications in Python: Chapters 12.2 and 12.5.1

Additional Reading

If you want to know more about PCA, search the web, which is full of explanations. A nice overview that goes beyond the scope of this course can be found here: https://www.nature.com/articles/s43586-022-00184-w