Phylogenetic tree of sea urchins

Author

Emilia Jarochowska

Preparation

  1. Create a new project in R Studio in a dedicated folder for this practical, using File -> New Project…

  2. In the folder where your project is located, create a sub-folder data

  3. Fill in the empty character_matrix_empty.csv with your data. Replace Character1 etc. with informative names e.g. “Symmetry”. Aim for more characters than taxa.

  4. Write a document explaining each character and the possible state, so you can revise the characters when you see how they shape the tree (yes, you will need this!). The states can be e.g. “Symmetry: 0. Five-fold, 1. Two-fold, 2. Asymmetrical” (this is just an example)

  5. Save the character matrix as character_matrix.csv in the data folder

  6. Install the packages phangorn and ape. You will be likely asked to install some additional packages (“dependencies”) - choose “yes”.

  7. Create a new script using File -> New File -> R Script, where you will put all the steps of the analysis. Save it in the project folder.

Tree estimation

library(phangorn)
library(ape)

Read your character matrix into R:

If you created your file using Excel on a system with Dutch language settings, your csv file likely uses a colon (;) instead of a comma (,) to separate columns. That is because in Dutch comma is used as a decimal separator, so to separate columns you need a different character. That is why the option sep=";" is used in this script. If your file uses a different character to separate columns, change the code accordingly.

Character_matrix <- read.csv(file="data/character_matrix.csv", 
                             header=T, 
                             sep=";", 
                             colClasses=rep("character", 8), # if you have 8 characters
                             row.names = 1) 

Convert the table into a character matrix format understood by phangorn:

mm_pd <- phangorn::phyDat(as.matrix(Character_matrix), 
                          type = "USER", 
                          levels = 0:8)

1. Distance matrix

Generate a distance matrix using the Hamming distance and examine the output:

dm = phangorn::dist.hamming(mm_pd)
dm

Can you read from the distance matrix:

  1. Which two taxa are the closest to each other?

  2. Which ones are equally distant?

2. Make a starting tree

From the distance matrix, create a starting tree using the Neighbor-Joining method:

start_tree <- ape::nj(dm)
plot(start_tree)
outgroup <- "Echinarachnius" # just an example
rooted_start_tree <- ape::root(start_tree, 
                       outgroup, 
                       resolve.root = TRUE,
                       edgelabel=TRUE)
plot(rooted_start_tree) 

3. Search for the best tree

For unordered morphometric data, the option model="ER+ASC" should be used.

  • ER: equal rates, all transitions occur at the same rate

  • ASC: ascertainment bias correction (Lewis 2001) corrects for the fact that there are no constant sites (as is mostly the case with morphometric data); otherwise the branch lengths may be overestimated

ml_nni <- phangorn::pml_bb(mm_pd,
             start = start_tree,
             model="ER+ASC", 
             rearrangement = "NNI")
plot(ml_nni,
     main = "NNI estimation")

Is this tree good? Can you see any polytomies?

Display a summary of the tree estimate and verify that the right model has been used:

ml_nni

4. Revise the character coding

If two different taxa are not well resolved in your tree, it might be that you did not choose your characters well. Or perhaps your characters were uninformative? You can go back to your character matrix and correct the coding.

  • What makes a good character?

  • Do your characters reflect synapomorphies?

  • Is your tree consistent with what is known about stratigraphic ranges of the genera? If not, does it mean it is wrong?

It may be that you are happy with your tree, in that case you don’t need to change anything. Maybe you have a good eye for phylogeny!

Visualize the tree

There are various conventions for tree plotting. The package ape gives you a lot of customization options. This solution is proposed in one of its vignettes:

plot_axes <- function() {
 col <- "blue"
 for (i in 1:2)
 axis(i, col = col, col.ticks = col, col.axis = col, las = 1)
 box(lty = "19")
 }

Use this function to make a plot comparing different conventions:

layout(matrix(1:4, 2, 2, byrow = TRUE))
par(xpd = TRUE,
    mar = c(2.5, 2.5, 1, 1))
plot(ml_nni); plot_axes()
plot(ml_nni, "c"); plot_axes()
plot(ml_nni, "u"); plot_axes()
plot(ml_nni, "f"); plot_axes()
box("outer")

If you want to fine-tune your trees, play with the parameter mpar (margin around the plot).

What variables are shown on the axes?

What is the unit on the axes?

You can choose which tree format is the most legible for you and use it for plotting.

Submit this plot as your final assignment result on Brightspace