library(phangorn)
library(ape)Phylogenetic tree of sea urchins
Preparation
Create a new project in R Studio in a dedicated folder for this practical, using File -> New Project…
In the folder where your project is located, create a sub-folder
dataFill in the empty
character_matrix_empty.csvwith your data. ReplaceCharacter1etc. with informative names e.g. “Symmetry”. Aim for more characters than taxa.Write a document explaining each character and the possible state, so you can revise the characters when you see how they shape the tree (yes, you will need this!). The states can be e.g. “Symmetry: 0. Five-fold, 1. Two-fold, 2. Asymmetrical” (this is just an example)
Save the character matrix as
character_matrix.csvin thedatafolderInstall the packages
phangornandape. You will be likely asked to install some additional packages (“dependencies”) - choose “yes”.Create a new script using File -> New File -> R Script, where you will put all the steps of the analysis. Save it in the project folder.
Tree estimation
Read your character matrix into R:
If you created your file using Excel on a system with Dutch language settings, your csv file likely uses a colon (;) instead of a comma (,) to separate columns. That is because in Dutch comma is used as a decimal separator, so to separate columns you need a different character. That is why the option sep=";" is used in this script. If your file uses a different character to separate columns, change the code accordingly.
Character_matrix <- read.csv(file="data/character_matrix.csv",
header=T,
sep=";",
colClasses=rep("character", 8), # if you have 8 characters
row.names = 1) Convert the table into a character matrix format understood by phangorn:
mm_pd <- phangorn::phyDat(as.matrix(Character_matrix),
type = "USER",
levels = 0:8)1. Distance matrix
Generate a distance matrix using the Hamming distance and examine the output:
dm = phangorn::dist.hamming(mm_pd)
dmCan you read from the distance matrix:
Which two taxa are the closest to each other?
Which ones are equally distant?
2. Make a starting tree
From the distance matrix, create a starting tree using the Neighbor-Joining method:
start_tree <- ape::nj(dm)
plot(start_tree)outgroup <- "Echinarachnius" # just an example
rooted_start_tree <- ape::root(start_tree,
outgroup,
resolve.root = TRUE,
edgelabel=TRUE)
plot(rooted_start_tree) 3. Search for the best tree
For unordered morphometric data, the option model="ER+ASC" should be used.
ER: equal rates, all transitions occur at the same rate
ASC: ascertainment bias correction (Lewis 2001) corrects for the fact that there are no constant sites (as is mostly the case with morphometric data); otherwise the branch lengths may be overestimated
ml_nni <- phangorn::pml_bb(mm_pd,
start = start_tree,
model="ER+ASC",
rearrangement = "NNI")
plot(ml_nni,
main = "NNI estimation")Is this tree good? Can you see any polytomies?
Display a summary of the tree estimate and verify that the right model has been used:
ml_nni4. Revise the character coding
If two different taxa are not well resolved in your tree, it might be that you did not choose your characters well. Or perhaps your characters were uninformative? You can go back to your character matrix and correct the coding.
What makes a good character?
Do your characters reflect synapomorphies?
Is your tree consistent with what is known about stratigraphic ranges of the genera? If not, does it mean it is wrong?
It may be that you are happy with your tree, in that case you don’t need to change anything. Maybe you have a good eye for phylogeny!
Visualize the tree
There are various conventions for tree plotting. The package ape gives you a lot of customization options. This solution is proposed in one of its vignettes:
plot_axes <- function() {
col <- "blue"
for (i in 1:2)
axis(i, col = col, col.ticks = col, col.axis = col, las = 1)
box(lty = "19")
}Use this function to make a plot comparing different conventions:
layout(matrix(1:4, 2, 2, byrow = TRUE))
par(xpd = TRUE,
mar = c(2.5, 2.5, 1, 1))
plot(ml_nni); plot_axes()
plot(ml_nni, "c"); plot_axes()
plot(ml_nni, "u"); plot_axes()
plot(ml_nni, "f"); plot_axes()
box("outer")If you want to fine-tune your trees, play with the parameter mpar (margin around the plot).
What variables are shown on the axes?
What is the unit on the axes?
You can choose which tree format is the most legible for you and use it for plotting.
Submit this plot as your final assignment result on Brightspace