Download the human transcript/protein for the gene of each dataset and its orthologs in the following primate species: Macaca mulatta, Pongo abelii, Papio anubis, Gorilla gorilla gorilla, and Pan troglodytes, plus the house mouse (Mus musculus) as a non-primate species (Dataset 1 and 2, listed below). You can use the “Orthologs” option in NCBI after you search for the gene in humans and then select all these species, or you can supply their individual IDs one at a time.
- Align these sequences using the standard parameters in Clustal Omega and save the alignment in fasta format.
- Use IQ-Tree to infer the phylogenetic relationships among these species with the following options: Substitution Model -> Sequence Type/Model -> Amino Acids -> “Find best and apply”; Bootstrap -> Ultrafast with 1000 replicates; Root tree -> Specify outgroups (select the mouse as the outgroup after submitting these parameters).
- After the analysis is done, in the output page, find and save the file “Full Result”.
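The web-form options above can also be approximated with a local IQ-TREE 2 installation. The sketch below only builds the command line; it does not run anything. It assumes the `iqtree2` executable is installed, that `-m MFP` (ModelFinder followed by tree inference) corresponds to “Find best and apply”, and that `-B 1000` requests 1000 ultrafast bootstrap replicates. The file name, the output prefix, and `MOUSE_SEQUENCE_NAME` are hypothetical placeholders: the outgroup must match the mouse sequence name exactly as it appears in your alignment.

```python
import shlex


def build_iqtree_command(alignment_path, outgroup_name, prefix="dataset"):
    """Return an argument list for a local IQ-TREE 2 run.

    The options mirror the web-form choices described above; the file
    name, outgroup name, and prefix are placeholders for illustration.
    """
    return [
        "iqtree2",
        "-s", alignment_path,   # aligned FASTA saved from Clustal Omega
        "-st", "AA",            # treat the sequences as amino acids
        "-m", "MFP",            # find the best-fit model, then build the tree
        "-B", "1000",           # ultrafast bootstrap with 1000 replicates
        "-o", outgroup_name,    # root the tree on the mouse sequence
        "--prefix", prefix,     # common prefix for all output files
    ]


cmd = build_iqtree_command("dataset1_aligned.fasta", "MOUSE_SEQUENCE_NAME")
print(shlex.join(cmd))
```

To actually run the analysis, the list can be passed to `subprocess.run(cmd, check=True)` on a machine where IQ-TREE 2 is available.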
Dataset 1 – Use the human protein encoded by the H2A1 gene (NP_613258.2). For the other primates and the mouse, the IDs are NP_001152985.1, NP_001229351.1, XP_014996351.1, XP_024102642.1, XP_003900166.1 and XP_004042582.1.
Dataset 2 – Use the human protein encoded by the RFX5 gene (NP_001020774.1). For the other primates and the mouse, the IDs are NP_059091.2, XP_513794.2, XP_028699841.1, XP_021788628.2, XP_002810287.1 and XP_004026680.1.
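If you prefer to fetch the sequences by accession rather than through the website, the NCBI E-utilities `efetch` endpoint returns protein records in FASTA format. The following is a minimal sketch that only builds the request URLs for the accessions listed above; the helper name `efetch_fasta_url` is made up for this example, and no download is performed here.

```python
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Accessions copied from the dataset descriptions (human sequence first).
DATASET_1 = [
    "NP_613258.2", "NP_001152985.1", "NP_001229351.1", "XP_014996351.1",
    "XP_024102642.1", "XP_003900166.1", "XP_004042582.1",
]
DATASET_2 = [
    "NP_001020774.1", "NP_059091.2", "XP_513794.2", "XP_028699841.1",
    "XP_021788628.2", "XP_002810287.1", "XP_004026680.1",
]


def efetch_fasta_url(accessions):
    """Build an efetch URL requesting the given protein accessions as FASTA."""
    query = {
        "db": "protein",
        "id": ",".join(accessions),
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(query)}"


url_dataset_1 = efetch_fasta_url(DATASET_1)
url_dataset_2 = efetch_fasta_url(DATASET_2)
print(url_dataset_1)
print(url_dataset_2)
```

Opening either URL in a browser, or reading it with `urllib.request.urlopen`, should return one FASTA file containing all seven sequences, ready to paste into Clustal Omega.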
- Is the protein H2A1 useful for understanding the relationship among primates? Why?
- Based on your observations of histone protein conservation and the figure below, would you use histones to understand the relationships among any group of species with less than 200 million years of divergence?
- Considering that the most recent common ancestor of all eutherian mammals probably existed around 120-160 million years ago, which of the four groups of genes from Figure 1 would you use to study the relationships among the eutherian groups? Explain your answer.
- Compare the percent identity of the H2A1 proteins with that of their respective mRNAs. Are the mRNAs more or less conserved than their respective proteins? (Hint: after the alignment, go to the “results viewer” tab and export your alignment to MView. Check the help to understand how to get the percent identity.)
- Comparing the conservation between H2A1 mRNA and protein, do you expect the variation in the mRNA sequences to be synonymous or non-synonymous? Explain your answer.
- Would the RFX5 gene be useful to understand the relationships among primates? Explain your answer.
- Let us compare the tree obtained by analyzing the RFX5 proteins to the relationships estimated by a recent paper (https://doi.org/10.1371/journal.pgen.1001342). Check Figure 1 of this paper and compare the relationships among Humans, Chimps, Gorillas and Orangutans to the one you estimated using RFX5. Also, how many genes were used to study the phylogenetic relationship among primates in this work?
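For the questions on percent identity and on synonymous versus non-synonymous variation, the two ideas can be sketched in a few lines of Python. This is an illustration, not a replacement for MView: here identity is computed over columns where neither sequence has a gap, and MView may use a different denominator, so the numbers need not match exactly. The codon comparison assumes the standard genetic code and two in-frame, gap-free codons. The sequences in the example are toy strings, not real data.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def percent_identity(seq_a, seq_b):
    """Percent of identical residues over alignment columns without gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0


def classify_codon_change(codon_a, codon_b):
    """Label a pair of codons as identical, synonymous, or non-synonymous."""
    codon_a = codon_a.upper().replace("U", "T")
    codon_b = codon_b.upper().replace("U", "T")
    if codon_a == codon_b:
        return "identical"
    if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
        return "synonymous"      # different codon, same amino acid
    return "non-synonymous"      # different codon, different amino acid


# Toy examples only.
print(percent_identity("ATGGCC-TA", "ATGGCTTTA"))
print(classify_codon_change("GCC", "GCT"))  # both codons encode alanine
print(classify_codon_change("AAA", "GAA"))  # lysine versus glutamate
```

If the mRNAs differ at many positions while the proteins are nearly identical, most codon pairs that differ should fall into the "synonymous" class.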
Dataset 3 – Myoglobin (MB) is a protein related to oxygen storage and transportation. In humans its levels are high in newborns but diminish with age, being replaced by hemoglobin as the major molecule in oxygen transportation. In aquatic mammals, on the other hand, myoglobin remains an important part of this process even in adults, partly because myoglobin binds oxygen more strongly than hemoglobin, which helps these mammals store oxygen during their long periods underwater. Let us study the evolution of the myoglobin protein by downloading the human version of this protein (NP_005359.1) and also those that belong to some aquatic mammals and bats: XP_004286254.1, XP_033721510.1, XP_024435142.1, XP_027449767.1, XP_032272566.1 and XP_035956092.1.
- Using a rooted tree with humans as the outgroup, paste a figure of the phylogenetic tree of the MB protein inferred with the Maximum Likelihood method, with branch support given by 1000 bootstrap replicates.
- From studies of many genes, we have strong confidence that the true relationship among these mammals is (human, ((dolphins, whales), (bats, (seals, sea lion)))). Using your knowledge of the function of myoglobin, explain why our estimated tree does not reflect this known relationship.
- Add four species to the MB dataset: Felis catus, Canis lupus familiaris, Bos taurus and Equus caballus. Paste a figure of this new MB phylogeny. Does the inclusion of these new species change the obtained tree? Is this new tree more similar to the known relationship among aquatic mammals?
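When comparing an estimated MB tree with the known relationship, it can help to check which expected groups appear as clades in the Newick string of your tree. Below is a minimal sketch with a small hand-written Newick reader; it assumes taxon names without spaces, quotes, or parentheses, and it ignores branch lengths and support values. The reference tree is the relationship stated above. The "estimated" tree is an invented toy topology for illustration only, not a result, and should be replaced with the Newick string from your own analysis.

```python
def newick_clades(newick):
    """Return the set of leaf-name groups, one per internal node of the tree."""
    text = "".join(newick.split()).rstrip(";")
    clades = []

    def parse(pos):
        if text[pos] == "(":
            leaves = set()
            pos += 1  # consume "("
            while True:
                child_leaves, pos = parse(pos)
                leaves |= child_leaves
                if text[pos] == ",":
                    pos += 1  # another child follows
                else:
                    pos += 1  # consume ")"
                    break
            clades.append(frozenset(leaves))
        else:
            start = pos
            while pos < len(text) and text[pos] not in ":,)":
                pos += 1
            leaves = {text[start:pos]}
        # Skip any support value and branch length attached to this node.
        while pos < len(text) and text[pos] not in ",)":
            pos += 1
        return leaves, pos

    parse(0)
    return set(clades)


def is_clade(clades, group):
    """True if the given taxa form a clade on their own in the tree."""
    return frozenset(group) in clades


# Known relationship, as stated in the question above.
reference = "(human,((dolphin,whale),(bat,(seal,sea_lion))));"
# Invented toy topology with made-up support values and branch lengths.
toy_estimate = "(human:0.1,((bat:0.2,dolphin:0.2)80:0.1,(whale:0.3,(seal:0.1,sea_lion:0.1)99:0.2)60:0.1)70:0.1);"

reference_clades = newick_clades(reference)
toy_clades = newick_clades(toy_estimate)

for group in [("dolphin", "whale"), ("seal", "sea_lion"), ("bat", "seal", "sea_lion")]:
    print(group, is_clade(reference_clades, group), is_clade(toy_clades, group))
```

Groups that are clades in the reference tree but not in your estimated tree point to the parts of the MB phylogeny that disagree with the known relationship.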