Please send your answers by email to gogarten@uconn.edu, or hand in a hardcopy.
Please let me know how far you got during the lab.
If most students didn't finish, we may continue this next week!
1)
For a dataset of your choice (or here or here; for the latter, the PHYLIP-formatted file to be used for TREE-PUZZLE is here), use PhyML (on bbcsrv3, after qlogin, enter "phyml" at the command line) to calculate the tree with the highest likelihood. NOTE: your sequences need to be aligned before you analyze them. Use a model for Among Site Rate Variation (ASRV) that has a proportion of invariant sites estimated from the data, and that describes the remaining sites with 4 rate categories that are a discrete approximation of a continuous Gamma distribution whose shape parameter is estimated from the data.
Repeat the analysis using a model that does not include invariant sites. Perform a likelihood ratio test (LRT) to determine whether the more complex model (the one with an estimated proportion of invariant sites) leads to a significant improvement in likelihood.
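To get a feeling for what "4 rate categories that approximate a continuous Gamma distribution" means, the following Python sketch may help. It is only an illustration and not how PhyML computes its categories: it draws random rates from a Gamma distribution with mean 1, sorts them, cuts them into four equally sized groups, and reports the mean rate of each group. The function name, the sample size, and the alpha values are made up for this example.

```python
import random


def discrete_gamma_rates(alpha, n_categories=4, n_samples=100_000, seed=1):
    """Monte Carlo estimate of the mean relative rate in each of
    n_categories equal-probability slices of a Gamma distribution.

    Illustrative assumption: shape = alpha and scale = 1/alpha, so the
    mean rate over all sites is 1.
    """
    rng = random.Random(seed)
    samples = sorted(
        rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_samples)
    )
    size = n_samples // n_categories
    # Average the sorted samples within each equally sized slice.
    return [
        sum(samples[i * size:(i + 1) * size]) / size
        for i in range(n_categories)
    ]


if __name__ == "__main__":
    # Hypothetical shape parameters: a small alpha gives strong rate
    # variation among sites, a larger alpha gives more uniform rates.
    for alpha in (0.5, 2.0):
        rates = discrete_gamma_rates(alpha)
        print(f"alpha = {alpha}: ", [round(r, 3) for r in rates])
```

With a small shape parameter the slowest category should come out close to zero and the fastest one several times faster than average; with a larger shape parameter the four rates move closer together.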
Notes:
If you use PhyML as implemented in SeaView, DO NOT CLICK OK when the analysis is finished. Instead, click on copy at the bottom of the window, open a text editor, and paste the content into the text window. We are interested in the log likelihood and in the last values estimated for the shape parameter and the % invariable sites (if the latter two were estimated as part of the model).
One important condition that has to be fulfilled before one can use a Likelihood Ratio Test (LRT) to compare two models is that the models must be "nested". This means that the simpler model must be a constrained version of the parameter-rich model. The likelihood ratio test is performed by doubling the difference in log-likelihood scores and comparing this test statistic with the critical value from a chi-squared distribution having degrees of freedom equal to the difference in the number of estimated parameters in the two models. The parameter-rich model will always fit at least as well as the simpler one because of its extra parameters, and will therefore have the higher log-likelihood, so the difference (complex minus simple) should be a positive number.
In this case the two models differ by 1 degree of freedom: both models estimate the gamma shape parameter, and only the more complex model additionally estimates the % invariant sites. Use this online chi-square calculator to determine the significance of the test.
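If you would like to check the result from the online calculator, the arithmetic of the test can be sketched in a few lines of Python. This is a minimal sketch for the 1-degree-of-freedom case only; the function names and the two log-likelihood values are hypothetical placeholders, so substitute the values reported by your own PhyML runs. For 1 degree of freedom the chi-squared tail probability can be written with the complementary error function, so no extra libraries are needed.

```python
import math


def lrt_statistic(lnl_simple, lnl_complex):
    """Twice the difference in log-likelihood (complex minus simple)."""
    return 2.0 * (lnl_complex - lnl_simple)


def chi2_pvalue_1df(statistic):
    """Upper-tail probability of a chi-squared distribution with 1 df.

    Uses the identity P(X > x) = erfc(sqrt(x / 2)), valid for df = 1 only.
    """
    if statistic <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(statistic / 2.0))


if __name__ == "__main__":
    # Hypothetical log-likelihoods, NOT results from a real dataset.
    lnl_gamma = -12345.67          # Gamma only (simpler model)
    lnl_gamma_invar = -12340.12    # Gamma + invariant sites (complex model)

    stat = lrt_statistic(lnl_gamma, lnl_gamma_invar)
    p_value = chi2_pvalue_1df(stat)
    print(f"LRT statistic = {stat:.2f}, p = {p_value:.4g}")
```

A test statistic above about 3.84 corresponds to p < 0.05 for 1 degree of freedom. If the two models differed by more than one parameter, a general chi-squared routine would be needed instead of this shortcut.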
2) Using the same dataset and the same model in TREE-PUZZLE
Invoke TREE-PUZZLE from the command line by typing "puzzle"
Use the tree from (1) as usertree (option k). Take your time in selecting the correct model (four rate categories plus invariant sites)!
2B) Strict Molecular Clock
Repeat the analyses from above, but test whether a strict molecular clock is compatible with the data (option z).
Select the pinvar and alpha from the previous analysis.
For a large data set, this might take some time (about 20 minutes for archaea_euk.phy). If you want control over the command line, you can send the process (running puzzle) into the background. To do so, stop the process in the foreground by pressing <ctrl> and <z> simultaneously.
Then restart the process in the background by typing
bg %1 ***
======================== END Assignment================
Work on your student project!