MCB 272 : Phylip and NJPLOT

Please send your answers by email to gogarten@uconn.edu, or hand in a hard copy.

Please let me know how far you got during the lab. If most students did not finish, we will continue next week.

You should answer the questions in red!

You can do these exercises with the sequences in atp_all.phy or infile1.txt, or with an alignment of your choice.
REMEMBER: Programs in PHYLIP treat the "-" symbol as a character. If you want gaps to be treated as missing data, you need to replace them with "?".
To do so in vi:
> vi filename.phy
:%s/-/?/g
ZZ
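
The same replacement can be done without opening an editor. The following is a minimal sketch using sed; the function name gaps_to_missing and the demo file names are made up for illustration. Like the vi command above, it replaces every "-" in the file, including any that occur in sequence names.

```shell
# Replace every "-" with "?" and write the result to a new file,
# leaving the original alignment untouched.
gaps_to_missing() {
    # $1 = input alignment, $2 = output file (both names are placeholders)
    sed 's/-/?/g' "$1" > "$2"
}

# Small made-up alignment, for demonstration only.
demo_dir=$(mktemp -d)
printf '   2    8\nseqA      MK-LV--A\nseqB      MKTLVGEA\n' > "$demo_dir/demo.phy"
gaps_to_missing "$demo_dir/demo.phy" "$demo_dir/demo.missing.phy"
cat "$demo_dir/demo.missing.phy"
```

Writing to a new file keeps the original alignment available in case you want to compare the two treatments of gaps.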

Preparation:

 

1) Protein parsimony analyses using phylip.

a) In phylip_temp execute the program seqboot by typing
   > seqboot
Read the menu and enter the appropriate letters to generate 100 pseudo samples by bootstrapping. If in doubt, read the manual.
   > cp outfile your_filename.boot.phy
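
Before moving on, it can be worth checking that the bootstrap file really contains 100 data sets. The sketch below counts the PHYLIP header lines, assuming that each data set starts with a line holding only the number of sequences and the number of positions; the function name count_datasets and the demo data are made up for illustration.

```shell
# Count the data sets in a PHYLIP file by counting header lines,
# i.e. lines that consist of exactly two integers.
count_datasets() {
    grep -c -E '^[[:space:]]*[0-9]+[[:space:]]+[0-9]+[[:space:]]*$' "$1"
}

# Made-up file with three tiny data sets, for demonstration only.
demo=$(mktemp)
for i in 1 2 3; do
    printf '    2    4\nseqA      ACDC\nseqB      ACDE\n'
done > "$demo"
count_datasets "$demo"    # prints 3
```

On your own data you would run something like count_datasets your_filename.boot.phy and expect 100.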

b) Do a protein parsimony analysis on the original dataset.
  > protpars
Read the menu and enter the appropriate letters to run a heuristic search for the most parsimonious tree. Jumble the input order twice.
  > cp outfile your_filename.protpars.outfile
  > cp outtree your_filename.protpars.outtree

c) Do a protein parsimony analysis on the pseudo samples (your_filename.boot.phy).
  > protpars
Read the menu and enter the appropriate letters to run a heuristic search for the most parsimonious tree on each pseudo sample; the program has to be told that the input contains multiple data sets. (Jumble the input order, but only once.)
  > cp outtree your_filename.boot.protpars.outtree
Calculate a consensus tree from the trees in your_filename.boot.protpars.outtree.
  > consense
  > cp outfile your_filename.boot.protpars.consense.outfile
  > cp outtree your_filename.boot.protpars.consense.outtree
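
A quick sanity check on the tree file given to consense is to count the trees in it. The sketch below relies only on the fact that every Newick tree ends with a semicolon; the function name count_trees and the demo trees are made up for illustration. Note that the count may be larger than the number of pseudo samples if protpars found several equally parsimonious trees for some of them.

```shell
# Count Newick trees in a file by counting semicolons.
# Trees may be wrapped over several lines, so lines are not counted.
count_trees() {
    tr -cd ';' < "$1" | wc -c | tr -d ' '
}

# Made-up file with two trees, the first wrapped over two lines.
demo=$(mktemp)
printf '((A,B),(C,\nD));\n((A,C),(B,D));\n' > "$demo"
count_trees "$demo"    # prints 2
```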

Is the topology of the consensus tree different from the most parsimonious tree(s)?

 

2) Protein distance matrix analyses using phylip.

a) Calculate two protein distance matrices from your data.
  > protdist
Read the menu and enter the appropriate letters to calculate two distance matrices. Calculate the first one using the JTT substitution matrix with the default values (i.e. without a Gamma correction for rate variation among positions). When done, copy the outfile.
  > cp outfile your_filename.protdist.outfile
For the second analysis, select the Gamma correction. Type G, then Y.
You will be asked to enter the
"Coefficient of variation of substitution rate among positions (must be positive) In gamma distribution parameters, this is 1/(square root of alpha)."

If you don't know this parameter, enter 1. For ATP_all.phy this parameter is 0.88; for infile1.txt it is 1.24.
Save the outfile.
  > cp outfile your_filename.protdist_gamma.outfile
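
Other programs usually report the shape parameter alpha of the Gamma distribution rather than the coefficient of variation. Since the prompt states that the coefficient of variation is 1/(square root of alpha), alpha is 1/(coefficient of variation squared). The sketch below does the conversion in both directions with awk; the function names cv_to_alpha and alpha_to_cv are made up for illustration.

```shell
# Convert between the coefficient of variation (CV) requested by protdist
# and the Gamma shape parameter alpha, using CV = 1/sqrt(alpha).
cv_to_alpha() {
    LC_ALL=C awk -v cv="$1" 'BEGIN { printf "%.2f\n", 1 / (cv * cv) }'
}
alpha_to_cv() {
    LC_ALL=C awk -v a="$1" 'BEGIN { printf "%.2f\n", 1 / sqrt(a) }'
}

cv_to_alpha 0.88    # value given for ATP_all.phy, prints 1.29
cv_to_alpha 1.24    # value given for infile1.txt, prints 0.65
alpha_to_cv 1       # prints 1.00
```

A coefficient of variation of 1, the suggested fallback, therefore corresponds to alpha = 1.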

b) To calculate trees from the distance matrices, use the programs neighbor and fitch (fitch with the global rearrangement option).

  > fitch (follow the menu)
  > cp outfile your_filename.protdist.fitch.outfile
  > cp outtree your_filename.protdist.fitch.outtree

  > neighbor
  > cp outfile your_filename.protdist.neighbor.outfile
  > cp outtree your_filename.protdist.neighbor.outtree
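
Every program run in this exercise ends with the same pair of cp commands. If you prefer, they can be wrapped in a small shell function. This is only a convenience sketch; the name save_run is made up, and it assumes that outfile and outtree are in the current directory.

```shell
# Copy outfile and outtree to <prefix>.outfile and <prefix>.outtree.
save_run() {
    # $1 = prefix, e.g. your_filename.protdist.neighbor
    for f in outfile outtree; do
        if [ -f "$f" ]; then
            cp "$f" "$1.$f"
        else
            echo "warning: $f not found in $(pwd)" >&2
        fi
    done
}

# Demonstration in a scratch directory with made-up file contents.
demo_dir=$(mktemp -d)
(
    cd "$demo_dir" || exit 1
    echo "made-up program report" > outfile
    echo "(A,B,C);" > outtree
    save_run your_filename.protdist.neighbor
    ls
)
```

After a real run you would type, for example, save_run your_filename.protdist.neighbor in place of the two cp lines.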

Do the trees from NEIGHBOR and FITCH differ in their topology?

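Now run fitch again, this time on the Gamma-corrected distance matrix (your_filename.protdist_gamma.outfile):
  > fitch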
  > cp outfile your_filename.protdist_gamma.fitch.outfile
  > cp outtree your_filename.protdist_gamma.fitch.outtree

Is the tree calculated using the gamma correction different in topology from the one calculated without the Gamma correction?

Use sftp to transfer the trees calculated from the distance matrices to your iMac and open them in njplot.
Explore the different options in njplot: re-root the trees in a place that seems appropriate.

What is the difference in the trees calculated from distances with and without gamma correction?