If you are working from home:
Download and install Seaview on your computer (you can download the
program at http://doua.prabi.fr/software/seaview).
Seaview includes alignment (muscle and clustalo) and phylogenetic
reconstruction programs (Neighbor joining and parsimony analysis from PHYLIP,
a collection of programs for phylogenetic analyses written by Joe
Felsenstein, and phyml, a maximum likelihood program).
Advantages of Seaview are:
- You can designate sites as subsets or groups and analyze them
separately.
- You can save (and read) multiple sequence files in different formats
(Seaview has its own format, called mase, and it is recommended that
you use it if you have specified groups of sites or groups of sequences).
- You can switch between displaying open reading frames as nucleotide
sequences and displaying and aligning them as amino acid sequences, and
then go back to the nucleotide sequences.
- You can modify the alignments by hand.
- This is a great program to get a quick idea of what is going on in
your data sets.
Open Seaview and load the multiple fasta file
Yeast_vma1_all_not_aligned.fst (in the archive downloaded above). The file
contains a selection of nucleotide sequences that encode the vacuolar
ATPase in different yeasts. Some of these have been invaded by an
intein.
3.1) In the Seaview window, select Props and place a check mark next to
view as protein. If you downloaded the sequences as ORFs or from
an alignment resulting from a tblastn search, you should not have any
stop codons (shown as a little * in the view as protein display).
Do you see any stop codons in your
sequences?
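If you want to double-check the answer outside Seaview, the translation
can be sketched in a few lines of Python. This is only a minimal sketch:
it assumes the standard genetic code and reading frame 1, treats a stop
codon at the very end of a sequence as acceptable, and uses invented toy
sequences. The function names are hypothetical and not part of Seaview.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def parse_fasta(text):
    """Return {name: sequence} from FASTA-formatted text."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif line and name is not None:
            records[name] += line
    return records


def translate(nucleotides):
    """Translate in frame 1; unknown codons become 'X', gaps are dropped."""
    seq = nucleotides.upper().replace("-", "").replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def internal_stops(nucleotides):
    """1-based protein positions of stop codons, ignoring a terminal stop."""
    protein = translate(nucleotides)
    return [i + 1 for i, aa in enumerate(protein[:-1]) if aa == "*"]


# Invented toy data, not taken from the yeast file.
example = """>seq_ok
ATGGCTTGGTAA
>seq_broken
ATGTAAGCTTGG
"""
for name, seq in parse_fasta(example).items():
    print(name, translate(seq), internal_stops(seq))
```

To try it on the real data, read the contents of the fst file into a
string and pass it to parse_fasta in place of the toy example.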
Delete the sequence that has stop codons (click on the name of the
sequence so that it turns white on black, then select Edit -> delete
sequences). Uncheck view as protein and save the file in fst
format. Go back to view as protein.
Select Align -> Alignment options -> muscle, then select
Align. How many alignment columns are in
your alignment?
(Scroll to the right and click on the last column; the line on top tells
you the sequence and the position in the alignment | position in the
sequence.)
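The two numbers in the "position in the alignment | position in the
sequence" display differ because gaps count as alignment columns but
not as residues. A minimal sketch of that bookkeeping, using an invented
toy alignment and hypothetical function names:

```python
def alignment_length(aligned):
    """Number of alignment columns; raises if the rows differ in length."""
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError(f"rows have different lengths: {sorted(lengths)}")
    return lengths.pop()


def sequence_position(aligned_seq, column):
    """Count the residues of one row up to and including a 1-based column."""
    return sum(1 for ch in aligned_seq[:column] if ch != "-")


# Invented toy alignment; "-" marks a gap.
aligned = {
    "seq_a": "MKT--AYL",
    "seq_b": "MKTGGAYL",
    "seq_c": "M-TGGA-L",
}
n_columns = alignment_length(aligned)
print("columns:", n_columns)
for name, row in aligned.items():
    print(name, f"{n_columns} | {sequence_position(row, n_columns)}")
```

In this toy case the last column is 8 for every row, while the position
in the sequence is 6 for seq_a and seq_c and 8 for seq_b.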
3.2) The first four sequences have not been invaded by an intein.
Can you find the place where the intein begins and where it ends?
What are the first two and the last three
amino acids of the intein?
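One way to think about the search: in a good alignment the intein shows
up as a long block of columns in which every intein-free sequence has a
gap. The sketch below finds the longest such block. It is an
illustration only, with invented toy sequences and hypothetical names;
in a real alignment the edges of the gap block can be shifted by a
residue or two, so check the result by eye in Seaview.

```python
def longest_gap_block(aligned, intein_free):
    """Return (start, end), 1-based inclusive, of the longest run of columns
    in which every intein-free row has a gap, or None if there is none."""
    rows = [aligned[name] for name in intein_free]
    n_columns = len(rows[0])
    best, run_start = None, None
    # Go one step past the last column so a run at the very end is closed.
    for col in range(n_columns + 1):
        all_gaps = col < n_columns and all(row[col] == "-" for row in rows)
        if all_gaps and run_start is None:
            run_start = col
        elif not all_gaps and run_start is not None:
            if best is None or col - run_start > best[1] - best[0] + 1:
                best = (run_start + 1, col)
            run_start = None
    return best


# Invented toy alignment: two intein-free rows and two invaded rows.
aligned = {
    "free_1": "MKAG-----TLV",
    "free_2": "MKAG-----TIV",
    "invaded_1": "MKAGPQRSDTLV",
    "invaded_2": "MRAGPERSDTLV",
}
start, end = longest_gap_block(aligned, ["free_1", "free_2"])
print("insert spans columns", start, "to", end)
for name in ("invaded_1", "invaded_2"):
    insert = aligned[name][start - 1:end].replace("-", "")
    print(name, "first two:", insert[:2], "last three:", insert[-3:])
```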
3.3) Create sets of sites that correspond to the extein and the
intein. First go to Sites and create a set called "all sites", then
duplicate this set and call it intein. Scroll to the right and,
in the row of xxx below the alignment, click on the x below the last aa
of the N-extein (the x disappears and the column is grayed out).
Then right click on any of the xxx below the N-extein; all the x
below the N-extein should disappear. Do the same at the end
of the intein: remove the x under the first aa of the C-extein, then
right click on any of the x to the right.
This might be a good point in time to save your file in mase
format. To do this you first need to unselect view as protein
(Props -> remove the check mark). After saving, return to view as
protein.
Do the same for the sites corresponding to the extein: Sites ->
all sites, then Sites -> duplicate set, and call it extein.
Move to the right, click on the x below the first and
the last aa of the intein, then right click on an x under the intein.
(The right click removes all the x between two non-x columns. If
you right click before the x under the last column of the intein is
removed, you remove everything till the end :( ).
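Conceptually, the two sets of sites are just complementary lists of
alignment columns. The sketch below illustrates this with an invented
toy alignment; it does not reproduce how Seaview stores its sets, and
all function names are hypothetical. The last helper shows the
arithmetic for going from amino acid columns to nucleotide columns,
assuming the coding sequence starts in frame at column 1.

```python
def split_sites(n_columns, intein_start, intein_end):
    """Return (intein, extein) as sorted lists of 1-based column numbers."""
    all_sites = range(1, n_columns + 1)
    intein = [c for c in all_sites if intein_start <= c <= intein_end]
    extein = [c for c in all_sites if c < intein_start or c > intein_end]
    return intein, extein


def extract(aligned, columns):
    """Keep only the listed 1-based columns of every row."""
    return {
        name: "".join(row[c - 1] for c in columns)
        for name, row in aligned.items()
    }


def protein_to_nucleotide_columns(first_aa, last_aa):
    """Map a 1-based, inclusive range of codon columns to nucleotide columns."""
    return 3 * (first_aa - 1) + 1, 3 * last_aa


# Invented toy alignment with the insert in columns 5 to 9.
aligned = {
    "free_1": "MKAG-----TLV",
    "invaded_1": "MKAGPQRSDTLV",
}
intein, extein = split_sites(12, 5, 9)
print("intein columns:", intein)
print("extein columns:", extein)
print(extract(aligned, intein))
print(extract(aligned, extein))
print("nucleotide columns of the intein:", protein_to_nucleotide_columns(5, 9))
```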
If you want information on how to modify an alignment by hand, check out
the help pages.
3.4) Uncheck view as protein. Save the alignment in mase
format. Select Sites -> extein. Then highlight all the
intein-containing sequences. Select Trees -> PhyML -> model
GTR (everything else as default) -> Run. After about a minute the run
in the tree-building window is complete. If you do serious work, you
want to copy all and place the results into your notebook BEFORE you
click OK.
After you click ok, the window opens with the calculated maximum
likelihood tree.
Explore the Swap and Re-root buttons on top. These operations do
not change the tree (which is calculated as an unrooted tree).
If you click on Br support, the estimated probability that the branch is
real is displayed next to the branch. (Just in case, select File
-> save unrooted tree and give it a name. Also, copy and paste the
image of the tree into your notebook.)
3.5) Repeat this for the intein sequence: Sites -> intein;
select the intein-containing sequences, Trees -> PhyML (GTR)
-> Run (copy all and save the program output before clicking
OK). Save the tree, and compare it to the extein tree.
Do you see any similarities between the
trees calculated for the intein and the extein?
3.6) If you have time, rename the intein-free genes
(select the sequence by clicking on the name, then add a prefix (e.g.
N_) at the beginning of the name). Select Sites -> extein, select all
sequences (click on the names), then Trees -> PhyML -> Run.
Do the intein-free genes form a
clan/clade?
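Because the tree is unrooted, the question is whether a single branch
separates the N_ sequences from all the others. If you saved the
unrooted tree, this can be checked on its Newick string. The sketch
below is a minimal illustration with an invented toy tree and
hypothetical function names; it assumes plain leaf names without
quotes, parentheses, commas, or colons.

```python
import re


def leaf_sets(newick):
    """Return (leaves, clusters): all leaf names, and the set of leaves
    found inside each parenthesised group or single leaf of the string."""
    tokens = re.findall(r"[(),;]|[^(),;]+", newick.strip())
    stack, clusters, leaves = [], [], set()
    previous = None
    for token in tokens:
        if token == "(":
            stack.append(set())
        elif token == ")":
            group = stack.pop()
            clusters.append(frozenset(group))
            if stack:
                stack[-1] |= group
        elif token in (",", ";"):
            pass
        elif previous != ")":
            # A label that does not follow ")" is a leaf; drop ":length".
            name = token.split(":")[0].strip()
            if name:
                leaves.add(name)
                clusters.append(frozenset([name]))
                if stack:
                    stack[-1].add(name)
        # Labels right after ")" are support values or lengths: ignored.
        previous = token
    return leaves, clusters


def forms_clan(newick, names):
    """True if one branch separates `names` from all the other leaves."""
    leaves, clusters = leaf_sets(newick)
    target = frozenset(names)
    if not target or not target <= leaves:
        raise ValueError("names must be a non-empty subset of the leaves")
    # Either side of a branch may be the one written inside parentheses.
    return target in clusters or frozenset(leaves - target) in clusters


# Invented toy tree with support values and branch lengths.
tree = "((N_a:0.1,N_b:0.2)0.98:0.05,(x:0.1,(y:0.1,z:0.1)0.7:0.02)0.9:0.1,N_c:0.3);"
leaves, _ = leaf_sets(tree)
prefixed = sorted(name for name in leaves if name.startswith("N_"))
print(prefixed, forms_clan(tree, prefixed))
print(["N_a", "x"], forms_clan(tree, ["N_a", "x"]))
```

In the toy tree the three N_ leaves are not written inside one pair of
parentheses, yet they form a clan, because the branch leading to
(x,(y,z)) separates them from everything else. This is why the check
looks at both sides of each branch.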