Assignment 9

Dotlet Exercises

Your name:
Your email address:

The Swiss Institute for Bioinformatics provides a Java applet called Dotlet that performs interactive dot plots.
Warning: Google Chrome, Safari, Firefox, and Microsoft Edge do not support Java applets. Therefore, you will need to run today's exercises in Waterfox or Internet Explorer (Waterfox also works if you are on a Mac).

To make these particular Java applets (dotlet and jalview) run in your browser, you need to find the Java security panel and add the following exceptions:

https://myhits.isb-sib.ch/
https://myhits.isb-sib.ch/cgi-bin/
https://myhits.isb-sib.ch/cgi-bin/dotlet
http://www.jalview.org/
http://www.jalview.org/old/v2_10_5/jalview.jnlp
http://www.jalview.org/getdown/release/jalview-all-2.11.0-j1.8.jar

A JavaScript version of dotlet is in beta testing. This version works fine for protein-to-protein comparisons, but fails for DNA-to-protein comparisons (exercise 3 below, i.e., try to get the applet version to work!).

The main use of dot plots is to detect domains, duplications, insertions, deletions, and, if you work at the DNA level, inversions. (Note: dotlet compares both strands, and if you compare DNA and protein, the DNA is "translated" in the three forward reading frames. Excellent illustrations of the use of dot plots are given on the examples page.)

Comparing yeast ATPase catalytic subunit with yeast HO endonuclease.
This is similar to the analysis we did using pairwise blast. However, in this case we plot all pairwise comparisons between windows, i.e., we do not need to worry about E value cutoffs and low complexity filters.

Go to the applet and input the sequences:

Sce_VMA.fa (the vacuolar ATPase catalytic subunit from yeast),

SceHO.fa (the mating type switching HO endonuclease from yeast),

vma1Neurospora.fa (the vacuolar ATPase catalytic subunit from Neurospora crassa) and

Sce_intein.fa.

Careful: once you leave the webpage, the back arrow will return you to the applet, but you will have to input the sequences again (so make sure that your applet is in a separate browser window).
Also, when you input sequences, make sure you paste the sequence only, without a sequence description line. Give the sequences a name that allows you to recognize which sequence is which (e.g. Yeast_vma1, YeastHO, Neurospora_vma1, Yeast_intein).
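If you prefer not to strip the description line by hand, the step above can be sketched in Python. This is only an illustration: the function name `sequence_only` and the example record are made up for this note, and the residues shown are not one of the assignment sequences.

```python
def sequence_only(fasta_text):
    """Return the residues of the first FASTA record, without the '>' line."""
    residues = []
    seen_header = False
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if seen_header or residues:
                break  # a second record starts here, so stop
            seen_header = True
            continue  # drop the description line itself
        residues.append(line)
    return "".join(residues)


# Hypothetical record, for illustration only.
example = """>Yeast_intein made-up description line
MKTAYIAKQR
QISFVKSHFS
"""

print(sequence_only(example))
```

The printed string is what you would paste into the dotlet input box; the name you type into the applet replaces the description line.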

Select Neurospora A-subunit (vma1Neurospora.fa) and the yeast subunit with intein (Sce_VMA.fa). Select a window size between 9 and 15 and click "compute". The program will compare every window of the chosen size in one sequence to all the possible windows in the other sequence. On the right you see a histogram that describes how often window pairs with the indicated score occurred. The sliding bars below and above the histogram let you select the colors with which matches are depicted. (I like black for matches, white for mismatches better than the default).
Note: the sequences may be longer than fits into the display window. You can either use the levers on top and at the left side to move the display window down and/or to the right, or select a compression using the pull-down menu labeled as 1:1 (1:3 or 1:4 usually work).
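The window comparison described above can be sketched in Python. This is a minimal sketch, not dotlet's actual code: it assumes an identity matrix (1 for a match, 0 for a mismatch), the function names are hypothetical, and the two short sequences are made up so that the second one carries an insertion.

```python
from collections import Counter


def window_scores(seq_a, seq_b, window):
    """Score every window of seq_a against every window of seq_b.

    Assumes identity scoring: each score is the number of identical
    positions when the two windows are laid side by side.
    """
    scores = {}
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            scores[(i, j)] = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
    return scores


def print_dot_plot(seq_a, seq_b, window, threshold):
    """Print '#' where a window pair scores at least threshold, '.' elsewhere.

    seq_a runs horizontally (first sequence), seq_b runs vertically.
    """
    scores = window_scores(seq_a, seq_b, window)
    for j in range(len(seq_b) - window + 1):
        row = ""
        for i in range(len(seq_a) - window + 1):
            row += "#" if scores[(i, j)] >= threshold else "."
        print(row)


# Made-up toy sequences: seq_b is seq_a with "GGG" inserted in the middle.
seq_a = "MKTAYIAKQR"
seq_b = "MKTAYGGGIAKQR"

print_dot_plot(seq_a, seq_b, window=3, threshold=3)

# The histogram next to the plot counts how often each score occurred.
histogram = Counter(window_scores(seq_a, seq_b, 3).values())
for score in sorted(histogram):
    print(score, histogram[score])
```

In the printed plot the diagonal is interrupted and shifted where the insertion sits, which is the pattern to look for when you locate the intein. Moving the threshold plays the role of the sliding bars at the histogram.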

If you click on the dot plot panel, the alignment window at the bottom aligns the two sequences accordingly. You can fine-tune the alignment using the arrows.

Which sequence positions (from ... to....) in the yeast sequence represent the intein?
If you compare the HO endonuclease (sex change enzyme) (SceHO.fa) to the intein (Sce_intein.fa) (PAM250, a window of 19, and 1:2 compression work well), does the complete intein sequence match something in the HO endonuclease, or is there a part of the sequence in the HO endonuclease that might correspond to an extein?

Comparison of a nucleotide sequence with introns vs. the protein sequence it encodes.
Dot plots have many different applications. One of them is to analyze and visualize the intron-exon structure of genes. In dotlet, if you use a nucleotide sequence for the first sequence and a protein sequence for the second, the program will compare the translation in all three frames to the protein sequence. Load the following two sequences into dotlet:

A) The genomic sequence from Arabidopsis thaliana containing the gene encoding the vacuolar ATPase (arab.fa). The given sequence is the reverse complement of a sequence that is part of chromosome 1.

B) The protein sequence as translated from the cDNA sequence as given in GI 3334404.

(A window size of 29, the identity matrix, and a 1:5 compression work well. The DNA sequence needs to be the horizontal or 1st sequence.)

How many exons are in the gene?

Are neighboring exon sequences always in the same reading frame? (Use the mouse pointer to place the blue cross-hairs on the diagonal and then use the arrow key until one of the three frames matches to the protein sequence.) Try this for a couple of exons.
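To see what "translated in the three forward reading frames" means, here is a minimal Python sketch. It assumes the standard genetic code, marks stop codons with `*` and unrecognized codons with `X`; the function name and the short DNA string are made up for illustration and are not taken from arab.fa.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_forward_frames(dna):
    """Translate a DNA string in forward frames 1, 2 and 3."""
    dna = dna.upper()
    frames = {}
    for offset in range(3):
        codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
        # Codons with ambiguous bases are shown as 'X'.
        frames[offset + 1] = "".join(CODON_TABLE.get(c, "X") for c in codons)
    return frames


# Made-up example sequence.
for frame, protein in translate_forward_frames("ATGGCCAAATAA").items():
    print(frame, protein)
```

Each frame starts one base later than the previous one, so the same stretch of DNA gives three different protein strings. When you step through the frames with the arrow key in dotlet, you are switching between translations like these.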

Repetitive proteins in Dotlet
Using dotlet, load WP_010869717.1 and GI 19887539 (again omit the labels from the sequence, but give them a name so you can recognize them :)).
Compare the Methanocaldococcus protein against itself. Do you see any repetitive units? How many?
Does the choice of scoring matrix make a difference?
Compare the Methanopyrus sequence against the one from Methanocaldococcus. How many equivalents to the single repeat unit in Methanocaldococcus do you find?
How many repeats do you identify when you compare the Methanopyrus sequence against itself?
Compare the two sequences using Pairwise Blast. Which program should you use? What is the effect of turning the low complexity filter on or off?

========================================================================

If you have not finished exercise number 7, take ten to fifteen minutes to create one gnuplot comparing the location of homologs on two Aeromonas genomes, before starting with part 2.

========================================================================

Part 2: Jalview

Jalview is a JAVA application to inspect and edit multiple sequence alignments. It also allows inspection of protein space for the aligned sequences. This works surprisingly well. The Jalview Homepage contains a lot of additional information and links to install the Jalview desktop on your computer (especially recommended for Macs).

Go to the Jalview Webstart for version 2.10.5 and launch the Jalview desktop. You will get warnings and will need to add more permissions ...
Alternatively, you can download this jar file and execute it (should open JAVA).

Close the windows that may have opened as a demonstration, except for the multiple sequence alignment window.

Load the sequences from the ATPase Subunit alignment (text paste link or alignment file download) into Jalview (either load from file, if the desktop application runs, or paste into the input window and select new window after you are done pasting).

Explore the different coloring options (COLOUR menu). Which one seems to work best (i.e., is most meaningful)? Scroll through the alignment to a more conserved region to compare them.

Note: You can change/edit the alignment by clicking on an amino acid residue and dragging it to the right or left using the arrow keys. Try it, but leave the sequences in an aligned state before you move on. (If this doesn't work, press F2 and try again)

Select all sequences. CALCULATE an AVERAGE DISTANCE TREE USING % identity.
Click somewhere in the resulting tree to color groups of related sequences in the same color. You can right-click (or command click) on a node to change the color for a group of sequences.
Choose a color scheme that colors all subunits of the same type in the same color.
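The tree above is built from pairwise % identity values. As a rough sketch of what such a value is, the Python below counts identical columns between two aligned rows. Note the assumptions: columns where both rows have a gap are skipped, the distance is taken as 100 minus the identity, and the aligned rows are made up. Jalview's own definition may differ in how it treats gaps.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent of compared alignment columns with identical residues.

    Assumption: columns where both rows contain a gap are ignored.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(row_a, row_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Made-up aligned rows, for illustration only.
alignment = {
    "A_one": "MKTAYIAK",
    "A_two": "MKTAYLAK",
    "B_one": "MRT-FLSK",
}

names = list(alignment)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        identity = percent_identity(alignment[first], alignment[second])
        print(first, second, round(identity, 1), round(100.0 - identity, 1))
```

An average distance tree then repeatedly joins the two closest sequences or groups, which is why sequences with high % identity end up on neighboring branches.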

CALCULATE the PRINCIPAL COMPONENT ANALYSIS.
In a principal component analysis, the new dimensions are calculated as linear combinations of the original dimensions, so that the greatest variance by any projection of the data set comes to lie on the first axis, the second greatest on the second axis, and so on for the following dimensions. Can you find a higher dimension that breaks up the vacuolar ATPase A subunits? (Their names start with A.)
Which of the A subunit sequences cluster together, if you use this dimension (1, 2 and 4 worked for me)?
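The idea of "a linear combination of the original dimensions that captures the greatest variance" can be illustrated with a small Python sketch. It finds only the first principal component of a made-up two-dimensional data set, using power iteration on the covariance matrix. This shows the general principle and is not necessarily how Jalview computes its PCA from an alignment; all names and data points are hypothetical.

```python
import math


def center(rows):
    """Subtract the mean of each column from every row."""
    n = len(rows)
    means = [sum(column) / n for column in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already centered rows."""
    n = len(centered)
    dims = len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]


def first_component(cov, iterations=200):
    """Approximate the direction of greatest variance by power iteration.

    Assumes the start vector is not orthogonal to that direction and
    that the data are not all identical.
    """
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Made-up points that spread mostly along one diagonal direction.
points = [[1.0, 1.2], [2.0, 1.9], [3.0, 3.1], [4.0, 3.8], [5.0, 5.2]]

centered = center(points)
axis = first_component(covariance(centered))
print("first axis:", [round(x, 3) for x in axis])

# Each point's coordinate on the new axis is a linear combination
# of its original coordinates.
for row in centered:
    print(round(sum(a * x for a, x in zip(axis, row)), 3))
```

Higher dimensions are found the same way after removing the variance already explained, which is why a group that looks compact on axes 1 to 3 can still separate on a later axis.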

Finished?

Check the appropriate radio button below before pressing the submit button:

Send email to your instructor (and yourself) upon submit
Send email to yourself only upon submit (as a backup)
Show summary upon submit but do not send email to anyone.