Dotlet Exercises
The Swiss Institute of Bioinformatics provides a Java applet called Dotlet that creates interactive dot plots. Warning: Google Chrome, Safari, Firefox, and Microsoft Edge do not support Java applets. You will therefore need to run today's exercises in Waterfox or Internet Explorer (Waterfox also works on a Mac). (If you experience problems, see the related page: How do I enable Java in my web browser?)
A JavaScript version of Dotlet is in beta testing. This version works fine for protein-to-protein comparisons but fails for DNA-to-protein comparisons (exercise 3 below).
The main use of dot plots is to detect domains, duplications, insertions, deletions, and, if you work at the DNA level, inversions. (Note that Dotlet compares both strands, and if you compare DNA with protein, the DNA is translated in the three forward reading frames. Excellent illustrations of the use of dot plots are given on the examples page.)
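What "both strands" and "three forward reading frames" mean can be shown with a small sketch. The Python below is our own illustration, not Dotlet's code: it builds the standard genetic code, forms the reverse complement (the other strand), and translates a DNA string in the three forward frames. The function names are hypothetical, and translating codons with non-ACGT characters as X is an assumption of this sketch.

```python
"""Sketch: reverse complement and three-frame translation (not Dotlet's own code)."""

# Standard genetic code; codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str, frame: int) -> str:
    """Translate dna starting at offset frame (0, 1 or 2); '*' marks stop codons."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    # Codons with ambiguous bases (e.g. N) become X (an assumption of this sketch).
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frames(dna: str) -> list[str]:
    """Translations of the three forward reading frames."""
    return [translate(dna, frame) for frame in range(3)]


if __name__ == "__main__":
    print(three_frames("ATGGCCTAA"))        # ['MA*', 'WP', 'GL']
    print(reverse_complement("ATGGCCTAA"))  # TTAGGCCAT
```

Translating the reverse complement as well would give six frames; as described above, Dotlet uses only the three forward frames when DNA is compared with protein.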
Comparing yeast ATPase catalytic subunit with yeast HO endonuclease. This is similar to the analysis we did using pairwise BLAST. However, in this case we plot all pairwise comparisons between windows, so we do not need to worry about E-value cutoffs and low-complexity filters.
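Plotting all pairwise comparisons between windows can be sketched in a few lines of Python. This is a simplified illustration, not Dotlet's algorithm: it scores each pair of windows with the identity matrix (1 for a match, 0 otherwise), ignores the reverse strand, and marks a cell when the score reaches a threshold, whereas Dotlet displays scores as grey levels. The function names, the window of 4, and the toy sequence are hypothetical.

```python
"""Sketch of a windowed dot plot with identity scoring (not Dotlet's implementation)."""


def dot_plot(seq_a: str, seq_b: str, window: int) -> list[list[int]]:
    """Cell [i][j] counts identical positions between seq_a[i:i+window] and seq_b[j:j+window]."""
    rows = max(len(seq_a) - window + 1, 0)
    cols = max(len(seq_b) - window + 1, 0)
    return [
        [sum(seq_a[i + k] == seq_b[j + k] for k in range(window)) for j in range(cols)]
        for i in range(rows)
    ]


def render(matrix: list[list[int]], threshold: int) -> str:
    """Draw '*' where the window score reaches the threshold, '.' elsewhere."""
    return "\n".join(
        "".join("*" if score >= threshold else "." for score in row) for row in matrix
    )


if __name__ == "__main__":
    # A toy sequence containing a tandem duplication, compared with itself.
    seq = "MKVLAMKVLA"
    print(render(dot_plot(seq, seq, window=4), threshold=4))
```

Besides the main diagonal, the output shows short off-diagonal segments where the duplicated MKVLA motif matches its copy; repeats and duplications appear the same way in the real plots.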
Go to the applet and input the sequences:
Sce_VMA.fa (the vacuolar ATPase catalytic subunit from yeast),
SceHO.fa (the mating type switching HO endonuclease from yeast),
vma1Neurospora.fa (the vacuolar ATPase catalytic subunit from Neurospora crassa) and
Sce_intein.fa.
Careful: once you leave the webpage, the back button will return you to the applet, but you will have to input the sequences again (so make sure that the applet is in a separate browser window). Also, when you input sequences, make sure you paste the sequence only, without a sequence description line. Give the sequences names that let you recognize which sequence is which (e.g. Yeast_vma1, YeastHO, Neurospora_vma1, Yeast_intein).
(A window size of 29, the identity matrix, and a 1:5 compression work well. The DNA sequence needs to be the horizontal, or first, sequence.) How many exons are in the gene?
Are neighboring exon sequences always in the same reading frame? (Use the mouse pointer to place the blue cross-hairs on the diagonal, then use the arrow keys until one of the three frames matches the protein sequence.) Try this for a couple of exons.
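The cross-hair procedure amounts to asking which forward frame of the DNA encodes a given stretch of the protein. A self-contained Python sketch of that question follows; the helper names and the toy "gene" (two short exons separated by a 5-nt "intron") are hypothetical and are not taken from the exercise data.

```python
"""Sketch: in which forward reading frame does a peptide occur? (hypothetical toy data)"""

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code keyed by codon.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, frame: int) -> str:
    """Translate the forward strand starting at offset frame (0, 1 or 2)."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3].upper(), "X") for i in range(frame, len(dna) - 2, 3)
    )


def find_frame(dna: str, peptide: str):
    """Return (frame, nucleotide_start) of the first frame encoding peptide, or None."""
    for frame in range(3):
        index = translate(dna, frame).find(peptide)
        if index != -1:
            # Amino acid number `index` in this frame starts at nucleotide frame + 3 * index.
            return frame, frame + 3 * index
    return None


if __name__ == "__main__":
    exon1, intron, exon2 = "ATGAAATGG", "GTAAG", "TTTCCC"
    gene = exon1 + intron + exon2
    print(find_frame(gene, "MKW"))  # (0, 0)
    print(find_frame(gene, "FP"))   # (2, 14)
```

Because the toy intron is 5 nt long, which is not a multiple of three, the second exon falls in a different frame of the genomic sequence than the first. In Dotlet you read the same information off the frame in which each exon's diagonal lines up with the protein.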
========================================================================
If you have not finished exercise number 7, take ten to fifteen minutes to create one gnuplot plot comparing the location of homologs on two Aeromonas genomes before starting part 2.
Jalview is a Java application for inspecting and editing multiple sequence alignments. It also lets you inspect the aligned sequences in protein space, which works surprisingly well. The Jalview Homepage contains a lot of additional information and links for installing the Jalview desktop on your computer (especially recommended for Macs).
Go to the Jalview Homepage and launch the Jalview desktop (link on the top right).
Close the windows that may have opened as a demonstration, except for the multiple sequence alignment window.
Load the sequences from the ATPase Subunit alignment (text paste link or alignment file download) into Jalview: either load them from the file, if the desktop application runs, or paste them into the input window and select New Window when you are done pasting.
Explore the different coloring options (COLOUR menu). Which one seems to work best, i.e. is the most meaningful? (Scroll through the alignment to a more conserved region.)
Note: You can edit the alignment by clicking on an amino acid residue and moving it to the right or left with the arrow keys. Try it, but return the sequences to an aligned state before you move on. (If this doesn't work, press F2 and try again.)
Select all sequences. CALCULATE an AVERAGE DISTANCE TREE USING % identity. Click somewhere in the resulting tree to color groups of related sequences in the same color. You can right-click (or command-click) on a node to change the color for a group of sequences. Choose a color scheme that colors all subunits of the same type in the same color.
CALCULATE the PRINCIPAL COMPONENT ANALYSIS. In a principal component analysis, the new dimensions are linear combinations of the original dimensions, chosen so that the greatest variance of any projection of the data set lies along the first axis, the greatest remaining variance along the second axis, and so on. Can you find a higher dimension that breaks up the vacuolar ATPase A subunits? (Their names start with A.) Which of the A subunit sequences cluster together if you use this dimension (1, 2 and 4 worked for me)?
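For readers curious about what the tree step computes, here is a rough Python sketch: percent identity between aligned rows, converted to a distance (100 minus % identity), followed by average-linkage (UPGMA) clustering. We assume that Jalview's Average Distance option corresponds to this kind of average linkage; the gap handling (columns with a gap in either row are skipped), the omission of branch lengths, and the four toy sequences are our own simplifications and hypothetical data.

```python
"""Sketch: % identity distances and an average-linkage (UPGMA) tree; not Jalview's code."""


def percent_identity(a: str, b: str) -> float:
    """Identity over aligned columns where neither row has a gap (an assumption)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def upgma(labels: list[str], rows: list[str]) -> str:
    """Cluster aligned rows by average linkage; returns the topology in Newick-like form."""
    # Distance = 100 - % identity, stored for both orderings of each pair.
    dist = {
        (i, j): 100.0 - percent_identity(rows[i], rows[j])
        for i in range(len(rows))
        for j in range(len(rows))
        if i != j
    }
    clusters = {i: (label, 1) for i, label in enumerate(labels)}  # id -> (subtree, size)
    next_id = len(labels)
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        i, j = min(
            ((p, q) for p in clusters for q in clusters if p < q), key=lambda pq: dist[pq]
        )
        (tree_i, size_i), (tree_j, size_j) = clusters.pop(i), clusters.pop(j)
        for k in clusters:
            # Size-weighted average of the distances from k to the two merged clusters.
            d = (size_i * dist[(i, k)] + size_j * dist[(j, k)]) / (size_i + size_j)
            dist[(next_id, k)] = dist[(k, next_id)] = d
        clusters[next_id] = (f"({tree_i},{tree_j})", size_i + size_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


if __name__ == "__main__":
    names = ["A1", "A2", "B1", "B2"]
    aligned = ["MKVLA-GT", "MKVLAQGT", "MRILSQGS", "MRILSEGS"]
    print(upgma(names, aligned))  # ((A1,A2),(B1,B2))
```

A tree drawn by Jalview also carries branch lengths; this sketch only recovers which sequences group together, which is what the coloring step relies on.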