Dotlet Exercise
Aside: In class 13, an assignment was to read through the Dotlet examples pages. Dotlet is a Java applet provided by the Swiss Institute of Bioinformatics that performs interactive dot plots. This application is great, but it is next to impossible to run from a "normal" browser, because Java applets are no longer supported in Google's Chrome, Safari, Firefox, or Microsoft's Edge. Waterfox or Internet Explorer may work, but may not, because the operating system may interfere. End Aside
Instead, we will use a JavaScript version of Dotlet that is in beta testing. This version works fine for protein-to-protein comparisons, but fails for DNA-to-protein comparisons.
Comparing yeast ATPase catalytic subunit with yeast HO endonuclease. This is similar to the analysis we did using pairwise BLAST. However, in this case we plot all pairwise comparisons between windows, i.e., we do not need to worry about E-value cutoffs and low-complexity filters.
Go to the applet and input the two sequences to analyze (note: just the sequence, not the header). These are the sequences we will be using:
Sce_VMA.fa (the vacuolar ATPase catalytic subunit from yeast),
SceHO.fa (the mating type switching HO endonuclease from yeast),
vma1Neurospora.fa (the vacuolar ATPase catalytic subunit from Neurospora crassa) and
Compare two unrelated sequences (e.g., SceHO.fa and vma1Neurospora.fa). If you adjust the levels in the histogram, the dot matrix comparison looks a little like the rain in a Japanese woodcut (here or here). What causes these many small diagonals? (Hint: Check the impact of the window size; move the hairline cross along one of the little diagonal streaks.)
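To experiment with the window-size hint outside the browser, here is a minimal Python sketch of a windowed dot matrix. It is not Dotlet's code: it counts identical residues per window instead of summing substitution-matrix scores, and the function names, toy sequences, and threshold are assumptions made for illustration only.

```python
def window_score(a, b, i, j, window):
    """Count identical residues between a[i:i+window] and b[j:j+window]."""
    return sum(1 for k in range(window) if a[i + k] == b[j + k])


def dot_matrix(a, b, window):
    """Score every pair of windows; rows index sequence a, columns sequence b."""
    rows = len(a) - window + 1
    cols = len(b) - window + 1
    return [[window_score(a, b, i, j, window) for j in range(cols)]
            for i in range(rows)]


def longest_streak(matrix, threshold):
    """Longest run of cells scoring >= threshold along any diagonal."""
    rows, cols = len(matrix), len(matrix[0])
    best = 0
    for start in range(-(rows - 1), cols):
        # Negative offsets start on the left edge, others on the top edge.
        i, j = (-start, 0) if start < 0 else (0, start)
        run = 0
        while i < rows and j < cols:
            run = run + 1 if matrix[i][j] >= threshold else 0
            best = max(best, run)
            i += 1
            j += 1
    return best


# Toy sequences (hypothetical): the only shared residues are three W's.
a = "G" * 10 + "WWW" + "G" * 10
b = "A" * 10 + "WWW" + "A" * 10
for window in (1, 7):
    print(window, longest_streak(dot_matrix(a, b, window), threshold=1))
```

With a window of 1 the three shared residues give a streak of 3 cells; with a window of 7 every window that overlaps them scores above zero, so the same chance match is smeared into a streak of 9 cells.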
The difference from a gene plot is that in this case the search is done at the nucleotide level, and that the program keeps track of the + and the - strand. MUMmer is installed on the cluster.
The following assumes that you have established a FileZilla and an ssh connection to Xanadu (see lab 5). [Note: establish the ssh connection first, to be sure your password has not expired!]
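What "keeping track of the + and the - strand" means can be illustrated with a small Python sketch. This is not how MUMmer is implemented (it does not scan with exact string search like this); the function names and the toy sequences are assumptions for illustration. A match on the - strand is simply a match of the reverse complement of the query.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_on_both_strands(query, genome):
    """Return (strand, start) for every exact occurrence of query.

    Start positions are 0-based and always given in + strand coordinates.
    """
    hits = []
    for strand, probe in (("+", query), ("-", reverse_complement(query))):
        pos = genome.find(probe)
        while pos != -1:
            hits.append((strand, pos))
            pos = genome.find(probe, pos + 1)
    return hits


# Toy example (hypothetical sequences).
print(find_on_both_strands("ATGGC", "CCATGGCTTTTGCCATCC"))
```

In a plot of such matches, hits on the two strands are usually drawn differently (for example in different colors or as diagonals running in opposite directions), which is what makes inversions visible.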
module load MUMmer/4.0.2
module load gnuplot
module load perl
mkdir lab13
cd lab13
srun --pty -p mcbstudent --qos=mcbstudent --mem=2G bash
Download two or more genomes you want to compare: browse microbial genomes at NCBI and download chromosome or genome (*.fna) files to your computer, or use the curl -O command targeting the NCBI FTP site (to get the link, right-click on the link in the FTP server).
For example, you could download four Haloferax genomes using
curl -O ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/025/685/GCF_000025685.1_ASM2568v1/GCF_000025685.1_ASM2568v1_genomic.fna.gz  # Haloferax volcanii DS2
curl -O ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/005/406/325/GCF_005406325.1_ASM540632v1/GCF_005406325.1_ASM540632v1_genomic.fna.gz  # Haloferax mediterranei ATCC 33500
curl -O ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/010/692/905/GCF_010692905.1_ASM1069290v1/GCF_010692905.1_ASM1069290v1_genomic.fna.gz  # Haloferax alexandrinus
curl -O ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/190/965/GCF_001190965.1_ASM119096v1/GCF_001190965.1_ASM119096v1_genomic.fna.gz  # Haloferax gibbonsii
To plot the matches we use a script that comes with MUMmer and that creates a file to be sent to gnuplot. Sadly, a few of the options in mummerplot do not work on the cluster, resulting in an error message. Nevertheless, mummerplot (with the postscript option) generates a file that is understood by gnuplot as installed on the cluster.
To run mummerplot: mummerplot --postscript --prefix give_it_a_name out.delta (out.delta is the output file from nucmer).
gnuplot give_it_a_name.gp
This creates a file in your working directory that ends in .ps (e.g., give_it_a_name.ps).
Use FileZilla to transfer the file to your personal computer. On Macs, Preview reads postscript files; I am told that Adobe's Acrobat Reader opens .ps files under Windows, but you could also use Inkscape or GIMP.
If you want to compare many genomes, you can use the following script, which compares all .fna files in the directory with one another. This takes time; you could use it to get a snack or coffee. The script is here. Run it from the command line by typing perl mummer2.pl
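To see what an all-against-all comparison involves, the following Python sketch lists the commands that would be needed for every pair of .fna files in the current directory. It is not the course's mummer2.pl script, whose contents are not shown here. It only prints the commands and runs nothing, and the prefix naming scheme and the use of unordered pairs are assumptions. The nucmer --prefix option sets the name of the .delta file; the mummerplot and gnuplot lines follow the usage given above.

```python
from itertools import combinations
from pathlib import Path


def pairwise_commands(fasta_files):
    """Build nucmer/mummerplot/gnuplot command lines for every pair of files."""
    commands = []
    for ref, qry in combinations(sorted(fasta_files), 2):
        # Hypothetical naming scheme: <reference>_vs_<query>
        prefix = f"{Path(ref).stem}_vs_{Path(qry).stem}"
        commands.append(f"nucmer --prefix={prefix} {ref} {qry}")
        commands.append(
            f"mummerplot --postscript --prefix {prefix} {prefix}.delta")
        commands.append(f"gnuplot {prefix}.gp")
    return commands


# Dry run: print the commands for all .fna files in the working directory.
for line in pairwise_commands(str(p) for p in Path(".").glob("*.fna")):
    print(line)
```

Files downloaded as .fna.gz would presumably have to be decompressed first (for example with gunzip) before they match the *.fna pattern. The number of pairs grows quickly: four genomes give six comparisons, ten genomes give 45.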
Genome1: Genome2: Genome3: ... Description of your results