Midterm 2018
Practical Portion: you are allowed to use your notes, the computer, google, …
======================================================================
NOTE 1: If you log into bbcsrv3 from outside the University, you first need to establish a VPN connection. Juno Pulse Secure works great and is available here.
NOTE 2: Never ever run your blast search on the master node. qlogin into a compute node. Use qlogin -q course.q to specify a particular queue (in this case "course.q"). Use qacct -q to get info on the available queues.
NOTE 3: Submit the answers/results before 4 pm on Wednesday, March 21st, by email to gogarten@uconn.edu, with a cc to Artemis.Louyakis@uconn.edu and to yourself.
======================================================================
First exercise: The most conserved protein homologs.
You are given two multiple-sequence FASTA files, each of which includes all the proteins encoded in one organism's genome. What are the most conserved genes? Which measure of similarity did you use?
Hint:
======================================================================
Second exercise: Cumulative plot of palindromes along a genome.
Palindromic DNA motifs are sequences that are identical to their reverse complement, e.g., CTAG. Your task is to explore the distribution of pairs of palindromes. You should first calculate the cumulative occurrence of CTAG and GATC. Note that the second palindrome is the first one read backwards; however, these are different palindromes. In DNA, a sequence is considered a palindrome only when its reverse complement is the same motif.
Write a script that determines the cumulative occurrence of this pair of palindromes (CTAG and GATC) along the E. coli K12 chromosome < https://www.ncbi.nlm.nih.gov/nuccore/NC_000913.3?report=fasta or https://www.ncbi.nlm.nih.gov/nuccore/NC_000913.3?report=fasta&log$=seqview&format=text >. You might need to wait a little before the sequence is loaded.
Submit the plot(s) you obtain as an image file; make sure that the axes are labeled and the numbers are readable. Only if you do not complete this exercise and want to receive partial credit, submit the latest script you have written.
If time permits, use your script on a different genome, e.g., an archaeon.
If you still have time, modify the script to also work on CATG and GTAC; and/or AGCT and TCGA; and/or ACGT and TGCA.
If you got this far, write down a few sentences on the implications of your result.
=======================================================================
Send your results (most similar genes and chosen measure of similarity; plot of cumulative occurrence as pdf, jpg, or png) to gogarten@uconn.edu with a cc to Artemis.Louyakis@uconn.edu and to yourself.
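For the second exercise, one possible starting point is sketched below in Python. It is only a minimal sketch, not the expected solution. It assumes the chromosome has been saved locally as a single-record FASTA file and that matplotlib is installed for the plot. All function names are made up for illustration. Because each motif equals its own reverse complement, every occurrence on one strand is also an occurrence on the other, so the sketch scans the forward strand only.

```python
# Hypothetical helper names; a sketch under the assumptions stated above.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindrome(motif):
    """A DNA motif is palindromic when it equals its reverse complement."""
    return motif == reverse_complement(motif)


def read_fasta_sequence(path):
    """Concatenate all sequence lines of a single-record FASTA file."""
    parts = []
    with open(path) as handle:
        for line in handle:
            if not line.startswith(">"):
                parts.append(line.strip().upper())
    return "".join(parts)


def motif_positions(seq, motif):
    """Return the 0-based start of every occurrence, overlaps included."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def cumulative_counts(seq, motif):
    """Return (positions, running count of occurrences up to each position)."""
    positions = motif_positions(seq, motif)
    return positions, list(range(1, len(positions) + 1))


def plot_pair(seq, motif_a, motif_b, out_file):
    """Plot the cumulative occurrence of both motifs and save it to a file."""
    import matplotlib
    matplotlib.use("Agg")  # write to a file; no display is needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(8, 5))
    for motif in (motif_a, motif_b):
        positions, counts = cumulative_counts(seq, motif)
        ax.step(positions, counts, where="post", label=motif)
    ax.set_xlabel("Position along the chromosome (bp)")
    ax.set_ylabel("Cumulative number of occurrences")
    ax.set_title("Cumulative occurrence of " + motif_a + " and " + motif_b)
    ax.legend()
    fig.tight_layout()
    fig.savefig(out_file, dpi=150)
    plt.close(fig)


def main(fasta_path, out_file, motif_a="CTAG", motif_b="GATC"):
    """Check that both motifs are palindromes, then plot them for one genome."""
    for motif in (motif_a, motif_b):
        if not is_palindrome(motif):
            raise ValueError(motif + " is not its own reverse complement")
    plot_pair(read_fasta_sequence(fasta_path), motif_a, motif_b, out_file)
```

As a usage example, with the chromosome saved under the assumed file name NC_000913.3.fasta, calling main("NC_000913.3.fasta", "ctag_gatc.png") would write the plot to ctag_gatc.png. Passing other motif pairs, such as motif_a="CATG" and motif_b="GTAC", reuses the same code for the later parts of the exercise.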