Assignments for Wednesday's class:
Assignment for Friday's class:
- Go through the BLAST slides (classes 9 and 10).
- Think about how you will transfer files back and forth from the cluster.
Discussion
PAM versus BLOSUM
- Which to use for divergent sequences?
- Which PAM matrix and which BLOSUM matrix have the highest number?
Illustrations of homologs that do not show significant sequence similarity in pairwise comparisons:
Jim Knox (MCB-UConn) has studied many proteins involved in bacterial cell wall biosynthesis and in antibiotic binding, synthesis or destruction. Many of these proteins have the same 3-D structure and can therefore be assumed to be homologous; however, tests based on pairwise sequence comparisons fail to detect these homologies. (For example, enzymes with GRASP nucleotide binding sites are depicted here.)
DNA replication involves many different enzymes. Some of the proteins do the same thing in bacteria, archaea and eukaryotes and have similar 3-D structures (e.g., the sliding clamp: E. coli dnaN and eukaryotic PCNA; see Edgell and Doolittle, Cell 89, 995-998), but again, tests based on pairwise sequence comparisons fail to detect the homology.
Helicase and F1-ATPase. Both form hexamers with something rotating in the middle (the DNA or the gamma subunit, respectively; D. Crampton, pers. communication). The monomers have the same type of nucleotide binding fold (picture).
Additional slides on BLAST and databanks (the slides contain links that become accessible only after you switch to presentation mode)
Discussion 2
E-values and multiple tests
- If you select two sequences from the database and calculate their pairwise alignment score, what would be a useful null hypothesis for assessing the significance of that score?
- How is this null hypothesis implemented in PRSS and FASTA?
- Are E-values and P-values measures of false positives or of false negatives?
- Assume 100 students repeat this exercise. How many false positives should you expect if each individual test is required to pass the 1% significance level?
- What would you need to do to keep the overall false positive rate (for all 100 students) at 1%? Which significance level would each individual experiment need to pass?
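The arithmetic behind the last two questions can be checked with a few lines of Python. This is only a sketch under stated assumptions: the 100 tests are treated as independent, the null hypothesis is assumed to be true in every one of them, and the function names are made up for this illustration. The Bonferroni and Sidak corrections shown are two common ways to adjust the individual significance level; they are not the only ones.

```python
# Multiple-testing arithmetic for the "100 students" question.
# Assumptions (not from the class notes): the tests are independent and the
# null hypothesis holds in every test, so every "hit" is a false positive.

def expected_false_positives(n_tests, alpha):
    """Expected number of false positives among n_tests tests at level alpha."""
    return n_tests * alpha

def prob_at_least_one_false_positive(n_tests, alpha):
    """Probability that at least one of n_tests independent tests passes by chance."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_level(n_tests, overall_alpha):
    """Per-test level that bounds the overall false positive rate (Bonferroni)."""
    return overall_alpha / n_tests

def sidak_level(n_tests, overall_alpha):
    """Per-test level that gives exactly overall_alpha for independent tests (Sidak)."""
    return 1.0 - (1.0 - overall_alpha) ** (1.0 / n_tests)

n_students = 100
alpha = 0.01

print("expected false positives:", expected_false_positives(n_students, alpha))
print("P(at least one false positive):",
      round(prob_at_least_one_false_positive(n_students, alpha), 3))
print("Bonferroni per-test level:", bonferroni_level(n_students, alpha))
print("Sidak per-test level:", sidak_level(n_students, alpha))
```

With 100 tests at the 1% level, one false positive is expected on average, and the chance of seeing at least one is about 63%. To hold the overall rate at 1%, each individual test has to pass a level of roughly 0.0001.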
If time:
- Discuss whether the concept of phylogeny is compatible with reticulation events. (slides here)
Goals for class 10
- Understand how the databanks at the NCBI differ from flat-file and relational databanks.
- Be able to discuss the advantages of the command line in general and of BLAST searches via the command line in particular.
- Know which substitution matrices to use for comparing similar sequences and which to use for divergent sequences.
- Understand the different types of error as applied to databank searches.
- Know how to adjust the significance levels of individual experiments to avoid fishing expeditions.
- Know about Margaret Dayhoff's accomplishments.