Assignment for Friday's class:
- Go through the BLAST slides (classes 9 and 10)
- Think about how you will transfer files to and from the cluster
Assignment for Monday:
- Complete take-home exam #3 (due on Monday)
Discussion
E-values and multiple tests
- How can one assess the number of false positives in a BLAST search?
- How can one assess the number of false negatives in a BLAST search?
- Is the E-value of a match independent of the size of the databank?
- If you select two sequences at random and test them for significant similarity, what does the E-value signify? Is this the same as the P-value?
- Examples of simple Perl scripts:
- Perl script to extract the top-scoring hits (PDF). Extra credit: how would you modify the script to print out not only the first reported match for a query, but all hits whose E-values are as good as that of the first one? Note: in Perl the operator for "logical and" is &&, "not equal" for numbers is !=, and "not equal" for strings is ne.
- Perl script to replace the gi number with the position on the genome (PDF).
- Perl script to make a gnuplot plot (PDF); output for Thermotoga maritima vs. Th. petrophila; plots comparing different Aeromonas species (three plots). Discuss: which recombination events could have created these patterns?
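To make the idea behind the top-hit script concrete, here is a minimal Python sketch (not the class's Perl script). It assumes BLAST+ tabular output (-outfmt 6) with the default 12 columns, and that BLAST lists the hits for each query best-first; the function name top_hits is hypothetical. Like the original script as described, it keeps only the first reported match per query.

```python
import csv


def top_hits(lines):
    """Return {query: (subject, evalue)} for the first reported hit of each query.

    Assumes BLAST+ tabular output (-outfmt 6) with the default columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and comment lines (e.g. from -outfmt 7)
        if not row or row[0].startswith("#"):
            continue
        query, subject, evalue = row[0], row[1], float(row[10])
        # Hits for a query are assumed to be listed best-first, so the
        # first one seen is kept and later ones are ignored
        if query not in best:
            best[query] = (subject, evalue)
    return best
```

Usage: pass an open file, for example `top_hits(open("hits.tsv", newline=""))`, where hits.tsv is a hypothetical BLAST output file. Python dictionaries preserve insertion order, so the queries come back in the order they appear in the BLAST output.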
What can give rise to recombination events occurring mainly between points that are equidistant from the origin of replication?
- See Tillier and Collins
- Strand bias (slides)
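One way to picture the question above is to simulate an inversion whose two endpoints lie at the same distance on either side of the origin. The sketch below assumes a circular chromosome with the origin of replication at position 0 and uses a hypothetical toy genome; it is an illustration, not the method from the class plots. Every position inside the inverted segment keeps its distance from the origin but moves to the other replichore, so a gene that flips strand still ends up on the leading strand, which is one link to strand bias. In a dot plot of old versus new positions, the inverted segment falls on an anti-diagonal through the origin; overlaying several such inversions of different widths is one way to get an X-shaped pattern.

```python
def symmetric_inversion(genome_len: int, half_width: int) -> list[tuple[int, int]]:
    """Return (old_position, new_position) pairs after inverting the segment that
    extends half_width positions on each side of the origin (assumed at position 0)
    of a circular chromosome of genome_len positions."""
    pairs = []
    for pos in range(genome_len):
        # Signed distance from the origin on the circle, in [-L/2, L/2)
        offset = (pos + genome_len // 2) % genome_len - genome_len // 2
        if abs(offset) <= half_width:
            # Mirror the position across the origin: same distance, other replichore
            new_pos = (-offset) % genome_len
        else:
            new_pos = pos
        pairs.append((pos, new_pos))
    return pairs
```

Writing these pairs to a two-column text file gives data that gnuplot can plot directly as points.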
If time:
Goals for class 11:
- Appreciate that life may be older than the late heavy bombardment.
- Know how to adjust significance levels of individual experiments to avoid fishing expeditions.
- Know about the problem that the underreporting of negative results poses for the scientific community.
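The E-value questions in the discussion and the goal of adjusting significance levels both involve multiple tests. Under the Karlin-Altschul statistics that BLAST uses, E = K m n e^(-λS), so the E-value grows linearly with the databank size n, and the probability of at least one chance hit scoring S or better is P = 1 - e^(-E). A Bonferroni correction divides the desired overall significance level by the number of tests. The sketch below just encodes these three relations; the function names are hypothetical.

```python
import math


def evalue_to_pvalue(e: float) -> float:
    """Probability of at least one chance hit this good: P = 1 - exp(-E).
    math.expm1 keeps the result accurate when E is tiny."""
    return -math.expm1(-e)


def rescale_evalue(e: float, old_db_size: float, new_db_size: float) -> float:
    """E is proportional to the databank size n, so rescale it linearly."""
    return e * new_db_size / old_db_size


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests
```

For example, an E-value of about 1e-5 against a databank of 1e6 residues becomes about 1e-3 against a databank 100 times larger, and an overall level of 0.05 spread over 1,000 tests requires about 5e-5 per test. For small E-values, P and E are nearly equal.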