Powerpoint slides on BLAST are here
| Blast, PSI-blast, and Homology
Are two similar sequences homologous (i.e., is their similarity due to shared ancestry)?
One way to quantify the similarity between two sequences is to
1. compare the actual sequences and calculate the alignment score
2. randomize (scramble) one (or both) of the sequences and calculate the alignment score for the randomized sequences
3. repeat step 2 at least 100 times
4. describe the distribution of the randomized alignment scores
5. do a statistical test to determine if the score obtained for the real sequences is significantly better than the scores for the randomized sequences
To illustrate the assessment of similarity/homology we will use a program from Pearson's FASTA package called PRSS.
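The randomization procedure listed above can be sketched in a few lines of Python. This is only a minimal illustration and not how PRSS itself is implemented: the scoring scheme (match +2, mismatch -1, linear gap penalty -2) and the function names `local_alignment_score` and `shuffle_test` are assumptions made for this example, whereas real programs use substitution matrices such as BLOSUM and affine gap penalties.

```python
import random
import statistics


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not PRSS defaults.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def shuffle_test(seq1, seq2, n_shuffles=100, seed=0):
    """Steps 1-5: score the real pair, then score seq1 against shuffled seq2."""
    rng = random.Random(seed)
    real = local_alignment_score(seq1, seq2)          # step 1
    shuffled_scores = []
    for _ in range(n_shuffles):                       # steps 2 and 3
        letters = list(seq2)
        rng.shuffle(letters)                          # keeps composition, destroys order
        shuffled_scores.append(local_alignment_score(seq1, "".join(letters)))
    mean = statistics.mean(shuffled_scores)           # step 4
    sd = statistics.stdev(shuffled_scores)
    # Step 5 (simplest form): how many standard deviations above the mean?
    z = (real - mean) / sd if sd > 0 else float("nan")
    return real, mean, sd, z


if __name__ == "__main__":
    # Two made-up toy sequences, used only to show the call.
    real, mean, sd, z = shuffle_test("MKVLAAGIVGLCTRHEDW", "MKVLSAGIVALCTKHEDW")
    print(f"real={real} shuffled mean={mean:.1f} sd={sd:.1f} z={z:.1f}")
```

Shuffling keeps the amino acid composition of the sequence and destroys only the order of the residues, so a high score for the real pair cannot be explained by composition alone.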
There are many other alignment programs. BLAST is a widely used program offered through the NCBI (go here for more info). It can also do pairwise comparisons (go here, do example). To force the program to report an alignment, increase the E-value threshold.
Rules of thumb:
If you can demonstrate significant similarity using either randomization or an unweighted blast search, your sequences are homologous (i.e., related by common ancestry). Convergent evolution has not been shown to lead to sequence similarities detectable by these means (see above; this might not be true for scores in PSI-blast).
If the actual alignment score is more than three standard deviations (of the randomized sequences) better than the mean for the randomized sequences, the two sequences are homologous (i.e., related by common ancestry). PRSS and many other programs use more accurate distributions to describe the distribution of random hits. The expectation value for the alignment score of the actual sequences is based on these statistics.
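As a rough illustration of what a "more accurate distribution" can look like, the sketch below fits an extreme value (Gumbel) distribution to the shuffled scores by the method of moments and converts the real score into a probability and an expected number of hits. This is an assumption-laden simplification and not the fitting procedure PRSS or BLAST actually use; the function names `gumbel_p_value` and `expected_hits` are hypothetical, and multiplying the probability by the number of comparisons is only a common approximation.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def gumbel_p_value(real_score, shuffled_scores):
    """P(random score >= real_score) under a Gumbel fit (method of moments)."""
    mean = statistics.mean(shuffled_scores)
    sd = statistics.stdev(shuffled_scores)
    if sd == 0:
        raise ValueError("shuffled scores show no variation; cannot fit")
    beta = sd * math.sqrt(6) / math.pi       # scale parameter
    mu = mean - EULER_GAMMA * beta           # location parameter
    t = -(real_score - mu) / beta
    if t > 700:                              # avoid overflow in exp(); p is 1 here
        return 1.0
    # 1 - exp(-exp(t)), written with expm1 to stay accurate for tiny p-values
    return -math.expm1(-math.exp(t))


def expected_hits(p_value, n_comparisons):
    """Approximate E-value: expected number of random matches this good or better."""
    return p_value * n_comparisons


if __name__ == "__main__":
    # Made-up shuffled scores, used only to show the calls.
    shuffled = [10, 12, 11, 13, 9, 10, 12, 11]
    p = gumbel_p_value(20, shuffled)
    print(p, expected_hits(p, 100000))
```

Under this fit, a score three standard deviations above the shuffled mean corresponds to a probability of roughly 0.01 per comparison, noticeably larger than a normal distribution would suggest. That is one reason to rely on the reported expectation value rather than on the standard deviation rule alone.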
E-values give the expected number of matches with an alignment score this good or better, BUT:
Examples:
Jim Knox (MCB-UConn) has studied many proteins involved in bacterial cell wall biosynthesis and antibiotic binding, synthesis or destruction. Many of these proteins have identical 3-D structures and can therefore be assumed to be homologous; however, the above tests fail to detect these homologies (for example, enzymes with GRASP nucleotide binding sites are depicted here).
DNA replication involves many different enzymes. Some of the proteins do the same thing in bacteria, archaea and eukaryotes; they have similar 3-D structures (e.g., the sliding clamp: E. coli dnaN and eukaryotic PCNA; see Edgell and Doolittle, Cell 89, 995-998), but again, the above tests fail to detect homology.
BLAST and PSI BLAST
Run a blast trial with this sequence
A BLAST tutorial for standard blast is here; a more general tutorial on BLAST, including PSI BLAST, is here
An easy way to force the program to report less significant matches is to increase the expect value on the blast search form pages.
The NCBI page describes PSI blast as follows: The results of a normal blast search are aligned and a pattern of conserved residues is extracted from the alignment. This pattern is used for the next iteration.
An important parameter to adjust is the E-value threshold that determines which matches are included in the alignment and pattern extraction. |
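The pattern extraction described above can be pictured with a small Python sketch: hits that pass the E-value inclusion threshold are stacked, and each alignment column is turned into log-odds scores. This is a deliberately simplified picture and not NCBI's implementation; it assumes gap-free hits of equal length, a uniform background frequency, and a plain pseudocount. The names `build_profile` and `score_against_profile` and the default threshold value are placeholders chosen for this example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned_hits, inclusion_threshold=0.005, pseudocount=1.0):
    """Build per-column log-odds scores from hits passing the E-value threshold.

    aligned_hits: list of (aligned_sequence, e_value), all sequences equal length.
    The threshold default is an illustrative value, not a recommendation.
    """
    included = [seq for seq, e_value in aligned_hits if e_value <= inclusion_threshold]
    if not included:
        raise ValueError("no hits pass the inclusion threshold")
    background = 1.0 / len(AMINO_ACIDS)      # assumption: uniform background
    profile = []
    for col in range(len(included[0])):
        counts = Counter(seq[col] for seq in included if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score_against_profile(profile, seq):
    """Ungapped score of a sequence against the profile (position by position)."""
    return sum(column.get(aa, 0.0) for column, aa in zip(profile, seq))


if __name__ == "__main__":
    # Made-up hits: two significant ones and one that should be excluded.
    hits = [("ACDA", 1e-10), ("ACEA", 1e-5), ("WWWW", 5.0)]
    profile = build_profile(hits)
    print(score_against_profile(profile, "ACDA"), score_against_profile(profile, "WWWW"))
```

Raising the inclusion threshold lets weaker hits into the profile. This can help reach distant relatives, but if an unrelated sequence is included, the next iteration is pulled toward it.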
Assignment #3: Write down your answers!
Blast using the NCBI web interface:
|