Assignment for Wednesday

 

Assignment for today

  • Think about how new genes are created and how genome size can increase. Everyone should come up with at least three distinct mechanisms.

  • Read The Evolutionary Fate and Consequences of Duplicate Genes
    Discussion:
    This article discusses the frequency and fate of duplicated genes. What is the most frequent outcome? What is the most frequent outcome if the gene does not undergo non-functionalization? What does any of this have to do with building up a post-mating hybridization barrier?
    Aside: How can genes be duplicated?


Discussion of midterm:
Red letter grade: Grade in midterm (based on percent correct answers, with three bonus questions)
Green letter grade: Current standing in course (counting two of the take-home exams/essays and 6 lab assignments)
Blue number: Number of lab assignments I received.

Questions from midterm:

True/False - The 3 alpha and 3 beta subunits of the bacterial-type ATP synthase and the 3 A and 3 B subunits of the archaeal/eukaryal ATPase are an example of convergent evolution, because the 3-and-3 subunit arrangement was settled on independently by these 2 enzymes.

Inteins are composed of which of the following domains? Choose 2.
A. Self-splicing domain
B. Walker motif
C. Nucleotide binding domain (GRASP)
D. Hydrolase domain
E. Helix-turn-Helix DNA binding domain
F. Homing endonuclease domain

Which of these domains is sometimes absent?

Which of the following features of life as we know it seems inescapable and will surely be found in all alien life discovered?
A. DNA
B. Parasites
C. The central dogma (DNA →RNA →Proteins)
D. RNA
E. All of the above

When searching a database with a query sequence, which of the following is true regarding the E-value?
A. It is proportional to the size of the databank and can be larger than 1.
B. It is proportional to the size of the databank and canNOT be larger than 1.
C. It is NOT proportional to the size of the databank and can be larger than 1.
D. It is NOT proportional to the size of the databank and canNOT be larger than 1.
E. It is the number of standard deviations a match is above mean, generated by randomizing sequences.
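
The dependence of the E-value on databank size can be illustrated with the Karlin-Altschul relation used for BLAST-style statistics, E = K * m * n * exp(-lambda * S), where m is the query length, n is the total databank length, and S is the raw score. The sketch below is in Python rather than Perl, and the values of K and lambda are made-up placeholders chosen only for illustration. FASTA estimates its statistics differently, but its E-value likewise scales with the size of the library searched.

```python
import math


def karlin_altschul_evalue(score, query_len, db_len, k=0.1, lam=0.3):
    """Expected number of chance matches scoring at least `score`.

    k and lam are hypothetical placeholder values; real values depend
    on the scoring matrix and gap penalties used in the search.
    """
    return k * query_len * db_len * math.exp(-lam * score)


if __name__ == "__main__":
    small_db = karlin_altschul_evalue(score=50, query_len=300, db_len=1e8)
    large_db = karlin_altschul_evalue(score=50, query_len=300, db_len=2e8)
    # Doubling the databank size doubles the E-value for the same score.
    print("ratio large/small:", large_db / small_db)

    # A weak score in a large databank gives an E-value far above 1,
    # which is possible because E is an expected count, not a probability.
    weak = karlin_altschul_evalue(score=20, query_len=300, db_len=1e8)
    print("weak match has E > 1:", weak > 1)
```

Because E is an expected number of chance hits, it has no upper bound of 1, in contrast to a P-value.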

When aligning two sequences that are about 20% identical, which of the following scoring matrices would be most appropriate:
(A) PAM 0.25
(B) PAM 1
(C) PAM 8.5
(D) PAM 25
(E) PAM 250

A databank search is done using FASTA with an amino acid sequence as query, and the only reported match has an E-value of 10^-2. What does this mean for the homology of the two sequences?
A) This is suggestive of homology, but to be certain, one might want to consider additional analyses.
B) This proves (beyond reasonable doubt) that the two sequences are NOT homologs.
C) This proves (beyond reasonable doubt) that the two sequences ARE homologs.
D) This suggests that the two sequences are not homologous.
E) None of the above.

A databank search is done using FASTA with an amino acid sequence as query, and the only reported match has an E-value of 8. What does this mean for the homology of the two sequences?
A) An E-value of this magnitude does not prove homology, but the sequences may nevertheless be homologous.
B) This indicates that the two sequences are NOT homologs.
C) This proves (beyond reasonable doubt) that the two sequences ARE homologs.
D) This suggests that the two sequences are not homologous.
E) None of the above.

Examples of simple Perl scripts:

  • Perl script to extract top-scoring hits is here (pdf). Extra credit: how would you modify the script to print not only the first reported match for a query, but all hits whose E-values are as good as that of the first one? Note: in Perl, the operator for "logical and" is &&, "not equal" for a number is !=, and "not equal" for a string is ne.
  • Perl script to replace gi number with position on the genome is here (pdf) (faa_samplefeature_table sample)
  • What would a misplaced origin of replication look like in two similar genomes?
  • Which part of a circular bacterial genome is least conserved?
  • Aside: strand bias (G/(G+C)) and other biases.
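
One way to approach the extra-credit question in the first bullet is sketched below in Python rather than Perl, so it is not a modification of the linked script. It assumes the search results are in BLAST tabular format (12 tab-separated columns, with the query ID in column 1, the subject ID in column 2, and the E-value in column 11) and that hits for each query are listed best first. The function name `top_hits_with_ties` is invented for this illustration.

```python
def top_hits_with_ties(lines):
    """Map each query to every hit whose E-value equals that of its first hit.

    Assumes BLAST tabular output: tab-separated fields, query ID in
    field 0, subject ID in field 1, E-value in field 10, and hits for
    each query reported best first.
    """
    first_evalue = {}  # query ID -> E-value of the first reported hit
    kept = {}          # query ID -> list of (subject ID, E-value)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if query not in first_evalue:
            # First hit seen for this query: remember its E-value.
            first_evalue[query] = evalue
            kept[query] = [(subject, evalue)]
        elif evalue == first_evalue[query]:
            # Later hit with an equally good E-value: keep it as well.
            kept[query].append((subject, evalue))
    return kept


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python top_hits.py < blast_results.tab
    for query, hits in top_hits_with_ties(sys.stdin).items():
        for subject, evalue in hits:
            print(query, subject, evalue, sep="\t")
```

The test for "first hit of a new query" and the test for "same E-value as the first hit" correspond to the string comparison (ne) and numeric comparison (!=) that the note in the first bullet points to. A subject with several separately reported alignments will appear more than once in this sketch.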

Gene plot via blast (slides with examples)

 



 

Goals for class 13: