Assignment for Friday's class:

Assignment for Wednesday's class (next week):

  • Read this page and the PowerPoint slides for classes 1 and 2, and try to understand what the last two orange frames say!
  • Refresh your memory on replication, transcription, and translation (watch Drew Berry's movie on YouTube)
  • Contemplate the following questions (see the orange boxes below for inspiration):
    Do most homologous proteins have similar functions?
    Are most proteins with similar function homologous? Are all proteins with similar function homologous?
    Are most proteins with significant sequence similarity homologous?
    Do most homologous proteins have significant sequence similarity?

Problems accessing huskyCT?

Student questions

Is it Bioinformatics or not?


Sequence space is big, how come we ever find a functioning protein?


SPDBV demo

load 1HEW.pdb
simplify display
display options
windows -> control panel
coloring commands
ribbon display
selection vs display
commands acting on selected aa
neighbors of selected aa


PowerPoint slides: class 1, class 2


Which of the traditional criteria for life appear biased?

Traditional criteria for Life:

  • Uptake and dissipation of Energy
  • Metabolism
  • Responsiveness
  • Gestalt (distinctive shape, separate from environment)
  • Growth
  • Reproduction with variation - Ability to evolve

Would a combination of fewer criteria be sufficient?

NASA's definition: "life is a self-sustained chemical system capable of undergoing Darwinian evolution" (first put onto paper by Gerald Joyce)

Could a virus, if one considers its life cycle, be considered alive?
Claudiu Bandea from the CDC in Atlanta wrote an interesting article on viruses as molecular organisms. Patrick Forterre arrives at a similar conclusion.
Can life be divided into living building blocks (individual cells), or is life a property of a larger assembly? If the latter, what assembly? (the organism, the biofilm, the biosphere, Gaia)


 

 

What does Bioinformatics have to do with Molecular Evolution? 

Problem: Application of first principles does not (yet) work

The following chain is (believed to be) mainly determined by the DNA sequence (plus other components of the cell, which in turn are encoded by other parts of the genome), yet at present it cannot be simulated in a computer:

DNA sequence ->
transcription ->
translation ->
protein folding ->
protein function (catalytic and other properties) ->
properties of the organism(s) ->
ecology (taking also the non biological environment into account) ->

... .

 

Most scientists believe that the principle of reductionism (plus new laws and relations emerging on each level) is true for this chain; however, this is clearly "in principle" only.
Biology relies on this chain of steps working more or less unambiguously (prions are a known exception), but:

At several steps along the way from DNA to function our understanding of the chemical and physical processes involved is so incomplete that prediction of protein function based on only a single DNA sequence is at present impossible (at least for a protein of reasonable size).

Solution:
Use evolutionary context:

"Nothing in biology makes sense except in the light of evolution"

Theodosius Dobzhansky



Present day proteins evolved through substitution and selection from ancestral proteins. Related proteins have similar sequence AND similar structure AND similar function.

In the above mantra "similar function" can refer to:

  • identical function,

  • similar function, e.g.:
    • identical reactions catalyzed in different organisms; or
    • same catalytic mechanism but different substrate (malic and lactic acid dehydrogenases);
    • similar subunits and domains that are brought together through a (hypothetical) process called domain shuffling, e.g. nucleotide binding domains in hexokinase, myosin, HSP70, and ATP synthases.

The Size of Protein Sequence Space (back of the envelope calculation):

Consider a protein of 600 amino acids.
Assume that any of the twenty possible amino acids can occur at every position.
Then the total number of possibilities is
20 choices for the first position times 20 for the second position times 20 for the third ... = 20^600, or about 4*10^780, different possible proteins with a length of 600 amino acids.

For comparison the universe contains only about 10^89 protons and has an age of about 5*10^17 seconds or 5*10^29 picoseconds.

If every proton in the universe were a computer that explored one possible protein sequence per picosecond, we would have explored only 5*10^118 sequences, i.e. a negligible fraction of the possible sequences with length 600 (one in about 10^662).
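The back of the envelope calculation above can be checked with exact integer arithmetic. The following is a minimal Python sketch that uses only the figures quoted in the text (20 amino acids, length 600, 10^89 protons, 5*10^29 picoseconds); the names sci, TOTAL, EXPLORED and the other constants are made up for this illustration.

```python
from math import log10

# Figures taken from the text above (order-of-magnitude estimates).
AMINO_ACIDS = 20            # choices per position
LENGTH = 600                # protein length in amino acids
PROTONS = 10**89            # protons in the universe, as quoted in the text
AGE_PICOSECONDS = 5 * 10**29  # age of the universe, as quoted in the text


def sci(n):
    """Return (mantissa, exponent) so that n is about mantissa * 10**exponent."""
    exponent = len(str(n)) - 1
    mantissa = n / 10**exponent
    return mantissa, exponent


# Python integers have arbitrary precision, so 20**600 is computed exactly.
TOTAL = AMINO_ACIDS ** LENGTH

# One sequence per proton per picosecond, for the whole age of the universe.
EXPLORED = PROTONS * AGE_PICOSECONDS

# How many possible sequences exist for each sequence explored.
RATIO = TOTAL // EXPLORED

if __name__ == "__main__":
    m, e = sci(TOTAL)
    print(f"possible sequences: about {m:.1f}*10^{e}")
    m, e = sci(EXPLORED)
    print(f"explored sequences: about {m:.1f}*10^{e}")
    print(f"explored fraction: one in about 10^{round(log10(RATIO))}")
```

Changing LENGTH shows how quickly the space grows: every additional position multiplies the number of possible sequences by 20.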

The following is based on observation and not on an a priori truth:

If two proteins show significant similarity in their primary sequence, they have shared ancestry, and probably similar function (this is not necessarily true for nucleotide sequences).


To date there is no example known where convergent evolution has led to significant similarity of the primary sequence (although there are examples where similar selection pressures have resulted in similar convergent substitutions in homologous proteins).

THE REVERSE IS NOT TRUE:

PROTEINS WITH THE SAME OR SIMILAR FUNCTION DO NOT ALWAYS SHOW SIGNIFICANT SEQUENCE SIMILARITY
for one of two reasons:

a)  they evolved independently
(e.g. different types of nucleotide binding sites);

or

b)   they underwent so many substitution events that there is no readily detectable similarity remaining.

In particular, PROTEINS WITH SHARED ANCESTRY DO NOT ALWAYS SHOW SIGNIFICANT SIMILARITY
(reason: see b above); many recent advances concern the improved detection of similarity.
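Reason b can be illustrated with a toy simulation. The sketch below is not a realistic model of protein evolution: it assumes that every substitution hits a random position and replaces the residue with a uniformly chosen different amino acid, and it ignores selection, insertions and deletions. All function names are made up for this illustration. Under these assumptions the identity between a descendant and its ancestor drifts toward the roughly 5% (1 in 20) expected for two unrelated random sequences.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the twenty standard amino acids


def random_protein(length, rng):
    """Draw a random sequence with uniform amino acid frequencies (an assumption)."""
    return [rng.choice(AMINO_ACIDS) for _ in range(length)]


def substitute(sequence, events, rng):
    """Return a copy of `sequence` after `events` random single-site substitutions."""
    mutated = list(sequence)
    for _ in range(events):
        site = rng.randrange(len(mutated))
        # Replace the residue with one of the 19 other amino acids.
        choices = [aa for aa in AMINO_ACIDS if aa != mutated[site]]
        mutated[site] = rng.choice(choices)
    return mutated


def identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def identity_decay(length=600, steps=(0, 100, 300, 600, 1200, 2400, 4800), seed=1):
    """Identity to the ancestor after increasing numbers of substitution events."""
    rng = random.Random(seed)
    ancestor = random_protein(length, rng)
    return [(n, identity(ancestor, substitute(ancestor, n, rng))) for n in steps]


if __name__ == "__main__":
    for events, fraction in identity_decay():
        print(f"{events:5d} substitutions: {fraction:.1%} identity to ancestor")
```

Because multiple substitutions can hit the same site, the identity does not fall linearly. Once it approaches the chance level, shared ancestry can no longer be read off from a simple pairwise comparison of the two sequences.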