Questions (yours) and Answers (mine, i.e. J. Peter Gogarten)

> I am doing research work on tracking, monitoring, and quantifying some specific soil microbes in rhizosphere soil and roots of plants. My questions related to this course are:
- How can I know which microbes are very closely related to Bacillus subtilis, Trichoderma harzianum, and Streptomyces lydicus based on their gene sequences?
- I am trying to design some primers to detect these microbes. Which genes should I use for primer design?

The usual yardstick for "relatedness" is the 16S rRNA gene. For PCR in particular this is the gene to use, because you do not need to fight with synonymous substitutions (and the resulting redundancy in the primers): some regions are universally conserved, and others can be used to make more specific primers (e.g., specific for Bacillus species). (Trichoderma harzianum is a fungus, so there the counterpart is the 18S rRNA gene.)

The best place to get sequences (and alignments) is the RDPII (http://rdp.cme.msu.edu/index.jsp). Download the sequences for the groups you want to detect and for the ones you want to exclude, align them (or download them already aligned), and design your primers or probes. You can test the specificity of your probes using the above website.
Another thing that you might want to consider is that organisms can have identical rRNA genes but differ dramatically in their genome content. One way to get a handle on this is to use primers against sequences that occur repeatedly in the genome; the resulting PCR gives a fingerprint that can be used to measure the differences between strains.
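
The step "align the groups you want to detect and the ones you want to exclude, then pick primer sites" can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the sequences are already aligned to equal length, gaps are written as "-", and the function names, the toy sequences, and the `min_mismatches` threshold are all made up for this example. A real primer design would also check melting temperature, self-complementarity, and so on.

```python
from collections import Counter

def column_conservation(alignment):
    """Fraction of sequences sharing the most common character in each column."""
    n = len(alignment)
    return [Counter(col).most_common(1)[0][1] / n for col in zip(*alignment)]

def conserved_windows(alignment, width, min_cons=1.0):
    """Start positions of gap-free windows in which every column reaches min_cons."""
    cons = column_conservation(alignment)
    starts = []
    for i in range(len(cons) - width + 1):
        conserved = all(c >= min_cons for c in cons[i:i + width])
        gap_free = all("-" not in seq[i:i + width] for seq in alignment)
        if conserved and gap_free:
            starts.append(i)
    return starts

def discriminating_windows(targets, excluded, width, min_mismatches=2):
    """Windows conserved in the target group that mismatch every excluded sequence."""
    hits = []
    for i in conserved_windows(targets, width):
        site = targets[0][i:i + width]
        if all(sum(a != b for a, b in zip(site, seq[i:i + width])) >= min_mismatches
               for seq in excluded):
            hits.append((i, site))
    return hits

# Toy alignment (hypothetical sequences, far shorter than a real 16S rRNA gene).
targets = ["ACGTACGTAA", "ACGTACGTAA", "ACGTACGTTA"]
excluded = ["ACGTTTGTAA"]
print(discriminating_windows(targets, excluded, width=4))
```

Windows that are conserved in both groups would be candidates for universal primers; the windows returned here are candidates for group-specific ones.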

>What exactly do you mean by purifying or conservative selection, and how is this different from “normal” selection?

This is currently one of the hottest topics in bioinformatics, molecular evolution, and molecular medicine. Kimura observed that most mutations found in members of a population are selectively neutral (the so-called neutral or nearly neutral theory of evolution), i.e., organisms that carry these mutations do not have a selective advantage or disadvantage compared to the rest of the population. However, there are many mutations that are never observed because they are detrimental, and as a consequence these mutations are never turned into substitutions (i.e., fixed in the population).

If you take a typical protein, some of the sites involved in what the protein is doing are so important that only one particular amino acid can do the job. As a result, such a site will always be occupied by this amino acid. In contrast, other positions might be less critical, and amino acids might be changing back and forth.

On the nucleotide level there are two types of substitutions: those that change the encoded amino acid (non-synonymous substitutions) and synonymous ones (i.e., due to the redundancy of the genetic code, the codon with the changed nucleotide still encodes the same amino acid). Clearly, if the encoded amino acid is very important, it will never undergo a substitution, and the only substitutions that occur at the nucleotide level are synonymous ones. The type of selection that is revealed by many more synonymous than non-synonymous substitutions is called purifying selection. (The type of selection can be determined for individual genes or for individual codons. The type of selection can also change over time.)

The absence of selection is revealed by equal rates of synonymous and non-synonymous substitutions. Pseudogenes (for example, mRNAs that were copied back into the genome but are not actually transcribed) are usually in this category. In this case all types of mutations have the same chance of being fixed; there is no selection at the amino acid level, because the gene is not expressed.

In rare instances a mutation that changes the encoded amino acid provides a selective advantage. This mutation will be fixed in the population with higher probability, and faster, than a neutral mutation. As a result, a codon or gene that frequently underwent changes that were fixed in the population because they provided a selective advantage will have experienced more non-synonymous than synonymous substitutions. These sites or genes are considered to be under positive (or diversifying) selection.
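
To make the three cases above concrete, here is a rough Python sketch that compares two aligned coding sequences and counts synonymous and non-synonymous differences, each divided by the number of sites at which such a change could occur. It is a simplification and not a substitute for a proper program: it assumes the standard genetic code, gap-free in-frame sequences in upper case, it skips codons that differ at more than one position, it treats changes to stop codons as non-synonymous, and it does not correct for multiple substitutions at the same site. All function names are made up for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def synonymous_sites(codon):
    """Number of synonymous sites in a codon: synonymous single-base changes / 3."""
    same = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    same += 1
    return same / 3

def compare_coding_sequences(seq1, seq2):
    """Return (syn differences, nonsyn differences, syn sites, nonsyn sites)."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned, in frame, and of equal length")
    syn_diff = nonsyn_diff = 0
    syn_sites = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        syn_sites += (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        # Only codons with exactly one changed nucleotide are classified here.
        if sum(a != b for a, b in zip(c1, c2)) == 1:
            if CODON_TABLE[c1] == CODON_TABLE[c2]:
                syn_diff += 1
            else:
                nonsyn_diff += 1
    return syn_diff, nonsyn_diff, syn_sites, len(seq1) - syn_sites

def pn_ps_ratio(seq1, seq2):
    """Non-synonymous differences per site divided by synonymous ones (None if undefined)."""
    sd, nd, s, n = compare_coding_sequences(seq1, seq2)
    if sd == 0:
        return None
    return (nd / n) / (sd / s)

# Toy example: GCT->GCC is synonymous (Ala), TGG->TGT changes Trp to Cys.
print(pn_ps_ratio("GCTAAATGG", "GCCAAATGT"))
```

A ratio well below 1 corresponds to purifying selection, a ratio near 1 to the absence of selection, and a ratio above 1 to positive selection. With sequences as short as the toy example the number is of course meaningless.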

The genes that in the past were under positive selection are of great interest to biologists, because it was these genes that provided a selective advantage to the ancestors of today’s organisms. For example, many groups search for genes that were under positive selection during the recent evolution of humans. Among the few genes found to be under positive selection are a gene for a jaw-closing muscle protein that became dysfunctional and genes encoding olfactory receptors (see here and here).

Because the genes under positive selection are the ones that made organisms more fit, this type of selection is also called Darwinian selection. As Paul mentioned in his lecture, recognizing sites that in the past frequently underwent positively selected changes is a great way to find influenza strains that are likely to be part of next year’s population. This strategy was explored by Walter Fitch, one of the pioneers in molecular evolution. More details on this, and a link to a lecture and slides given by Walter Fitch, are here (last box). This is definitely worth checking out: the story of a single “older” molecular evolutionary biologist beating a whole bunch of CDC specialists in predicting evolution.

Detecting positive selection usually requires a comparative approach. A recent article by Plotkin et al. suggested otherwise (here). Their method uses codon “volatility”, a measure of how many synonymous substitutions the codons used in a gene could undergo. This might be a very cool approach, especially because it is very difficult to pin down and align orthologs (for example, in the case of the olfactory receptor genes there are many paralogs, gene duplications, pseudogenes, etc.). However, the proposal received a rather hostile reception (see here under brief communications; the first of several is here).
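
As a rough illustration of the volatility idea, the sketch below scores a codon by the fraction of its single-nucleotide neighbors (ignoring stop codons) that encode a different amino acid, so a codon with many possible synonymous changes gets a low score. This definition is my reading of the idea and an assumption for this example; the published method also involves a statistical comparison against the rest of the genome, which is not reproduced here. The function names are made up.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def volatility(codon):
    """Fraction of non-stop single-nucleotide neighbors encoding a different amino acid."""
    neighbors = [codon[:pos] + base + codon[pos + 1:]
                 for pos in range(3) for base in BASES if base != codon[pos]]
    coding = [n for n in neighbors if CODON_TABLE[n] != "*"]
    changed = sum(CODON_TABLE[n] != CODON_TABLE[codon] for n in coding)
    return changed / len(coding)

def mean_volatility(seq):
    """Average codon volatility over an in-frame coding sequence."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    return sum(volatility(c) for c in codons) / len(codons)

# Two arginine codons with different volatility.
print(volatility("CGA"), volatility("AGA"))  # 0.5 0.75
```

Both codons encode arginine, but AGA has fewer synonymous neighbors than CGA, so a gene that prefers AGA-like codons scores as more volatile without any comparison to an ortholog.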

> What is phylogenetics? Does it mean building trees?

Equating phylogeny with trees is a widely propagated misconception. The word phylo-geny derives from Greek phylon (race or class) and Greek -geneia, from -genes (born) (from the American Heritage Dictionary). Phylogeny describes how the larger taxonomic categories came into existence (as opposed to ontogeny, which describes how the individual organism comes into existence). Botanists discovered long ago that the origin of many species results from the fusion of genomes belonging to different parent species. They coined the term reticulate evolution. There is hardly any crop plant that is not an allopolyploid (i.e., every cell contains copies of genomes from two different parent species; more here).

Every eukaryotic cell represents the result of a fusion between at least two independent ancestors, an alpha proteobacterium that evolved into the mitochondrion, but whose genes nowadays mostly reside in the nucleus, and a host cell that was a close relative of the archaea. (There might have been many more organisms contributing genes over time, but except for the cyanobacteria, these additional contributors currently are less well defined.)

Many organisms are in fact microbial communities, whose members live in close association (e.g., lichen), and many (all?) microbial communities can be viewed as higher order entities with a shared genetic resource (open source genetics :-)).

Especially for microorganisms (but see here and here for recent examples of gene transfer between very divergent angiosperms), the evolutionary history of organisms is not tree-like; at best, it can be approximated by a tree. For more on this see:
Gogarten, J. P., Doolittle, W. F., Lawrence, J. G. (2002). Prokaryotic evolution in light of gene transfer. Mol Biol Evol 19, 2226-2238.
Zhaxybayeva, O., Gogarten, J. P. (2004). Cladogenesis, Coalescence and the Evolution of the Three Domains of Life. Trends in Genetics 20, 182-187

This is what Wikipedia currently says on phylogeny:
A phylogeny (or phylogenesis) is the origin and evolution of a set of organisms, usually of a species. A major task of systematics is to determine the ancestral relationships among known species (both living and extinct), and the most commonly used methods to infer phylogenies include cladistics, phenetics, maximum likelihood, and Bayesian.

During the late 19th century, the theory of recapitulation, or Haeckel's biogenetic law, was widely accepted. This theory was often expressed as "ontogeny recapitulates phylogeny", i.e. that the development of an organism exactly mirrors the evolutionary development of the species. The early version of this hypothesis has since been rejected as being oversimplified and misleading. However, modern biology recognizes numerous connections between ontogeny and phylogeny, explains them using evolutionary theory, and views them as supporting evidence for that theory. See the article on ontogeny and phylogeny.