Assignments for Today

 

Assignments for Wednesday

 

 

Progressive alignment of multiple sequences
(e.g. clustalw/clustalx):

1) Pairwise distance calculation
2) Clustering analysis of the sequences based on pairwise alignment.
3) Iterated alignment of two most similar sequences or groups of sequences.

Problem: Step 2 can create a strong bias that is recovered as "signal" in later analyses of the multiple sequence alignment.

PPT Slides for today

From:<http://dml.cmnh.org/2002Jul/msg00351.html>

----- Original Message -----
From: <Dinogeorge@aol.com>
Sent: Thursday, July 11, 2002 6:47 PM
Subject: Re: New finds

 

> > --+--+-----------A
> >   |  `--+--+-----B
> >   |     |  `--+--C
> >   |     |     `--D
> >   |     `--------E
> >    `--------------F
>
> This is >not< a Hennigian comb. Only the entire ABCDE clade and the F
lineage
> make a (two-toothed) Hennigian comb in this cladogram. In a Hennigian comb
> the side branches are left unbranched, like the teeth of a comb. Hence the
> name.

This _is_ a Hennigian comb, because in a cladogram, _only_ topology counts.
A cladogram is a mobile. Look at the following -- it's exactly the same
cladogram as above:

--+--F
  `--+--A
     `--+--E
        `--+--B
           `--+--D
              `--C

... what a side branch is lies completely in the hand of the presentator.
All I did was I rotated a few stems around their long axes.
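
The point of the exchange above (only topology counts; rotating branches changes nothing) can be checked mechanically. The following is a minimal sketch, not part of the original message: it assumes a tree is written as nested Python tuples with leaf names as strings, and the helper name canonical is made up for this illustration. Each node is turned into an unordered set of its children, so two drawings of the same cladogram compare equal.

```python
def canonical(tree):
    """Return an order-independent form of a rooted tree.

    Assumed encoding: a leaf is a string, an internal node is a tuple
    of subtrees. Leaf names are assumed to be unique.
    """
    if isinstance(tree, str):
        return tree
    # A frozenset ignores the left/right order of the children.
    return frozenset(canonical(child) for child in tree)


# First drawing: ((A, ((B, (C, D)), E)), F)
first = (("A", (("B", ("C", "D")), "E")), "F")
# Second drawing, after rotating branches: (F, (A, (E, (B, (D, C)))))
second = ("F", ("A", ("E", ("B", ("D", "C")))))

print(canonical(first) == canonical(second))
```

Both drawings reduce to the same nested sets, whereas a tree with a genuinely different branching order does not.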

 

 

Intro to phylogenetic reconstruction

Phylogenetic analysis is an inference of evolutionary relationships between organisms.
Those relationships are usually represented by tree-like diagrams.
Note:
the assumption that evolution is exclusively tree-like is not justified.

Steps of a phylogenetic analysis:


Compilation of sequence dataset
Alignment
Determination of substitution model
Tree building
Tree evaluation

 

 

Why phylogenetic reconstruction of molecular evolution?

A) Systematic classification of organisms

      e.g.: Who were the first angiosperms? (i.e. where are the first angiosperms located relative
      to present day angiosperms?)

      Where in the tree of life is the last common ancestor located?

B) Evolution of molecules

e.g.: domain shuffling, reassignment of function, gene duplications, horizontal gene transfer, drug targets, detection of genes that drive evolution of a species/population (e.g. influenza virus)

C) Identification of organisms

e.g., phylotyping in microbiome samples, origin of genes and viruses (e.g. the recent Ebola outbreak)

How:

1) Obtain sequences

Sequencing

Databank searches -> NCBI: a) Entrez, b) BLAST, c) BLAST of pre-release data

Friends

 

2) Determine homology (see notes for earlier classes for practical implementation)

Reminder on Definitions:
Homology: Two sequences are homologous if there existed a molecule in the past that is ancestral to both sequences.

3) Align sequences

(Most algorithms used for phylogenetic reconstruction require a global alignment. An exception is statistical alignment, e.g. statalign; see Thorne JL and Kishino H, 1992, Freeing phylogenies from artifacts of alignment. Mol Biol Evol 9:1148-1162.)

Some evolutionary biologists recommend selecting only the part of the alignment that is reliable. (Discuss!) Modify the alignment if necessary.

 

4) Reconstruct evolutionary history

    A) Distance analyses

      1. calculate pairwise distances
        (different distance measures, correction for multiple hits, correction for codon bias)
      2. make distance matrix (table of pairwise corrected distances)
      3. calculate tree from distance matrix
         i) using an optimality criterion
            (e.g.: smallest error between distance matrix
            and distances in tree), or
         ii) using algorithmic approaches (UPGMA or neighbor joining)
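
Steps 1 and 2 of the distance analysis can be sketched in a few lines. This is an illustration under stated assumptions, not a prescribed implementation: sequences are already aligned and of equal length, gaps are written as "-", the distance measure is the simple proportion of differing sites, and the correction for multiple hits is the Jukes-Cantor formula d = -3/4 ln(1 - 4p/3). The function names and the toy alignment are made up for this example.

```python
import math


def p_distance(seq_a, seq_b):
    """Fraction of aligned, ungapped sites at which two sequences differ."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """Correct an observed proportion p for multiple hits (Jukes-Cantor)."""
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(alignment):
    """Table of pairwise corrected distances, keyed by (name, name)."""
    names = list(alignment)
    return {
        (x, y): jukes_cantor(p_distance(alignment[x], alignment[y]))
        for x in names
        for y in names
    }


# Toy alignment (invented data, for illustration only).
toy = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAT", "s3": "ACGAACTTAT"}
for pair, d in distance_matrix(toy).items():
    print(pair, round(d, 4))
```

The corrected distance is always at least as large as the observed proportion, and the gap between the two grows as sequences diverge.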

    B) Parsimony analyses

      find the tree that explains the sequence data with the minimum number of substitutions

      (tree includes hypothesis of sequence at each of the nodes)
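
Counting the minimum number of substitutions on one given tree can be sketched with Fitch's algorithm, a standard method for this step. The sketch is illustrative only and assumes a rooted binary tree written as nested pairs with leaf names as strings, plus an alignment stored as a dict from name to sequence; the function names and the toy data are invented here. A full parsimony analysis would repeat this count over many candidate trees and keep the tree with the lowest score.

```python
def fitch(tree, states):
    """Return (possible ancestral states, substitutions) for one site.

    A leaf is a name (str); an internal node is a pair (left, right).
    states maps each leaf name to its character at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # The children disagree: one substitution is required on this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the per-site minimum number of substitutions over all sites."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


# Toy data (invented, for illustration only).
toy = {"w": "AAG", "x": "AAG", "y": "CAG", "z": "CAT"}
tree_1 = (("w", "x"), ("y", "z"))
tree_2 = (("w", "y"), ("x", "z"))
print(parsimony_score(tree_1, toy), parsimony_score(tree_2, toy))
```

The state sets returned at the internal nodes correspond to the hypothesis of the sequence at each node mentioned above.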

       

    C) Maximum Likelihood analyses

      given a model for sequence evolution, find the tree under which the observed sequence data have the highest probability (the likelihood).

      This approach can also be used to successively refine the model.
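
The idea is easiest to see in the smallest possible case: two sequences, where the "tree" is a single branch and the only thing to estimate is its length. The sketch below is an illustration under assumptions that are not in the notes: the Jukes-Cantor model, sites treated as independent, and a plain grid search in place of a proper optimizer. The function names are invented for this example. Real programs apply the same principle to whole trees, optimizing the topology and all branch lengths.

```python
import math


def jc_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of a pairwise alignment under Jukes-Cantor.

    d is the branch length in expected substitutions per site,
    n_sites the number of compared sites, n_diff how many differ.
    """
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay      # P(no net change at a site)
    p_change = 0.25 - 0.25 * decay    # P(change to one specific other base)
    return (n_sites - n_diff) * math.log(0.25 * p_same) + n_diff * math.log(
        0.25 * p_change
    )


def ml_distance(n_sites, n_diff, step=0.001, upper=2.0):
    """Grid search for the branch length with the highest likelihood."""
    grid = [i * step for i in range(1, int(upper / step) + 1)]
    return max(grid, key=lambda d: jc_log_likelihood(d, n_sites, n_diff))


# 10 differences in 100 sites (invented numbers, for illustration only).
print(ml_distance(100, 10))
```

For this simple model the maximum lies, up to the grid resolution, at the value given by the Jukes-Cantor distance correction, which is one way to see that the correction is itself a likelihood estimate.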

      Bayesian approaches use the same likelihood calculations to compute posterior probabilities for trees, clades and evolutionary parameters. MCMC approaches in particular have become very popular in recent years, because they allow evolutionary parameters to be estimated (e.g., which site in a virus protein is under positive selection) without assuming that one actually knows the "true" phylogeny.

       

      D - ...) Other approaches:
      spectral analyses, evolutionary parsimony (i.e., methods that look only at patterns of substitutions)

Another way to categorize methods of phylogenetic reconstruction is to ask if they are using

  • an optimality criterion (e.g.: smallest error between distance matrix and distances in tree, least number of steps), or
  • algorithmic approaches (UPGMA or neighbor joining)
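
To make the "algorithmic" category concrete, here is a minimal UPGMA sketch. It follows a fixed recipe (join the two closest clusters, average their distances to everything else, repeat) and never scores alternative trees against a criterion. The encoding is an assumption made for this illustration: distances are stored in a dict keyed by frozenset pairs, the result is returned as nested tuples, and branch lengths are left out. The function name and the toy distances are invented.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster names by UPGMA and return the tree as nested tuples.

    dist maps frozenset({a, b}) to the distance between a and b.
    """
    sizes = {name: 1 for name in names}  # cluster -> number of leaves
    d = dict(dist)
    while len(sizes) > 1:
        # Join the pair of clusters with the smallest distance.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        for other in sizes:
            if other not in (a, b):
                # Size-weighted average of the distances to the two parts.
                d[frozenset((merged, other))] = (
                    sizes[a] * d[frozenset((a, other))]
                    + sizes[b] * d[frozenset((b, other))]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes[a] + sizes[b]
        del sizes[a], sizes[b]
    return next(iter(sizes))


# Toy distances (invented, for illustration only).
toy = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "D")): 10.0,
    frozenset(("C", "D")): 10.0,
}
print(upgma(["A", "B", "C", "D"], toy))
```

An optimality-criterion method would instead compute a score for each candidate tree and search for the best one, which is why the two categories can disagree on the same data.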

5) Interpret the result.

It is especially important to consider artifacts that might originate in phylogenetic reconstruction, and to assess the reliability of your results.