Assignments for Friday

Assignments for Monday

 

Regarding pop-up quiz (see here for question and discussion):

2 and 4 letter alphabet

Updated Slides on Mutual Aid versus(?) Natural Selection here

Natural Selection and Evolution

When does "evolution" occur? An algorithmic approach.

"Darwin's Dangerous Idea" by Daniel C. Dennett, Chapter on Evolution as algorithm is a reading assignment for Monday, Sept. 13. [available through WebCT]

What is needed for evolution to occur?

(Note, this is different from stating that this is all that occurs in evolution)

  • Offspring similar but not identical to parents
  • More offspring than necessary
  • Competition for resources, mates => survival of the fittest.

What processes in biological evolution go beyond inheritance with variation and selection? (We'll discuss many of the following later in the semester.) Alternatives to evolution by natural selection?

  • Horizontal gene transfer and recombination
  • Polyploidization (angiosperm and vertebrate evolution) see here and here
  • Fusion and cooperation of organisms (Kefir, lichen, also the eukaryotic cell)
  • Evolution of the holobiont (host + symbionts)
  • Targeted mutations (?), genetic memory (?) (see Foster's and Hall's reviews on directed/adaptive mutations; see here for a counterpoint)
  • Random genetic drift
  • Gratuitous complexity
  • Selfish genes (who/what is the subject of evolution?)
  • Parasitism, altruism, gene transfer agents
  • Mutationism, hopeful monsters

 


DataBank Searches at NCBI. Information Retrieval using Entrez.



NCBI (National Center for Biotechnology Information) is home to many public biological databases (see diagram below). All of the databases are interlinked, and they all share a common search and retrieval system: Entrez.

[Diagram: Entrez database connections (older version)]

 

A list of the different databases in Entrez is here.

For a PubMed tutorial, click here (goes well beyond what you need to know for Friday).

Use Boolean operators (AND, OR, NOT) to perform advanced searches. Here is an explanation of the Boolean operators.

Search Field Tags: listed here.
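
As a rough illustration of how Boolean operators and field tags combine, the sketch below assembles a query string and the matching URL for NCBI's ESearch utility. The helper names (tagged, combine, esearch_url) and the example search terms are made up for this illustration; only the ESearch address and its db, term and retmax parameters come from NCBI's E-utilities. Nothing is sent over the network here.

```python
from urllib.parse import urlencode

# Base address of NCBI's ESearch utility (part of the Entrez E-utilities).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def tagged(term, field=None):
    """Return a search term, optionally restricted to a field such as Title."""
    # Quote multi-word terms so that they are searched as a phrase.
    text = f'"{term}"' if " " in term else term
    return f"{text}[{field}]" if field else text


def combine(operator, *terms):
    """Join terms with one Boolean operator and wrap the result in parentheses."""
    if operator not in ("AND", "OR", "NOT"):
        raise ValueError("operator must be AND, OR, or NOT")
    return "(" + f" {operator} ".join(terms) + ")"


def esearch_url(query, db="pubmed", retmax=20):
    """Build (but do not send) an ESearch request URL for the given query."""
    return ESEARCH_URL + "?" + urlencode({"db": db, "term": query, "retmax": retmax})


# Hypothetical example: either phrase in the title, and "archaea" anywhere.
query = combine(
    "AND",
    combine(
        "OR",
        tagged("horizontal gene transfer", "Title"),
        tagged("lateral gene transfer", "Title"),
    ),
    tagged("archaea"),
)
print(query)
print(esearch_url(query))
```

The same query string can be pasted directly into the PubMed search box; the parentheses make the order of evaluation explicit.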

Explore features of NCBI Search interface: Advanced Search, Index, Clipboard and MyNCBI.

 

Other Useful Databases and Services:

While Medline is incorporating more and more non-medical literature, there might still be gaps in the coverage. Alternatives are other databanks available through the National Library of Medicine (here) and the local services offered at the UConn libraries. Current Contents and Agricola in particular complement PubMed nicely. The best way to access them is through the UConn library's website. In particular, the "Web of Science" database gives access to the Science Citation Index, a database that tracks cited references in journals. Scopus provides similar services. (But Google Scholar has become nearly as useful; e.g., here.)

Note that many resources are restricted to the UConn domain, so you need to access them either from a campus computer or through the proxy account. The university now provides easy VPN access through the Junos Pulse application (see http://remoteaccess.uconn.edu/vpn-overview/connect-via-vpn-client/).

When searching PubMed, you can add links to online journals for which UConn has a subscription. (If you are outside UConn, you need to use VPN for the links to work.) The link to use for PubMed is http://www.ncbi.nlm.nih.gov/sites/entrez?otool=uconnlib

 


Do you want to be informed about new sequences/articles in your research area? Check out these services (you can also use MyNCBI for this, but I have used PubCrawler for several years and it works reliably):

  • PubCrawler
  • Swiss-Shop

Comments

Use MyNCBI at Entrez or PubCrawler to repeat searches at regular intervals.

Do example on clipboard and index. (use GI 2266989 (nucl) and 3334404 (prot))
How many related sequences does the protein sequence have?
Demonstrate BLINK

Bottom lines:
a) GenBank is redundant
b) If possible, it is preferable to use a 20-letter protein sequence as the query rather than a 4-letter nucleotide sequence!
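
One way to see part of the reason behind bottom line b) is to ask how often two unrelated sequences match by chance. The sketch below assumes that all letters of an alphabet are equally frequent and that positions are independent, which is a simplification (real sequences have biased composition); the function names are made up for this illustration.

```python
import math


def chance_identity(alphabet_size):
    """Probability that two random letters match, assuming equal letter frequencies."""
    return 1.0 / alphabet_size


def chance_word_match(alphabet_size, word_length):
    """Probability that two random words of the given length are identical."""
    return chance_identity(alphabet_size) ** word_length


def bits_per_position(alphabet_size):
    """Maximum information per sequence position, in bits."""
    return math.log2(alphabet_size)


for name, size in (("nucleotide", 4), ("protein", 20)):
    print(
        name,
        "identity by chance:", chance_identity(size),
        "| 10-letter word match by chance:", chance_word_match(size, 10),
        "| bits per position:", round(bits_per_position(size), 2),
    )
```

Under these assumptions, random nucleotide sequences are expected to be 25% identical and random protein sequences only 5% identical, so a given level of identity stands out from the background much more clearly in a protein comparison.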


Other web pages:

Nucleic Acids Research Database Issue
Every year, the first issue of Nucleic Acids Research is devoted to updates on biological databases.
(link to the databank issue in the section browse pull-down menu on top)

http://www.ebi.ac.uk/
The European homolog/analog to NCBI, software archive.

http://rdp.cme.msu.edu/
The US Ribosomal Database Project (RDP)

http://www.arb-silva.de/
ARB-Silva: the European alternative to the RDP

http://greengenes.lbl.gov
Greengenes: 16S rRNA database and tools at the Lawrence Berkeley National Laboratory

http://www.jgi.doe.gov/
Genomes at the DOE joint genome institute

http://www.genomesonline.org/
List of completed and ongoing genome projects

http://www.flybase.org/
Database of the Drosophila genome

http://www.arabidopsis.org/
TAIR - The Arabidopsis Information Resource

http://www.ensembl.org/
Ensembl Genome Browser (Eukaryotic genomes, including Human and Mouse genomes)

Sequence and structure databanks can be divided into many different categories.
One of the most important distinctions is whether or not a databank has a gatekeeper:

 

Supervised databanks with gatekeeper.

Examples:

  • Swiss-Prot
  • RefSeq (at NCBI)

Entries are checked for accuracy.
+ more reliable annotations
-- frequently out of date

 

 

Repositories without gatekeeper.

Examples:

  • GenBank
  • EMBL
  • TrEMBL

Everything is accepted.
+ everything is available
-- many duplicates
-- poor reliability of annotations

 

One problem in maintaining databanks (supervised and unsupervised) is "ownership" of sequences, which in many databanks prevents continuous updating of sequences. Even if errors are detected, they are not easily removed from the databank. E.g., the ATP synthase operons in E. coli; see Fig. 1 in http://mic.microbiologyresearch.org/content/journal/micro/10.1099/mic.0.033811-0#tab2

 

Types of Error in a Databank search

False positives: The number of false positives is estimated by the E-value. The P-value or significance value gives the probability that a positive identification is made in error (same as with drug tests).
Danger: avoid fishing expeditions. If you do 100 tests on random data, you expect one to be positive at the 1% significance level.
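
The arithmetic behind this warning can be sketched in a few lines. The code assumes that the tests are independent and that the data really are random (every positive is a false positive); the function names are made up for this illustration.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of false positives among n_tests tests on random data."""
    return n_tests * alpha


def prob_at_least_one_false_positive(n_tests, alpha):
    """Chance that at least one of n_tests independent tests is positive by accident."""
    return 1.0 - (1.0 - alpha) ** n_tests


n_tests, alpha = 100, 0.01
print("expected false positives:", expected_false_positives(n_tests, alpha))
print(
    "probability of at least one:",
    round(prob_at_least_one_false_positive(n_tests, alpha), 3),
)
```

With 100 tests at the 1% level, the chance of seeing at least one "significant" result on pure noise comes out at roughly 63%.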

You could apply the Bonferroni correction:

The significance level for each individual test is calculated by dividing the overall desired significance level by the number of parallel tests. The hypothesis to be rejected is that none of the individual tests is significantly different from chance; rejecting it means that at least one test is.
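
A minimal sketch of the correction, assuming a list of P-values from parallel tests. The P-values in the example are invented for illustration, and the function names are not taken from any particular statistics package.

```python
def bonferroni_threshold(overall_alpha, n_tests):
    """Significance level to apply to each individual test."""
    if n_tests < 1:
        raise ValueError("need at least one test")
    return overall_alpha / n_tests


def significant_after_bonferroni(p_values, overall_alpha=0.05):
    """Flag each P-value that passes the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(overall_alpha, len(p_values))
    return [p <= threshold for p in p_values]


# Hypothetical P-values from four parallel tests.
p_values = [0.0004, 0.03, 0.2, 0.012]
print("per-test threshold:", bonferroni_threshold(0.05, len(p_values)))
print("significant:", significant_after_bonferroni(p_values))
```

Note that 0.03 would count as significant in a single test at the 5% level, but not after correcting for four parallel tests.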

False negatives: Homologous sequences in the databank that are not recognized as such. If there are only 12000 different protein families, on average a sequence should have (size of the databank)/12000 matches. In other words, the number of false negatives is probably very large.
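
The estimate above is simple enough to write down directly. The sketch assumes that all families are equally large, which is certainly not true in practice; the databank size and the number of hits in the example are invented for illustration.

```python
def expected_homologs(databank_size, n_families=12000):
    """Average number of homologs per query if all families were equally large."""
    return databank_size / n_families


def estimated_false_negatives(databank_size, hits_found, n_families=12000):
    """Homologs expected on average but not recovered by the search."""
    return max(expected_homologs(databank_size, n_families) - hits_found, 0.0)


# Hypothetical numbers: a databank with 6 million sequences, a search that returns 120 hits.
databank_size = 6_000_000
hits_found = 120
print("expected homologs:", expected_homologs(databank_size))
print("estimated false negatives:", estimated_false_negatives(databank_size, hits_found))
```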

How old is life on Earth?

  • The Earth is about 4.6 Ga old, but no crustal rocks have survived from that time. The oldest rocks are no older than 4.0 Ga. However, zircon crystals dated to 4.4 billion years BP are found in rocks and are considered evidence that Earth's crust had already formed and was in contact with water 4.4 billion years BP (here and here).

Morphological Fossil Evidence:
  • For about a decade the oldest microfossils were considered to be about 3.5 Ga old (see here). The fossils (as interpreted by Bill Schopf) look like "modern" Cyanobacteria. Compare this date to molecular trees of life: is this a problem? However, the evidence for these fossils has been questioned.

  • 3.2Ga old filamentous fossils, probably of thermophilic chemotrophic prokaryotes (Rasmussen, 2000)

  • 1.8Ga old fossils from Gunflint formation: iron-loving bacteria and cyanobacteria
Biological Signature Evidence (examples):
  • Oldest geological evidence for life - 3.8 Ga ago - is based on 13C discrimination (carbon derived from living systems often has lower delta 13C values than inorganic carbonates) [here]. The rocks are from Akilia island off the coast of Greenland, and are severely altered by metamorphism. However, this evidence has recently been reassessed.
  • 3.7 billion year old rocks from the Isua formation in Greenland contain structures described as stromatolites (here in NYT, report in Nature).
  • The amount of carbon in the Isua formation (and its discrimination against 13C) is interpreted by Minik Rosing to indicate a highly productive biosphere. (See Minik's presentation at the 2011 ISSOL meeting. Slides 6 ff. are especially interesting: the handwritten slides were written with the sedimentary rocks, which contain lots of graphite.)

  • 2.7Ga old: probable biomarkers of cyanobacteria and of eukaryotes (Roger Summons, Roger Buick and Jochen Brocks)

See Olga's Timeline of the Universe here