Future Monday and Wednesday classes will take place in CB 206!

Assignments for Friday's class:

Assignments for Monday's class:

 

 

 

DataBank Searches at NCBI. Information Retrieval using Entrez.



NCBI (National Center for Biotechnology Information) is home to many public biological databases (see diagram below). All of the databases are interlinked, and they share a common search and retrieval system, Entrez.

[Figure: Entrez database connections (older version of the diagram)]

 

A list of the different databases in Entrez is here.

For a PubMed tutorial, click here. An Entrez tutorial (non-interactive) is here. Both go well beyond what you need to know for Friday.

Use Boolean operators (AND, OR, NOT) to perform advanced searches. Here is an excellent explanation of the Boolean operators from the Library of Congress Help Page.

Search field tags are listed here.

Explore the features of the Entrez interface: Advanced Search, Index, Clipboard, and MyNCBI.
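Boolean operators and field tags can be combined into a single query string. The following is a minimal sketch, assuming NCBI's E-utilities esearch endpoint (the programmatic interface to Entrez, which is not covered on this page). The helper names tagged, combine and esearch_url are made up for illustration, and the search terms are arbitrary examples. The code only builds the query and the URL; it does not contact NCBI.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumption: the E-utilities esearch endpoint (not mentioned on this page).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def tagged(term, field):
    """Attach a search field tag; multi-word terms are quoted as a phrase."""
    if " " in term:
        term = f'"{term}"'
    return f"{term}[{field}]"


def combine(operator, *parts):
    """Join query parts with AND, OR or NOT and wrap them in parentheses."""
    return "(" + f" {operator} ".join(parts) + ")"


def esearch_url(db, query, retmax=20):
    """Build (but do not send) an esearch request for the given database."""
    return ESEARCH + "?" + urlencode({"db": db, "term": query, "retmax": retmax})


# Example terms only: ATP synthase in the title, archaea or bacteria in the
# title or abstract, excluding review articles.
query = combine(
    "AND",
    tagged("ATP synthase", "Title"),
    combine("OR",
            tagged("archaea", "Title/Abstract"),
            tagged("bacteria", "Title/Abstract")),
) + " NOT " + tagged("review", "Publication Type")

url = esearch_url("pubmed", query)
print(query)
print(url)
```

The same query string can be pasted directly into the Entrez search box; the parentheses make the order of evaluation explicit.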

 

Other Useful Databases and Services:

While Medline is incorporating more and more non-medical literature, there might still be gaps in the coverage. Alternatives are other databanks available through the National Library of Medicine (here) and the local services offered at the UConn libraries. Current Contents and Agricola complement PubMed especially well. The best way to access them is through the UConn library's website. In particular, the "Web of Science" database gives access to the Science Citation Index, a database that tracks cited references in journals.

Note that many resources are restricted to the UConn domain, so you need to access them either from a campus computer or through a proxy account. In some instances you are prompted to connect through the UConn VPN or through EZproxy (the latter is new, and not all links have migrated to EZproxy).


Want to be informed about new sequences or articles in your research area? Check out these services (you can also use MyNCBI for this, but I have used PubCrawler for several years and it works reliably):

2 PubCrawler
3 Swiss-Shop

Comments

When searching Entrez, you can add links to online journals for which UConn has a subscription. (If you are outside UConn, you need to set up a proxy account for the links to work.)

The link to use is http://www.ncbi.nlm.nih.gov/sites/entrez?otool=uconnlib

Use MyNCBI at Entrez to repeat searches at regular intervals (an alternative is PubCrawler, see above).

Do an example using the Clipboard and Index (use GI 2266989 (nucleotide) and GI 3334404 (protein)).
How many related sequences does the nucleotide sequence have?
How many related sequences does the encoded protein sequence have? (check page 400 and 1000)
Demonstrate Links and BLink
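
The two records used in the exercise can also be retrieved outside the web interface. This is a minimal sketch, assuming NCBI's E-utilities efetch endpoint (not mentioned on this page) and assuming it still resolves GI numbers. The function name efetch_fasta_url is made up for illustration. The code only builds the URLs; it does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumption: the E-utilities efetch endpoint (not mentioned on this page).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(db, uid):
    """Build (but do not send) a request for one record in FASTA format."""
    params = {"db": db, "id": uid, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


# The GI numbers from the exercise above.
nucl_url = efetch_fasta_url("nuccore", 2266989)
prot_url = efetch_fasta_url("protein", 3334404)
print(nucl_url)
print(prot_url)
```

Opening either URL in a browser should return the sequence as plain text, provided the assumptions above hold.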

Bottom lines:
a) GenBank is redundant.
b) If possible, use a protein sequence (20-letter alphabet) as the query rather than a nucleotide sequence (4-letter alphabet)!
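
One reason behind bottom line b) can be illustrated with a toy example. With only four letters, two unrelated nucleotide sequences already agree at roughly one position in four by chance (assuming uniform composition), versus one in twenty for proteins. In addition, synonymous codon changes lower nucleotide identity without changing the protein. The sketch below uses the standard genetic code; the two short DNA sequences are made up for illustration and are not taken from any databank entry.

```python
# Standard genetic code in the usual TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA sequence in frame 1; trailing partial codons are ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical sequences: same peptide, different codon usage.
seq_a = "ATGAAACTGGCTGGT"
seq_b = "ATGAAGTTAGCAGGC"

print("nucleotide identity:", round(identity(seq_a, seq_b), 2))
print("protein identity:   ", identity(translate(seq_a), translate(seq_b)))
print("chance identity (uniform): DNA", 1 / 4, "protein", 1 / 20)
```

In this toy case the two genes share only two thirds of their nucleotides, while the encoded peptides are identical.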


Other web pages:

Nucleic Acids Research Database Issue
Every year, the first issue of Nucleic Acids Research is devoted to updates on biological databases.
(link to the databank issue is in the right hand bar on top)

http://www.ebi.ac.uk/
The European homolog/analog to NCBI, software archive.

http://rdp.cme.msu.edu/
The US Ribosomal Database Project (RDP)

http://www.arb-silva.de/
ARB-Silva, the European alternative to RDP

http://greengenes.lbl.gov
Greengenes: 16S rRNA database and tools at the Lawrence Berkeley National Laboratory

http://www.jgi.doe.gov/
Genomes at the DOE Joint Genome Institute

http://www.genomesonline.org/
List of completed and ongoing genome projects

http://www.flybase.org/
Database of the Drosophila genome

http://www.arabidopsis.org/
TAIR - The Arabidopsis Information Resource

http://www.ensembl.org/
Ensembl Genome Browser (Eukaryotic genomes, including Human and Mouse genomes)

 

Sequence and structure databanks can be divided into many different categories.
One of the most important distinctions is whether or not a gatekeeper supervises the entries:

 

Supervised databanks with a gatekeeper.

Examples:

  • Swiss-Prot
  • RefSeq (at NCBI)

Entries are checked for accuracy.
+ more reliable annotations
-- frequently out of date

 

 

Repositories without a gatekeeper.

Examples:

  • GenBank
  • EMBL
  • TrEMBL

Everything is accepted.
+ everything is available
-- many duplicates
-- poor reliability of annotations
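
The "many duplicates" point can be made concrete with a toy example. The sketch below groups records that carry exactly the same sequence, which is the simplest way to see redundancy in a repository. The record IDs and sequences are made up for illustration, and this is not a description of how any curated databank is actually built.

```python
from collections import defaultdict


def group_identical(records):
    """Map each distinct sequence to the list of record IDs that carry it.

    Sequences are compared after removing whitespace and ignoring case.
    """
    groups = defaultdict(list)
    for record_id, sequence in records:
        key = "".join(sequence.split()).upper()
        groups[key].append(record_id)
    return dict(groups)


# Hypothetical repository entries: three submissions of the same protein
# and one that differs in the last residue.
records = [
    ("entry_1", "MKLAGTT"),
    ("entry_2", "mklagtt"),
    ("entry_3", "MKLAGTS"),
    ("entry_4", "MKL AGTT"),
]

groups = group_identical(records)
for sequence, ids in groups.items():
    print(sequence, ids)
```

Exact matching only catches verbatim duplicates; near-identical entries, such as entry_3 here, stay separate and need a similarity search to be recognized.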

 

 

One problem in maintaining databanks is "ownership" of sequences, which in many databanks prevents entries from being updated continuously. Even if errors are detected, they are not easily removed from the databank. For example, see the ATP synthase operons in E. coli: http://mic.sgmjournals.org/cgi/content-nw/full/156/7/1909/F1

ORF Finder illustration of the ORFs:

[Figure: ATP synthase ORFs]
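
The idea of scanning a sequence for open reading frames can be sketched in a few lines. This is a minimal illustration, not the ORF Finder implementation: it assumes ATG as the only start codon and the three standard stop codons, and it scans the three frames of each strand. The function names and the test sequence are made up for illustration.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_codons=2):
    """Yield (frame, start, end) for each ATG...stop ORF on one strand.

    Coordinates are 0-based with an exclusive end; the length check counts
    codons including the stop codon.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the ORF
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    yield frame, start, i + 3  # stop codon closes it
                start = None


def find_orfs(seq, min_codons=2):
    """Scan both strands; '-' coordinates refer to the reverse complement."""
    seq = seq.upper()
    found = [("+", f, s, e) for f, s, e in orfs_in_strand(seq, min_codons)]
    found += [("-", f, s, e)
              for f, s, e in orfs_in_strand(reverse_complement(seq), min_codons)]
    return found


# Hypothetical toy sequence: ATG AAA TGA in frame 0 of the forward strand.
print(find_orfs("ATGAAATGA"))
```

On a real operon this kind of scan reports many ORFs in all six frames, most of which do not encode proteins, which is why a wrongly chosen ORF can end up annotated in a databank.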

ATP synthase ORF as query in a databank search (BLink):

[Figure: BLink result for the ATP synthase ORF]

Alternative ORF as query in a databank search:

[Figure: search result for the alternative (wrong) ORF]

 

Alternative ORFs in BLink

#3:

[Figure: BLink result for alternative ORF #3]

#1:

[Figure: BLink result for alternative ORF #1]