MCB 3421 Computer Lab 4: Databank Search Exercise A

Your name:
Your email address:

Today, we will focus on searches of literature databanks and of other databanks at NCBI's Entrez.  Related to this is bibliography software.  If you write scientific or other academic articles, you frequently need to cite the literature. It is a good idea to get used to using a bibliography program as soon as possible. This makes it easier to incorporate citations into an article, to reformat the bibliography, and to download citations from the internet. Popular choices are EndNote, RefWorks, Zotero and Mendeley.
EndNote is popular but expensive, and old versions usually stop working when you update your operating system. Also, citations incorporated into a text document cannot be used by other citation programs. RefWorks is also commercial software, but UConn has a subscription.
Mendeley and Zotero are similar, but no longer compatible: you can still export your personal library from one program and load it into the other, but the two programs can no longer be used on the same document.  Mendeley is popular, it can also be used to keep track of PDF versions of articles, and it is updated within a reasonable time when Microsoft Office or the operating system becomes incompatible with an older version.
Your instructor currently uses Mendeley, maintained by Elsevier. The software can be downloaded here. After you create a free account, your database of references is stored online, you can share folders with others, and you can use your references from different computers. The software comes with three important features: A) a bookmark for your browser that automatically downloads the citation (and the PDF, if available) from the site you are visiting (e.g., PubMed, a journal page, Scopus, ...); B) a plug-in for Microsoft Word that allows you to insert citations into the text you are writing; and C) the Mendeley desktop, which allows access to your references and to different bibliography style sheets.  I strongly encourage you to install either Mendeley or Zotero on your personal computer.

1. (less than 20 minutes)
Use PubMed in NCBI's Entrez to find an article written by Carl R. Woese (famous scientist, co-discoverer of the Archaea), published in the journal Proceedings of the National Academy of Sciences of the United States of America, with the words primary kingdoms in the title of the paper. Try to use Boolean operators (AND, OR, NOT) and field tags; if you cannot recall the tags, use the pull-down menus under "Advanced" (link below the search text window).
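
Aside (optional, not part of the exercise): a field-tagged query of this kind can also be assembled programmatically. The following is a minimal Python sketch that joins field-tagged terms with AND and builds a URL for NCBI's E-utilities esearch service. The author, journal and title words are made-up placeholders, and the helper names build_query and esearch_url are hypothetical; [Author], [Journal] and [Title] are PubMed field tags. No request is sent; the sketch only builds strings.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities esearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(author, journal, title_words):
    """Join field-tagged terms with AND (hypothetical helper)."""
    parts = [f"{author}[Author]", f'"{journal}"[Journal]']
    parts += [f"{word}[Title]" for word in title_words]
    return " AND ".join(parts)


def esearch_url(term, retmax=20):
    """Build (but do not send) an esearch URL for the PubMed database."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


# Placeholder values for illustration only, not the answer to the exercise.
query = build_query("Doe J", "Hypothetical Journal of Examples", ["example", "words"])
url = esearch_url(query)
print(query)
print(url)
```

The printed query string can be pasted into the PubMed search box as is; the URL form is what a script would request.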

What query found the 1977 article?


If your search resulted in multiple matches, click on the link to the PNAS article.  You should see a page with the title, the abstract and a listing of similar articles.  How many similar articles are linked to this article? (Click on the link in the right-hand bar, scroll down and click on "view all".)
When was the most recent one published? (Hint: In the Display Options pull-down menu, set the "Sort by" option to Pub date.)
How many articles cite the Woese 1977 PNAS paper, and how many of these are publicly available? (Click on "Cited by" in the right bar, then select "See all Cited by articles". The bar on the left-hand side allows you to select "Free Full Text".)

2. (ca. 5 minutes) (Note: If Entrez' pulldown menus do not work well, use another browser)
In NCBI's Entrez/pubmed find the earliest paper co-authored by Senejani and Gogarten. What is the topic of the paper?

To learn about inteins, select Books as the target database in NCBI's Entrez (pull-down menu to the left of the search bar) and search for intein homing. The image in the right column is somewhat informative. For more information, check the books (the first one has a nice chapter on inteins, although most inteins are much longer than the 150 aa mentioned as their size in the book) or the Wikipedia article on inteins.

3. (ca. 5 minutes)
Dr. Johann Peter Gogarten seems obsessed with ATP synthases and inteins. Is he interested in anything else? How many articles has he published that are NOT related to the ATP synthase OR ATPase OR intein OR inteins? (Note: a complication is that several authors share the same family name :)) As usually happens, there is more than one way to formulate the search. Parentheses are important.
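
Why parentheses matter can be illustrated with sets. The following Python sketch uses made-up article IDs (not real PubMed results) and assumes that Boolean operators without parentheses are applied from left to right, as the PubMed help describes. AND corresponds to set intersection, OR to union, and NOT to set difference.

```python
# Toy data: every number is a made-up article ID, not a real PubMed record.
by_author = {1, 2, 3, 4, 5, 6}   # articles by the author of interest
atp_synthase = {1, 2}            # articles mentioning "ATP synthase"
atpase = {2, 3, 7}               # articles mentioning "ATPase" (7 is by someone else)
intein = {4}                     # articles mentioning "intein"

# With parentheses: author NOT (ATP synthase OR ATPase OR intein)
with_parens = by_author - (atp_synthase | atpase | intein)

# Without parentheses, evaluated left to right:
# ((author NOT ATP synthase) OR ATPase) OR intein
left_to_right = ((by_author - atp_synthase) | atpase) | intein

print(sorted(with_parens))    # [5, 6]
print(sorted(left_to_right))  # [2, 3, 4, 5, 6, 7]
```

In this toy example the unparenthesized form brings the ATPase and intein articles back in, including one by a different author, which is the opposite of what was intended.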
What query did you assemble?

How many articles did you find?

4. (13 minutes)
Comparing search engines and databases. You will want to open PubMed, Google Scholar, and Scopus in different tabs in your browser. (UConn has a subscription to Scopus. To use it, you either need to be on campus or establish a VPN connection to UConn.  Instructions on how to do the latter are here.  As an aside, this can slow down your connection to HuskyCT and Webex.)
For a scientist of your choice (e.g., your advisor, or someone who publishes in your field of interest), use PubMed, Google Scholar, and Scopus to search for articles by this author.
Which scientist did you choose?

How many articles authored by this person did you find in PubMed, Google Scholar, and Scopus? (Comma separated; if your author of choice does not have a Google Scholar profile, enter --)

How often was your author cited according to his Google Scholar profile?

What is the H-index for the author of your choice according to his Google Scholar profile? (At top in the right column of the Google Scholar profile)

What is the H-index, i.e., how is it defined? (Google it, or on the Google Scholar page hover the pointer over "h-index".)

In Scopus, search for your author, click on the name(s) of your author, wait ...
Click on citation overview.  How many articles did Scopus find?
Click on "view H-graph" (link is in the upper right). How often were these articles cited according to Scopus (below the graph)?
What is the H-index for the author of your choice according to Scopus?
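
For reference, the usual definition of the h-index is the largest number h such that the author has h articles with at least h citations each. The following Python sketch computes it from a list of per-article citation counts; the function name h_index and the counts are made up for illustration and do not come from Google Scholar or Scopus.

```python
def h_index(citations):
    """Return the largest h such that h articles have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most cited article first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank   # the top `rank` articles all have >= rank citations
        else:
            break      # counts only decrease from here, so h cannot grow
    return h


# Made-up citation counts for five articles.
print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([25, 8, 5, 3, 3]))  # 3
```

Because the result depends only on the list of citation counts, two databases that index different sets of citing documents can report different h-indices for the same author.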

5. (15 minutes)
Using Pubmed, search for articles co-authored by Taiz and Gogarten.

a) How many articles did you retrieve?


b) Select the article by L Zimniak, P Dittrich, J P Gogarten, H Kibak, L Taiz.  Scroll down to "Associated data". What is listed?

c) Select search in nucleotides. Then select "run blast" in the right-hand column.  On the form, under organisms, enter flowering plants (start typing, then select from the offered choices).  Under algorithm options, increase the number of matches to 5000 and decrease the expect threshold to 0.0001. Place a check-mark next to "show results in a separate window" and click on BLAST.  Check "select all".
How many matches did you obtain? 

d) Go back to the nucleotide entry (or here).  Click on the link behind "Protein_id".  Scan through the GenBank entry for this protein, then select "run blast" in the right-hand column. Under algorithm options, increase the number of matches to 5000 and decrease the expect threshold to 0.0001. Place a check-mark next to "show results in a separate window" and click on BLAST.  Once the search is done (which takes time), it will take some more time to completely load the results.  Check "select all".
How many matches did you obtain? 

e) What might explain the difference in the number of matches from the two searches?


6. (10 minutes)
Using Entrez, search Protein (use drop-down box to select the Protein database) for 19888400 (this is a gi number, see historical note)

Run a BLAST search with the same parameters as above, except select the taxon Thermoplasmatales (taxid:2301).  Select the graphic tab once the search is done (the results are displayed piece by piece; you need to wait until the graphics tab shows up).
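
Aside (optional, not part of the exercise): the same search settings can be written down as parameters for NCBI's BLAST URL API. The following Python sketch only assembles the request parameters and does not send anything. The mapping is an assumption on my part: "number of matches" is taken to correspond to HITLIST_SIZE, "expect threshold" to EXPECT, and the taxon restriction to an ENTREZ_QUERY; the program blastp and the database nr are assumed for a protein search. The helper name blast_put_params is hypothetical.

```python
from urllib.parse import urlencode

# Endpoint of the NCBI BLAST URL API (nothing is sent in this sketch).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blast_put_params(query, taxid, expect=0.0001, hitlist_size=5000):
    """Assemble parameters for submitting a protein BLAST search (hypothetical helper)."""
    return {
        "CMD": "Put",                  # submit a new search
        "PROGRAM": "blastp",           # assumed: protein query against protein database
        "DATABASE": "nr",              # assumed database
        "QUERY": query,                # sequence identifier or FASTA sequence
        "EXPECT": expect,              # expect threshold from the exercise
        "HITLIST_SIZE": hitlist_size,  # maximum number of matches to keep
        # Restrict the search to one taxon and everything below it.
        "ENTREZ_QUERY": f"txid{taxid}[Organism:exp]",
    }


# Identifier and taxid as given in the exercise text.
params = blast_put_params("19888400", 2301)
body = urlencode(params)
print(body)
```

A script would send these parameters to BLAST_URL and then poll for the results using the request ID that the server returns; the web form does the same steps for you.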

Do you notice anything interesting about the alignments? (think intein) 

Click on the Taxonomy Tab and then on Taxonomy. For which genus and species are most homologs to 19888400 reported?

7. (5 minutes)
To what domain (superkingdom), phylum (kingdom), and family does Thermoplasma belong? (Use the Taxonomy Search, or click on Thermoplasma in the taxonomy report from the last exercise. In the line labeled lineage, if you hover the mouse pointer over the names, it tells you which taxonomic category you are pointing at.)

How many protein and genome sequences are available for Thermoplasma acidophilum, and how many are available for the genus Thermoplasma? (In the taxonomy browser, go to Thermoplasma and check protein and genome in the header, then click on <Display>.)


Finished?

Check the appropriate radio button below before pressing the submit button:

Send email to your instructor (and yourself) upon submit
Send email to yourself only upon submit (as a backup)
Show summary upon submit but do not send email to anyone