MCB 3421 Computer Lab 4: Databank Search Exercise A

Your name:
Your email address:

Today, we will focus on searches of databanks at NCBI's Entrez and Google Scholar. Related to this is bibliography software. If you write scientific or other academic articles, you frequently need to cite the literature. It is a good idea to get used to using a bibliography program as soon as possible. This makes it easier to incorporate citations into an article, to reformat the bibliography, and to download citations from the internet. Popular choices are EndNote, RefWorks, Zotero, and Mendeley.
EndNote is popular but expensive, and old versions usually stop working when you update your operating system. Also, citations incorporated into a text document cannot be used by other citation programs. RefWorks is also commercial software, but UConn has a subscription.
Mendeley and Zotero are similar, but no longer compatible. You can still export your personal library from one program and load it into the other, but the two programs can no longer be used on the same document. Mendeley is popular, it can also keep track of PDF versions of articles, and it is updated within a reasonable time when Microsoft Office or the operating system becomes incompatible with an older version.
Your instructor currently uses Mendeley, maintained by Elsevier. The software can be downloaded here. After you create a free account, your database of references is stored online, you can share folders with others, and you can use your references from different computers. The software comes with three important features: A) a bookmark for your browser that automatically downloads the citation (and the PDF, if available) from the site you are visiting (e.g., PubMed, journal page, Scopus, ... *); B) a plug-in for Microsoft Word that allows you to insert citations into the text you are writing; and C) the Mendeley desktop, which allows access to your references and to different bibliography style sheets. I strongly encourage you to install either Mendeley or Zotero on your personal computer.
*: This sometimes does not work as seamlessly as it should, because different sites use different journal abbreviations, title capitalization, and author initials. These inconsistencies disappear when you always use the same source to fetch the references.

1. (less than 20 minutes)
Use PubMed in NCBI's Entrez to find an article written by Carl R. Woese (famous scientist, co-discoverer of the Archaea), published in the journal Proceedings of the National Academy of Sciences of the United States of America, with the words primary kingdoms in the title of the paper. Try to use Boolean operators (AND, OR, NOT) and field tags; if you cannot recall the tags, use the pull-down menus under "advanced" (link below the search text window).
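
As an illustration of how field tags and Boolean operators combine, here is a minimal Python sketch that assembles a tagged query string. The helper name build_query and the example author, journal, and title word are made up for illustration and are deliberately not the answer to the question below; [Author], [Journal], and [Title] are standard PubMed field tags.

```python
def build_query(terms, operator="AND"):
    """Join (text, field_tag) pairs into one PubMed-style query string."""
    if operator not in ("AND", "OR", "NOT"):
        raise ValueError("operator must be AND, OR, or NOT")
    # Each term becomes text[Tag], e.g. Smith J[Author].
    tagged = [f"{text}[{tag}]" for text, tag in terms]
    return f" {operator} ".join(tagged)


# Hypothetical example values, chosen only to show the syntax.
example = build_query([
    ("Smith J", "Author"),
    ("Nature", "Journal"),
    ("ribosome", "Title"),
])
print(example)  # Smith J[Author] AND Nature[Journal] AND ribosome[Title]
```

The resulting string can be pasted into the PubMed search box; the pull-down menus under "advanced" produce queries of the same form.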

What query found the 1977 article?


If your search resulted in multiple matches, click on the link to the PNAS article. You should see a page with the title, the abstract, and a listing of similar articles. How many similar articles are linked to this article? (Click on the "Similar articles" link in the right-hand bar, scroll down, and click on the "See all similar articles" link.)
 
When was the most recent one published? (Hint: In the Display Options pull-down menu, set the "Sort by" option to Pub date.)
 
How many articles cite the Woese 1977 PNAS paper, and how many of these are publicly available? (Click on "Cited by" in the right-hand bar, then select "See all Cited by articles". Then, in the bar on the left-hand side, select "Free Full Text".)
 

2. (ca. 5 minutes) (Note: If the Entrez pull-down menus do not work well, use another browser.)
In NCBI's Entrez/PubMed, find the earliest paper co-authored by Senejani and Gogarten. What is the topic of the paper?

To learn more about inteins, in NCBI's Entrez select books as the target database to search (pull-down menu to the left of the search bar) and search for intein homing; the image in the right column is somewhat informative. For more information, check the books (the first one has a chapter on inteins, although most inteins are much longer than the 150 aa that the book mentions as their size); however, as usual, Wikipedia has more up-to-date information.

3. (15 minutes)
As a student at UConn you have access to different databases (e.g., Scopus). However, in practice PubMed and Google Scholar are all that is needed. Many journals are available in full text if you connect from a UConn IP address. To access articles behind a paywall, you need either to be on campus or to establish a VPN connection to UConn. Instructions on how to do the latter are here.
In case UConn does not have a subscription to a journal, you can ask the library to provide you with a PDF of the article (reportedly, the library has a fast turnaround time). Alternatively, you can use SciHub.
For a scientist of your choice (e.g., your advisor, or someone who publishes in your field of interest), use PubMed and Google Scholar to search for articles by this author.
Which scientist did you choose?

How many articles authored by this person did you find in PubMed and in Google Scholar? (Enter both numbers separated by a comma; if your author of choice does not have a Google Scholar profile, enter -- for that number.)

How often was your author cited according to their Google Scholar profile?

What is the h-index of the author of your choice according to their Google Scholar profile? (It is shown at the top of the right column of the Google Scholar profile.)

What does the h-index mean? (Search Google or, on the Google Scholar page, hover the pointer over "h-index".)


4. (15 minutes)
Using PubMed, search for articles co-authored by Taiz and Gogarten.

a) How many articles did you retrieve?


b) Select the article by L Zimniak, P Dittrich, J P Gogarten, H Kibak, L Taiz. Scroll down to associated data. What is listed?

c) Select search in nucleotides. Then select "run blast" in the right-hand column. On the form, under organisms, enter Crenarchaeota (start typing, then select from the offered choices). Daucus carota is a flowering plant; we search a divergent group to see how well the algorithm works. Under algorithm options, increase the number of matches to 5000 and decrease the expect threshold to 0.0001. Place a check mark next to "show results in a separate window" and click on BLAST. Check "select all".
How many matches did you obtain? 
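
The expect threshold set above is a cutoff on the E-value, the number of matches with at least a given score that would be expected by chance in a search space of that size. The following is a minimal sketch, assuming the textbook relation E = m * n * 2^(-S') between the bit score S', the query length m, and the database length n. The function names and all numbers are illustrative assumptions, not values from this search, and BLAST itself uses corrected (effective) lengths rather than the raw ones.

```python
def expected_chance_hits(bit_score, query_length, database_length):
    """Approximate E-value: chance hits expected at or above this bit score."""
    return query_length * database_length * 2.0 ** (-bit_score)


def passes_threshold(bit_score, query_length, database_length, threshold=0.0001):
    """True if a hit would be reported under the given expect threshold."""
    return expected_chance_hits(bit_score, query_length, database_length) <= threshold


# Hypothetical search space: 1000-letter query against 10**9 database letters.
for score in (40, 50, 60):
    e_value = expected_chance_hits(score, 1000, 10**9)
    print(score, f"{e_value:.2e}", passes_threshold(score, 1000, 10**9))
```

Lowering the threshold removes weaker matches, and each additional bit of score halves the E-value, so a modest gain in score can decide whether a match is reported.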

d) Go back to the nucleotide entry (or here). Click on the link behind "Protein_id". Scan through the GenBank entry for this protein, then select "run blast" in the right-hand column. On the form, under organisms, enter Crenarchaeota. Under algorithm options, increase the number of matches to 5000 and decrease the expect threshold to 0.0001. Place a check mark next to "show results in a separate window" and click on BLAST. Once the search is done (which takes time), it will take some more time to load the results completely. Check "select all".
How many matches did you obtain? 

e) What might explain the difference in the number of matches from the two searches?


5. (10 minutes)
Using Entrez, search Protein (use the drop-down box to select the Protein database) for WP_010886039.1 (this is an accession number; accession numbers have replaced the gi numbers previously in use, see historical note).
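
An identifier such as WP_010886039.1 consists of a two-letter prefix, a number, and a version after the dot. The sketch below splits these parts with a regular expression. The pattern is a simplified assumption that covers only underscore-style identifiers like the one above, not every accession format used in GenBank, and the function name split_accession is made up for illustration.

```python
import re

# Simplified pattern: two capital letters, underscore, digits, dot, version.
ACCESSION_PATTERN = re.compile(r"^([A-Z]{2})_(\d+)\.(\d+)$")


def split_accession(identifier):
    """Split an identifier like WP_010886039.1 into prefix, number, version."""
    match = ACCESSION_PATTERN.match(identifier.strip())
    if match is None:
        raise ValueError(f"not in the expected format: {identifier!r}")
    prefix, number, version = match.groups()
    # The number stays a string so that leading zeros are preserved.
    return {"prefix": prefix, "number": number, "version": int(version)}


print(split_accession("WP_010886039.1"))
# {'prefix': 'WP', 'number': '010886039', 'version': 1}
```

The version suffix increases when the sequence of a record changes, so the same accession with a different version can refer to a different sequence.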

Run a BLAST search (same parameters as above), except:
In "Choose Search Set"
  enter the taxon Euryarchaeota (taxid:28890) (you need to click on one of the offered choices after you start typing)
  select RefSeq Select Proteins
In "General Parameters", set the maximum number of target sequences to 250
Once the search is done (the results are displayed piece by piece; you need to wait until the graphics tab shows up),
    select the graphics tab.

Do you notice anything interesting about the alignments? (Think intein.)

Click on the Taxonomy tab and then on Taxonomy. For which genus and species are most homologs of WP_010886039.1 reported?

6. (10 minutes)
Repeat the above BLAST search but choose the Thermoplasmales (taxid:2301) as target organisms.
How do the results differ from the previous ones? What might explain this difference?

7. (5 minutes)
To what domain (superkingdom), phylum (kingdom), and family does Thermoplasma belong? (Use the Taxonomy Search, or click on Thermoplasma in the taxonomy report from the last exercise. In the line labeled lineage, hovering the mouse pointer over a name shows which taxonomic category it belongs to.)

How many protein and genome sequences are available for Thermoplasma acidophilum, and how many are available for the genus Thermoplasma? (In the taxonomy browser, go to Thermoplasma, check protein and genome in the header, then click on <Display>.)


Finished?

Check the appropriate radio button below before pressing the submit button:

Send email to your instructor (and yourself) upon submit
Send email to yourself only upon submit (as a backup)
Show summary upon submit but do not send email to anyone