Today, we will focus on searches of databanks at NCBI's Entrez and Google Scholar.
Related to this is bibliography software. If you write scientific or other academic
articles, you frequently need to cite the literature. It is a good
idea to get used to using a bibliography program ASAP. This makes it
easier to incorporate citations into an article, to reformat the
bibliography, and to download citations from the internet. Popular
choices are EndNote, RefWorks, Zotero and Mendeley.
EndNote is popular but expensive, and old versions usually stop
working when you update your operating system. Also, citations
incorporated into a text document cannot be used by other citation
programs. RefWorks is also commercial software, but UConn has a
subscription.
Mendeley and Zotero are similar, but no longer compatible. You
can still export your personal library from one program and load it
into the other, but the two programs can no longer be used on the
same document. Mendeley is popular; it can also be used to keep
track of PDF versions of articles, and it is updated within a reasonable
time when Microsoft Office or the operating system becomes
incompatible with an older version.
Your instructor currently uses Mendeley, maintained by Elsevier. The
software can be downloaded here.
After you create a free account, your database of references is stored
online, you can share folders with others, and you can use your
references from different computers. The software comes with three
important features: A) a bookmark for your browser that automatically
downloads the citation (and PDF, if available) from the site you are
visiting (e.g., PubMed, journal page, Scopus, ... *), B) a plug-in for
Microsoft Word that allows you to insert citations into the text you
are writing, and C) the Mendeley desktop, which allows access to your
references and to different bibliography style sheets. I strongly
encourage you to install either Mendeley
or Zotero on
your personal computer.
*: This sometimes does not work as seamlessly as it should, because different sites use different journal abbreviations, title capitalization, and author initials.
These inconsistencies disappear when you always use the same source to fetch the references.
1. (less than 20 minutes)
Use PubMed
in NCBI's Entrez to find an article written by
Carl R. Woese (famous scientist,
co-discoverer of the Archaea), published in the journal
Proceedings of the National Academy of Sciences of the United States
of America with the words primary kingdoms
in the title of the paper. Try to use Boolean operators
(AND, OR, NOT)
and field
tags; if you cannot recall the tags, use the pull-down menus
under "advanced"
(link below the search text window).
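As an illustration of how field tags and Boolean operators combine, here is a minimal Python sketch that assembles a PubMed-style query string. The helper names `field_term` and `combine` are made up for this example, and the author, journal, and title words are placeholder values, not the answer to the exercise. The tags [Author], [Journal], and [Title] are the long forms of PubMed's field tags.

```python
def field_term(text, tag):
    """Attach a PubMed field tag to a term; quote the term if it contains spaces."""
    quoted = f'"{text}"' if " " in text else text
    return f"{quoted}[{tag}]"


def combine(operator, *terms):
    """Join terms with a Boolean operator (AND, OR, NOT) inside parentheses."""
    return "(" + f" {operator} ".join(terms) + ")"


# Placeholder values chosen for illustration only.
query = combine(
    "AND",
    field_term("Smith J", "Author"),
    field_term("Nature", "Journal"),
    combine(
        "OR",
        field_term("ribosome", "Title"),
        field_term("ribosomal RNA", "Title"),
    ),
)
print(query)
```

The printed string can be pasted into the PubMed search box. The parentheses make the order of evaluation explicit, which matters as soon as AND and OR are mixed in one query.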
What query found the 1977 article?
If your search resulted in multiple matches, click on the link to the
PNAS article. You should see a page with the title, the abstract
and a listing of similar articles. How many similar articles
(click on the "Similar articles" link in the right hand bar, scroll down and click on "See all similar articles" link)
are linked to this article?
When was the most recent one published (Hint: In the Display Options
pull-down menu set the "Sort by" option to Pub
date)?
How many articles cite the Woese 1977 PNAS paper, and how many of these
are publicly available? (Click on "Cited by" in the right bar,
then select "See all Cited by articles". Then, in the bar on the left-hand
side, select "Free Full Text".)
2. (ca. 5 minutes) (Note: If Entrez'
pulldown menus do not work well, use another browser)
In NCBI's
Entrez/PubMed, find the earliest paper co-authored by Senejani and
Gogarten. What is the topic of the paper?
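The same kind of co-author search can also be expressed as a request to NCBI's E-utilities `esearch` service instead of the web form. The sketch below only builds the request URL and does not contact NCBI; the function name `esearch_url` and the default of 20 returned IDs are choices made for this example.

```python
from urllib.parse import urlencode

# Base address of the E-utilities esearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, db="pubmed", retmax=20):
    """Build an esearch URL for a query term; nothing is sent over the network."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


url = esearch_url("Senejani[Author] AND Gogarten[Author]")
print(url)
```

Opening the printed URL in a browser should return the number of matching records and their PubMed IDs. The web interface remains the easier route for sorting by date and reading abstracts.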
To learn more about inteins, in NCBI's
Entrez select books as the target database to search (pulldown menu to
the left of the search bar) and search for intein homing - the image
in the right column is somewhat informative. For more information
check the books (the first one has a chapter on inteins, although
most inteins are much longer than the 150 aa given as their size in the
book); however, as usual, Wikipedia
has more up-to-date information.
3. (15 minutes)
As a student at UConn you have access to different databases (e.g., Scopus). However, in practice PubMed and
Google
Scholar are all that is needed. Many journals are available in full text if you log in from a UConn IP address.
To have access to articles behind a paywall, you need either to be on campus or to establish a VPN
connection to UConn. Instructions on how to do the latter are here.
In case UConn does not have a subscription to a journal, you can ask the library to provide you with a PDF of the article (reportedly, the library has a fast turnaround time).
Alternatively, you can use SciHub.
For a scientist of your choice (e.g., your advisor, or someone who
publishes in your field of interest), use PubMed and Google Scholar to search for articles by this author.
Which scientist did you choose?
How many articles authored by this person did you find in PubMed
and Google
Scholar?
(comma separated; if your author of choice does not have a Google
Scholar profile, enter --)
How often was your author cited according to their Google Scholar
profile?
What is the h-index for the author of your choice according to their
Google Scholar profile? (At the top of the right column of the Google
Scholar profile)
What does the h-index mean (search the web or, on the Google Scholar page, hover the
pointer over "h-index")?
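To check your understanding of the h-index, the following Python sketch computes it from a list of per-article citation counts, using the common definition (the largest h such that h articles have at least h citations each). The citation counts are invented for illustration; Google Scholar may count citations differently from other databases, so values can differ between sources.

```python
def h_index(citations):
    """Return the largest h such that h articles have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited article first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` articles all have >= rank citations
        else:
            break
    return h


# Invented citation counts for six articles.
print(h_index([25, 8, 5, 3, 3, 1]))  # prints 3
```

In this example three articles have at least three citations each, but there are not four articles with at least four citations, so the h-index is 3.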
4. (15 minutes)
Using PubMed,
search for articles co-authored by Taiz and
Gogarten.
a) How many articles did you retrieve?
b) Select the article by L Zimniak, P Dittrich, J P Gogarten, H
Kibak, L Taiz. Scroll down to "Associated data". What is
listed?
c) Select search in nucleotides. Then select "run blast" in the right-hand
column. On the form, under organisms, enter Crenarchaeota
(start typing, then select from the offered choices; Daucus carota is a flowering
plant, and we search a divergent group to see how well the algorithm works).
Under
algorithm options, increase the number of matches to 5000 and
decrease the expect threshold to 0.0001. Place a check-mark next to
"show results in a separate window" and click on BLAST. Check
"select all".
How many matches did you obtain?
d) Go back to the nucleotide entry (or here).
Click on the link behind "Protein_id". Scan through the genbank
entry for this protein, then select "run blast" in the right hand
column.
On the form under organisms enter Crenarchaeota.
Under algorithm options, increase the number of matches to
5000 and decrease the expect threshold to 0.0001. Place a check-mark
next to "show results in a separate window" and click on BLAST.
Once the search is done (which takes time), it will take some more
time to completely load the results. Check "select all".
How many matches did you obtain?
e) What might explain the difference in the number of matches
from the two searches?
5. (10 minutes)
Using Entrez,
search Protein (use the drop-down box to select
the Protein database) for WP_010886039.1 (this is an accession number; accession numbers have replaced the
gi numbers previously in use, see historical
note).
Run a BLAST search with the same parameters as above, except:
In "Choose Search Set",
enter the taxon Euryarchaeota (taxid:28890) (you need to click on one of the offered choices after you start typing) and
select RefSeq Select Proteins.
In "General Parameters", select 250 for the maximum number of target sequences.
Once the search is done (the results are displayed piece by piece; you
need to wait until the graphics tab shows up),
select the graphics tab.
Do you notice anything interesting about the alignments? (think
intein)
Click on the Taxonomy Tab and then on Taxonomy. For which genus and
species are most homologs to WP_010886039.1 reported?
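For your lab notes, it can help to write down the search settings in a form that can be repeated exactly. The Python sketch below collects the settings of this exercise as parameters of NCBI's BLAST URL API (CMD, PROGRAM, DATABASE, QUERY, ENTREZ_QUERY, EXPECT, HITLIST_SIZE). It only prints a URL and submits nothing. The function name `blast_submit_params` is made up for this example, and the database name `refseq_select` is an assumption that should be checked against the database list on the BLAST page.

```python
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blast_submit_params(accession, taxid, database, expect=0.0001, hitlist_size=250):
    """Collect blastp search settings as BLAST URL API parameters (no request is sent)."""
    return {
        "CMD": "Put",                               # submit a new search
        "PROGRAM": "blastp",                        # protein query vs. protein database
        "DATABASE": database,                       # assumed name, verify before use
        "QUERY": accession,                         # query given as an accession number
        "ENTREZ_QUERY": f"txid{taxid}[Organism]",   # restrict hits to one taxon
        "EXPECT": expect,                           # expect (E-value) threshold
        "HITLIST_SIZE": hitlist_size,               # maximum number of target sequences
    }


params = blast_submit_params("WP_010886039.1", 28890, "refseq_select")
print(f"{BLAST_URL}?{urlencode(params)}")
```

Changing only the `taxid` argument gives the settings for a search restricted to a different taxon, which makes it easy to see that two searches differ in nothing but the target organisms.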
6. (10 minutes)
Repeat the above BLAST search but choose the Thermoplasmales (taxid:2301) as target organisms.
How do the results differ from the previous ones? What might explain this difference?
7. (5 minutes)
To what domain (super kingdom), phylum (kingdom), and family does Thermoplasma
belong? (Use the Taxonomy
Search, or click on Thermoplasma in the taxonomy report from the
last exercise. In the line labeled lineage, if you hover the mouse
pointer over the names, it tells you which taxonomic category you are
pointing at.)
How many protein and genome sequences are available for Thermoplasma
acidophilum, and how many are available for the genus Thermoplasma?
(In the taxonomy browser go to Thermoplasma and check
protein and genome in the header, then click on <Display>)
Check the appropriate radio button below before pressing the submit
button: