Lecture 11
Goals
- Be aware that LBA is a systematic and reproducible artifact.
- Know the many reasons as to why gene and species trees might differ, and how one can decide if a difference is due to gene transfer or duplication and loss.
- Understand the differences between Maximum Likelihood Estimation and Bayesian approaches to phylogenetics.
- Understand the principle of obtaining posterior probabilities through MCMCMCs
Links
- Slides Intro to phylogenetic reconstruction continued, Bayesian analyses, supertree vs supermatrix approaches.
Assignment for Wednesday (11/5)
Play with Paul Lewis's MCRobot. Explore a differing number of heated chains, and different probability landscapes. https://plewis.github.io/applets/mcmc-robot/
Work through Olga's webpage giving an example of Bayesian thinking.
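To connect the MCRobot exercise to the goal on posterior probabilities, here is a minimal sketch of a single-chain Metropolis sampler. It is not the Metropolis-coupled version with heated chains that MCRobot demonstrates; the two-hill target landscape, the step size, and all function names are assumptions made up for illustration.

```python
import math
import random

def log_target(x):
    """Unnormalized log posterior: two equal normal 'hills' at -2 and +2 (illustrative)."""
    la = -0.5 * (x + 2.0) ** 2
    lb = -0.5 * (x - 2.0) ** 2
    m = max(la, lb)
    # log-sum-exp keeps the calculation stable far away from the hills
    return m + math.log(math.exp(la - m) + math.exp(lb - m))

def metropolis(n_steps, step_size=1.0, start=0.0, seed=1):
    """Run one (cold) chain and return the visited positions."""
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        # Accept with probability min(1, target(proposal) / target(x))
        log_ratio = log_target(proposal) - log_target(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        samples.append(x)
    return samples

samples = metropolis(20000)
# The fraction of samples that fall in a region estimates the
# posterior probability of that region.
frac_right = sum(s > 0 for s in samples) / len(samples)
print(f"estimated P(x > 0) = {frac_right:.2f}")
```

Because the two hills are equally high, the estimate should come out near one half; making the step size very small is one way to see the chain get stuck on one hill, which is the problem heated chains are meant to address.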
Computer lab 10
Computer-lab assignment 10; pdf
Goals
- Know how to do a maximum likelihood ratio test
- Know that Maximum Likelihood approaches allow one to avoid overparameterization.
- Know how different substitution models are defined.
- Know how to run iqtree on xanadu (the latest version is invoked with iqtree2, after you loaded the module).
- Know what the term Long Branch Attraction (LBA) artifact refers to, and that more sophisticated ML approaches do much better than simple parsimony analysis at avoiding LBA (it remains a sad realization that sometimes nature apparently does not follow Ockham's razor).
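The likelihood ratio test from the first goal can be sketched as follows. This is a minimal example, assuming two nested models whose log-likelihoods you already have (for instance from two runs of a tree program); the log-likelihood values below are invented, and the function names are hypothetical. Only the Python standard library is used, so the chi-square tail probability is computed by hand for integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # closed form for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return total
    # odd df: start from the df = 1 case and add the correction terms
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) * math.exp(-half) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (i + 0.5)
    return total

def likelihood_ratio_test(lnl_simple, lnl_complex, extra_params):
    """Return (test statistic, p-value) for two nested models."""
    stat = 2.0 * (lnl_complex - lnl_simple)
    return stat, chi2_sf(stat, extra_params)

# Invented log-likelihoods; the complex model has one additional parameter.
stat, p = likelihood_ratio_test(lnl_simple=-5234.8, lnl_complex=-5229.1, extra_params=1)
print(f"2 * delta lnL = {stat:.2f}, p = {p:.4g}")
```

A small p-value suggests that the extra parameter is justified by the data; a large one suggests the simpler model is sufficient, which is how the test guards against overparameterization.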
Assignments for Monday see below.
Lecture 17
Goals
- Understand the differences between Parsimony, Maximum Likelihood and Bayesian approaches to phylogenetic reconstruction
- Know what non parametric bootstrapping is, and how it can be applied to many different approaches of phylogenetic analysis
- Appreciate the huge number of tree topologies for a given number of leaves, and the implications this has for heuristic searches of tree space.
- Appreciate the difference between likelihood of a tree or model and the probability of a tree or model
- Know how the posterior probability is related to the prior and the likelihood of the data.
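To get a feeling for the size of tree space mentioned in the goals, the number of strictly bifurcating topologies can be counted directly: (2n-5)!! unrooted and (2n-3)!! rooted trees for n leaves. The sketch below only evaluates these standard formulas; the function names are made up for this example.

```python
def double_factorial_odd(k):
    """Product k * (k - 2) * ... * 3 * 1 for odd k; returns 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def n_unrooted_trees(n_leaves):
    """Number of unrooted bifurcating topologies: (2n - 5)!!"""
    if n_leaves < 3:
        raise ValueError("need at least 3 leaves")
    return double_factorial_odd(2 * n_leaves - 5)

def n_rooted_trees(n_leaves):
    """Number of rooted bifurcating topologies: (2n - 3)!!"""
    if n_leaves < 2:
        raise ValueError("need at least 2 leaves")
    return double_factorial_odd(2 * n_leaves - 3)

for n in (4, 5, 10, 20, 50):
    print(n, n_unrooted_trees(n), n_rooted_trees(n))
```

Already for a few dozen leaves the numbers are far too large to evaluate every tree, which is why parsimony, maximum likelihood, and Bayesian programs rely on heuristic searches.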
Links
- Slides Intro to phylogenetic reconstruction continued.
Assignment for Friday
- If you have problems wrapping your head around non-parametric bootstrapping, watch this YouTube video (warning: some may find the soundtrack a little strange)
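If code helps more than a video: the resampling step of non-parametric bootstrapping can be sketched in a few lines. The toy alignment and the function name are assumptions for illustration; in a real analysis each pseudo-replicate would be handed to the same tree-building method as the original alignment, and the support for a group is the fraction of replicate trees that contain it.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    Returns a pseudo-replicate with the same names and the same length.
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}

# Toy alignment, made up for this example.
alignment = {
    "taxonA": "ACGTACGTAC",
    "taxonB": "ACGTTCGTAC",
    "taxonC": "ACCTACGAAC",
}
rng = random.Random(42)
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
for name, seq in replicates[0].items():
    print(name, seq)
```

Note that whole columns are sampled, so the relationship between the sequences at each site is preserved; some sites appear several times in a replicate and others not at all.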
Assignment for Monday (10/30)
- Take the quiz from the Tree Thinking Challenge
- For additional motivation on the importance of molecular phylogenies, meditate on the COVID pandemic. A nice interactive tree is here. There is a play button at the top left, and you can select different classification schemes and time intervals. I am particularly impressed by the long stem branches that some clades have before they were sampled and diversified (e.g. emerging lineage 22F). Note that the pie charts on the geographic regions are updated in sync with the phylogeny.
Lecture 16
Goals
- Know that swapping branches around a node does not change the meaning of a phylogenetic tree
- Know the principle behind parsimony analysis and Occam's razor (or Ockham's razor, aka lex parsimoniae)
- Know the similarity and differences between parsimony and maximum likelihood based phylogenetic reconstruction
- Understand the differences between Parsimony and Maximum Likelihood Estimation
- Know that algorithmic approaches (neighbor joining) are fast, whereas parsimony and maximum likelihood need to do a heuristic exploration of tree space.
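The principle behind parsimony analysis can be illustrated by counting the minimum number of changes a tree requires for one alignment column. The sketch below uses Fitch's algorithm on trees written as nested pairs; the four-taxon example and the names are invented, and real programs add the search through tree space on top of this scoring step.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum number of changes).

    tree: a leaf name (str) or a pair (left_subtree, right_subtree).
    states: dict mapping leaf name -> character state at one site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # the children can agree on a state: no additional change needed
        return common, left_cost + right_cost
    # the children disagree: one change is required on this part of the tree
    return left_set | right_set, left_cost + right_cost + 1

site = {"A": "G", "B": "G", "C": "T", "D": "T"}   # one alignment column
tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))

for label, tree in (("tree1", tree1), ("tree2", tree2)):
    _, changes = fitch(tree, site)
    print(label, "requires", changes, "change(s)")
```

Summing these counts over all columns gives the parsimony score of a tree; the tree with the lowest total is preferred.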
Links
- Slides Intro to phylogenetic reconstruction.
Assignment for Wednesday
Computer lab 9
Computer-lab assignment 09; pdf
Goals
- explore different alignment approaches
- edit and dissect sequence alignments in seaview
- Compare phylogenetic trees calculated for intein and extein sequences
- use intein sequences dissected from a multiple sequence alignment to predict the structure of the intein
- discuss the predicted structure in terms of the presence of self splicing domains and LAGLIDADG domains.
Assignment for Monday
- Contemplate the different ways genes can be duplicated, and how they can persist over long periods of time
- Try to understand how pseudogenization can lead to a post mating barrier for diverging populations.
- Read excerpts of Chapters 5 and 6 from Li's "Molecular Evolution" on HuskyCT
Lecture 15
Goals
- Know the different pathways through which gene families can expand in a genome
- Know about the fate of duplicated genes
- Understand that gene duplication followed by gene loss may be important in erecting post mating hybridization barriers.
- Appreciate that the concept of mutual aid and natural selection are not mutually exclusive concepts.
Links
SLIDES on lab 8, Gene and Genome Duplications, Mutual aid
Assignments for Friday
- Recall the intein / extein searches from lab 4. From this class you should have a phamily of sequences with homologs to the intein, and if the intein-free version was not in the same phamily, you should have these in a second phamily. We will provide sequences for you to work with, but it might be nice to work on your own set of sequences.
Computer lab 8
Computer-lab assignment 08; pdf
Goals
- know how to perform blast searches from the command line
- know how to create a blast searchable databank
- be able to retrieve matching target sequences from a databank
- know how to process the tabular blast output files.
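As an illustration of the last goal, here is a minimal sketch for processing tabular blast output. It assumes the file was written with -outfmt 6 and the default twelve columns; the example lines, the E-value cut-off, and the function name are invented for this sketch.

```python
import csv
import io

# Default column order of blast's tabular format (-outfmt 6)
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def best_hits(handle, max_evalue=1e-5):
    """Keep, for every query, the hit with the highest bit score."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue                      # skip empty and comment lines
        hit = dict(zip(FIELDS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] > max_evalue:
            continue                      # not significant enough
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best

# Made-up example lines standing in for a real output file
example = io.StringIO(
    "query1\tsubjA\t98.5\t200\t3\t0\t1\t200\t11\t210\t1e-100\t365\n"
    "query1\tsubjB\t45.0\t180\t90\t4\t5\t184\t20\t199\t2e-20\t95.1\n"
    "query2\tsubjC\t30.0\t60\t40\t2\t1\t60\t5\t64\t0.5\t28.0\n"
)
hits = best_hits(example)
for query, hit in hits.items():
    print(query, hit["sseqid"], hit["evalue"], hit["bitscore"])
```

For a real file, replace the StringIO object with an opened file handle. The dictionary keyed by query name is the same idea as using a hash to associate keys with values.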
Lecture 14
Goals
- Know about some processes in evolution that go beyond natural selection acting on gradual changes
- Be aware that many scientific heroes were children of their time
- Know some short-comings of the Modern Synthesis
Links
slides on Images to depict evolution, Mutualism and Mutual Aid
Assignments for Monday
Lecture 13
Goals
- Appreciate that many important characteristics (such as photosynthesis) of living organisms were transferred horizontally
- Understand the terminology used in cladistics
- Understand the concerns about not considering paraphyletic groups as proper taxonomic units
- Know why fish do not exist
- Contemplate the utility of a natural taxonomic system in light of endosymbiosis and HGT
Links
Slides on lab 7, photosynthesis in the ToL and cladistics.
The heat of this controversy is reflected in the following excerpt from Tom Cavalier-Smith http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2842702/ :
"Oddly, the school of ‘phylogenetic systematics’ founded by Hennig (1966) grossly downplayed the phylogenetic importance of progressive change compared with splitting, seen by them as so all-important that many Hennigian devotees dogmatically insist that ancestral groups like Bacteria, Protozoa and Reptilia be banned. Hennig called such basal groups with a monophyletic origin ‘paraphyletic’ and redefined monophyly to exclude them and embrace only clades, likewise redefined as including all descendants of their last common ancestor. This redefinition of ‘clade’ is universally accepted, but Hennig's extremely confusing and unwise redefinition of monophyly is not. Though accepted by many, sadly probably the majority (especially the most vociferous and over self-confident, and those fearful of bullying anonymous referees, of whom I have encountered dozens mistakenly insisting without reasoned arguments that paraphyletic taxa are never permissible), it is rightly firmly rejected by evolutionary systematists who consider the classical distinction between polyphyly and paraphyly much more important than distinguishing two forms of monophyly (paraphyly and holophyly, using the precise terminology of Ashlock (1971), where holophyletic equals monophyletic sensu Hennig)."
Computer lab 7
Computer-lab assignment 07 docx
Computer-lab assignment 07 pdf
Goals
- Be able to submit scripts to the queue on Xanadu
- Interpret dotplots that compare genomes from closely related species, even if the two genome sequences are not given with homologous starting points.
- Be able to adjust dotplots interactively to highlight similarity between two sequences.
- Be aware that in dotplots noise appears in the form of short diagonals.
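What a dotplot computes can be sketched in a few lines. This is a toy word-match version, not the program used in the lab; the sequences, the word size, and the function names are assumptions. The second sequence is the first one with a shifted starting point, as happens when two circular genomes are deposited with different start coordinates.

```python
def dotplot(seq_x, seq_y, word=3):
    """Return (i, j) pairs where a word of length `word` starting at
    position i of seq_x exactly matches one starting at position j of seq_y."""
    index = {}
    for j in range(len(seq_y) - word + 1):
        index.setdefault(seq_y[j:j + word], []).append(j)
    dots = []
    for i in range(len(seq_x) - word + 1):
        for j in index.get(seq_x[i:i + word], []):
            dots.append((i, j))
    return dots

def render(dots, n_x, n_y):
    """Draw the dots as a small text picture (x across, y down)."""
    grid = [["." for _ in range(n_x)] for _ in range(n_y)]
    for i, j in dots:
        grid[j][i] = "*"
    return "\n".join("".join(row) for row in grid)

seq_x = "ATGCGTACCTGAAGTC"
seq_y = seq_x[6:] + seq_x[:6]      # same "genome", different starting point
dots = dotplot(seq_x, seq_y, word=3)
print(render(dots, len(seq_x) - 2, len(seq_y) - 2))
```

The shifted starting point shows up as two offset diagonals instead of one. With a shorter word size, more chance matches appear as isolated dots and short diagonals, which is the noise referred to in the goals.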
Assignment for Monday
Read through the letter of Ernst Mayr criticizing Carl Woese's three domain classification. Woese's reply is here but it is rather lengthy. Instead, read the shorter argument to abolish the term prokaryotes by Norm Pace here.
Midterm
In TLS 301, bring a pencil and eraser.
Lecture 12 (10/7)
Goals
- Be prepared for the midterm
- Understand how to infer within genome recombination events in circular genomes from gene and nucmer plots
- Know that recent data suggest rapid intein spread in local phage populations.
Links
Slides gene drive in phage populations, inferring within genome recombination events from gene and nucmer plots
Assignment
for Friday (10/11)
Make sure that you know where you saved the sequences you retrieved and analyzed in lab 4 (the phage intein and extein sequences) and lab 6 (the genome sequences of selected bacteria)
Computer lab 6
Computer-lab assignment 06 docx
Computer-lab assignment 06 pdf
Goals
- Know how cumulative strand bias helps to infer genome structure of prokaryotic genomes (Ori, leading/lagging strand, terminus of replication).
- Appreciate the power of hashes as on the fly counters and as flexible data structures to associate keys with values.
- Appreciate scripts to perform repetitive tasks
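The first two goals can be combined in one small sketch: a cumulative GC skew calculation in which a Python dictionary plays the role of the hash used as an on-the-fly counter. The toy sequence, the window size, and the function name are assumptions; this is not the script from the lab.

```python
def cumulative_gc_skew(sequence, window=1000):
    """Cumulative (G - C) / (G + C), summed over consecutive windows."""
    cumulative = []
    total = 0.0
    for start in range(0, len(sequence), window):
        counts = {}                        # dictionary as on-the-fly counter
        for base in sequence[start:start + window].upper():
            counts[base] = counts.get(base, 0) + 1
        g, c = counts.get("G", 0), counts.get("C", 0)
        if g + c:
            total += (g - c) / (g + c)
        cumulative.append(total)
    return cumulative

# Toy "genome": the first half is G-rich, the second half C-rich.
toy = "GGGCA" * 400 + "CCCGA" * 400
curve = cumulative_gc_skew(toy, window=500)
turning_point = curve.index(max(curve))
print(curve)
print("skew changes sign after window", turning_point)
```

In many bacterial genomes the cumulative curve has one minimum and one maximum; these extremes are typically interpreted as lying near the origin and the terminus of replication, where the leading and lagging strand switch.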
Assignment
for Monday
Think about questions to ask before the midterm.
Lecture 11 (10/2)
Goals
- Know about the debates concerning the hot origins of life.
- Understand the arguments for the domain ancestors being survivors of a catastrophe (impact?) that selected for thermophily
- appreciate that early Earth was a more violent place than today's Earth
- know that the temperature under which an organism lives is reflected in sequence composition of proteins and RNAs
- Understand genome structure of prokaryotic genomes (Ori, leading/lagging strand, terminus of replication).
- Know two explanations that can explain the preponderance of recombination events between points equidistant to the origin of replication
- co-occurrence of recombination with replication;
- Architecture Imparting Sequences (AIMS) and strand bias are not disrupted/misplaced (i.e., these are the only recombination events that do not lead to a drop in fitness)
Links
Slides on arguments against a hyperthermophilic LUCA, Strand Bias, Recombination and AIMS
Assignments
for Friday (10/4)
- Refresh your memory on FileZilla, ssh, and Xanadu
- If you have difficulties wrapping your head around the dot-plot comparison between sequences, a very thorough explanation is in the first 14 minutes of this YouTube video (45 seconds to 13.4 minutes)
for Monday (10/7)
- Think about questions you would like to have answered before the midterm.
Optional: See the articles listed in the notes to Lecture 10
Lecture 10, 9/30
Goals
- Understand the transitive property of homology and its limitation
- Appreciate the differences between different approaches to calculate MSAs
- know about the arguments in favor of a vibrant biosphere being present before 3.8 Ga BP
- appreciate the difficulties of interpreting ancient microfossils.
Links
- Discussion of the transitive property of homology, MSAs, the early evolution of life and impact frustration of an earlier biosphere: Slides
Assignments for Wednesday 10/2
- Read through the article by Tillier and Collins on genome rearrangement by replication-directed translocation (also available on HuskyCT). Try to understand Figures 1 and 2. Can you think of alternative explanations?
- Make sure that your electronic notebook is in good shape. In particular, you should have reflective statements on every lecture. If you do not use the OneNote class notebook (to which I have access), share the notebook with the instructors via email. (Different dates for this are floating around; try to update things by 10/2/2024.)
- Optional: in case you are interested in the early evolution of life, here are a few articles that give more details:
Computer lab 5
Computer-lab assignment 05 docx
Computer-lab assignment 05 pdf
Goals
- know when the transitive property of homology applies and when it does not
- be able to log into your student account on Xanadu
- become familiar with FileZilla and ssh
- Be able to align sequences using MAFFT or MUSCLE
Assignment for Monday
read as much of this [introduction to the unix shell]
Lecture 9 (9/25)
Goals
- Know about the command line and the UNIX operating system
- Have a rough idea of the intricacies of creating multiple sequence alignments
- Appreciate the advantages of a command line interface over a Graphical User Interface
- Know a few commands from the unix command line (cd ls pwd man cat more)
- Know about compute and head nodes on computing clusters
Links
- Intro to alignments, blast, and unix Slides
Assignments for Friday 10/1
- Go through today's Slides
- If you are motivated, check out the software carpentry webpage on the command line. If you have a Mac, the Terminal application gives you a Unix shell.
Lecture 8
Goals:
- Know the difference between PAM and Blosum matrices.
- Know what Dayhoff groups of amino acids are.
- Know how to measure whether sequences are significantly similar.
- Understand the difference and similarities between P and E values.
- Know about "usual" cut-offs for Z-scores, P- and E-values.
- Be able to discuss the processes that may lead to the decay of significance
- Know what fishing expeditions are about
- Know what the Bonferroni correction is, and why it is not popular.
- Know what false positives and false negatives are in relation to a databank search
Links
- Slides Statistics, sequence alignment, and blast searches
Assignments for Wednesday 9/25
Computer lab 4
Computer-lab assignment 04, pdf
Goals
- Know about bibliography software
- Know about the advantage of the databanks accessible through NCBI's Entrez.
- Be able to perform literature databank searches at Google scholar, SCOPUS and pubmed
- Know how to retrieve full length manuscripts ;-)
- Know that the # of publications, # of citations, and the H-index are frequently used to measure productivity and impact of scientists.
- Know how to access manuscripts similar to one that you know is relevant to you.
- Appreciate that GenBank is highly redundant.
- Know that searches at the protein level are more effective than searches at the nucleotide level.
- Be able to perform blast searches at the NCBI and phagesDB
- Recognize intein-containing queries in the graphical overview of blast searches.
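Since the H-index is one of the measures named in the goals, its definition can be made concrete with a short sketch: h is the largest number such that h publications have at least h citations each. The citation counts below are invented.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Invented citation counts for two hypothetical scientists
print(h_index([10, 8, 5, 4, 3]))
print(h_index([250, 8, 5, 3, 3]))
```

The second example shows one known limitation: a single very highly cited paper barely moves the index.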
Lecture 7 (9/18)
Goals
- Know about Margaret Dayhoff's contributions to bioinformatics.
- Know about the Entrez system at the NCBI
- Know about the advantages and disadvantages of databanks with or without a gatekeeper.
- Know the difference between PAM and Blosum matrices.
- Know what Dayhoff groups of amino acids are.
- Know how to measure whether sequences are significantly similar.
- Understand the difference and similarities between P and E values.
- Know about "usual" cut-offs for Z-scores, P- and E-values.
- Know what false positives and false negatives are in relation to a databank search
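The relation between P and E values in the goals above can be written down in one line: if E is the expected number of chance hits with a score at least this good, then under a Poisson model the probability of seeing at least one such hit is P = 1 - exp(-E). The sketch below just evaluates this relation; the function name is made up.

```python
import math

def evalue_to_pvalue(evalue):
    """P(at least one chance hit this good), given the expected number E."""
    # -expm1(-E) equals 1 - exp(-E), but stays accurate for very small E
    return -math.expm1(-evalue)

for e in (10.0, 1.0, 0.05, 1e-5):
    print(f"E = {e:g}  ->  P = {evalue_to_pvalue(e):.3g}")
```

For small values the two numbers are practically identical; they differ only when E approaches or exceeds 1, where P levels off below 1 while E keeps growing.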
Links
- Slides on Entrez, the origin of GenBank and Margaret Dayhoff; blast searches
Assignments
for Friday's (9/20) Computer Lab
- Read through the file on frequently used formats to depict sequences here
- Explore the Genbank Sample file here
- Read through http://en.wikipedia.org/wiki/FASTA_format
- Refresh your memory on Boolean operators (AND, OR, NOT) to use in advanced database searches. Here is an explanation of the Boolean operators
for Monday's Class 8 (9/23)
Lecture 6 (9/16)
Goals
- Understand the relation between substitutions and sequence divergence
- Know a few reasons why protein sequences work better to assess similarity than nucleotides
- Understand that only slow evolving genes that are under strong selection for function are suitable to trace early events in evolution.
- Appreciate Lamarck's contribution to understanding evolution.
- Understand the contributions that Woese and Fox made to the classification of life, which molecule they used, and the domains (aka Urkingdoms) they discovered
- Understand the power and the limitations of the tree of life image.
- Know about the tree and coral metaphors to depict evolution
- Understand the relationship between the 3 domains, and how the tree of life was rooted.
- Appreciate that the organismal tree is embedded in the tangled tree or network depicting genome evolution.
Links
Assignments for Wednesday class 7 (9/20)
Computer lab 3 (9/13)
Assignment 03.docx , pdf:
Characterizing homing endonuclease and self-splicing domains in inteins using chimeraX / using alphafold to predict protein structures.
Lecture 5 (9/11)
Goals
- Know what inteins are and which enzymatic activities they have.
- Know the scientific definition of symbiosis
- Know about the possible symbiotic relationships between organisms, genes, or protein domains.
- Know the different phases of the homing cycle.
- Know that inteins can be associated with a strong selective disadvantage
- Know that a heterogeneous environment may allow for the long term persistence of parasites (and thus provide an alternative to the homing cycle).
Links
Assignments
for Friday (9/15)
- go through the slides on inteins,
- Bring your Google username and password.
for Monday (9/16)
- Draw a sketch of the relation between the number of substitutions that occurred in evolution and the percent identity of the two sequences (i.e., how does the observed similarity change as more and more substitutions occur?).
- What are the endpoints (saturation levels) for a 4 letter alphabet and for a 20 letter alphabet, assuming a perfect alignment that aligns homologous positions?
- How does this relationship change if some parts of the sequence are so important that the protein becomes non-functional if a mutation occurs in these positions (i.e., these parts of the sequence are never observed to undergo any change)?
- If you were to do a realistic calculation and you were to consider a nucleotide sequence, how long would it take to arrive at less than 25% identity? (tip: how similar are two random sequences that have not been aligned?)
(Note: answering these questions should not require the use of a calculator or a formula, just common sense.)
Lecture 4 (9/9)
Goals
- Know that ATP binding domains can be of very different types, and what this means for our understanding of homology.
- Have a clear understanding of homology versus sequence similarity
- know about Levinthal's paradox
- Understand the problems and limitations faced by attempts to define life
- Know who Lynn Margulis was.
- Understand the (outline of) Gaia hypothesis, and the problems it faces, and how the ITSNTS approach might overcome these.
Links
- Slides on discussion of lab#2, ATP binding sites, convergent evolution.
- Slides on Life, Natural Selection, and Gaia.pptx
Note For those who joined the course recently
Look through the slides linked below, and if they do not make sense, check the recordings on HuskyCT. Ask the instructors if some things remain unclear.
Assignments:
- Contemplate the following:
-- find arguments for and against a virus being considered alive.
-- if being part of a group that can be subject to natural selection is a criterion for being alive, why should this not apply to computer life and computer viruses?
-- does the stipulation of being a "chemical"-system restrict this to "life as we know it"?
-- what argues against Traube's cells not being alive?
- Read through the slides on Life, Natural Selection and Gaia. You can follow the links if you are in presentation mode.
- Read through take-home exam #1. Wednesday is the last chance to discuss this in class before the due date. Remember to work on the exam on your own.
THIS IS NOT A TEAM BASED LEARNING EXERCISE!
Computer lab 2 (9/8)
Assignment 02.docx , pdf:
Aligning divergent sequences and structures in Chimera
Goals:
- Have a rough understanding of the content of a protein data bank file
- Be able to save individual subunits into distinct pdb files
- Align structures of divergent proteins
- Use the structure based alignment to align the linear sequences
- Align structures of a catalytic subunit during the catalytic cycle
- Appreciate that even 80% sequence divergence (or more) can leave the protein structures very, very similar.
- Appreciate that for important proteins substitutions occur so rarely that proteins remain recognizably similar in structure AND sequence.
Assignments for Monday:
Read through the Slides on the ATP synthase (skip the intein slides), and try to understand how the evolution of ATP subunits (and other ancient duplicated genes) informs us on the early evolution of life.
Lecture 3 (9/4)
Goals:
- The ATPsynthase as rotary motor (Yoshida's experiment, proteolipids)
- The role of gene duplication and sequence divergence in the evolution of proteins;
- Know about the three domains of life (archaea, bacteria, eukaryotes) and how they are related to one another
- Appreciate that molecular evolution can study events that occurred before the last universal common ancestor
- Understand the role of ancient gene duplications in rooting the tree of life.
- Understand that RNA can be both genetic material and catalyst
- Know items that support the RNA world concept, and difficulties faced by the RNA world
Links:
- Slides on Comp. Lab #1 and Assignments
- Slides on ATPsynthase, ancient gene duplications and the Tree of Life
- Slides on Homology (continued)
Assignments for Friday
- Try to wrap your head around the homology concept and its relation to significant similarity.
- Read through the slides on the ATP synthase, and try to understand how the evolution of ATP subunits (and other ancient duplicated genes) informs us on the early evolution of life.
Computer Lab #1 (9/1)
Computer-lab assignment 01 docx; pdf
Intro to Chimera: Binding Pocket - Substrate Interactions
Goals:
- Be able to launch chimera
- Display a 3 D coordinate file from the pdb (1HEW) in chimera
- Use different display settings
- Display amino acid side chains in the binding pocket of 1HEW and study the interactions between the substrate and the binding pocket.
- Calculate a Ramachandran plot, and determine where in this plot alpha helices, beta sheets, and glycine residues fall.
- Save your work as image and project.
Assignments for Wednesday see below
Lecture 2 (8/30)
Goals:
- Understand the concept of homology
- Understand that significant similarity between two primary protein sequences (that are not of low complexity) is a strong indication that the two sequences evolved from the same ancestral sequence.
- Know how the field of Bioinformatics is commonly "defined"
- Know what the terms replication, transcription and translation refer to
- Know about primary, secondary and tertiary structure of proteins
Links:
Assignments for Friday (8/30):
Assignments for Wednesday (9/4)
Contemplate the following questions (see the slides on homology for inspiration):
- Are most proteins with similar function homologous?
- Are all proteins with similar function homologous?
- Are most proteins with significant sequence similarity homologous?
- Do most homologous proteins have significant sequence similarity?
- Do most homologous proteins have similar structure?
Try to answer the following questions:
- Would, in your opinion, maintaining a database on beetles that contains data on where each beetle was collected, its morphology, and where it is stored in the collection fall under bioinformatics?
Would your assessment change if partial sequences of the mitochondrial cytochrome c oxidase were included for each beetle?
- Would in your opinion determining the 3D structure of a protein using X-ray crystallography fall under bioinformatics?
- How many different proteins with length of 100 aa are theoretically possible?
- At most how many aa substitutions does one need to turn one of these sequences into another one?
- Formulate a question that you could ask on Wednesday (things you didn't understand, anything you want to hear more details about).
- Read through the slides selected from Mark Gerstein's Bioinformatics Course in the Intro slides class 1
Are there any items where you do not agree with Mark Gerstein's delineation?
Read the excerpt from Thomas Mann's book on Dr. Faustus (Dr Faustus) available on HuskyCT. Or at https://www.fadedpage.com/showbook.php?pid=20180329 (go to chapter III). This chapter can provide two insights:
- Scientific experiments in parlors, salons and living rooms were frequent and common entertainment in the early 1900s.
- The distinction between living systems and the mineral world was not established. Apparently life could be easily created from non-living constituents. My favorite example is Traube's cells. In the past I did the experiment in class, but now you have to watch the YouTube version instead: How to grow an artificial cell from water and salts ("Traube Cell" experiment).
- The membranes that form were the starting point to build the first osmometer and an important step in the development of cell theory. They clearly are not alive, but they grow and do look a lot like red algae.
Ask a question (not limited to Dr. Faustus) on the huskyCT discussion board
Lecture 1 (8/26)
Goals:
- Know how to contact the instructor and TA.
- Know how your performance will be assessed and graded.
- Know that take-home exams and computer lab assignments are an important part of this course, and that they will be graded.
- Know that you need to maintain an electronic notebook
- Be able to define/circumscribe the field of Bioinformatics
Links:
Assignments for Wednesday (8/30):
- Study the Syllabus! Ask questions, if expectations are not clear.
- Consider if you want to participate in the CURE (course based undergraduate research experience) project.
- Read through the Slides on Homology (https://j.p.gogarten.uconn.edu/mcb3421_2024/class01_2024_homology.pptx). Note: "Read through" is short for read it, but don't overdo the studying.
- Make yourself familiar with the OneNote electronic notebook and write an entry for "lecture 1" (or set up your notebook in Joplin).
Assignments for Friday (8/30)