HOMEWORK ASSIGNMENT #6:

  1. Read chapter 5
  2. Write a script that reads a nucleotide sequence from a file in GenBank format and writes it out as a file in FASTA format. Give the FASTA-formatted file an informative annotation line.
  3. Improve your program that counts the bases in a genome.
  4. Add a counter of nucleotide excesses: A over T, G over C, or keto over amino base excess, (G+T)-(A+C). Print the cumulative excess into a table and plot the result with gnuplot.
  5. What does the result mean? Which of the above measures (or any others you can think of) shows the most bias?
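
As a starting point for item 4, here is a minimal sketch of a cumulative excess counter. It is written in Python for illustration only; use whatever language the course uses. The function names, the output file layout, and the `step` parameter are assumptions made for this sketch and are not part of the assignment.

```python
def cumulative_excess(seq, plus, minus):
    """Return the running total of (bases in `plus`) minus (bases in `minus`).

    `plus` and `minus` are strings of base letters, e.g. "G" and "C",
    or "GT" and "AC" for the keto-over-amino excess (G+T)-(A+C).
    Any other character (N, gaps, ...) leaves the total unchanged.
    """
    total = 0
    running = []
    for base in seq.upper():
        if base in plus:
            total += 1
        elif base in minus:
            total -= 1
        running.append(total)
    return running


def write_table(seq, path, step=1):
    """Write position and three cumulative excesses as space-separated columns.

    `step` (hypothetical parameter) thins the output so that large genomes
    do not produce one table row per base.
    """
    a_over_t = cumulative_excess(seq, "A", "T")
    g_over_c = cumulative_excess(seq, "G", "C")
    keto = cumulative_excess(seq, "GT", "AC")
    with open(path, "w") as out:
        # gnuplot skips data lines that start with '#'
        out.write("# position A-T G-C keto-amino\n")
        for i in range(0, len(seq), step):
            out.write(f"{i + 1} {a_over_t[i]} {g_over_c[i]} {keto[i]}\n")
```

If the table were written to a file called excess.txt (a made-up name), the G over C column could be plotted in gnuplot with: plot "excess.txt" using 1:3 with lines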

Extra challenge: 
Does the same approach work for dinucleotide bias? How about larger oligonucleotides? Try to implement the former and, if you have energy to spare, write some pseudocode for the latter (oligonucleotides).
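
One possible way to carry the cumulative excess idea over to dinucleotides is sketched below, again in Python for illustration only. Comparing a dinucleotide against its reverse complement (for example GA against TC) is an assumption of this sketch, not something the assignment prescribes, and the function name is made up. Because the window length is taken from the words themselves, the same loop also covers longer oligonucleotides.

```python
def cumulative_word_excess(seq, plus, minus):
    """Running total of occurrences of `plus` minus occurrences of `minus`.

    `plus` and `minus` are words of equal length k (k = 2 for dinucleotides).
    The sequence is scanned in overlapping windows of length k, so the
    result has len(seq) - k + 1 entries, one per window start position.
    """
    k = len(plus)
    if k == 0 or len(minus) != k:
        raise ValueError("plus and minus must be non-empty and of equal length")
    seq = seq.upper()
    plus = plus.upper()
    minus = minus.upper()
    total = 0
    running = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == plus:
            total += 1
        elif window == minus:
            total -= 1
        running.append(total)
    return running
```

For example, cumulative_word_excess(genome, "GA", "TC") would give one value per dinucleotide window, which can be written to a table and plotted the same way as the single-base excess (genome here stands for whatever sequence string you have read in).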