Background:

Finding Open Reading Frames (ORFs) and coding sequences (CDS) is a good example of a problem where a first-principles approach fails:

   In higher eukaryotes the coding sequence is often interrupted by introns. Genes are transcribed into RNA; spliceosomes then remove the introns from the RNA and join the exon portions back together. In Arabidopsis the splice site consensus is as follows (from www.arabidopsis.org):

 

5' consensus
                          --- intron --->
A   281  338  588   97  |   11   21  635  545  201  226  307
C   230  347  128   45  |    4    8   47  136   87  147  173
G   174  158   98  740  |  933   29  107   52  474  116  112
T   285  127  156   88  |   22  912  181  237  208  481  378

               A    G       G    T    A    A


3' consensus
                      <--- intron ---
A   174  183  172  291   77  912   27  |  235  249  274  272
C   123  117  107   61  600   14    4  |   90  142  167  182
G   194  161  101  360   29   27  931  |  529  174  215  270
T   479  509  590  258  264   17    8  |  116  405  314  246
          T    T         C    A    G       G

This diagram gives the frequency with which each nucleotide (A, C, G, T) is found at the positions surrounding the splice site (indicated by | ). Where a single nucleotide occupies a position in more than 50% of the sequences, that entry is colored red in the original table and the nucleotide is repeated in the line below the counts.
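The 50% rule can be checked directly from the counts. The following is a minimal sketch in Python; the function name consensus and the use of '.' for positions without a majority nucleotide are choices made for this illustration only.

```python
# Counts copied from the two tables above, columns left to right.
# 5' site: the first four columns are exon positions, the last seven intron.
FIVE_PRIME = {
    "A": [281, 338, 588, 97, 11, 21, 635, 545, 201, 226, 307],
    "C": [230, 347, 128, 45, 4, 8, 47, 136, 87, 147, 173],
    "G": [174, 158, 98, 740, 933, 29, 107, 52, 474, 116, 112],
    "T": [285, 127, 156, 88, 22, 912, 181, 237, 208, 481, 378],
}
# 3' site: the first seven columns are intron positions, the last four exon.
THREE_PRIME = {
    "A": [174, 183, 172, 291, 77, 912, 27, 235, 249, 274, 272],
    "C": [123, 117, 107, 61, 600, 14, 4, 90, 142, 167, 182],
    "G": [194, 161, 101, 360, 29, 27, 931, 529, 174, 215, 270],
    "T": [479, 509, 590, 258, 264, 17, 8, 116, 405, 314, 246],
}


def consensus(counts, threshold=0.5):
    """Return one character per column: the nucleotide whose share of the
    column exceeds `threshold`, or '.' if no nucleotide does."""
    n_cols = len(counts["A"])
    result = []
    for i in range(n_cols):
        column = {nt: counts[nt][i] for nt in "ACGT"}
        total = sum(column.values())
        best = max(column, key=column.get)
        result.append(best if column[best] / total > threshold else ".")
    return "".join(result)


print(consensus(FIVE_PRIME))   # ..AGGTAA...
print(consensus(THREE_PRIME))  # .TT.CAGG...
```

Both results agree with the consensus lines printed under the tables (AG|GTAA at the 5' site, TT.CAG|G at the 3' site).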

Because many Arabidopsis introns are known and the spliceosomal RNAs have been sequenced, one might expect that the coding parts of any given sequence could be recognized with high reliability. The following exercises will demonstrate that this is not the case.
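One reason the consensus alone is not enough: the only nearly invariant positions are the GT at the start and the AG at the end of the intron, and both dinucleotides occur by chance throughout any sequence. The sketch below counts such candidate sites and the donor/acceptor pairs they could form. The function names and the min_len cutoff are assumptions made for this illustration, not values taken from the exercise.

```python
import re


def candidate_sites(seq):
    """Return 0-based positions of every GT (possible donor) and
    every AG (possible acceptor) in the sequence."""
    seq = seq.upper()
    donors = [m.start() for m in re.finditer(r"(?=GT)", seq)]
    acceptors = [m.start() for m in re.finditer(r"(?=AG)", seq)]
    return donors, acceptors


def count_candidate_introns(seq, min_len=60):
    """Count donor/acceptor pairs that would delimit an intron of at least
    `min_len` nucleotides (GT through AG inclusive). `min_len` is an
    illustrative cutoff only."""
    donors, acceptors = candidate_sites(seq)
    return sum(1 for d in donors for a in acceptors if a + 2 - d >= min_len)


toy = "GTAAGTTTCAGG"
print(candidate_sites(toy))                     # ([0, 4], [3, 9])
print(count_candidate_introns(toy, min_len=6))  # 2
```

Running count_candidate_introns on the Arabidopsis sequence in Part B gives a sense of how many GT...AG pairs a gene finder has to choose among.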

 

Gene Identification Exercise part A (Prokaryotic genomic DNA)

(Part B is further down the page)

For the sequence below, which is from a prokaryote (the archaeon Thermoplasma acidophilum), identify possible open reading frames using ORF-finder at the NCBI:
http://www.ncbi.nlm.nih.gov/gorf/gorf.html 

Use the link to BLAST at the top of the page and run searches for at least two open reading frames. On the format page, check the graphical overview box before requesting the results.
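To make clear what an ORF search does, here is a minimal six-frame sketch in Python. It is not the NCBI implementation: it reports stretches from an ATG to the first in-frame stop codon, and the function names, the min_len cutoff and the frame labels for the reverse strand are assumptions made for this illustration. Many prokaryotic genes also start with GTG or TTG, which can be passed in through the starts argument.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}


def clean(seq):
    """Upper-case the sequence and drop spaces, line breaks and digits
    (the sequence below is printed in blocks of ten)."""
    return "".join(ch for ch in seq.upper() if ch in "ACGTN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=300, starts=("ATG",)):
    """Return (frame, start, end, length) for every ORF of at least `min_len`
    nucleotides. An ORF runs from the first start codon after the previous
    stop to the next in-frame stop codon (stop included). `start` and `end`
    are 0-based offsets on the strand being read."""
    seq = clean(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            orf_start = None
            for i in range(offset, len(s) - 2, 3):
                codon = s[i:i + 3]
                if orf_start is None and codon in starts:
                    orf_start = i
                elif orf_start is not None and codon in STOPS:
                    length = i + 3 - orf_start
                    if length >= min_len:
                        orfs.append((f"{strand}{offset + 1}", orf_start, i + 3, length))
                    orf_start = None
    return orfs


print(find_orfs("ccatgaaatg att", min_len=9))  # [('+3', 2, 11, 9)]
```

ORFs that run off the end of the fragment without reaching a stop codon are not reported by this sketch.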

1) Based on your and your neighbors' analyses, which ORFs are likely to form an operon?  Which is the coding strand? Is it the same for all ORFs? 

2) Select the longest ORF in frame +2 and do a BLAST search using the link in the header.   Check the graphical overview box on the format page. Do you notice anything strange? 

>Thermoplasma acidophilum genomic DNA
aactacagta gatgctcttt tagctattgc ggcttcgatc tcaatcgccg gcggccttat aggtaccggt atggcacagc agggaatagg agccgctggt atgggtataa tcgcggagaa acctgagaag ttcggccagg ttctgttctt ctttgttata cctgagacgc tctgggtcat aggtcttgct ctgggtatca tactgctgct ccatataatc tgatctctat gtccctcgaa gaagtgctga aggatatcga acgcgacaag gaagaaaaga agaaagagat agcagatgct gcgtccaggg agacggcaaa gatagagaag gagagggaag aaaagatcca gattctgcag agggaatacg agaacaggat gagggaagag ggcagcaggc tgtacaattc cataatcgac aaggcgaatg tggaggccag gaacatcgtg aggatgaggg ttcaggagat cctggaccag tatggcgcaa aggccgatga actgataaag aatctggcca aaacaaagga atacgatgat gtcctgaaga agatgatcga ggtatccaga aaggccctcg ggccagactg cattgtgaag gtgaacacag ccgataaggg ccgcatctct gatggaaaca taaagttcga ggatatagat ccgtatgggg gcgtactggc aacctccaga gacggaaaga tagaactcga tctgagaata tcaagcataa ggcgcgatat tcttgaacgg ttcaaggtgc ggctttattc aatgatagag gattgagatg gattcaactt acatagggtc ctatggcagg ctccgtgttt atcagacaga gtttttcagc aggcagcaaa tagatcagat gctgtcaatg actgatccaa aggatgtgtc tgcctttctc tacaacggtc cttacaggga agattatgac agcctgtccg cggtcttcaa ggatcctgat ctgaccgaga tggcaataaa caggcacatg gtcaggaaca atcgcctggt gctttttgcc atcccgcctc tggcgaagaa tgccgtggtg gcttatctca gcaaatggga tatagagaat atcaagaccg tcatatccgc aaaattcctg gggcacggga taagggaagc tgagcctttc cttgtgagct ttcgcgacat accgcttggt ataatgagcg gaacactcac caacgaggat tacaggaaca tgataaatct gcccaacata gaggctatac tcaattacct tgcaaggttc ggatacggta cgtacatgat gcagttcctg gaagattaca ggaagaccgg tgatatatca ccgatgctct attcgctgga tcgctattac tacatgaatc tgctgtcggc cctgaagtat tacaggggcg acgaggcgcc tgttctcaat tatgtgcgat cggacataga tcgccagaac atagtgacta tgctgaaggg aaaggtgctt aagataccgt ttgaaaggat gtcctcaggc ataatcgatg ggggtaacat aggcgttaac cggttgcggg agatctactc atcccaggac gccgtttccg ttgcggatgc tctgaagcag tactacgatc tcgaagagcc aaagaagaaa tacatggaaa caggcgatct ctaccatttt gatatagcga tccgcaacat aattgccaga aggttccttg acaccatgtc catgctgcct ctgtcgctgg acagcctgtt ctacttcata ctgaggagcg agattgaaag gcacaaccta aggacaatat atttgtccaa ggtgaacggc 
atgccaaggg agatcacgga aagcctgctg ataacggaga tgatgtgatt ggagagctgc ataactgtca taggcgaaag ggacgttgtc ctcggattca ggctcctcgg tattcagcac accataatag ctgagggcaa ggatcttctg aagaagtttc tggaggtctt tcagaaccct cagtgcaata taataatcgt ttccgaaaat gtgaagaaca tgatggataa aaggacgctg agaagcgtgg agatctcgtc aaagccgctg gtagtcttca taccccttcc aggcgtaagg gaggaggaga ccatagagga gatggcgaag aggatcctcg gtattgatat tggaaatgtt tgaggtgaat caatgggaaa gataatcaga atttcaggtc cagtagtcgt ggctgaagat gttgaagacg ccaagatgta cgatgttgtc aaggtcggag agatgggcct catcggtgag ataataaaga ttgaggggaa cagatcgacc atacaggtct atgaggatac tgcaggcata aggcctgacg aaaaggttga gaacaccagg aggccgctgt cggtggagct cggcccaggc atactcaaat cgatatacga tggaatacag aggccactgg atgtgatcaa gatcacttct ggagatttca tagctcgcgg tctgaaccca cccgcacttg acaggcagaa gaagtgggag tttgttcccg ctgtaaaaaa aggagagacg gtctttcctg gccagatact cggtaccgtg caggaaacct cgctgataac ccacaggata atggttcccg agggtatttc aggaaaggtg acgatgatcg ccgatgggga gcacagggtt gaggatgtga tagcgacggt atcaggaaat ggcaagagct acgatattca gatgatgaca acgtggcccg tcaggaaggc gaggagggtg cagaggaaac tgcctccaga gatcccgctg gtaacgggac agagggtaat agatgcgctt ttccccgtgg cgaagggcgg aactgccgcc gtacccgggc cattcggaag tggaaaatgt gtgtctggcg atacaccggt acttctggat gccggcgaga ggaggatagg cgacctgttc atggaggcca ttcaggacca aaagaacgcg gtcgaaatag gccagaacga agagatagtc cggctccatg atccgctgcg catatattcc atggtcggtt ctgaaatagt cgaaagcgtc tctcacgcca tatatcacgg aaagagcaat gccattgtaa ccgttaggac ggagaatgga agagaggtca gggtgacacc tgtccacaaa ctctttgtta aaattggaaa ctctgtaatc gagaggccag cctcagaggt gaatgagggc gatgaaatag catgcgcaag cgtaagtgag aacggtgatt cccaaaccgt caccacaacg ctggtattga cattcgatag agtggtatca aaggaaatgc atagcggcgt attcgatgtc tacgatctga tggttccgga ttatggatac aacttcatag gcggaaatgg cctcatagtc cttcacaaca ccgtgataca acaccagctg gcaaaatgga gcgatgcaaa catagttgtt tacataggct gtggcgagcg cggaaatgag atgactgaaa tactcaccac cttcccggag ctgaaagatc ctaacacggg ccagccgctg atggacagga ctgtccttat agccaacact tctaatatgc ccgtggcagc aagagaggcg agcatataca 
caggtataac gatagcggag tactacaggg acatgggata cgacgttgcc ctgatggcag acagcacatc acgctgggcg gaggcactca gggagatctc aggcaggctg gaggagatgc cgggagaaga gggatatcct gcctatctgg gtagaagggt ttcagaattc tacgagagat ccggaagggc gaggctcgta tcgccggatg agaggtacgg atcaataacg gttatcggtg ctgtatcacc gccgggagga gacatatccg agccggtatc gcagaacacc ctgcgtgtaa caagggtatt ctgggctctg gatgccgccc tggccaacag gaggcatttt ccatcgataa actggctcaa cagctattcg ctttacaccg aggatctgag atcctggtac gataagaacg tatcatccga atggtctgct ctaagggaaa gagcgatgga aatactgcag cgggagagcg agctccagga ggtcgcacag ctcgttggat acgatgccat gcctgaaaaa gagaaatcaa tactggacgt tgccaggata ataagggagg acttcctgca gcagagcgcg ttcgacgaga tcgatgctta ctgctccctg aaaaagcagt acctcatgct gaaggcaata atggagatcg atacctatca gaacaaggcg ctcgactccg gcgcaacaat ggataacctg gcttctcttg cagttaggga gaaactctcg aggatgaaga tagtgccaga ggcgcaggta gaatcctatt acaatgatct tgttgaggag atccacaagg agtatggaaa tttcattggt gagaaaaatg ccgaagctag cctataaatc tgtttcacaa ataagtggcc cactgctctt cgttgagaac gtgccaaatg ccgcttacaa cgagatggtt gacatcgaac ttgagaacgg ggaaaccagg caggggcagg ttctggacac caggaagggc ctcgccatag tgcagatatt cggtgcaaca accggtatag gcactcaggg aaccactgtt aaattcaggg gagagaccgc caggcttcct atatctgagg acatgctggg cagggtattc aatggcattg gcgagcccat agacggtggc cctgagataa tagcaaagga gagaatggag atcaccagca acgccataaa cccttattca agggaggaac cttccgaatt catagaaacc ggaatttcgg caatagacgg aatgaatacg cttgttaggg gccagaagct gcccatattc tccggttccg ggctgccgca caaccagctt gccgctcaga tagcaaggca ggcaaaggtt ctggattcct cagagaattt cgcggttgtc ttcggtgcaa tgggcataac gagcgaggag gctaattatt tcacgaacca gttcagggaa actggtgcgc tatcaagatc ggtcatgttc cttaacctct cttcggatcc gtccatggag aggatcatcc tgcccaggat agcactcaca actgcagagt acctggcatt ccagaagggc atgcacatac tcgtaatatt gacggatatg acgaactact gtgaggccct tcgtgagata tctgccgcca gggaggaggt tccgggaaga aggggctacc caggatacat gtacacggat ctgagcacca tatacgagag ggcaggaaag ctgaagggaa acaatggatc cataacgcag atccccatac tcaccatgcc aggcgacgat ataacgcatc ccgtgccgga tctcacaggc tacataaccg 
agggccagat agtgatttca agagatctca acagaaagga catgtatcca ggcatagacg tgctcctctc cctctcaagg ctgatgaacc agggcatagg gaagggaagg acaagggagg atcatagggg cctggcggat cagctttacg ctgcatacgc ttcaggaaag gatctgagat cactgactgc aatcgttggt gaggaggccc tcagccagaa cgacagaaag tatcttcact ttgcagacac ctttgagtca aggtacatca agcaggggtt cttcgaggat cgctcaatag aggatacgct tggcctggga tgggatcttc ttgctgatct tcctgttcag gacatgaaga gggttaagcc tgatcacatc cagaagtatg gcagatggaa gaaggagtga acatggacat acgaccgaca aggatagaac tgatacgcac caggaggaga ataaggcttg caaagaaggg ccttgacctt ctgaagatga agaggtccgc ccttatatac gaattcctgc agataagcag aaccataagg ggcatgaagg agaacctcag aaaggaggtt gttgaagccc tgaacatcat aaaggtggcc agcgtcctgg aggggtccct tgcactggag cgcatagcga acatgtcaag cgattccagg ataaatgtca actccagaaa tgtcatgggc gtaaatatac ccacccttga ggtctcatac aacctgtcca tattatcgga cgtttaccgt acagtgtctg tcccggttgc catagatgat tccatacgca ggtttcagaa gctgttctac gatctcattc tgatagtgga aaaggagaac tctctgcgca acctgctgat ggagatagac agaacaaaga gaaggagcaa cgctatagag aatatactga tacccaggct tgagtatcag gcgaagatga taaagatgac cctggatgag agggagaggg ataccttcac cacgcttaaa accataaaga agaagataga ggctgagaat gattagaaac ccgtggtttg atatcggaac gaggaagtac gtgaagaatg tcgatataac cagggcgaag gatccgaagc tgatcaggaa gttcataatc ataaggaacc tgatcatgct gttcaatgtg gctgttgcag cgctaatact ggtgctggtt tggagctgat gcgtatggat gatgttgaga ctatcaggat tatcaaggaa aaggaaacaa gtgcagatga ggagatcaat cagttcaagg aggaacagga aaagatcata aaagaagcca gggagaagga agcgcttgat ttggaaaaga ccgaggatga actgaaatcc agatatcagg agtatctgga atcgagaaga aaggaggctg aggagaaagc atcggaaata atagataatg caaagcaaag ggcatctgcg ataaatcttg acataaagga gaaggatctg cagaagatgg tcctggaaat aataatgaag tatctagagg agtaatatgc tgagaccagt taagatggag aagatcagga tcatagcccc gtattcctac agggatcctg tcatatccgc ccttcatgac ctgggcgtca tgcagataga ggagatgagg gaagatgttg acaggcttct gtctcctgcc aaagcttcgg aacaggcaaa aaccgttatg gattacctgc agaagttccg aggatacgag aacatacttc cgaagaggcc agtgagaaca agagccaaat tcacctctct tgcagatatc ctcaacgagg catccaagat 
aaacatagac gatgatatac gcatagctgt gaacagggaa aacgacattg cagcagccat gaaggatatc gatagcaggc tttctgcgct tgaatacatg aagggatatg atttcgacgt atccatattc aacgggaagc acttcgagtc ttacataata cctgataaga atgtggatat caaggcgttc tccagcctga acgcagaaat tgtgccgctg aagaatgcat tcataataac cgtggctgag gacagaacac aggatctcag caggatcgcc aattcgattg gagcaaggct cattcacatt ccagatctca agggaaagcc tgatgatgta atagctatgc tcaatgacga aagggcaaag ctggatcagg caatgcagga gataagaaag caccttggcg atctttccga taaatattat gagaagatag cccagatcag ggaagccctg gagatcgagg caaagaagat agatgtggag gataaattaa aaggaactga gtacacattt gccgtggagg ggtggatacc atcagattcg ttcggcagag tgagcgatgc catcaacaga gttactggga acagctgcat aataagcaca gtgaagacca acgagatgcc gccaaccctg ctcagaaatc ccaggaggat ctcgcttttc gaattcttca tcaaattcta ttcgcttcca gagggtacgg agtatgatcc tacgctcata ttcgcactgg tctttcccgt attcttcggg ttgatggtcg gtgattgggg ctacgggctg gccatcctgc tgatctctct tttcataata caccgcgttg atcatccacc ggcaaagagt cacataccca gagtcataag cagatttgtt ctgatgataa tgtcgccgca atccctgaag acgctggcaa aggccctgat tccgtcatcc atagtagcaa tcatagctgg cttacttttc aatgaattct tcggattcgc tatcctgcca ttcaccgttt tccatgtgta cgcggttctt ccgaagctga tgctgatcgc cggatacata ggccttggca tggtggtatt cggcttcata ctcggattca ttgaggattt gtggatgaag gatgtcaagg gagccatgga tagactcgga tggcttttct ttgcggttgg aatcgcaacc atagggctta acctgataca ccacgatctg acgttcagcg taagtaccgg gatatcgaat ctgattgcag ttgcactgat agttatcggc atacctctga tagccatcaa ggagaagtcg cagggattca tagagatgcc ttccataata agccacatac tctcatatct caggcttgtg ggaatactga tagccagcgt cgtcatcgct gagataatag acctggtatt catgaagagc atagtttcgc attccatcgg gcttgccatc gccggtgttg tcatactgat attcgggcag atgttcaact taatacttgc agtattcgag ccaggaatac agggagcaag gctgatatac gtggagttct tctcaaagtt ctaccacgga aacggaagaa tgttcaggcc attcaggagc cagagaaaat acaccgagga tggcctcgat tttgataagg ctagataaac gtttaagcat ggaaacgata caaagatgat gataccttct ttttcccctt gtgaaggcga tgtgaaatga acattggagt tcttggcttt cagggagatg tgcaggaaca catggatatg ctgaaaaaat tatccagaaa gaacagagac cttacattaa 
cccacgtaaa aagggttatc gatctggaac acgtagatgc gctcataata cctggaggag aaagtacgac tatatacaag cttactctgg aatacggcct ttacgacgcc atagtgaaga gatctgccga aggtatgccg attatggcca catgcgccgg cctgatactc gtatcgaaga atacaaatga tgaaagggtc agaggtatgg gcctactgga tgtgaccata agaaggaatg cctatggaag acaggtcatg tccttcgaaa cggacataga aataaatgga atcggcatgt ttccggccgt attcataagg gctccggtaa tagaggattc tggaaaaacc gaggttcttg gtacgctgga tggaaagccc gttatcgtca aacaggggaa tgtgataggg atgacatttc atccagagct caccggcgat acaaggctgc atgaatactt cataaacatg gtgaggggga gaggggggta catttccact gcagatgtga aaaggtgatg gtatgaggac tgtactatat gatgagcatg caaaactgaa cgcaaagttc accgaattca atggatggga tatgcccctt tactacagga gcataatcga agagcatatg gccgtcagga agcatgttgg catatttgat gtatcccata tgggcgacgt gacggtaagc ggaaaggatg cttcggcctt ccttgaccac atgtttccaa cgaaggtaag caatctgaag aatggagaat gcgtttacac agccttcctg aacgacagcg ggctgatgat agatgacacg atagtttata ggatgggcga agattcgtac ttcttcgttc caaatgcggg gacaacggaa aagatataca gatgggtgtc tgatcactcc gcaggataca gcgtaaagat agagaacgta tctaacagga tatcaagcat agcccttcag ggccctgaat ctgaagaagt gctgaatgaa cttggatttt catatcctgg atacttcaag tttcaatacg tttcaggaaa gtacatgaat gcaataacag gtaaagatca aattattata tcaggaacag gttacaccgg tgagaaagga gtagaattca taataccgaa cgaacacgct gttgaactct ggaagaaact gctggaagcg ataaacaaaa gaaatgggct tccggctggc ctcggtgcca gagataccct tcgaatggaa aagggtatgc tgctctcagg ccatgatttc aatgaagaca gagatccata tgaagcttca gtatcattca tcgtcaacaa cgatgaagat tttgtaggaa agaaaaatct tgagatcaga agaagatctg atcatgagat attcagggga ttcgtgcttt ctgacgggat tccaagaaat ggcaatccaa taaaagcagg cgggaagagg gttggaaccg tcaccagcgg aacaatttct ccagtactca ataagggcat agctcttgga tacatagata aagcgtattc aaaagaaaat acggaagtta tgatagagat aagatccgta gatcacaagg ctgtcgttac aaagcctagg attgtgaaat gatgttgcag tggaaataca cccctccccc tctatattaa ttttttattt aatataatta attcatataa tctgcttcat tatctcgcct atggccacct taacggcctg atcgctgtta tatctgttct tccaccccgt tgccttgagc ttctttatat ccattcttgc gtatttcaca tcgcctggcc agcctctgcc catgtaccct 
cctttgcgca ctatccttgt atccctgagc cccatggcct ctatgacgta ctttgctatc gtatccacgt ttgttacatc gtcatttcca aggttgaaca cttccgttcc gctgatcctg tcatagatgt agaacatgct tccaacgcag tccgtgacgt gcatgtacga tttcgcctgc gttccatccc cgagtacttc cagttctttg ctgttctttt ttagcttgtt tataaaatcg aatataacgc catgcgtgga attctttccc actatgtttg cgaacctgaa gatcttggcg ttgattccat aatagtgcga gtatgatgat atgaaagcct cagccgagag tttggaggcg ccgtatgagg atattgggag gaggggcccg tagtcttcag gcgtgggcat aacctttgcc tctccgtata tcgttgaact ggaggcgaac agtatgtctt tgacatcttt cttcctcatc atctcaagca cgttgacagt ccctatgacg tttgatctca gatctatcgt cggatccacg gatccgttcc tcacatcgga atcagcagcg agatggacga caagatcgta atctccaggc gtaacggatt ccgttatgtc ttcctttatg aacctgaagt tcttttttcc catgaacggt tttatgtatc tgtcatccat tatgctcagg ttatctataa ccgtgacatc gttgtcctca agaagcattt ccaccatatt cgatcctatg aaacctgcgc ccccagttat catcacatgt tttccattca tatactggta atatttgatg taataatagt tgttctcatg gccgatctgg gtttttatcc ttcagtcatt gttaaatgaa gcatgaaaaa tatctatgta aaaagtatta aaaaactacg taatttcagg tagatgcgca aataattttt acggccacgc catgcttacc ttcatgtcac agaagatcca ttgcataata tgcggatcta taatttattc tggattatac tgttccgatt gcctttcaga gatccagaga tcccgcacca ttgatgacga atcattcgaa gattggctcg cccgaacaag ggaatcttca aatgtcaaac cagacgataa ggaatgcatg   

3) Glimmer is a program that aims to find real ORFs based on compositional analyses. A web version is here. Copy and paste the Thermoplasma acidophilum genomic DNA sequence fragment (T_acido.fa) into the sequence window, and select linear and bacterial/archaeal. Did Glimmer rank as most probable the same ORFs that you identified as being part of the ATPase operon?

3B) As an aside, Glimmer and many similar programs do really well in identifying real ORFs (98% of the real ORFs are identified). What other parameter would you like to know in order to judge the overall success rate of the program?
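As an illustration of what "compositional analysis" in question 3 can mean, the sketch below scores a sequence by comparing its trinucleotide composition with two reference sets. This is a deliberately simple stand-in: Glimmer itself uses interpolated Markov models, which are considerably more elaborate. The function names, the pseudocount and the toy training sequences are all assumptions made for this illustration.

```python
import math
from collections import Counter
from itertools import product


def kmer_log_probs(training_seqs, k=3, pseudocount=1.0):
    """Estimate log-probabilities of all k-mers from training sequences.
    The pseudocount keeps unseen k-mers from getting probability zero."""
    counts = Counter()
    for seq in training_seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers) + pseudocount * len(kmers)
    return {m: math.log((counts[m] + pseudocount) / total) for m in kmers}


def log_odds(seq, coding_model, background_model, k=3):
    """Sum of log(P_coding / P_background) over all k-mers in `seq`.
    Positive scores mean the composition resembles the coding set."""
    seq = seq.upper()
    score = 0.0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in coding_model:  # skips k-mers containing N etc.
            score += coding_model[kmer] - background_model[kmer]
    return score


# Toy training data, for demonstration only.
coding = kmer_log_probs(["GAAGAAGAAGAA"])
background = kmer_log_probs(["TTTATTTATTTA"])
print(log_odds("GAAGAA", coding, background) > 0)  # True
print(log_odds("TTTATT", coding, background) < 0)  # True
```

In practice the coding model would be trained on known genes of the organism (or on its long ORFs) and the background model on non-coding sequence.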

 

PART B

The sequence following below is from the Arabidopsis thaliana genome. 

1) Use GENSCAN at

http://genes.mit.edu/GENSCAN.html

or at

http://genome.dkfz-heidelberg.de/cgi-bin/GENSCAN/genscan.cgi

to predict exons and introns encoded on this piece of genomic DNA.

Save the predicted peptide, and inspect the graphic output. 

2) Use GENEMACHINE: enter your email address and the sequence below, and have the results sent to your email account.
(The turnaround for the GENEMACHINE can be a couple of hours. If you are in a hurry, you can find the response here; right click and save it to your computer.)

3) After saving the results onto your PC, open SEQUIN. This is a program provided by the NCBI to submit data to the databank using the .asn format. The output of the GENEMACHINE is written in the same format, so you should be able to open the results in SEQUIN as an existing record. When asked, say that you do not want to submit it to an existing database.

4) In SEQUIN explore the different format options to look at the genemachine results (especially graphics and old alignment).

5) From the results, does it appear that the GENSCAN program worked correctly? 

The following exercise is optional. The workbench is very useful, especially if you work in an environment where you don't have access to commercial software packages.

6) Go to the Biologist's workbench at http://workbench.sdsc.edu/ and set up an account in your name. This is a great place for typical sequence manipulation: it is free, and you can access your sequences and analyses wherever you have access to a browser. Import the Arabidopsis genomic sequence given below, and translate it in all three forward reading frames.  The nicest output for browsing is generated by the restriction map program (named TACG in NUCLEIC ACID TOOLS); select all three forward reading frames. 
You can also calculate an optimal pairwise alignment between gi:2266990 (protein) and the predicted sequence, or between the genomic DNA and the cDNA (gi:2266989).
If you run out of time, you can look at some of the expected results here.
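If you prefer to do the three-frame translation yourself, the following minimal Python sketch uses the standard genetic code. The function names are chosen for this illustration, and codons containing anything other than A, C, G or T are shown as X. The example input is the stretch of the Arabidopsis sequence below that begins at its first ATG.

```python
# Standard genetic code, written in the usual T, C, A, G order
# for the first, second and third codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, frame=0):
    """Translate `seq` starting at 0-based offset `frame` (0, 1 or 2).
    Stop codons are shown as '*', unrecognized codons as 'X'."""
    seq = "".join(seq.upper().split())  # remove spaces and line breaks
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(frame, len(seq) - 2, 3)
    )


def three_forward_frames(seq):
    """Return the translations of forward frames 1, 2 and 3."""
    return {frame + 1: translate(seq, frame) for frame in range(3)}


print(three_forward_frames("ATGCCGGCGTTTTACGGAGGA"))
# {1: 'MPAFYGG', 2: 'CRRFTE', 3: 'AGVLRR'}
```

Because the genomic sequence contains introns, no single frame gives the complete protein; the correct reading frame changes from exon to exon, which is why the exon predictions are needed.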

Please explore some of the other tools offered on this site. You might need them for your future work.

>51028.t00050, Chromosome 1, pre-processing
AGTTTTTGAATCTCTGATTGCTGAGAAAATGCCGGCGTTTTACGGAGGAAAGCTTACGAC
CTTCGAGGACGATGAGAAGGAGAGCGAGTATGGTTACGTTCGTAAGGTATTATCCTGTTT
CGTTCGATCTGGTTTCAATTTGTTTTTTTTTCTGTTTTGCGATGTTAGTTTTTGGTGATG
GATAGATGAAATAGTTGATCTGCTTACCAGGTAAGATTGGTGGGATAGCTAGATTTGATC
TGATAGTTATCAGTGATTGAATCGGTTGATTCCGCGTTGGTTCAGTAACCTCGTCTTTGA
ATTTCTGATCTGATCTGATAGTTCTCAGTGATTTGACATTTTCTTTTATGGGATGCAGGT
TTCAGGTCCTGTTGTTGTTGCCGATGGTATGGCCGGTGCTGCTATGTATGAGTTAGTGCG
TGTTGGTCATGATAATTTGATTGGTGAAATCATCCGTCTTGAAGGAGATTCTGCCACCAT
CCAAGGTTTGTTTCTTCTATTGTGCTTCCTAGTGTAATTTACTTTACGGCATCTTATGTG
ACCTCTGTCGAAGTAAGATATCTTAACTGATATTTGGCAACTTCCTTTTGATCAGTTTAC
GAGGAAACAGCTGGATTGACAGTTAATGATCCCGTTCTTCGAACACACAAGGTTCGCGAG
TTATTTATCTTGGTTTTTTCTAGTGTTGTTCATCTGCAGCTAACATATAATTTGTCCTGA
ATTTACTACAGCCACTTTCTGTGGAGCTCGGGCCAGGAATATTGGGAAATATCTTTGATG
GAATTCAGGTTCAGTTGGATTTATAATCTTGCTAGACATGATTTTTTTTACTTTTATGAT
TCGTTTTATGTGGCTTCTTACGATTCTTTGGTTTCATTTCTTTAAATGTCACAGAGGCCT
TTGAAGACTATTGCAAGAATATCCGGTGATGTGTACATTCCTCGGGGTGTGTCTGTTCCA
GCTCTTGACAAAGATTGTCTTTGGGAGTTCCAGCCCAATAAATTTGGTAATGTGGTTTAC
TCCATATGCCTGTCTATGGAAGTGTTCATTTGGTTTTAATCTTGATGGTCAATTGAATTC
GTTTTGTTTGCAGTCGAGGGAGACACAATAACTGGTGGTGACTTGTATGCTGTAAGTTTA
TTGGTCTCCTCTTTAATCTGCTTTTGACAAGGGAATCTATTTACACAGTTACCGTGGTGT
TTCCCTTGTTTACACTGGGAATAGTTTTTTCTGAAAGTCAAATTAAACTTTGGAATGCAG
ACTGTCTTTGAGAACACTTTGATGAATCACCTCGTTGCCCTTCCTCCGGATGCCATGGGG
AAGATCACTTACATTGCTCCAGCTGGTCAATATTCGCTTAAGGTTTGACTTTAAGTTTCC
CTCAAACAGTTATGAATAAATACGTTTCAAACTTTTTCTTCCTTGATTTCTTTGAATTCA
ACGTTTGAGTTAATATATGGCTAACTTGATCAATTGGTAATCACTTCCTGTTGTAGATCA
TGTTTGGCTTGTTGCTAATAATTGTTTGTCGGTGATTTTCATTTCTCAGGATACCGTGAT
AGAGCTTGAATTCCAGGGGATCAAGAAATCTTACACCATGCTTCAGGTTTGCATGTATCT
TTAATCTTCCTACTTGCAAACGTAAATTTTAAGCTATTTGGTTCACTCTGTTAAATTGGT
TTGGTTGATATATGTCAGAGCTGGCCTGTACGTACGCCTAGGCCAGTTGCATCAAAGCTT
GCTGCCGATACTCCTCTACTTACGGGGCAGGTGATTACTCGATTAATTCTTCTTACAGTG
GTGATAGTCATTTGAATACATGTGTTGCTGATTGCTTTCTTTTCCTGTTGTCAGCGTGTT
CTTGATGCCCTTTTCCCTTCTGTTCTTGGTGGAACCTGTGCCATTCCTGGTGCTTTTGGC
TGTGGGAAAACTGTTATCAGTCAGGCACTTTCCAAGGTACCTTGTGACACTCTCTGGTTT
TGTTCCATTTAATTACTGGATAGATTGAATTTCCAAAGCTAACTTTTTCTTATTTACATA
GTACTCCAACTCTGATGCTGTTGTGTATGTTGGTTGTGGAGAGAGAGGAAATGAAATGGC
TGAGGTATATCTCTTCTCATTCTAAATTTGCATATTGTTCATACAAATCGGACATTTGAT
CTGATTGTTTCTCATAAATTAGGTTCTTATGGACTTCCCACAATTGACAATGACGTTGCC
TGATGGCCGTGAGGAATCTGTCATGAAACGTACCACACTTGTTGCTAACACCTCTAACAT
GCCTGTGGCTGCTCGTGAAGCCTCAATTTACACAGGTAATGTTCAGGCACACAGATTTAA
TAGTTATTGATGAATCCCATTGCCTATGCTCATTTTTTTTTTTTTTTTTTAATGTGAATT
CCAGGAATCACAATCGCTGAATATTTTAGAGATATGGGCTACAATGTTAGTATGATGGCA
GACTCAACTTCCCGTTGGGCAGAAGCATTAAGAGAAATTTCAGGACGGCTGGTAATCTTA
TGCGTTTCACTTTTGCTATATGGATGTTCGTGTTGTCCTCATCTCACTTTTCTTTTTCTC
AGTTTATTGACACCTATTTTGCTTTGTTTTATAGGCTGAAATGCCTGCTGACAGTGGATA
TCCAGCCTATCTAGCAGCACGTTTAGCATCTTTCTATGAACGTGCTGGTAAAGTAAAATG
TCTTGGTGGACCAGAACGTAACGGAAGTGTTACAATTGTTGGTGCAGTTTCGCCTCCTGG
AGGAGACTTTTCAGATCCTGTGACTTCAGCAACCCTTAGTATTGTGCAGGTGATTATTTG
GTTCATGTCTGCTTCCCTATCTTCCATTGTAGATTACATAGTCGTATATGTTGGTTGAGA
TGAACCAGATGGTGTTTAGTTTTAGATCTGCCGCAGACTCGTATATTTAAGCATTTTTTT
TCTCCACTTTGAAATGCTTACTCTTCCATTCTGGTTGTTTCTCTTTTCTTCTGCAGGTCT
TCTGGGGTTTGGACAAAAAGCTTGCCCAGAGAAAACATTTTCCCTCTGTTAATTGGTTGA
TTTCTTACTCAAAGTATTCAACGGTATGCTTAAATATTCTCGGTTCAAACTTGTCTTGGT
TTACTATCTAGAAATCTTGTATATAAAACGCTGCTTTTTGTTTTAGGCACTGGAATCTTT
CTATGAGAAGTTCGATCCAGATTTCATCAACATCAGGACAAAGGCCAGAGAGGTGTTGCA
GAGGGAAGACGATCTTAATGAAATTGTCCAGGTATGTATCACTTATCCTTGTATAAGTAT
CTATTGTGGTGACCAATGAACTCTTGTCTCAGCAACCCTAATACATTTTGAAGGGGTTGA
ACGATAATCTTGGCATGTAAACTTGACTTGAGTTATAGAAGGAAAACAGTGCTAGCACGT
TATTCTTTTCGAAAGGAACTTATTTGACCCACACATTGCTTTTTGTGTGCAGCTTGTAGG
AAAAGATGCGCTAGCAGAAGGGGACAAAATCACATTGGAAACAGCTAAGCTATTGAGGGA
AGATTACCTTGCTCAAAACGCGTTTACACCGTAAGATTTGTTGGCTCCCTTCGTTTTGGT
TTAGTACTCTCTCTTTCTCTCTCAACGGGTTATTCACTCTTGAACCTTTTGGATGAATTT
TTTGACAGATATGACAAATTCTGTCCTTTCTACAAGTCCGTGTGGATGATGCGTAACATT
ATCCATTTCTACAACCTAGCCAACCAGGTAAATAAGATGAGATTTATACATACTATGCTA
AGTGGGGATTAAGGTCAATTGGTTTGTCTAGGTAAAAACCCATTAATTGTTTTGGATACA
CAGGCGGTTGAGAGAGCAGCTGGAATGGACGGTCAAAAGATTACCTACACTCTTATCAAG
CATCGCTTAGGAGATCTTTTCTACCGTTTAGTGTAAGCAAACGACTTGCTTCTCCTCGAT
TTCTCTATGACTCTGTTACATAGCGCTCTAATAAAATGGTCTGAAACGGAATTATGGGAA
CTACAGGTCTCAGAAGTTCGAAGACCCAGCAGAAGGGGAGGATACACTGGTGGAAAAATT
CAAGAAATTGTACGACGATCTCAATGCTGGATTCCGTGCTTTGGAAGATGAAACTCGGTA
AGCTGTCGAGTCTCCACCGCAAGTAAAAAAAATCCACAGAATTGGGTTGTTTTTGGAGAA
AGAGGGTTTCATTCATGGTCTCTTTCTTGTGTTTTTGAACCAACAACTATCATAGTGGTC
GGTATTTTATTTATCGGTTTGGTCGATCGATTGAGTTTTAGCTCTGTGAGCGTCATGATT
CTCCGGCTGTGCTGTGCTGTGTAATATGTTTGATTCGTTGTTTTCATGTTTTTATTTCGG
TGGTAATAAGGTACAGCCAATGTGAGTCATATATTTGATTTGATGTACCCTCTCAATTCA
ATAAGTTAATTTTATGTCCAAAAACATATTGGGGATACCGTTATTTTTCTCATAATAAAT
ACCATCATTTT