In this assignment you read an input file containing named sequences of nucleotides and produce information about them. For each nucleotide sequence, your program counts the occurrences of each of the four nucleotides (A, C, G, and T). The program also computes the mass percentage occupied by each nucleotide type, rounded to one digit past the decimal point. Next the program reports the codons (trios of nucleotides) present in each sequence and predicts whether or not the sequence is a protein-coding gene.

For us, a protein-coding gene is a string that matches all of the following constraints*:
• begins with a valid start codon (ATG)
• ends with a valid stop codon (one of the following: TAA, TAG, or TGA)
• contains at least 5 total codons (including its initial start codon and final stop codon)
• Cytosine (C) and Guanine (G) combined account for at least 30% of its total mass
(*These are approximations for our assignment, not exact constraints used in computational biology to identify proteins.)

The DNA input data consists of line pairs. The first line has the name of the nucleotide sequence, and the second is the nucleotide sequence itself. Each character in a sequence of nucleotides will be A, C, G, T, or a dash character, "-". The nucleotides in the input can be either upper or lowercase.

Input file dna.txt (partial):
cure for cancer protein
ATGCCACTATGGTAG
captain picard hair growth protein
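The counting, codon, and protein checks described above can be sketched roughly as follows. This is a minimal sketch, not the assignment's reference solution. The function names are hypothetical, the per-nucleotide masses in ASSUMED_MASSES are placeholder values that the text above does not provide, and the sketch assumes that dashes are dropped before counting, mass calculation, and codon splitting. The 30% test is applied to the rounded percentages, which is also an assumption.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_CODONS = 5
MIN_CG_PERCENT = 30.0

# ASSUMPTION: placeholder masses, not given in the assignment text above.
# Substitute the values from your own handout.
ASSUMED_MASSES = {"A": 135.128, "C": 111.103, "G": 151.128, "T": 125.107}


def count_nucleotides(seq):
    """Count A, C, G, T case-insensitively; dashes are ignored (assumption)."""
    seq = seq.upper()
    return {n: seq.count(n) for n in "ACGT"}


def mass_percentages(counts, masses=ASSUMED_MASSES):
    """Percentage of total mass per nucleotide, rounded to one decimal."""
    total = sum(counts[n] * masses[n] for n in counts)
    if total == 0:
        return {n: 0.0 for n in counts}
    return {n: round(100 * counts[n] * masses[n] / total, 1) for n in counts}


def codons(seq):
    """Split the sequence into trios after removing dashes (assumption)."""
    clean = seq.upper().replace("-", "")
    return [clean[i:i + 3] for i in range(0, len(clean), 3)]


def is_protein(codon_list, percentages):
    """Apply the four constraints listed in the assignment text."""
    return (
        len(codon_list) >= MIN_CODONS
        and codon_list[0] == START_CODON
        and codon_list[-1] in STOP_CODONS
        and percentages["C"] + percentages["G"] >= MIN_CG_PERCENT
    )


# Usage with the sample sequence from dna.txt
seq = "ATGCCACTATGGTAG"
counts = count_nucleotides(seq)
print(counts)        # {'A': 4, 'C': 3, 'G': 4, 'T': 4}
print(codons(seq))   # ['ATG', 'CCA', 'CTA', 'TGG', 'TAG']
print(is_protein(codons(seq), mass_percentages(counts)))
```

Keeping the masses as a parameter means the percentage and protein checks can be re-run with whatever values the full assignment specifies, without changing the other functions.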