BioPython Homework/Project

In this homework you will download the Mycoplasma genitalium G37 complete genome in GenBank format. Go to NCBI’s Nucleotide database, download L43967.2, and rename the file to mycoplasma.gb. Move the file to the directory where you write your Python programs and open Spyder. Create a .py file called myco.py. Now write a single program that does all of the following:

  1. import SeqIO from the Bio package
  2. read the mycoplasma.gb file into a record variable of your choice
  3. print the record’s id
  4. print the record’s description
  5. print the number of features in the record
  6. loop through the features, counting the number of tRNA, gene, and CDS features
  7. print out the number of tRNA features
  8. print out the number of gene features
  9. print out the number of CDS features
  10. print out the fourth feature
  11. print out the type of feature 21
  12. print the location of feature 21
  13. print the translation table used in feature 21
  14. print the first 20 characters of feature 21’s protein sequence
  15. print out the first 50 nucleotides of the entire genome sequence
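
As a starting point for steps 1 through 9, here is a minimal sketch, not a full solution. The helper names load_record and count_feature_types are made up for illustration; the only Biopython call used is SeqIO.read with the "genbank" format, and the code assumes each feature exposes its kind as a .type string.

```python
from collections import Counter


def load_record(path):
    """Read a single-record GenBank file and return its SeqRecord."""
    # Imported inside the function so the counting helper below can be
    # tried out even where Biopython is not installed.
    from Bio import SeqIO
    return SeqIO.read(path, "genbank")


def count_feature_types(features, wanted=("tRNA", "gene", "CDS")):
    """Count the features whose .type is one of the wanted names."""
    counts = Counter(f.type for f in features if f.type in wanted)
    # Report zero for any wanted type that never appears.
    return {name: counts[name] for name in wanted}
```

In myco.py you might call record = load_record("mycoplasma.gb"), print record.id, record.description and len(record.features), and then pass record.features to count_feature_types to get the three totals.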

Print out the program and its output, staple them together, and turn them in.

Here is an example printout for the above program. I have replaced some values with ***.

L43967.2
Mycoplasma genitalium G37, complete genome.
Number of features is ***
There are *** tRNA features
There are *** gene features
There are 476 CDS features

The following is feature 4
type: variation
location: [727:728](+)
qualifiers:
Key: compare, Value: ['L43967.1']
Key: gene, Value: ['dnaN']
Key: locus_tag, Value: ['MG_001']
Key: replace, Value: ['gg']

The following information is about feature 21
It is a CDS feature
Its location is [***:***](+)
It is using translation table ['***']
The first 20 characters of its protein sequence are MEY***************LSEIE

And finally the first 50 nucleotides of the entire sequence are
TAAGTTATTATT*************************************TTTAAA
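
The "Key: ..., Value: [...]" lines in the printout reflect how Biopython stores feature qualifiers: as a dictionary whose values are lists of strings. The sketch below shows one way to pull out the pieces asked for about a single feature. The function name describe_feature is hypothetical, and whether "feature 21" means record.features[21] or the 21st item is something to check against the example printout.

```python
def describe_feature(feature, n=20):
    """Collect the type, location, translation table and protein prefix."""
    qualifiers = feature.qualifiers
    # Each qualifier value is a list of strings; .get() avoids a KeyError
    # on features (such as gene or tRNA) that carry no such qualifier.
    table = qualifiers.get("transl_table")
    translation = qualifiers.get("translation", [""])[0]
    return {
        "type": feature.type,
        "location": str(feature.location),
        "transl_table": table,
        "protein_start": translation[:n],
    }
```

Slicing works the same way on the genome itself: record.seq[:50] gives the first 50 nucleotides of the record’s sequence.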
