In this project you are to go to NCBI and download at least 15 complete Zika virus genomes. Each genome is over 10,000 bp long, and each encodes a single very large polyprotein. You can compare the genomes, the proteins, or both. Download the records in both FASTA and GenBank (gb) format.

You are to perform a multiple sequence alignment using the Clustal Omega tool. To do this, you must first download the tool from here, decompress it into a folder, and put the folder wherever you want it. You can then call the tool through the ClustalOmegaCommandline() Python wrapper in Biopython to generate an alignment file and a Newick (parenthesized) tree file. Note that clustalw… Look up the parameters of ClustalOmegaCommandline() to see how this is done.

Once the alignment and tree files exist, use Phylo.read() to load the tree file and Phylo.draw() to display the phylogenetic tree. Pick draw() parameters that make the image display correctly. You can also use AlignIO.read() to read in the alignment file and display it. Many websites also offer tools for displaying phylogenetic trees if you are interested. Remember that help(ClustalOmegaCommandline) lists the basic parameters for running the Python wrapper.
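The ClustalOmegaCommandline() wrapper ultimately builds and runs a `clustalo` command line. As a minimal sketch, assuming a `clustalo` executable on your PATH (or its full path inside the folder you decompressed, passed as `exe`), the same run can be written directly with `subprocess`. The flags used (`-i`, `-o`, `--outfmt`, `--guidetree-out`, `--force`, `-v`) are standard Clustal Omega options; the helper names `build_clustalo_cmd` and `run_clustalo` are hypothetical, and the mapping to the wrapper's keyword arguments noted in the docstring is an assumption to check against `help(ClustalOmegaCommandline)`.

```python
import shutil
import subprocess


def build_clustalo_cmd(infile, aln_out, tree_out, exe="clustalo"):
    """Return the argument list for one Clustal Omega run.

    These flags should correspond to the wrapper keywords infile, outfile,
    outfmt, guidetree_out, force and verbose (an assumption; verify with help()).
    """
    return [
        str(exe),
        "-i", str(infile),               # multi-FASTA of genomes or polyproteins
        "-o", str(aln_out),              # multiple sequence alignment output
        "--outfmt=clustal",              # readable by AlignIO.read(path, "clustal")
        f"--guidetree-out={tree_out}",   # Newick guide tree for Phylo.read
        "--force",                       # overwrite output files from earlier runs
        "-v",                            # print progress messages
    ]


def run_clustalo(infile, aln_out, tree_out, exe="clustalo"):
    """Run Clustal Omega, raising if the binary is missing or the run fails."""
    if shutil.which(str(exe)) is None:
        raise FileNotFoundError(f"Clustal Omega executable not found: {exe}")
    subprocess.run(build_clustalo_cmd(infile, aln_out, tree_out, exe), check=True)
```

For example, `run_clustalo("zika.fasta", "zika.aln", "zika.dnd")` (hypothetical file names) would write the alignment and the guide tree; `tree = Phylo.read("zika.dnd", "newick")` followed by `Phylo.draw(tree)` then displays the tree (matplotlib required), and `AlignIO.read("zika.aln", "clustal")` loads the alignment.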
After doing the above, write a paper explaining what you think produced this collection of virus variants. Also look at where each strain was collected to see whether location has any bearing on the evolutionary distance between the strains. Include in your paper the phylogenetic tree image you generated as well as one generated on one of the websites. If the two trees are not the same, discuss why they could differ.
Also look up the name of the algorithm that Clustal Omega uses to construct the tree. It is one of the distance-based methods. Briefly discuss this algorithm, as well as the method used to construct the tree data.
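To make "distance-based" concrete, the sketch below computes uncorrected pairwise p-distances from aligned sequences and clusters them with UPGMA into a rooted Newick tree. UPGMA is used here only because it is one of the simplest distance-based methods; this is not a claim about which algorithm Clustal Omega uses (that is what you are asked to look up, and neighbor joining is another common distance-based method). The function names are hypothetical, and using a plain p-distance with no substitution-model correction is a simplifying assumption.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    No evolutionary correction (e.g. Jukes-Cantor) is applied.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("sequences share no ungapped columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(seqs):
    """Symmetric matrix of pairwise p-distances."""
    n = len(seqs)
    m = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        m[i][j] = m[j][i] = p_distance(seqs[i], seqs[j])
    return m


def upgma(names, matrix):
    """Cluster taxa with UPGMA and return a rooted Newick string."""
    # Active clusters: id -> (Newick text, number of taxa, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by (smaller id, larger id)
    d = {(i, j): matrix[i][j] for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of clusters
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])
        del d[(i, j)]
        ti, ni, hi = clusters.pop(i)
        tj, nj, hj = clusters.pop(j)
        h = dij / 2  # the new node sits halfway between the two clusters
        newick = f"({ti}:{h - hi:.4f},{tj}:{h - hj:.4f})"
        # Distance to the merged cluster = size-weighted mean of old distances
        for k in clusters:
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        clusters[next_id] = (newick, ni + nj, h)
        next_id += 1
    (tree, _, _), = clusters.values()
    return tree + ";"
```

To try it on the Clustal Omega output, you could load the alignment with `aln = AlignIO.read(path, "clustal")`, pass `[rec.id for rec in aln]` and `distance_matrix([str(rec.seq) for rec in aln])` to `upgma`, and read the result with `Phylo.read(io.StringIO(newick), "newick")` to draw it next to the Clustal Omega tree.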
Include the Python source code that you used to generate the above data.