About ComputerScienceExpert

Levels Taught:
Elementary, Middle School, High School, College, University, PhD

Expertise:

Applied Sciences, Calculus, Chemistry, Computer Science, Environmental Science, Information Systems, Science

Teaching Since:	Apr 2017
Questions Answered:	4870
Tutorials Posted:	4863

Education

MBA IT, Master in Science and Technology
Devry
Jul-1996 - Jul-2000

Experience

Professor
Devry University
Mar-2010 - Oct-2016

Posted 09 May 2017

Align the FASTA sequences using a program called "mafft"

In this assignment, we are going to:

1. Align the FASTA sequences using a program called "mafft"

2. Calculate pairwise distances between the aligned sequences using a program called "quicktree"

Your assignment is to write an executable program called:

~/assignments/assignment10/alignAndDist.py
The program should take a single command-line argument: the name of a FASTA-formatted sequence file. As test cases, I have supplied three unaligned FASTA files in:

/home/PyProgBiol_materials/assignment10/
called:

/home/PyProgBiol_materials/assignment10/BC01.fasta
/home/PyProgBiol_materials/assignment10/BC02.fasta
/home/PyProgBiol_materials/assignment10/BC03.fasta

These are the same as the correct output files from assignment 9. For testing purposes, I suggest copying these files into your ~/assignments/assignment10 directory. Your program should then work on any one of these input files when you type:

./alignAndDist.py BC01.fasta
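
As a minimal sketch, assuming Python 3, the single command-line argument could be read and checked as follows. The function name get_input_file is hypothetical, not part of the assignment. Note that typing ./alignAndDist.py only works if the file starts with a #! line and has been made executable (chmod +x alignAndDist.py).

```python
#!/usr/bin/env python3
import os
import sys


def get_input_file(argv):
    """Return the single FASTA file name given on the command line.

    Exits with a message on standard error if the number of arguments
    is wrong or the named file does not exist.
    """
    if len(argv) != 2:
        sys.exit(f"Usage: {argv[0]} <file.fasta>")
    fasta = argv[1]
    if not os.path.isfile(fasta):
        sys.exit(f"Error: cannot find input file {fasta!r}")
    return fasta
```

Near the top of the script you would then call fasta_file = get_input_file(sys.argv), so that a missing or misspelled file name stops the program before any external program is run.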

The program is not required to print anything to the screen, although you are free to print progress messages (or anything else you want) if you'd like.

The required output will be 3 files, in this case named:

BC01.fasta.mafft
BC01.fasta.tab
BC01.fasta.dists
That is, the first one is the user-supplied file name with an added ".mafft" extension. This file will contain the mafft-aligned sequences.

The second file is a conversion of BC01.fasta.mafft to tab-delimited format. You may want to use the program:

/home/PyProgBiol_materials/assignment10/fastaToTab.py
to accomplish this task. You are free to copy it to your ~/assignments/assignment10 directory or use it where it is. I have also placed a copy of this program in /usr/local/bin, so it is in your executable PATH. You can use it by just typing:

fastaToTab.py
I have also written this as a Python module, so you can import it into your program, should you wish to take that route.
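
If you would rather not depend on fastaToTab.py, the conversion can be sketched in plain Python. This is an independent illustration, not the instructor's module: it assumes the tab-delimited layout is one line per sequence, holding the sequence name (without the ">") and the sequence separated by a tab. Compare its output with the supplied .tab files before relying on it. The names read_fasta and fasta_to_tab are hypothetical.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines.

    Sequences split over several lines are joined, blank lines are
    skipped, and any sequence lines before the first header are ignored.
    """
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_tab(in_path, out_path):
    """Write one 'name<TAB>sequence' line per FASTA record (assumed layout)."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for name, seq in read_fasta(fin):
            fout.write(f"{name}\t{seq}\n")
```

Keeping the parser as a generator separate from the file writing makes it easy to test on a short list of strings.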

Finally, the last required output file, BC01.fasta.dists, contains a matrix of pairwise distances between all the aligned sequences in BC01.fasta.tab. This can be produced from BC01.fasta.tab by the "quicktree" program.
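
To inspect a .dists file from Python, a small parser can help. This sketch assumes the matrix is written in square PHYLIP style (a first line holding the number of sequences, then one row per sequence: its name followed by one distance per sequence) and that names contain no spaces; check this against the supplied .dists files. read_phylip_matrix is a hypothetical helper name, and it does not handle rows wrapped across several lines.

```python
def read_phylip_matrix(lines):
    """Parse a square PHYLIP-style distance matrix into a nested dict.

    Returns matrix[name1][name2] -> distance (as a float).
    Raises ValueError if a row has the wrong number of distances.
    """
    rows = [line.split() for line in lines if line.strip()]
    n = int(rows[0][0])                  # first line: number of sequences
    body = rows[1:n + 1]
    names = [row[0] for row in body]     # column order follows row order
    matrix = {}
    for row in body:
        values = [float(x) for x in row[1:]]
        if len(values) != n:
            raise ValueError(f"row {row[0]!r} has {len(values)} values, expected {n}")
        matrix[row[0]] = dict(zip(names, values))
    return matrix
```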

So, the order of operations goes:

XX.fasta (user-input file) -> XX.fasta.mafft (aligned sequences) -> XX.fasta.tab (convert aligned sequences to tab-delimited) -> XX.fasta.dists (pairwise distances)

Both mafft and quicktree are in /usr/local/bin on our server, so they are available by typing:

mafft
or

quicktree
I will go over using these in the podcasts, although you are welcome to try on your own and/or consult the internet.
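
The three steps can be chained with Python's standard subprocess module, redirecting each program's standard output into the next file. This is a sketch under assumptions: that mafft writes the alignment to standard output when given the input file name, that fastaToTab.py accepts a file name and prints the converted result, and that quicktree's -in a -out m options request a distance matrix from an alignment. Check each program's usage text on the server. build_pipeline and run_pipeline are hypothetical names.

```python
import subprocess


def build_pipeline(fasta_file):
    """Return (command, output_file) pairs for the three steps, in order.

    Output names are the input name plus an extension, e.g.
    BC01.fasta -> BC01.fasta.mafft, BC01.fasta.tab, BC01.fasta.dists.
    """
    mafft_out = fasta_file + ".mafft"
    tab_out = fasta_file + ".tab"
    dists_out = fasta_file + ".dists"
    return [
        # 1. align the sequences (alignment printed to stdout)
        (["mafft", fasta_file], mafft_out),
        # 2. convert to tab-delimited (assumed: takes a file argument)
        (["fastaToTab.py", mafft_out], tab_out),
        # 3. pairwise distance matrix (assumed quicktree options)
        (["quicktree", "-in", "a", "-out", "m", tab_out], dists_out),
    ]


def run_pipeline(fasta_file):
    """Run each step, writing its stdout to the step's output file."""
    for command, out_path in build_pipeline(fasta_file):
        with open(out_path, "w") as out:
            # check=True raises CalledProcessError if the program fails
            subprocess.run(command, stdout=out, check=True)
```

Calling run_pipeline(fasta_file) after checking the argument would produce all three files. With check=True the script stops at the first failing program instead of passing an empty file to the next step. mafft also prints progress messages to standard error, which the assignment allows; adding stderr=subprocess.DEVNULL to subprocess.run would silence them. Because each output name is the input name plus an extension, running the script on a file in another directory writes the outputs into that directory.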

For testing purposes, all the required output files for each of the supplied .fasta input files can also be found in:

/home/PyProgBiol_materials/assignment10/
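
Once your outputs exist, a byte-for-byte comparison against the reference files in that directory is a quick check. This is a sketch using the standard filecmp module; compare_outputs is a hypothetical name.

```python
import filecmp
import os


def compare_outputs(fasta_file, reference_dir):
    """Compare each of your output files with the reference copy.

    Returns a dict mapping your output file name to True (identical
    contents) or False. Raises FileNotFoundError if a file is missing.
    """
    results = {}
    for ext in (".mafft", ".tab", ".dists"):
        mine = fasta_file + ext
        reference = os.path.join(reference_dir, os.path.basename(mine))
        # shallow=False compares file contents, not just size/mtime
        results[mine] = filecmp.cmp(mine, reference, shallow=False)
    return results
```

For example, compare_outputs("BC01.fasta", "/home/PyProgBiol_materials/assignment10/") reports which of the three files match; for any mismatch, running diff on the two files shows where they differ.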

This assignment does a lot of computation with minimal Python code, and highlights writing scripts that make use of existing 3rd-party command-line applications. This represents about 90% (well, a large bit, anyway) of what "bioinformatics" really is: piecing together programming bits to perform complex analysis "pipelines."
