Conserved Orthologs: Multiple Alignments Pipeline


For example, suppose we have run the Codon Usage Pipeline on two sets of BLAST reports: soybean versus Arabidopsis and tomato versus Arabidopsis. This gives two subsets of alignments: soybean - Arabidopsis and tomato - Arabidopsis. We may then want to find the sequences (alignments) that overlap between the two sets, that is, to generate multiple alignments of the overlapping regions (soybean - tomato - Arabidopsis) for further analysis.

To generate the soybean - tomato - Arabidopsis multiple alignments, we will use four files produced by the Codon Usage Pipeline:

cos_soybean.codons.query_seq (soybean sequence fragments corresponding to alignments to Arabidopsis)
cos_soybean.codons.subj_seq (Arabidopsis sequence fragments corresponding to alignments to soybean)
cos_tomato.codons.query_seq (tomato sequence fragments corresponding to alignments to Arabidopsis)
cos_tomato.codons.subj_seq (Arabidopsis sequence fragments corresponding to alignments to tomato)
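
Before going further, it can help to confirm that all four files are present. The following is only a minimal sketch, assuming the files keep the names listed above and sit in a single directory; the helper name missing_inputs is hypothetical and is not part of the Codon Usage Pipeline.

```python
import os

# The four input files named in the text above.
INPUT_FILES = [
    "cos_soybean.codons.query_seq",
    "cos_soybean.codons.subj_seq",
    "cos_tomato.codons.query_seq",
    "cos_tomato.codons.subj_seq",
]


def missing_inputs(directory="."):
    """Return the names of expected input files not found in `directory`.

    Hypothetical helper for illustration only.
    """
    return [
        name
        for name in INPUT_FILES
        if not os.path.isfile(os.path.join(directory, name))
    ]


if __name__ == "__main__":
    missing = missing_inputs(".")
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All four input files found.")
```

An empty list means all four files were found. The check only looks at file names, so it cannot tell whether a file was really produced by the Codon Usage Pipeline.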

Next we run the overlap_finder_017.py script on these four files. The script asks in which order to input the sequences. It produces two outputs: a text file, overlapping_seqs.txt, containing all triple alignments [soybean - tomato - Arabidopsis] whose common overlap is greater than 60 nucleotides, and a directory, overlapping_seqs.dir, in which each alignment is stored as a separate file. The file and the directory can be renamed to something more meaningful, for example:

overlapping_seqs_cos_soybean_tomato_arabidopsis.txt
overlapping_seqs_cos_soybean_tomato_arabidopsis.dir.tar.gz

The output files can be downloaded with a mouse click and examined.
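
The 60-nucleotide threshold mentioned above can be illustrated with a short sketch. It assumes that each alignment is described by inclusive, 1-based start and end coordinates on the same Arabidopsis sequence; this representation and the function names are assumptions made for illustration, and overlap_finder_017.py may compute the overlap differently.

```python
# Threshold taken from the text: the common overlap must be
# greater than 60 nucleotides.
MIN_OVERLAP = 60


def common_overlap(a, b):
    """Return the (start, end) region shared by two intervals, or None.

    Intervals are (start, end) tuples with inclusive, 1-based
    coordinates on the same reference sequence (an assumption).
    """
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    if start > end:
        return None
    return (start, end)


def overlap_length(region):
    """Length in nucleotides of an inclusive region; 0 if there is none."""
    if region is None:
        return 0
    return region[1] - region[0] + 1


def passes_threshold(a, b, min_overlap=MIN_OVERLAP):
    """True if the two intervals share more than `min_overlap` nucleotides."""
    return overlap_length(common_overlap(a, b)) > min_overlap


if __name__ == "__main__":
    # Hypothetical coordinates of a soybean hit and a tomato hit
    # on the same Arabidopsis sequence.
    soybean_hit = (100, 400)
    tomato_hit = (300, 500)
    print(common_overlap(soybean_hit, tomato_hit))
    print(passes_threshold(soybean_hit, tomato_hit))
```

With these hypothetical coordinates the shared region is 300 to 400, which is 101 nucleotides long and therefore passes the threshold. A shared region of exactly 60 nucleotides would not pass, because the text requires the overlap to be greater than 60.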

Note that the overlap_finder_017.py script will work only if all four input files were produced by the Codon Usage Pipeline. The script makes many assumptions about the data structure, and these hold only if all steps were performed according to the described protocols.

email: akozik@atgc.org
last modified: December 19 2003