SNP/INDEL discovery pipeline based on CAP3 assembly

Step 3: Pre-processing of the example dataset

So far, we have downloaded:

150193 ESTs for Lycopersicon esculentum, saved in the file Lycopersicon_esculentum.fasta
2504 ESTs for Lycopersicon hirsutum, saved in the file Lycopersicon_hirsutum.fasta
8346 ESTs for Lycopersicon pennellii, saved in the file Lycopersicon_pennellii.fasta
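Each EST is one FASTA record, so the counts above can be checked by counting header lines (lines beginning with ">"). A minimal sketch; count_ests is a hypothetical helper name, not part of the pipeline:

```shell
# count_ests FILE (hypothetical helper): print the number of FASTA records,
# i.e. the number of lines that start with ">"
count_ests() {
  grep -c '^>' "$1"
}
```

For example, `count_ests Lycopersicon_esculentum.fasta` should print 150193 if the file matches the download described above.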

To distinguish the three genotypes, we modified the EST IDs in the FASTA files. After pre-processing, every EST ID in Lycopersicon_esculentum.fasta carries the prefix "A_", every ID in Lycopersicon_pennellii.fasta the prefix "B_", and every ID in Lycopersicon_hirsutum.fasta the prefix "C_".

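This was done in a UNIX shell with the following Perl one-liners (s/find/replace/ regular expressions). For each file, the first command replaces the leading ">gi|" of every header with the genotype prefix, and the second replaces the next "|" with a space, so that the ID ends right after the gi number: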

# Lycopersicon esculentum: prefix "A_"
$ perl -p -i -e 's/^>gi\|/>A_/' Lycopersicon_esculentum.fasta
$ perl -p -i -e 's/\|/ /' Lycopersicon_esculentum.fasta

# Lycopersicon hirsutum: prefix "C_"
$ perl -p -i -e 's/^>gi\|/>C_/' Lycopersicon_hirsutum.fasta
$ perl -p -i -e 's/\|/ /' Lycopersicon_hirsutum.fasta

# Lycopersicon pennellii: prefix "B_"
$ perl -p -i -e 's/^>gi\|/>B_/' Lycopersicon_pennellii.fasta
$ perl -p -i -e 's/\|/ /' Lycopersicon_pennellii.fasta