Step 3: Pre-processing of example dataset
      
      
      So far, we have downloaded:
      
       150193 ESTs for Lycopersicon esculentum and saved
      them in Lycopersicon_esculentum.fasta file
      
      2504 ESTs for Lycopersicon hirsutum and saved
      them in Lycopersicon_hirsutum.fasta file
      
      8346 ESTs for Lycopersicon pennellii and saved
      them in Lycopersicon_pennellii.fasta file
      
      To distinguish these three genotypes we have modified EST IDs in fasta files. 
      For example, all EST IDs in Lycopersicon_esculentum.fasta file
      after pre-processing contain prefix "A_", 
      Lycopersicon_hirsutum.fasta - prefix "C_"
      and Lycopersicon_pennellii.fasta - prefix "B_"
      
      It has been done by executing in UNIX shell following perl commands 
      
      (/find/replace/ regular expressions):
      
      
      
      
      
      
      $ perl -p -i -e 's/^\>gi\|/\>A_/' Lycopersicon_esculentum.fasta
      
      
      
      $ perl -p -i -e 's/\|/ /' Lycopersicon_esculentum.fasta
      
      
      
      $ perl -p -i -e 's/^\>gi\|/\>C_/' Lycopersicon_hirsutum.fasta
      
      
      
      $ perl -p -i -e 's/\|/ /' Lycopersicon_hirsutum.fasta
      
      
      
      $ perl -p -i -e 's/^\>gi\|/\>B_/' Lycopersicon_pennellii.fasta
      
      
      
      $ perl -p -i -e 's/\|/ /' Lycopersicon_pennellii.fasta
      
      
      
      
      