  • CGPDB
    UC Davis


    Step 3: Pre-processing of example dataset

    So far, we have downloaded:

    150193 ESTs for Lycopersicon esculentum, saved in the file Lycopersicon_esculentum.fasta
    2504 ESTs for Lycopersicon hirsutum, saved in the file Lycopersicon_hirsutum.fasta
    8346 ESTs for Lycopersicon pennellii, saved in the file Lycopersicon_pennellii.fasta

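    To distinguish these three genotypes, we modified the EST IDs in the fasta files. After pre-processing, every EST ID in Lycopersicon_esculentum.fasta carries the prefix "A_", every EST ID in Lycopersicon_hirsutum.fasta the prefix "C_", and every EST ID in Lycopersicon_pennellii.fasta the prefix "B_".

    This was done by running the following perl commands (find/replace
    regular expressions) in a UNIX shell. For each file, the first command
    replaces the leading ">gi|" of every header line with the new prefix, and
    the second replaces the first remaining "|" on each line with a space, so
    that the prefixed number stands alone as the EST ID: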

    $ perl -p -i -e 's/^\>gi\|/\>A_/' Lycopersicon_esculentum.fasta
    $ perl -p -i -e 's/\|/ /' Lycopersicon_esculentum.fasta

    $ perl -p -i -e 's/^\>gi\|/\>C_/' Lycopersicon_hirsutum.fasta
    $ perl -p -i -e 's/\|/ /' Lycopersicon_hirsutum.fasta

    $ perl -p -i -e 's/^\>gi\|/\>B_/' Lycopersicon_pennellii.fasta
    $ perl -p -i -e 's/\|/ /' Lycopersicon_pennellii.fasta
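
    The effect of the two substitutions can be checked on a small test file
    before touching the full datasets. The sketch below is only an
    illustration: the file sample.fasta, its two records and their
    accession numbers are made up, and the header layout
    (">gi|number|gb|accession| description") is assumed to match the
    downloaded files.

```shell
# Work in a scratch directory so no real data is modified.
tmp=$(mktemp -d)

# Hypothetical two-record sample; IDs and sequences are invented.
cat > "$tmp/sample.fasta" <<'EOF'
>gi|12345|gb|AA000001.1| hypothetical EST one
ACGTACGTAC
>gi|67890|gb|AA000002.1| hypothetical EST two
TTGACCATGA
EOF

# The same two substitutions as above, using the prefix "A_".
perl -p -i -e 's/^\>gi\|/\>A_/' "$tmp/sample.fasta"
perl -p -i -e 's/\|/ /' "$tmp/sample.fasta"

# Count all header lines and the header lines that carry the prefix;
# after pre-processing the two numbers should be equal.
total=$(grep -c '^>' "$tmp/sample.fasta")
prefixed=$(grep -c '^>A_' "$tmp/sample.fasta")
echo "headers: $total, prefixed: $prefixed"

# Show the first rewritten header line.
head -n 1 "$tmp/sample.fasta"
```

    The same grep -c '^>' count, run on one of the real fasta files, gives
    the number of ESTs in that file, which can be compared with the totals
    listed above.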