
    UC Davis

    SNP/INDEL Discovery Pipeline based on CAP3 assembly
    How to find a thousand polymorphic sites in an EST assembly in 24 steps

    Alexander Kozik, Brian Chan and Richard Michelmore
    University of California at Davis, Department of Vegetable Crops

    One of the aims of the Compositae Genome Project http://compgenomics.ucdavis.edu/ is to generate PCR-based markers for genetic mapping of lettuce and sunflower. The Compositae Genome Project database (CGPDB) http://cgpdb.ucdavis.edu/ represents over 19,000 lettuce and 12,000 sunflower unigenes.

    We have developed a custom pipeline (a set of scripts written in Python and Tcl/Tk) to find SNPs (Single Nucleotide Polymorphisms) and INDELs (INsertions/DELetions) in EST contigs assembled by the CAP3 program.
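
    To illustrate the general idea of calling SNP/INDEL candidates from an assembled contig, here is a minimal Python sketch. It is not the pipeline's actual code: the function name, the input format (reads already padded to equal length, with "-" for alignment gaps and a space where a read does not cover a position) and the `min_reads_per_allele` threshold are all assumptions made for this example.

```python
from collections import Counter


def find_polymorphic_columns(aligned_reads, min_reads_per_allele=2):
    """Report alignment columns that carry two or more well-supported alleles.

    Hypothetical helper for illustration only; the input format and the
    threshold are assumptions, not taken from the pipeline scripts.
    """
    if not aligned_reads:
        return []
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    sites = []
    for col in range(length):
        # Count the symbols seen in this column, skipping uncovered
        # positions (space) and ambiguous base calls (N).
        counts = Counter(
            read[col].upper()
            for read in aligned_reads
            if read[col] not in (" ", "N", "n")
        )
        # Keep only alleles seen in enough reads to be unlikely
        # sequencing errors.
        supported = {a: n for a, n in counts.items() if n >= min_reads_per_allele}
        if len(supported) >= 2:
            kind = "INDEL" if "-" in supported else "SNP"
            sites.append((col, kind, supported))
    return sites


# Toy contig with five aligned reads (made-up data).
reads = [
    "ACGTACGT",
    "ACGTACGT",
    "ACGAACGT",
    "ACGAAC-T",
    "ACGTAC-T",
]
for col, kind, alleles in find_polymorphic_columns(reads):
    print(col, kind, sorted(alleles.items()))
```

    On this toy input the sketch reports a SNP candidate at column 3 (T in three reads, A in two) and an INDEL candidate at column 6 (G in three reads, a gap in two). Requiring each allele to appear in at least two reads is one simple way to discount single-read sequencing errors; a real analysis would also need to take the genotype of each read into account.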

    Using this pipeline, we have found more than 2,500 SNP/INDEL candidates in 12,500 lettuce and sunflower contigs. Click here to view examples. These candidates will be used to generate molecular markers.

    To check whether our pipeline is suitable for any EST dataset, we tested it on tomato ESTs that are publicly available in the NCBI database. We detected about 1,000 SNP/INDEL candidates in 3,821 tomato contigs for three genotypes: Lycopersicon esculentum, Lycopersicon hirsutum and Lycopersicon pennellii.

    The following web pages describe detailed protocols for using our pipeline, with tomato ESTs as an example.

    Note: this pipeline was designed by 2003. Since then, a slightly different approach and improved scripts have been developed. You can check the current protocol of EST selection and SNP discovery here.