SNP/INDEL Discovery Pipeline based on CAP3 assembly
or
How to find a thousand polymorphic sites in an EST assembly in 24 steps
Alexander Kozik, Brian Chan and Richard Michelmore
University of California at Davis, Department of Vegetable Crops
One of the aims of the Compositae Genome Project
http://compgenomics.ucdavis.edu/
is to generate PCR-based markers for genetic mapping of lettuce and sunflower.
The Compositae Genome Project database (CGPDB)
http://cgpdb.ucdavis.edu/
represents over 19,000 lettuce and 12,000 sunflower unigenes.
We have developed a custom pipeline (actually a set of scripts written in Python and Tcl/Tk)
to find SNPs (Single Nucleotide Polymorphisms) and INDELs (INsertions/DELetions) in EST contigs
assembled by the CAP3 program.
Using this pipeline, we have found more than 2,500 SNP/INDEL candidates
in 12,500 lettuce and sunflower contigs.
Click here
to view examples. These candidates will be used to generate molecular markers.
To check whether our pipeline is suitable for any EST dataset, we tested it on tomato ESTs that are publicly
available in the NCBI database. We detected about 1,000 SNP/INDEL candidates in 3,821 tomato
contigs for three genotypes: Lycopersicon esculentum, Lycopersicon hirsutum and
Lycopersicon pennellii.
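To make the idea of a SNP/INDEL candidate concrete, here is a minimal Python sketch. It is not one of the pipeline scripts, and it does not parse CAP3 output. It assumes the EST reads of one contig are already available as equal-length aligned strings, with "-" marking a gap inside a read and any other character (for example "N" or a space) marking an uncovered or ambiguous position. The function name find_candidate_sites and the min_reads_per_allele threshold are hypothetical, chosen only for illustration. A column is reported when at least two different alleles are each supported by the minimum number of reads, which filters out single-read differences that are more likely sequencing errors.

```python
from collections import Counter


def find_candidate_sites(aligned_reads, min_reads_per_allele=2):
    """Return (position, kind, allele_counts) for polymorphic alignment columns.

    aligned_reads: list of equal-length strings over A, C, G, T and '-'.
    Any other character is treated as "no information" and ignored.
    """
    if not aligned_reads:
        return []
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    candidates = []
    for pos in range(length):
        # Count each allele observed in this column.
        counts = Counter(
            read[pos].upper() for read in aligned_reads
            if read[pos].upper() in "ACGT-"
        )
        # Keep only alleles supported by enough reads.
        supported = {
            allele: n for allele, n in counts.items()
            if n >= min_reads_per_allele
        }
        if len(supported) >= 2:
            kind = "INDEL" if "-" in supported else "SNP"
            candidates.append((pos, kind, supported))
    return candidates


# Hypothetical toy contig: two reads per allele, plus one read with a
# single-read difference at position 1 that should be ignored.
reads = [
    "ACGTACGT",
    "ACGTACGT",
    "ACGAAC-T",
    "ACGAAC-T",
    "ATGTACGT",
]
sites = find_candidate_sites(reads)
```

In this toy example, position 3 is reported as a SNP candidate and position 6 as an INDEL candidate, while the lone "T" at position 1 is dropped because only one read supports it.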
The following web pages describe detailed protocols for using our pipeline, with tomato ESTs as an example.
Note: this pipeline was designed in 2003. Since then, a slightly different approach
and improved scripts have been developed. You can check the current protocol for EST selection and SNP discovery
here.