SNP/INDEL discovery pipeline based on CAP3 assembly

Step 1: Understanding the problem and hardware/software requirements

The basic idea of the approach is to assemble sequences from two or more different genotypes together. If a CAP3 assembly includes sequences from two or more genotypes, it contains all the information needed to find polymorphisms between those genotypes.
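After assembly, each read in a contig must still be traceable to its genotype of origin. One simple way to do this is to prefix every FASTA header with a genotype tag before pooling the input files. The minimal Python sketch below shows this. The tag format and the names `tag_fasta`, `pool_genotypes`, `gtA` and `gtB` are hypothetical and are not the DIS pipeline's own convention.

```python
from __future__ import annotations


def tag_fasta(fasta_text: str, genotype: str) -> str:
    """Prefix every FASTA header with a genotype tag (hypothetical convention)."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # ">est1 description" becomes ">gtA_est1 description"
            out.append(f">{genotype}_{line[1:].strip()}")
        else:
            out.append(line)  # sequence lines are copied unchanged
    return "\n".join(out) + "\n"


def pool_genotypes(inputs: dict[str, str]) -> str:
    """Concatenate tagged FASTA text from several genotypes into one CAP3 input."""
    return "".join(tag_fasta(text, genotype) for genotype, text in inputs.items())


if __name__ == "__main__":
    pooled = pool_genotypes({"gtA": ">est1\nACGT\n", "gtB": ">est7\nACGA\n"})
    print(pooled, end="")
```

You can write the pooled text to a file and assemble it with CAP3 (for example `cap3 pooled.fasta`). Reads in each resulting contig can then be assigned to a genotype by their header prefix.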

A simple rule is applied: a polymorphic site in a contig is considered a candidate SNP/INDEL if the variant is shared by all sequences of one genotype and is found only in that genotype.
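The rule can be sketched in Python for a single alignment column of a contig. Here the column is represented as hypothetical `(genotype, base)` pairs, and a gap character `-` counts as an allele, so the same test covers both substitutions and INDELs. This is an illustration of the stated rule only. It is not the DIS pipeline's code, and it applies no quality or depth filters.

```python
from __future__ import annotations

from collections import defaultdict


def is_candidate(column: list[tuple[str, str]]) -> bool:
    """Return True if some allele is carried by every read of one genotype
    and by no read of any other genotype in this alignment column."""
    alleles: dict[str, set[str]] = defaultdict(set)  # genotype -> alleles seen
    for genotype, base in column:
        alleles[genotype].add(base.upper())
    if len(alleles) < 2:
        return False  # at least two genotypes must be present in the contig
    for genotype, observed in alleles.items():
        if len(observed) != 1:
            continue  # reads of this genotype disagree among themselves
        allele = next(iter(observed))
        # the allele must be absent from every other genotype
        if all(allele not in other
               for g, other in alleles.items() if g != genotype):
            return True
    return False


if __name__ == "__main__":
    snp = [("gtA", "A"), ("gtA", "A"), ("gtB", "G"), ("gtB", "G")]
    noisy = [("gtA", "A"), ("gtA", "G"), ("gtB", "G")]
    print(is_candidate(snp), is_candidate(noisy))
```

In the `noisy` column, genotype `gtA` carries both alleles, so the site is not specific to one genotype and is rejected.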

All programs and scripts were run in a UNIX/Linux environment. The user is assumed to have basic UNIX shell skills; no programming knowledge is required. To obtain correct results, follow the step-by-step instructions. Final results can be viewed on any desktop computer. Perl, Python and Tcl/Tk must be installed on the computer where the scripts will be run, as well as NCBI BLAST and the CAP3 assembler. The computer should have at least a 750 MHz CPU and 1 GB of memory.
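Before starting, it can help to confirm that the required programs are reachable from the shell. The Python sketch below uses `shutil.which` to report which executables are missing from `PATH`. The executable names in `REQUIRED_TOOLS` are assumptions and should be adjusted to match your installation.

```python
from __future__ import annotations

import shutil

# Assumed executable names; adjust to your installation:
#   "wish"     - the Tcl/Tk windowing shell
#   "blastall" - the legacy NCBI BLAST driver (BLAST+ installs provide
#                "blastn" and related programs instead)
#   "python"   - may be "python3" on newer systems
REQUIRED_TOOLS = ["perl", "python", "wish", "blastall", "cap3"]


def missing_tools(tools: list[str]) -> list[str]:
    """Return the names from `tools` that are not found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools found on PATH.")
```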

All results for the tomato dataset were obtained in this environment in less than 24 hours. The scripts developed in our lab are freely available for download and use.

If you use our pipeline successfully, please cite:
Python DIS pipeline developed by A. Kozik, B. Chan and R. Michelmore at UCD. http://cgpdb.ucdavis.edu/SNP_Discovery/ (DIS stands for Deletions - Insertions - Substitutions)

Note: this pipeline was designed in 2003. Since then, a slightly different approach and improved scripts have been developed. You can check the current protocol for EST selection and SNP discovery here.