Contig Viewer and Arabidopsis Assembly

The Arabidopsis EST assembly was built from a relatively small set of RAFL ESTs (61,281 sequences) downloaded from NCBI (as of May 2002).

CAP3 was used with default parameters:
overlap length cutoff - 40 nt
overlap percent identity cutoff - 80%
clipping range - 250 nt
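
A run with the cutoffs listed above could be scripted roughly as follows. This is a minimal sketch, not the command actually used for this assembly: it assumes a `cap3` executable on the PATH and an uncompressed input file, and the helper name `build_cap3_command` is made up for illustration. The `-o`, `-p` and `-y` options are CAP3's flags for overlap length cutoff, overlap percent identity cutoff and clipping range; they are passed explicitly here because the built-in defaults differ between CAP3 releases.

```python
import shlex


def build_cap3_command(fasta_path, overlap_len=40, identity=80, clip_range=250):
    """Return a CAP3 argument list (hypothetical helper, for illustration)."""
    return [
        "cap3", fasta_path,          # assumes `cap3` is on the PATH
        "-o", str(overlap_len),      # overlap length cutoff, nt
        "-p", str(identity),         # overlap percent identity cutoff
        "-y", str(clip_range),       # clipping range, nt
    ]


if __name__ == "__main__":
    # The input file name is an assumption; adjust it to the local copy.
    cmd = build_cap3_command("RAFL_CA_HTrm.fasta")
    print(shlex.join(cmd))
    # To run the assembly, pass `cmd` to subprocess.run() and redirect
    # stdout to a file, since CAP3 prints the detailed assembly report there.
```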

The assembly produced 9,487 contigs, which were then analyzed with the DIS pipeline. Input and output files are available for download:

RAFL_CA_HTrm.fasta.gz - set of 61,281 RAFL ESTs
RAFL_CA_HTrm.cap.contigs.gz - 9,487 contigs
RAFL_CA_HTrm_CAP3.out.gz - CAP3 output with detailed information about assembly
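
After downloading, the sequence counts quoted for the files listed above can be checked by counting FASTA header lines. The sketch below assumes that both the EST set and the contigs file are gzip-compressed FASTA (CAP3 writes its `.cap.contigs` output in FASTA format); the function name `count_fasta_records` is made up for illustration.

```python
import gzip
import sys


def count_fasta_records(path):
    """Count header lines (starting with '>') in a gzip-compressed FASTA file."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))


if __name__ == "__main__":
    # Example: python count_fasta.py RAFL_CA_HTrm.fasta.gz RAFL_CA_HTrm.cap.contigs.gz
    for path in sys.argv[1:]:
        print(path, count_fasta_records(path))
```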

RAFL_CA_HTrm_CAP3.out has been processed by the DIS pipeline (see steps 10, 11 and 12). All 9,487 alignments in CAP3 format can be downloaded here (Ath_Alignments.tar.gz file).

Output files with information about polymorphic sites have been examined, and some interesting examples were found and analyzed with Contig Viewer:

ATH_Contig5584.align - possibly alternatively spliced ESTs in the assembly (see graphical output here)

ATH_Contig660.align - set of paralogs assembled into one contig (see graphical output here)

ATH_Contig5295.align - another interesting case (see graphical output here)

Note that this Arabidopsis assembly represents a single genotype. In this case the script can detect only so-called "partial" substitutions.
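
To illustrate the general idea of a polymorphic site in a contig alignment, the sketch below flags columns where at least two different bases are each supported by several reads. It is not the DIS pipeline script and does not attempt to reproduce its definition of "partial" substitutions, which is not given on this page. The input format (one aligned, padded string per EST), the function name `polymorphic_columns` and the `min_minor` threshold are all assumptions made for the example.

```python
from collections import Counter


def polymorphic_columns(rows, min_minor=2):
    """Yield (column_index, base_counts) for alignment columns in which
    at least two different bases each occur in `min_minor` or more reads.

    `rows` is a list of aligned read strings; any character other than
    A, C, G or T (gaps, padding, N) is ignored when counting.
    """
    width = max((len(r) for r in rows), default=0)
    for col in range(width):
        counts = Counter(
            r[col].upper()
            for r in rows
            if col < len(r) and r[col].upper() in "ACGT"
        )
        # Bases seen in enough reads to be unlikely sequencing errors
        # (the threshold is a hypothetical choice, not a pipeline setting).
        supported = [base for base, n in counts.items() if n >= min_minor]
        if len(supported) >= 2:
            yield col, dict(counts)


if __name__ == "__main__":
    toy_alignment = ["ACGTAC", "ACGTAC", "ACTTAC", "ACTTAC", "ACGTAA"]
    for col, counts in polymorphic_columns(toy_alignment):
        print(col, counts)
```

In the toy alignment, the third column is reported because both G and T are seen in at least two reads, while the single A in the last column is treated as a likely sequencing error and ignored.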

mailto: Alexander Kozik
last modified: March 08 2004