Conserved Ortholog Set (COS) candidates.
The general strategy to identify COS candidates was:
- Selection of the ESTs with the best match to each translated Arabidopsis ORF.
- BLAST (blastx) search of selected ESTs against Arabidopsis ORFs.
- Selection of ESTs with a single hit to Arabidopsis using tcl_blast_parser_123.tcl.
- Clustering analysis using the Graph9 program to remove all EST-Arabidopsis clusters with multiple Arabidopsis nodes from the potential COS set.
Clustering parameters were: Expect cutoff 1e-10, Identity cutoff 20% and Overlap cutoff 50 amino acids.
- The final set, made up of clusters in which the Arabidopsis gene is represented by a single node, can be considered a true Conserved Ortholog Set (COS).
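The clustering filter described above can be sketched roughly as follows. This is only an illustration, not the Graph9 implementation: it assumes each BLAST hit is available as a dictionary with hypothetical keys `query`, `subject`, `expect`, `identity` and `overlap`, treats every hit that passes the three cutoffs as an edge, groups nodes into connected components, and keeps only the components that contain exactly one Arabidopsis node. The identifiers in the example data are made up.

```python
from collections import defaultdict

# Cutoffs quoted in the text; the field names used below are assumptions.
MAX_EXPECT = 1e-10    # Expect cutoff
MIN_IDENTITY = 20.0   # Identity cutoff, percent
MIN_OVERLAP = 50      # Overlap cutoff, amino acids


def passes_cutoffs(hit):
    """Return True if a hit satisfies all three clustering cutoffs."""
    return (hit["expect"] <= MAX_EXPECT
            and hit["identity"] >= MIN_IDENTITY
            and hit["overlap"] >= MIN_OVERLAP)


def find(parent, node):
    """Union-find lookup with path halving."""
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node


def cluster(hits):
    """Group sequences into connected components over the accepted hits."""
    parent = {}
    for hit in hits:
        if not passes_cutoffs(hit):
            continue
        a, b = hit["query"], hit["subject"]
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b
    clusters = defaultdict(set)
    for node in parent:
        clusters[find(parent, node)].add(node)
    return list(clusters.values())


def cos_candidates(hits, is_arabidopsis):
    """Keep only clusters with exactly one Arabidopsis node."""
    return [c for c in cluster(hits)
            if sum(1 for n in c if is_arabidopsis(n)) == 1]


# Hypothetical example data (identifiers are illustrative only).
hits = [
    {"query": "EST1", "subject": "At1g01010", "expect": 1e-30, "identity": 45.0, "overlap": 120},
    {"query": "EST2", "subject": "At1g01010", "expect": 1e-22, "identity": 38.0, "overlap": 90},
    {"query": "EST3", "subject": "At2g02020", "expect": 1e-25, "identity": 40.0, "overlap": 100},
    {"query": "EST3", "subject": "At2g03030", "expect": 1e-15, "identity": 35.0, "overlap": 80},
    {"query": "EST4", "subject": "At3g04040", "expect": 1e-5, "identity": 30.0, "overlap": 60},
]
result = cos_candidates(hits, lambda name: name.startswith("At"))
```

In this toy input, EST3 links two Arabidopsis genes, so its cluster is dropped, and the EST4 hit fails the Expect cutoff and never enters the graph; only the cluster around At1g01010 remains.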
So far, we have identified 1130 potential COS markers for Lettuce, 426 for Sunflower, 1860 for Tomato and 1413 for Corn with 2185 Arabidopsis sequences. These numbers correspond to EST sequences with BLAST expectation value 1e-20 or better from the COS Table at CGPDB.
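As a rough illustration of how such per-species counts could be derived, the sketch below filters rows at the 1e-20 threshold and tallies ESTs by species. The row layout (species, EST identifier, Arabidopsis identifier, expectation value) is an assumption made for this example and is not the actual format of the COS Table; the rows shown are invented.

```python
from collections import Counter


def count_markers(rows, max_expect=1e-20):
    """Count ESTs per species and collect matched Arabidopsis IDs.

    Each row is assumed to be (species, est_id, arabidopsis_id, expect).
    """
    counts = Counter()
    arabidopsis_ids = set()
    for species, est_id, arabidopsis_id, expect in rows:
        if expect <= max_expect:  # "1e-20 or better" means smaller or equal
            counts[species] += 1
            arabidopsis_ids.add(arabidopsis_id)
    return counts, arabidopsis_ids


# Hypothetical rows for illustration only.
rows = [
    ("Lettuce", "EST_L1", "At1g01010", 1e-35),
    ("Lettuce", "EST_L2", "At1g02020", 1e-12),  # too weak, excluded
    ("Tomato", "EST_T1", "At1g01010", 1e-20),   # exactly at the threshold
]
counts, arabidopsis_ids = count_markers(rows)
```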