Python and Tcl/Tk scripts and tools to process and analyze DNA sequences and related data

GenBank2Fasta_UniExtractor_124.tcl - GenBank to FASTA file converter; besides extracting sequences, this parser extracts additional useful information from the GenBank file and places it into the FASTA header.
GenBank2Fasta_UniExtractor_126.tcl - current version, minor bug fixes.

seqs_processor_and_translator_bin_V124_AGCT.py - DNA sequence processor and translator; it translates sequences in all six reading frames in batch mode. A brief description is here.
seqs_processor_and_translator_bin_V126_AGCT.py - current version; it adds a new function that splits sequences into multiple FASTA files.
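Six-frame translation means translating the forward strand starting at offsets 0, 1 and 2, then doing the same on the reverse complement. The Python sketch below only illustrates that idea; it is not the code of seqs_processor_and_translator_bin_V126_AGCT.py. It assumes the standard genetic code (NCBI translation table 1), the function names reverse_complement, translate and six_frames are hypothetical, and any codon containing a base other than A, C, G or T is translated as 'X'.

```python
# Minimal six-frame translation sketch (hypothetical helper names;
# assumes the standard genetic code, NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the 64-entry codon table in TCAG order: TTT, TTC, TTA, TTG, TCT, ...
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq: str) -> dict:
    """Translate frames +1..+3 (forward) and -1..-3 (reverse complement)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frames("ATGGCCTAA").items():
        print(name, protein)
```

In batch mode the same six_frames call would simply be applied to every record of a FASTA file.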

tcl_blast_parser_123_V038.tcl - NCBI BLAST parser. A detailed description is here.
tcl_blast_parser_123_V039.tcl - current version
tcl_blast_parser_123_V041.tcl - current version, used to find the common query overlap

SeqsExtractorFromBlastX_V124.py - extraction of ORFs (open reading frames) from a BLAST-X report: BLAST EST sequences against a protein reference database, then extract the EST fragments that correspond to the BLAST-X alignments.
SeqsExtractorFromBlastX_V126.py - current version (with no_hits counter).

SeqsExtractorFromTclBlast_V001.py - extraction of a sub-region from a BLAST report (BLAST-X) when the hit ID matches the query ID.

seqs_subgroup_extr_001.py - sequence subgroup extractor (1)
seqs_subgroup_extr_003.py - sequence subgroup extractor (3)
Both extract a subset of sequences from a FASTA file based on a list of gene IDs: version (1) extracts full-length sequences; version (3) extracts a defined fragment.
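The two extraction modes can be sketched in Python as a single filter over FASTA records. This is a hypothetical illustration, not the code of seqs_subgroup_extr_001.py or seqs_subgroup_extr_003.py: it assumes the gene ID is the first word of the FASTA header, and that a fragment is given as a 1-based, inclusive (start, end) pair. The names parse_fasta and extract_subset are made up for this example.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Hypothetical region spec: None = full sequence, (start, end) = 1-based, inclusive.
Region = Optional[Tuple[int, int]]


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header is returned without '>'."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def extract_subset(lines: Iterable[str], wanted: Dict[str, Region]) -> List[Tuple[str, str]]:
    """Keep records whose ID (first word of the header) is a key of `wanted`."""
    selected = []
    for header, seq in parse_fasta(lines):
        words = header.split()
        seq_id = words[0] if words else ""
        if seq_id not in wanted:
            continue
        region = wanted[seq_id]
        if region is not None:
            start, end = region
            seq = seq[start - 1:end]  # convert 1-based inclusive coordinates to a slice
        selected.append((header, seq))
    return selected


if __name__ == "__main__":
    demo = [">g1 desc", "ACGTACGT", ">g2", "TTTT", ">g3", "GGGCCC"]
    # g1: full-size sequence (like version 1); g3: positions 2..4 (like version 3)
    for header, seq in extract_subset(demo, {"g1": None, "g3": (2, 4)}):
        print(f">{header}\n{seq}")
```

Passing None for every ID reproduces full-size extraction; passing coordinates selects a defined fragment.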

seqs_drobilka_003_mod.py - splits sequences into overlapping fragments.
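Splitting into overlapping fragments is a sliding window whose step is the fragment size minus the overlap. The Python sketch below shows that idea under assumptions of its own: the function name split_overlapping and its size/overlap parameters are hypothetical and do not describe the options of seqs_drobilka_003_mod.py. The last fragment is allowed to be shorter than the requested size.

```python
from typing import List, Tuple


def split_overlapping(seq: str, size: int, overlap: int) -> List[Tuple[int, str]]:
    """Split `seq` into fragments of `size` bases that share `overlap` bases.

    Returns (1-based start, fragment) pairs; the last fragment may be shorter.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not seq:
        return []
    step = size - overlap
    fragments = []
    start = 0
    while True:
        fragments.append((start + 1, seq[start:start + size]))
        if start + size >= len(seq):  # this window already reaches the end of seq
            break
        start += step
    return fragments


if __name__ == "__main__":
    for pos, frag in split_overlapping("AACCGGTTAC", size=4, overlap=1):
        print(pos, frag)
```

Stopping as soon as a window reaches the end avoids emitting a trailing fragment that lies entirely inside the previous one.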

seqs_trimmer_2007_03_20.py - EST sequence trimmer. It's weird; use it at your own risk.

seqs_processor_ultra_polyA_V009.py - sequence masking based on a BLAST-N search against the Vector_M_PolyAAA.fasta vector database. It's weird too; use it at your own risk.

redundancy_elimination_005.py - redundancy elimination for sequences in a FASTA file, by Travis Kleeburg. Read more here.

qsep_002M.py - quality scores extractor from Phred output and trimmed sequences

Scripts to process CAP3 alignments:
Python_CAP3_ContigExtractor_Uni_2007_03_19.py
Python_CAP3_MM_Finder_Uni_2007_03_19.py
Python_CAP3_MM_Finder_Uni_2007_03_24f.py - current experimental version
Python_CAP3_MM_Finder_Uni_2007_03_24h.py - current experimental version
Python_CAP3_contig_poly_DIS_Uni_2007_03_19.py
Python_CAP3_ClipInfoExtractor_Uni_2007_03_19.py
A detailed description is here.

Manipulating CAP3 derivative files:
getcontig.py - post-processing of the so-called CAP3 Info file produced by the Python_CAP3_ContigExtractor_Uni_2007_03_19.py script
countContig.py - estimation of CAP3 contig complexity based on the CAP3 Info file produced by the Python_CAP3_ContigExtractor_Uni_2007_03_19.py script
Read more here.

SequenceTrimmer.py - trims low-quality regions from CAP3 alignments
A detailed description is here.

Scripts for Genetic Maps
addDuplMarker.py - add duplicated markers to non-redundant map
Instructions are here

MadMapper - current versions:
Python_MadMapper_V248_RECBIT_012NR.py - clustering
Python_MadMapper_V248_XDELTA_117.py - map construction
Python_MadMapper_V248_XDELTA_119.py - map construction (current version; variable column ID with pairwise data)
py_matrix_2D_V248_RECBIT.py - map visualization
MadMapper details here

MadMapper clustering based on numerical data
Python_UniCluster_V011.py - really 'beta' ...

Scripts to manipulate tab-delimited tables
tableRotation_2007_03_21.py
tableSort_2007_03_21.py
Read more here
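Table rotation is a transpose (rows become columns), and sorting reorders data rows by one column. The Python sketch below illustrates both operations on tab-delimited text; it is not the code of tableRotation_2007_03_21.py or tableSort_2007_03_21.py, and the function names, the zero-based column index, the optional header row and the padding of ragged rows with empty fields are all assumptions made for this example.

```python
from typing import List


def _read_rows(text: str) -> List[List[str]]:
    """Split tab-delimited text into rows of fields, skipping blank lines."""
    return [line.split("\t") for line in text.splitlines() if line]


def _write_rows(rows: List[List[str]]) -> str:
    """Join rows back into tab-delimited text, one line per row."""
    return "".join("\t".join(row) + "\n" for row in rows)


def rotate_table(text: str) -> str:
    """Transpose a tab-delimited table: rows become columns."""
    rows = _read_rows(text)
    width = max((len(row) for row in rows), default=0)
    padded = [row + [""] * (width - len(row)) for row in rows]  # pad ragged rows
    return _write_rows([list(col) for col in zip(*padded)])


def sort_table(text: str, column: int, numeric: bool = False, header: bool = True) -> str:
    """Sort data rows by one zero-based column, keeping an optional header row on top."""
    rows = _read_rows(text)
    head, body = (rows[:1], rows[1:]) if header else ([], rows)
    if numeric:
        body.sort(key=lambda row: float(row[column]))
    else:
        body.sort(key=lambda row: row[column])
    return _write_rows(head + body)


if __name__ == "__main__":
    table = "id\tval\nb\t10\na\t2\n"
    print(rotate_table(table), end="")
    print(sort_table(table, column=1, numeric=True), end="")
```

The numeric flag matters for columns of numbers: as text, "10" sorts before "2", while numerically 2 comes first.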

Pixelirator - graphical data display for tab-delimited tables

Scripts for Affymetrix Chip design
seqs_processor_and_translator_bin_V027_AGCT_Affy_V05.py - generates the Affy submission
seqs_processor_and_translator_bin_V027_AGCT_N2A.py - converts 'N' to 'A' in a FASTA file
AffyProbeSetSorter-006.py
TkLife_Search_07M_Affy_05_off1_100L_ContigViewerTest.tcl
TkLife_Search_12M_AffySuper_25_off1_300L_025_035_25M.tcl
z-xlog-run-affy-chip.txt
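The 'N' to 'A' conversion listed above can be sketched in Python as a line filter that changes only sequence lines, so that header lines (and any IDs containing the letter N) are left untouched. This is not the code of seqs_processor_and_translator_bin_V027_AGCT_N2A.py; the function name n_to_a, the script name n_to_a.py in the usage comment, and the conversion of lower-case 'n' to 'a' are assumptions made for this example.

```python
import sys
from typing import Iterable, Iterator


def n_to_a(lines: Iterable[str]) -> Iterator[str]:
    """Replace N/n with A/a in sequence lines; FASTA header lines pass through."""
    for line in lines:
        if line.startswith(">"):
            yield line
        else:
            # Assumption: lower-case (soft-masked) 'n' is converted to 'a' as well.
            yield line.replace("N", "A").replace("n", "a")


if __name__ == "__main__":
    # Hypothetical usage: python n_to_a.py < input.fasta > output.fasta
    sys.stdout.writelines(n_to_a(sys.stdin))
```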


email: akozik@atgc.org
last modified: May 14 2007