TreeComplexity

Estimation of genetic text complexity. Construction and visualization of a context tree source (variable memory Markov model, VMM)

DNA sequences:

Standard alphabet {A,T,G,C}

2-letter alphabets:

Weak/Strong [AT][GC]

Purine/Pyrimidine [AG][TC]

Amino acid sequences:

2-letter alphabet (hydrophobic/hydrophilic)

3-letter charge alphabet (basic/neutral/acidic)

3-letter surface alphabet (outer/ambivalent/inner)

(For example, hydrophobic [AILMFPWV]=0, hydrophilic [RNDCQEGHKSTY]=1)
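The reduced alphabets listed above amount to replacing each letter by the symbol of its group. A minimal Python sketch follows, using only the groups named on this form. The function names, the W/S and R/Y output symbols, and the choice to skip letters that belong to no group are assumptions made for illustration, not the server's documented behavior.

```python
# Group definitions copied from the form; the output symbols are illustrative.
DNA_WEAK_STRONG = {"W": "AT", "S": "GC"}
DNA_PURINE_PYRIMIDINE = {"R": "AG", "Y": "TC"}
PROTEIN_HYDROPHOBICITY = {"0": "AILMFPWV", "1": "RNDCQEGHKSTY"}


def build_table(groups):
    """Map every member letter to the symbol of its group."""
    return {letter: symbol
            for symbol, letters in groups.items()
            for letter in letters}


def recode(sequence, groups):
    """Rewrite a sequence in a reduced alphabet.

    Assumption: letters outside all groups (e.g. N or X) are skipped.
    """
    table = build_table(groups)
    return "".join(table[ch] for ch in sequence.upper() if ch in table)


print(recode("GATTACA", DNA_WEAK_STRONG))        # SWWWWSW
print(recode("GATTACA", DNA_PURINE_PYRIMIDINE))  # RRYYRYR
print(recode("mkv", PROTEIN_HYDROPHOBICITY))     # 010
```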

Text in user-defined alphabet

(Type groups of DNA or amino acid symbols in brackets, like [at][gc] or [AILMFPWV][RNDCQEGHKSTY]; input is case-insensitive)

Legend for user-defined alphabet (by default, digits 01234... are used in the output)

(Type one symbol per group; for example, for [at][gc]: +- or WS)
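The bracket notation and legend described above can be read as a small specification: each bracketed group becomes one output symbol, taken from the legend or, by default, from the digits 0, 1, 2, and so on. The sketch below shows one way to parse it in Python. The function name `parse_alphabet` and its error handling are hypothetical and are not taken from the server.

```python
import re


def parse_alphabet(spec, legend=None):
    """Turn a spec such as '[at][gc]' into a letter -> symbol table.

    Hypothetical helper: matching is case-insensitive, and without a
    legend the groups are labelled 0, 1, 2, ... as the form describes.
    """
    groups = re.findall(r"\[([^\[\]]+)\]", spec)
    if not groups:
        raise ValueError("no bracketed groups found")
    if legend is None:
        symbols = [str(i) for i in range(len(groups))]
    else:
        symbols = list(legend)
    if len(symbols) != len(groups):
        raise ValueError("legend needs exactly one symbol per group")
    table = {}
    for symbol, letters in zip(symbols, groups):
        for letter in letters.upper():
            table[letter] = symbol
    return table


print(parse_alphabet("[at][gc]"))
print(parse_alphabet("[AT][GC]", legend="WS"))
```

Storing the table with upper-case keys means a sequence only needs to be upper-cased once before lookup.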

Input sequences here (FASTA format or plain text)

from Screen (cut & paste)...

or from File:
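Since the form accepts either FASTA or plain text, a reader has to handle both. The following Python sketch is one way to do that. The function name `read_sequences` and the "unnamed" label for text without a header line are assumptions for illustration only.

```python
def read_sequences(text):
    """Return a list of (name, sequence) pairs from FASTA or plain text.

    Lines starting with '>' open a new record; all other non-empty lines
    are joined into the current sequence. Text with no header line is
    returned as a single record named 'unnamed' (an assumption).
    """
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None or chunks:
                records.append((name or "unnamed", "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None or chunks:
        records.append((name or "unnamed", "".join(chunks)))
    return records


fasta = ">seq1 test\nACGT\nTTGA\n>seq2\nGG"
print(read_sequences(fasta))
print(read_sequences("acgt\nacgt"))
```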

Preceding context length (1<n<12)
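The preceding context length n is the number of symbols used to predict the next one. As a rough illustration, the Python sketch below counts which symbol follows each context of length n in a sequence. The function name `context_counts` is hypothetical, and the range check simply mirrors the limits shown on the form. How the server itself builds and prunes its context tree is not described here.

```python
from collections import Counter, defaultdict


def context_counts(sequence, n):
    """Count the symbols that follow every length-n context.

    Returns a dict: context string -> Counter of following symbols.
    """
    if not 1 < n < 12:
        raise ValueError("context length must satisfy 1 < n < 12")
    counts = defaultdict(Counter)
    for i in range(n, len(sequence)):
        context = sequence[i - n:i]   # the n symbols before position i
        counts[context][sequence[i]] += 1
    return counts


counts = context_counts("ACGACGT", 2)
for context in sorted(counts):
    print(context, dict(counts[context]))
```

In this example the context AC is followed by G twice, while CG is followed once by A and once by T.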

Method of pseudocount calculation for absent contexts:

Default: +0.5 for each absent context

+1 count

Old variant

No pseudocounts
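Pseudocounts keep a symbol that was never observed after a given context from receiving probability zero. The exact formula used by the server is not stated on this page, so the Python sketch below shows only one plausible reading: each zero count is replaced by the pseudocount (0.5 by default) before the counts are normalized. The function name `smoothed_probabilities` is hypothetical, and the "Old variant" option is not modeled.

```python
def smoothed_probabilities(counts, alphabet, pseudo=0.5):
    """Estimate P(symbol | context) from the counts for one context.

    Assumption: only symbols with a zero count receive the pseudocount.
    Use pseudo=1 for a '+1 count' style, or pseudo=0 for no pseudocounts.
    """
    adjusted = {}
    for symbol in alphabet:
        observed = counts.get(symbol, 0)
        adjusted[symbol] = observed if observed > 0 else pseudo
    total = sum(adjusted.values())
    if total == 0:
        raise ValueError("context never observed and no pseudocounts given")
    return {symbol: value / total for symbol, value in adjusted.items()}


observed = {"A": 3, "C": 1}   # G and T were never seen after this context
print(smoothed_probabilities(observed, "ACGT"))            # default +0.5
print(smoothed_probabilities(observed, "ACGT", pseudo=0))  # no pseudocounts
```

With the default setting the adjusted counts are 3, 1, 0.5 and 0.5, which sum to 5, so G and T each receive probability 0.1 instead of 0.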

Text output of the tree source (Optimized variable memory Markov model for VMM software)

Graphic output

Tree type: Standard tree or Round tree

Letters in image (uncheck if a small image has no room for letters)

Width of picture (in pixels, 100<x<2048)

Height of picture (in pixels, 100<y<1024)

Help Example Publications

Related algorithms: Variable memory Markov model prediction, LowComplexity regions search, Lempel-Ziv complexity

Other related algorithms for genetic text analysis: OligoRep system, Complexity profile, Verbumculus

This resource was developed at the Institute of Cytology and Genetics, Novosibirsk, Russia
Authors: Yu.L.Orlov, V.P.Filippov, V.N.Potapov
Contributors: S.V.Lavryushev, D.A.Grigorovich
Leader: N.A.Kolchanov

The research was partially supported by the Russian Foundation for Basic Research (RFBR) and INTAS.

Last updated 2005.