Estimation of genetic text complexity. Construction and visualization of a context tree source (variable memory Markov model, VMM)

DNA sequences:

Standard alphabet {A,T,G,C}

2-lettered alphabets:

Weak/Strong [AT][GC]

Purine/Pyrimidine [AG][TC]

Amino acid sequences:

2-lettered alphabet (hydrophobic/hydrophilic)

3-lettered charge alphabet (base/neutral/acid)

3-lettered surface alphabet (outer/ambivalent/inner)

(For example, hydrophobic [AILMFPWV]=0, hydrophilic [RNDCQEGHKSTY]=1)


Text in user-defined alphabet

(Type groups of DNA or amino acid symbols in brackets, like [at][gc] or [AILMFPWV][RNDCQEGHKSTY]; input is case-insensitive)

Legend for user-defined alphabet (by default, the digits 01234... are used in the output)

(Type one symbol for each group, e.g. for [at][gc]: +- or WS)
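
The bracket notation and legend described above can be illustrated with a short sketch. This is not the tool's own code: the function names `parse_groups` and `recode` are hypothetical, and the choice to skip symbols that belong to no group is an assumption, since the form does not say how such symbols are handled.

```python
import re

def parse_groups(spec, legend=None):
    """Map each symbol in a spec like "[at][gc]" to the label of its group.

    Hypothetical helper: without a legend, groups are labelled 0, 1, 2, ...
    """
    groups = re.findall(r"\[([A-Za-z]+)\]", spec)
    if not groups:
        raise ValueError("no bracketed groups found in %r" % spec)
    labels = list(legend) if legend else [str(i) for i in range(len(groups))]
    if len(labels) != len(groups):
        raise ValueError("legend needs exactly one symbol per group")
    mapping = {}
    for label, group in zip(labels, groups):
        for symbol in group.upper():  # group definitions are case-insensitive
            mapping[symbol] = label
    return mapping

def recode(sequence, mapping):
    """Rewrite a sequence in the reduced alphabet.

    Assumption: symbols that belong to no group (e.g. N) are skipped.
    """
    return "".join(mapping[s] for s in sequence.upper() if s in mapping)

if __name__ == "__main__":
    print(recode("ATGCGC", parse_groups("[at][gc]")))        # digits by default
    print(recode("ATGCGC", parse_groups("[at][gc]", "WS")))  # user legend
```

With the hydrophobic/hydrophilic grouping from the example above, `parse_groups("[AILMFPWV][RNDCQEGHKSTY]")` assigns 0 to hydrophobic and 1 to hydrophilic residues.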


Input sequences here (FASTA format or plain text)

from Screen (cut & paste)...

or from File:
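
As a rough illustration of accepting either FASTA or plain text, the following sketch reads both into (name, sequence) pairs. The function name `read_sequences` and the placeholder name "unnamed" for plain-text input are assumptions made for this example, not behaviour documented for the tool.

```python
def read_sequences(text):
    """Return a list of (name, sequence) pairs from FASTA or plain text.

    Lines starting with ">" open a new record; all other non-empty
    lines are appended to the current sequence.
    """
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Close the previous record, if any, before starting a new one.
            if name is not None or chunks:
                records.append((name or "unnamed", "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None or chunks:
        records.append((name or "unnamed", "".join(chunks)))
    return records

if __name__ == "__main__":
    print(read_sequences(">seq1\nATGC\nGGCC\n>seq2\nTTAA"))
    print(read_sequences("ATGC\nGG"))
```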

Preceding context length (1<n<12)
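
To make the role of the preceding context length n concrete, here is a minimal sketch that counts, for every context of n symbols, which symbol follows it. The function name `context_counts` is hypothetical; the range check simply mirrors the limits shown on the form, and the tool itself may collect its statistics differently.

```python
from collections import Counter, defaultdict

def context_counts(sequence, n):
    """Count next-symbol occurrences after every context of length n."""
    if not 1 < n < 12:
        raise ValueError("context length must satisfy 1 < n < 12")
    counts = defaultdict(Counter)
    for i in range(n, len(sequence)):
        context = sequence[i - n:i]  # the n symbols preceding position i
        counts[context][sequence[i]] += 1
    return counts

if __name__ == "__main__":
    for context, followers in sorted(context_counts("ATATAG", 2).items()):
        print(context, dict(followers))
```

A variable memory model would then keep long contexts only where they change the next-symbol distribution; that pruning step is not shown here.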

Method of pseudocount calculation for absent contexts:

Default: +0.5 for each absent context

+1 count

Old variant

No pseudocounts
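
The effect of the pseudocount options can be sketched with additive smoothing. This is an assumption, not the tool's documented formula: the example adds the same pseudocount k to every symbol count in a context, so k = 0.5 corresponds to the Krichevsky-Trofimov estimator and k = 1 to Laplace smoothing. Whether the tool adds the value to all counts or only to absent ones, and what "Old variant" computes, is not stated on this page. The function name `smoothed_probabilities` is hypothetical.

```python
def smoothed_probabilities(counts, alphabet, pseudocount=0.5):
    """Estimate P(symbol | context) with an additive pseudocount.

    counts:      dict mapping symbol -> observed count in one context
    alphabet:    iterable of all possible symbols
    pseudocount: value added to every count (assumed scheme, see lead-in)
    """
    alphabet = list(alphabet)
    total = sum(counts.get(a, 0) for a in alphabet) + pseudocount * len(alphabet)
    if total == 0:
        raise ValueError("context never observed and pseudocount is 0")
    return {a: (counts.get(a, 0) + pseudocount) / total for a in alphabet}

if __name__ == "__main__":
    observed = {"A": 3, "T": 1}  # G and C never followed this context
    for k in (0.5, 1, 0):
        print(k, smoothed_probabilities(observed, "ATGC", pseudocount=k))
```

With no pseudocount, symbols never seen after a context receive probability zero, which is why some nonzero default is offered.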

Text output of the tree source (Optimized variable memory Markov model for VMM software)


 Graphic output

Tree type: Standard tree or Round tree

Letters in image (uncheck if a small image has no room for letters)

Width of picture (in pixels, 100<x<2048)      Height of picture (in pixels, 100<y<1024)

       Example      Publications   

Related algorithms: Variable memory Markov model prediction, Low-complexity region search, Lempel-Ziv complexity

Other related algorithms for genetic text analysis: OligoRep system, Complexity profile, Verbumculus

The Institute of Cytology and Genetics (Russia)

This resource was developed at the Institute of Cytology and Genetics, Novosibirsk, Russia
Authors: Yu.L.Orlov, V.P.Filippov, V.N.Potapov
Contributors: S.V.Lavryushev, D.A.Grigorovich
Leader: N.A.Kolchanov

The research was partially supported by the Russian Foundation for Basic Research (RFBR) and INTAS.

Last update 2005.