Complexity decomposition of genetic texts by a modified Lempel-Ziv method

 

To search for low-complexity regions by other methods (entropy estimates, linguistic complexity), use the new LowComplexity program

To analyze DNA sequences only, use the smart form here

To analyze a complexity profile, use the LZprofile program

 

Complete menu of program parameters:

DNA   Standard alphabet {A,T,G,C}                       

Reduced DNA alphabets:   Weak/Strong [AT][GC];   Purine/Pyrimidine [AG][TC];   Amino/Keto [AC][TG]

 

Amino acid sequences: Standard 20-letter alphabet {AILMFPWVRNDCQEGHKSTY}

Simplified amino acid alphabets:

2-letter alphabet (hydrophobic/hydrophilic)            (i.e. [AILMFPWV] and [RNDCQEGHKSTY])

3-letter surface alphabet (outer/ambivalent/inner)   (i.e. [RNDQEHK], [ACGPSTWY] and [ILMFV])

(Please uncheck search parameters 'Invert Copy' and 'Complementary Copy' for non-standard alphabets).

Text in user-defined alphabet

(Type DNA or text symbol groups in brackets, such as [at][gc], 0123, or [0][12]3; input is not case-sensitive.)
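The bracket notation can be read as "symbols inside one pair of brackets are treated as the same letter". The following is a minimal sketch of that reading, not the program's own parser; the function names `parse_alphabet` and `recode` are hypothetical, and the handling of bare symbols (each one forms its own group) is an assumption based on the examples in the form.

```python
import re

def parse_alphabet(spec):
    """Map each symbol to a group index; bracketed symbols share one group."""
    mapping = {}
    # A token is either a bracketed group such as [at] or a single bare symbol.
    tokens = re.findall(r"\[([^\[\]]+)\]|(\S)", spec.lower())
    for index, (bracketed, bare) in enumerate(tokens):
        for symbol in bracketed or bare:
            mapping[symbol] = index
    return mapping

def recode(sequence, mapping):
    """Rewrite a sequence as a list of group indices (case-insensitive)."""
    # Symbols missing from the alphabet raise KeyError in this sketch.
    return [mapping[symbol] for symbol in sequence.lower()]

weak_strong = parse_alphabet("[at][gc]")
print(recode("ATGC", weak_strong))
```

With the Weak/Strong alphabet, A and T fall into one group and G and C into the other, so sequences that differ only within a group become identical after recoding.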

Non-standard complementary function for user-defined alphabet (default atgc->tacg) 

(By default A<->T, G<->C; use another function only for special estimations. Type the symbols in the appropriate order, e.g. tacg for atgc (atgc->tacg) or 1032 for 0123 (0123->1032).)
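A complementary function given in this form is a symbol-by-symbol substitution: the i-th symbol of the alphabet string maps to the i-th symbol typed by the user. A small sketch of that idea follows; the helper names `make_complement`, `complement`, and `inverted_complement` are hypothetical and are not taken from the program.

```python
def make_complement(alphabet="atgc", images="tacg"):
    """Build a translation table: alphabet[i] -> images[i], in both cases."""
    if len(alphabet) != len(images):
        raise ValueError("alphabet and images must have the same length")
    return str.maketrans(alphabet + alphabet.upper(), images + images.upper())

def complement(sequence, table):
    """Apply the complementary function symbol by symbol."""
    return sequence.translate(table)

def inverted_complement(sequence, table):
    """Complement the sequence and read it in the opposite direction."""
    return sequence.translate(table)[::-1]

default_table = make_complement()                  # atgc -> tacg
numeric_table = make_complement("0123", "1032")    # 0123 -> 1032
print(complement("atgc", default_table))
print(complement("0123", numeric_table))
```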

 

Input sequences here (FASTA format or plain text) (cut & paste)

or from File:

Calculation method:

profile - local complexity in sliding window (default)

Profile parameters: Sliding window size    Profile step (shift of sliding window)


Decompose whole file by repeated fragments (sequences joined)   

Decompose each sequence in the file (sequences not joined)

Analyze each sequence by all other sequences in the set (construct table; FASTA format required)

Compute complexity vector (table of complexity values for all variants of the complementary function)

 (cannot be used together with the best non-standard complementary function option)

 

Analyze sequence(s) by other sequence(s) (second file required)

(If you want to analyze all sequences by all others, please join them in one file and use the all-sequences option above.)
Input sequences here (FASTA format or plain text) (cut & paste)

or from File:
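The calculation methods above rest on two ideas: splitting a sequence into components that are copies of earlier fragments, and repeating that count in a sliding window. The sketch below illustrates both under stated assumptions. It uses direct copies only, allows a copy to overlap the position being parsed, and takes the number of components as the complexity value; the exact rules of the modified Lempel-Ziv method used by the program are not given on this page. The names `lz_decompose` and `complexity_profile` are hypothetical.

```python
def lz_decompose(seq):
    """Greedy parse into components.

    Each component is the longest prefix of the unparsed suffix that also
    occurs starting at an earlier position (direct copy); if there is no
    such prefix, the component is a single new symbol.
    """
    components = []
    i, n = 0, len(seq)
    while i < n:
        length = 0
        # Extend the candidate while it has an occurrence starting before i.
        # An occurrence of length+1 symbols starting before i ends by i+length.
        while i + length < n and seq.find(seq[i:i + length + 1], 0, i + length) != -1:
            length += 1
        step = max(length, 1)          # a new symbol forms its own component
        components.append(seq[i:i + step])
        i += step
    return components

def complexity_profile(seq, window, step):
    """Return (window start, number of components) for each sliding window."""
    return [
        (start, len(lz_decompose(seq[start:start + window])))
        for start in range(0, len(seq) - window + 1, step)
    ]

print(lz_decompose("atatatgc"))
print(complexity_profile("aaaaatgc", window=4, step=2))
```

Windows that consist mostly of repeats parse into few components, so low values in the profile point to low-complexity regions. Decomposing a joined file versus each sequence separately differs only in what is passed as `seq`.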

 

    Copying operations used during sequence generation:

Repeats: Direct Copy     Symmetric Copy

Complementary repeats:  Invert Copy     Direct Complementary Copy
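The four copying operations can be pictured as four ways of deriving a new fragment from one that has already been generated. The sketch below assumes the usual reading of these terms: a symmetric copy is the fragment read backwards, a direct complementary copy is the complemented fragment in the same direction, and an invert copy is the complemented fragment read backwards. These readings and the name `copy_variants` are assumptions, not definitions taken from the program.

```python
# Default DNA complementary function: a<->t, g<->c.
COMPLEMENT = str.maketrans("atgc", "tacg")

def copy_variants(fragment):
    """Return the fragment as produced by each of the four copy operations."""
    fragment = fragment.lower()
    complemented = fragment.translate(COMPLEMENT)
    return {
        "direct": fragment,                    # same symbols, same direction
        "symmetric": fragment[::-1],           # same symbols, reversed
        "direct_complementary": complemented,  # complemented, same direction
        "invert": complemented[::-1],          # complemented and reversed
    }

for name, copy in copy_variants("aagc").items():
    print(name, copy)
```

This also shows why the two complementary options only make sense when a complementary function is defined for the chosen alphabet.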

 

Use the best non-standard complementary function for each step   (use only for special estimations)

 

    Output parameters:

No decomposition report (only complexity value, by default)

full detailed report (all components of decomposition)

economic report  (selected long components only)

Show sequence in frame in 'economic' report if its complexity <

tandem report (statistics of tandem repeats in decomposition)

statistical report (statistics of components lengths)  

Table of distances between repeats for statistical reports, w=

(Only statistics for long repeats (in complete decomposition) and low-complexity regions (in profile) are shown.)

User-defined length of long repeats >

to show sequence and use in statistical comparisons (bp)
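As an illustration of what a statistical report over a decomposition might contain, the sketch below counts component lengths and selects the components longer than a user-defined threshold. It is a sketch only: the function name `length_statistics`, its inputs, and the output layout are hypothetical, and the program's actual report format is not described on this page.

```python
from collections import Counter

def length_statistics(components, long_threshold):
    """Count components by length and list those longer than the threshold."""
    counts = Counter(len(component) for component in components)
    long_components = [c for c in components if len(c) > long_threshold]
    # Sort by length so the table reads from short to long components.
    return dict(sorted(counts.items())), long_components

table, long_repeats = length_statistics(["a", "t", "at", "atat"], long_threshold=2)
print(table)
print(long_repeats)
```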

                
Help   Example   Publications

Review of related methods for genetic text complexity analysis

Recent findings by the program: promoter complexity, complete bacterial genomes complexity comparison

Earlier implementation of the DNA-oriented Lempel-Ziv algorithm (Babenko et al., 1999): Complexity profile_builder

Related complexity analysis algorithms: Complexity by context tree source, SIMPLE, Linguistic Complexity, Transformation Distance, GenCompress

Graphical presentation of sequence regularities: OligoRep system, Verbumculus, DeBruijn graphs 

The Institute of Cytology and Genetics (Russia)

This resource was developed at the Institute of Cytology and Genetics, based on methods developed at the Sobolev Institute of Mathematics, Novosibirsk, Russia.
Authors: Yu.L.Orlov, V.P.Filippov, V.D.Gusev, L.A.Miroshnichenko (Nemytikova)
Contributors: S.V.Lavryushev, D.A.Grigorovich
Leader: N.A.Kolchanov

The research was partially supported by the Russian Foundation for Basic Research (RFBR), INTAS, Ministry of Education (grant No.E02-6.0-250), NATO (LST.CLG 979815) and Siberian Branch of the Russian Academy of Sciences (Integration project No. 119).