Complexity Profile Builder

Overview:

The complexity of a nucleotide sequence is defined as the least number of events required to generate it (Gusev et al., 1991). Two kinds of events are allowed: (1) generation of a new symbol and (2) replication of an already generated fragment in a certain orientation: direct (D), symmetrical (S), or inverse (I). The set of orientations allowed for replication can be chosen in the program interface; for example, replication in only one orientation (D-, S-, or I-complexity) or in any orientation (DSI-complexity) may be considered.
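The event count can be illustrated with a small sketch. It is not the tool's own implementation, and several points are assumptions: symmetrical (S) replication is taken to copy a fragment in reverse order (a mirror repeat), inverse (I) replication to copy its reverse complement, a copy may only be taken from the part of the sequence already generated, and the count is obtained by greedy longest-match parsing. The names `copy_sources` and `complexity` are hypothetical.

```python
# Hypothetical sketch of D/S/I complexity in the spirit of Gusev et al. (1991).
# Assumptions: S = reversed (mirror) copy, I = reverse-complement copy,
# copies come only from the already generated prefix, greedy longest match.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def copy_sources(fragment: str, modes: str) -> list[str]:
    """Strings that must occur in the prefix for `fragment` to count as a copy."""
    sources = []
    if "D" in modes:
        sources.append(fragment)                              # direct copy
    if "S" in modes:
        sources.append(fragment[::-1])                        # mirror copy
    if "I" in modes:
        sources.append(fragment.translate(COMPLEMENT)[::-1])  # reverse complement
    return sources


def complexity(seq: str, modes: str = "DSI") -> int:
    """Count events: one per new symbol or per copied fragment."""
    seq = seq.upper()
    events, i = 0, 0
    while i < len(seq):
        prefix = seq[:i]  # the part generated so far
        length = 0
        # Extend the candidate fragment while some allowed orientation
        # of it still occurs in the prefix.
        while i + length < len(seq) and any(
            src in prefix for src in copy_sources(seq[i:i + length + 1], modes)
        ):
            length += 1
        # length == 0: the symbol is new; otherwise a single copy event.
        i += max(length, 1)
        events += 1
    return events


print(complexity("ACGTTGCA", "D"))    # 8
print(complexity("ACGTTGCA", "S"))    # 5
print(complexity("ACGTTGCA", "DSI"))  # 4
```

In this example the second half `TGCA` is a mirror copy of `ACGT`, so allowing S (or all three orientations) gives fewer events than direct replication alone.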

Algorithm:

The input may be a single sequence or a sample of several sequences. Complexity is calculated within a window (frame) whose length is specified by the user; its value is the number of events needed to generate the part of the sequence inside the frame (see the Overview section). The frame slides along the sequence with a 1 bp shift, and the resulting vector of complexity values is either plotted as a complexity profile ("draw profile" button) or output as text ("view as text" button). For an input of multiple sequences, the sequences are assumed to be phased; the complexity values of the frames at the same position are then averaged over the sequences, i.e. summed and divided by the number of sequences.
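The sliding frame and the averaging over phased sequences can be sketched as follows. To stay self-contained, the sketch includes a compact event counter built on assumptions (S = mirror copy, I = reverse-complement copy, copies taken only from the already generated part of the frame, greedy longest-match parsing). The names `profile` and `averaged_profile`, and the requirement that phased sequences have equal length, are hypothetical choices for illustration.

```python
# Hypothetical sketch of a sliding-window complexity profile.
# Assumptions: S = mirror copy, I = reverse complement, copies only from the
# already generated prefix, greedy longest match.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complexity(seq: str, modes: str = "DSI") -> int:
    """Number of events (new symbols or copied fragments) needed to build seq."""
    seq = seq.upper()
    events, i = 0, 0
    while i < len(seq):
        prefix, length = seq[:i], 0
        while i + length < len(seq):
            frag = seq[i:i + length + 1]
            sources = {"D": frag, "S": frag[::-1],
                       "I": frag.translate(COMPLEMENT)[::-1]}
            if not any(sources[m] in prefix for m in modes):
                break
            length += 1
        i += max(length, 1)
        events += 1
    return events


def profile(seq: str, window: int, modes: str = "DSI") -> list[int]:
    """Complexity of every frame of length `window`, shifted by 1 bp."""
    if not 1 <= window <= len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    return [complexity(seq[start:start + window], modes)
            for start in range(len(seq) - window + 1)]


def averaged_profile(seqs: list[str], window: int,
                     modes: str = "DSI") -> list[float]:
    """Average the profiles of phased sequences frame by frame."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("this sketch assumes phased sequences of equal length")
    profiles = [profile(s, window, modes) for s in seqs]
    # Sum the values of frames at the same position and divide by the
    # number of sequences.
    return [sum(values) / len(seqs) for values in zip(*profiles)]


print(profile("AAAAACGT", window=4, modes="D"))
# [3, 3, 4, 4, 4]
print(averaged_profile(["AAAAACGT", "ACGTACGT"], window=4, modes="D"))
# [3.5, 3.5, 4.0, 4.0, 4.0]
```

A sequence of length n gives n - w + 1 frames for a window of length w, which is why the 8 bp examples above yield five values each.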

Input format:

Each target sequence should be given as a single string, with no delimiters (such as spaces or line breaks) inside the sequence. Sequence length must not exceed 32 kbp.
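A minimal input check might look like the sketch below. It assumes one sequence per line, a plain A/C/G/T alphabet, and that "32 kbp" means 32,000 bp; the function name `read_sequences` and these limits are hypothetical and may differ from the tool's actual validation rules.

```python
import re

MAX_LENGTH = 32_000             # assumption: "32 kbp" read as 32,000 bp
VALID = re.compile(r"[ACGT]+")  # assumption: plain A/C/G/T alphabet


def read_sequences(text: str) -> list[str]:
    """Parse one sequence per line and check it against the input rules."""
    sequences = []
    for number, line in enumerate(text.splitlines(), start=1):
        seq = line.strip().upper()
        if not seq:
            continue  # skip blank lines between sequences
        if not VALID.fullmatch(seq):
            raise ValueError(
                f"line {number}: delimiter or unexpected symbol inside the sequence")
        if len(seq) > MAX_LENGTH:
            raise ValueError(
                f"line {number}: {len(seq)} bp exceeds the {MAX_LENGTH} bp limit")
        sequences.append(seq)
    return sequences


print(read_sequences("acgtacgt\nTTGCAACG\n"))  # ['ACGTACGT', 'TTGCAACG']
```

A line such as `ACG TAC` is rejected because the space splits the sequence, matching the single-string requirement.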

References:

Gusev,V.D., Kulichkov,V.A. and Chupakhina,O.M. (1991) Complexity analysis of genomes. Measures of complexity and classification of the structural regulations revealed. Molek. Biol. (Mosk.), 25, 825-834.