Help. Variable memory Markov (VMM) model for nucleosome formation site prediction

The program is intended for nucleosome formation site prediction in genomic DNA.

The program outputs an estimate of the probability that a local DNA region is part of a nucleosome structure, in contact with the histone octamer.

The input sequences should be in FASTA format. To obtain an averaged profile, the sequences are assumed to be phased (of equal length).

The program accepts even relatively short sequences (at least 10 bp). The maximum sequence length is 1 Mb.

The user can input a single sequence (plain text without a FASTA header is also accepted). In this case the "SINGLE SEQUENCE" button should be selected to avoid redundant output statistics (such as averaging over one sequence).
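The input rules above can be checked locally before submitting a job. The following is only a sketch and not the server's own parser: the function names `parse_sequences` and `check_input` are hypothetical, and 1 Mb is assumed to mean 1,000,000 bp.

```python
MIN_LEN = 10          # shortest sequence the help text says is accepted
MAX_LEN = 1_000_000   # assumption: "1 Mb" is read as 1,000,000 bp


def parse_sequences(text):
    """Parse FASTA or plain-text input into a list of (name, sequence) pairs."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if name is not None or chunks:
                records.append((name or "unnamed", "".join(chunks)))
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else "unnamed"), []
        else:
            # Drop internal whitespace and normalise case.
            chunks.append("".join(line.split()).lower())
    if name is not None or chunks:
        records.append((name or "unnamed", "".join(chunks)))
    return records


def check_input(records, phased):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name, seq in records:
        if len(seq) < MIN_LEN:
            problems.append(f"{name}: shorter than {MIN_LEN} bp")
        if len(seq) > MAX_LEN:
            problems.append(f"{name}: longer than {MAX_LEN} bp")
    # Phased sets are averaged position by position, so lengths must match.
    if phased and len({len(seq) for _, seq in records}) > 1:
        problems.append("phased sequences differ in length")
    return problems
```

Plain text without a `>` header is returned as a single record named `unnamed`, which corresponds to the single-sequence mode.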

The output is a profile in text format (a row of numbers).

The user can paste the sequence into the input window (cut and paste).

Input sequence(s) here (FASTA format or plain text) (cut & paste)
>P00038
ttcactcttgaagccagcaaggccatgaacccaccaggaggaacaaacaactctggacgt
gccatgtttaagagctgtaacactcactgtgaaggtctgcagcttcactcctgaagtcag
tgagaccatgaagccactgggaggaatgaacaactctggacatgtcacctttaagagctc
tgacactcactgcgaaggtctgcagcttctggacacaatactattcattctcaccttaaa
gacgaggaaactaaggcaaagaacagtcaactaataagtccaagtatacagagctgctaa
ggaatagtctgtctgatcccaaaggctgtgtcataaccgcttccctatactgcctctcag
cagaggtaagagtcaagttttatttatcactgccaccccatcagccccagcttagtgcct
gacacagggagatgctcaatcaatgctgattgttattgagtggactagaaatgcaaggca
cagtgagcccctttgctgtgactgatggggtgtctgattttctgctataaagaggagagt
gctgtatcaaacacactcctctggctcctagctctctctgttcactttgtttatccaatt
tccctactcctccttcataactgcaccatgtggattcaaaattgcagcttagtgcagaca
aagggaaaacggaattctgaatgaccccaaagggaaaactgaactctgaatgacccctgt
gggtttgagagaagagaagcaggaacttgagagaggaggaagagagaaagtaattaaaat
gtatcgttttaacttaatatttaaccgaatgatagcaaaatcttatctgaaattggaaaa
gtcaaggttttgagtgctggttcggtgcccatttctttatgatttgatagtctgagaaga
atacgacgggtgtggcttaaaaacctagatcacgtgtgtagttggaattgggtgttatat
gagcaaacaaaataaatacctgtgcaacatacctgctttatgcactcaagcagagaagaa
atccacaagtactcaccagcctcctggtctgcagagaagacagaatcaatatgagcacag
caggaaaagtaagcaaaaaa
>P00370
gacttctcgggctcaagcaatcctcccacctcagcctcccaagtagctgggactacgggc
acacgccaccatgcctggctaatttttgtattttttgtagagatgggtcttcaccatgtt
gatcaggctggtctcgaactcctgggctcatgcgatccaccccgccagctgattacaggg
attccggtggtgagccaccgcgcccagacgccacttcatcgtattgtaaacgtctgttac
ctttctgttcccctgtctactggactgtgagctccttagggccacgaattgaggatgggg
cacagagcaagctctccaaacgtttgttgaatgagtgagggaatgaatgagttcaagcag
atgctatacgttggctgttggagattttggctaaaatgggacttgcaggaaagcccgacg
tccccctcgccatttccaggcaccgctcttcagcttgggctctgggtgagcgggataggg
ctgggtgcaggattaggataatgtcatgggtgaggcaagttgaggatggaagaggtggct
gatggctgggctgtggaactgatgatcctgaaaagaagaggggacagtctctggaaatct
aagctgaggctgttgggggctacaggttgagggtcacgtgcagaagagaggctctgttct
gaacctgcactatagaaaggtcagtgggatgcgggagcgtcggggcggggcggggcctat
gttcccgtgtccccacgcctccagcaggggacgcccgggctgggggcggggagtcagacc
gcgcctggtaccatccggacaaagcctgcgcgcgccccgccccgccattggccgtaccgc
cccgcgccgccgccccatcccgcccctcgccgccgggtccggcgcgttaaagccaatagg
aaccgccgccgttgttcccgtcacggccggggcagccaattgtggcggcgctcggcggct
cgtggctctttcgcggcaaaaaggatttggcgcgtaaaagtggccgggactttgcaggca
gcggcggccgggggcggagcgggatcgagccctcgccgaggcctgccgccatgggcccgc
gccgccgccgccgcctgtca

A standard file upload is also available as an alternative to pasting.

or From file

The user can select the sliding window size for probability estimation. The nucleosome size, 146 bp, is recommended.

A window covering only part of the site, such as 50 bp, is also acceptable, but it must be at least 10 bp.

A profile step other than 1 bp can also be defined to avoid large output for long sequences.
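To see how the window size and the profile step determine the amount of output, the window start positions can be enumerated as in the sketch below. It assumes that only windows lying fully inside the sequence are scored; how the server treats sequence ends is not stated here. The function name `window_starts` is hypothetical.

```python
def window_starts(seq_len, window=146, step=10):
    """0-based start positions of all windows that fit inside the sequence."""
    if window < 10:
        raise ValueError("window size must be at least 10 bp")
    if step < 1:
        raise ValueError("profile step must be at least 1 bp")
    if window > seq_len:
        return []  # no full window fits
    return list(range(0, seq_len - window + 1, step))


# A 1,000 bp sequence with the recommended 146 bp window:
dense = window_starts(1000, window=146, step=1)    # one value per bp
sparse = window_starts(1000, window=146, step=10)  # ten times fewer values
```

With a step of 10 bp the profile has roughly one tenth of the values produced with a step of 1 bp, which is the reason for increasing the step on long sequences.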

Profile parameters:

Sliding window size, bp (>10) Profile step (shift of sliding window)

use pre-defined models (nucleosome):

or input pre-calculated VMM model in text format here (calculated by TreeComplexity program)

(cut & paste)

or from File:

The program allows the use of a pre-calculated model built from a set of DNA sequences.

Please construct such a model with the related program: Complexity by context tree source

The output of that program (oligonucleotides and their frequencies in a text file) can be used as a VMM model for prediction (cut and paste it into the form). The option "user-defined model" should be selected in the appropriate drop-down menu instead of "Nucleosome formation sites".
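As an illustration of how a table of oligonucleotides and frequencies can serve as a variable memory Markov model, here is a minimal sketch. It is not the program's actual algorithm, and the file layout (one `word frequency` pair per line) is an assumption rather than the documented TreeComplexity format. The probability of each base is taken from the longest preceding context that has a non-zero count in the table, backing off to shorter contexts otherwise; natural logarithms are used, and a uniform 0.25 is the last-resort fallback.

```python
import math

BASES = "acgt"


def parse_model(text):
    """Read 'word frequency' lines into a dict (hypothetical layout)."""
    freqs = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        try:
            freqs[parts[0].lower()] = float(parts[1])
        except ValueError:
            continue  # skip header or comment lines
    return freqs


def next_base_prob(freqs, context, base, max_order):
    """P(base | context), using the longest context found in the table."""
    for k in range(min(max_order, len(context)), -1, -1):
        ctx = context[len(context) - k:]
        total = sum(freqs.get(ctx + b, 0.0) for b in BASES)
        count = freqs.get(ctx + base, 0.0)
        if total > 0 and count > 0:
            return count / total
    return 0.25  # nothing known: fall back to equal frequencies


def window_log_prob(freqs, seq, start, window):
    """Natural-log probability of seq[start:start+window] under the model."""
    max_order = max((len(w) for w in freqs), default=1) - 1
    total = 0.0
    for i in range(start, start + window):
        context = seq[max(0, i - max_order):i]
        total += math.log(next_base_prob(freqs, context, seq[i], max_order))
    return total
```

Backing off whenever a count is zero is a simplification chosen for brevity; a real context-tree model may smooth or prune contexts differently.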

The bottom group of parameters defines whether a single sequence or a set of phased sequences should be analysed.

The user can select the format of the program output.

Output details: Single sequence

Prediction profile only Detailed profile (position and logarithm probability)

Set of phased sequences (averaged profile)

Mean profile only

Detailed report (Mean, standard deviation + profiles for every seq. in the set)
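The mean and standard deviation in the detailed report can be pictured as column-wise statistics over the per-sequence profiles. The sketch below assumes that every phased sequence yields a profile of the same length and uses the sample standard deviation; whether the program uses the sample or the population formula is not stated here. The function name `average_profiles` is hypothetical.

```python
from statistics import mean, stdev


def average_profiles(profiles):
    """Column-wise mean and sample standard deviation of equal-length profiles."""
    if not profiles:
        raise ValueError("no profiles given")
    if len({len(p) for p in profiles}) != 1:
        raise ValueError("profiles must have equal length (phased sequences)")
    columns = list(zip(*profiles))
    means = [mean(col) for col in columns]
    # stdev needs at least two values; report 0.0 for a single sequence.
    sds = [stdev(col) if len(col) > 1 else 0.0 for col in columns]
    return means, sds


# Two toy profiles with two positions each (illustrative numbers only).
means, sds = average_profiles([[1.0, 2.0], [3.0, 6.0]])
```

With a single sequence the standard deviation carries no information, which is why the single-sequence mode skips these statistics.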

The user can define supplementary options to mark sequence positions in the output.

Supplementary output options:

Shift position (Profile 0 position) bp

Centering position (Profile 0 position in the center of the sequence) Yes No
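The effect of the two position options can be sketched as a relabelling of profile coordinates. This is an interpretation, not a description of the program's code: it assumes that each profile value is labelled by its window start, that "Shift position" gives the sequence coordinate to be reported as 0, and that centering places 0 at the middle of the sequence. The function name `profile_positions` is hypothetical.

```python
def profile_positions(seq_len, window, step, shift=0, centering=False):
    """Reported coordinates of profile points (assumed: window start positions)."""
    # With centering, position 0 is the middle of the sequence;
    # otherwise it is the user-supplied shift.
    zero = seq_len // 2 if centering else shift
    starts = range(0, seq_len - window + 1, step)
    return [s - zero for s in starts]


# 1,000 bp sequence, 146 bp window, 10 bp step.
shifted = profile_positions(1000, 146, 10, shift=100)       # starts at -100
centered = profile_positions(1000, 146, 10, centering=True)  # starts at -500
```

Centering is convenient for phased sets aligned on a common reference point, because positions are then read as offsets from that point.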

The default value of the logarithm of probability (CompareLevel in the detailed report output) is calculated for a sequence with equal nucleotide frequencies.
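For a sequence with equal nucleotide frequencies, each base has probability 0.25, so the log-probability of a window depends only on its length. The sketch below shows that arithmetic; the base of the logarithm and whether the reported CompareLevel is per window or per base are assumptions, since neither is stated here. The function name `uniform_log_prob` is hypothetical.

```python
import math


def uniform_log_prob(window, base=math.e):
    """Log-probability of a window when every base has probability 0.25."""
    return window * math.log(0.25, base)


# Reference level for the recommended 146 bp window (natural logarithm).
level = uniform_log_prob(146)
```

Profile values above such a reference level indicate windows that the model finds more probable than a sequence with equal nucleotide frequencies, and values below it indicate the opposite.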

The user can also obtain a graphical plot of the profile using the corresponding option:

Graphic mode

Please note that this option is OFF by default.

The graphic output is interactive, i.e. the user can change the borders of the profile, the dot size and color, etc.

To obtain a smoother line, try changing the profile step to 1 bp (the default is 10 bp).

The "Execute" button starts the program run with the selected parameters.

The "Reset form" button resets all parameters to their default values.



Authors: Yu.L.Orlov, V.G.Levitsky

Contributors: S.V.Lavryushev, D.A.Grigorovich, S.A.Poplavsky

Leader: N.A.Kolchanov

The research was partially supported by the Russian Foundation for Basic Research (RFBR) and Siberian Branch of the Russian Academy of Sciences (Integration project No. 119).