PROGRAM TO RECOGNIZE NUCLEOSOME FORMATION SITES

by Victor G. Levitsky

 Contents:

  1. Objective and History
  2. Theoretical Model and Data Presentation
  3. Input Data Description
  4. Detailed Description of Operations to Run the Program
  5. Profile Examples
  6. References
  7. Concluding Notes and Communications

1. Objective and History

The main aim of the program is to recognize nucleosome binding sites in a nucleotide sequence. The current program is the original one, developed by Levitsky in 1999-2000.

2. Theoretical Model and Data Presentation

The program generates a profile of the so-called nucleosome formation potential (NFP) [1]. The NFP profile is constructed by sliding a 160-bp window along the analyzed sequence. The algorithm for NFP calculation is based on a special partition of the nucleosome site region into non-overlapping parts. The NFP function is derived by discriminant analysis of dinucleotide frequencies [1]. The nucleosomal DNA database [2] was used as the training set of nucleosome positioning sequences. Output data are presented either graphically or numerically (option 'Graphic mode'). The NFP profile may be transformed to the interval [0; +1] (option 'Standardization by dispersion'), so that the value +1 corresponds to the best prediction and 0 to the worst.
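The sliding-window scheme can be sketched as follows. This is a minimal illustration only, not the published NFP algorithm: the number of parts (N_PARTS), the equal-length partition, the 1-bp window step and the per-part dinucleotide weights are hypothetical placeholders. The actual partition and discriminant coefficients come from the analysis described in [1]. The sketch shows only the structure: each 160-bp window is split into non-overlapping parts, dinucleotide frequencies are computed per part, and a linear, discriminant-style score combines them.

```python
from itertools import product

WINDOW = 160  # window size used by the program (bp)
N_PARTS = 4   # hypothetical: number of non-overlapping parts per window

DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_freqs(segment: str) -> dict:
    """Relative frequencies of the 16 dinucleotides in a segment.

    Pairs containing symbols other than A, C, G, T are skipped.
    """
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    total = 0
    for i in range(len(segment) - 1):
        pair = segment[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return {d: (c / total if total else 0.0) for d, c in counts.items()}


def window_score(window: str, weights: list) -> float:
    """Linear score: sum over parts of (part weight * part dinucleotide frequency).

    Hypothetical: equal-length parts and placeholder weights, not the
    published partition or discriminant coefficients.
    """
    part_len = len(window) // N_PARTS
    score = 0.0
    for k in range(N_PARTS):
        freqs = dinucleotide_freqs(window[k * part_len:(k + 1) * part_len])
        score += sum(weights[k][d] * freqs[d] for d in DINUCLEOTIDES)
    return score


def nfp_profile(seq: str, weights: list) -> list:
    """Score every 160-bp window, assuming a 1-bp step."""
    seq = seq.upper()
    return [window_score(seq[i:i + WINDOW], weights)
            for i in range(len(seq) - WINDOW + 1)]


# Toy weights that reward AA/TT steps in every part (illustrative only).
toy = [{d: (1.0 if d in ("AA", "TT") else 0.0) for d in DINUCLEOTIDES}
       for _ in range(N_PARTS)]
profile = nfp_profile("ACGT" * 50, toy)  # 200 bp -> 41 window positions
```

With real coefficients, the profile would then be plotted or listed position by position, as in the program's graphical and numerical output modes.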

The confidence level (CF) value is applied when the option 'Standardization by dispersion' is turned on. The CF value is the fraction of training-set sequences that are recognized correctly, i.e., whose output falls within the interval [0, 1]. The default CF value is 0.95; reasonable values fall within the range 0.5-0.99. A higher CF yields more hits within the interval [0, 1]. The calculation with the CF value is described in detail in [4].
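The definition of CF can be made concrete with a short sketch. It only computes the fraction of training-set outputs that land inside [0, 1], exactly as the definition above states; it does not reproduce how the program uses CF in the standardization, which is described in [4]. The input scores in the example are hypothetical.

```python
def confidence_level(standardized_scores: list) -> float:
    """Fraction of training-set outputs that fall inside [0, 1].

    Mirrors the CF definition only; the standardization procedure
    itself (see [4]) is not reproduced here.
    """
    if not standardized_scores:
        raise ValueError("need at least one training-set score")
    inside = sum(1 for s in standardized_scores if 0.0 <= s <= 1.0)
    return inside / len(standardized_scores)


# Hypothetical standardized outputs for five training sequences:
print(confidence_level([0.2, 0.9, 1.3, 0.5, -0.1]))  # 3 of 5 inside -> 0.6
```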

The dinucleotide relative abundance distance [3] is used as an additional restriction on the input data, to exclude sequences with a poor dinucleotide content. Sliding-window positions that the program ignores are marked in colour in the graphical output and by the symbol * in the numerical output.
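The relative abundance distance of [3] compares symmetrized dinucleotide odds ratios rho*_XY = f*_XY / (f*_X f*_Y), with frequencies pooled over a sequence and its reverse complement, and averages |rho*_XY(a) - rho*_XY(b)| over the 16 dinucleotides. The sketch below computes this distance for each 160-bp window and marks windows above a cut-off with *, mirroring the numerical output. The reference profile, the threshold and the 1-bp window step are hypothetical, since the description does not state which ones the program uses. Setting rho* to 0.0 when a base is absent is also an assumption of this sketch.

```python
from itertools import product

DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rho_star(seq: str) -> dict:
    """Symmetrized relative abundance rho*_XY = f*_XY / (f*_X * f*_Y).

    Counts are pooled over the sequence and its reverse complement.
    Assumes the sequence contains only A, C, G, T.
    """
    seq = seq.upper()
    both = (seq, seq.translate(COMPLEMENT)[::-1])
    mono = dict.fromkeys("ACGT", 0)
    di = dict.fromkeys(DINUCS, 0)
    for s in both:
        for base in s:
            mono[base] += 1
        for i in range(len(s) - 1):
            di[s[i:i + 2]] += 1
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    f = {b: c / n_mono for b, c in mono.items()}
    # Assumption: rho* is undefined when a base is absent; use 0.0 then.
    return {xy: ((di[xy] / n_di) / (f[xy[0]] * f[xy[1]])
                 if f[xy[0]] * f[xy[1]] > 0 else 0.0)
            for xy in DINUCS}


def delta_star(rho_a: dict, rho_b: dict) -> float:
    """Relative abundance distance: mean |rho*_XY(a) - rho*_XY(b)| over 16 XY."""
    return sum(abs(rho_a[xy] - rho_b[xy]) for xy in DINUCS) / 16


def flag_windows(seq: str, reference_rho: dict, threshold: float,
                 window: int = 160) -> str:
    """One symbol per window position: '*' if ignored, '.' otherwise.

    `reference_rho` and `threshold` are hypothetical inputs.
    """
    return "".join(
        "*" if delta_star(rho_star(seq[i:i + window]), reference_rho) > threshold
        else "."
        for i in range(len(seq) - window + 1)
    )


reference = rho_star("ACGT" * 40)  # hypothetical reference profile
marks = flag_windows("ACGT" * 40 + "A" * 160, reference, threshold=0.5)
```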

3. Input Data Description

The input data for this program is a nucleotide sequence of length L. To run the program correctly, L must lie within certain limits: it must be at least 160 bp and must not exceed 32 kbp.
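A length check of this kind can be sketched as below. Two points are assumptions not stated in the description: "32 kbp" is read as 32,000 bp (MAX_LENGTH), and the window is assumed to move in 1-bp steps, which gives L - 160 + 1 window positions.

```python
WINDOW = 160          # minimum length: one full window (bp)
MAX_LENGTH = 32_000   # assumption: "32 kbp" read as 32,000 bp


def check_length(seq: str) -> int:
    """Validate sequence length L and return the number of window positions.

    Assumes a 1-bp window step, giving L - 160 + 1 positions.
    """
    length = len(seq)
    if length < WINDOW:
        raise ValueError(f"sequence too short: {length} bp < {WINDOW} bp")
    if length > MAX_LENGTH:
        raise ValueError(f"sequence too long: {length} bp > {MAX_LENGTH} bp")
    return length - WINDOW + 1


print(check_length("A" * 200))  # 41 window positions
```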

4. Detailed Description of Operations to Run the Program

Profile presentation for any nucleotide sequence can be obtained as follows:

1) Enter your nucleotide sequence in the input sequence box.

2) Click [SCAN] and wait for the calculations to finish.

3) The resulting chart can be saved as a GIF image through its context menu.

5. Profile Examples

Click the EXAMPLE button to see an example of program execution.

6. References

[1] Levitsky V.G., Podkolodnaya O.A., Kolchanov N.A., Podkolodny N.L. // Bioinformatics, 2001, V. 17, P. 998-1010.

[2] Ioshikhes I., Trifonov E.N. // Nucl. Acids Res., 1993, V. 21, P. 4857-4859.

[3] Karlin S., Ladunga I. // Proc. Natl. Acad. Sci. USA, 1994, V. 91, P. 12832-12836.

[4] Levitsky V.G. // Nucl. Acids Res., 2004 (in press).

7. Concluding Notes and Communications

Any comments and questions may be mailed to the author, Victor G. Levitsky.

last modified 22.03.04.


Institute of Cytology and Genetics

Author: Victor Levitsky
Contributors: Sergey V. Lavryushev
Leader: Prof. N.A. Kolchanov

© 1997-2004, IC&G SB RAS, Laboratory of Theoretical Genetics