PROGRAM TO RECOGNIZE EUKARYOTIC PROMOTERS

by Victor G. Levitsky

Contents:

  1. Objective and History
  2. Theoretical Model and Training Sets
  3. Input Data Description
  4. Output Data Presentation
  5. Detailed Description of Operations to Run the Program
  6. Example of program execution
  7. References
  8. Concluding Notes and Communications

1. Objective and History

The main aim of the program is to recognize promoters in a nucleotide sequence. The current program is the original program by Levitsky, developed in 1999-2002.

2. Theoretical Model and Training Sets

Eukaryotic gene promoters are recognized by partitioning them into separate regions and assessing the distribution of dinucleotide frequencies in these regions by discriminant analysis [1,2]. The training sets of promoters were compiled from DPD [3] (Drosophila) and TRRD [4] (Human) data.
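
The feature extraction described above can be illustrated with a minimal sketch. It only shows how dinucleotide frequencies might be computed per region of a window; the region boundaries passed in are hypothetical, and the discriminant analysis step and the trained coefficients of the actual program are not reproduced here.

```python
from collections import Counter
from itertools import product

# The 16 dinucleotides over the DNA alphabet, in a fixed order.
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]


def dinucleotide_frequencies(region):
    """Return the relative frequency of each of the 16 dinucleotides in a region."""
    region = region.upper()
    counts = Counter(region[i:i + 2] for i in range(len(region) - 1))
    # Only dinucleotides made of A, C, G, T are counted; others (e.g. with N) are ignored.
    total = sum(counts[d] for d in DINUCLEOTIDES)
    return {d: (counts[d] / total if total else 0.0) for d in DINUCLEOTIDES}


def region_features(window, boundaries):
    """Split a window at the given (hypothetical) boundaries and
    concatenate the dinucleotide frequencies of each region."""
    edges = [0, *boundaries, len(window)]
    features = []
    for start, end in zip(edges, edges[1:]):
        freqs = dinucleotide_frequencies(window[start:end])
        features.extend(freqs[d] for d in DINUCLEOTIDES)
    return features
```

A vector produced this way has 16 values per region and could serve as input to a discriminant function; how the program actually partitions promoters is described in [1,2].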

3. Input Data Description

The program generates output data from the profile of the recognition function for DPE-containing (DPE+) [5] or TATA-containing (TATA+) promoters. This profile is constructed as a window of 400 bp ([-300, +100] relative to the transcription start site) slides along the analyzed sequence. If a promoter is detected, the position of the potential transcription start site is reported. Consequently, no predictions are made within the first 300 bases and the last 100 bases of the analyzed sequence, and the sequence must be at least 400 bp long. The sequence should be in plain format (see Example). Nucleosome potential (NP) [6] is calculated in addition to the promoter recognition function. We have earlier demonstrated that the region [-50; +1] relative to the transcription start site exhibits decreased mean values of NP [6].
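
The sliding-window constraint can be made concrete with a short sketch. It enumerates the candidate transcription start positions and their 400 bp windows; the 0-based indexing and the exact treatment of the window edges are assumptions of this sketch, not a description of the program's internals.

```python
UPSTREAM = 300    # bases taken upstream of the candidate start site
DOWNSTREAM = 100  # bases taken from the candidate start site onward
WINDOW = UPSTREAM + DOWNSTREAM  # 400 bp in total


def candidate_windows(sequence):
    """Yield (tss_index, window) pairs for every position that has
    300 bases upstream and 100 bases downstream (0-based index, an assumption)."""
    if len(sequence) < WINDOW:
        raise ValueError("sequence must be at least %d bp long" % WINDOW)
    for tss in range(UPSTREAM, len(sequence) - DOWNSTREAM + 1):
        yield tss, sequence[tss - UPSTREAM:tss + DOWNSTREAM]
```

Under this convention a sequence of exactly 400 bp yields a single window, and each additional base adds one more candidate position.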

4. Output Data Presentation

The output gives scores between 0 and 1, where 1 is the best hit. The Confidence Level (CF) value lets you choose the preferred compromise between sensitivity and specificity. The default CF value is 0.95; reasonable values are in the range of 0.5 to 0.99. The CF value denotes the proportion of training set sequences that are correctly recognized. A higher CF yields more hits, that is, higher sensitivity at the cost of specificity.
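
One way to read the CF value is as a rule for choosing a score threshold from the training set. The sketch below assumes that the threshold is the lowest score among the top CF fraction of training promoters; the function name and this exact rule are hypothetical and are not taken from the program.

```python
import math


def threshold_for_confidence(training_scores, cf):
    """Return the score threshold at which a fraction `cf` of the
    training promoters is recognized (score >= threshold)."""
    if not training_scores:
        raise ValueError("training_scores must not be empty")
    if not 0.0 < cf <= 1.0:
        raise ValueError("cf must be in the interval (0, 1]")
    ranked = sorted(training_scores, reverse=True)
    # Number of training sequences that must pass the threshold.
    keep = math.ceil(cf * len(ranked))
    return ranked[keep - 1]
```

Under this reading, raising CF lowers the threshold, so more windows of an analyzed sequence are reported as hits.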

5. Detailed Description of Operations to Run the Program

Output data for any nucleotide sequence can be obtained as follows:
1) Enter your nucleotide sequence into the sequence box.
2) Set the parameters:
    Promoter type: TATA-containing or DPE-containing;
    Confidence Level (default 0.95);
    Nucleosome potential calculation (default);
    Output mode: graphic (default) or numeric;
    Strand: forward (default) or reverse.
3) Click [SCAN] and wait for the calculations to finish.
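
For readers preparing sequences offline, the Strand option can be pictured as follows. This sketch assumes that scanning the reverse strand is equivalent to scanning the reverse complement of the entered sequence; the page itself does not state how the option is implemented.

```python
# Translation table for complementing DNA bases, preserving case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]
```

Characters other than A, C, G and T (for example N) are left unchanged by this sketch.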

6. Example of program execution

Click the Example button to see an example of program execution.

7. References

[1] Levitsky V.G., Katokhin A.V., Kolchanov N.A. // Computational technologies (Novosibirsk), 2000, 5, 41-47.
[2] Levitskii V.G., Katokhin A.V. // Mol. Biol. (Mosk.), 2001, 35(6), 970-978.
[3] Arkhipova I.R. // Genetics, 1995, 139, 1359-1369.
[4] Kolchanov N.A., Ignatieva E.V., Ananko E.A., Podkolodnaya O.A., Stepanenko I.L., Merkulova T.I., Pozdnyakov M.A., Podkolodny N.L., Naumochkin A.N., Romashchenko A.G. // Nucleic Acids Res., 2002, 30(1), 312-317.
[5] Kutach A.K., Kadonaga J.T. // Mol. Cell. Biol., 2000, 20(13), 4754-4764.
[6] Levitsky V.G., Podkolodnaya O.A., Kolchanov N.A., Podkolodny N.L. // Bioinformatics, 2001, 17, 998-1010.

8. Concluding Notes and Communications

Comments and questions may be mailed to the author, Victor Levitsky.

last modified 07.07.02.


Institute of Cytology and Genetics

Author: Victor Levitsky
Contributors: Sergey V. Lavryushev
Leader: Prof. N.A. Kolchanov

© 1997-2001, IC&G SB RAS, Laboratory of Theoretical Genetics