PROGRAM TO RECOGNIZE EUKARYOTIC PROMOTERS
by Victor G. Levitsky
1. Objective and History
The main aim of the program is to recognize promoters in a nucleotide sequence. The current program is the original program developed by Levitsky in 1999-2002.
2. Theoretical Model and Training Sets
Recognition of eukaryotic gene promoters is based on partitioning them into separate regions and on assessing the distribution of dinucleotide frequencies by discriminant analysis [1,2]. The training sets of promoters were compiled from DPD [3] (Drosophila) and TRRD [4] (human) data.
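The feature step described above can be illustrated with a minimal sketch that counts dinucleotide frequencies separately in each region of a promoter window. The function names and the region boundaries below are assumptions made for illustration only; the actual partitioning used by the program and the discriminant analysis itself are not reproduced here.

```python
from itertools import product

# The 16 possible dinucleotides over the alphabet A, C, G, T.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_frequencies(seq):
    """Return the relative frequency of each dinucleotide in seq."""
    seq = seq.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs with N or other symbols are skipped
            counts[pair] += 1
            total += 1
    if total == 0:
        return {d: 0.0 for d in DINUCLEOTIDES}
    return {d: c / total for d, c in counts.items()}


def region_features(window, regions):
    """Compute dinucleotide frequencies for each (start, end) region.

    Regions are 0-based, half-open intervals within the window.
    """
    return [dinucleotide_frequencies(window[s:e]) for s, e in regions]


# Hypothetical partition of a 400 bp window into four 100 bp regions;
# the real region boundaries are not given in this document.
EXAMPLE_REGIONS = [(0, 100), (100, 200), (200, 300), (300, 400)]
```

The resulting per-region frequency vectors are the kind of input a discriminant function could be trained on.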
3. Input Data Description
The program generates its output from the profile of the recognition function for DPE-containing (DPE+) [5] or TATA-containing (TATA+) promoters. This profile is constructed as a 400 bp window ([-300, +100] relative to the transcription start site) slides along the analyzed sequence. If a promoter is detected, the position of the potential transcription start site is reported. As a consequence, no predictions are made within the first 300 bases and the last 100 bases of the analyzed sequence, and the sequence must be at least 400 bp long. The sequence should be in plain format (see Example). The nucleosome potential (NP) [6] is calculated in addition to the promoter recognition function. We have earlier demonstrated that the region [-50, +1] relative to the transcription start site exhibits decreased mean NP values.
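The window geometry described above can be sketched as follows. This is only an illustration of how a [-300, +100] window maps to candidate transcription start site positions; the function name and the 1-based indexing convention (position +1 taken as the first of the 100 downstream bases) are assumptions, so boundaries may differ by one base from what the program reports.

```python
UPSTREAM = 300    # bases of the window upstream of the start site (-300..-1)
DOWNSTREAM = 100  # bases of the window from the start site onward (+1..+100)
WINDOW = UPSTREAM + DOWNSTREAM


def candidate_windows(seq):
    """Yield (tss_position, window) for every placement of the window.

    tss_position is the 1-based coordinate in seq of the base assumed
    to be the transcription start site (+1) for that placement.
    """
    if len(seq) < WINDOW:
        raise ValueError(f"sequence must be at least {WINDOW} bp long")
    for start in range(len(seq) - WINDOW + 1):
        tss = start + UPSTREAM + 1  # convert 0-based offset to 1-based
        yield tss, seq[start:start + WINDOW]
```

A sequence of exactly 400 bp yields a single window, and no candidate position can fall in the first 300 bases, which matches the restriction stated above.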
4. Output Data Presentation
The output gives scores between 0 and 1, where 1 is the best hit. The Confidence Level (CF) value lets you choose the preferred compromise between sensitivity and specificity. The default CF value is 0.95; reasonable values are in the range 0.5 to 0.99. The CF value denotes the proportion of training-set sequences that are correctly recognized. A higher CF yields more hits, at the cost of specificity.
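One way to read the CF definition is as a rule for picking a score threshold: the threshold is set so that the requested proportion of training promoters scores at or above it. The sketch below illustrates that reading only; the function name is hypothetical, and how the program actually derives its threshold is not stated in this document.

```python
import math


def threshold_for_confidence(training_scores, cf):
    """Return the highest threshold that still recognizes a fraction cf
    of the training promoters (scores >= threshold count as recognized).
    """
    if not training_scores:
        raise ValueError("training_scores must not be empty")
    if not 0 < cf <= 1:
        raise ValueError("cf must be in the interval (0, 1]")
    ranked = sorted(training_scores, reverse=True)
    # Number of training promoters that must be recognized; the small
    # tolerance guards against floating-point round-off in cf * n.
    k = max(1, math.ceil(cf * len(ranked) - 1e-9))
    return ranked[k - 1]
```

Under this reading, raising CF can only lower the threshold or leave it unchanged, which is why a higher CF produces more hits.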
5. Detailed Description of Operations to Run the Program
The output for any nucleotide sequence can be obtained as follows:
1) Enter your nucleotide sequence into the sequence box.
2) Set the parameters:
Promoter type: TATA-containing or DPE-containing;
Confidence Level (default 0.95);
Nucleosome potential calculation (default);
Output mode: graphic (default) or numeric;
Strand: forward (default) or reverse.
3) Click [SCAN] and wait for the calculations to finish.
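For readers who want to reproduce the reverse-strand option outside the web form, scanning the reverse strand is conventionally equivalent to scanning the reverse complement of the input. The helper below is a minimal sketch of that conversion; whether the program handles the reverse strand exactly this way, and how it reports coordinates for it, is an assumption not confirmed by this document.

```python
# Translation table for the four bases plus N, in both letter cases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a plain nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]
```

Characters outside A, C, G, T and N are left unchanged by this helper, so any other symbols in the input should be checked separately.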
6. Example of Program Execution
Click the Example button to see an example of program execution.
7. References
[1] Levitsky V.G., Katokhin A.V., Kolchanov N.A. // Computational technologies (Novosibirsk), 2000, 5, 41-47.
[2] Levitskii V.G., Katokhin A.V. // Mol. Biol. (Mosk.), 2001, 35(6), 970-978.
[3] Arkhipova I.R. // Genetics, 1995, 139, 1359-1369.
[4] Kolchanov N.A., Ignatieva E.V., Ananko E.A., Podkolodnaya O.A., Stepanenko I.L., Merkulova T.I., Pozdnyakov M.A., Podkolodny N.L., Naumochkin A.N., Romashchenko A.G. // Nucleic Acids Res., 2002, 30(1), 312-317.
[5] Kutach A.K., Kadonaga J.T. // Mol. Cell. Biol., 2000, 20(13), 4754-4764.
[6] Levitsky V.G., Podkolodnaya O.A., Kolchanov N.A., Podkolodny N.L. // Bioinformatics, 2001, 17, 998-1010.
8. Concluding Notes and Communications
Any comments and questions may be mailed to the author, Victor G. Levitsky.
Last modified 07.07.02.
Contributors: Sergey V. Lavryushev
Leader: Prof. N.A. Kolchanov
© 1997-2001, IC&G SB RAS, Laboratory of Theoretical Genetics