PROGRAM TO RECOGNIZE EUKARYOTIC PROMOTERS
by Victor G. Levitsky
The main aim of the program is to recognize promoters in a nucleotide sequence. The current program is the original program developed by Levitsky in 1999-2002.
2. Theoretical Model and Training Sets
The recognition of eukaryotic gene promoters is based on partitioning them into separate regions and on assessing the distribution of dinucleotide frequencies by discriminant analysis [1,2]. The training sets of promoters were compiled from DPD [3] (Drosophila) and TRRD [4] (human) data.
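The feature extraction step described above can be illustrated with a minimal sketch. It assumes a window is split into equal-sized regions and that the relative frequencies of the 16 dinucleotides are computed per region; the actual partitioning and the discriminant function used by the program are described in [1,2] and are not reproduced here. The names dinucleotide_frequencies, region_features and n_regions are hypothetical.

```python
from itertools import product

# The 16 possible dinucleotides over the alphabet A, C, G, T.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_frequencies(region):
    """Return the relative frequency of each dinucleotide in `region`."""
    region = region.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    total = 0
    for i in range(len(region) - 1):
        pair = region[i:i + 2]
        if pair in counts:  # pairs with N or other ambiguity codes are skipped
            counts[pair] += 1
            total += 1
    return {d: (counts[d] / total if total else 0.0) for d in DINUCLEOTIDES}


def region_features(window, n_regions=4):
    """Split `window` into equal regions (an assumed partitioning) and
    concatenate the dinucleotide frequencies of all regions."""
    size = len(window) // n_regions
    features = []
    for r in range(n_regions):
        start = r * size
        # The last region absorbs any remainder.
        end = len(window) if r == n_regions - 1 else start + size
        freqs = dinucleotide_frequencies(window[start:end])
        features.extend(freqs[d] for d in DINUCLEOTIDES)
    return features
```

A feature vector of this kind (16 values per region) is the sort of input a discriminant analysis could be trained on.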
The program generates output as a profile of the recognition function for DPE-containing (DPE+) [5] or TATA-containing (TATA+) promoters. This profile is constructed as a 400 bp window ([-300, +100] relative to the transcription start site) slides along the analyzed sequence. If a promoter is detected, the position of the potential transcription start site is reported. As a consequence, no predictions are made within the first 300 bases and the last 100 bases of the analyzed sequence, and the sequence must be at least 400 bp long. The sequence should be in plain format (see Example). The nucleosome potential (NP) [6] is calculated in addition to the promoter recognition function. We have demonstrated earlier that the region [-50, +1] relative to the transcription start site exhibits decreased mean NP values [6].
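The sliding-window scheme can be sketched as follows. This is an illustration only: it assumes that the transcription start site (+1) is the first base of the 100 bp downstream part of the window, so the exact boundary positions may differ by one from the program's own convention. The function names and the score callback are hypothetical stand-ins for the recognition function.

```python
UPSTREAM = 300    # bases before the transcription start site (TSS)
DOWNSTREAM = 100  # bases from the TSS onward (assumed to include the TSS)
WINDOW = UPSTREAM + DOWNSTREAM


def sliding_windows(sequence):
    """Yield (tss_position, window) for every 400 bp window.

    tss_position is 1-based and marks the candidate TSS of that window.
    """
    if len(sequence) < WINDOW:
        raise ValueError("sequence must be at least %d bp long" % WINDOW)
    for start in range(len(sequence) - WINDOW + 1):
        yield start + UPSTREAM + 1, sequence[start:start + WINDOW]


def profile(sequence, score):
    """Build a profile by applying `score` (a placeholder for the
    recognition function) to each window."""
    return [(tss, score(window)) for tss, window in sliding_windows(sequence)]
```

Under this convention, a 500 bp sequence gives 101 windows, with candidate start sites at positions 301 through 401, which shows why the flanks of the sequence receive no predictions.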
The output gives scores between 0 and 1, where 1 is the best hit. The Confidence Level (CF) value lets you choose a preferred compromise between sensitivity and specificity. The default CF value is 0.95; reasonable values are in the range 0.5 to 0.99. The CF value denotes the fraction of training set sequences that are correctly recognized. A higher CF therefore yields more hits.
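One way to read the CF value is as a quantile of the scores obtained on the training promoters. The sketch below assumes that interpretation: it picks the highest score threshold at which at least a fraction CF of the training scores still pass. How the program actually derives its threshold is not stated here, and threshold_for_cf and training_scores are hypothetical names.

```python
import math


def threshold_for_cf(training_scores, cf):
    """Return the highest threshold t such that at least a fraction `cf`
    of `training_scores` is greater than or equal to t."""
    if not training_scores:
        raise ValueError("training_scores must not be empty")
    if not 0 < cf <= 1:
        raise ValueError("cf must be in the interval (0, 1]")
    ranked = sorted(training_scores, reverse=True)
    # Number of training promoters that must be recognized.
    k = math.ceil(cf * len(ranked))
    return ranked[k - 1]
```

With this reading, raising CF lowers the threshold, so more windows pass it and more hits are reported.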
5. Detailed Description of Operations to Run the Program
Output for any nucleotide sequence can be obtained as follows:
1) Enter your nucleotide sequence into the sequence box.
2) Set the parameters:
   Promoter type: TATA-containing or DPE-containing;
   Confidence Level (default 0.95);
   Nucleosome potential calculation (default);
   Output mode: graphic (default) or numeric;
   Strand: forward (default) or reverse.
3) Click [SCAN] and wait for the calculations to finish.
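Before pasting a sequence into the sequence box, it may help to check it locally. The sketch below makes several assumptions: "plain format" is taken to mean bare nucleotide letters, possibly with whitespace and position numbers that can be discarded; only A, C, G, T and N are accepted; and the reverse strand is taken to mean the reverse complement. The function names are hypothetical; the 400 bp minimum comes from the window size stated above.

```python
MIN_LENGTH = 400  # minimum sequence length required by the 400 bp window

# Translation table for complementing bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def clean_plain_sequence(text):
    """Drop whitespace, digits and punctuation; return upper-case bases."""
    seq = "".join(ch for ch in text if ch.isalpha()).upper()
    invalid = set(seq) - set("ACGTN")
    if invalid:
        raise ValueError("unexpected characters: %s" % sorted(invalid))
    return seq


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_long_enough(seq):
    """Check the minimum length needed for at least one prediction."""
    return len(seq) >= MIN_LENGTH
```

For example, clean_plain_sequence("acgt\nTTAA 12") gives "ACGTTTAA", and its reverse complement is "TTAAACGT".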
6. Example of program execution
Click the Example button to see an example of program execution.
[1] Levitsky V.G., Katokhin A.V., Kolchanov N.A. // Computational technologies (Novosibirsk), 2000, 5, 41-47.
[2] Levitskii V.G., Katokhin A.V. // Mol. Biol. (Mosk.), 2001, 35(6), 970-978.
[3] Arkhipova I.R. // Genetics, 1995, 139, 1359-1369.
[4] Kolchanov N.A., Ignatieva E.V., Ananko E.A., Podkolodnaya O.A., Stepanenko I.L., Merkulova T.I., Pozdnyakov M.A., Podkolodny N.L., Naumochkin A.N., Romashchenko A.G. // Nucleic Acids Res., 2002, 30(1), 312-317.
[5] Kutach A.K., Kadonaga J.T. // Mol. Cell. Biol., 2000, 20(13), 4754-4764.
[6] Levitsky V.G., Podkolodnaya O.A., Kolchanov N.A., Podkolodny N.L. // Bioinformatics, 2001, 17, 998-1010.
8. Concluding Notes and Communications
Any comments and questions may be sent to the author, Victor Levitsky.
Last modified 07.07.02.
Author: Victor Levitsky
Contributors: Sergey V. Lavryushev
Leader: Prof. N.A. Kolchanov
© 1997-2001, IC&G SB RAS, Laboratory of Theoretical Genetics