Over the past decade, new data on the molecular mechanisms that regulate eukaryotic gene expression have accumulated rapidly. At the transcriptional level, gene expression is regulated mainly by sequence-specific interactions of transcription factors with their target sites (cis-elements) located in the transcription regulatory regions of genes.
Information on regulatory sequences in eukaryotic genomes is now accumulating rapidly in many specialized databases (EPD, TFD, TRANSFAC [1], TRRD [2], COMPEL [3]) and in the sequence databases EMBL and GenBank.
A serious drawback in working with these databases is that they are poorly linked to one another.
To support comprehensive research on the mechanisms controlling eukaryotic gene expression at the transcriptional level, we have developed two databases: TRRD (Transcription Regulatory Region Database), which accumulates data on the structure-function organization of gene regulatory regions, and COMPEL, a database of composite regulatory elements, i.e., contiguous or overlapping binding sites for different transcription factors belonging to different regulatory pathways. These databases collect data on various features of gene expression regulation, gene classifications, the structure of gene regulatory regions, cis-elements, composite elements, promoters, and enhancers. Links between TRRD, COMPEL and TRANSFAC have recently been established.
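As a rough illustration of the composite element definition, the following Python sketch flags pairs of binding sites for different factors that are contiguous or overlapping, given site coordinates. The `Site` record, the function `find_composite_candidates`, and its `max_gap` parameter are hypothetical and are not part of TRRD or COMPEL. A positional check like this only proposes candidates; it says nothing about whether the two factors actually interact functionally.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass(frozen=True)
class Site:
    """Hypothetical record for one transcription factor binding site."""
    factor: str  # name of the transcription factor
    start: int   # 0-based start, inclusive
    end: int     # 0-based end, exclusive


def find_composite_candidates(sites: List[Site], max_gap: int = 0) -> List[Tuple[Site, Site]]:
    """Return pairs of sites bound by different factors that overlap or lie
    at most max_gap bases apart (max_gap=0: overlapping or directly adjacent)."""
    ordered = sorted(sites, key=lambda s: (s.start, s.end))
    pairs = []
    for a, b in combinations(ordered, 2):
        if a.factor == b.factor:
            continue  # a composite element involves two different factors
        # Negative gap = overlap, zero = adjacent, positive = bases in between.
        gap = max(a.start, b.start) - min(a.end, b.end)
        if gap <= max_gap:
            pairs.append((a, b))
    return pairs


# Illustrative coordinates only, not taken from any database entry.
sites = [Site("NFAT", 0, 10), Site("AP-1", 10, 17), Site("NF-kB", 40, 50)]
for a, b in find_composite_candidates(sites):
    print(a.factor, b.factor)  # prints "NFAT AP-1"
```

The factor names are used only as familiar labels. Raising `max_gap` relaxes the "contiguous or overlapping" criterion to "nearby", which would be a different definition from the one given in the text.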
We have developed the FUNSYTE [4] computer toolbox for the analysis and recognition of regulatory genomic sequences. The toolbox runs under DOS and Windows and provides: (i) access to the databases on transcriptional regulation of eukaryotic genes (TRANSFAC, TRRD and COMPEL, in a relational model) and to the sequence databases; (ii) extraction of information from the databases, anchoring of sequences in the databases, and preparation of samples of regulatory genomic sequences; and (iii) analysis of regulatory genomic sequences with the software. The toolbox includes:
(i) software for analyzing the internal structure of regulatory genomic sequences (oligonucleotide context features, information measures, correlations of base-in-position frequencies, local site alignment, and calculation of DNA conformational parameters);
(ii) software for developing recognition methods for regulatory genomic sequences (construction of consensuses, recognition of groups of homologues, nucleotide and oligonucleotide weight matrices, subsampling by clustering procedures, and recognition methods built with pattern recognition techniques: perceptron, Fisher discriminant, and SITE VIDEO [5]);
(iii) software for applying these methods to identify regulatory sequences in newly determined nucleotide sequences from eukaryotic genomes (search for potential transcription factor binding sites, composite elements, MAR sites and nucleosome binding sites; search for potential promoter sequences; and calculation of transcription regulation potential).
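One simple kind of oligonucleotide context feature is the ratio of observed to expected k-mer counts. The Python sketch below computes it under the assumption that bases occur independently with the sequence's own base composition; the function name `oligo_observed_expected` and the independence model are illustrative choices, not a description of how FUNSYTE computes its context features.

```python
from collections import Counter
from itertools import product
from typing import Dict

BASES = "ACGT"


def oligo_observed_expected(sequence: str, k: int = 2) -> Dict[str, float]:
    """Observed/expected ratio of every k-mer, where the expected count assumes
    bases occur independently with the sequence's own mononucleotide frequencies.
    Windows containing non-ACGT symbols are skipped."""
    seq = sequence.upper()
    mono = Counter(b for b in seq if b in BASES)
    n_mono = sum(mono.values())
    kmers = Counter(
        seq[i:i + k] for i in range(len(seq) - k + 1)
        if all(b in BASES for b in seq[i:i + k])
    )
    n_k = sum(kmers.values())
    ratios = {}
    for word in ("".join(p) for p in product(BASES, repeat=k)):
        expected = float(n_k)
        for b in word:
            expected *= mono[b] / n_mono if n_mono else 0.0
        # NaN marks words whose expected count is zero (a base is absent).
        ratios[word] = kmers[word] / expected if expected > 0 else float("nan")
    return ratios


ratios = oligo_observed_expected("ACGTACGT", k=2)
print(ratios["AA"])           # 0.0
print(round(ratios["CG"], 3))  # 4.571
```

Ratios well above or below 1 point to over- or under-represented words; a higher-order background model would change the expected counts.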
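For the information measures, a common choice is the per-position information content of a set of aligned sites. The following sketch assumes a uniform background (so the maximum is 2 bits per position) and omits small-sample corrections; `positional_information` is a hypothetical name, and FUNSYTE's own measures may differ.

```python
import math
from collections import Counter
from typing import List


def positional_information(sites: List[str]) -> List[float]:
    """Information content in bits at each position of equal-length aligned sites:
    2 - H, where H is the Shannon entropy of the observed base frequencies.
    Assumes a uniform background and applies no small-sample correction."""
    if len({len(s) for s in sites}) != 1:
        raise ValueError("sites must be non-empty and of equal length")
    info = []
    for column in zip(*(s.upper() for s in sites)):
        counts = Counter(column)
        n = len(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        info.append(2.0 - entropy)
    return info


# Illustrative TGA(C/G)TCA sites, not a curated sample.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTCA"]
print(positional_information(sites))  # [2.0, 2.0, 2.0, 1.0, 2.0, 2.0, 2.0]
```

Fully conserved positions reach 2 bits, and the C/G position drops to 1 bit because two bases are equally likely there.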
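The toolbox description does not specify how correlations of base-in-position frequencies are measured. As one possible measure, this sketch computes the mutual information between two positions of aligned sites; the choice of mutual information and the function name `positional_mutual_information` are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import List


def positional_mutual_information(sites: List[str], i: int, j: int) -> float:
    """Mutual information (bits) between the bases at positions i and j across
    aligned sites; 0 means the observed base frequencies look independent."""
    n = len(sites)
    col_i = [s[i].upper() for s in sites]
    col_j = [s[j].upper() for s in sites]
    counts_i, counts_j = Counter(col_i), Counter(col_j)
    counts_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in counts_ij.items():
        joint = c / n
        mi += joint * math.log2(joint / ((counts_i[a] / n) * (counts_j[b] / n)))
    return mi


print(positional_mutual_information(["AT", "AT", "GC", "GC"], 0, 1))  # 1.0
print(positional_mutual_information(["AT", "AC", "GT", "GC"], 0, 1))  # 0.0
```

In the first call the base at position 1 is fully determined by the base at position 0; in the second the two positions vary independently. With small samples, mutual information is biased upward, so real analyses usually need a significance test.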
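Consensus construction can be sketched with the IUPAC degenerate nucleotide codes. The rule used here (take the most frequent bases at each position until they cover at least a `coverage` fraction of the sites, 0.75 by default) is an assumed heuristic, not necessarily the one implemented in FUNSYTE; ties between equally frequent bases are broken by order of first appearance, and `degenerate_consensus` is a hypothetical name.

```python
from collections import Counter
from typing import List

# IUPAC codes for every non-empty subset of {A, C, G, T}.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def degenerate_consensus(sites: List[str], coverage: float = 0.75) -> str:
    """Degenerate consensus of equal-length aligned sites (assumed coverage rule)."""
    consensus = []
    for column in zip(*(s.upper() for s in sites)):
        counts = Counter(b for b in column if b in "ACGT")
        total = sum(counts.values())
        if total == 0:
            consensus.append("N")  # no informative bases at this position
            continue
        chosen, covered = set(), 0
        for base, c in counts.most_common():
            chosen.add(base)
            covered += c
            if covered / total >= coverage:
                break
        consensus.append(IUPAC[frozenset(chosen)])
    return "".join(consensus)


# Illustrative TGA(C/G)TCA sites, not a curated sample.
print(degenerate_consensus(["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTCA"]))  # TGASTCA
```

Lowering `coverage` gives a stricter, less degenerate consensus; raising it toward 1.0 admits more ambiguity codes.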
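A nucleotide weight matrix and the search for potential binding sites can be sketched together: build a log-odds matrix from aligned sites, then slide it along a sequence and report windows that score above a threshold. The pseudocount of 0.5, the uniform background, the threshold value, and the names `build_pwm` and `scan` are all assumptions; the training sites are illustrative TGA(C/G)TCA sequences of the kind bound by AP-1, not a curated sample.

```python
import math
from typing import Dict, List, Optional, Tuple

BASES = "ACGT"


def build_pwm(sites: List[str], pseudocount: float = 0.5,
              background: Optional[Dict[str, float]] = None) -> List[Dict[str, float]]:
    """Log-odds position weight matrix (bits) from equal-length aligned sites."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for i in range(length):
        column = [s[i].upper() for s in sites]
        total = len(column) + pseudocount * len(BASES)
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background[b])
                    for b in BASES})
    return pwm


def scan(pwm: List[Dict[str, float]], sequence: str,
         threshold: float) -> List[Tuple[int, float]]:
    """(position, score) of every window on the given strand scoring >= threshold."""
    width = len(pwm)
    seq = sequence.upper()
    hits = []
    for pos in range(len(seq) - width + 1):
        window = seq[pos:pos + width]
        if any(b not in BASES for b in window):
            continue  # skip windows with N or other ambiguity codes
        score = sum(col[b] for col, b in zip(pwm, window))
        if score >= threshold:
            hits.append((pos, score))
    return hits


pwm = build_pwm(["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTCA"])
print([pos for pos, _ in scan(pwm, "GGTGACTCAGG", threshold=8.0)])  # [2]
```

Only the given strand is scanned; a double-stranded search would also scan the reverse complement. In practice the threshold would be calibrated on known sites and background sequence rather than fixed by hand.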