Introduction
Transcription is controlled by a set of functional DNA sites that realise specific functions through interaction with the relevant proteins - transcription factors (Latchman, 1995). Experimental data on the DNA sequences and functions of thousands of transcription factor binding sites are currently being accumulated in various databases. The best-known databases in this intensively developing area are the EMBL Data Library (Stoesser et al., 1998), TRANSFAC (Heinemeyer et al., 1999), TRRD (Kolchanov et al., 1999), EpoDB (Stoeckert et al., 1999), EPD (Perier et al., 1999), and RegulonDB (Salgado et al., 1999). All of them serve as primary sources of information for developing methods of transcription factor binding site recognition.
Since the probabilistic information content was first applied to recognising functional DNA sites (Schneider et al., 1986) and Berg and von Hippel (1987) developed the statistical-mechanical theory of protein binding to DNA sites, the weight matrix-based approach has remained dominant. Hundreds of weight matrix variants for transcription factor binding site recognition have been calculated and successfully applied. For example, the commonly accepted MatInspector (Quandt et al., 1995) currently operates with over 200 weight matrices of this type. Widely used, recently developed tools for transcription factor binding site recognition include the Object-Oriented Transcription Factors Database OOTFD (Ghosh, 1998), PromFD (Chen et al., 1997), TESS (Stoeckert et al., 1999), the TRANSFAC-based expert system (Heinemeyer et al., 1999), DPInteract (Robison et al., 1998), and RegulonDB (Salgado et al., 1999). Nevertheless, a recent evaluation of computer tools for transcription factor binding site recognition within genomic DNA (Roulet et al., 1998) indicated that the predictions of current site recognition tools typically correlate better with each other than with DNA/protein affinity magnitudes. It follows that methods for transcription factor binding site recognition in DNA sequences, which stem from probabilistic and statistical-mechanical foundations and rest on the paradigm of the evolutionary origin of sites, should also take into account the molecular mechanisms of DNA/protein interaction.
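The weight matrix idea discussed above can be illustrated with a minimal Python sketch. It is not the implementation of MatInspector or of any other tool cited here. It assumes a uniform background base frequency of 0.25, a simple pseudocount, and no small-sample correction of the information content; the aligned sites and the scanned sequence are hypothetical toy strings, and all function names are introduced only for this illustration.

```python
import math
from collections import Counter

BASES = "ACGT"

def position_counts(sites):
    """Count bases at each position of a set of aligned, equal-length sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    return [Counter(s[i] for s in sites) for i in range(length)]

def information_content(sites):
    """Information content in bits per position: 2 - H(position).

    Assumes a uniform background and omits the small-sample correction.
    """
    n = len(sites)
    result = []
    for counts in position_counts(sites):
        entropy = 0.0
        for base in BASES:
            f = counts[base] / n
            if f > 0:
                entropy -= f * math.log2(f)
        result.append(2.0 - entropy)
    return result

def weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Log-odds weight matrix: one dict of base -> weight per position."""
    n = len(sites)
    matrix = []
    for counts in position_counts(sites):
        column = {}
        for base in BASES:
            freq = (counts[base] + pseudocount) / (n + 4 * pseudocount)
            column[base] = math.log2(freq / background)
        matrix.append(column)
    return matrix

def score(matrix, window):
    """Sum of the position-specific weights for one window."""
    return sum(column[base] for column, base in zip(matrix, window))

def best_hit(matrix, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(matrix)
    return max(
        (score(matrix, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )

# Hypothetical aligned sites, used only to exercise the functions.
toy_sites = ["TATAAA", "TATAAA", "TATATA", "TATAAG"]
toy_matrix = weight_matrix(toy_sites)
print(information_content(toy_sites))
print(best_hit(toy_matrix, "GGCTATAAAGG"))
```

A score of this kind reflects only how closely a window resembles the known sites statistically, which is why it need not track the measured DNA/protein affinity.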
That is why we believe that the sequence-dependent conformational and physico-chemical properties of B-helical DNA may complement the probabilistic properties that have predominantly been considered up to now. Indeed, an increasing volume of experimental data indicates that the functioning of transcription factor binding sites is to a large extent determined by conformational features of DNA (Starr et al., 1995; Meierhans et al., 1997).
Dickerson and Drew were the first to obtain X-ray structures of dodecamers and to discover sequence-conformation relationships in DNA (Dickerson and Drew, 1981). These structures laid the groundwork for Calladine’s rules for predicting the conformation of B-DNA from its sequence (Calladine and Drew, 1984). A growing number of crystallographic and physico-chemical studies during the last decade point to local heterogeneity of the conformational and physico-chemical properties of DNA and to their dependence on nucleotide context (Suzuki et al., 1997; Grzeskowiak, 1996; Frank et al., 1997). What, then, are the relationships between the sequence-dependent conformational and physico-chemical properties of DNA and the specificity of site recognition?
Numerous experimental studies of the B-DNA helix have made it possible to determine a considerable number of mean values of conformational and physico-chemical parameters of di- and trinucleotides (Gartenberg and Crothers, 1988; Sugimoto et al., 1996; Suzuki et al., 1996; Bolshoy et al., 1991; Gorin et al., 1995; Gotoh and Tagashira, 1981; Satchwell et al., 1986). Methods based on the dependence of DNA conformational features on the local nucleotide context are applied to the analysis of both transcription factor binding sites (Karas et al., 1996) and extended promoter regions (Baldi et al., 1998).
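As a rough illustration of how such tabulated dinucleotide means can be turned into a property of a site, the following Python sketch averages a dinucleotide parameter over a chosen region of a sequence. It is a sketch under stated assumptions, not the procedure of any of the cited works: the function name is hypothetical, and the table used here is a placeholder (1.0 for steps made of A/T only, 0.0 otherwise) rather than a measured conformational parameter. A real analysis would substitute published values.

```python
from itertools import product

def mean_dinucleotide_property(seq, table, start=0, end=None):
    """Mean of a dinucleotide parameter over seq[start:end].

    `table` maps each of the 16 dinucleotides to a numeric value.
    Overlapping dinucleotide steps inside the region are averaged.
    """
    seq = seq.upper()
    end = len(seq) if end is None else end
    steps = [seq[i:i + 2] for i in range(start, end - 1)]
    if not steps:
        raise ValueError("region must contain at least one dinucleotide")
    return sum(table[step] for step in steps) / len(steps)

# Placeholder table, NOT experimental data: 1.0 for A/T-only steps, else 0.0.
PLACEHOLDER_TABLE = {
    a + b: 1.0 if a in "AT" and b in "AT" else 0.0
    for a, b in product("ACGT", repeat=2)
}

# Hypothetical sequence; the second call restricts the average to positions 2..5.
print(mean_dinucleotide_property("GGTATA", PLACEHOLDER_TABLE))
print(mean_dinucleotide_property("GGTATA", PLACEHOLDER_TABLE, start=2, end=6))
```

Computing such means for different regions and different parameter tables yields a set of numeric features per site that can be compared across sites.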
It has been shown previously that conformational DNA features may be significant for site functioning (Ponomarenko et al., 1996, 1997a,b). The computer system ACTIVITY has been developed for the analysis of functional site activity (Ponomarenko et al., 1997c); it demonstrates how conformational and physico-chemical parameters of B-DNA dinucleotides can be applied to predicting site activity from the sequence content (Kolchanov et al., 1998).
Continuing these studies, we suggest an approach for revealing significant conformational and physico-chemical features of functional sites, implemented in the activated database B-DNA-VIDEO. We have applied B-DNA-VIDEO to sets of various transcription factor binding sites and demonstrated that the binding sites of every transcription factor analysed are characterised by a specific set of significant conformational and physico-chemical DNA features. In addition, we have demonstrated how to apply these features to transcription factor binding site recognition. See the list of publications.