SELEX_DB is being developed at the Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences. It is designed to accumulate experimental data on affinity-enriched sequences selected from different combinatorial libraries.

Over the last ten years, novel technologies have been designed to identify high-affinity DNA and RNA sequences (ligands) for a wide variety of targets, including nucleic acid binding proteins, peptides, and small organic molecules. These technologies include SELEX (Systematic Evolution of Ligands by Exponential enrichment), SAAB (Selected And Amplified Binding site imprint assay), REPSA (Restriction Endonuclease Protection Selection and Amplification), CASTing (Cyclical Amplification and Selection of Targets), and other binding site selection procedures. In general, in vitro genetic analysis of the structural and functional properties of many nucleic acids has been enhanced by the availability of methods for amplifying nucleic acid sequences.

Given current advances in whole-genome sequencing, combinatorial methods will be important in the next generation of studies, bridging raw sequence data and actual biological processes. At present, the starting libraries used in different SELEX processes are enormous and contain up to 10^14–10^15 sequences. Naturally, this information needs to be collected into public databases available via the Internet.

The site sequences listed in SELEX_DB may be used as independent control data for developing novel methods that recognize functional sites both within gene sequences and under the specific experimental conditions documented in the database. In addition, information on functional site sequences and the experimental conditions under which they were determined is useful for planning new experiments that apply SELEX technology.



The first release, SELEX_DB 1.0, contains 105 entries describing selected DNA/RNA sequences from 85 original papers. Most of SELEX_DB consists of sequences bound by DNA-binding proteins, which make up as much as 85% of the database content. Binding sites are described for proteins involved in various disorders, such as B-cell acute lymphoblastic leukemias (S00G0002), breast cancer (S00G0038), or myeloid leukemia (S00G0047). The RNA-binding proteins include those influencing splice site selection (for example, S00G0023), post-transcriptional regulation (S00G0041), or recombination (S00G0043).

The organisms for which target sequences were selected include human, mouse, chicken, Drosophila, rat, rabbit, some plants, and others.


A SELEX_DB entry corresponds to a single experiment.

The entry description is based on 27 fields: AC, an accession number of an experiment; ID, identifier; DA, date of creation; DT, date of the last update; FV, release number; MN, name of an entry; CR, name of an annotator (linked to SCIENTIST database); NF, name of a ligand; OS, organism; OC, taxon; TE, templates for amplification; EX, type of an experiment; EC, experimental conditions; RF, reference to the literature source (link to SELEX_BIB database); KW, keywords; NS, sequence quantity; AA, aligned sequences as they are represented in the original paper; WA, WT, WG, WC, weight impacts of the letters A,T,G, and C, respectively, at functionally important positions; CN, consensus; DR, links to the other database entries if any; WW, a link to recognition tools; NM, number of sequences in the set; SQ, sequences. The field CC contains different annotator’s comments concerning the functional role of a factor or peculiarities of consensus evaluation.
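As an illustration of how such an entry could be read by a program, here is a minimal Python sketch that groups the lines of one entry by their two-letter field code. It assumes that every line starts with the field code followed by a space and the value, which is how the DR and KW fields are quoted on this page; the real flat-file layout may differ. The function name `parse_entry` and the sample text are hypothetical, and the AC line in particular is an assumed rendering of the accession number.

```python
from collections import defaultdict


def parse_entry(text):
    """Group the lines of one entry by their two-letter field code.

    Assumption: each line is '<CODE> <value>'. Codes that occur on
    several lines (for example SQ) simply collect several values.
    """
    fields = defaultdict(list)
    for line in text.splitlines():
        line = line.rstrip()
        if len(line) < 2:
            continue  # skip blank or truncated lines
        code, value = line[:2], line[2:].strip()
        fields[code].append(value)
    return dict(fields)


# Hypothetical entry text; only the KW and DR lines are quoted from this page.
sample = """AC S00J0008
KW YY1, globin gene, DNA-binding (Medline,GenBank)
DR SELEX_TOOLS; S00j008a
"""

entry = parse_entry(sample)
print(entry["DR"])
```

Keeping a list per field code avoids losing information when a field spans several lines.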


To activate SELEX_DB information, the supplementary database SELEX_TOOLS has been developed by analogy with the technology applied by the authors in the databases MATRIX, ACTIVITY, and B-DNA-FEATURES.

For a given functional site, C-encoded procedures that recognize the site were generated from the nucleotide occurrence matrix stored in the four SELEX_DB fields WA, WT, WG, and WC, and were stored in the SELEX_TOOLS database accompanying SELEX_DB. For example, the SELEX_DB entry S00J0008, which describes the randomized/selected DNAs binding the transcription factor YY-1, contains the field "DR SELEX_TOOLS; S00j008a". Clicking this field loads the SELEX_TOOLS entry S00J0008a, and the C procedures for recognizing the YY-1 transcription factor binding site with the core "CCAT" are shown in the window. In addition, the entry S00J0008a contains the field "WW RECOGNITION", which links to the Web-based tools that apply these C procedures to an arbitrary DNA sequence.
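The general idea of recognizing a site from a per-position weight matrix can be sketched as follows. This is not the C code stored in SELEX_TOOLS, and the scoring rule actually used there is not described on this page; the sketch assumes a simple additive score over the WA, WT, WG, and WC rows. The weights, the threshold, and the function names `score_window` and `scan` are hypothetical and chosen only so that the core "CCAT" scores highest.

```python
def score_window(window, weights):
    """Sum the weight of each letter at its position in the window."""
    return sum(weights[base][i] for i, base in enumerate(window))


def scan(sequence, weights, threshold):
    """Slide the matrix along the sequence and report windows that reach
    the threshold as (start, window, score) tuples."""
    width = len(weights["A"])
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in weights for base in window):
            continue  # skip windows containing letters other than A, T, G, C
        score = score_window(window, weights)
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Hypothetical weights for a 4-position site (one row per WA, WT, WG, WC field).
weights = {
    "A": [0.0, 0.0, 1.0, 0.0],
    "T": [0.0, 0.0, 0.0, 1.0],
    "G": [0.0, 0.0, 0.0, 0.0],
    "C": [1.0, 1.0, 0.0, 0.0],
}

hits = scan("GGCCATGG", weights, threshold=4.0)
print(hits)
```

With these illustrative weights, only the window starting at position 2 ("CCAT") reaches the threshold. A real matrix would carry graded weights, so the threshold would have to be tuned for each site.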

The other way of activating SELEX_DB is through SRS-formatted keywords. For example, the SELEX_DB entry S00J0008 contains the field “KW YY1, globin gene, DNA-binding (Medline,GenBank)”. The terms contained in this field can be searched automatically in the MEDLINE or GenBank databases.
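To show how such a keyword field might be split into search terms and target databases, here is a small Python sketch. It assumes, from the single example above, that the terms are comma-separated and that the databases to query are listed in a trailing pair of parentheses; the actual SRS keyword syntax may be richer. The function name `parse_keywords` is hypothetical.

```python
import re


def parse_keywords(value):
    """Split a KW value into (terms, databases).

    Assumption: a trailing '(...)' lists the target databases and
    everything before it is a comma-separated list of search terms.
    """
    match = re.match(r"^(.*?)\s*\(([^()]*)\)\s*$", value)
    if match:
        terms_part, db_part = match.groups()
        databases = [d.strip() for d in db_part.split(",") if d.strip()]
    else:
        terms_part, databases = value, []
    terms = [t.strip() for t in terms_part.split(",") if t.strip()]
    return terms, databases


terms, databases = parse_keywords("YY1, globin gene, DNA-binding (Medline,GenBank)")
print(terms, databases)
```

Each term could then be submitted as a query to each listed database; how those queries are built is outside the scope of this sketch.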

Thus, SELEX_DB combines a database, Web tools for genomic sequence analysis, and query access to the MEDLINE and GenBank databases for extracting related papers and sequences. That is why SELEX_DB is called an "activated database".

SELEX_DB publications

Julia V. Ponomarenko, Galina V. Orlova, Mikhail P. Ponomarenko, Sergey V. Lavryushev, Anatoly S. Frolov, Svetlana V. Zybova, Nikolay A. Kolchanov. SELEX_DB: an activated database on selected randomized DNA/RNA sequences addressed to genomic sequence annotation. Nucleic Acids Research, 2000, 28, 205-208.



