INTRODUCTION
Application of Single Nucleotide Polymorphism (SNP) analysis to the human genome is currently among the greatest challenges presented by the human genome sequence initiative. This novel research field permits exploration of the influence of specific sequence alterations on disease susceptibility, drug resistance/sensitivity, and ultimately health care. The number of experimentally detected SNPs is growing rapidly. Currently the HGMD database contains more than 10,000 SNPs that alter codon translation, more than 1000 that affect splice sites, and fewer than 200 that influence gene regulatory regions. The dbSNP, HGBASE, ALFRED, and OMIM databases show a similar ratio among these SNP types. Functional alterations of highly conserved codons and splice sites, which change protein structure and function, are detected more easily than alterations in less conserved regulatory regions such as promoters, enhancers, silencers, and introns. Recent experiments have shown that regulatory SNPs may manifest in several ways, including (i) alteration of the function of a site important for normal regulation; (ii) a difference in the affinity of protein binding at such a site; or (iii) functioning of a site that does not normally participate in proper regulation. Thus, the influence of an SNP cannot be predicted reliably solely by inspecting the local region for potential regulatory elements similar to those of known sequence.
Although the field of SNP analysis in regulatory genome regions is rather new, it is being developed against the experimental background presented in the databases TRRD, COMPEL, TRANSFAC, RegulonDB, EpoDB, ACTIVITY and others, which accumulate information not only about naturally occurring site variants, but also about artificially constructed ones. Among these artificial variants, those produced by site-directed and site-specific mutagenesis altering several nucleotides are more informative for the SNP analysis of regulatory DNA regions than deletions, insertions, or hybrid constructs. Disease penetrance may be affected not only by the presence or absence of a protein binding site in a regulatory region, but also by quantitative alterations of binding efficiency (e.g., alterations of GATA-binding efficiency cause delta-thalassemia). Data on sequence-activity relationships are therefore informative for SNP analysis of regulatory regions.
From this perspective, our Web resource rSNP_Guide integrates the treatment of experimental data on SNPs with relevant experimental data on artificial mutagenesis.