HOW TO USE THE rSNP_Guide FOR ANALYSIS OF TRANSCRIPTION FACTOR BINDING TO TARGET SEQUENCE

The user must first prepare a target sequence and the mutated variants of interest for both the (+) and (-) strands. For example, to use rSNP_Guide to analyze the promoter of the human platelet glycoprotein Ibb gene, GpIbb, whose allele “-133C/G” has been associated with Bernard-Soulier syndrome, four DNA sequences should be prepared: for the wild-type allele “WT”, the (+)-strand 5’-tgtgctatCtgccgctg-3’ and the (-)-strand 5’-cagcggcaGatagcaca-3’; and for the mutant allele “-133C/G”, the (+)-strand 5’-tgtgctatGtgccgctg-3’ and the (-)-strand 5’-cagcggcaCatagcaca-3’ (the altered nucleotide is shown in uppercase).
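Because each (-)-strand sequence is the reverse complement of the corresponding (+)-strand sequence, all four inputs can be generated from the wild-type (+)-strand sequence and the position of the substitution. The following is a minimal Python sketch; the helpers `reverse_complement` and `make_variant` are hypothetical and not part of rSNP_Guide. Case is preserved, so the altered nucleotide stays in uppercase on both strands.

```python
# Complement table; case is preserved so the highlighted (uppercase)
# nucleotide remains visible on both strands.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Return the (-)-strand sequence, read 5'->3', for a (+)-strand sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def make_variant(seq: str, index: int, base: str) -> str:
    """Replace the nucleotide at 0-based `index` with `base`, written in uppercase."""
    return seq[:index] + base.upper() + seq[index + 1:]


wt_plus = "tgtgctatCtgccgctg"
snp_index = 8  # 0-based position of the C altered in the "-133C/G" allele

mut_plus = make_variant(wt_plus, snp_index, "G")
sequences = {
    "WT (+)": wt_plus,
    "WT (-)": reverse_complement(wt_plus),
    "-133C/G (+)": mut_plus,
    "-133C/G (-)": reverse_complement(mut_plus),
}

for name, seq in sequences.items():
    print(f"{name:12s} 5'-{seq}-3'")
```

Generating the (-) strands programmatically avoids transcription errors when the sequences are typed into the input box by hand.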

In the upper section of the user interface window, the user selects a TF. Clicking a TF opens a window with the recognition program for its binding site. The user enters each sequence variant into the input box and clicks the “Execute” button for each one. This displays a graph of the TF site recognition score along the sequence. If the profile has a single dominant peak, its score, read from the left axis, is entered in the proper box of the user interface. In our example, the DNA sequence 5’-tgtgctatGtgccgctg-3’ (“-133C/G” allele, (+)-strand) was entered into the AP-1 binding site recognition program. For this sequence variant, the AP-1 site recognition score profile has a single peak with the maximal value “0”, which is entered into the proper box. If there are multiple peaks, the user can choose one according to its relative height or its proximity to the sequence region most likely to influence binding. When all sequence variants of interest have been examined and the data entered in the user interface, the next TF is selected and the process is repeated.
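The peak-selection rule described above can be expressed as a small function. This is a sketch under stated assumptions: rSNP_Guide displays the score profile only as a graph, so here it is assumed to be available as a plain list of per-position scores; the names `find_peaks` and `choose_peak`, and the `margin` used to decide whether a peak is "dominant", are hypothetical. A single or clearly dominant peak is returned directly; otherwise the peak nearest the variant position is chosen, with ties broken by height.

```python
from typing import List, Optional, Tuple


def find_peaks(profile: List[float]) -> List[Tuple[int, float]]:
    """Return (position, score) for every local maximum of the score profile."""
    peaks = []
    for i, score in enumerate(profile):
        left = profile[i - 1] if i > 0 else float("-inf")
        right = profile[i + 1] if i < len(profile) - 1 else float("-inf")
        # Strictly above the left neighbour, not below the right one
        # (reports the first position of a flat-topped peak).
        if score > left and score >= right:
            peaks.append((i, score))
    return peaks


def choose_peak(profile: List[float], variant_pos: int,
                margin: float = 1.0) -> Optional[Tuple[int, float]]:
    """Pick one peak: a dominant peak if there is one, else the one nearest the variant."""
    peaks = find_peaks(profile)
    if not peaks:
        return None
    ranked = sorted(peaks, key=lambda p: p[1], reverse=True)
    # Hypothetical rule: a peak is dominant if it is the only peak or beats
    # the runner-up by at least `margin` score units.
    if len(ranked) == 1 or ranked[0][1] - ranked[1][1] >= margin:
        return ranked[0]
    # Otherwise prefer proximity to the variant; ties go to the higher peak.
    return min(peaks, key=lambda p: (abs(p[0] - variant_pos), -p[1]))


print(choose_peak([-6.0, -3.5, -5.0, -4.8, 0.0, -2.0, -7.0], variant_pos=4))  # -> (4, 0.0)
```

A score difference rather than a ratio is used for the dominance test because recognition scores can be zero or negative, as the maximal value “0” in the AP-1 example shows.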

When all TFs of interest have been examined, the experimental data are entered in the bottom section of the user interface window. In the line “DNA/protein Binding”, the user enters a value for each sequence variant that estimates the relative degree of protein binding on a scale from +1 (maximal) to –1 (minimal). In our example, the gel mobility shift assay showed a band corresponding to a complex of the DNA with an unknown nuclear protein for the “WT” allele of the GpIbb gene, whereas this band was absent for the “-133C/G” allele. These experimental data are entered into the corresponding boxes of the “DNA/protein Binding” field as follows: the first allele, “WT”, is assigned the value “+1”, and the second allele, “-133C/G”, the value “0”.
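Entering the experimental data amounts to recording one value per sequence variant on the +1 to –1 scale. The sketch below stores the example values in a dictionary keyed by allele and rejects out-of-range entries; the dictionary layout and the `check_binding` helper are hypothetical illustrations, not rSNP_Guide's input format.

```python
def check_binding(data: dict) -> dict:
    """Validate relative DNA/protein binding values on the +1 (maximal) to -1 (minimal) scale."""
    for allele, value in data.items():
        if not -1.0 <= value <= 1.0:
            raise ValueError(f"{allele}: binding value {value} is outside [-1, +1]")
    return data


# Example from the text: the EMSA band is present for "WT" and absent for "-133C/G".
binding = check_binding({"WT": 1.0, "-133C/G": 0.0})

# Direction of the experimentally observed change caused by the substitution.
observed_change = binding["-133C/G"] - binding["WT"]
print(observed_change)  # -> -1.0
```

The negative difference simply records that binding decreases in the mutant allele, which is the direction a candidate TF prediction should match.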

Next, clicking the “Calculate” button displays the calculated values in the “Prediction” window. A single TF may be predicted to bind. In our example, a GATA site was predicted to be the candidate TF binding site responsible for Bernard-Soulier syndrome.

If several TFs are predicted, the statistical analysis windows show which matches are relevant and how closely each fits the data. The Euclidean distance between the predicted alteration and the experimental data is calculated, and the corresponding t-test values are shown for each TF. If no TF is predicted, the significance level (default p = 0.01) can be changed to p = 0.025, 0.05, or 0.1. Each section of the user interface is supported by a Help function that explains the meaning and use of each box. To assist inexperienced users, sample reports show how the rSNP_Tools have been used successfully. The rSNP_Report database accumulates reports on the practical use of the rSNP_Tools.
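The Euclidean-distance comparison can be illustrated with a short sketch. It assumes, hypothetically, that each TF's predicted values (one per sequence variant) have already been rescaled to the same +1 to –1 range as the experimental binding data; how rSNP_Guide itself normalizes the scores, and the t-test and significance threshold it applies, are not reproduced here. The TF names `TF_A` and `TF_B` and their values are placeholders, not results.

```python
import math
from typing import Dict, List, Tuple


def rank_by_distance(predicted: Dict[str, List[float]],
                     experimental: List[float]) -> List[Tuple[str, float]]:
    """Sort TFs by Euclidean distance between predicted and experimental values, closest first."""
    distances = [(tf, math.dist(values, experimental)) for tf, values in predicted.items()]
    return sorted(distances, key=lambda item: item[1])


# Placeholder inputs: one value per allele, in the order ("WT", "-133C/G").
experimental = [1.0, 0.0]
predicted = {
    "TF_A": [0.9, 0.1],
    "TF_B": [0.2, 0.8],
}

for tf, d in rank_by_distance(predicted, experimental):
    print(f"{tf}: distance = {d:.3f}")
```

A smaller distance means a closer fit between the predicted alteration and the experiment; `math.dist` (Python 3.8+) raises `ValueError` if a TF's vector does not have one value per sequence variant.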