Statistical features of the sites

The weighted concentrations of mono-, di-, tri-, and tetranucleotides that differ between site sequences are taken as the statistical features of the nucleotide context of a site. For a sequence S=s1...si...sL of length L, the weighted concentration of an oligonucleotide Z=z1...zj...zm of length m is estimated by the equation

where 1 ≤ m ≤ L; δZ(si si+1...si+m-1) is an indicator function equal to 1 if the oligonucleotide Z is present at the i-th position of the sequence S and 0 if it is absent; si ∈ {A, T, G, C}; zj ∈ {A, T, G, C, W=A/T, R=A/G, M=A/C, K=T/G, Y=T/C, S=G/C, B=T/G/C, V=A/G/C, H=A/T/C, D=A/T/G, N=A/T/G/C}; w(i) is the position significance function (0 ≤ w(i) ≤ 1), which accounts for the fact that different oligonucleotides have their greatest impact when located at different site positions. A total of 180 weight functions w(i) are used in the ACTIVITY system. Examples of weight functions are stored in the WEIGHT database.
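
The definitions above can be illustrated with a minimal Python sketch. It is not the ACTIVITY implementation. It assumes the weighted concentration is the sum of w(i)·δZ over all start positions i = 1, ..., L−m+1, optionally divided by the sum of the weights. The normalization, the function names, and the two example weight functions are assumptions made for illustration and are not taken from the WEIGHT database.

```python
# Ambiguity codes as listed in the text: each symbol maps to the bases it matches.
CODES = {
    "A": "A", "T": "T", "G": "G", "C": "C",
    "W": "AT", "R": "AG", "M": "AC", "K": "TG", "Y": "TC", "S": "GC",
    "B": "TGC", "V": "AGC", "H": "ATC", "D": "ATG", "N": "ATGC",
}


def delta(z, window):
    """Indicator: 1 if every base in `window` is allowed by the code at the
    same offset in the oligonucleotide pattern `z`, otherwise 0."""
    return int(all(base in CODES[code] for code, base in zip(z, window)))


def weighted_concentration(seq, z, w, normalize=True):
    """Weighted occurrence of pattern `z` in `seq`.

    `w` is a callable taking the 1-based start position i and returning a
    weight in [0, 1]. Dividing by the total weight is an ASSUMPTION of this
    sketch; pass normalize=False to get the plain weighted sum.
    """
    L, m = len(seq), len(z)
    if not 1 <= m <= L:
        raise ValueError("require 1 <= m <= L")
    positions = range(1, L - m + 2)  # 1-based start positions 1 .. L-m+1
    total = sum(w(i) * delta(z, seq[i - 1:i - 1 + m]) for i in positions)
    if not normalize:
        return total
    weight_sum = sum(w(i) for i in positions)
    return total / weight_sum if weight_sum else 0.0


def uniform(i):
    """Hypothetical weight function: every position counts equally."""
    return 1.0


def left_half(i):
    """Hypothetical step weight function: only start positions 1..4 count."""
    return 1.0 if i <= 4 else 0.0


if __name__ == "__main__":
    site = "AATTGGCC"
    print(weighted_concentration(site, "WW", uniform))
    print(weighted_concentration(site, "WW", left_half))
```

For the sequence AATTGGCC and the dinucleotide WW, three of the seven windows match (AA, AT, TT). The uniform weight therefore reduces to the ordinary frequency of WW, while the step weight counts only the matches at the left end of the site.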