Site activity prediction
Within the limits of the linear-additive approximation, the activity $F$ of a site with the nucleotide sequence $S_n$ is described by the equation:

$$F(S_n) = F_0(S_n) + \sum_{m=1}^{M} F_m \, X_m(S_n) \qquad (1)$$

where $F_0(S_n)$ is the basal activity of the given site type, determined by the obligatory features characteristic of that site type in the sequence $S_n$; $\{X_m\}_{m=1,\dots,M}$ are the significant features of the site; and $F_m$ is the contribution of the facultative feature $X_m$ to the site activity $F$.
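The linear-additive form described above can be illustrated with a minimal C sketch. It is not the code generated by ACTIVITY: the function name `predict_site_activity` and its parameters are hypothetical, and the feature values are assumed to have been computed from the sequence beforehand.

```c
#include <stddef.h>

/* Hypothetical illustration of the linear-additive model:
 * activity = F0 + sum over m of F_m * X_m.
 *   f0       basal activity F0 of the site type
 *   weights  contributions F_m of the M facultative features
 *   features feature values X_m, assumed already computed from the sequence
 *   m_count  number of features M
 */
double predict_site_activity(double f0, const double *weights,
                             const double *features, size_t m_count)
{
    double activity = f0;
    for (size_t m = 0; m < m_count; ++m) {
        activity += weights[m] * features[m];
    }
    return activity;
}
```

With no significant features (M = 0) the function returns the basal activity alone, which matches the role of the first term in the model.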
Using the features selected for the site, the canonical equation (1) is optimised by multiple linear regression to construct the method for predicting site activity. The ACTIVITY system then automatically generates, according to the optimised equation (1), the C code of a program that calculates the site activity for an arbitrary sequence; this code is stored in the KNOWLEDGE database.
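The regression step can be sketched as follows. The text says only that multiple linear regression is applied, so this sketch makes several assumptions: ordinary least squares, solved through the normal equations with Gauss-Jordan elimination. The name `fit_site_activity`, the bound `MAX_PARAMS` and the row-major data layout are hypothetical as well.

```c
#include <stddef.h>

#define MAX_PARAMS 16  /* assumed upper bound on M + 1, for illustration only */

static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Fit coef[0] = F0 and coef[1..m] = F_m by ordinary least squares.
 * x is row-major with n rows (sites) and m columns (feature values X_m);
 * y holds the n measured activities.
 * Returns 0 on success, -1 if the problem is too large, underdetermined,
 * or numerically singular. */
int fit_site_activity(const double *x, const double *y,
                      size_t n, size_t m, double *coef)
{
    size_t p = m + 1;  /* intercept plus M features */
    double a[MAX_PARAMS][MAX_PARAMS + 1] = {{0.0}};

    if (p > MAX_PARAMS || n < p) return -1;

    /* Accumulate the augmented normal equations (Z^T Z | Z^T y), Z = [1, X]. */
    for (size_t i = 0; i < n; ++i) {
        for (size_t r = 0; r < p; ++r) {
            double zr = (r == 0) ? 1.0 : x[i * m + (r - 1)];
            for (size_t c = 0; c < p; ++c) {
                double zc = (c == 0) ? 1.0 : x[i * m + (c - 1)];
                a[r][c] += zr * zc;
            }
            a[r][p] += zr * y[i];
        }
    }

    /* Gauss-Jordan elimination with partial pivoting. */
    for (size_t k = 0; k < p; ++k) {
        size_t pivot = k;
        for (size_t r = k + 1; r < p; ++r) {
            if (abs_d(a[r][k]) > abs_d(a[pivot][k])) pivot = r;
        }
        if (abs_d(a[pivot][k]) < 1e-12) return -1;
        if (pivot != k) {
            for (size_t c = 0; c <= p; ++c) {
                double tmp = a[k][c];
                a[k][c] = a[pivot][c];
                a[pivot][c] = tmp;
            }
        }
        for (size_t r = 0; r < p; ++r) {
            if (r == k) continue;
            double factor = a[r][k] / a[k][k];
            for (size_t c = k; c <= p; ++c) a[r][c] -= factor * a[k][c];
        }
    }

    for (size_t k = 0; k < p; ++k) coef[k] = a[k][p] / a[k][k];
    return 0;
}
```

The normal equations keep the sketch short, but they can be ill-conditioned when features are strongly correlated; a QR-based or SVD-based solver would be a more robust choice in practice.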