Utility
The difference between the distributions X_{k,a,b}{S} and X_{k,a,b}{R} of conformational or physico-chemical features is tested for significance using four statistical criteria (Lehman, 1959): (1) the difference between the means of X_{k,a,b}{S} and X_{k,a,b}{R}; (2) the difference between the variances of X_{k,a,b}{S} and X_{k,a,b}{R}; (3) the difference between the densities of X_{k,a,b}{S} and X_{k,a,b}{R}; and (4) the difference between the ranges of X_{k,a,b}{S} and X_{k,a,b}{R}. Because these criteria assume that the tested distributions are Gaussian, two additional criteria were used to check this assumption: (5) for X_{k,a,b}{S} and (6) for X_{k,a,b}{R}. To reduce the adverse effects of heterogeneity, the criteria were tested on 100 subsets {S_{n}} and {R_{n}} (1 ≤ n ≤ 100), each randomly drawn from {S} and {R}, respectively. In terms of fuzzy logic (Zadeh, 1965), if the difference between the distributions X_{k,a,b}{S_{n}} and X_{k,a,b}{R_{n}} is significant according to the m-th criterion (1 ≤ m ≤ 6), then a positive value between 0 and 1 is assigned to the partial utility U_{mn}(X_{k,a,b}); otherwise, a negative value between -1 and 0 is assigned. Hence, the total number of partial utilities {U_{mn}(X_{k,a,b})} is 6 × 100 = 600. In terms of decision-making theory (Fishburn, 1970), the generalised difference between X_{k,a,b}{S} and X_{k,a,b}{R} is the mean of the 600 partial utilities:
U(X_{k,a,b}) = (1/600) Σ_{m=1}^{6} Σ_{n=1}^{100} U_{mn}(X_{k,a,b})
The utility value U(X_{k,a,b}) thus calculated is an integral characteristic of the discriminating ability of X_{k,a,b}. It has two important properties (Fishburn, 1970):
U(X_{k,a,b}) < 0 implies that “X_{k,a,b} falls short of significance”; (3)
U(X_{k,a,b}) > U(X_{q,c,d}) ≥ 0 implies that “X_{k,a,b} discriminates Site/Random better than X_{q,c,d}”. (4)
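The averaging of the partial utilities over the six criteria and the 100 subsets can be sketched as follows. This is only an illustrative sketch: the function name `total_utility`, the nested-list layout and the example values are assumptions made here, and the text does not specify how the magnitude of each partial utility is chosen.

```python
from statistics import fmean

N_CRITERIA = 6    # criteria (1)-(6) described in the text
N_SUBSETS = 100   # random subsets {S_n} and {R_n}


def total_utility(partial):
    """Return the mean of all partial utilities U_mn.

    `partial` is a hypothetical 6 x 100 nested list: one row per
    criterion m, one column per subset n.
    """
    if len(partial) != N_CRITERIA or any(len(row) != N_SUBSETS for row in partial):
        raise ValueError("expected a 6 x 100 table of partial utilities")
    flat = [u for row in partial for u in row]
    if any(not -1.0 <= u <= 1.0 for u in flat):
        raise ValueError("partial utilities must lie in [-1, 1]")
    return fmean(flat)


# Made-up data for illustration: four criteria give +0.5 on every subset,
# the remaining two give -0.25 on every subset.
example = [[0.5] * N_SUBSETS for _ in range(4)] + [[-0.25] * N_SUBSETS for _ in range(2)]
u = total_utility(example)
print(u)  # (400 * 0.5 + 200 * (-0.25)) / 600 = 0.25
```

In this made-up case the utility is positive, so the feature would not fall short of significance under decision (3).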
Note that the highest value of U(X_{k,a,b}) pinpoints the B-DNA feature X_{k,a,b} of the site that is best in terms of utility. Each conformational feature X_{k,a,b} with U(X_{k,a,b}) < 0 is discarded by decision (3). If any two features X_{k,a,b} and X_{q,c,d} correlate, the feature X_{q,c,d} with the lower value of U(X_{q,c,d}) is discarded by decision (4).
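The two discarding rules can be illustrated with a minimal sketch. The function name `select_features`, the feature labels and the utility values are hypothetical, and the list of correlated pairs is assumed to be given, since the text does not state how correlation is established. Ties are not addressed by decision (4), so this sketch keeps both features when their utilities are equal.

```python
def select_features(utility, correlated_pairs):
    """Apply decisions (3) and (4) to a dict {feature label: U}."""
    # Decision (3): discard every feature with negative utility.
    kept = {name for name, u in utility.items() if u >= 0}

    # Decision (4): in each correlated pair of remaining features,
    # discard the one with the lower utility (ties keep both; assumption).
    discarded = set()
    for a, b in correlated_pairs:
        if a in kept and b in kept and utility[a] != utility[b]:
            discarded.add(a if utility[a] < utility[b] else b)

    # Rank the survivors so that the highest-utility feature comes first.
    return sorted(kept - discarded, key=lambda name: utility[name], reverse=True)


# Hypothetical utilities and correlations, for illustration only.
utility = {"X1": 0.42, "X2": 0.31, "X3": -0.10, "X4": 0.05}
correlated_pairs = [("X1", "X2"), ("X3", "X4")]
selected = select_features(utility, correlated_pairs)
print(selected)  # ['X1', 'X4']
```

Here X3 is removed by decision (3), X2 is removed by decision (4) because it correlates with the higher-utility X1, and X4 survives because its only correlated partner was already discarded.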