What does WebProAnalyst do?
· WebProAnalyst is a web-accessible tool for scanning quantitative structure-activity relationships in protein families.
· WebProAnalyst searches for sequence regions whose substitutions correlate with variations in the activities of a set of homologous proteins, the so-called activity-modulating sites.
· WebProAnalyst allows users to search for the key physicochemical characteristics of the sites that affect the changes in protein activities.
· WebProAnalyst enables the building of multiple linear regression and neural network models that relate these characteristics to protein activities.
How does WebProAnalyst work?
· WebProAnalyst implements multiple linear regression analysis, back-propagation neural networks and the Structure-Activity Correlation/Determination Coefficient (SACC/SADC). The back-propagation neural network is implemented as a two-layered network with one input layer and one output layer (Rumelhart et al., 1986). WebProAnalyst takes an alignment of amino acid sequences and data on protein activity (pK, Km, ED50, among others). The input data are the numerical values of the physicochemical characteristics of a site in the multiple alignment, where each site is defined by a sliding window. The output data are the predicted activity values. The current version of WebProAnalyst handles a single activity value per protein.
· The SACC/SADC may be defined as an estimate of the strongest multiple correlation between the physicochemical characteristics of a site in a multiple alignment and protein activities. It estimates the highest correlation achievable for a quantitative relationship between the physicochemical properties of a site and protein activities, which makes it a convenient means of ranking alignment positions by their functional significance.
· WebProAnalyst outputs a list of multiple alignment positions, the respective correlation values, and the regression analysis parameters for the relationships between the amino acid physicochemical characteristics at these positions and the protein activity values.
Using the WebProAnalyst Interface
Input for WebProAnalyst
Step 1. Pick a multiple alignment to scan
Paste or type a multiple sequence alignment of the queried protein set into the text window. The sequences should be in FASTA format. Alternatively, upload a file by clicking the "Browse" button.
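Before pasting, it can help to check locally that the alignment is well-formed FASTA. The following is a minimal sketch, not part of WebProAnalyst: the function name `parse_fasta_alignment` is hypothetical, and the equal-length check is an assumption based on the input being a multiple alignment (gaps included).

```python
def parse_fasta_alignment(text):
    """Parse FASTA text into a list of (name, aligned_sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    # Assumption: rows of a multiple alignment all have the same length.
    if len({len(seq) for _, seq in records}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    return records
```

The returned order matches the order of the sequences in the input, which is the order in which the activity values of Step 2 would be expected.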
Step 2. Specify protein activity values
The activity values should be numerical, with a single activity value for each protein. For proteins of unknown activity, enter a question mark (?) instead of a numerical value; the program uses the built models to calculate the activities of the proteins marked with a question mark. The numbers can be separated by any character except a digit, point, minus, ?, Å, or å. For example: 1.5 -10 ? 45.66. If the activity values differ considerably from each other (tenfold or more), it is useful to take their logarithm.
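The separator rule above can be sketched in a few lines. This is an illustration under assumptions, not the server's parser: the names `parse_activities` and `log10_activities` are hypothetical, unknown activities are represented as `None`, and because the source does not say how Å and å are interpreted, the sketch only avoids treating them as separators.

```python
import math
import re

# Any run of characters other than digit, point, minus, ?, Å, å is a separator.
_SEPARATOR = re.compile(r"[^0-9.\-?Åå]+")

def parse_activities(text):
    """Return one float per protein, or None where the activity is marked '?'."""
    values = []
    for token in _SEPARATOR.split(text):
        if not token:
            continue  # empty string produced by a leading or trailing separator
        values.append(None if token == "?" else float(token))
    return values

def log10_activities(values):
    """Log-transform known activities; only meaningful for positive values."""
    return [None if v is None else math.log10(v) for v in values]

print(parse_activities("1.5 -10 ? 45.66"))
```

A token that is not a valid number (for instance one containing Å) raises `ValueError` in this sketch, which makes malformed input visible before submission.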
Step 3. Specify the analysis type
· SADC/SACC
· Multiple linear regression analysis
· Neural networks
For multiple linear regression analysis and neural networks, choose either one-factor or two-factor analysis. The factors are the physicochemical properties of the sites that are used in the regression model or neural network.
Step 4. Specify the site properties
Factor 1
The selected physicochemical properties are used, one by one, as factor 1 in multiple linear regression analysis and neural networks.
Factor 2
Factor 2 is specified only when two-factor analysis is checked. The selected physicochemical properties are used, one by one, as factor 2 in multiple linear regression analysis and neural networks.
In the case of one-factor analysis, WebProAnalyst tests, one by one, all the physicochemical properties selected in the Factor 1 window. Each property is used to build a linear regression or neural network model.
In the case of two-factor analysis, WebProAnalyst tests, one by one, all combinations of Factor 1 and Factor 2 properties.
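The number of models tested per site follows directly from the selections in the two windows. A small sketch of that enumeration is given here; the function name `enumerate_models` and the property names other than hydrophilicity are placeholders, and whether the server skips pairs in which both factors are the same property is not stated in the source, so no such filtering is done.

```python
from itertools import product

def enumerate_models(factor1, factor2=None):
    """List the property tuples for which a model is built at each site."""
    if not factor2:
        # One-factor analysis: one model per Factor 1 property.
        return [(prop,) for prop in factor1]
    # Two-factor analysis: one model per (Factor 1, Factor 2) pair.
    return list(product(factor1, factor2))

# Placeholder selections for illustration only.
factor1 = ["Hydrophilicity", "Volume"]
factor2 = ["Isoelectric point", "Polarity", "Bulkiness"]
print(len(enumerate_models(factor1)))
print(len(enumerate_models(factor1, factor2)))
```

Since every combination is tested at every site, large selections in both windows multiply the amount of output accordingly.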
Step 5. Specify the queried fragment position in the multiple alignment
Specify the start and end of the queried fragment in the multiple alignment.
Step 6. Specify the sliding window length
The sliding window defines the sites within the queried fragment of the multiple alignment. For each site, the physicochemical properties are calculated and a model of the quantitative structure-activity relationship is built.
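How the window turns a fragment into sites can be sketched as follows. The helper names, the toy scale values and the choice of a plain window average are all assumptions made for illustration; the source does not state how each property is computed over a window, and some properties (such as a periodicity) are clearly not simple averages.

```python
def window_sites(start, end, length):
    """Return 1-based inclusive (first, last) positions of every window."""
    if length < 1 or end - start + 1 < length:
        raise ValueError("window does not fit into the queried fragment")
    return [(first, first + length - 1) for first in range(start, end - length + 2)]

def site_property(sequences, site, scale):
    """Average a per-residue scale over one site, for every aligned sequence."""
    first, last = site
    values = []
    for seq in sequences:
        # Residues missing from the scale (for example gaps) are skipped.
        scored = [scale[res] for res in seq[first - 1:last] if res in scale]
        values.append(sum(scored) / len(scored) if scored else None)
    return values

# Toy per-residue scale, for illustration only.
TOY_SCALE = {"R": 3.0, "K": 3.0, "F": -2.5, "H": -0.5, "L": -1.8}

sites = window_sites(2, 15, 4)
print(sites[0], sites[-1], len(sites))
print(site_property(["ARKFH", "ARLFH"], (2, 5), TOY_SCALE))
```

With a fragment from position 2 to 15 and a window length of 4, the sites run from 2-5 to 12-15, which is the pattern of site labels seen in the SADC/SACC output example.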
Step 7. Submit query
Click the "Scan" button at the bottom of the page.
WebProAnalyst Output
SADC/SACC is checked
The output contains a list of the SACC and SADC values.
Example,
Site 2-5
SACC=0.533 SADC=0.730
Site 3-6
SACC=0.976 SADC=0.988
Site 4-7
SACC=0.976 SADC=0.988
Site 5-8
SACC=0.942 SADC=0.971
Site 6-9
SACC=0.956 SADC=0.978
Site 7-10
SACC=0.940 SADC=0.969
Site 8-11
SACC=0.940 SADC=0.969
Site 9-12
SACC=0.941 SADC=0.970
Site 10-13
SACC=0.929 SADC=0.964
Site 11-14
SACC=0.923 SADC=0.961
Site 12-15
SACC=0.923 SADC=0.961
“Multiple linear regression analysis” and “Output of correlations only” are checked
The output contains the following information.
A line that contains:
· The start and the end of the site in the multiple alignment.
· The correlation coefficient. In two-factor analysis this is the multiple correlation coefficient; in one-factor analysis it is the correlation coefficient between the physicochemical property and the activities. The multiple correlation coefficient is calculated as the pairwise correlation between the observed and predicted activities.
· The F-value and the P-value of Fisher's F-test for the regression equation.
· SACC.
Example,
>81) Site 3-6
The correlation coefficient R=-0.972
F=153.397 P=0.000 SACC=0.976
Then, the list of physicochemical properties is generated.
Example,
Properties:
1) The alpha-helix periodicity of Hopp-Woods Hydrophilicity
“Multiple linear regression analysis” and “Output of detailed statistical information about every sequence” are checked
Additionally, information about every sequence is generated: sequence name, site sequence, site properties, and measured and predicted activities. The results also include the regression equation, the 95% confidence intervals of the regression coefficients, the t-test statistics and the P-value for H0 for each coefficient.
Example,
>1) Site 3-6 The correlation coefficient R=-0.972 F=153.397 P=0.000 SACC=0.976
Properties:
1) The alpha-helix periodicity of Hopp-Woods Hydrophilicity
[Seq name] [Sequence] [Properties] [Activity measured Activity predicted]
[seq1 ] [RKFH] [ 1.56 ] [1.4100 1.0494]
[seq2 ] [RLFH] [ 1.36 ] [1.6400 1.7048]
[seq3 ] [RKFK] [ 1.61 ] [0.7400 0.8861]
[seq4 ] [RKFH] [ 1.56 ] [1.0900 1.0494]
[seq5 ] [RKFH] [ 1.56 ] [1.0900 1.0494]
[seq6 ] [RKFH] [ 1.56 ] [0.9500 1.0494]
[seq7 ] [RKFH] [ 1.56 ] [0.9100 1.0494]
[seq8 ] [RKFH] [ 1.56 ] [1.0600 1.0494]
[seq9 ] [RKFH] [ 1.56 ] [1.0600 1.0494]
[seq10 ] [RLFK] [ 1.99 ] [-0.5100 -0.3584]
[seq11 ] [RLFK] [ 1.99 ] [-0.2200 -0.3584]
[seq12 ] [SSPG] [ 0.10 ] [ ? 5.7973]
Regression equation:
Y=-3.243*X1+6.110
95% confidence intervals for the regression coefficients
                          K1      K0
Lower                     -3.835  5.141
Upper                     -2.651  7.079
t-test statistics for H0  12.385  14.263
P-values for H0           0.007   0.004
F-test statistic for H0 = 153.397288
P-value for H0 = 0.000001
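The statistics of this example can be approximately reproduced from the property and measured-activity columns of the table above. The sketch below is an ordinary least-squares fit written out by hand, assuming that this is what the one-factor regression amounts to; the function name is hypothetical, and because the printed property values are rounded to two decimals, the results agree with the reported ones only approximately.

```python
import math

# Property values and measured activities of seq1..seq11 from the example table
# (seq12 has an unknown activity and is therefore not used for fitting).
XS = [1.56, 1.36, 1.61, 1.56, 1.56, 1.56, 1.56, 1.56, 1.56, 1.99, 1.99]
YS = [1.41, 1.64, 0.74, 1.09, 1.09, 0.95, 0.91, 1.06, 1.06, -0.51, -0.22]
T_CRIT_95_DF9 = 2.262  # two-sided 95% Student t quantile for 9 degrees of freedom

def one_factor_regression(xs, ys):
    """Least-squares fit of y = slope * x + intercept with basic statistics."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)                 # correlation coefficient
    f_value = r * r * (n - 2) / (1.0 - r * r)      # F-test of the regression
    residual_variance = (syy - slope * sxy) / (n - 2)
    se_slope = math.sqrt(residual_variance / sxx)  # standard error of the slope
    return {"slope": slope, "intercept": intercept, "r": r,
            "f_value": f_value, "se_slope": se_slope}

fit = one_factor_regression(XS, YS)
half_width = T_CRIT_95_DF9 * fit["se_slope"]
print("R=%.3f F=%.1f" % (fit["r"], fit["f_value"]))
print("slope 95%% CI: %.3f .. %.3f" % (fit["slope"] - half_width, fit["slope"] + half_width))
print("predicted activity for property 0.10: %.3f" % (fit["slope"] * 0.10 + fit["intercept"]))
```

For a single factor, the F-value equals the square of the t-statistic of the slope, which is consistent with the reported values (12.385 squared is about 153.4). The last line shows how a protein marked with a question mark receives a predicted activity from the fitted equation.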
“Neural networks” is checked
The output is organized as in the case of multiple linear regression analysis.
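To make the two-layered network (one input layer, one output layer) more concrete, here is a minimal one-factor sketch trained by gradient descent on the squared error. Everything beyond the two-layer structure is an assumption for illustration: the sigmoid output unit, the standardization of the input, the scaling of activities into the range 0.1 to 0.9, the learning rate and the number of epochs are not taken from the source.

```python
import math

# Property values and measured activities of seq1..seq11 from the regression example.
XS = [1.56, 1.36, 1.61, 1.56, 1.56, 1.56, 1.56, 1.56, 1.56, 1.99, 1.99]
YS = [1.41, 1.64, 0.74, 1.09, 1.09, 0.95, 0.91, 1.06, 1.06, -0.51, -0.22]

def train_two_layer_network(xs, ys, epochs=5000, learning_rate=0.5):
    """Train one sigmoid output unit fed directly by one input unit."""
    # Assumed preprocessing: standardize inputs, scale targets into [0.1, 0.9].
    mean_x = sum(xs) / len(xs)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / len(xs))
    lo, hi = min(ys), max(ys)
    zs = [(x - mean_x) / sd_x for x in xs]
    ts = [0.1 + 0.8 * (y - lo) / (hi - lo) for y in ys]

    weight, bias = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for z, t in zip(zs, ts):
            out = 1.0 / (1.0 + math.exp(-(weight * z + bias)))
            delta = (out - t) * out * (1.0 - out)  # error signal at the output unit
            grad_w += delta * z
            grad_b += delta
        weight -= learning_rate * grad_w / len(zs)
        bias -= learning_rate * grad_b / len(zs)

    def predict(x):
        out = 1.0 / (1.0 + math.exp(-(weight * (x - mean_x) / sd_x + bias)))
        return lo + (out - 0.1) / 0.8 * (hi - lo)  # undo the target scaling
    return predict

predict = train_two_layer_network(XS, YS)
print([round(predict(x), 3) for x in (1.36, 1.56, 1.61, 1.99)])
```

With a bounded output unit such as this one, predictions for property values far outside the training range level off instead of growing linearly, which is one way a network and a regression model can disagree on a protein with unknown activity.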
“Output of correlations only” is checked
Example,
>81) Site 3-6
Multiple correlation coefficient R=0.973 SACC=0.976
Properties:
1) Alpha-helix periodicity of Hopp-Woods Hydrophilicity
“Output of detailed statistical information about every sequence” is checked
Example,
>1) Site 3-6
Multiple correlation coefficient R=0.973 SACC=0.976
Properties:
1) The alpha-helix periodicity of Hopp-Woods Hydrophilicity
[Seq name] [Sequence] [Properties] [Activity measured Activity predicted]
[seq1 ] [RKFH] [ 1.56 ] [1.4100 1.0669]
[seq2 ] [RLFH] [ 1.36 ] [1.6400 1.5833]
[seq3 ] [RKFK] [ 1.61 ] [0.7400 0.8856]
[seq4 ] [RKFH] [ 1.56 ] [1.0900 1.0669]
[seq5 ] [RKFH] [ 1.56 ] [1.0900 1.0669]
[seq6 ] [RKFH] [ 1.56 ] [0.9500 1.0669]
[seq7 ] [RKFH] [ 1.56 ] [0.9100 1.0669]
[seq8 ] [RKFH] [ 1.56 ] [1.0600 1.0669]
[seq9 ] [RKFH] [ 1.56 ] [1.0600 1.0669]
[seq10 ] [RLFK] [ 1.99 ] [-0.5100 -0.3931]
[seq11 ] [RLFK] [ 1.99 ] [-0.2200 -0.3931]
[seq12 ] [SSPG] [ 0.10 ] [ ? 1.9085]
Comments and questions are welcome; please contact Vladimir Ivanisenko.