
ANALYSIS OF PAIR POSITIONAL CORRELATIONS

[Tutorial page]

 



Task formulation

Consider a multiple alignment of protein sequences of the "homeodomain" family.
Determine pairs and groups of protein positions whose substitutions are co-adaptive with respect to the isoelectric point values of amino acids.

Preliminary step

Open the CRASP pairwise correlation analysis page in a new browser window. This will be your CRASP working window; use it alongside this tutorial window. Open the Result example page in another browser window to retrieve the data from the test analysis of the homeodomain family.
Note. In this example, the robustness of the correlation coefficients was additionally estimated. The estimate showed that some of the significant correlation coefficients are unstable, so they were excluded from consideration. This is why the clustering of positions shown on the 'Result example page' differs from the results obtained without accounting for robustness.
A description of the analysis, methods and algorithms is available here.

Step 1. Input the sequence data

Enter the sequence alignment in FASTA format into the 'Sequence data' field, or load it from a file using the 'Load from file' option: set the button ON and enter the file name in the text field.
input.png (19333 bytes)
Your actions with the working window:
  • Load the page with the homeodomain sequence alignment in the result example window. Select all the sequence data and copy it to the clipboard.
  • Return to your working window. Set the 'Load from screen' button ON and paste the sequence data into the text box below.
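Before pasting, it can help to check that the copied text really is an alignment, that is, FASTA records whose sequences all have the same length. The following is a minimal sketch of such a check; the function names and the two short sequences are hypothetical illustrations, not part of CRASP and not the homeodomain data.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; store the previous one, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def is_alignment(records):
    """True when there is at least one record and all sequences have equal length."""
    return len(records) > 0 and len({len(seq) for _, seq in records}) == 1


# Hypothetical toy input, used only to exercise the functions.
example = """>seq1
RRKRTA-YTR
>seq2
KRPRTAFSSE
"""
records = parse_fasta(example)
print(is_alignment(records))
```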

Step 2. Input calculation parameters

Select the calculation parameters.
input_par1.gif (4407 bytes)
Select a physicochemical characteristic of residues from the 'AminoAcid quantity' menu, which contains 36 properties.
Important note: this option takes effect only if the 'AAindex number' field contains zero.
input_par2.gif (9996 bytes)
Alternatively, you may select one of more than 400 characteristics from the AAindex database (see details) by typing the database entry number.
input_par3.gif (1216 bytes)
Select the type of matrix to be calculated (see details).
input_par4.gif (2889 bytes)
Specify the variability threshold (the number of different amino acid types in an alignment column) in order to exclude low-variability protein positions.
input_par5.gif (1225 bytes)
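The variability of a column, as defined above, can be sketched as a count of distinct residue types. In this sketch, ignoring gap characters and keeping columns whose count is at least the threshold are both assumptions; how CRASP treats gaps and whether its comparison is inclusive is not stated here. The function names and toy sequences are hypothetical.

```python
def column_variability(sequences, gap_chars="-."):
    """Number of distinct residue types in each alignment column.

    Assumption: gap characters are not counted as a residue type.
    """
    n_cols = len(sequences[0])
    counts = []
    for col in range(n_cols):
        residues = {seq[col] for seq in sequences if seq[col] not in gap_chars}
        counts.append(len(residues))
    return counts


def variable_columns(sequences, threshold):
    """Zero-based indices of columns whose variability is >= threshold (assumed inclusive)."""
    return [i for i, n in enumerate(column_variability(sequences)) if n >= threshold]


# Hypothetical toy alignment of four sequences and three columns.
toy = ["AKD", "AKE", "ARD", "A-E"]
print(column_variability(toy))
print(variable_columns(toy, 2))
```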
Two parameters control how alignment positions are numbered in the output data. In the 'Selected sequence #' field, specify which sequence of the alignment serves as the reference sequence. In the 'First AA number' field, specify the ordinal number of the first position in the reference sequence.
input_par6.gif (1473 bytes)
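One way to picture the effect of these two parameters is as a mapping from alignment columns to residue numbers in the reference sequence. The sketch below assumes that numbering advances only on non-gap characters of the reference and that columns where the reference has a gap get no number; both are assumptions about CRASP's behaviour, and the function name and toy sequence are hypothetical.

```python
def position_labels(reference, first_aa_number=1, gap_chars="-."):
    """Map each alignment column to a residue number in the reference sequence.

    Assumption: columns where the reference has a gap are labelled None.
    """
    labels = []
    number = first_aa_number
    for ch in reference:
        if ch in gap_chars:
            labels.append(None)
        else:
            labels.append(number)
            number += 1
    return labels


# Hypothetical reference row of an alignment, with 'First AA number' set to 2.
labels = position_labels("RR-KT", first_aa_number=2)
print(labels)
```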
Your actions with the working window:
Set the following calculation parameters:
  • AminoAcid Quantity: Isoelectric point;
  • AAindex number: 0;
  • Type of matrix: partial correlation;
  • Variability threshold: 5;
  • Selected sequence: 1;
  • First amino acid number: 2.
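To illustrate what correlating two positions with respect to a residue characteristic means, the sketch below replaces each residue in two alignment columns by a numerical value and computes the ordinary Pearson correlation coefficient. This is a simplification: the settings above request a partial correlation matrix, which this sketch does not compute. The `PI_VALUES` table holds approximate, commonly tabulated isoelectric points for five residues only and is not necessarily the scale CRASP uses; the two columns are hypothetical.

```python
import math

# Approximate isoelectric points for a few residues (assumed, illustrative only).
PI_VALUES = {"D": 2.77, "E": 3.22, "H": 7.59, "K": 9.74, "R": 10.76}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length value lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Two hypothetical alignment columns read top to bottom (one letter per sequence).
column_i = "DEKRD"
column_j = "KRDEK"
r = pearson([PI_VALUES[a] for a in column_i], [PI_VALUES[a] for a in column_j])
print(round(r, 3))
```

In this toy case an acidic residue in one column is always paired with a basic residue in the other, so the coefficient is strongly negative, which is the kind of compensatory pattern the analysis looks for.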

Step 2 (continued). Input weighting data

Sometimes it is necessary to apply data weighting. The CRASP package implements several standard weighting schemes. Alternatively, a user may enter custom weights for all sequences. Data weighting is set in the fields below.
wt_input.png (19412 bytes)
The current version of CRASP provides several weighting methods. Select one of them, or select 'Off' to disable weighting. If you choose 'User defined' weights, enter a weight value for each sequence, with the values separated by a separator symbol (';' by default). You may specify your own separator symbol in the 'Separator' text box. If you use the weights of Altschul et al. or Felsenstein, enter the phylogenetic tree in *.ph format or load it from a file. Details are here.
wt_options.gif (3738 bytes)
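A 'User defined' weight string of the kind described above can be checked before submission with a short script. This is a minimal sketch: the function name is hypothetical, and the requirement of exactly one weight per sequence is taken from the description above rather than from CRASP's own validation, which is not documented here.

```python
def parse_weights(text, separator=";", n_sequences=None):
    """Split a separator-delimited weight string into a list of floats.

    If n_sequences is given, require exactly one weight per sequence.
    """
    fields = [field.strip() for field in text.split(separator)]
    weights = [float(field) for field in fields if field]
    if n_sequences is not None and len(weights) != n_sequences:
        raise ValueError(
            f"expected {n_sequences} weights, got {len(weights)}"
        )
    return weights


# Hypothetical weight strings: default separator, then a custom one.
default_sep = parse_weights("1.0; 0.5;0.25", n_sequences=3)
custom_sep = parse_weights("1|2", separator="|")
print(default_sep, custom_sep)
```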
Your actions with the working window:
  • Select the weighting method:
    • Felsenstein
  • Go to the page with the homeodomain phylogenetic tree in the result example window. Select the data and copy it to the clipboard.
  • Return to your working window. Set the [Load from screen] button ON and paste the data into the text box.

Step 3. Output matrix data parameters

Choose the format of the output correlation matrix data and supplementary information.
out_options.gif (6674 bytes)
The correlation matrix can be output in one of four formats. In text format, the matrix is represented as a table of numerical values. In HTML format, the table cells are colored according to the significance level of the correlation coefficient. In GIF format, the matrix is displayed as a colored diagram with a "cartographic" palette (blue-green-red) and a diagram of significant pairs (see details).
out_options1.gif (3965 bytes)
Choose the significance level for the correlation coefficient from the 4 available options (see details). These values will be listed on the results page. In the HTML format of the correlation matrix and in the 'Significant pairs' format, significant coefficients are marked in blue for negative values and in red for positive values.
out_options2.gif (3629 bytes)
Your actions with the working window:
  • Select the 'Significant pairs' format for correlation matrix output.
  • Select the significance level of the correlation coefficient: '99.9%'.

Step 4. Additional output parameters

A series of additional options makes it possible to display a GIF image of the sub-matrix of correlation coefficients for positions from significantly correlated clusters. If '0.' is entered in the 'Clustering cut-off value' field, the threshold equals the critical value of the correlation coefficient (see details).
cl_opt.png (6471 bytes)
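The role of the cut-off can be illustrated with a simple grouping rule: two positions fall into the same cluster if they are linked by a chain of coefficients whose absolute value reaches the cut-off. This is only a sketch under that assumption; the clustering algorithm CRASP actually uses is described in the details page and may differ. The function name and the toy matrix are hypothetical.

```python
def cluster_positions(matrix, cutoff):
    """Group positions connected by |r| >= cutoff (connected components).

    Assumption: single-linkage style grouping; clusters of one position are dropped.
    """
    n = len(matrix)
    seen = set()
    clusters = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if j != i and j not in seen and abs(matrix[i][j]) >= cutoff:
                    seen.add(j)
                    stack.append(j)
        if len(component) > 1:
            clusters.append(sorted(component))
    return clusters


# Hypothetical symmetric correlation matrix for four positions.
toy_matrix = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, -0.7, 0.1],
    [0.1, -0.7, 1.0, 0.2],
    [0.0, 0.1, 0.2, 1.0],
]
clusters = cluster_positions(toy_matrix, cutoff=0.6)
print(clusters)
```

Positions 0 and 2 are not directly correlated above the cut-off here, but both are linked to position 1, so all three end up in one cluster.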
When displaying the results, it is possible to detect regions of the correlation matrix with a prevailing number of significant correlation coefficients. These regions are square sub-matrices of user-defined size. If the number of significant correlation coefficients in such a square window exceeds a critical value set by the user, the region is marked in blue. The search can be restricted by the sign of the correlation coefficient (positive, negative, or both).
block_opt.png (11406 bytes)
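The window search described above can be sketched as a scan of every square sub-matrix of the chosen size, counting the significant coefficients inside it. In this sketch the critical value is passed directly as a count (`min_count`), diagonal cells are skipped, and the sign option is left out; these are simplifying assumptions, and the names and toy matrix are hypothetical.

```python
def dense_windows(significant, window, min_count):
    """Find square windows with more than min_count significant coefficients.

    significant: square matrix of booleans (True = significant coefficient).
    Returns (top_row, left_column, count) for every qualifying window.
    """
    n = len(significant)
    hits = []
    for top in range(n - window + 1):
        for left in range(n - window + 1):
            count = sum(
                1
                for i in range(top, top + window)
                for j in range(left, left + window)
                if i != j and significant[i][j]  # skip the matrix diagonal
            )
            if count > min_count:
                hits.append((top, left, count))
    return hits


# Hypothetical significance mask: positions 0, 1 and 2 are mutually correlated.
T, F = True, False
mask = [
    [F, T, T, F],
    [T, F, T, F],
    [T, T, F, F],
    [F, F, F, F],
]
hits = dense_windows(mask, window=3, min_count=4)
print(hits)
```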
Your actions with the working window:

In the field 'Clustering highly correlated position'

  • Set 'Show rearranged matrix' button ON
  • Set 'Clustering cut-off' value to '0'

In the field 'Detecting regions of high density of significant correlation'

  • Set window size to 5.
  • Choose 'Positive and negative' option
  • Set 'Significance level' as '95%'

Step 5. Running CRASP

To run CRASP, click the 'Execute' button:
run.png (6074 bytes)
Your actions with the working window:
  • Click 'Execute'.
The result page will be displayed automatically.