CRASP
Integral linear characteristics analysis: help information and parameter description
This part of the CRASP package estimates the constancy of a given physico-chemical characteristic of a protein. The characteristic is defined as a linear combination of the physico-chemical property values of the amino acids at selected protein positions.
Calculation parameters
At present, CRASP accepts only aligned sequences in FASTA format. Sequences should be written in the standard 20-letter amino acid code, with the symbol '-' for gaps. Allowed symbols are:
ARNDCQEGHILKMFPSTWYV-. Both upper-case and lower-case letters are accepted. If the sequences are not aligned, the alignment length is set to the length of the shortest sequence and all other sequences are truncated to that length. Examples of input sequence data for several protein families are presented here.
Amino acid physico-chemical characteristics
These amino acid characteristics reflect physical and chemical interactions between residues. At present, the CRASP package contains data on 36 physico-chemical scales; the user can select one of them at this step of the analysis.
This field contains the ordinal number of an amino acid property in the AAINDEX database. The value can range from 1 to 434 (see the list of indices here).
If the value is zero, the amino acid physico-chemical characteristic selected from the menu (see above) is used.
AAINDEX database reference: Tomii, K. and Kanehisa, M. (1996) Analysis of amino acid indices and mutation matrices for sequence comparison and structure prediction of proteins. Protein Eng. 9, 27-36. Internet: http://www.genome.ad.jp/dbget/aaindex.html.
This parameter sets the number of simulated randomized samples used to estimate the conservation of a particular integral characteristic. For characteristics that include many positions, and for large sequence samples, the calculations are time-consuming, so we recommend using small values for this parameter.
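The help text does not say how the randomized samples are generated. A common approach, assumed here purely for illustration, is to permute residues independently within each alignment column: this keeps the amino acid composition of every position but destroys any coordination between positions, so the dispersion of F in the shuffled samples shows what to expect if the positions varied independently. In this sketch a characteristic is a hypothetical list of (coefficient, positions) pairs, `scale` is a hypothetical residue-to-property mapping, and gaps are assumed to contribute 0; none of these names come from CRASP.

```python
import random
from statistics import pvariance

def integral_value(seq, terms, scale):
    """F = sum over (x, positions) of x * (sum of property values at positions).
    Positions are 1-based; gaps or residues missing from `scale` count as 0 (assumption)."""
    return sum(x * sum(scale.get(seq[p - 1], 0.0) for p in positions)
               for x, positions in terms)

def shuffle_columns(seqs, rng):
    """Permute residues independently within each alignment column (assumed scheme)."""
    cols = [list(col) for col in zip(*seqs)]
    for col in cols:
        rng.shuffle(col)
    return ["".join(row) for row in zip(*cols)]

def random_dispersions(seqs, terms, scale, n_samples=100, seed=0):
    """Dispersion (population variance) of F in each of n_samples shuffled samples."""
    rng = random.Random(seed)
    dispersions = []
    for _ in range(n_samples):
        sample = shuffle_columns(seqs, rng)
        values = [integral_value(s, terms, scale) for s in sample]
        dispersions.append(pvariance(values))
    return dispersions

# Two perfectly compensating sequences: F is constant in the original data.
seqs = ["AC", "CA"]
scale = {"A": 1.0, "C": 0.0}   # hypothetical property values
terms = [(1.0, [1, 2])]        # the characteristic "1.(1-2);"
observed = pvariance([integral_value(s, terms, scale) for s in seqs])
random_d = random_dispersions(seqs, terms, scale)
```

Here the observed dispersion is 0, while about half of the shuffled samples give a dispersion of 1, so F varies less than independent variation of the two positions would predict. Each random sample costs one pass over all sequences and positions, which is why large sample counts become slow.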
Weighting sequence data
It is known that over-representation of some homologous sequences in a sample may bias statistical estimates. To avoid such biases, various sequence-weighting schemes have been proposed. These approaches reduce the weights of over-represented sequences and assume that the "true" distribution of sequences in sequence space is homogeneous. CRASP supports several weighting schemes, selected with the Weighting type parameter.
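To show how sequence weights enter the statistics, here is a minimal sketch of a weighted mean and variance of per-sequence F values. Dividing by the total weight is an assumption made for illustration; CRASP may normalize differently.

```python
def weighted_mean_var(values, weights):
    """Weighted mean and weighted (population) variance of per-sequence F values."""
    if len(values) != len(weights):
        raise ValueError("need one weight per sequence")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    mean = sum(w * v for w, v in zip(weights, values)) / total
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total
    return mean, var

# Three near-duplicate sequences share F = 1; a fourth, distinct one has F = 5.
f_values = [1.0, 1.0, 1.0, 5.0]
print(weighted_mean_var(f_values, [1, 1, 1, 1]))        # (2.0, 3.0): all weights equal
print(weighted_mean_var(f_values, [1/3, 1/3, 1/3, 1]))  # approximately (3.0, 4.0)
```

With equal weights the three duplicates pull the mean toward 1; giving each of them weight 1/3 makes the cluster count as a single sequence, which is the effect the weighting schemes aim for.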
Four types of weighting are allowed:
OFF (default) |
All the sequence weights are equal to 1 |
Felsenstein |
The method suggested by Felsenstein (1985); the weights are calculated from an evolutionary tree. If such a tree is available, this weighting scheme is recommended |
Vingron & Argos |
The method suggested by Vingron and Argos (1985) |
User defined |
The weight coefficients are entered by the user |
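Distance-based weighting, as the Vingron & Argos scheme is commonly described, gives each sequence a weight proportional to its summed distance to all other sequences, so close relatives share a smaller weight. The sketch below assumes a simple p-distance (mismatch fraction over gap-free columns) and rescales weights to average 1; CRASP's exact distance measure and normalization are not stated here and may differ.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of mismatches among columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_weights(seqs):
    """Weight of each sequence proportional to its summed distance to all others."""
    n = len(seqs)
    totals = [0.0] * n
    for i, j in combinations(range(n), 2):
        d = p_distance(seqs[i], seqs[j])
        totals[i] += d
        totals[j] += d
    s = sum(totals)
    if s == 0:  # all sequences identical: fall back to equal weights
        return [1.0] * n
    # Rescale so the weights average 1, like the OFF default.
    return [t * n / s for t in totals]

weights = distance_weights(["AAAA", "AAAA", "CCCC"])
# the two identical sequences get 0.75 each, the distinct one 1.5
```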
Felsenstein weighting. In this field, enter the phylogenetic tree in (*.ph) format. This format is supported by many tree-inference packages, such as CLUSTALW, Phylip, Treecon, etc. If you use a tree, make sure that the sequence identifiers in the sequence data match those in the tree data. In case of a mismatch, the CRASP program exits with an error. However, CRASP allows a sequence ID in the sequence data to be longer than the corresponding ID in the tree, for example, AP1_CHICK_156 (sequence input) and AP1_CHICK (tree input). See an example of input data for this weighting scheme
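The ID-matching rule described above can be sketched as follows. The assumptions are that leaf labels are unquoted Newick tokens and that "longer sequence ID" means the tree ID is a prefix of the sequence ID; when several leaves qualify, the sketch picks the longest one. Function names are hypothetical, and the tree-based weight calculation itself is not shown.

```python
import re

def tree_leaf_names(newick: str) -> list[str]:
    """Leaf labels in a Newick (*.ph) tree: the unquoted tokens that follow '(' or ','.
    Internal-node labels (after ')') and branch lengths (after ':') are skipped."""
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)

def match_ids(seq_ids, leaf_names):
    """Map each sequence ID to a tree leaf whose label is a prefix of it
    (AP1_CHICK_156 -> AP1_CHICK); the longest matching leaf wins."""
    mapping = {}
    for sid in seq_ids:
        hits = [leaf for leaf in leaf_names if sid.startswith(leaf)]
        if not hits:
            # Mirrors the documented behaviour: a mismatch is an error.
            raise KeyError(f"no tree leaf matches sequence ID {sid!r}")
        mapping[sid] = max(hits, key=len)
    return mapping

tree = "(AP1_CHICK:0.1,(AP1_HUMAN:0.05,AP1_MOUSE:0.07):0.02);"
leaves = tree_leaf_names(tree)
ids = match_ids(["AP1_CHICK_156", "AP1_HUMAN"], leaves)
```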
User defined weighting. In this field, enter the weight of each sequence on a separate line (default format). Alternatively, the weights can be entered in free format; in this case, specify the separator symbols (e.g., ;,: ) in the Separator field.
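The two input formats for user-defined weights can be parsed as in this sketch. It assumes that whitespace is also accepted as a separator in free format; the function name is hypothetical. The caller should still check that the number of weights equals the number of sequences.

```python
import re

def parse_weights(text: str, separators: str = "") -> list[float]:
    """Parse user-defined sequence weights.
    Default: one weight per line. If `separators` is given (e.g. ";,:"), the text is
    split on any of those symbols and on whitespace (free format)."""
    if separators:
        tokens = re.split("[" + re.escape(separators) + r"\s]+", text)
    else:
        tokens = text.splitlines()
    # Skip empty tokens produced by blank lines or leading/trailing separators.
    return [float(tok) for tok in (t.strip() for t in tokens) if tok]

print(parse_weights("1.0\n0.5\n\n2\n"))                # default format
print(parse_weights("1.0; 0.5, 2:0.25", ";,:"))        # free format
```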
Integral protein characteristics description
This characteristic is defined as a linear combination of the selected physico-chemical amino acid property values at protein positions. Up to four characteristics can be analyzed simultaneously. For each one, define its name and description.
For convenience, you may give each integral characteristic a name of up to 50 characters.
To set up an integral physico-chemical characteristic, use the format:
x1(npos1); x2(npos2); ...; xn(nposn);
xi |
Arbitrary numbers in a floating point format |
nposi |
Corresponding alignment positions, listed using the ',' and '-' symbols in any combination; for example, (1-3,4,5,30-44) denotes positions 1 to 5 and 30 to 44. |
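The definition format above can be parsed into (coefficient, positions) pairs as in this sketch. The exact number syntax CRASP accepts is not documented here; the regular expression assumes ordinary decimal numbers such as `1.`, `-0.17` or `.5`, optionally with an exponent. All names are hypothetical.

```python
import re

# One term: a floating-point coefficient followed by a parenthesized position list.
TERM = re.compile(r"\s*([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)\s*\(([^)]*)\)\s*;?")

def parse_positions(spec: str) -> list[int]:
    """Expand '1-3,4,5,30-44' into [1, 2, 3, 4, 5, 30, ..., 44]."""
    positions = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = (int(v) for v in part.split("-"))
            positions.extend(range(lo, hi + 1))
        else:
            positions.append(int(part))
    return positions

def parse_characteristic(text: str) -> list[tuple[float, list[int]]]:
    """Parse 'x1(npos1); x2(npos2); ...' into (coefficient, positions) pairs."""
    terms, pos, text = [], 0, text.strip()
    while pos < len(text):
        m = TERM.match(text, pos)
        if m is None:
            raise ValueError(f"cannot parse near {text[pos:]!r}")
        terms.append((float(m.group(1)), parse_positions(m.group(2))))
        pos = m.end()
    return terms

print(parse_characteristic("1.(6-8);"))  # [(1.0, [6, 7, 8])]
```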
Examples of characteristics:
Net value of a given amino acid characteristic at alignment positions 6-8:
F1 | Net value | 1.(6-8); |
Projection of the alpha-helical moment (for helix positions 1 to 5):
F1 | Helix Momentum | 1.(1); -0.17(2); -0.94(3); 0.5(4); 0.77(5); |
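where cos(0°) = 1; cos(100°) = -0.17; cos(200°) = -0.94; ... (successive residues of an alpha helix are rotated by about 100°, so position i receives the coefficient cos(100°·(i-1))).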
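The helix-moment coefficients in the example can be generated rather than typed by hand. This sketch computes cos(100°·(i-1)) for consecutive positions and writes them in the `x(npos);` format; rounding to two decimals mirrors the example, and the function names are hypothetical.

```python
import math

def helix_coefficients(n_positions: int, angle_deg: float = 100.0) -> list[float]:
    """cos(angle * (i - 1)) for helix positions i = 1..n."""
    return [math.cos(math.radians(angle_deg * k)) for k in range(n_positions)]

def to_crasp_format(coeffs, first_position: int = 1, digits: int = 2) -> str:
    """Render coefficients as 'x1(npos1); x2(npos2); ...' for consecutive positions."""
    return " ".join(f"{round(c, digits)}({first_position + k});"
                    for k, c in enumerate(coeffs))

print(to_crasp_format(helix_coefficients(5)))
# 1.0(1); -0.17(2); -0.94(3); 0.5(4); 0.77(5);
```

For a helix starting elsewhere in the alignment, pass its first alignment position as `first_position`.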
Output parameters
Two output data modes are allowed:
Text (default) |
ASCII text file (with HTML header) |
Graphic |
Plots in GIF-format |
The ASCII text format is convenient for further data analysis and for graphical representation with statistical packages (Excel, Statistica, etc.).
Output data include:
Graphic output displays several types of plots that characterize the distribution of integral characteristics in both the original and the simulated data samples. To select a plot, click the appropriate check box.
These plots show the distribution of Fi values in the original protein sequence sample (histogram) |
These plots show the joint distribution of (Fi, Fj) values in the original protein sequence sample. Each point represents the pair of (Fi, Fj) values for one protein in the sample (scatterplot) |
These plots show the distribution of the dispersion of Fi values in simulated random samples (histogram) |
These plots show the distribution of the ratio Dexp(F)/D(F) in simulated random samples (histogram) |
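When working from the text output, the first two plot types can be reproduced with standard summaries: an equal-width histogram of Fi values and the Pearson correlation of paired (Fi, Fj) values. This is a minimal sketch; the bin layout CRASP uses for its GIF plots is not documented here and may differ.

```python
import math

def histogram(values, n_bins=10):
    """Equal-width histogram: returns (bin edges, counts)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # all values equal: use a unit width
    counts = [0] * n_bins
    for v in values:
        k = min(int((v - lo) / width), n_bins - 1)  # v == hi goes into the last bin
        counts[k] += 1
    edges = [lo + k * width for k in range(n_bins + 1)]
    return edges, counts

def pearson(x, y):
    """Pearson correlation of paired (Fi, Fj) values; undefined if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

edges, counts = histogram([0, 1, 2, 3, 4], n_bins=2)  # counts == [2, 3]
```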