About ab initio human miRNA prediction

Program description: A program for ab initio prediction of human miRNAs in an arbitrary genomic sequence.

Biological task addressed: Identification of novel human miRNAs.

Data input: Type or paste the genomic sequence to be analyzed into the text box (red arrow #1 in Fig. 1). The sequence may be given in FASTA format or as plain text without the comment line. The size limit is 10 000 symbols. Use the ATGC (atgc) or AUGC (augc) alphabet; any other symbol in the nucleotide sequence causes an error. Line breaks and spaces are ignored.

Figure 1. The main program window.
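The input rules above can be checked locally before submitting. The sketch below is illustrative only, not the server's own code, and it rests on three labeled assumptions: FASTA comment lines start with ">", the 10 000-symbol limit applies after the header and whitespace are removed, and DNA (ACGT) and RNA (ACGU) letters are not mixed within one sequence.

```python
import re

MAX_SYMBOLS = 10_000  # size limit quoted in the help text


def normalize_input(text: str) -> str:
    """Return the bare nucleotide sequence, or raise ValueError on invalid input."""
    # Assumption: FASTA comment lines start with '>' and are dropped
    body = "".join(
        line for line in text.splitlines() if not line.lstrip().startswith(">")
    )
    # Line breaks and spaces are ignored, as the help text states
    seq = re.sub(r"\s+", "", body).upper()
    if not seq:
        raise ValueError("empty sequence")
    # Assumption: the limit is applied after the header and whitespace are removed
    if len(seq) > MAX_SYMBOLS:
        raise ValueError(f"sequence longer than {MAX_SYMBOLS} symbols")
    # Assumption: DNA (ACGT) and RNA (ACGU) letters are not mixed in one sequence
    letters = set(seq)
    if not (letters <= set("ACGT") or letters <= set("ACGU")):
        raise ValueError("sequence contains symbols outside ATGC or AUGC")
    return seq


print(normalize_input(">query\nacgu acgu\nuuaa"))  # ACGUACGUUUAA
```

If the server turns out to accept mixed T/U input, only the last check needs relaxing.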

Program options: To predict the miRNA/miRNA* duplex within a known miRNA precursor, enter the pre-miRNA sequence into the text box (red arrow #1) and select the "optimal miRNA pair" or the "optimal and next suboptimal miRNA pairs" radio button (red arrow #2). Leave the other fields (red arrows #3-6) empty. The "optimal miRNA pair" option restricts the output to the miRNA/miRNA* duplex with the highest score; the "optimal and next suboptimal miRNA pairs" option returns the two miRNA/miRNA* candidates with the highest scores. The result of the second option therefore includes the result of the first. To predict miRNAs in an arbitrary genomic sequence, you also need to set the pre-miRNA prediction options (red arrows #3-6).
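The relation between the two radio-button options can be illustrated as a top-k selection over scored duplex candidates. This is a hedged sketch: the `Duplex` record and its fields are hypothetical, and the program's actual scoring function is not described on this page. It only shows why the optimal pair is always the first entry of the two-pair result.

```python
from typing import NamedTuple


class Duplex(NamedTuple):
    mirna_start: int  # hypothetical field: start of the miRNA in the precursor
    star_start: int   # hypothetical field: start of the miRNA* strand
    score: float      # hypothetical field: score assigned by the predictor


def select_duplexes(candidates: list[Duplex], mode: str) -> list[Duplex]:
    """Keep the top-scoring duplex ('optimal') or the top two ('optimal+suboptimal')."""
    k = {"optimal": 1, "optimal+suboptimal": 2}[mode]
    return sorted(candidates, key=lambda d: d.score, reverse=True)[:k]


cands = [Duplex(5, 60, 0.4), Duplex(8, 63, 0.9), Duplex(12, 58, 0.7)]
best = select_duplexes(cands, "optimal")
top_two = select_duplexes(cands, "optimal+suboptimal")
print(best)  # [Duplex(mirna_start=8, star_start=63, score=0.9)]
print(top_two[:1] == best)  # True
```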

Program execution: Start the program by clicking the "Submit" button (red arrow #7).

Data output: Running the program opens a results window with the output.
It contains the following (see Fig. 2 below):
• The number of the predicted pre-miRNA (red arrow #1);
• The start position of this pre-miRNA in the genomic sequence (red arrow #2);
• The stem-loop of the pre-miRNA with the miRNA/miRNA* duplex marked, either for the optimal duplex (red arrow #3) or for the two best variants (red arrows #3-4). The miRNAs are shown in uppercase letters;
• The sequences of the predicted miRNAs.

Figure 2. The resulting window.
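Because the predicted miRNAs are marked in uppercase, they can be recovered from the marked precursor with a regular expression. A minimal sketch, assuming the marked pre-miRNA is available as a single string (for example, copied from the results window) in which every nucleotide outside the miRNA/miRNA* duplex is lowercase; the example sequence is made up.

```python
import re


def extract_marked_mirnas(marked: str) -> list[tuple[int, str]]:
    """Return (1-based start, sequence) for every uppercase run in a marked precursor."""
    return [(m.start() + 1, m.group()) for m in re.finditer(r"[ACGTU]+", marked)]


# Made-up precursor: lowercase flanks/loop, uppercase miRNA and miRNA*
print(extract_marked_mirnas("aaUGAGGUAggcccUACCUCAuu"))
# [(3, 'UGAGGUA'), (15, 'UACCUCA')]
```

The positions are relative to the pre-miRNA. Adding the reported pre-miRNA start (red arrow #2) minus one would give genomic coordinates, assuming both positions are 1-based.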

Evaluation of human miRNA prediction:

Table 1. Evaluation of human miRNA prediction in 5-fold cross-validation. The table shows the percentage of predicted miRNAs whose start lies within 0–3 nucleotides of the start of the real mature miRNA, for the optimal miRNA pair and for the better of the optimal and suboptimal miRNA pairs. E(nt) is the average position deviation. The human mature miRNAs and precursors are taken from the miRBase database, release 21.0.
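For readers reproducing a 5-fold cross-validation, the sketch below shows one generic way to form the folds. It is an assumption, not the procedure behind Table 1: it splits by precursor ID (so both arms of one hairpin stay in the same fold), shuffles with a fixed seed, and the IDs are hypothetical.

```python
import random


def five_fold_splits(precursor_ids: list[str], seed: int = 0):
    """Yield (train, test) ID lists; each precursor is tested exactly once."""
    shuffled = list(precursor_ids)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducible folds
    folds = [shuffled[i::5] for i in range(5)]  # five near-equal folds
    for i, test in enumerate(folds):
        train = [pid for j, fold in enumerate(folds) if j != i for pid in fold]
        yield train, test


ids = [f"hsa-pre-{n}" for n in range(12)]  # hypothetical precursor IDs
for train, test in five_fold_splits(ids):
    print(len(train), len(test))
```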

Table 2. Evaluation of human miRNA prediction on the miR19-21 test dataset. The table shows the percentage of predicted miRNAs whose start lies within 0–3 nucleotides of the start of the real mature miRNA, for the optimal miRNA pair and for the better of the optimal and suboptimal miRNA pairs. E(nt) is the average position deviation. The miR19-21 training set contains the human mature miRNAs and precursors from miRBase release 19.0; the test set contains the new human mature miRNA and precursor sequences added in miRBase releases 20 and 21.
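The measures in Tables 1 and 2 can be sketched as follows. Two interpretations are assumptions, since the captions do not define them exactly: "within k nucleotides" is counted cumulatively (deviation ≤ k), and E(nt) is taken as the mean absolute start deviation. For the optimal and suboptimal pairs, the closer of the two candidates is scored. All positions are made up.

```python
def deviations(predictions: list[tuple[int, ...]], true_starts: list[int]) -> list[int]:
    """Absolute start deviation per precursor; with several candidates, keep the closest."""
    return [min(abs(p - t) for p in cands) for cands, t in zip(predictions, true_starts)]


def summarize(devs: list[int], max_dev: int = 3):
    """Percent of predictions within k nt (k = 0..max_dev) and the mean deviation E(nt)."""
    n = len(devs)
    within = {k: 100.0 * sum(d <= k for d in devs) / n for k in range(max_dev + 1)}
    return within, sum(devs) / n


true = [10, 20, 30, 40]                             # made-up true start positions
optimal = [(10,), (22,), (35,), (41,)]              # one candidate per precursor
top_two = [(10, 12), (22, 20), (35, 31), (41, 45)]  # optimal + suboptimal
print(summarize(deviations(optimal, true)))  # ({0: 25.0, 1: 50.0, 2: 75.0, 3: 75.0}, 2.0)
print(summarize(deviations(top_two, true)))  # ({0: 50.0, 1: 100.0, 2: 100.0, 3: 100.0}, 0.5)
```

Scoring the closer of two candidates can only lower the deviation, which is why the "best of two" columns are never worse than the optimal-pair columns.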

Comparison with other miRNA prediction programs:

Table 3. Performance comparison between the proposed method and ProMiR. Two parameters were measured: the mean absolute distance between the predicted and true sites, and its standard deviation (SD). The dataset contains the human mature miRNAs and precursors from miRBase release 2.2.
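The two parameters in Table 3 can be computed with Python's standard `statistics` module. A minimal sketch with made-up positions; the caption does not say whether the sample or the population SD was used, so the sample SD (`statistics.stdev`) here is an assumption, and `statistics.pstdev` would give the population version.

```python
import statistics


def distance_stats(pred_starts: list[int], true_starts: list[int]) -> tuple[float, float]:
    """Mean and sample standard deviation of |predicted - true| start positions."""
    dists = [abs(p - t) for p, t in zip(pred_starts, true_starts)]
    return statistics.fmean(dists), statistics.stdev(dists)  # stdev needs >= 2 values


mean_d, sd_d = distance_stats([10, 22, 35, 41], [10, 20, 30, 40])  # made-up positions
print(mean_d, round(sd_d, 3))  # 2.0 2.16
```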

Table 4. Performance comparison between the proposed method and MatureBayes. The distance from the truth is the absolute distance between the predicted and the actual start positions of the mature miRNA. The training set contains the human and mouse mature miRNAs and precursors from miRBase release 10.1; the test set contains the new human and mouse mature miRNA and precursor sequences added in miRBase releases 11–14.

Table 5. Performance comparison between the proposed method, MiRmat and MaturePred. The distance from the truth is the absolute distance between the predicted and the actual start positions of the mature miRNA. The training set contains the human and mouse mature miRNAs and precursors from miRBase release 17.0; the test set contains the new human and mouse mature miRNA and precursor sequences added in miRBase release 18.
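Tables 2, 4 and 5 all use a release-based split: train on one miRBase release and test on the entries added in later releases. A hedged sketch of that set arithmetic, with hypothetical accession sets; it assumes each release is a full snapshot and ignores entries that miRBase deleted or renamed between releases.

```python
def release_split(ids_by_release: dict[str, set[str]], train_release: str,
                  later_releases: list[str]) -> tuple[set[str], set[str]]:
    """Train on one release; test on entries that first appear in later releases."""
    train = set(ids_by_release[train_release])
    test: set[str] = set()
    for rel in later_releases:
        test |= ids_by_release[rel] - train  # entries new relative to the training release
    return train, test


# Hypothetical accession sets for three releases
releases = {"19": {"a", "b"}, "20": {"a", "b", "c"}, "21": {"a", "b", "c", "d"}}
train, test = release_split(releases, "19", ["20", "21"])
print(sorted(train), sorted(test))  # ['a', 'b'] ['c', 'd']
```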

REFERENCES:
MiRmat: He C, Li YX, Zhang G, Gu Z, Yang R, Li J, Lu ZJ, Zhou ZH, Zhang C, Wang J (2012) MiRmat: mature microRNA sequence prediction. PLoS ONE 7(12): e51673.
MaturePred: Xuan P, Guo M, Huang Y, Li W, Huang Y (2011) MaturePred: Efficient Identification of MicroRNAs within Novel Plant Pre-miRNAs. PLoS ONE 6(11): e27422.
MatureBayes: Gkirtzou K, Tsamardinos I, Tsakalides P, Poirazi P (2010) MatureBayes: A Probabilistic Algorithm for Identifying the Mature miRNA within Novel Precursors. PLoS ONE 5(8): e11843.
ProMiR: Nam JW, Shin KR, Han J, Lee Y, Kim VN, Zhang BT (2005) Human microRNA prediction through a probabilistic co-learning model of sequence and structure. Nucleic Acids Res 33(11): 3570–3581.