Structural features of mRNA 5'UTRs of eukaryotic genes expressed at high and low levels

Vorobiev* D.G., Titov I.I., Kochetov A.V., Kolchanov N.A.

Institute of Cytology and Genetics, Lavrentyev Ave., 10, Novosibirsk, 630090, Russia

Keywords: eukaryotic mRNAs, translation efficiency, secondary structure, statistical analysis, z-score.



Motivation: It was shown earlier (Kochetov et al., 1998) that the 5’-untranslated regions (5’UTRs) of mRNAs of eukaryotic genes expressed at high and low levels (HE and LE, respectively) differ in their context characteristics: the leader sequences of HE mRNAs are shorter, contain fewer false start codons, and are more asymmetric in the content of complementary nucleotides (G/C and A/U). These data suggest that the 5’UTRs of HE genes have a less stable secondary structure (SS) than those of LE genes.

Results: In this paper, the hypothesis was verified: HE mRNA 5’UTRs are likely to form a less stable secondary structure than LE ones. It was also found that LE 5’UTRs have a more stable SS than randomly generated sequences of the same length and base composition. These facts suggest that the higher stability of the LE 5’UTR secondary structure may play an important role in the control of expression of regulatory genes.


A large body of experimental data shows that the translation efficiencies of eukaryotic mRNAs vary over a wide range (Ray et al., 1983; Kochetov and Shumny, 1998). Translation initiation, which includes scanning of the 5’UTR by the 40S ribosomal subunit, is strongly influenced by 5’UTR features. For example, it was shown experimentally that hairpin structures can slow the movement of the 40S ribosomal subunit along the mRNA. The extent of a hairpin's negative effect on eukaryotic mRNA translation in vivo depends on its stability and its position within the molecule (Kozak, 1994; Vega Laso et al., 1993).

Methods and algorithms

The 5'UTR sequences were extracted from the EMBL database. We used only those genes for which the transcription start had been determined experimentally, as verified in the literature sources cited in the annotations.

To predict secondary structure, we applied the program "Fitness", which we developed previously for the prediction of RNA SS. The program is based on a genetic algorithm described elsewhere (Titov et al., 2000). To calculate the SS energy, it uses the thermodynamic parameters from the Turner compilation (Turner et al., 1988). For each sequence in the sample, we calculated the energy E of the most stable SS and the energy Ehairpin of the most stable hairpin. To characterize how favorable the nucleotide content is for SS formation, we calculated the following parameters: (1) G+C content, (2) the measure of imbalance between the complementary nucleotides G and C, calculated as


and (3) the weighted content of complementary pairs, or energy capacity given as


where PA, PU, PG, PC are the fractions of the corresponding nucleotides in a sequence. We also calculated the z-score of the values E and Ehairpin:

$$Z = \frac{E_{nat} - \langle E_{rand} \rangle}{\sigma(E_{rand})}$$

where Enat is the energy of the SS of a natural sequence, and ⟨Erand⟩ and σ(Erand) are the mean and the standard deviation of the SS energy over random sequences of the same length and nucleotide content.
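
As a minimal sketch of this z-score calculation, the Python fragment below assumes the standard form Z = (Enat - mean(Erand)) / sd(Erand) and obtains the random sequences by shuffling the natural one, which preserves its length and nucleotide content. The function names, the number of shuffles, and in particular toy_energy are illustrative assumptions: toy_energy is only a stand-in for the energy returned by a folding program such as Fitness and is not a thermodynamic model.

```python
import random
import statistics
from typing import Callable

# Hypothetical pair weights for the toy energy only (not Turner parameters).
PAIR_WEIGHT = {("G", "C"): 3, ("C", "G"): 3, ("A", "U"): 2, ("U", "A"): 2}


def toy_energy(seq: str) -> float:
    """Stand-in energy: fold the sequence back on itself at the midpoint and
    sum the weights of the Watson-Crick pairs formed (more pairs, lower energy)."""
    s = seq.upper().replace("T", "U")
    n = len(s)
    return -float(sum(PAIR_WEIGHT.get((s[i], s[n - 1 - i]), 0) for i in range(n // 2)))


def shuffled(seq: str, rng: random.Random) -> str:
    """Return a random permutation of seq (same length and nucleotide content)."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def z_score(seq: str, energy: Callable[[str], float],
            n_random: int = 100, seed: int = 0) -> float:
    """z-score of the energy of seq against its shuffled counterparts."""
    rng = random.Random(seed)
    e_nat = energy(seq)
    e_rand = [energy(shuffled(seq, rng)) for _ in range(n_random)]
    sd = statistics.stdev(e_rand)
    if sd == 0:
        raise ValueError("all random sequences have the same energy")
    return (e_nat - statistics.mean(e_rand)) / sd


if __name__ == "__main__":
    print(z_score("GGGGGGAAAACCCCCC", toy_energy))
```

A negative value means that the natural sequence scores as more stable than its shuffled counterparts; in practice the energy argument would be replaced by a call to the folding program.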

The values of Ecapacity, E, Ehairpin, and G+C content in the samples examined were found to be normally distributed (Kolmogorov-Smirnov test; data not shown). Statistical analysis was performed with the software package Statistica 4.5 (StatSoft). To compare the mean values of two Gaussian distributions, we applied Student’s t-test.
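
Student's t-test for two independent samples can also be evaluated from summary statistics alone. The sketch below is a hedged illustration and not the Statistica procedure: it assumes that the "±" values reported in this paper are standard deviations and that the united HE and LE samples contain 92 and 50 leaders, respectively; the function name pooled_t is hypothetical. It is applied to the G+C content of the united sample given in Table 1 (46.2 ± 13.1 for HE, 49.4 ± 15.4 for LE).

```python
import math
from typing import Tuple


def pooled_t(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> Tuple[float, int]:
    """Student's t statistic (equal variances assumed) and degrees of freedom,
    computed from the means, standard deviations, and sizes of two samples."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / std_err, df


# G+C content of the united sample; sample sizes are an assumption (see above).
t, df = pooled_t(46.2, 13.1, 92, 49.4, 15.4, 50)
print(f"t = {t:.2f}, df = {df}")
```

With 140 degrees of freedom the two-sided 5% critical value of t is roughly 1.98, so under these assumptions a |t| of about 1.3 would be consistent with the "insignificant" entry reported for this comparison.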


We compiled samples of 5’UTRs for two groups of genes, HE and LE. The HE group contains 92 genes whose products are produced in cells in large amounts (e.g., actins, tubulins, HSPs, etc.). We assumed that the mRNAs corresponding to these genes are translated efficiently. The LE group includes 50 genes encoding regulatory proteins (e.g., transcription factors, growth factors, etc.). Their expression is under strict control (Chen and Shyu, 1995; Pahl and Baeuerle, 1996), since excessive production causes severe disorders. The samples were subdivided by taxon: monocot plant genes, dicot plant genes, and mammalian genes.

The SS was predicted for the natural and random mRNA leader sequences. If a leader was longer than 250 nucleotides, only its first 250 nucleotides were analyzed.

The stability of the algorithm was tested on a set of randomly generated sequences with equal content of A, T, G, and C nucleotides and with lengths from 20 to 300 nucleotides. As expected, the energies of the optimal SS and their dispersions depended linearly on sequence length. Testing the dependence of the energy on G+C content at a fixed sequence length also confirmed the stability of the algorithm: the free energy of the predicted SS decreased as the G+C content increased.

Dependence of the secondary structure parameters on the length of the 5’UTR. The mean SS energy is known to depend linearly on sequence length (Fontana et al., 1993). On average, the leader of an LE gene (236.6 ± 265.7 nucleotides) turned out to be about threefold longer than the leader of an HE gene (87.3 ± 50.0 nucleotides).

Dependence of the secondary structure of the mRNA leader on the contextual features of the 5'UTR. Next, we tried to determine the contribution of nucleotide content to SS stability. We compared the G+C content, the imbalance between G and C, and the energy capacity Ecapacity in the groups of HE and LE genes. The results of this comparison are shown in Table 1. The data show that in two taxonomic groups, monocots and mammals, the energy capacity of the leaders of LE genes is significantly (p<0.05) higher than that of HE genes. Therefore, the nucleotide context of LE genes is more favorable for the formation of secondary structure.

Comparative analysis of the secondary structure parameters in 5’UTRs and random sequences. Since the features of LE mRNA 5’UTRs favor the formation of a stable secondary structure (they are longer and contain more balanced amounts of complementary nucleotides), these factors could make the major contribution to the difference in SS stability between the 5’UTRs of HE and LE mRNAs. However, apart from these parameters, SS stability can also depend on the content of repeats. To take this factor into account, we tried to eliminate the effects of nucleotide content and sequence length. To this end, we calculated the z-score of the SS characteristics for each natural sequence. The mean z-score values calculated for the samples of HE and LE genes were then compared with each other. The results of the comparison are given in Table 1.

Table 1. Characteristics of the nucleotide content and SS of the mRNA 5’UTRs in the samples of HE and LE genes, and the significance of the difference between the groups of HE and LE genes.

Characteristic                                   | HE            | LE            | HE            | LE            | HE            | LE            | HE            | LE
Volume of a sample                               |               |               |               |               |               |               |               |
G+C content, %                                   | 37.1 ± 7.7    | 35.9 ± 6.7    | 56.2 ± 8.8    | 56.4 ± 11.0   | 53.7 ± 12.5   | 62.3 ± 11.7   | 46.2 ± 13.1   | 49.4 ± 15.4
  HE vs LE difference                            | insignificant |               | insignificant |               |               |               | insignificant |
Imbalance of G and C                             | 0.30 ± 0.22   | 0.32 ± 0.23   | 0.35 ± 0.21   | 0.17 ± 0.1    | 0.21 ± 0.16   | 0.17 ± 0.11   | 0.28 ± 0.2    | 0.24 ± 0.19
Energy capacity                                  | 1.31 ± 0.23   | 1.28 ± 0.3    | 1.65 ± 0.21   | 1.92 ± 0.36   | 1.86 ± 0.43   | 2.15 ± 0.53   | 1.57 ± 0.4    | 1.72 ± 0.56
  HE vs LE difference                            | insignificant |               |               |               |               |               | insignificant |
z-score of the energy of the complete SS         | -0.19 ± 1.22  | -0.9 ± 1.43   | 0.15 ± 0.5    | 0.31 ± 1.02   | 0.26 ± 1.1    | -0.26 ± 1.57  | 0.03 ± 1.11   | -0.42 ± 1.46
  HE vs LE difference                            |               |               | insignificant |               | insignificant |               |               |
z-score of the energy of the most stable hairpin | -0.12 ± 1.09  | -1.24 ± 1.78  | -0.46 ± 1.49  | 0.14 ± 0.71   | 0.28 ± 1.04   | -0.02 ± 1.08  | 0.06 ± 1.03   | -0.65 ± 1.58
  HE vs LE difference                            |               |               | insignificant |               | insignificant |               |               |
Energy per nucleotide                            | -0.03 ± 0.06  | -0.05 ± 0.05  | -0.05 ± 0.09  | -0.11 ± 0.05  | -0.07 ± 0.07  | -0.17 ± 0.08  | -0.05 ± 0.06  | -0.11 ± 0.08
  HE vs LE difference                            | insignificant |               |               |               |               |               |               |

It was found that in the group of LE genes the secondary structure is more stable than in random sequences (the mean z-score is < 0). The difference in mean z-score values between the groups of HE and LE genes is significant (p<0.05) for the sample of monocot genes and for the united sample of all taxa.

Thus, the factor determining the difference in SS energy between HE and LE genes is the length of the leader sequence. However, differences persist when the energy is normalized per 5’UTR length. In mammals, the main contribution to these differences comes from the G+C content; in monocots, from the imbalance between C and G nucleotides. In dicots, the difference arises from properties of the sequence that are related not to the overall nucleotide content but to the order of the nucleotides.

The increased SS stability of the leader sequences of LE genes in comparison with random sequences was an unexpected phenomenon, and we tried to determine its putative cause. In each sequence, we searched for the most stable perfect hairpin, that is, a stem without defects closed by a loop. The stem of such a hairpin turned out to be present in the SS in approximately 60% of cases (data not shown). As a consequence, the z-score of the SS energy correlates significantly (R² = 0.45, α < 0.01) with the z-score of the most stable hairpin. We found that in the sample of LE genes of monocots and in the united sample of LE genes, the z-scores of the most stable hairpin are significantly (p<0.05) lower than in the corresponding samples of HE genes. Moreover, in absolute value, the z-score of the most stable hairpin (-1.24 in monocots and -0.65 in the united sample) is higher than the z-score of the complete SS (-0.9 and -0.42, respectively). This means that the basic factor determining the general stabilization of the SS in the group of LE genes compared to random sequences is the presence in the 5'UTR of an inverted repeat, which forms a hairpin of increased stability.
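
The search for the most stable perfect hairpin can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure implemented in Fitness: stability is approximated by the number of consecutive Watson-Crick pairs in the stem rather than by Turner energies, G-U pairs are ignored, and the minimum loop length of 3 nucleotides as well as the names Hairpin and best_perfect_hairpin are hypothetical.

```python
from typing import NamedTuple, Optional

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


class Hairpin(NamedTuple):
    start: int        # 0-based index of the 5' end of the stem
    end: int          # 0-based index of the 3' end of the stem
    stem_length: int  # number of consecutive base pairs in the stem
    loop_length: int  # number of unpaired nucleotides enclosed by the stem


def best_perfect_hairpin(seq: str, min_loop: int = 3) -> Optional[Hairpin]:
    """Return the perfect hairpin with the longest stem, or None if no pair forms.

    Assumption: stem length is used as a proxy for stability."""
    s = seq.upper().replace("T", "U")
    n = len(s)
    best: Optional[Hairpin] = None
    for i in range(n):                        # 5' base of the innermost pair
        for j in range(i + min_loop + 1, n):  # 3' base of the innermost pair
            k = 0
            # Extend the stem outward while the bases stay complementary.
            while i - k >= 0 and j + k < n and (s[i - k], s[j + k]) in WATSON_CRICK:
                k += 1
            if k and (best is None or k > best.stem_length):
                best = Hairpin(i - k + 1, j + k - 1, k, j - i - 1)
    return best


if __name__ == "__main__":
    print(best_perfect_hairpin("GGGGAAAACCCC"))
```

Replacing the stem-length criterion with a nearest-neighbor energy sum would bring the sketch closer to a thermodynamic ranking of hairpins, at the cost of requiring a parameter table.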


A functionally active mRNA probably has to be translated with a certain efficiency; hence, the presence of "strong" negative signals is unlikely. Therefore, the parameters of the 5’UTRs of LE mRNAs should be adapted to support translation. On the other hand, the sample of HE genes is insufficiently characterized. It is possible that a sequence comes from a member of a multigene family that makes only a small contribution to total protein synthesis. The 5'UTR parameters of such a gene may differ from the values typical of HE genes. Hence, the difference between the samples of HE and LE genes may be small and detectable only by statistical methods. This is exactly the situation we observe here.

Unexpectedly, the 5’UTRs of LE mRNAs form a more stable SS than random sequences. The computational analysis also showed that LE mRNA 5’UTRs form a more stable secondary structure than HE ones. In this respect, our results may be interpreted as evidence that the SS of the leaders of LE genes is functionally significant. This function may consist in keeping the translational activity of these mRNAs at a low level and thereby preventing deleterious excessive production.


The authors are grateful to Galina Orlova for help in translating the manuscript. Alex Kochetov was supported by an SB RAS grant for young scientists.


  1. Chen, C.-Y.A. and Shyu, A.-B. (1995) Trends Biochem. Sci., 20, 465-470.
  2. Fontana, W., Konings, D., Stadler, P., Schuster, P. (1993) Biopolymers, 33, 1389-1404.
  3. Kochetov, A.V., Ischenko, I.V., Vorobiev, D.G., Kel, A.E., Babenko, V.N., Kisselev, L.L., Kolchanov, N.A. (1998) Eukaryotic mRNAs encoding abundant and scarce proteins are statistically dissimilar in many structural features. FEBS Lett., 440, 351-355.
  4. Kochetov, A.V. and Shumny, V.K. (1998) Influence of the mRNA structure on the translation initiation process in plant cells. Advances In Current Biology (Russ), 118, 754-770.
  5. Kozak, M. (1994) Determinants of translational fidelity and efficiency in vertebrate mRNAs. Biochimie, 76, 815-821.
  6. Pahl, H.L. and Baeuerle, P.A. (1996) Control of gene expression by proteolysis. Curr. Opin. Cell Biol., 8, 340-347.
  7. Ray, B.K., Brendler, T.G., Adya, S., Daniels-McQueen, S., Miller, J.K., Hershey, J.W.B., Grifo, J.A., Merrick, W.C., Thach, R.E. (1983) Role of mRNA competition in regulating translation: further characterization of mRNA discriminatory initiation factors. Proc. Natl. Acad. Sci. USA, 80, 663-667.
  8. Titov, I.I., Ivanisenko, V.A., Kolchanov, N.A. (2000) Fitness - a WWW-resource for RNA folding simulation based on genetic algorithm with local minimization. Computational Technologies, SB RAS, Novosibirsk, in press.
  9. Turner, D.H., Sugimoto, N., Freier, S.M. (1988) RNA structure prediction. Ann. Rev. Biophys. Biophys. Chem., 17, 167-192.
  10. Vega Laso, M.R., Zhu, D., Sagliocco, F., Brown, A.J., Tuite, M.F., McCarthy, J.E. (1993) Inhibition of translational initiation in the yeast Saccharomyces cerevisiae as a function of the position and stability of hairpin. J. Biol. Chem., 268, 6453-6462.