Structural features of mRNA 5'UTRs of eukaryotic genes expressed at high and low levels
Vorobiev* D.G., Titov I.I., Kochetov A.V., Kolchanov N.A.
Institute of Cytology and Genetics, Lavrentyev Ave., 10, Novosibirsk, 630090, Russia
Keywords: eukaryotic mRNAs, translation efficiency, secondary structure, statistical analysis, z-score.
*e-mail: denis@bionet.nsc.ru
Abstract
Motivation: It was shown earlier (Kochetov et al., 1998) that the 5'-untranslated regions (5'UTRs) of mRNAs of eukaryotic genes expressed at high and low levels (HE and LE, respectively) differ in their context characteristics: the leader sequences of HE mRNAs are shorter, contain fewer false start codons, and are more asymmetric in the content of complementary nucleotides (G/C and A/U). These data suggest that the 5'UTRs of HE genes are characterized by a less stable secondary structure (SS).
Results: This hypothesis was tested and supported: the 5'UTRs of HE mRNAs are likely to form less stable secondary structure than those of LE mRNAs. It was also found that LE 5'UTRs have more stable SS than randomly generated sequences of the same length and base composition. These facts suggest that the higher stability of LE 5'UTR secondary structure may play an important role in controlling the expression of regulatory genes.
Introduction
A large body of experimental data shows that the translation efficiencies of eukaryotic mRNAs vary over a wide range (Ray et al., 1983; Kochetov and Shumny, 1998). Translation initiation, which involves scanning of the 5'UTR by the 40S ribosomal subunit, is strongly influenced by 5'UTR features. For example, it was shown experimentally that hairpin-like structures can slow the movement of the 40S ribosomal subunit along the mRNA. The extent of a hairpin's negative effect on eukaryotic mRNA translation in vivo depends on its stability and its location within the molecule (Kozak, 1994; Vega Laso et al., 1993).
Methods and algorithms
The 5'UTR sequences were extracted from the EMBL database. We used only genes for which the transcription start site had been determined experimentally, as verified against the literature sources cited in the annotations.
To predict secondary structure, we applied the program "Fitness", which we developed previously for RNA SS prediction. The program is based on a genetic algorithm described elsewhere (Titov et al., 2000). To calculate SS energy, it uses the thermodynamic parameters compiled by Turner et al. (1988). For each sequence in the sample, we calculated the energy E of the most stable SS and the energy Ehairpin of the most stable hairpin. To characterize how favorable the nucleotide composition is for SS formation, we calculated the following parameters: (1) G+C content, (2) the measure of imbalance between the complementary nucleotides G and C, calculated as
,
and (3) the weighted content of complementary pairs, or energy capacity Ecapacity, given as
,
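where PA, PU, PG, and PC are the fractions of the corresponding nucleotides in a sequence. We also calculated the z-score of the values E and Ehairpin:

$$Z = \frac{E_{nat} - \langle E_{rand} \rangle}{\sigma(E_{rand})},$$

where $E_{nat}$ is the SS energy of a natural sequence, $E_{rand}$ is the SS energy of a random sequence of the same length and nucleotide content, and $\langle E_{rand} \rangle$ and $\sigma(E_{rand})$ are the mean and standard deviation of $E_{rand}$ over the random sequences.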
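The z-score calculation can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual definition (natural-sequence energy minus the mean energy of the random sequences, divided by their standard deviation); the function name `z_score` and the numbers are hypothetical and are not taken from the paper, and the energies would in practice come from an SS prediction program.

```python
from statistics import mean, stdev

def z_score(e_nat, e_rand):
    """Z-score of a natural-sequence energy relative to random-sequence energies.

    e_nat  -- SS energy of the natural sequence
    e_rand -- SS energies of random sequences of the same length and composition
    """
    if len(e_rand) < 2:
        raise ValueError("need at least two random-sequence energies")
    sigma = stdev(e_rand)  # sample standard deviation (n - 1 in the denominator)
    if sigma == 0:
        raise ValueError("random-sequence energies have zero spread")
    return (e_nat - mean(e_rand)) / sigma

# Illustrative values only, in arbitrary energy units.
random_energies = [-10.0, -12.0, -8.0, -11.0, -9.0]
z = z_score(-14.0, random_energies)
print(f"z = {z:.2f}")
```

With this sign convention, a negative z-score means that the natural sequence folds into a more stable (lower-energy) structure than its random counterparts.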
The values of Ecapacity, E, Ehairpin, and G+C content in the samples examined were found to be normally distributed (Kolmogorov-Smirnov test; data not shown). Statistical analysis was performed with the Statistica 4.5 software package (StatSoft). To compare the mean values of two normal distributions, we applied Student's t-test.
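For orientation, the two-sample comparison can be sketched from summary statistics alone. The sketch below assumes the classical equal-variance (pooled) form of Student's t-test and assumes that the "±" values reported in Table 1 are standard deviations; neither assumption is stated in the text, and the function name `pooled_t` is hypothetical. The example numbers are the "energy per nucleotide" entries for the total HE and LE samples.

```python
from math import sqrt

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample Student's t statistic (equal variances assumed).

    Returns the t statistic and the degrees of freedom n1 + n2 - 2.
    """
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    # Standard error of the difference between the two means.
    se = sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se, df

# Energy per nucleotide, total samples: HE (n = 92) vs LE (n = 50).
t, df = pooled_t(-0.05, 0.06, 92, -0.11, 0.08, 50)
print(f"t = {t:.2f}, df = {df}")
```

The resulting t value is then compared with the critical value of the t distribution for the given degrees of freedom to obtain the significance level.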
Results
We compiled samples of 5'UTRs for two groups of genes, HE and LE. The HE group contains 92 genes whose products are present in cells in large amounts (e.g., actins, tubulins, HSPs). We assumed that the mRNAs of these genes are translated efficiently. The LE group includes 50 genes encoding regulatory proteins (e.g., transcription factors, growth factors). Their expression is under strict control (Chen and Shyu, 1995; Pahl and Baeuerle, 1996), and their excessive production causes severe disorders. The samples were subdivided by taxon: monocot plant genes, dicot plant genes, and mammalian genes.
SS was predicted for natural and random mRNA leader sequences. If a leader was longer than 250 nucleotides, only the first 250 nucleotides were analyzed.
The stability of the algorithm was tested on a set of randomly generated sequences with equal content of A, T, G, and C nucleotides and lengths ranging from 20 to 300 nucleotides. As expected, the energies of the optimal SS and their variances depended linearly on sequence length. Testing the dependence of energy on G+C content at a fixed sequence length also confirmed the stability of the algorithm: the free energy of the predicted SS decreased as G+C content increased.
Dependence of the secondary structure parameters on the length of the 5'UTR. The mean SS energy is known to depend linearly on sequence length (Fontana et al., 1993). On average, the leader of an LE gene (236.6 ± 265.7 nucleotides) turned out to be about three times longer than the leader of an HE gene (87.3 ± 50.0 nucleotides).
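The preparation of natural and random leaders can be sketched as follows. The text does not say how the random sequences were generated; the sketch assumes a simple mononucleotide shuffle, which preserves length and nucleotide content exactly. The function names `truncate_leader` and `shuffled_control` and the example sequence are hypothetical.

```python
import random
from collections import Counter

MAX_LEADER = 250  # only the first 250 nucleotides of a long leader are analyzed

def truncate_leader(seq, max_len=MAX_LEADER):
    """Keep at most the first max_len nucleotides of a leader sequence."""
    return seq[:max_len]

def shuffled_control(seq, rng):
    """Random sequence with the same length and nucleotide content as seq.

    Assumption: a plain mononucleotide shuffle (a random permutation of letters).
    """
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)

rng = random.Random(0)  # fixed seed so that the example is reproducible
leader = truncate_leader("GCAUCGGAUCCGAUAGCUAGG" * 20)  # made-up 420-nt sequence
control = shuffled_control(leader, rng)
print(len(leader), Counter(leader) == Counter(control))
```

Generating many such controls per natural leader gives the set of random-sequence energies against which the natural sequence is compared.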
Dependence of the secondary structure of the mRNA leader on the contextual features of the 5'UTR. Next, we tried to determine the contribution of nucleotide composition to SS stability. We compared G+C content, the imbalance between G and C contents, and the energy capacity Ecapacity in the HE and LE groups. The results are shown in Table 1. In two taxonomic groups, monocots and mammals, the energy capacity of leaders is significantly (p<0.05) higher in LE genes than in HE genes. Therefore, the nucleotide context of LE genes is more favorable for secondary structure formation.
Comparative analysis of the secondary structure parameters in 5'UTRs and random sequences. Since the features of LE mRNA 5'UTRs favor the formation of stable secondary structure (they are longer and contain more closely matched concentrations of complementary nucleotides), these factors could account for most of the difference in secondary structure stability between the 5'UTRs of HE and LE mRNAs. However, apart from these parameters, SS stability can depend on the content of repeats. To take this factor into account, we tried to eliminate the effects of nucleotide content and sequence length. To this end, we calculated the z-score of the SS characteristics for each natural sequence. The mean z-score values for the HE and LE samples were then compared. The results are given in Table 1.
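The composition parameters are built from the nucleotide fractions of each leader. A minimal sketch of that first step is given below; it covers only the fractions PA, PU, PG, PC and the G+C content, and the function names `nucleotide_shares` and `gc_content_percent` are hypothetical. Treating T as U (so that DNA-style database entries can be used directly) is an assumption of the sketch.

```python
from collections import Counter

def nucleotide_shares(seq):
    """Return the fractions P_A, P_U, P_G, P_C of a sequence as a dict."""
    rna = seq.upper().replace("T", "U")  # assumption: DNA-style input is allowed
    counts = Counter(rna)
    total = sum(counts[base] for base in "AUGC")
    if total == 0:
        raise ValueError("sequence contains no A/U/G/C nucleotides")
    return {base: counts[base] / total for base in "AUGC"}

def gc_content_percent(seq):
    """G+C content of a sequence, in percent."""
    shares = nucleotide_shares(seq)
    return 100.0 * (shares["G"] + shares["C"])

example = "GGCAUCGAUU"  # made-up 10-nt sequence: 3 G, 2 C, 2 A, 3 U
print(nucleotide_shares(example), gc_content_percent(example))
```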
Table 1

| Parameter | Sample | Dicots | Monocots | Mammals | Total |
|---|---|---|---|---|---|
| Sample size | HE | 44 | 15 | 33 | 92 |
| | LE | 22 | 11 | 17 | 50 |
| G+C content, % | HE | 37.1 ± 7.7 | 56.2 ± 8.8 | 53.7 ± 12.5 | 46.2 ± 13.1 |
| | LE | 35.9 ± 6.7 | 56.4 ± 11.0 | 62.3 ± 11.7 | 49.4 ± 15.4 |
| | HE vs LE | not significant | not significant | p<0.05 | not significant |
| Imbalance of G and C | HE | 0.30 ± 0.22 | 0.35 ± 0.21 | 0.21 ± 0.16 | 0.28 ± 0.2 |
| | LE | 0.32 ± 0.23 | 0.17 ± 0.1 | 0.17 ± 0.11 | 0.24 ± 0.19 |
| Energy capacity | HE | 1.31 ± 0.23 | 1.65 ± 0.21 | 1.86 ± 0.43 | 1.57 ± 0.4 |
| | LE | 1.28 ± 0.3 | 1.92 ± 0.36 | 2.15 ± 0.53 | 1.72 ± 0.56 |
| | HE vs LE | not significant | p<0.05 | p<0.05 | not significant |
| z-score of the energy of the complete SS | HE | -0.19 ± 1.22 | 0.15 ± 0.5 | 0.26 ± 1.1 | 0.03 ± 1.11 |
| | LE | -0.9 ± 1.43 | 0.31 ± 1.02 | -0.26 ± 1.57 | -0.42 ± 1.46 |
| | HE vs LE | p<0.05 | not significant | not significant | p<0.05 |
| z-score of the energy of the most stable hairpin | HE | -0.12 ± 1.09 | -0.46 ± 1.49 | 0.28 ± 1.04 | 0.06 ± 1.03 |
| | LE | -1.24 ± 1.78 | 0.14 ± 0.71 | -0.02 ± 1.08 | -0.65 ± 1.58 |
| | HE vs LE | p<0.01 | not significant | not significant | p<0.01 |
| Energy per nucleotide | HE | -0.03 ± 0.06 | -0.05 ± 0.09 | -0.07 ± 0.07 | -0.05 ± 0.06 |
| | LE | -0.05 ± 0.05 | -0.11 ± 0.05 | -0.17 ± 0.08 | -0.11 ± 0.08 |
| | HE vs LE | not significant | p<0.01 | p<0.01 | p<0.01 |
Thus, the factor determining the difference in SS energy between HE and LE genes is the length of the leader sequence. However, differences remain even when the energy is normalized to 5'UTR length. In mammals, these differences are mainly due to G+C content; in monocots, to the imbalance between C and G nucleotides. In dicots, the difference arises from properties of the sequence that are related not to the overall nucleotide content but to the order of nucleotides.
The increased SS stability of the leader sequences of LE genes in comparison with random sequences was an unexpected phenomenon. We tried to determine its likely cause. In each sequence, we searched for the most stable hairpin without defects and with a loop. The stem of such a hairpin turned out to be present in the SS in approximately 60% of cases (data not shown). As a consequence, the z-score of the SS energy correlates significantly (R²=0.45, α<0.01) with the z-score of the most stable hairpin. We found that in the sample of LE genes of dicots and in the combined sample of LE genes, the z-score values of the most stable hairpin are significantly (p<0.05) lower than in the corresponding samples of HE genes. Moreover, in absolute value, the z-score of the most stable hairpin (-1.24 in dicots and -0.65 in the combined sample) is higher than the z-score of the complete SS (-0.9 and -0.42, respectively). This means that the main factor determining the overall stabilization of SS in LE genes compared with random sequences is the presence in the 5'UTR of an inverted repeat, which forms a hairpin of increased stability.
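The search for a defect-free hairpin can be illustrated with a short sketch. It is not the authors' procedure: the paper ranks hairpins by their energy, whereas the sketch below uses a crude proxy, the summed hydrogen-bond counts of the stem pairs (3 for G-C, 2 for A-U). The function name `best_perfect_hairpin`, the minimal loop length of 3 nucleotides, and the restriction to Watson-Crick pairs are all assumptions.

```python
# Assumed pair scores: hydrogen-bond counts, used as a rough stand-in for energy.
PAIR_SCORE = {("G", "C"): 3, ("C", "G"): 3, ("A", "U"): 2, ("U", "A"): 2}
MIN_LOOP = 3  # assumed minimal number of unpaired nucleotides in the loop

def best_perfect_hairpin(seq, min_loop=MIN_LOOP):
    """Find the best hairpin whose stem has no mismatches or bulges.

    Returns (score, stem_start, stem_end, stem_len), where stem_start and
    stem_end are the 0-based indices of the outermost base pair.
    Returns (0, -1, -1, 0) if no hairpin exists.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    best = (0, -1, -1, 0)
    # (i, j) is the innermost pair closing the loop; the stem grows outward.
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            score, length = 0, 0
            a, b = i, j
            while a >= 0 and b < n and (seq[a], seq[b]) in PAIR_SCORE:
                score += PAIR_SCORE[(seq[a], seq[b])]
                length += 1
                a -= 1
                b += 1
            if score > best[0]:
                best = (score, a + 1, b - 1, length)
    return best

# Made-up example: stem GGGC/GCCC closed around an AAAA loop.
print(best_perfect_hairpin("GGGCAAAAGCCC"))
```

The cubic worst-case running time is acceptable here because the analyzed leaders are at most 250 nucleotides long.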
Discussion
A functionally active mRNA presumably has to be translated with a certain efficiency; hence, the presence of "strong" negative signals is unlikely. Therefore, the parameters of the 5'UTRs of LE mRNAs should be adapted to support translation. On the other hand, the sample of HE genes is insufficiently characterized. It is possible that we used the sequence of a gene belonging to a multigene family that makes only a small contribution to total protein synthesis. The 5'UTR parameters of such a gene may differ from the values typical of HE genes. Hence, the difference between the HE and LE samples could be small and detectable only by statistical methods. This is the situation we observe in the present case.
Unexpectedly, the 5'UTRs of LE mRNAs form more stable SS than random sequences. The computational analysis also showed that LE mRNA 5'UTRs form more stable secondary structure than HE ones. In this respect, our results may be interpreted as evidence that the SS of leaders in LE genes is functionally significant. This functional significance may be realized by keeping the translational activity of these mRNAs at a low level and preventing deleterious excessive production.
Acknowledgements
The authors are grateful to Galina Orlova for help in translating the manuscript. Alex Kochetov was supported by an SD RAS grant for young scientists.
References