TRANSGENE: complex resource for optimization of transgene expression characteristics
Aim

TRANSGENE is a computational resource for adjusting the characteristics of foreign genes to optimize their expression in plants. We plan to build a comprehensive resource that will allow users to: (i) optimize species-specific translation initiation and termination signals and synonymous codon content, (ii) exclude potential splicing and polyadenylation signals, and (iii) select appropriate promoter and posttranscriptional control elements, among other tasks. Currently TRANSGENE is a pilot version containing several programs potentially useful for specialists in plant genetic engineering. They allow the user to optimize the usage of synonymous codons and provide information on the optimal contexts of translation initiation and termination signals.

Background

It is well known that expression of foreign genes in genetically modified organisms can be low. This can result from the accidental presence, within the protein-coding region, of sequence elements similar to expression signals of the recipient organism (e.g., splicing or polyadenylation sites). It is also possible that the expression signals of the transgene differ from those typical of the recipient organism's genes (e.g., translation initiation and termination signals, synonymous codon frequencies, and polyadenylation and splicing signals, if present).

Synonymous codon content. Although a relationship between codon content and translation elongation rate has been demonstrated for some bacteria, Saccharomyces cerevisiae and Drosophila melanogaster, no strict evidence of this phenomenon has been reported for higher plants or mammals. However, it has frequently been reported that optimizing synonymous codon content can increase the transgene expression level. In these cases the codon content of the transgene was adjusted to decrease the frequency of synonymous codons that occur rarely in genes of the recipient organism.

We performed a computational analysis of correlations between synonymous codon content and plant mRNA translation efficiency.
For this purpose we isolated mRNAs of Lycopersicon esculentum, Solanum tuberosum and Zea mays, species frequently used for transgenesis. The mRNAs were classified into High and Low expression groups using different criteria reflecting translation efficiency (start codon context, protein synthesis level, etc.). To reveal correlations between codon content and gene expression rate we used the approach we proposed earlier (Likhoshvai and Matushkin, 2002). The High and Low gene samples were found to differ significantly in synonymous codon content. For example, for potato the difference in codon usage index (Likhoshvai and Matushkin, 2002) was 0.29 (p<0.019). Some examples of codons that are represented differently in High and Low genes are given in the Table below.
Frequencies of some synonymous codons in High and Low expression potato mRNAs.
Codon preferences were species-specific and, in some cases, differed from the average codon frequencies. We propose that this approach identifies the set of optimal codons more accurately than average frequencies alone, and we plan to calculate the relationship between synonymous codon frequencies and mRNA translation efficiency for different plant species. We have developed a program that provides the user with average codon frequencies (taken from CUTG (http://www.kazusa.or.jp/codon/)) for some plant species commonly used in genetic engineering experiments (Zea mays, Solanum tuberosum, Lycopersicon esculentum, Nicotiana tabacum, Spinacia oleracea, Hordeum vulgare, Triticum aestivum, Daucus carota, Sorghum bicolor, Medicago sativa, Pisum sativum, Oryza sativa). In addition, the program contains data on synonymous codons whose frequencies differ significantly between High and Low expression genes (currently, these data are available for tomato, maize and potato).
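The kind of tabulation described above can be illustrated with a minimal sketch. It counts codons in a set of coding sequences, converts the counts to fractions within each synonymous family, and subtracts the fractions of a Low group from those of a High group. The function names and the toy sequences are hypothetical, the genetic code is the standard one, and the difference computed here is a plain subtraction of fractions: it is not the codon usage index of Likhoshvai and Matushkin (2002) and it includes no significance test.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_counts(cds_list):
    """Count in-frame codons over a list of CDS strings (DNA or RNA)."""
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper().replace("U", "T")
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skip codons with ambiguous bases
                counts[codon] += 1
    return counts


def synonymous_fractions(counts):
    """Fraction of each codon among all codons for the same amino acid."""
    totals = Counter()
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n
    return {codon: n / totals[CODON_TABLE[codon]] for codon, n in counts.items()}


def group_difference(high_cds, low_cds):
    """Per-codon difference (High minus Low) in synonymous fractions."""
    high = synonymous_fractions(codon_counts(high_cds))
    low = synonymous_fractions(codon_counts(low_cds))
    return {c: high.get(c, 0.0) - low.get(c, 0.0) for c in set(high) | set(low)}


if __name__ == "__main__":
    # Toy sequences for illustration only; not real plant mRNAs.
    high_group = ["ATGGCTGCTGCATAA"]
    low_group = ["ATGGCAGCAGCTTAA"]
    for codon, diff in sorted(group_difference(high_group, low_group).items()):
        print(codon, CODON_TABLE[codon], round(diff, 3))
```

Working with fractions inside each synonymous family, rather than raw counts, keeps amino acid composition from dominating the comparison between groups.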
Translation initiation & termination signals. It is well known that the nucleotide context of the translation initiation site influences recognition of the start AUG codon. Nucleotides in positions -3 and +4 relative to the AUG are considered particularly important (Kozak, 2002). An efficient context can be species-specific. It is also thought that the nucleotide frequencies in the context positions correlate with the translational efficiency of the context. We isolated full-length cDNAs of Zea mays (962), Solanum tuberosum (383), Lycopersicon esculentum (499), Nicotiana tabacum (806), Spinacia oleracea (118), Hordeum vulgare (310), Triticum aestivum (622), Daucus carota (111), Sorghum bicolor (43), Medicago sativa (126), Pisum sativum (343) and Oryza sativa (1569) and calculated nucleotide frequencies in context positions -3, -2, and -1. This information can be accessed through the corresponding program. We also plan to analyze the 3'-context of start AUG codons to reveal significant features influencing translation initiation. In this case optimization will require adjusting the nucleotide context while conserving the amino acids at the N-terminal positions of the transgenic protein.

The efficiency of translation termination signals can also vary. The usage of nonsense codons (UAA, UGA and UAG) is a species-specific feature. It has been reported that highly expressed genes of some organisms tend to use or avoid certain stop codons. It has also been suggested that the nucleotide context of the stop codon can influence termination efficiency. We calculated the frequencies of the different nonsense codons and determined the most frequent variants. This information can be accessed through the corresponding program. We also plan to analyze the 5'-context of stop codons to reveal significant features influencing translation termination. In this case optimization will require adjusting the nucleotide context while conserving the amino acids at the C-terminal positions of the transgenic protein.
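The frequency calculations described above can be sketched as follows, assuming each cDNA comes with the index of its annotated start codon and each CDS ends with its stop codon. The function names, the input layout and the toy sequences are hypothetical; only the upstream positions -3, -2 and -1 are tallied, matching the positions named in the text.

```python
from collections import Counter

STOP_CODONS = ("TAA", "TAG", "TGA")


def start_context_frequencies(records, offsets=(-3, -2, -1)):
    """Nucleotide frequencies at positions upstream of the start codon.

    records: iterable of (cdna, start), where start is the 0-based index
    of the A in the annotated ATG. Records without ATG there are skipped.
    """
    tallies = {off: Counter() for off in offsets}
    for cdna, start in records:
        seq = cdna.upper().replace("U", "T")
        if seq[start:start + 3] != "ATG":
            continue
        for off in offsets:
            pos = start + off
            if 0 <= pos < len(seq):  # 5'-UTR may be shorter than 3 nt
                tallies[off][seq[pos]] += 1
    return {
        off: {nt: n / sum(c.values()) for nt, n in c.items()}
        for off, c in tallies.items()
        if c
    }


def most_frequent_context(freqs):
    """Join the most frequent nucleotide at each position, 5' to 3'."""
    return "".join(max(freqs[off], key=freqs[off].get) for off in sorted(freqs))


def stop_codon_usage(cds_list):
    """Fraction of CDSs ending in each of the three stop codons."""
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper().replace("U", "T")
        if len(seq) % 3 == 0 and seq[-3:] in STOP_CODONS:
            counts[seq[-3:]] += 1
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


if __name__ == "__main__":
    # Toy records for illustration only; not real plant cDNAs.
    records = [("GCAAAATGGC", 5), ("TTAACATGGC", 5), ("CCAAAATGAA", 5)]
    freqs = start_context_frequencies(records)
    print(freqs)
    print(most_frequent_context(freqs) + "ATG")
    print(stop_codon_usage(["ATGGCTTAA", "ATGTAA", "ATGGCATAG"]))
```

Picking the most frequent nucleotide at each position independently is the simplest reading of "most frequent context"; it ignores any dependence between neighbouring positions.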
Program description

The program provides the user with information on species-specific synonymous codon frequencies, as well as additional data on preferred codons in some organisms. The user may select codons for elimination and replacement to optimize synonymous codon content. The user can also obtain information on the preferred context of translation initiation and termination signals.

Implementation
- Optimization of translation initiation signal: the program reports the most frequent context of the start AUG codon for the selected organism. Example:
Organism: Daucus carota
Start codon context:
Current version: GCTATG
Optimal version: AAAATG
- Optimization of translation termination signal: the program reports the most frequent context of the stop codon for the selected organism. Example:
Organism: Daucus carota
Stop codon context:
Current version: TAGCTA
Optimal version: TAAATA
- Optimization of synonymous codon content: the program provides the user with a list of synonymous codons and possible variants for their replacement. Average codon frequencies are available. For some species, synonymous codons preferentially used in the High expression mRNA sample are marked in red, and synonymous codons that occur at a significantly lower frequency in the High expression mRNA sample are marked in blue. The user can choose suboptimal codons, select the synonymous codons to replace them from the list (right column), and click RUN. This makes it possible to adjust the scale of optimization. If the user needs to maximize expression, clicking the "all best" button automatically replaces all codons with the most frequent synonymous codon. The output contains the CDS sequences before and after optimization (changed codons are marked) as well as the amino acid sequence of the corresponding protein.

Limitations

The current version of the program does not take any other transgene characteristics into account. According to the limiting-stage concept, a highly expressed mRNA must have characteristics that provide efficient translation at all expression stages (initiation, elongation, termination; mRNA cytoplasmic stability). Thus, if an mRNA contains other negative signals (e.g., stable secondary structure within the 5'-UTR, specific destabilization signals (like ARE), or signal-like sequence segments that control gene expression in the recipient organism in an unexpected way), adjusting synonymous codon content and translation initiation and termination signals will not increase the translation rate. Codon content optimization can also generate new signal-like sequences (splicing sites, etc.), although the probability of such events is likely to be low. We plan to develop the prediction program further to take all these potential problems into account.
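The "all best" replacement described under Implementation can be sketched as below. This is an illustrative reading of that option, not the program's actual code: the function names are hypothetical, and the frequency table in the usage example holds made-up toy values, not CUTG data. Codons whose amino acid has no entry in the supplied table are left unchanged, and the translation is returned so that the protein can be checked before and after replacement.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def best_codons(freqs):
    """Map each amino acid to its most frequent codon in freqs."""
    best = {}
    for codon, f in freqs.items():
        aa = CODON_TABLE[codon]
        if aa not in best or f > freqs[best[aa]]:
            best[aa] = codon
    return best


def translate(cds):
    """Translate an in-frame CDS; unknown codons become 'X'."""
    seq = cds.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def optimize_all_best(cds, freqs):
    """Replace every codon with the most frequent synonymous codon.

    Returns (optimized CDS, list of 0-based indices of changed codons).
    """
    seq = cds.upper().replace("U", "T")
    if len(seq) % 3:
        raise ValueError("CDS length must be a multiple of 3")
    best = best_codons(freqs)
    out, changed = [], []
    for idx, i in enumerate(range(0, len(seq), 3)):
        codon = seq[i:i + 3]
        # Keep the codon if it is unknown or its amino acid is not in the table.
        new = best.get(CODON_TABLE.get(codon), codon)
        if new != codon:
            changed.append(idx)
        out.append(new)
    return "".join(out), changed


if __name__ == "__main__":
    # Toy frequencies for illustration only; not taken from CUTG.
    toy_freqs = {"GCT": 0.4, "GCC": 0.3, "GCA": 0.2, "GCG": 0.1, "AAA": 0.3, "AAG": 0.7}
    original = "ATGGCAAAATAA"
    optimized, changed = optimize_all_best(original, toy_freqs)
    print(original, translate(original))
    print(optimized, translate(optimized), "changed codons:", changed)
```

In line with the limitations above, a replacement of this kind considers nothing but codon frequency, so the optimized sequence would still need to be screened for newly created signal-like segments.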
Acknowledgements

This work was supported by the RAS program (Dynamics of Plant, Animal, and Human Gene Pools), Russian Foundation for Basic Research (grant No. 05-04-48207), and FASI (The development of complex software components for computational modeling in postgenomic system biology). We thank SD RAS Complex Integration Program (No. 59) and Ministry of Industry, Sciences and Technologies of the Russian Federation (grant Sc.Sh.-2275.2003.4) for partial support.
Contacts