pUGTdb: A comprehensive database of plant UDP-dependent glycosyltransferases.

Molecular Plant (2023)

Abstract
Plant UDP-dependent glycosyltransferases (UGTs), which belong to the glycosyltransferase 1 family of carbohydrate-active enzymes (Louveau and Osbourn, 2019), not only play important roles in adaptation to diverse environments (Cai et al., 2020; Pastorczyk-Szlenkier and Bednarek, 2021) but also give plant natural products great pharmaceutical and ecological significance (Margolin et al., 2020). In recent years, a growing number of plant UGTs have been shown to function in the biosynthesis of bioactive compounds such as ginsenosides (Wei et al., 2015), breviscapine (Liu et al., 2018), and rubusoside (Xu et al., 2022). However, most UGTs encoded in plant genomes remain uncharacterized. We constructed a comprehensive plant UGT database (pUGTdb, http://pugtdb.biodesign.ac.cn/) and then investigated how substrates and sugar donors interact with the characterized UGTs. We also built a web tool for UGT virtual screening and for predicting the sugar donors of unknown UGTs.
To obtain UGTs from unannotated genomes in the National Center for Biotechnology Information (NCBI) genome database, we developed a rapid annotation pipeline named GMind (Figure 1A; Supplemental Methods 1–3). Briefly, we first located genomic regions harboring putative UGTs by BLAST searches of plant genomes, using all annotated plant UGTs in the NCBI database as queries; second, the extracted genomic regions were independently annotated de novo by multiple gene annotation methods; third, the annotated UGTs were filtered with HMMER and evaluated with an accuracy score (Supplemental Methods 4 and 5); finally, we combined the results of the multiple methods and defined the optimal candidates as UGTs. Compared with the UGTs annotated in the NCBI genome database (Supplemental Method 6), GMind annotated 28.5% more UGTs (22.5% more complete UGTs) (Supplemental Figure 1).
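The last two GMind steps (HMMER filtering with an accuracy score, then combining methods and keeping the optimal candidate) can be pictured as a per-region selection. The sketch below is a minimal illustration under stated assumptions, not GMind's implementation: the `GeneModel` record, the method names, the `has_ugt_domain` flag (standing in for a HMMER hit), the 0-to-1 `accuracy_score`, and the `min_score` cutoff are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    """One candidate UGT gene model in a BLAST-extracted genomic region (hypothetical record)."""
    region_id: str         # genomic region located by the BLAST step
    method: str            # de novo annotation method that produced this model
    has_ugt_domain: bool   # stands in for "passed the HMMER filter"
    accuracy_score: float  # hypothetical accuracy score, 0.0 to 1.0


def select_optimal_models(models, min_score=0.5):
    """Return {region_id: best GeneModel}, keeping at most one model per region.

    Models without a UGT domain hit or scoring below `min_score` are dropped;
    among the survivors for a region, the highest-scoring model wins.
    """
    best = {}
    for model in models:
        if not model.has_ugt_domain or model.accuracy_score < min_score:
            continue  # fails the HMMER filter or the accuracy cutoff
        current = best.get(model.region_id)
        if current is None or model.accuracy_score > current.accuracy_score:
            best[model.region_id] = model
    return best


candidates = [
    GeneModel("region_1", "method_A", True, 0.91),
    GeneModel("region_1", "method_B", True, 0.84),
    GeneModel("region_2", "method_A", False, 0.95),  # no UGT domain hit
    GeneModel("region_2", "method_B", True, 0.40),   # below the cutoff
]
selected = select_optimal_models(candidates)
print({rid: m.method for rid, m in selected.items()})  # {'region_1': 'method_A'}
```

Keeping one model per region mirrors "defined the optimal candidates as UGTs"; the actual filtering and scoring criteria are given in Supplemental Methods 4 and 5.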
GMind missed only 1% of the UGTs annotated in NCBI, because of their extremely long introns (Supplemental Table 1; Supplemental Figure 2). In total, GMind annotated 110 702 UGTs from 574 previously unannotated plant genomes. We then constructed pUGTdb by integrating UGTs from transcriptome annotations (Supplemental Method 7), GMind annotations, NCBI genome annotations, and known UGTs from other resources, including the Carbohydrate-Active Enzymes Database and the Glycosyltransferase Database (Figure 1B). pUGTdb contains 285 293 UGTs, nearly nine times as many as the NCBI genome database (Supplemental Figure 3). According to the published literature and database collections, however, only 381 of these UGTs (about 0.1%) have been functionally characterized to date (Figure 1C; Supplemental Method 10).
Gene family classification facilitates the functional study of unknown UGTs. Based on sequence similarity and phylogenetic relationships, all plant UGTs were divided into 90 UGT families (Supplemental Methods 8 and 9). About 78% of plant UGTs belong to the 20 UGT families that contain at least one functionally characterized UGT (Figure 1C). Furthermore, comparing the characterized UGTs with their substrates revealed a positive correlation between protein sequence identity and substrate molecular similarity (Supplemental Figure 4; R² = 0.25). To further investigate how UGTs recognize their substrates, we predicted the structures of all characterized UGTs with a fast AlphaFold2 pipeline (Supplemental Method 11) and analyzed the structural features of the substrates and their binding pockets (Supplemental Method 12). We found a considerably stronger positive correlation between catalytic pocket volume and substrate volume (Figure 1D; R² = 0.5174).
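The R² values quoted here are coefficients of determination. Assuming they come from an ordinary least-squares line (the text does not state the fitting model), R² equals the squared Pearson correlation and can be computed as in this sketch; the data points are toy values, not pUGTdb data.

```python
from statistics import fmean


def r_squared(xs, ys):
    """Coefficient of determination of an ordinary least-squares fit y = a + b*x.

    For simple linear regression this equals the squared Pearson correlation:
    R^2 = Sxy^2 / (Sxx * Syy).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least two points")
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either variable is constant")
    return sxy * sxy / (sxx * syy)


# Toy example: x might be pairwise protein identity, y pairwise substrate similarity.
print(r_squared([1, 2, 3], [1, 3, 2]))        # 0.25
print(r_squared([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

Read this way, R² = 0.25 means the linear fit explains a quarter of the variance in substrate similarity, whereas R² = 0.5174 means pocket volume explains just over half of the variance in substrate volume.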
Building on the correlations of substrate similarity with protein identity and of substrate volume with pocket volume, we developed a tool for plant UGT virtual screening that combines substrate similarity, catalytic pocket volume, and substrate binding affinity (Supplemental Method 12; Supplemental Figure 5). To test the tool, we selected three recently reported UGTs as examples: phloretin 4-O-glucosyltransferase (Xiong et al., 2022), quercetin 3-O-rhamnosyltransferase (Ren et al., 2022), and resveratrol 3-O-glucosyltransferase (Liu et al., 2021). When the substrates and all UGTs from the corresponding species were used as inputs, each reported UGT ranked among the top 10 candidates (Supplemental Figure 6). Overall, the configuration of the substrate binding pocket provides a practical clue for predicting and identifying the substrates of unknown UGTs, which could greatly reduce the labor of experimental screening. Determining the function of an unknown UGT, however, requires identifying both its substrate and its sugar donor.
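The text names the three signals the virtual-screening tool combines but not how they are weighted. The sketch below shows one hypothetical way to merge them into a ranking: Tanimoto similarity on fingerprint bit sets (a common choice, assumed here), closeness of each candidate's pocket volume to the volume predicted from the substrate by a linear pocket-versus-substrate fit (`slope` and `intercept` are inputs one would estimate, not values from the paper), and min-max-normalized docking affinity. `Candidate`, `rank_candidates`, the equal default weights, and all numbers are illustrative.

```python
from dataclasses import dataclass


def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = fp_a | fp_b
    return len(fp_a & fp_b) / len(union) if union else 0.0


@dataclass
class Candidate:
    name: str
    reference_substrate_fp: set  # fingerprint of a substrate of the closest characterized UGT (hypothetical)
    pocket_volume: float         # predicted catalytic-pocket volume, cubic angstroms
    affinity: float              # docking score, kcal/mol; more negative means tighter binding


def rank_candidates(query_fp, query_volume, candidates, *, slope, intercept,
                    weights=(1.0, 1.0, 1.0), top_k=10):
    """Return names of the top_k candidates by a weighted sum of three [0, 1] scores."""
    if not candidates:
        return []
    expected_pocket = intercept + slope * query_volume  # linear pocket-vs-substrate fit
    if expected_pocket <= 0:
        raise ValueError("the fitted line must predict a positive pocket volume")
    w_sim, w_vol, w_aff = weights
    best_aff = min(c.affinity for c in candidates)
    worst_aff = max(c.affinity for c in candidates)
    span = worst_aff - best_aff
    scored = []
    for c in candidates:
        sim = tanimoto(query_fp, c.reference_substrate_fp)
        # 1.0 when the pocket matches the expected volume, decaying with relative error.
        vol_fit = 1.0 / (1.0 + abs(c.pocket_volume - expected_pocket) / expected_pocket)
        # Min-max rescale: best (most negative) affinity -> 1.0, worst -> 0.0.
        aff = (worst_aff - c.affinity) / span if span else 1.0
        scored.append((w_sim * sim + w_vol * vol_fit + w_aff * aff, c.name))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [name for _, name in scored[:top_k]]


pool = [
    Candidate("UGT_A", {1, 2, 3, 4}, 410.0, -9.0),
    Candidate("UGT_B", {1, 2}, 600.0, -7.0),
    Candidate("UGT_C", {7, 8}, 420.0, -8.0),
]
print(rank_candidates({1, 2, 3, 4}, 300.0, pool, slope=1.2, intercept=50.0))
# ['UGT_A', 'UGT_C', 'UGT_B']
```

Scaling each term to [0, 1] keeps any one signal from dominating merely because of its units; with real data the weights would need tuning, for example against characterized UGTs such as the three test cases named in the text.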
To investigate how UGTs recognize their sugar donors, we performed a comprehensive structural analysis of the characterized UGTs with their sugar donors (Supplemental Method 13; Supplemental Figures 10 and 11). We identified 28 residue positions that contribute to UDP-sugar binding and stabilization through hydrogen bonds (Supplemental Figure 12). Besides 19 positions that interact with UDP, the remaining nine positions surround the sugar from the top, middle, and bottom (Figures 1E and 1F). At the top, residues at positions 20, 141, and 142 mainly form hydrogen bonds with the hydroxyl or carboxyl group at C6 of three six-carbon sugars: UDP-glucose (UDG), UDP-galactose, and UDP-glucuronic acid (UGA). In the middle, residues at positions 372–375 mainly form hydrogen bonds with the hydroxyl groups at C2 and C3 of the sugars; residues at positions 374 and 375 interact with all six kinds of sugar donors because they lie parallel to the carbon skeleton of the sugars. At the bottom, residues at positions 353 and 378 support the sugar skeleton and mainly form hydrogen bonds with the C4-OH (S) of UDP-xylose, UDP-arabinose, UDG, and UGA, but rarely with the C4-OH (R) of UDP-galactose and UDP-rhamnose, whose C4 hydroxyl groups point upward. These results indicate that residues in the three regions play different roles in stabilizing and recognizing sugar donors, and that the sugar donor of an unknown UGT may be predicted from the amino acid composition of these regions. We therefore developed a prediction tool for UDP-sugar donors that integrates these key residue positions. In short, UGTs with known functions were embedded as feature vectors based on the key residues, and a regression model was built to map these feature vectors to sugar donor types (Supplemental Method 13; Supplemental Figure 13).
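The embedding-plus-regression design can be sketched as follows. Assumptions, none of them confirmed by the text: sequences are already aligned to the reference numbering used for positions 20, 141, 142, 353, 372–375, and 378 (the nine sugar-contacting positions named above); each position is one-hot encoded over the 20 amino acids plus a gap; and the "regression model" is taken to be multinomial logistic (softmax) regression fitted by plain gradient descent. `toy_seq` and the residues in the training set are hypothetical and do not reflect real UDG/UGA determinants.

```python
import math

# The nine sugar-contacting positions from the text, sorted numerically:
# 20/141/142 (top), 372-375 (middle), 353/378 (bottom).
KEY_POSITIONS = (20, 141, 142, 353, 372, 373, 374, 375, 378)
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus the alignment gap


def embed(aligned_seq):
    """One-hot encode the residues at KEY_POSITIONS (1-based alignment columns)."""
    vec = []
    for pos in KEY_POSITIONS:
        residue = aligned_seq[pos - 1] if pos <= len(aligned_seq) else "-"
        vec.extend(1.0 if residue == aa else 0.0 for aa in ALPHABET)
    return vec


def _scores(W, x):
    xb = x + [1.0]  # constant feature for the bias term
    return [sum(w * v for w, v in zip(row, xb)) for row in W]


def train_softmax(X, y, classes, lr=0.1, epochs=300):
    """Fit multinomial logistic regression by batch gradient descent on cross-entropy."""
    index = {c: i for i, c in enumerate(classes)}
    k, d = len(classes), len(X[0]) + 1
    W = [[0.0] * d for _ in range(k)]
    for _ in range(epochs):
        grad = [[0.0] * d for _ in range(k)]
        for x, label in zip(X, y):
            s = _scores(W, x)
            m = max(s)
            exps = [math.exp(v - m) for v in s]  # numerically stable softmax
            total = sum(exps)
            xb = x + [1.0]
            for c in range(k):
                err = exps[c] / total - (1.0 if c == index[label] else 0.0)
                for j, v in enumerate(xb):
                    if v:
                        grad[c][j] += err * v
        for c in range(k):
            for j in range(d):
                W[c][j] -= lr * grad[c][j] / len(X)
    return W


def predict(W, x, classes):
    s = _scores(W, x)
    return classes[s.index(max(s))]


def toy_seq(substitutions, length=400):
    """Hypothetical helper: a poly-Ala 'aligned sequence' with residues set at 1-based positions."""
    seq = ["A"] * length
    for pos, aa in substitutions.items():
        seq[pos - 1] = aa
    return "".join(seq)


classes = ["UDG", "UGA"]
train = [({375: "Q", 20: "S"}, "UDG"), ({375: "Q", 20: "T"}, "UDG"),
         ({375: "H", 20: "R"}, "UGA"), ({375: "H", 20: "K"}, "UGA")]
W = train_softmax([embed(toy_seq(s)) for s, _ in train], [lab for _, lab in train], classes)
print(predict(W, embed(toy_seq({375: "Q"})), classes))  # UDG
```

Each UGT becomes a 9 × 21 = 189-dimensional vector, small enough that a linear model is a reasonable first choice. Because glucose donors dominate the characterized UGTs, a real model would probably need class weighting or resampling; that is an assumption, not part of the described method.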
With the characterized UGTs as the test dataset, the sugar donor prediction tool achieved an average accuracy of 89.6%. Accuracy reached 95.5% for the glucose donor but only 63.4% for the other sugar donors, probably mainly because only ∼17% of the characterized UGTs use donors other than glucose. We further tested the tool's glucose donor predictions by designing mutants intended to use UDG instead of UGA as the sugar donor for two characterized UGTs: apigenin 7-O-glucuronosyltransferase (Liu et al., 2018) and flavonoid 7-O-glucuronosyltransferase (Ono et al., 2010). Experiments confirmed that the mutants of both UGTs had markedly increased activity with UDG as the sugar donor (Supplemental Figure 14; Supplemental Methods 14–18). Finally, we predicted the sugar donors of all unknown UGTs in the database: UDG accounted for 94.7% and other sugar donors for about 5.3% (Supplemental Table 8), indicating the dominant role of UDG as a sugar donor in nature.
In summary, we developed a genome annotation pipeline for UGT mining (GMind) and constructed a comprehensive plant UGT database (Supplemental Method 19; Supplemental Figure 15) containing 285 293 UGTs from 2858 plants.
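We also investigated the mechanisms by which UGTs recognize substrates and sugar donors, and developed a web tool for UGT virtual screening and for sugar donor prediction of unknown UGTs. This comprehensive platform should serve as a useful data resource for the community.
This work was supported by grants from the National Key R&D Program of China (no. 2019YFA0905700), the Tianjin Synthetic Biotechnology Innovation Capacity Improvement Project (TSBICIP-CXRC-015), the China Postdoctoral Science Foundation (no. 2019M661032), and the National Natural Science Foundation of China (no. 31901026).
References
Cai J., Jozwiak A., Holoidovsky L., Meijler M.M., Meir S., Rogachev I., and Aharoni A. (2020). Glycosylation of N-hydroxy-pipecolic acid equilibrates between systemic acquired resistance response and plant growth. Mol. Plant 14: 440-455. https://doi.org/10.1016/j.molp.2020.12.018
Liu T., Liu Y., Li L., Liu X., Guo Z., Cheng J., Zhu X., Lu L., Zhang J., Fan G., et al. (2021). De novo biosynthesis of polydatin in Saccharomyces cerevisiae. J. Agric. Food Chem. 69: 5917-5925. https://doi.org/10.1021/acs.jafc.1c01557
Liu X., Cheng J., Zhang G., Ding W., Duan L., Yang J., Kui L., Cheng X., Ruan J., Fan W., et al. (2018). Engineering yeast for the production of breviscapine by genomic analysis and synthetic biology approaches. Nat. Commun. 9: 448. https://doi.org/10.1038/s41467-018-02883-z
Louveau T., and Osbourn A. (2019). The sweet side of plant-specialized metabolism. Cold Spring Harb. Perspect. Biol. 11: a034744. https://doi.org/10.1101/cshperspect.a034744
Margolin E.A., Strasser R., Chapman R., Williamson A.L., Rybicki E.P., and Meyers A.E. (2020). Engineering the plant secretory pathway for the production of next-generation pharmaceuticals. Trends Biotechnol. 38: 1034-1044. https://doi.org/10.1016/j.tibtech.2020.03.004
Ono E., Ruike M., Iwashita T., Nomoto K., and Fukui Y. (2010). Co-pigmentation and flavonoid glycosyltransferases in blue Veronica persica flowers. Phytochemistry 71: 726-735. https://doi.org/10.1016/j.phytochem.2010.02.008
Pastorczyk-Szlenkier M., and Bednarek P. (2021). UGT76B1 controls the growth-immunity trade-off during systemic acquired resistance. Mol. Plant 14: 544-546. https://doi.org/10.1016/j.molp.2021.03.012
Ren C., Guo Y., Xie L., Zhao Z., Xing M., Cao Y., Liu Y., Lin J., Grierson D., Zhang B., et al. (2022). Identification of UDP-rhamnosyltransferases and UDP-galactosyltransferase involved in flavonol glycosylation in Morella rubra. Horticulture Research 9: uhac138. https://doi.org/10.1093/hr/uhac138
Wei W., Wang P., Wei Y., Liu Q., Yang C., Zhao G., Yue J., Yan X., and Zhou Z. (2015). Characterization of Panax ginseng UDP-glycosyltransferases catalyzing protopanaxatriol and biosyntheses of bioactive ginsenosides F1 and Rh1 in metabolically engineered yeasts. Mol. Plant 8: 1412-1424.
Xiong R.-L., Zhang J.-Z., Liu X.-Y., Deng J.-Q., Zhu T.-T., Ni R., Tan H., Sheng J.-Z., Lou H.-X., and Cheng A.-X. (2022). Identification and characterization of two bibenzyl glycosyltransferases from the liverwort Marchantia polymorpha. Antioxidants 11: 735.
Xu Y., Wang X., Zhang C., Zhou X., Xu X., Han L., Lv X., Liu Y., Liu S., Li J., et al. (2022). De novo biosynthesis of rubusoside and rebaudiosides in engineered yeasts. Nat. Commun. 13: 3040. https://doi.org/10.1038/s41467-022-30826-2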
Keywords: plant, UDP-dependent