CodonBERT: Large Language Models for mRNA design and optimization

Sizhen Li, Saeed Moayedpour, Ruijiang Li, Michael Bailey, Saleh Riahi, Milad Miladi, Jacob Miner, Dinghai Zheng, Jun Wang, Akshay Balsubramani, Khang Tran, Minnie Zacharia, Monica Wu, Xiaobo Gu, Ryan Clinton, Carla Asquith, Joseph Skalesk, Lianne Boeglin, Sudha Chivukula, Anusha Dias, Fernando Ulloa Montoya, Vikram Agarwal, Ziv Bar-Joseph, Sven Jager

bioRxiv (2023)

Abstract
mRNA-based vaccines and therapeutics are gaining popularity and are being used across a wide range of conditions. One of the critical issues when designing such mRNAs is sequence optimization. Even small proteins or peptides can be encoded by an enormous number of mRNAs. The actual mRNA sequence can have a large impact on several properties, including expression, stability, and immunogenicity. To enable the selection of an optimal sequence, we developed CodonBERT, a large language model (LLM) for mRNAs. Unlike prior models, CodonBERT uses codons as inputs, which enables it to learn better representations. CodonBERT was trained using more than 10 million mRNA sequences from a diverse set of organisms. The resulting model captures important biological concepts. CodonBERT can also be extended to perform prediction tasks for various mRNA properties. CodonBERT outperforms previous mRNA prediction methods, including on a new flu vaccine dataset.

Competing Interest Statement: The authors have declared no competing interest.
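To make two points from the abstract concrete, the following minimal Python sketch counts how many distinct mRNA coding sequences encode a given peptide under the standard genetic code, and splits a coding sequence into codon-level tokens. It is an illustration only and is not the CodonBERT tokenizer or any code from the paper; the function names `count_synonymous_mrnas` and `codon_tokens` and the example peptide are hypothetical.

```python
from math import prod

# Number of synonymous codons per amino acid in the standard genetic code
# (61 sense codons in total; stop codons are ignored in this sketch).
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_synonymous_mrnas(peptide: str) -> int:
    """Count the coding sequences that translate to `peptide`.

    Hypothetical helper: multiplies the codon degeneracy of each residue.
    """
    return prod(CODON_DEGENERACY[aa] for aa in peptide.upper())


def codon_tokens(cds: str) -> list[str]:
    """Split a coding sequence into non-overlapping codon tokens.

    Hypothetical helper: accepts DNA or RNA letters and returns RNA codons.
    The real model's vocabulary and special tokens are not reproduced here.
    """
    seq = cds.upper().replace("T", "U")
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


if __name__ == "__main__":
    # A 4-residue peptide already has 1 * 2 * 6 * 4 = 48 encodings.
    print(count_synonymous_mrnas("MKLV"))
    # One of those 48 encodings, viewed as codon tokens.
    print(codon_tokens("ATGAAACTGGTG"))
```

Because the count is a product over residues, it grows exponentially with peptide length, which is why exhaustive search over candidate sequences is impractical and a learned model operating on codon tokens is attractive.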
Keywords
mRNA design, large language models, CodonBERT