TaxDiff: Taxonomic-Guided Diffusion Model for Protein Sequence Generation
CoRR(2024)
摘要
Designing protein sequences with specific biological functions and structural
stability is crucial in biology and chemistry. Generative models already
demonstrated their capabilities for reliable protein design. However, previous
models are limited to the unconditional generation of protein sequences and
lack the controllable generation ability that is vital to biological tasks. In
this work, we propose TaxDiff, a taxonomic-guided diffusion model for
controllable protein sequence generation that combines biological species
information with the generative capabilities of diffusion models to generate
structurally stable proteins within the sequence space. Specifically, taxonomic
control information is inserted into each layer of the transformer block to
achieve fine-grained control. The combination of global and local attention
ensures the sequence consistency and structural foldability of
taxonomic-specific proteins. Extensive experiments demonstrate that TaxDiff can
consistently achieve better performance on multiple protein sequence generation
benchmarks in both taxonomic-guided controllable generation and unconditional
generation. Remarkably, the sequences generated by TaxDiff even surpass those
produced by direct-structure-generation models in terms of confidence based on
predicted structures and require only a quarter of the time of models based on
the diffusion model. The code for generating proteins and training new versions
of TaxDiff is available at:https://github.com/Linzy19/TaxDiff.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要