Gene Ontology GAN (GOGAN): a novel architecture for protein function prediction

Soft Computing (2022)

Abstract
Precise annotation of protein functions is central to a deep understanding of molecular biology. The overwhelming majority of proteins, across species, lack sufficient supplementary information and therefore remain uncharacterized. In contrast, every known protein has one key piece of information available: its amino acid sequence. To make algorithms applicable to proteins from many species, researchers are therefore motivated to develop computational techniques that characterize proteins from their amino acid sequence alone. However, computational techniques such as deep learning algorithms require a large amount of labeled data to produce good results, and labeling is time- and resource-intensive, which makes labeled data scarce. To address both problems, the abundance of uncharacterized proteins and the labeled-data requirements of traditional deep learning algorithms, we propose GOGAN, a model that operates on the amino acid sequence of a protein to predict its functions. GOGAN does not require any handcrafted features; it automatically extracts all the required information from the input sequence. The model extracts features from very large unlabeled protein datasets, where "unlabeled data" refers to data that have not been assigned labels identifying their characteristics or properties. The features extracted by GOGAN can also be used in other applications, such as gene variation analysis, gene expression analysis, and gene regulation network detection. The proposed model is benchmarked on the Homo sapiens protein dataset extracted from the UniProt database. Experimental results show clear improvements across several evaluation metrics compared with other methods. Overall, GOGAN achieves an F1 score of 72.1% with a Hamming loss of 9.5%, using only the amino acid sequences of proteins.
Keywords
Protein function prediction, Sequence analysis, Deep learning, Generative adversarial networks, Gene ontology, Transfer learning
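
The abstract reports an F1 score and a Hamming loss, both common metrics for multi-label tasks such as assigning several Gene Ontology terms to one protein. The sketch below shows how the two metrics can be computed on binary label matrices. It is only an illustration: the abstract does not say which F1 averaging the authors used, so micro-averaging is an assumption here, and the function names and toy labels are hypothetical and not taken from the paper.

```python
from typing import List

Labels = List[List[int]]  # rows = proteins, columns = GO terms (1 = term assigned)


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of individual (protein, term) label slots predicted incorrectly."""
    total = 0
    wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += int(t != p)
    return wrong / total


def micro_f1(y_true: Labels, y_pred: Labels) -> float:
    """Micro-averaged F1: pool TP/FP/FN over all label slots (assumed averaging)."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example: 3 proteins, 4 hypothetical GO terms.
y_true = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 0]]
y_pred = [[1, 0, 0, 0], [0, 1, 1, 1], [1, 1, 0, 0]]

print(f"Hamming loss: {hamming_loss(y_true, y_pred):.3f}")
print(f"Micro F1:     {micro_f1(y_true, y_pred):.3f}")
```

In this toy case 2 of the 12 label slots are wrong, so the Hamming loss is 2/12, and with 5 true positives, 1 false positive, and 1 false negative the micro-averaged F1 is 10/12. Lower Hamming loss and higher F1 both indicate better multi-label predictions.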