
Model Performance and Interpretability of Semi-Supervised Generative Adversarial Networks to Predict Oncogenic Variants with Unlabeled Data

BMC Bioinformatics (2023) | CCF C | SCI Zone 4 | SCI Zone 3

Children’s Hospital of Philadelphia

Abstract
BACKGROUND: Predicting the functional consequences or clinical impacts of genetic variants in human diseases, such as cancer, remains an important challenge. An increasing number of genetic variants in cancer have been discovered and documented in public databases such as COSMIC, but the vast majority of them have no functional or clinical annotations. Some databases, such as CIViC, provide manually annotated functional mutations, but they are small because they rely on human annotation. Since unlabeled data (millions of variants) typically outnumber labeled data (thousands of variants), computational tools that take advantage of unlabeled data may improve prediction accuracy.

RESULTS: To leverage unlabeled data to predict the functional importance of genetic variants, we introduced a method using semi-supervised generative adversarial networks (SGAN), incorporating features from both labeled and unlabeled data. Our SGAN model incorporated features from clinical guidelines and predictive scores from other computational tools. We also performed a comparative analysis of factors that influence prediction accuracy, such as the choice of algorithm, the types of features, and the training sample size, to provide more insight into variant prioritization. We found that SGAN can achieve competitive performance with small labeled training samples by incorporating unlabeled samples, which is a unique advantage over traditional machine learning methods. We also found that manually curated samples can achieve more stable predictive performance than publicly available datasets.

CONCLUSIONS: By incorporating much larger samples of unlabeled data, the SGAN method can improve the ability to detect novel oncogenic variants, compared to other machine learning algorithms that use only labeled datasets. With more training samples and informative features, SGAN can potentially be used to predict the pathogenicity of more complex variants, such as structural variants or non-coding variants.
Keywords
Generative adversarial networks, Variants annotation, Variants interpretation, Machine learning, Deep learning, Somatic variants

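Key points: This study proposes a semi-supervised generative adversarial network (SGAN) method that combines labeled and unlabeled data to predict the functional importance of cancer-related genetic variants. Its distinctive advantage is that it makes effective use of large amounts of unlabeled data, improving the ability to detect novel oncogenic variants.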

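Methods: The study uses a semi-supervised generative adversarial network (SGAN) that integrates labeled-data features drawn from clinical guidelines and other computational tools together with features from unlabeled data.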

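Experiments: The authors compared how factors such as the choice of algorithm, feature type, and training sample size affect prediction accuracy, using datasets including COSMIC and CIViC. The results show that, even with a small labeled training sample, SGAN achieves competitive performance by incorporating unlabeled samples, and that manually annotated samples give more stable predictive performance than publicly available datasets.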
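
To make the idea of training on labeled and unlabeled variants together more concrete, here is a minimal sketch of one common way to write a semi-supervised GAN discriminator loss. It is not taken from the paper: this page gives neither the architecture nor the loss, so the formulation (class logits reused for a real-versus-generated score), the function names, and the toy logits are all assumptions for illustration only.

```python
import math

def logsumexp(logits):
    """Numerically stable log(sum(exp(l) for l in logits))."""
    m = max(logits)
    return m + math.log(sum(math.exp(l - m) for l in logits))

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def supervised_loss(logits, label):
    """Cross-entropy of the true class for one labeled sample."""
    return logsumexp(logits) - logits[label]

def unsupervised_loss(logits, is_real):
    """Real-vs-generated loss, assuming D(x) = Z / (Z + 1), Z = sum(exp(logits))."""
    lse = logsumexp(logits)
    # With D(x) = sigmoid(lse): -log D(x) = softplus(-lse), -log(1 - D(x)) = softplus(lse)
    return softplus(-lse) if is_real else softplus(lse)

def discriminator_loss(labeled, unlabeled, generated):
    """Mean supervised loss plus mean real/generated losses.

    labeled:   list of (class_logits, label) pairs
    unlabeled: list of class_logits for real samples without labels
    generated: list of class_logits for samples produced by the generator
    """
    sup = sum(supervised_loss(l, y) for l, y in labeled) / len(labeled)
    real = sum(unsupervised_loss(l, True) for l in unlabeled) / len(unlabeled)
    fake = sum(unsupervised_loss(l, False) for l in generated) / len(generated)
    return sup + real + fake

# Hypothetical logits for two classes (the class meaning is illustrative only).
labeled = [([2.0, -1.0], 0), ([-0.5, 1.5], 1)]
unlabeled = [[1.5, 0.5], [0.2, 2.1]]
generated = [[-3.0, -2.5]]
print(discriminator_loss(labeled, unlabeled, generated))
```

In this sketch the unlabeled samples contribute only through the real-versus-generated term, which is how a large unlabeled pool can shape the classifier even when few labeled examples are available.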