FilterLAP: Filtering False-Positive Mutation Calls Via A Label Propagation Framework

2019 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM)(2019)

Abstract
Benefiting from recent advances in genomic sequencing, detecting genomic mutations has become routine in the precise diagnosis and treatment of cancers. In clinical practice, many factors, such as tumor purity and clonal structure, interfere with the performance of mutation calling. Computational pipelines therefore tend to report candidate calls sensitively, and a filter is then applied to remove the false-positive calls. Existing filters rely on whole-genome or whole-exome sequencing data, which provide sufficient samples for training. Gene-panel sequencing is more popular in clinical practice, however, and no practical filter exists for such limited training samples. In light of this, we develop a semi-supervised learning filter for gene-panel sequencing data, FilterLAP, which is implemented via a label propagation framework. Given a few labeled samples and a set of unlabeled ones, the basic idea is to predict the labels of the unlabeled nodes from those of the labeled nodes: FilterLAP builds a complete graph model from the relationships between samples and combines transductive inference with a label propagation algorithm. For each node in the network, labels are propagated to adjacent nodes according to similarity, so that similar nodes tend to have similar label probability distributions and can be assigned to the same class. We perform multiple sets of experiments on gene-panel sequencing data from the Illumina platform. FilterLAP performs well on both SNV and INDEL filtering, with AUCs of 0.90-0.97 and average accuracy on overall mutation calls above 90%. Compared with GATK hard filters, FilterLAP presents a 5% improvement in accuracy. These results demonstrate that the proposed method can better reduce false-positive mutation calls on gene-panel sequencing data. In addition, it is stable and efficient, and can serve as a practical tool for filtering mutation calls in gene-panel sequencing data.
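The label propagation idea described in the abstract can be illustrated with a small sketch. This is not the FilterLAP implementation: the abstract does not specify the features, the similarity measure, or the update rule, so the Gaussian (RBF) similarity, the `sigma` bandwidth, the clamping of labeled nodes, and the function name `propagate_labels` are all assumptions made for illustration. Each mutation call is assumed to be a numeric feature vector, with label 1 for a true call, 0 for a false positive, and -1 for unlabeled.

```python
import numpy as np

def propagate_labels(X, y, sigma=1.0, n_iter=200, tol=1e-6):
    """Hypothetical sketch of transductive label propagation.

    X: (n, d) feature matrix, one row per candidate mutation call.
    y: (n,) integer labels: 1 = true call, 0 = false positive, -1 = unlabeled.
    Returns (predicted labels, per-class probability matrix).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    n = X.shape[0]
    labeled = y >= 0

    # Complete graph: every pair of samples is connected, weighted by an
    # RBF similarity (an assumed choice, not taken from the paper).
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Row-normalize so each row is a distribution over neighbors.
    T = W / W.sum(axis=1, keepdims=True)

    # Start unlabeled nodes at a uniform distribution; labeled nodes one-hot.
    F = np.full((n, 2), 0.5)
    F[labeled] = np.eye(2)[y[labeled]]
    clamp = F[labeled].copy()

    for _ in range(n_iter):
        F_new = T @ F            # each node averages its neighbors' labels
        F_new[labeled] = clamp   # keep the known labels fixed
        converged = np.abs(F_new - F).max() < tol
        F = F_new
        if converged:
            break

    return F.argmax(axis=1), F

# Toy data: two well-separated clusters, one labeled sample in each.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
     [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
y = [1, -1, -1, 0, -1, -1]
pred, probs = propagate_labels(X, y)
print(pred.tolist())
```

In this toy setting the unlabeled points inherit the label of the labeled point in their own cluster, which is the behavior the abstract describes as similar nodes ending up with similar label distributions.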
Keywords
Cancer genomics, NGS data analysis, Gene-panel sequencing, Mutation Filter, transductive inference, Label propagation