RF4Del: A Random Forest approach for accurate deletion detection

bioRxiv (2022)

Abstract
Efficiently detecting genomic structural variants (SVs) is a key step toward explaining the "missing heritability" underlying complex traits involved in major evolutionary processes such as speciation, phenotypic plasticity, and adaptive responses. Yet SV-based genotype/trait association studies remain largely overlooked, mainly because reliable detection methods are lacking. Here, we present RF4Del, a random forest (RF) method for accurate deletion identification. By analyzing mapping profiles, data already available in most sequencing projects, RF4Del can call deletions easily and quickly. Several classic and ensemble learning strategies were carefully evaluated on benchmark data. RF4Del was trained and tested on simulated data from the model species Drosophila melanogaster. The model consists of 13 features extracted from a mapping file. We show that RF4Del outperforms established SV callers (DELLY, Pindel), with higher overall performance (F1-score > 0.75 at 6x-12x sequence coverage), and is less affected by low sequencing coverage and variation in deletion size. RF4Del could learn from a compilation of sequence patterns linked to a given SV. Such models could then be combined into a learning system able to detect all types of SVs in a given genome, beyond the one used in our study.

Competing Interest Statement: The authors have declared no competing interest.
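The abstract reports performance as an F1-score, the harmonic mean of precision and recall. As a minimal sketch of how that metric is computed, the snippet below scores binary deletion calls against a known truth set. The per-window labelling, the function name `f1_score`, and the example labels are assumptions made for illustration; they are not taken from RF4Del or its benchmark.

```python
def f1_score(truth, predicted):
    """Return (precision, recall, F1) for binary labels, where 1 = deletion."""
    if len(truth) != len(predicted):
        raise ValueError("label lists must have the same length")

    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard against division by zero when nothing is called or nothing is true.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical labels for ten genomic windows (illustrative data only).
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

precision, recall, f1 = f1_score(truth, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

With three true positives, one false positive, and one false negative, precision and recall are both 0.75, so the F1-score is also 0.75. Because F1 ignores true negatives, it stays informative when deletions occupy only a small fraction of the genome.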
Keywords
random forest approach, RF4Del, deletion, detection