The influence of marker number and sequencing depth on the ability to identify mismatch repair deficient tumours

James Law, Richard Gallon, Ethan D Teare, Ivan Santibanez Koref, Rachel Phelps, John Burn, Michael S. Jackson, Mauro Santibanez‐Koref

bioRxiv (Cold Spring Harbor Laboratory) (2023)

1. Abstract

Analysis of somatic mutation patterns is widely used to infer exposure to exogenous and endogenous mutagenic influences. This raises the question of how much sequence data is required to detect factors of interest. A common use of mutation pattern analysis is the identification of increased microsatellite instability to uncover mismatch repair (MMR) defects in tumours and normal tissues. Here we explore the effects of sequencing depth and the number of loci analysed on the ability to detect MMR deficiency using artificial neural networks and publicly available amplicon sequencing data from colorectal tumours on 24 short quasi-monomorphic microsatellites (up to 12 bp in length, PMID 31471937), split into a training set (99 samples) and a test set (95 samples). We show that, at a sequencing depth of 200, pairs of mononucleotide repeats can achieve discrimination between MMR proficient and deficient colorectal tumours similar to that obtained with the full 24 marker panel, with accuracies above 97% and ROC AUCs in excess of 99% in the test set. Our results indicate that, for short monomorphic microsatellites, considering the length distribution of the different alleles at each locus, representing these distributions as two-dimensional structures and including convolutional layers in the network can facilitate discrimination between MMR deficient and proficient tumour material. They also indicate that, despite the limitations imposed by amplification, sequencing accuracy and the limited divergence time between the sequences from one locus, high depth sequencing can be used to identify MMR deficiency from a limited number of loci. However, they also suggest that, for a fixed total number of reads per sample, increasing sensitivity by increasing the number of targets is more efficient than by increasing per target sequencing depth.
These results are of interest for screening large numbers of samples and for assessing the impact of MMR deficiency in different areas of the genome.
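
The abstract does not describe the preprocessing in detail, so the following is only a minimal sketch of one way the two ideas it mentions could be set up: subsampling reads to a fixed per-locus sequencing depth, and arranging the per-locus allele length distributions as a two-dimensional array (loci by allele length) of the kind a convolutional layer could take as input. The function names, the five length bins and the read counts are hypothetical and are not taken from the paper or its data.

```python
import numpy as np


def downsample_counts(counts, depth, rng):
    """Draw `depth` reads without replacement from per-length read counts.

    If a locus has no more than `depth` reads, its counts are returned unchanged.
    """
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() <= depth:
        return counts.copy()
    # Sampling without replacement across length bins.
    return rng.multivariate_hypergeometric(counts, depth)


def length_distribution_matrix(per_locus_counts, depth, seed=0):
    """Return a (loci x length bins) array of allele length frequencies.

    Each row is one locus, downsampled to `depth` reads and normalised
    so that the row sums to 1 (rows with no reads stay all zero).
    """
    rng = np.random.default_rng(seed)
    rows = []
    for counts in per_locus_counts:
        sampled = downsample_counts(counts, depth, rng)
        total = sampled.sum()
        if total > 0:
            rows.append(sampled / total)
        else:
            rows.append(np.zeros(len(sampled)))
    return np.vstack(rows)


# Hypothetical read counts for two loci; columns are allele lengths 8..12 bp.
example_counts = [
    [0, 10, 900, 80, 10],
    [5, 300, 600, 90, 5],
]
matrix = length_distribution_matrix(example_counts, depth=200)
print(matrix.shape)
```

Under this layout, the trade-off discussed in the abstract corresponds to choosing, for a fixed read budget per sample, between more rows (loci) at a lower `depth` and fewer rows at a higher `depth`.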
Keywords: marker number, sequencing, mismatch, repair