Improved accuracy of somatic variant detection from high throughput sequencing (HTS) data via site-specific noise estimation

E. Ignatova, M. Ivanov, V. Yakushina, V. Mileyko

Annals of Oncology (2022)

Abstract
Background: Accurate detection of subclonal somatic mutations in HTS data is often essential for selecting appropriate cancer treatment, especially when the mutations are associated with therapy resistance. Current data-analysis approaches are applied uniformly across library preparation kits and rely on statistical models that discriminate signal from noise using the alternative and reference allele counts at each position. However, previous studies have shown that noise is site-specific and varies with the library preparation and sequencing protocol.

Methods: We analysed 438 datasets sequenced on the Ion Torrent platform and previously processed with the Ion Torrent Variant Caller (ITVC). Using a set of 833 unique hotspot variants across 47 oncogenes, we estimated site-specific noise in each sample by fitting a beta distribution. We then applied a Poisson-beta test to discriminate signal from noise and call variants.

Results: Within the analysed hotspots, ITVC called 277 variants and our in-house software called 285. Of the ITVC calls, 25 did not exceed the site-specific noise level; 14 of these were supported by 5 or fewer alternative-allele reads and therefore represent putative artefacts. Conversely, 33 variants were detected by the in-house software but missed by ITVC, including BRAF mutations in two lung adenocarcinoma (LUAD) patients, an FGFR3 mutation in a LUAD patient, two KRAS mutations in a colorectal cancer (CRC) patient, and 11 PIK3CA mutations across diverse cancer types. Most of these variants were located at deeply covered sites (median depth 2109, range 325-14216) and had subclonal allele frequencies (AF) (median 1.4%, range 0.3-1.8%). According to the medical history, one CRC patient whose potential subclonal KRAS mutation was missed by ITVC had received anti-EGFR therapy.
Archival tissue was not available for reanalysis, but the patient experienced disease progression 3 months after starting anti-EGFR therapy.

Conclusions: Incorporating site-specific noise levels into HTS data analysis workflows improves specificity and lowers the limit of detection, thereby increasing the rate of subclonal mutation calling.

Legal entity responsible for the study: The authors.

Funding: Has not received any funding.

Disclosure: M. Ivanov, V. Yakushina, V. Mileyko: Financial Interests, Personal, Full or part-time Employment: Atlas Oncology Diagnostics. All other authors have declared no conflicts of interest.
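The Methods name the ingredients (a per-site beta fit and a Poisson-beta test) but not their exact form. The following is one possible way to realise them and is not the authors' implementation. It assumes that the noise at a hotspot is a Beta(alpha, beta) distribution fitted by the method of moments to the allele fractions observed at that site in other samples. It also substitutes the closely related beta-binomial upper-tail test for the unspecified Poisson-beta test. All function names and the background values are hypothetical.

```python
import math
from statistics import mean, pvariance


def fit_beta_moments(afs):
    """Fit Beta(alpha, beta) to background allele fractions by the method of moments."""
    m = mean(afs)
    v = pvariance(afs)
    # A beta distribution needs 0 < mean < 1 and 0 < variance < mean * (1 - mean).
    if not 0 < m < 1 or not 0 < v < m * (1 - m):
        raise ValueError("allele fractions cannot be summarised by a beta distribution")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


def log_beta_fn(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_logpmf(k, n, a, b):
    """log P(X = k) for X ~ BetaBinomial(n, a, b)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta_fn(k + a, n - k + b) - log_beta_fn(a, b)


def noise_pvalue(alt, depth, a, b):
    """Upper tail P(X >= alt): chance that site-specific noise alone yields alt reads."""
    if alt <= 0:
        return 1.0
    # Sum the tail directly rather than computing 1 - CDF, to avoid cancellation.
    tail = sum(math.exp(betabinom_logpmf(k, depth, a, b)) for k in range(alt, depth + 1))
    return min(1.0, tail)


# Hypothetical background allele fractions at one hotspot in other samples.
background = [0.0010, 0.0020, 0.0005, 0.0015, 0.0000, 0.0012, 0.0008, 0.0025]
a, b = fit_beta_moments(background)
depth = 2109  # median depth reported for the variants missed by ITVC
for alt in (4, 30):  # roughly 0.2% and 1.4% allele frequency
    print(alt, noise_pvalue(alt, depth, a, b))
```

With these hypothetical numbers, 4 alternative reads at depth 2109 are compatible with the site's noise, whereas 30 reads (about 1.4% AF) are not. The same read count could be judged differently at a noisier site, which is the point of estimating noise per site.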
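The conclusion that site-specific noise lowers the limit of detection can be made concrete. Under a per-site noise model, the detectable allele fraction at a site is the smallest alternative-read count whose noise tail probability falls below a significance threshold, divided by the depth. This is a minimal sketch under several assumptions. It uses beta-binomial noise as a stand-in for the Poisson-beta test named in the abstract, a threshold of alpha = 0.001, and hypothetical noise parameters for a clean and a noisy hotspot. None of these values come from the study, and the function names are illustrative.

```python
import math


def betabinom_pmf_all(n, a, b):
    """P(X = k) for k = 0..n, with X ~ BetaBinomial(n, a, b), computed in log space."""
    log_b_ab = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    pmf = []
    for k in range(n + 1):
        log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        log_b_k = math.lgamma(k + a) + math.lgamma(n - k + b) - math.lgamma(n + a + b)
        pmf.append(math.exp(log_choose + log_b_k - log_b_ab))
    return pmf


def min_detectable_af(depth, a, b, alpha=0.001):
    """Smallest allele fraction k/depth whose noise tail P(X >= k) is below alpha.

    Returns None if no alternative-read count up to depth reaches significance.
    """
    pmf = betabinom_pmf_all(depth, a, b)
    tail = 0.0
    smallest = None
    # P(X >= k) only grows as k decreases, so walk down from k = depth and stop
    # at the first k whose tail probability is no longer below alpha.
    for k in range(depth, 0, -1):
        tail += pmf[k]
        if tail >= alpha:
            break
        smallest = k
    return None if smallest is None else smallest / depth


# Hypothetical noise models: mean noise AF of about 0.05% (clean) vs about 1% (noisy).
sites = {"low-noise site": (1.0, 2000.0), "high-noise site": (2.0, 200.0)}
for name, (a, b) in sites.items():
    for depth in (500, 2109, 5000):
        print(name, depth, min_detectable_af(depth, a, b))
```

Under these assumptions, the clean site reaches a lower detectable allele fraction than the noisy one at the same depth, and deeper coverage lowers it further. This pattern is consistent with the deeply covered, low-AF calls reported in the Results. A single, kit-wide noise threshold cannot capture that difference between sites.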
Keywords: somatic variant detection, improved accuracy, site-specific