Use of DFT Distance Metrics for Classification of SARS-CoV-2 Genomes

Journal of Computational Biology (2022)

Abstract
In this work, we investigate the use of Fourier coefficients (FCs) to capture useful information about viral sequences in a computationally efficient and compact manner. Specifically, we extract the geographic submission location from the headers of SARS-CoV-2 sequences submitted to the GISAID Initiative, calculate the corresponding FCs, and use the FCs to classify these sequences by geographic location. We show that the FCs serve as useful numerical summaries of sequences, enabling manipulation, identification, and differentiation via classical mathematical and statistical methods that are not readily applicable to character strings. Further, we argue that subsets of the FCs may be usable for the same purposes, which would reduce storage requirements. We conclude by offering extensions of the research and potential directions for subsequent analyses, such as the use of other series transforms for discretely indexed signals such as genomes.
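
The general idea of summarizing a sequence by a subset of its Fourier coefficients and comparing sequences by distance can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the numerical encoding, the distance, or the classifier, so the binary indicator encoding per nucleotide, the Euclidean distance, and the nearest-neighbour rule are all assumptions. The function names, toy sequences, and region labels are hypothetical.

```python
import cmath
import math

NUCLEOTIDES = "ACGT"


def fourier_coefficients(seq, n_coeffs):
    """Magnitudes of the first n_coeffs DFT coefficients of each nucleotide
    indicator signal, concatenated in A, C, G, T order.

    Assumption: binary indicator encoding (1 where the base occurs, else 0).
    Coefficients are divided by the sequence length so that the k = 0 term
    equals the relative frequency of the base.
    """
    seq = seq.upper()
    n = len(seq)
    features = []
    for base in NUCLEOTIDES:
        signal = [1.0 if ch == base else 0.0 for ch in seq]
        for k in range(n_coeffs):
            # Direct evaluation of the k-th DFT coefficient; only the
            # requested subset of coefficients is computed and stored.
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)
            )
            features.append(abs(coeff) / n)
    return features


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_label(query, labelled):
    """Return the label of the closest (feature_vector, label) pair."""
    return min(labelled, key=lambda item: euclidean(query, item[0]))[1]


# Toy data: made-up sequences and labels, for illustration only.
training = [
    (fourier_coefficients("ACGTACGTACGT", 4), "region_1"),
    (fourier_coefficients("AAAACCCCGGGG", 4), "region_2"),
]
query = fourier_coefficients("ACGTACGTACGA", 4)
predicted = nearest_label(query, training)
print(predicted)
```

Each sequence is reduced to a fixed-length numeric vector (here 4 bases × 4 coefficients = 16 values) regardless of its length, which is what makes standard distance-based methods applicable and keeps storage small. For genome-length sequences, a fast Fourier transform routine would replace the direct sum used here.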
Keywords
alignment-free methods, Fourier transform, genomic sequences, supervised learning, visualization of high-dimensional data