Probabilistic Approach for DNA Compression
Studies in Computational Intelligence, Soft Computing for Data Mining Applications (2009)
Abstract
Rapid advancements in research in the field of DNA sequence discovery have led to a vast range of compression algorithms.
Each of the four bases of a DNA sequence can be stored in two bits, but efficient algorithms have pushed this limit
lower. With the constant decrease in the prices of memory and communication channel bandwidth, one might question the need for such
compression algorithms. The algorithm discussed in this chapter compresses the DNA sequence and also allows one to generate
finite-length sequences, which can be used to find approximate pattern matches. DNA sequences are mainly of two types: repetitive
and non-repetitive. The compression technique used is meant for the non-repetitive parts of the sequence, where we make use
of the fact that a DNA sequence consists of only 4 characters. The algorithm achieves a bit/base ratio of 1.3-1.4 (dependent
on the database), but more importantly, one of the stages of the algorithm can be used for efficient discovery of approximate
patterns.
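
For reference, the two-bits-per-base baseline mentioned in the abstract can be sketched as follows. This is only the naive fixed-width encoding that the chapter's algorithm improves upon, not the probabilistic method itself; the function names `pack_bases` and `unpack_bases` and the particular base-to-bit assignment are assumptions made for illustration.

```python
# Naive 2-bit baseline (illustrative only, not the chapter's algorithm).
# The mapping A=00, C=01, G=10, T=11 is an arbitrary assumption.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = "ACGT"


def pack_bases(seq: str) -> bytes:
    """Pack a string over {A, C, G, T} into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        # Left-align a final partial chunk by padding with zero bits.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack_bases(data: bytes, n: int) -> str:
    """Recover the first n bases from data produced by pack_bases."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases[:n])


if __name__ == "__main__":
    seq = "ACGTTGCA"
    packed = pack_bases(seq)
    print(len(packed) * 8 / len(seq), "bits per base")
    print(unpack_bases(packed, len(seq)) == seq)
```

The sequence length has to be stored separately, because the final byte may contain padding. A ratio of 1.3-1.4 bits per base, as reported above, means the chapter's method stores a sequence in roughly two thirds of the space this baseline needs.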