Probabilistic Approach for DNA Compression

Studies in Computational Intelligence, Soft Computing for Data Mining Applications (2009)

Abstract
Rapid advances in DNA sequence discovery have led to a wide range of compression algorithms. Two bits suffice to store each of the four bases of a DNA sequence, but efficient algorithms have pushed this limit lower. With the steady fall in the price of memory and communication bandwidth, one may question the need for such compression algorithms. The algorithm discussed in this chapter compresses the DNA sequence and also allows one to generate finite-length sequences, which can be used to find approximate pattern matches. DNA sequences are mainly of two types, repetitive and non-repetitive. The compression technique is meant for the non-repetitive parts of the sequence, where we make use of the fact that a DNA sequence consists of only 4 characters. The algorithm achieves a bit/base ratio of 1.3-1.4 (depending on the database), but more importantly, one of its stages can be used for efficient discovery of approximate patterns.
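To make the two-bits-per-base baseline concrete, here is a minimal Python sketch of fixed-width packing. It is not the chapter's probabilistic algorithm, only the naive encoding that the abstract uses as its reference point. The names `pack_2bit`, `unpack_2bit` and `bits_per_base`, and the particular base-to-code mapping, are assumptions made for illustration.

```python
# Illustrative 2-bit baseline only; NOT the compression algorithm from the chapter.
# The mapping A=00, C=01, G=10, T=11 is an arbitrary assumption.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = "ACGT"


def pack_2bit(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        # Left-align a final chunk that holds fewer than four bases.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack_2bit(data: bytes, n_bases: int) -> str:
    """Invert pack_2bit; n_bases is needed to drop the padding bits."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases[:n_bases])


def bits_per_base(compressed_bytes: int, n_bases: int) -> float:
    """Bit/base ratio of an encoding that uses compressed_bytes bytes."""
    return 8 * compressed_bytes / n_bases


packed = pack_2bit("ACGTACGT")
ratio = bits_per_base(len(packed), 8)  # 2.0 for this fixed-width baseline
```

Measured against this baseline, the reported ratio of 1.3-1.4 bits per base corresponds to output roughly 30-35% smaller than plain 2-bit packing.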