LIPT: A Reversible Lossless Text Transform to Improve Compression Performance

Data Compression Conference (2001)

Abstract
Lossless compression researchers have developed highly sophisticated approaches, such as Huffman encoding, arithmetic encoding, the Lempel-Ziv family, Dynamic Markov Compression (DMC), Prediction by Partial Matching (PPM), and Burrows-Wheeler Transform (BWT) based algorithms. In this paper we propose an alternative approach: a reversible transformation that is applied to the source text and improves the ability of existing algorithms to compress it. The basic idea is to encode every word of the input text that also appears in the English dictionary we use as a word in our transformed static dictionary. The transformed words are shorter than most of the input words they replace, yet they retain some context and redundancy. We thus achieve some compression at the preprocessing stage while retaining enough context and redundancy for the compression algorithms to give better results. Bzip2 with our proposed text transform, LIPT, gives a 5.24% improvement in average bits per character (BPC) over Bzip2 without LIPT, and PPMD (a variant of PPM, with order 5) with LIPT gives a 4.46% improvement in average BPC over PPMD (order 5) without LIPT, for a set of text files drawn from the Calgary and Canterbury corpora and from Project Gutenberg. Bzip2 with LIPT, although 79.12% slower than the original Bzip2 in compression time, achieves an average BPC almost equal to that of the original PPMD and is also 1.2% faster than the original PPMD in compression time.
Keywords
dictionary based, lossless text transform, bzip2, improve compression performance, lipt, ppm, reversible lossless text transform, compression algorithm, lossless compression, burrows wheeler transform