A syllable-based method for Vietnamese text compression

IMCOM(2016)

Abstract
Text compression is a technique for reducing the size of text files, which increases transfer rates and saves storage space. Many approaches have been proposed to tackle this problem in several languages, such as English, Chinese, Turkish, Japanese, and French. In this paper, we propose a method to compress Vietnamese text using syllables, based on morphology and dictionaries. Our method first splits a morphosyllable into a consonant and a syllable, then encodes it using dictionaries of consonants and syllables. Because the Vietnamese language has six tone marks, we build six different syllable dictionaries. We collect a test set of 20 text files of different sizes to evaluate our system. Experimental results show that our system achieves good performance, with a compression ratio of around 73%. Compared with WinZIP version 19.5 and WinRAR version 5.21, our method achieves a higher compression ratio when the text file is small. Our method can therefore be applied efficiently to compress short texts such as SMS messages and text messages on social networks.
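The splitting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the dictionary contents or the code layout, so the list of initial consonants, the tone numbering, the function names `split_morphosyllable` and `encode`, and the tuple output format are all assumptions made for illustration. The sketch detects the tone mark through Unicode decomposition, strips it, and then peels off the longest matching initial consonant.

```python
import unicodedata

# Combining marks for the five marked tones (grave, acute, hook above, tilde,
# dot below); index 0 stands for the unmarked tone. The numbering is an
# assumption of this sketch, not taken from the paper.
TONE_MARKS = ["", "\u0300", "\u0301", "\u0309", "\u0303", "\u0323"]

# Hypothetical list of initial consonants, longest first so that "ngh" is
# tried before "ng" and "n".
INITIALS = sorted(
    ["ngh", "ng", "nh", "gh", "gi", "kh", "ph", "th", "tr", "ch", "qu",
     "b", "c", "d", "đ", "g", "h", "k", "l", "m", "n", "p", "r", "s", "t",
     "v", "x"],
    key=len, reverse=True)


def split_morphosyllable(word):
    """Return (initial consonant, toneless remainder, tone index 0..5)."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    tone = 0
    kept = []
    for ch in decomposed:
        if ch in TONE_MARKS[1:]:
            tone = TONE_MARKS.index(ch)  # remember the tone, drop the mark
        else:
            kept.append(ch)  # keeps vowel-quality marks (circumflex, breve, horn)
    toneless = unicodedata.normalize("NFC", "".join(kept))
    for initial in INITIALS:
        # Require a non-empty remainder so a bare "gi" is not consumed whole.
        if toneless.startswith(initial) and len(toneless) > len(initial):
            return initial, toneless[len(initial):], tone
    return "", toneless, tone


def encode(text, consonant_ids, syllable_ids_by_tone):
    """Map each whitespace-separated word to (consonant id, tone, syllable id).

    consonant_ids: dict from initial consonant to an integer id.
    syllable_ids_by_tone: list of six dicts, one per tone, from the toneless
    remainder to an integer id (a stand-in for the six syllable dictionaries).
    """
    codes = []
    for word in text.split():
        initial, rest, tone = split_morphosyllable(word)
        codes.append((consonant_ids[initial], tone, syllable_ids_by_tone[tone][rest]))
    return codes
```

Under these assumptions, "tiếng" splits into the consonant "t", the remainder "iêng", and tone index 2, and "việt" splits into "v", "iêt", and tone index 5. Keeping one syllable dictionary per tone means the tone selects the dictionary and does not have to be stored inside each entry.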