Improved recovery and reconstruction of DEFLATEd files

Digital Investigation: The International Journal of Digital Forensics & Incident Response(2013)

引用 7|浏览26
暂无评分
摘要
This paper presents a method for recovering data from files compressed with the DEFLATE algorithm where short segments in the middle of the file have been corrupted, yielding a mix of literal bytes, bytes aligned with literals across the corrupted segment, and co-indexed unknown bytes. An improved reconstruction algorithm based on long byte n-grams increases the proportion of reconstructed bytes by an average of 8.9% absolute across the 21 languages of the Europarl corpus compared to previously-published work, and the proportion of unknown bytes correctly reconstructed by an average of 20.9% absolute, while running in one-twelfth the time on average. Combined with the new recovery method, corrupted segments of 128-4096 bytes in the compressed bit-stream result in reconstructed output which differs from the original file by an average of less than twice the number of bytes represented by the corrupted segment. Both new algorithms are implemented in the trainable open-source ZipRec 1.0 utility program.
更多
查看译文
关键词
data recovery,deflate compression,file reconstruction,language models,zip archives
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要