Google Books: Making The Public Domain Universally Accessible

Adam Langley, Dan S. Bloomberg

Document Recognition and Retrieval XIV (2007)

Abstract
Google Book Search is working with libraries and publishers around the world to digitally scan books. Some of those works are now in the public domain and, in keeping with Google's mission to make all the world's information useful and universally accessible, we wish to allow users to download them all. For users, it is important that the files are as small as possible and of printable quality. This means that a single codec for both text and images is impractical. We use PDF as a container for a mixture of JBIG2 and JPEG2000 images which are composed into a final set of pages. We discuss both the implementation of an open source JBIG2 encoder, which we use to compress text data, and the design of the infrastructure needed to meet the technical, legal and user requirements of serving many scanned works. We also cover the lessons learnt about dealing with different PDF readers and how to write files that work on most of the readers, most of the time.
Keywords
Google, books, PDF, public domain, JBIG2, leptonica, Hausdorff, correlation, mixed raster, open source
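The abstract says text is compressed with a JBIG2 encoder, and the keywords mention correlation-based matching. As a rough illustration only (not the authors' encoder), the sketch below shows the general idea behind building a JBIG2-style symbol dictionary: a glyph bitmap that correlates strongly with an existing template is recorded as a reference to it, so repeated characters are stored once. The bitmap-as-strings representation, the function names `correlation_score` and `classify`, and the 0.85 threshold are hypothetical choices for this example. The score |A AND B|^2 / (|A| * |B|) is similar in form to the correlation score Leptonica documents, but a real encoder would also align glyphs (for example by centroid) and tolerate small size differences, which this sketch does not do.

```python
from typing import List, Tuple

# Hypothetical representation: a glyph is a list of equal-length strings,
# where '#' is an ON (black) pixel and '.' is OFF.

def on_pixels(bmp: List[str]) -> set:
    """Return the set of (row, col) coordinates of ON pixels."""
    return {(r, c) for r, row in enumerate(bmp) for c, ch in enumerate(row) if ch == "#"}

def shape(bmp: List[str]) -> Tuple[int, int]:
    """(height, width) of a glyph bitmap."""
    return (len(bmp), len(bmp[0]) if bmp else 0)

def correlation_score(a: List[str], b: List[str]) -> float:
    """|A AND B|^2 / (|A| * |B|); 1.0 means identical ON pixels.

    Assumes both glyphs are already aligned at their top-left corner.
    """
    pa, pb = on_pixels(a), on_pixels(b)
    if not pa or not pb:
        return 0.0
    overlap = len(pa & pb)
    return overlap * overlap / (len(pa) * len(pb))

def classify(glyphs: List[List[str]], threshold: float = 0.85):
    """Greedy classification into a symbol dictionary (hypothetical threshold).

    Each glyph joins the first same-sized template scoring >= threshold;
    otherwise it becomes a new template. Returns (templates, labels), where
    labels[i] is the template index used for glyphs[i].
    """
    templates: List[List[str]] = []
    labels: List[int] = []
    for g in glyphs:
        for i, t in enumerate(templates):
            if shape(t) == shape(g) and correlation_score(t, g) >= threshold:
                labels.append(i)
                break
        else:
            # No existing template matched: store this glyph as a new one.
            templates.append(g)
            labels.append(len(templates) - 1)
    return templates, labels

# Usage: two slightly different "e" glyphs and one "o" glyph.
e1 = [".##.", "#..#", "####", "#...", ".###"]
e2 = [".##.", "#..#", "####", "#...", ".##."]  # one pixel fewer than e1
o = [".##.", "#..#", "#..#", "#..#", ".##."]

templates, labels = classify([e1, e2, o])
print(labels)  # the two "e" glyphs share template 0; "o" gets template 1
```

Here the two "e" glyphs score 121/132 (about 0.92) and share one template, while "e" against "o" scores 81/120 (0.675) and "o" becomes a second template. A lossy threshold like this trades exactness for size, which is why the choice of matching measure matters for printable quality.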