Document Representations

Encyclopedia of Social Network Analysis and Mining (2018)

Abstract
Many document representations are in use. Each representation explicitly encodes different aspects of a document. External document representations, using standard file formats (such as JPEG, PostScript, HTML, LaTeX), are used to communicate document data between programs. Internal document representations are used within document analysis or document production software to store intermediate results in the transformation from the input to the output document representation. These document representations are central to defining and solving document analysis problems. Issues that can be investigated include defining equivalence of documents and distance between documents, mathematically characterizing the mapping between document representations, characterizing the external information needed to carry out these mappings, and characterizing the differences between the forward and inverse mappings that occur during document analysis and document production. From our ongoing investigation of these issues, we present a summary of internal document representations used in the table-recognition literature, and case studies of external document representations in the domains of circuit diagrams and text documents.