Constant-delay enumeration algorithms for document spanners over nested documents
arXiv (2020)
Abstract
Some of the most relevant document schemas used online, such as XML and JSON, have a nested format. In recent years, the task of extracting data from large nested documents has become especially relevant. We model queries of this kind as Visibly Pushdown Transducers (VPT), a structure that extends visibly pushdown automata with outputs. Since processing a string through a VPT can generate a huge number of outputs, we are interested in the task of enumerating them one after another as efficiently as possible. This paper describes an algorithm that enumerates these outputs with output-linear delay after preprocessing the string in a single pass. We show applications of this result to recursive document spanners over nested documents and show how our algorithm can be adapted to enumerate the outputs in this context.
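To make the VPT model concrete, here is a minimal Python sketch of a nondeterministic visibly pushdown transducer run over a nested word, paired with a brute-force enumerator that explores every run depth-first. It is deliberately naive: it does not implement the paper's single-pass preprocessing, its delay between outputs is not output-linear (it may backtrack through many failed runs), and it does not remove duplicate outputs produced by different runs. The encoding is an assumption for illustration: symbols are `("call", tag)`, `("ret", tag)` or `("int", letter)`, transitions are hypothetical Python functions, and an output is a tuple of `(position, output symbol)` pairs. The example query, which extracts one `x` nested anywhere inside an `<a>` element, is also hypothetical.

```python
# Brute-force semantics of a (hypothetical) VPT encoding; NOT the paper's algorithm.


def emit(out, i, o):
    """Append (position, output symbol) unless the move emits nothing (o is None)."""
    return out if o is None else out + ((i, o),)


class VPT:
    """Nondeterministic visibly pushdown transducer (illustrative encoding).

    call(q, tag)             -> [(q', pushed_symbol, out)]
    ret(q, tag, popped)      -> [(q', out)]
    internal(q, letter)      -> [(q', out)]
    """

    def __init__(self, initial, finals, call, ret, internal):
        self.initial, self.finals = initial, finals
        self.call, self.ret, self.internal = call, ret, internal

    def outputs(self, word):
        """Yield the output of every accepting run, exploring runs depth-first."""

        def go(i, q, stack, out):
            if i == len(word):
                # Accept only with a final state and an empty stack (well-nested input).
                if q in self.finals and not stack:
                    yield out
                return
            kind, a = word[i]
            if kind == "call":
                for q2, g, o in self.call(q, a):
                    yield from go(i + 1, q2, stack + (g,), emit(out, i, o))
            elif kind == "ret":
                if not stack:
                    return  # unmatched return: this sketch rejects the run
                for q2, o in self.ret(q, a, stack[-1]):
                    yield from go(i + 1, q2, stack[:-1], emit(out, i, o))
            else:
                for q2, o in self.internal(q, a):
                    yield from go(i + 1, q2, stack, emit(out, i, o))

        yield from go(0, self.initial, (), ())


# Hypothetical query: output the position of one "x" nested inside an <a> element.
# A state is (marked, inside_a); the stack remembers the enclosing inside_a flag.
def call(q, tag):
    marked, inside = q
    return [((marked, inside or tag == "a"), inside, None)]


def ret(q, tag, popped):
    marked, _ = q
    return [((marked, popped), None)]  # restore the enclosing flag


def internal(q, letter):
    marked, inside = q
    moves = [(q, None)]  # skip this letter
    if letter == "x" and inside and not marked:
        moves.append(((True, inside), "x"))  # nondeterministically mark it
    return moves


vpt = VPT((False, False), {(True, False)}, call, ret, internal)

# <r> x <a> x <b> x </b> </a> x </r>
word = [("call", "r"), ("int", "x"), ("call", "a"), ("int", "x"),
        ("call", "b"), ("int", "x"), ("ret", "b"), ("ret", "a"),
        ("int", "x"), ("ret", "r")]

for out in vpt.outputs(word):
    print(out)
```

Only the two `x` symbols at positions 3 and 5 lie inside `<a>`, so the sketch yields one output for each. Note that the number of runs explored here can grow exponentially with the document, which is exactly the cost the paper's preprocessing-then-enumeration approach is designed to avoid. The recursion also makes this sketch unsuitable for very long documents.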
Keywords
document spanners, constant-delay