Weight Annotation in Information Extraction.

Log. Methods Comput. Sci. (2022)

Abstract
We introduce annotated document spanners, which are document spanners that can annotate their output tuples with elements from a semiring. Such spanners are useful for modeling soft constraints, which are popular in practical information extraction tools. We introduce a finite automaton model for such spanners, which generalizes vset-automata and weighted automata, and prove that this model is closed under the relational algebra operations union, projection, and natural join, which have been considered in the work on provenance in databases. Concerning selection, we generalize a characterization of Fagin et al., proving that a string relation R is recognizable if and only if the regular spanners are closed under selection using R. Finally, we consider evaluation and enumeration problems for annotated document spanners and provide a number of tractability and intractability results. For achieving tractability, fundamental properties of the underlying semiring, such as positivity, are crucial.
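To make the idea of semiring-annotated output tuples concrete, the following is a minimal sketch of union, projection, and natural join on annotated relations, in the style used in work on provenance semirings. It is an illustration only and not the automaton model of the paper: the names `Semiring`, `row`, `union`, `project`, and `join` are hypothetical, and each output tuple is assumed to map variable names to spans given as (start, end) pairs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, Iterable, Tuple

@dataclass(frozen=True)
class Semiring:
    """Hypothetical container for a semiring (zero, one, addition, multiplication)."""
    zero: Any
    one: Any
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]

# A row assigns spans (start, end) to variables; it is frozen so it can be a dict key.
Row = FrozenSet[Tuple[str, Tuple[int, int]]]
# An annotated relation maps each row to its semiring annotation.
KRelation = Dict[Row, Any]

def row(**assignments: Tuple[int, int]) -> Row:
    return frozenset(assignments.items())

def union(k: Semiring, r: KRelation, s: KRelation) -> KRelation:
    """Annotations of a row that occurs in both inputs are added."""
    out = dict(r)
    for t, a in s.items():
        out[t] = k.add(out.get(t, k.zero), a)
    return out

def project(k: Semiring, r: KRelation, variables: Iterable[str]) -> KRelation:
    """Rows that collapse to the same projected row have their annotations added."""
    keep = set(variables)
    out: KRelation = {}
    for t, a in r.items():
        p = frozenset((v, span) for v, span in t if v in keep)
        out[p] = k.add(out.get(p, k.zero), a)
    return out

def join(k: Semiring, r: KRelation, s: KRelation) -> KRelation:
    """Rows that agree on shared variables are merged; annotations are multiplied."""
    out: KRelation = {}
    for t1, a1 in r.items():
        d1 = dict(t1)
        for t2, a2 in s.items():
            d2 = dict(t2)
            if all(d1[v] == d2[v] for v in d1.keys() & d2.keys()):
                merged = frozenset({**d1, **d2}.items())
                out[merged] = k.add(out.get(merged, k.zero), k.mul(a1, a2))
    return out

# Two example semirings: counting (N, +, *) and Viterbi ([0, 1], max, *).
COUNTING = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)
VITERBI = Semiring(0.0, 1.0, max, lambda a, b: a * b)

r = {row(x=(0, 3)): 2, row(x=(4, 7)): 1}
s = {row(x=(0, 3), y=(4, 7)): 3, row(x=(0, 3), y=(8, 9)): 4}

joined = join(COUNTING, r, s)          # annotations 2*3 and 2*4
projected = project(COUNTING, s, {"x"})  # annotation 3+4 for x=(0, 3)
```

Swapping `COUNTING` for `VITERBI` leaves the three operations unchanged and only changes how annotations combine, which is the sense in which a semiring can model soft constraints such as confidence scores.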
Keywords
Information extraction, regular document spanners, weighted automata, provenance semirings, K-relations