Learning Restricted Regular Expressions with Interleaving from XML Data

Conceptual Modeling, ER 2018 (2018)

Abstract
The presence of a schema for XML documents has numerous advantages. However, many XML documents in practice are not accompanied by a schema, or the schema they have is not valid. It is therefore essential to devise algorithms that learn a schema from XML documents. The fundamental task in XML schema learning is inferring restricted subclasses of regular expressions. Previous work in this direction lacks support for interleaving. In this paper, based on an analysis of real data, we first propose a new subclass of regular expressions with interleaving, named Extended Subclass of Regular Expressions with Interleaving (ESIRE). Then, based on the single occurrence automaton and the maximum independent set, we propose an algorithm, GenESIRE, to infer ESIREs from a given set of samples. Finally, we conduct a series of experiments to analyze the inference effectiveness of GenESIRE. Experimental results show that the regular expressions inferred by GenESIRE are more precise than those of other methods, as measured by different indicators.
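To make the "single occurrence automaton" mentioned above concrete, here is a minimal sketch of the standard construction in which every distinct symbol becomes one state and an edge is added for each pair of adjacent symbols in the samples. This is not GenESIRE itself, and the abstract does not describe how the maximum independent set step works, so that step is omitted. The names `build_soa`, `accepts`, `SRC`, and `SINK` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical marker names for the artificial start and end states.
SRC, SINK = "<src>", "<sink>"


def build_soa(samples):
    """Build a single occurrence automaton (SOA) as an adjacency map.

    Each distinct symbol is one state. An edge a -> b is added whenever
    b immediately follows a in some sample; SRC points to every first
    symbol and every last symbol points to SINK.
    """
    edges = defaultdict(set)
    for word in samples:
        prev = SRC
        for symbol in word:
            edges[prev].add(symbol)
            prev = symbol
        edges[prev].add(SINK)
    return dict(edges)


def accepts(soa, word):
    """Return True if the word traces a path from SRC to SINK in the SOA."""
    state = SRC
    for symbol in word:
        if symbol not in soa.get(state, ()):
            return False
        state = symbol
    return SINK in soa.get(state, ())


# Samples in which child elements a and b appear in either order before c.
samples = [["a", "b", "c"], ["b", "a", "c"]]
soa = build_soa(samples)
```

With these samples, `accepts(soa, ["a", "b", "a", "c"])` returns True even though no sample repeats `a`. The automaton alone records only adjacency, so it over-generalizes when elements occur in arbitrary order, which is one way to see why an explicit interleaving operator can yield a more precise expression.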
Keywords
Schema learning, Regular expression, Interleaving