Inferring Regular Expressions with Interleaving from XML Data

WEB AND BIG DATA (APWEB-WAIM 2018), PT II (2018)

Abstract
Document Type Definition (DTD) and XML Schema Definition (XSD) are two popular schema languages for XML. However, many XML documents in practice are not accompanied by a schema, or at least not by a valid one. It is therefore essential to devise efficient algorithms for schema learning, which can be reduced to the inference of restricted regular expressions. In this paper, we first propose a new subclass of restricted regular expressions called Various CHAin Regular Expression with Interleaving (VCHARE). Then, based on the single occurrence automaton (SOA) and the maximum independent set (MIS), we introduce an inference algorithm, GenVCHARE. We prove that the algorithm infers a descriptive generalized VCHARE from a given set of samples. Finally, we conduct a series of experiments on a data set crawled from the Web. The experimental results show that VCHARE covers more content models than other existing subclasses of regular expressions. Moreover, on the DBLP data sets, the regular expressions inferred by GenVCHARE are more accurate and more concise than those inferred by other existing methods.
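
To make the SOA mentioned in the abstract concrete, the following is a minimal sketch of how a single occurrence automaton is commonly built from sample child-element sequences: each element name becomes one state, and an edge is added for every pair of adjacent names observed in a sample, with a source and a sink marking the start and end of a sequence. This is not GenVCHARE itself (the MIS step and the VCHARE construction are not shown), and the function names, the `<src>`/`<sink>` labels, and the sample element names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical labels for the artificial start and end states.
SRC, SINK = "<src>", "<sink>"


def build_soa(samples):
    """Build an SOA as an adjacency map from sample symbol sequences.

    Each distinct symbol is a single state; an edge a -> b exists
    whenever b directly follows a in at least one sample.
    """
    edges = defaultdict(set)
    for word in samples:
        prev = SRC
        for symbol in word:
            edges[prev].add(symbol)
            prev = symbol
        edges[prev].add(SINK)
    return {state: sorted(succ) for state, succ in edges.items()}


def accepts(soa, word):
    """Check whether the SOA accepts a sequence of symbols."""
    prev = SRC
    for symbol in word:
        if symbol not in soa.get(prev, ()):
            return False
        prev = symbol
    return SINK in soa.get(prev, ())


# Illustrative samples (element names are made up, not taken from the paper).
samples = [
    ["title", "author", "year"],
    ["title", "author", "author", "year"],
]
soa = build_soa(samples)
```

Because the automaton only records which names may follow each other, it generalizes beyond the samples: a sequence with three `author` elements is accepted even though no sample contains one, while a sequence starting with `author` is rejected.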