Learning K-Occurrence Regular Expressions With Interleaving

DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2019), PT II (2019)

Abstract
The lack of valid schemas is a critical problem for XML, and research on interleaving in XML remains scarce. In this paper we therefore focus on inferring XML schemas with interleaving. Previous research has shown that the essential task in schema learning is inferring regular expressions from a set of given samples. At present, the most powerful model for learning XML schemas is the k-occurrence regular expression (k-ORE for short). However, no existing algorithm can learn k-OREs with interleaving. We therefore propose a complete framework that supports both k-OREs and interleaving. To the best of our knowledge, our work is the first to address these two inference problems at the same time. We first define a new subclass of regular expressions named k-OIREs, and develop an inference algorithm, iKOIRE, that learns k-OIREs based on a genetic algorithm and maximum independent set (MIS). We then conduct a series of experiments on large-scale real datasets and evaluate the effectiveness of our approach against both existing learning algorithms from academia and industrial tools. The results show high practicability and strong performance, and indicate promising prospects for application.
Keywords
Regular expression, Language learning, Interleaving, XML schema