Generating Concise Entity Matching Rules.

SIGMOD Conference(2017)

引用 67|浏览116
暂无评分
摘要
Entity matching (EM) is a critical part of data integration and cleaning. In many applications, the users need to understand why two entities are considered a match, which reveals the need for interpretable and concise EM rules. We model EM rules in the form of General Boolean Formulas (GBFs) that allows arbitrary attribute matching combined by conjunctions (∨), disjunctions (∧), and negations. (¬) GBFs can generate more concise rules than traditional EM rules represented in disjunctive normal forms (DNFs). We use program synthesis, a powerful tool to automatically generate rules (or programs) that provably satisfy a high-level specification, to automatically synthesize EM rules in GBF format, given only positive and negative matching examples. In this demo, attendees will experience the following features: (1) Interpretability -- they can see and measure the conciseness of EM rules defined using GBFs; (2) Easy customization -- they can provide custom experiment parameters for various datasets, and, easily modify a rich predefined (default) synthesis grammar, using a Web interface; and (3) High performance -- they will be able to compare the generated concise rules, in terms of accuracy, with probabilistic models (e.g., machine learning methods), and hand-written EM rules provided by experts. Moreover, this system will serve as a general platform for evaluating different methods that discover EM rules, which will be released as an open-source tool on GitHub.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要