Web Data Extraction using Hybrid Program Synthesis: A Combination of Top-down and Bottom-up Inference

SIGMOD/PODS '20: International Conference on Management of Data, Portland, OR, USA, June 2020

Abstract
Automatic synthesis of web data extraction programs has been explored in a variety of settings, but in practice there remain various robustness and usability challenges. In this work we present a novel program synthesis approach which combines the benefits of deductive and enumerative synthesis strategies, yielding a semi-supervised technique with which concise programs expressible in standard languages can be synthesized from very few examples. We demonstrate improvement over existing techniques in terms of overall accuracy, number of examples required, and program complexity. Our method has been deployed as a web extraction feature in the mass-market Microsoft Power BI product.
Keywords
web data extraction, program synthesis, wrapper induction