Challenges and Innovations in Building a Product Knowledge Graph.
SIGMOD/PODS '18: International Conference on Management of Data, Houston, Texas, June 2018
Abstract
Knowledge graphs have been used to support a wide range of applications and to enhance search results for major search engines such as Google and Bing. At Amazon, we are building a Product Graph, an authoritative knowledge graph for all products in the world. The thousands of product verticals we need to model, the vast number of data sources we need to extract knowledge from, the huge volume of new products we need to handle every day, and the various applications in Search, Discovery, Personalization, and Voice that we wish to support all present big challenges in constructing such a graph. In this talk, we describe four scientific directions we are investigating in building and using such a knowledge graph. First, we have been developing advanced extraction technologies to harvest product knowledge from semi-structured sources on the web and from text product profiles. Our annotation-based extraction tool selects a few webpages (typically fewer than 10) from a website for annotation, and can derive XPaths to extract from the whole website with average precision and recall of 97% [1]. Our distantly supervised extraction tool, CERES, uses an existing knowledge graph to automatically generate (noisy) training labels, and can obtain a precision over 90% when extracting from long-tail websites in various languages [1]. Our OpenTag technique extends state-of-the-art techniques such as Recurrent Neural Network (RNN) and Conditional Random Field (CRF) with attention and active learning, to achieve over 90% precision and recall in extracting attribute values (including values unseen in training data) from product titles, descriptions, and bullets [3].
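To make the idea of distant supervision concrete, the following is a minimal sketch of how known attribute values from an existing knowledge graph might be matched against text to produce noisy token labels. It is not the CERES algorithm itself; the function name `distant_labels`, the BIO tagging scheme, and the sample product data are all assumptions made for illustration.

```python
from typing import Dict, List


def distant_labels(tokens: List[str], known_values: Dict[str, List[str]]) -> List[str]:
    """Assign noisy BIO tags by exact-matching known attribute values.

    Hypothetical helper: `known_values` maps an attribute name to values
    already present in a knowledge graph for this product.
    """
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for attribute, values in known_values.items():
        for value in values:
            value_tokens = value.lower().split()
            n = len(value_tokens)
            if n == 0:
                continue
            for start in range(len(tokens) - n + 1):
                # Skip spans that overlap an earlier match.
                span_free = all(label == "O" for label in labels[start:start + n])
                if span_free and lowered[start:start + n] == value_tokens:
                    labels[start] = f"B-{attribute}"
                    for i in range(start + 1, start + n):
                        labels[i] = f"I-{attribute}"
    return labels


# Invented sample data, for illustration only.
tokens = "Acme dark roast ground coffee 12 oz".split()
known = {"brand": ["Acme"], "flavor": ["dark roast"]}
labels = distant_labels(tokens, known)
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

Labels produced this way are noisy because exact matching misses paraphrases and can tag coincidental mentions, which is why the abstract describes the generated training labels as noisy.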
Keywords
Knowledge extraction, knowledge fusion, entity linkage, data cleaning, graph mining, human-in-the-loop