PECOS: Prediction for Enormous and Correlated Output Spaces

arXiv (Cornell University)(2020)

引用 0|浏览0
暂无评分
摘要
Many large-scale applications amount to finding relevant results from an enormous output space of potential candidates. For example, finding the best matching product from a large catalog or suggesting related search phrases on a search engine. The size of the output space for these problems can range from millions to billions, and can even be infinite in some applications. Moreover, training data is often limited for the long-tail items in the output space. Fortunately, items in the output space are often correlated thereby presenting an opportunity to alleviate the data sparsity issue. In this paper, we propose the Prediction for Enormous and Correlated Output Spaces (PECOS) framework, a versatile and modular machine learning framework for solving prediction problems for very large output spaces, and apply it to the eXtreme Multilabel Ranking (XMR) problem: given an input instance, find and rank the most relevant items from an enormous but fixed and finite output space. We propose a three phase framework for PECOS: (i) in the first phase, PECOS organizes the output space using a semantic indexing scheme, (ii) in the second phase, PECOS uses the indexing to narrow down the output space by orders of magnitude using a machine learned matching scheme, and (iii) in the third phase, PECOS ranks the matched items using a final ranking scheme. The versatility and modularity of PECOS allows for easy plug-and-play of various choices for the indexing, matching, and ranking phases. We also develop very fast inference procedures which allow us to perform XMR predictions in real time; for example, inference takes less than 1 millisecond per input on the dataset with 2.8 million labels. The PECOS software is available at https://libpecos.org.
更多
查看译文
关键词
prediction,output,spaces
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要