PECOS: Prediction for Enormous and Correlated Output Spaces

KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2022)

Abstract
Unlike traditional machine learning tasks and benchmarks, real-world problems usually come with enormous output spaces, from hundreds of thousands of diseases in medical diagnosis to millions of items and billions of websites in product and web search engines. Unfortunately, conventional machine learning tools and libraries cannot handle such large-scale output spaces efficiently and accurately. PECOS (Prediction for Enormous and Correlated Output Spaces) [11] addresses this issue. It is a state-of-the-art, open-source machine learning library that provides high-level, user-friendly interfaces to both linear and deep learning models and offers considerable flexibility for solving diverse machine learning problems. Specifically, PECOS simplifies the complicated semantic indexing needed to organize enormous output spaces, which makes training and prediction over correlated output labels more efficient by orders of magnitude. PECOS has already been adopted in various real-world large-scale products, such as semantic search at Amazon [1], and has achieved state-of-the-art results on public extreme multi-label classification (XMC) benchmarks [2, 11, 12] and in various downstream applications [3, 7, 9]. In this tutorial, we will introduce several key functions and features of the PECOS library. Through real-world examples in product recommendation and natural language processing, attendees will learn how to efficiently train large-scale machine learning models for enormous output spaces and obtain predictions in less than 1 millisecond per input over a million labels. We will also show how the assorted built-in utilities in PECOS handle diverse machine learning problems and data formats. By the end of the tutorial, we believe attendees will be able to apply these concepts to their own projects and address different machine learning problems with enormous output spaces.
Keywords
Extreme Multi-label Text Classification, Large Output Space Learning, Transformers