
ECLARE: Extreme Classification with Label Graph Correlations

Proceedings of the World Wide Web Conference 2021 (WWW 2021), pp. 3721–3732 (2021)


Abstract

Deep extreme classification (XC) seeks to train deep architectures that can tag a data point with its most relevant subset of labels from an extremely large label set. The core utility of XC comes from predicting labels that are rarely seen during training. Such rare labels hold the key to personalized recommendations that can delight and…

Code: https://github.com/Extreme-classification/ECLARE
Introduction
  • Extreme multi-label classification (XC) involves tagging a data point with the subset of labels most relevant to it, from an extremely large set of labels.
  • XC finds applications in numerous settings, including product recommendation [28], related searches [15], and related products [31].
  • XC methods stand to benefit significantly from utilizing label correlation data: the paper presents ECLARE, an XC method that utilizes textual label descriptions and label correlation graphs over millions of labels (a minimal sketch of such a graph construction follows this list).
  • ECLARE offers predictions that can be 2–14% more accurate than those of state-of-the-art XC methods, including those that utilize label metadata such as label text.
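As a concrete illustration of the central object here, the following is a minimal sketch of building a sparsified, row-normalized label correlation graph from a sparse document-label matrix. This is not the authors' exact construction (the paper deliberately goes beyond raw co-occurrence, as its ECLARE-Cooc ablation below shows); the function name and the top-k sparsification are our illustrative choices.

```python
# Minimal sketch (not ECLARE's exact recipe): a label correlation graph
# built from label co-occurrence on training documents, sparsified and
# row-normalized so that a graph convolution averages over neighbours.
import numpy as np
import scipy.sparse as sp

def label_correlation_graph(Y: sp.csr_matrix, k: int = 10) -> sp.csr_matrix:
    """Y is an (n_docs x n_labels) binary relevance matrix.
    Returns a row-normalized label-label graph keeping the top-k
    strongest correlations per label; sparsification is essential
    at the scale of millions of labels."""
    C = (Y.T @ Y).tocsr()   # raw label co-occurrence counts
    C.setdiag(0)            # drop self-loops before ranking edges
    C.eliminate_zeros()
    rows, cols, vals = [], [], []
    for l in range(C.shape[0]):
        start, end = C.indptr[l], C.indptr[l + 1]
        if start == end:
            continue
        # Keep only the k strongest neighbours of label l.
        idx = np.argsort(C.data[start:end])[::-1][:k]
        rows.extend([l] * len(idx))
        cols.extend(C.indices[start:end][idx])
        vals.extend(C.data[start:end][idx])
    G = sp.csr_matrix((vals, (rows, cols)), shape=C.shape)
    deg = np.asarray(G.sum(axis=1)).ravel()
    deg[deg == 0] = 1.0
    return sp.diags(1.0 / deg) @ G   # row-normalize
```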
Highlights
  • ECLARE offers predictions that can be 2–14% more accurate than those of state-of-the-art XC methods, including those that utilize label metadata such as label text
  • Results indicate that ECLARE could be up to 2% and 10% more accurate than LightGCN and graph convolutional network (GCN) based alternatives, respectively
  • This paper presents the architecture and accompanying training and prediction techniques for the ECLARE method to perform extreme multi-label classification at the scale of millions of labels
  • The specific contributions of ECLARE include a framework for incorporating label graph information at massive scales, as well as critical design and algorithmic choices that enable collaborative learning using label correlation graphs with millions of labels
  • The proposed approach greatly outperforms state-of-the-art XC methods on multiple datasets while still offering millisecond level prediction times even on the largest datasets
Methods
  • Baselines compared: DECAF, Astec, AttentionXML, Slice, MACH, X-Transformer, Siamese networks, Bonsai, Parabel, DiSMEC, XT, and AnnexML (not every baseline is reported on every dataset).
  • ECLARE could be up to 25% more accurate than ECLARE-PPR, a variant that propagates label information using Personalized PageRank.
  • This could be attributed to the highly sparse label correlation graph, and it underscores the importance of ECLARE's label correlation graph as well as its careful negative sampling.
  • Results indicate that ECLARE could be up to 2% and 10% more accurate than LightGCN and GCN, respectively.
  • In another variant, ECLARE-NoRefine, the per-label refinement vectors z3 were removed.
  • Results indicate that ECLARE could be up to 10% more accurate than ECLARE-NoRefine, which indicates that the per-label refinement vectors are essential for accuracy.
  • Graph construction: to evaluate the efficacy of ECLARE's graph construction, the authors compared it to ECLARE-Cooc, a variant that uses the raw label co-occurrence graph instead (a schematic sketch of how such per-label components combine follows this list).
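The ablations above reference three per-label components: label-text embeddings, graph-convolved embeddings, and per-label refinement vectors (z3). The schematic below sketches how such components might be combined into one-vs-all classifiers; the combination weights, the unit normalization, and all names are our assumptions, not the paper's exact parameterization.

```python
# Schematic sketch, assuming three per-label components as discussed in
# the ablations; not ECLARE's exact architecture.
import numpy as np

def label_classifiers(Z_text, G, Z_ref, alpha=(1.0, 1.0, 1.0)):
    """Z_text: (L x d) label-text embeddings,
    G: (L x L) row-normalized label correlation graph,
    Z_ref: (L x d) per-label refinement vectors (learned end-to-end).
    Returns (L x d) one-vs-all classifier weights."""
    Z_graph = G @ Z_text   # one order of graph convolution over labels
    W = alpha[0] * Z_text + alpha[1] * Z_graph + alpha[2] * Z_ref
    return W / np.linalg.norm(W, axis=1, keepdims=True)  # unit-norm rows

def predict_topk(x, W, k=5):
    """Score a document embedding x against all labels; return top-k ids."""
    scores = W @ x
    return np.argsort(scores)[::-1][:k]
```

Removing Z_ref from this combination is what the ECLARE-NoRefine ablation corresponds to; replacing G with the raw co-occurrence matrix corresponds to ECLARE-Cooc.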
Results
  • Results on benchmark datasets

    Tab 2 demonstrates that ECLARE can be significantly more accurate than existing XC methods.
  • To further understand the gains of ECLARE, the labels were divided into five bins such that each bin contained an equal number of positive training points (Fig 7; a sketch of this binning follows the list).
  • This ensured that each bin had an equal opportunity to contribute to the overall accuracy.
  • ECLARE could be up to 14%, 15%, and 15% more accurate than state-of-the-art methods in terms of P@1, PSP@1, and R@10, respectively.
  • To scale accurately to millions of labels, ECLARE makes several meticulous design choices.
  • To validate their importance, Tab 5 compares different variants of ECLARE:
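A small sketch of the binning described above: sort labels from rarest to most frequent and cut the cumulative count of positive training points into five equal-mass chunks. The function name and implementation details are ours; the paper only specifies that each bin contains an equal number of positive training points.

```python
# Equal-mass label binning for rare-label analysis (as in Fig 7).
import numpy as np

def equal_mass_label_bins(label_freq: np.ndarray, n_bins: int = 5):
    """label_freq[l] = number of training points tagged with label l.
    Returns bin_id[l] in {0, ..., n_bins-1}, with bin 0 the rarest labels
    and every bin holding roughly the same number of positive points."""
    order = np.argsort(label_freq)               # rarest labels first
    cum = np.cumsum(label_freq[order])
    edges = cum / cum[-1] * n_bins               # cumulative mass in [0, n_bins]
    bin_of_sorted = np.clip(np.ceil(edges).astype(int) - 1, 0, n_bins - 1)
    bin_id = np.empty_like(bin_of_sorted)
    bin_id[order] = bin_of_sorted                # undo the sort
    return bin_id
```

Because every bin contributes the same number of positives, per-bin accuracy differences isolate how well a method handles rare versus frequent labels.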
Conclusion
  • This paper presents the architecture and accompanying training and prediction techniques for the ECLARE method to perform extreme multi-label classification at the scale of millions of labels.
  • The specific contributions of ECLARE include a framework for incorporating label graph information at massive scales, as well as critical design and algorithmic choices that enable collaborative learning using label correlation graphs with millions of labels.
  • ECLARE establishes a standard for incorporating label metadata into XC techniques
  • These findings suggest promising directions for further study, including effective graph pruning for heavy-tailed datasets, using higher-order convolutions (order > 1) in a scalable manner, and performing collaborative learning with heterogeneous and even multi-modal label sets.
  • This has the potential to enable generalisation to settings where labels include textual objects such as webpages and documents, as well as videos, songs, etc.
Tables
  • Table1: Dataset Statistics. A ‡ sign denotes information that was redacted for proprietary datasets. The first four rows are public datasets and the last two rows are proprietary datasets. Dataset names with an asterisk ∗ next to them correspond to product-to-category tasks whereas others are product-to-product tasks
  • Table2: Results on public benchmark datasets. ECLARE could offer 2–3.5% higher P@1 as well as up to 5% higher PSP@1, which focuses on rare labels. Additionally, ECLARE offered up to 3% better recall than leading XC methods
  • Table3: Results on proprietary product-to-product (P2P) recommendation datasets. ECLARE could offer significant gains – up to 14% higher P@1, 15% higher PSP@1, and 7% higher R@10 – than competing classifiers (the metrics P@k, PSP@k, and R@k are sketched after this list)
  • Table4: An ablation study exploring the benefits of the GAME step for other XC methods. Although ECLARE still provides the leading accuracies, existing methods show consistent gains from the use of the GAME step
  • Table5: An ablation study exploring alternate design decisions. Design choices made by ECLARE for its components were found to be optimal among popular alternatives
  • Table6: A subjective comparison of the top 5 label predictions by ECLARE and other algorithms on the WikiSeeAlso-350K and P2PTitles-2M datasets. Predictions typeset in black were part of the ground truth whereas those in light gray were not. ECLARE is able to offer precise recommendations for extremely rare labels missed by other methods. For instance, the label “Dog of Osu” in the first example is so rare that it occurred only twice in the training set. This label does not have any token overlap with its document or co-occurring labels either, which may have caused techniques such as DECAF, which rely solely on label text, to miss such predictions. The examples also establish that incorporating label co-occurrence allows ECLARE to infer the correct intent of a document or a user query. For instance, in the second example, all other methods, including DECAF, either incorrectly focus on the tokens “Academy Awards” in the document title and predict labels related to other editions of the Academy Awards, or else predict amorphous labels about entertainment awards in general. ECLARE, on the other hand, correctly predicts other labels corresponding to award ceremonies held in the same year as the 85th Academy Awards, as well as the rare label “List of . . . Best Foreign Language Film”. Similarly, in the third example, ECLARE correctly determines that the user is interested in faux fur coats and not necessarily in the brand Draper’s & Damon’s itself, whereas methods such as DECAF that rely solely on label and document text focus on the brand name alone and predict shirts and jackets of the same brand, which are irrelevant to the user query
  • Table7: An ablation study showing loss of mutual information (lower is better) for various clustering strategies and fanouts. Lowering the number of meta-labels |C| hurts performance. Competing methods that do not use graph-augmented clustering offer poor LMI, especially MACH, which uses random hashes to cluster labels
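For reference, the metrics reported in these tables follow their standard XC definitions, with label propensities p_l as in Jain et al. [16]. The sketch below shows the core unnormalized quantities; reported PSP@k numbers are often further normalized by the value attained under an ideal ranking.

```python
# Standard XC evaluation metrics (unnormalized core quantities).
import numpy as np

def precision_at_k(topk, relevant, k):
    """topk: predicted label ids, best first; relevant: set of true ids."""
    return sum(l in relevant for l in topk[:k]) / k

def psp_at_k(topk, relevant, propensity, k):
    """Propensity-scored precision: hits on rare labels (small p_l)
    count for more, rewarding methods that get tail labels right."""
    return sum((l in relevant) / propensity[l] for l in topk[:k]) / k

def recall_at_k(topk, relevant, k):
    """Fraction of a point's true labels recovered in the top k."""
    return sum(l in relevant for l in topk[:k]) / max(len(relevant), 1)
```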
Related work
  • Summary. XC algorithms proposed in the literature employ a variety of label prediction approaches, including tree-, embedding-, hashing-, and one-vs-all-based approaches [1, 2, 4, 6, 8, 10, 11, 15–19, 22, 26, 31, 35–37, 40, 41, 44–47, 49]. Earlier works learnt label classifiers using fixed representations for documents (typically bag-of-words), whereas contemporary approaches learn a document embedding architecture (typically a deep network) jointly with the label classifiers. In order to operate with millions of labels, XC methods frequently rely on sub-linear time data structures for operations such as shortlisting labels and sampling hard negatives; choices include hashing [28], clustering [6, 36, 49], and negative sampling [30]. Notably, most XC methods except DECAF [31], GLaS [10], and X-Transformer [6] do not incorporate any form of label metadata, instead treating labels as black-box identifiers (a generic sketch of cluster-based shortlisting follows).
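As a hedged illustration of the shortlisting idea mentioned above, the sketch below clusters labels over label embeddings and restricts scoring to labels from the clusters nearest a document. This is a generic sketch of the family of techniques, not the specific data structure of Parabel, DECAF, or any other cited method; the class and parameter names are ours.

```python
# Generic cluster-based label shortlister: score only labels from the
# few clusters nearest a document, giving sub-linear cost in #labels.
import numpy as np
from sklearn.cluster import KMeans

class ClusterShortlister:
    def __init__(self, label_embs: np.ndarray, n_clusters: int = 256):
        self.km = KMeans(n_clusters=n_clusters, n_init=4).fit(label_embs)
        # Map each cluster id to the label ids it contains.
        self.members = [np.where(self.km.labels_ == c)[0]
                        for c in range(n_clusters)]

    def shortlist(self, doc_emb: np.ndarray, n_probe: int = 8) -> np.ndarray:
        """Candidate labels from the n_probe clusters nearest doc_emb;
        these also serve as hard negatives during training."""
        d = np.linalg.norm(self.km.cluster_centers_ - doc_emb, axis=1)
        probe = np.argsort(d)[:n_probe]
        return np.concatenate([self.members[c] for c in probe])
```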
Funding
  • AM is supported by a Google PhD Fellowship

References
  • [1] R. Babbar and B. Schölkopf. 2017. DiSMEC: Distributed Sparse Machines for Extreme Multi-label Classification. In WSDM.
  • [2] R. Babbar and B. Schölkopf. 2019. Data scarcity, robustness and extreme multi-label classification. ML (2019).
  • [3] K. Bhatia, K. Dahiya, H. Jain, A. Mittal, Y. Prabhu, and M. Varma. 2016. The extreme classification repository: Multi-label datasets and code. http://manikvarma.org/downloads/XC/XMLRepository.html
  • [4] K. Bhatia, H. Jain, P. Kar, M. Varma, and P. Jain. 2015. Sparse Local Embeddings for Extreme Multi-label Classification. In NIPS.
  • [5] P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov. 2017. Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics (2017).
  • [6] W.-C. Chang, H.-F. Yu, K. Zhong, Y. Yang, and I. Dhillon. 2020. Taming Pretrained Transformers for Extreme Multi-label Text Classification. In KDD.
  • [7] F. Chung. 2005. Laplacians and the Cheeger inequality for directed graphs. Annals of Combinatorics 9, 1 (2005), 1–19.
  • [8] K. Dahiya, D. Saini, A. Mittal, A. Shaw, K. Dave, A. Soni, H. Jain, S. Agarwal, and M. Varma. 2021. DeepXML: A Deep Extreme Multi-Label Learning Framework Applied to Short Text Documents. In WSDM.
  • [9] I. S. Dhillon, S. Mallela, and R. Kumar. 2003. A Divisive Information-Theoretic Feature Clustering Algorithm for Text Classification. JMLR 3 (2003), 1265–1287.
  • [10] C. Guo, A. Mousavi, X. Wu, D. N. Holtmann-Rice, S. Kale, S. Reddi, and S. Kumar. 2019. Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces. In NeurIPS.
  • [11] V. Gupta, R. Wadbude, N. Natarajan, H. Karnick, P. Jain, and P. Rai. 2019. Distributional Semantics Meets Multi-Label Learning. In AAAI.
  • [12] W. Hamilton, Z. Ying, and J. Leskovec. 2017. Inductive representation learning on large graphs. In NIPS. 1024–1034.
  • [13] K. He, X. Zhang, S. Ren, and J. Sun. 2015. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In ICCV. 1026–1034.
  • [14] X. He, K. Deng, X. Wang, Y. Li, Y. Zhang, and M. Wang. 2020. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. In SIGIR.
  • [15] H. Jain, V. Balasubramanian, B. Chunduri, and M. Varma. 2019. Slice: Scalable Linear Extreme Classifiers trained on 100 Million Labels for Related Searches. In WSDM.
  • [16] H. Jain, Y. Prabhu, and M. Varma. 2016. Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking and Other Missing Label Applications. In KDD.
  • [17] V. Jain, N. Modhe, and P. Rai. 2017. Scalable Generative Models for Multi-label Learning with Missing Labels. In ICML.
  • [18] A. Jalan and P. Kar. 2019. Accelerating Extreme Classification via Adaptive Feature Agglomeration. In IJCAI.
  • [19] K. Jasinska, K. Dembczynski, R. Busa-Fekete, K. Pfannschmidt, T. Klerx, and E. Hullermeier. 2016. Extreme F-measure Maximization using Sparse Probability Estimates. In ICML.
  • [20] A. Joulin, E. Grave, P. Bojanowski, and T. Mikolov. 2017. Bag of Tricks for Efficient Text Classification. In EACL.
  • [21] B. Kanagal, A. Ahmed, S. Pandey, V. Josifovski, J. Yuan, and L. Garcia-Pueyo. 2012. Supercharging Recommender Systems Using Taxonomies for Learning User Purchase Behavior. VLDB (2012).
  • [22] S. Khandagale, H. Xiao, and R. Babbar. 2019. Bonsai - Diverse and Shallow Trees for Extreme Multi-label Classification. CoRR (2019).
  • [23] D. P. Kingma and J. Ba. 2014. Adam: A Method for Stochastic Optimization. CoRR (2014).
  • [24] T. N. Kipf and M. Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In ICLR.
  • [25] J. Klicpera, A. Bojchevski, and S. Günnemann. 2018. Predict then Propagate: Graph Neural Networks meet Personalized PageRank. In ICLR.
  • [26] J. Liu, W. Chang, Y. Wu, and Y. Yang. 2017. Deep Learning for Extreme Multi-label Text Classification. In SIGIR.
  • [27] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR (2019).
  • [28] T. K. R. Medini, Q. Huang, Y. Wang, V. Mohan, and A. Shrivastava. 2019. Extreme Classification in Log Memory using Count-Min Sketch. In NeurIPS.
  • [29] A. K. Menon, K.-P. Chitrapura, S. Garg, D. Agarwal, and N. Kota. 2011. Response Prediction Using Collaborative Filtering with Hierarchies and Side-Information. In KDD.
  • [30] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean. 2013. Distributed Representations of Words and Phrases and Their Compositionality. In NIPS.
  • [31] A. Mittal, K. Dahiya, S. Agrawal, D. Saini, S. Agarwal, P. Kar, and M. Varma. 2021. DECAF: Deep Extreme Classification with Label Features. In WSDM.
  • [32] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida. 2018. Spectral Normalization for Generative Adversarial Networks. In ICLR.
  • [33] A. Niculescu-Mizil and E. Abbasnejad. 2017. Label Filters for Large Scale Multilabel Classification. In AISTATS.
  • [34] A. Pal, C. Eksombatchai, Y. Zhou, B. Zhao, C. Rosenberg, and J. Leskovec. 2020. PinnerSage: Multi-Modal User Embedding Framework for Recommendations at Pinterest. In KDD. 2311–2320.
  • [35] Y. Prabhu, A. Kag, S. Gopinath, K. Dahiya, S. Harsola, R. Agrawal, and M. Varma. 2018. Extreme multi-label learning with label features for warm-start tagging, ranking and recommendation. In WSDM.
  • [36] Y. Prabhu, A. Kag, S. Harsola, R. Agrawal, and M. Varma. 2018. Parabel: Partitioned label trees for extreme classification with application to dynamic search advertising. In WWW.
  • [37] Y. Prabhu and M. Varma. 2014. FastXML: A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning. In KDD.
  • [38] N. Sachdeva, K. Gupta, and V. Pudi. 2018. Attentive Neural Architecture Incorporating Song Features for Music Recommendation. In RecSys.
  • [39] M. Schuster and K. Nakajima. 2012. Japanese and Korean voice search. In ICASSP. 5149–5152.
  • [40] W. Siblini, P. Kuntz, and F. Meyer. 2018. CRAFTML, an Efficient Clustering-based Random Forest for Extreme Multi-label Learning. In ICML.
  • [41] Y. Tagami. 2017. AnnexML: Approximate Nearest Neighbor Search for Extreme Multi-label Classification. In KDD.
  • [42] P. Tang, M. Jiang, B. Xia, J. W. Pitera, J. Welser, and N. V. Chawla. 2020. Multi-Label Patent Categorization with Non-Local Attention-Based Graph Convolutional Network. In AAAI.
  • [43] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio. 2018. Graph Attention Networks. In ICLR.
  • [44] M. Wydmuch, K. Jasinska, M. Kuznetsov, R. Busa-Fekete, and K. Dembczynski. 2018. A no-regret generalization of hierarchical softmax to extreme multi-label classification. In NIPS.
  • [45] E. H. I. Yen, X. Huang, W. Dai, P. Ravikumar, I. Dhillon, and E. Xing. 2017. PPDSparse: A Parallel Primal-Dual Sparse Method for Extreme Classification. In KDD.
  • [46] E. H. I. Yen, X. Huang, K. Zhong, P. Ravikumar, and I. S. Dhillon. 2016. PD-Sparse: A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification. In ICML.
  • [47] I. Yen, S. Kale, F. Yu, D. Holtmann-Rice, S. Kumar, and P. Ravikumar. 2018. Loss Decomposition for Fast Learning in Large Output Spaces. In ICML.
  • [48] R. Ying, R. He, K. Chen, P. Eksombatchai, W. Hamilton, and J. Leskovec. 2018. Graph convolutional neural networks for web-scale recommender systems. In KDD. 974–983.
  • [49] R. You, S. Dai, Z. Zhang, H. Mamitsuka, and S. Zhu. 2018. AttentionXML: Extreme Multi-Label Text Classification with Multi-Label Attention Based Recurrent Neural Networks. CoRR (2018).
  • [50] H. Yu, P. Jain, P. Kar, and I. S. Dhillon. 2014. Large-scale Multi-label Learning with Missing Labels. In ICML.
  • [51] Z. Zhang, P. Cui, and W. Zhu. 2020. Deep learning on graphs: A survey. IEEE Transactions on Knowledge and Data Engineering (2020).
Authors
Noveen Sachdeva
Sheshansh Agrawal
Sumeet Agarwal