ECLARE: Extreme Classification with Label Graph Correlations
Proceedings of the Web Conference 2021 (WWW 2021), pp. 3721–3732.
- Extreme multi-label classification (XC) involves tagging a data point with the subset of labels most relevant to it, from an extremely large set of labels. Code is available at https://github.com/Extreme-classification/ECLARE.
- Applications include product recommendation, related searches, related products, etc.
- Such applications stand to benefit significantly from label correlation data. ECLARE is an XC method that utilizes textual label descriptions and label correlation graphs over millions of labels to offer predictions that can be 2–14% more accurate than those offered by state-of-the-art XC methods, including those that utilize label metadata such as label text.
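The label correlation graphs mentioned above can be grounded with a minimal sketch. This is not ECLARE's actual implementation (the paper's construction is more involved); it only illustrates the basic idea of connecting labels that are tagged together on the same training document, with edge weights given by co-occurrence counts.

```python
# Illustrative sketch (not ECLARE's implementation): build a label
# co-occurrence graph from multi-label training data. Two labels are
# connected if they appear together on a document; the edge weight is
# the number of documents on which they co-occur.
from collections import Counter
from itertools import combinations

def build_label_cooccurrence_graph(label_sets):
    """label_sets: list of sets of label ids, one per training document."""
    edges = Counter()
    for labels in label_sets:
        for a, b in combinations(sorted(labels), 2):
            edges[(a, b)] += 1  # undirected edge stored as (min, max)
    return edges

docs = [{0, 1, 2}, {1, 2}, {2, 3}]
g = build_label_cooccurrence_graph(docs)
print(g[(1, 2)])  # labels 1 and 2 co-occur in two documents -> 2
```

At XC scales (millions of labels) such a graph would of course be stored sparsely; the point here is only the structure being exploited.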
- Results indicate that ECLARE could be up to 2% and 10% more accurate than LightGCN and graph convolutional networks (GCN), respectively.
- This paper presents the architecture and accompanying training and prediction techniques for the ECLARE method to perform extreme multi-label classification at the scale of millions of labels
- The specific contributions of ECLARE include a framework for incorporating label graph information at massive scales, as well as critical design and algorithmic choices that enable collaborative learning using label correlation graphs with millions of labels
- The proposed approach greatly outperforms state-of-the-art XC methods on multiple datasets while still offering millisecond level prediction times even on the largest datasets
- Methods compared: ECLARE, DECAF, Astec, AttentionXML, Slice, MACH, X-Transformer, Siamese, Bonsai, Parabel, DiSMEC, XT, AnneXML.
- ECLARE could be up to 25% more accurate than ECLARE-PPR
- This could be attributed to the highly sparse label correlation graph, and it justifies the importance of ECLARE’s label correlation graph as well as its careful negative sampling.
- Results indicate that ECLARE could be up to 2% and 10% more accurate than LightGCN and GCN, respectively.
- In another variant of ECLARE, the refinement vectors z3 were removed (ECLARE-NoRefine).
- Results indicate that ECLARE could be up to 10% more accurate than ECLARE-NoRefine, which indicates that the per-label refinement vectors are essential for accuracy.
- Graph construction: To evaluate the efficacy of ECLARE’s graph construction, the authors compare it to ECLARE-Cooc, where ECLARE’s label correlation graph is replaced by the raw label co-occurrence graph.
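The graph-construction contrast above (a raw co-occurrence graph versus a processed correlation graph) can be illustrated with a standard normalization step. The sketch below uses the symmetric normalization common in graph convolutional networks (Kipf & Welling); ECLARE's exact construction may differ, so treat this as an assumption-laden illustration, not the paper's recipe.

```python
# Hedged illustration: symmetric normalization of an adjacency matrix,
# G = D^{-1/2} (A + I) D^{-1/2}, as used in standard GCNs. ECLARE's own
# graph construction is more elaborate; this only shows why raw
# co-occurrence counts are rarely used as-is.
import numpy as np

def normalize_graph(A):
    """A: dense symmetric adjacency matrix of co-occurrence counts."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)                   # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # D^{-1/2}
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

A = np.array([[0.0, 3.0],
              [3.0, 0.0]])
G = normalize_graph(A)
print(G)  # [[0.25 0.75], [0.75 0.25]]
```

Normalization keeps head labels with huge co-occurrence counts from dominating message passing, which matters for the tail labels ECLARE targets.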
- Results on benchmark datasets
Tab 2 demonstrates that ECLARE can be significantly more accurate than existing XC methods.
- To further understand the gains of ECLARE, the labels were divided into five bins such that each bin contained an equal number of positive training points (Fig 7).
- This ensured that each bin had an equal opportunity to contribute to the overall accuracy.
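The binning protocol described above can be sketched as follows. This is a hypothetical reconstruction of the described setup (sort labels by training frequency, then cut into five bins of roughly equal total positive-point mass); the paper's exact procedure is not given here.

```python
# Hypothetical sketch of the binning described above: sort labels by
# training frequency (rare first) and greedily cut into bins so that
# each bin holds roughly the same total number of positive points.
def bin_labels_equal_mass(label_freqs, num_bins=5):
    """label_freqs: dict mapping label id -> positive training points."""
    order = sorted(label_freqs, key=label_freqs.get)  # rare labels first
    target = sum(label_freqs.values()) / num_bins
    bins, current, mass = [], [], 0
    for lbl in order:
        current.append(lbl)
        mass += label_freqs[lbl]
        if mass >= target and len(bins) < num_bins - 1:
            bins.append(current)   # close this bin once it reaches target mass
            current, mass = [], 0
    bins.append(current)           # remaining labels form the last bin
    return bins
```

For example, `bin_labels_equal_mass({0: 1, 1: 1, 2: 2, 3: 4, 4: 4, 5: 8}, num_bins=2)` puts the single head label 5 in its own bin, balancing it against all five rarer labels combined.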
- ECLARE could be up to 14%, 15%, and 15% more accurate as compared to state-of-the-art methods in terms of P@1, PSP@1 and R@10 respectively.
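For readers unfamiliar with the metrics quoted here, a minimal sketch: P@k is the fraction of the top-k predictions that are relevant, R@k the fraction of a point's relevant labels recovered in the top k, and PSP@k a precision that reweights hits by inverse label propensity so rare labels count more (Jain et al., reference 62 below). The propensity values here are assumed inputs; the unnormalized per-point form is shown for simplicity.

```python
# Minimal sketches of P@k, R@k, and (unnormalized, per-point) PSP@k.
def precision_at_k(ranked, relevant, k):
    return sum(1 for l in ranked[:k] if l in relevant) / k

def recall_at_k(ranked, relevant, k):
    return sum(1 for l in ranked[:k] if l in relevant) / len(relevant)

def psp_at_k(ranked, relevant, propensity, k):
    # Hits on low-propensity (rare) labels contribute more.
    return sum(1.0 / propensity[l] for l in ranked[:k] if l in relevant) / k

ranked = [3, 7, 1, 9]        # predicted labels, best first
relevant = {3, 1, 5}         # ground-truth labels
print(precision_at_k(ranked, relevant, 2))  # 0.5
```

In benchmark practice these per-point scores are averaged over the test set, and PSP@k is further normalized by its best achievable value.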
- To scale accurately to millions of labels, ECLARE makes several meticulous design choices.
- To validate their importance, Tab 5 compares different variants of ECLARE:
- This paper presents the architecture and accompanying training and prediction techniques for the ECLARE method to perform extreme multi-label classification at the scale of millions of labels.
- The specific contributions of ECLARE include a framework for incorporating label graph information at massive scales, as well as critical design and algorithmic choices that enable collaborative learning using label correlation graphs with millions of labels.
- ECLARE establishes a standard for incorporating label metadata into XC techniques
- These findings suggest promising directions for further study including effective graph pruning for heavy tailed datasets, using higher order convolutions ( > 1) in a scalable manner, and performing collaborative learning with heterogeneous and even multi-modal label sets.
- This has the potential to enable generalisation to settings where labels include textual objects such as webpages and documents, and videos, songs, etc
- Table1: Dataset Statistics. A ‡ sign denotes information that was redacted for proprietary datasets. The first four rows are public datasets and the last two rows are proprietary datasets. Dataset names with an asterisk ∗ next to them correspond to product-to-category tasks whereas others are product-to-product tasks
- Table2: Results on public benchmark datasets. ECLARE could offer 2–3.5% higher P@1 as well as up to 5% higher PSP@1, which focuses on rare labels. Additionally, ECLARE offered up to 3% better recall than leading XC methods
- Table3: Results on proprietary product-to-product (P2P) recommendation datasets. ECLARE could offer significant gains – up to 14% higher P@1, 15% higher PSP@1 and 7% higher R@10 – than competing classifiers
- Table4: An ablation study exploring the benefits of the GAME step for other XC methods. Although ECLARE still provides the leading accuracies, existing methods show consistent gains from the use of the GAME step
- Table5: An ablation study exploring alternate design decisions. Design choices made by ECLARE for its components were found to be optimal among popular alternatives
- Table6: A subjective comparison of the top 5 label predictions by ECLARE and other algorithms on the WikiSeeAlso-350K and P2PTitles-2M datasets. Predictions typeset in black color were a part of the ground truth whereas those in light gray color were not. ECLARE is able to offer precise recommendations for extremely rare labels missed by other methods. For instance, the label “Dog of Osu” in the first example is so rare that it occurred only twice in the training set. This label does not have any token overlaps with its document or co-occurring labels either. This may have caused techniques such as DECAF that rely solely on label text, to miss such predictions. The examples also establish that incorporating label co-occurrence allows ECLARE to infer the correct intent of a document or a user query. For instance, in the second example, all other methods, including DECAF, either incorrectly focus on the tokens “Academy Awards” in the document title and start predicting labels related to other editions of the Academy Awards, or else amorphous labels about entertainment awards in general. On the other hand, ECLARE is able to correctly predict other labels corresponding to award ceremonies held in the same year as the 85th Academy awards, as well as the rare label “List of . . . Best Foreign Language Film”. Similarly, in the third example, ECLARE correctly determines that the user is interested in faux fur coats and not necessarily in the brand Draper’s & Damon’s itself whereas methods such as DECAF that rely solely on label and document text, focus on the brand name alone and start predicting shirts and jackets of the same brand which are irrelevant to the user query
- Table7: An ablation study showing loss of mutual information (lower is better) using various clustering strategies as well as fanouts. Lowering the number of meta-labels |C| hurts performance. Competing methods that do not use graph-augmented clustering offer poor LMI, especially MACH, which uses random hashes to cluster labels
- Summary. XC algorithms proposed in the literature employ a variety of label prediction approaches, including tree-, embedding-, hashing- and one-vs-all-based approaches [1, 2, 4, 6, 8, 10, 11, 15,16,17,18,19, 22, 26, 31, 35,36,37, 40, 41, 44,45,46,47, 49]. Earlier works learnt label classifiers using fixed representations for documents (typically bag-of-words), whereas contemporary approaches learn a document embedding architecture (typically using deep networks) jointly with the label classifiers. In order to operate with millions of labels, XC methods frequently have to rely on sub-linear time data structures for operations such as shortlisting labels, sampling hard negatives, etc. Choices include hashing, clustering [6, 36, 49], negative sampling, etc. Notably, most XC methods except DECAF, GLaS, and X-Transformer do not incorporate any form of label metadata, instead treating labels as black-box identifiers.
- AM is supported by a Google PhD Fellowship
- R. Babbar and B. Schölkopf. 2017. DiSMEC: Distributed Sparse Machines for Extreme Multi-label Classification. In WSDM.
- R. Babbar and B. Schölkopf. 2019. Data scarcity, robustness and extreme multilabel classification. ML (2019).
- K. Bhatia, K. Dahiya, H. Jain, A. Mittal, Y. Prabhu, and M. Varma. 2016. The extreme classification repository: Multi-label datasets and code. http://manikvarma.org/downloads/XC/XMLRepository.html
- K. Bhatia, H. Jain, P. Kar, M. Varma, and P. Jain. 2015. Sparse Local Embeddings for Extreme Multi-label Classification. In NIPS.
- P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov. 2017. Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics (2017).
- W-C. Chang, H.-F. Yu, K. Zhong, Y. Yang, and I. Dhillon. 2020. Taming Pretrained Transformers for Extreme Multi-label Text Classification. In KDD.
- F. Chung. 2005. Laplacians and the Cheeger inequality for directed graphs. Annals of Combinatorics 9, 1 (2005), 1–19.
- K. Dahiya, D. Saini, A. Mittal, A. Shaw, K. Dave, A. Soni, H. Jain, S. Agarwal, and M. Varma. 2021. DeepXML: A Deep Extreme Multi-Label Learning Framework Applied to Short Text Documents. In WSDM.
- I. S. Dhillon, S. Mallela, and R. Kumar. 2003. A Divisive Information-Theoretic Feature Clustering Algorithm for Text Classification. JMLR 3 (2003), 1265–1287.
- C. Guo, A. Mousavi, X. Wu, Daniel N. Holtmann-Rice, S. Kale, S. Reddi, and S. Kumar. 2019. Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces. In Neurips.
- V. Gupta, R. Wadbude, N. Natarajan, H. Karnick, P. Jain, and P. Rai. 2019. Distributional Semantics Meets Multi-Label Learning. In AAAI.
- W. Hamilton, Z. Ying, and J. Leskovec. 2017. Inductive representation learning on large graphs. In NIPS. 1024–1034.
- K. He, X. Zhang, S. Ren, and J. Sun. 2015. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision. 1026–1034.
- X. He, K. Deng, X. Wang, Y. Li, Y. Zhang, and M. Wang. 2020. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’20).
- H. Jain, V. Balasubramanian, B. Chunduri, and M. Varma. 2019. Slice: Scalable Linear Extreme Classifiers trained on 100 Million Labels for Related Searches. In WSDM.
- H. Jain, Y. Prabhu, and M. Varma. 2016. Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking and Other Missing Label Applications. In KDD.
- V. Jain, N. Modhe, and P. Rai. 2017. Scalable Generative Models for Multi-label Learning with Missing Labels. In ICML.
- A. Jalan and P. Kar. 2019. Accelerating Extreme Classification via Adaptive Feature Agglomeration. IJCAI (2019).
- K. Jasinska, K. Dembczynski, R. Busa-Fekete, K. Pfannschmidt, T. Klerx, and E. Hullermeier. 2016. Extreme F-measure Maximization using Sparse Probability Estimates. In ICML.
- A. Joulin, E. Grave, P. Bojanowski, and T. Mikolov. 2017. Bag of Tricks for Efficient Text Classification. In Proceedings of the European Chapter of the Association for Computational Linguistics.
- Bhargav Kanagal, Amr Ahmed, Sandeep Pandey, Vanja Josifovski, Jeff Yuan, and Lluis Garcia-Pueyo. 2012. Supercharging Recommender Systems Using Taxonomies for Learning User Purchase Behavior. VLDB (June 2012).
- S. Khandagale, H. Xiao, and R. Babbar. 2019. Bonsai - Diverse and Shallow Trees for Extreme Multi-label Classification. CoRR (2019).
- P. D. Kingma and J. Ba. 2014. Adam: A Method for Stochastic Optimization. CoRR (2014).
- T. N. Kipf and M. Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings.
- J. Klicpera, A. Bojchevski, and S. Günnemann. 2018. Predict then Propagate: Graph Neural Networks meet Personalized PageRank. In International Conference on Learning Representations.
- J. Liu, W. Chang, Y. Wu, and Y. Yang. 2017. Deep Learning for Extreme Multi-label Text Classification. In SIGIR.
- Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR (2019).
- T. K. R. Medini, Q. Huang, Y. Wang, V. Mohan, and A. Shrivastava. 2019. Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products. In NeurIPS.
- Aditya Krishna Menon, Krishna-Prasad Chitrapura, Sachin Garg, Deepak Agarwal, and Nagaraj Kota. 2011. Response Prediction Using Collaborative Filtering with Hierarchies and Side-Information. In KDD.
- T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean. 2013. Distributed Representations of Words and Phrases and Their Compositionality. In NIPS.
- A. Mittal, K. Dahiya, S. Agrawal, D. Saini, S. Agarwal, P. Kar, and M. Varma. 2021. DECAF: Deep Extreme Classification with Label Features. In WSDM.
- T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida. 2018. Spectral Normalization for Generative Adversarial Networks. In ICLR.
- A. Niculescu-Mizil and E. Abbasnejad. 2017. Label Filters for Large Scale Multilabel Classification. In AISTATS.
- A. Pal, C. Eksombatchai, Y. Zhou, B. Zhao, C. Rosenberg, and J. Leskovec. 2020. PinnerSage: Multi-Modal User Embedding Framework for Recommendations at Pinterest. In KDD. Association for Computing Machinery, New York, NY, USA, 2311–2320.
- Y. Prabhu, A. Kag, S. Gopinath, K. Dahiya, S. Harsola, R. Agrawal, and M. Varma. 2018. Extreme multi-label learning with label features for warm-start tagging, ranking and recommendation. In WSDM.
- Y. Prabhu, A. Kag, S. Harsola, R. Agrawal, and M. Varma. 2018. Parabel: Partitioned label trees for extreme classification with application to dynamic search advertising. In WWW.
- Y. Prabhu and M. Varma. 2014. FastXML: A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning. In KDD.
- Noveen Sachdeva, Kartik Gupta, and Vikram Pudi. 2018. Attentive Neural Architecture Incorporating Song Features for Music Recommendation. In RecSys.
- M. Schuster and K. Nakajima. 2012. Japanese and Korean voice search. In ICASSP. IEEE, 5149–5152.
- W. Siblini, P. Kuntz, and F. Meyer. 2018. CRAFTML, an Efficient Clustering-based Random Forest for Extreme Multi-label Learning. In ICML.
- Y. Tagami. 2017. AnnexML: Approximate Nearest Neighbor Search for Extreme Multi-label Classification. In KDD.
- P. Tang, M. Jiang, B. Xia, J. W. Pitera, J. Welser, and N. V. Chawla. 2020. Multi-Label Patent Categorization with Non-Local Attention-Based Graph Convolutional Network. In AAAI.
- P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio. 2018. Graph Attention Networks. ICLR (2018).
- M. Wydmuch, K. Jasinska, M. Kuznetsov, R. Busa-Fekete, and K. Dembczynski. 2018. A no-regret generalization of hierarchical softmax to extreme multi-label classification. In NIPS.
- E.H. I. Yen, X. Huang, W. Dai, P. Ravikumar, I. Dhillon, and E. Xing. 2017. PPDSparse: A Parallel Primal-Dual Sparse Method for Extreme Classification. In KDD.
- E.H. I. Yen, X. Huang, K. Zhong, P. Ravikumar, and I. S. Dhillon. 2016. PD-Sparse: A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification. In ICML.
- I. Yen, S. Kale, F. Yu, D. Holtmann-Rice, S. Kumar, and P. Ravikumar. 2018. Loss Decomposition for Fast Learning in Large Output Spaces. In ICML.
- R. Ying, R. He, K. Chen, P. Eksombatchai, W. Hamilton, and J. Leskovec. 2018. Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 974–983.
- R. You, S. Dai, Z. Zhang, H. Mamitsuka, and S. Zhu. 2018. AttentionXML: Extreme Multi-Label Text Classification with Multi-Label Attention Based Recurrent Neural Networks. CoRR (2018).
- H. Yu, P. Jain, P. Kar, and I. S. Dhillon. 2014. Large-scale Multi-label Learning with Missing Labels. In ICML.
- Z. Zhang, P. Cui, and W. Zhu. 2020. Deep learning on graphs: A survey. IEEE Transactions on Knowledge and Data Engineering (2020).