# Recovering Accurate Labeling Information from Partially Valid Data for Effective Multi-Label Learning

IJCAI, pp. 1373-1380, 2020.

Keywords:

Label-Specific Features; Ranking Loss; Hamming Loss; Unconstrained Label Propagation; Virtual Label Splitting

Abstract:

Partial Multi-label Learning (PML) aims to induce a multi-label predictor from datasets with noisy supervision, where each training instance is associated with several candidate labels of which only some are valid. To address this noise issue, existing PML methods basically recover the ground-truth labels by leveraging the ground-truth...

Introduction

- Partial Multi-label Learning (PML), a novel learning paradigm with noisy supervision, draws increasing attention from the machine learning community [Fang and Zhang, 2019; Sun et al., 2019].
- It refers to inducing a multi-label predictor from PML datasets, where each training instance is associated with multiple candidate labels that are only partially valid.
- The task of PML is to induce a multi-label predictor f(x): R^d → {0, 1}^l from the training set D.
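To make the setting concrete, here is a small synthetic sketch of how a PML candidate label matrix relates to the ground truth; the sizes and the noise rate are illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, l = 6, 4  # toy numbers of instances and labels

# Ground-truth label matrix: Y_true[i, j] = 1 iff label j is relevant to instance i.
Y_true = rng.integers(0, 2, size=(n, l))

# PML corruption: each irrelevant label enters the candidate set with some
# probability, so the candidate labels are a noisy superset of the truth.
noise = (rng.random((n, l)) < 0.3) & (Y_true == 0)
Y_cand = Y_true | noise.astype(int)

# Candidate sets always contain the ground-truth labels...
assert np.all(Y_cand >= Y_true)
# ...but only part of each candidate set is valid.
print("extra (false) candidate labels:", int((Y_cand - Y_true).sum()))
```

The learner only ever sees `Y_cand`; recovering `Y_true` (or confidences over it) from `Y_cand` is exactly the disambiguation task PML addresses.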

Highlights

- Partial Multi-label Learning (PML), a novel learning paradigm with noisy supervision, draws increasing attention from the machine learning community [Fang and Zhang, 2019; Sun et al., 2019]
- We observe that PML3ER outperforms the other three PML methods in most cases, dominating in Average Precision and Macro-F1 across different noise levels
- The performance gain over PARTICLE-MAP indicates that using the information from non-candidate labels is beneficial for PML
- PML3ER also performs significantly better than the two traditional Multi-label Learning methods in most cases, since they directly use noisy candidate labels for training
- The win/tie/loss counts over 20 PML datasets and 7 evaluation metrics are presented in Table 3
- Extensive experiments on PML datasets indicate the superior performance of PML3ER

Methods

- The authors employ five baseline methods for comparison, including three PML methods and two traditional Multi-label Learning (ML) methods.
- The other version of [Xie and Huang, 2018], i.e., PML-lc, was omitted, since it performed worse than PML-fp in early experiments.
- PARTICLE-MAP [Fang and Zhang, 2019]: a two-stage PML method with label propagation.
- The authors employ the publicly available code and tune the parameters following the original paper.
- The other version of [Fang and Zhang, 2019], i.e., PARTICLE-VLS, was omitted, since it performed worse than PARTICLE-MAP in early experiments.

Results

- For each PML dataset, the authors randomly generate five 50%/50% training/test splits and report the average scores (± standard deviation) of the compared algorithms.
- The authors observe that PML3ER outperforms the other three PML methods in most cases, dominating in AP and Macro-F1 across different noise levels.
- PML3ER also performs significantly better than the two traditional ML methods in most cases, since the latter directly use noisy candidate labels for training.
- For each PML dataset and evaluation metric, the authors conduct a pairwise t-test to examine whether PML3ER is statistically different from the baselines.

Conclusion

- Referring to Eq. (6), the authors jointly learn the ground-truth confidence matrix C and the label correlation matrix B by minimizing the reconstruction error of Y.
- The authors recover C using information from candidate and non-candidate labels simultaneously.
- PML3ER performs an unconstrained label propagation procedure to estimate the label enrichment, which simultaneously involves the relevance degrees of candidate labels and the irrelevance degrees of non-candidate labels.
- PML3ER jointly learns the ground-truth confidence and the multi-label predictor given the label enrichment.
- Extensive experiments on PML datasets indicate the superior performance of PML3ER.
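A minimal sketch of the unconstrained label propagation idea follows. This is our reading, not the paper's exact formulation: a signed initialization lets candidate and non-candidate labels contribute simultaneously, and the similarity graph, `alpha`, and iteration scheme are assumptions:

```python
import numpy as np

def enrich_labels(W, Y_cand, alpha=0.5, iters=50):
    """Anchored label propagation with a signed start: +1 for candidate
    labels (relevance) and -1 for non-candidates (irrelevance)."""
    P = W / W.sum(axis=1, keepdims=True)      # row-normalized similarity
    F0 = np.where(Y_cand == 1, 1.0, -1.0)     # signed initial enrichment
    F = F0.copy()
    for _ in range(iters):
        F = alpha * P @ F + (1 - alpha) * F0  # propagate, anchored to F0
    return F                                  # unconstrained: values in R

# Toy similarity matrix (e.g., from a kNN graph) and candidate labels.
W = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float)
Y_cand = np.array([[1, 0], [1, 1], [0, 1]])
E = enrich_labels(W, Y_cand)
print(np.round(E, 2))
```

The resulting real-valued enrichment, rather than a hard 0/1 assignment, is what a second stage can use to learn ground-truth confidences and the predictor jointly.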


- Table 1: Statistics of the original multi-label datasets. "#AL": average label number of each instance
- Table 2: Experimental results (mean ± std) in terms of RLoss, AP, and Macro-F1, where the best performance is shown in boldface
- Table 3: Win/tie/loss counts of pairwise t-tests (at the 0.05 significance level) between PML3ER and each comparing approach

Related work

- 2.1 Partial Multi-label Learning

Abundant research has been conducted on Partial Label Learning (PLL), where each training instance is annotated with a candidate label set of which only one label is valid [Cour et al., 2011; Liu and Dietterich, 2012; Chen et al., 2014; Zhang et al., 2017; Yu and Zhang, 2017; Wu and Zhang, 2018; Gong et al., 2018; Chen et al., 2018; Feng and An, 2018; Feng and An, 2019b; Feng and An, 2019a; Wang et al., 2019]. The core idea of PLL follows the spirit of disambiguation, i.e., identifying the ground-truth label from the candidate label set of each instance. PLL can thus be deemed a special case of PML in which the ground-truth label number is fixed to one. Naturally, PML is more challenging than PLL, since even the number of ground-truth labels is unknown.

The existing PML methods mainly recover the ground-truth labels by estimating ground-truth confidences [Xie and Huang, 2018; Yu et al., 2018; Fang and Zhang, 2019; Sun et al., 2019]. Two PML methods are proposed in [Xie and Huang, 2018], i.e., Partial Multi-label Learning with label correlation (PML-lc) and with feature prototype (PML-fp), both of which build upon a ranking loss objective weighted by ground-truth confidences. Another method [Sun et al., 2019], namely Partial Multi-label Learning by Low-Rank and Sparse decomposition (PML-LRS), trains the predictor with ground-truth confidences under a low-rank assumption. Besides these, the two-stage PML framework [Fang and Zhang, 2019], i.e., PARTIal multi-label learning via Credible Label Elicitation (PARTICLE), estimates the ground-truth confidences by label propagation and then trains the predictor over only the candidate labels with high confidence. Two traditional techniques, virtual Label Splitting (VLS) and Maximum A Posteriori (MAP), are used in its second stage, leading to two versions, i.e., PARTICLE-VLS and PARTICLE-MAP.

Reference

- [Boyd et al., 2011] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.
- [Cai et al., 2010] Jian-Feng Cai, Emmanuel J. Candes, and Zuowei Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956–1982, 2010.
- [Chen et al., 2014] Yi-Chen Chen, Vishal M. Patel, Rama Chellappa, and P. Jonathon Phillips. Ambiguously labeled learning using dictionaries. IEEE Transactions on Information Forensics and Security, 9(12):2076–2088, 2014.
- [Chen et al., 2018] Ching-Hui Chen, Vishal M. Patel, and Rama Chellappa. Learning from ambiguously labeled face images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(7):1653–1667, 2018.
- [Cour et al., 2011] Timothee Cour, Benjamin Sapp, and Ben Taskar. Learning from partial labels. Journal of Machine Learning Research, 12:1501–1536, 2011.
- [Fang and Zhang, 2019] Jun-Peng Fang and Min-Ling Zhang. Partial multi-label learning via credible label elicitation. In AAAI Conference on Artificial Intelligence, pages 3518–3525, 2019.
- [Feng and An, 2018] Lei Feng and Bo An. Leveraging latent label distributions for partial label learning. In International Joint Conference on Artificial Intelligence, pages 2107–2113, 2018.
- [Feng and An, 2019a] Lei Feng and Bo An. Partial label learning by semantic difference maximization. In International Joint Conference on Artificial Intelligence, pages 2294–2300, 2019.
- [Feng and An, 2019b] Lei Feng and Bo An. Partial label learning with self-guided retraining. In AAAI Conference on Artificial Intelligence, pages 3542–3549, 2019.
- [Gayar et al., 2006] Neamat El Gayar, Friedhelm Schwenker, and Gunther Palm. A study of the robustness of kNN classifiers trained using soft labels. In International Conference on Artificial Neural Network in Pattern Recognition, pages 67–80, 2006.
- [Gong et al., 2018] Chen Gong, Tongliang Liu, Yuanyan Tang, Jian Yang, Jie Yang, and Dacheng Tao. A regularization approach for instance-based superset label learning. IEEE Transactions on Cybernetics, 48(3):967–978, 2018.
- [Hou et al., 2016] Peng Hou, Xin Geng, and Min-Ling Zhang. Multi-label manifold learning. In AAAI Conference on Artificial Intelligence, pages 1680–1686, 2016.
- [Jiang et al., 2006] Xiufeng Jiang, Zhang Yi, and Jian Cheng Lv. Fuzzy SVM with a new fuzzy membership function. Neural Computing & Applications, 15(3-4):268–276, 2006.
- [Li et al., 2015] Yu-Kun Li, Min-Ling Zhang, and Xin Geng. Leveraging implicit relative labeling-importance information for effective multi-label learning. In IEEE International Conference on Data Mining, pages 251–260, 2015.
- [Liu and Dietterich, 2012] Li-Ping Liu and Thomas G. Dietterich. A conditional multinomial mixture model for superset label learning. In Neural Information Processing Systems, pages 548–556, 2012.
- [Sun et al., 2019] Lijuan Sun, Songhe Feng, Tao Wang, Congyan Lang, and Yi Jin. Partial multi-label learning with low-rank and sparse decomposition. In AAAI Conference on Artificial Intelligence, pages 5016–5023, 2019.
- [Wang et al., 2019] Deng-Bao Wang, Li Li, and Min-Ling Zhang. Adaptive graph guided disambiguation for partial label learning. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 83–91, 2019.
- [Wu and Zhang, 2018] Xuan Wu and Min-Ling Zhang. Towards enabling binary decomposition for partial label learning. In International Joint Conference on Artificial Intelligence, pages 2868–2874, 2018.
- [Xie and Huang, 2018] Ming-Kun Xie and Sheng-Jun Huang. Partial multi-label learning. In AAAI Conference on Artificial Intelligence, pages 4302–4309, 2018.
- [Xu et al., 2018] Ning Xu, An Tao, and Xin Geng. Label enhancement for label distribution learning. In International Joint Conference on Artificial Intelligence, pages 2926–2932, 2018.
- [Yu and Zhang, 2017] Fei Yu and Min-Ling Zhang. Maximum margin partial label learning. Machine Learning, 106:573–593, 2017.
- [Yu et al., 2018] Guoxian Yu, Xia Chen, Carlotta Domeniconi, Jun Wang, Zhao Li, Zili Zhang, and Xindong Wu. Feature-induced partial multi-label learning. In IEEE International Conference on Data Mining, pages 1398–1403, 2018.
- [Zhang and Wu, 2015] Min-Ling Zhang and Lei Wu. LIFT: Multi-label learning with label-specific features. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(1):107–120, 2015.
- [Zhang and Zhou, 2007] Min-Ling Zhang and Zhi-Hua Zhou. ML-kNN: A lazy learning approach to multi-label learning. Pattern Recognition, 40(7):2038–2048, 2007.
- [Zhang and Zhou, 2014] Min-Ling Zhang and Zhi-Hua Zhou. A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 26(8):1819–1837, 2014.
- [Zhang et al., 2017] Min-Ling Zhang, Fei Yu, and CaiZhi Tang. Disambiguation-free partial label learning. IEEE Transactions on Knowledge and Data Engineering, 29(10):2155–2167, 2017.
