Beyond Majority Voting: A Coarse-to-Fine Label Filtration for Heavily Noisy Labels.

IEEE transactions on neural networks and learning systems(2019)

引用 10|浏览98
暂无评分
摘要
Crowdsourcing has become the most appealing way to provide a plethora of labels at a low cost. Nevertheless, labels from amateur workers are often noisy, which inevitably degenerates the robustness of subsequent learning models. To improve the label quality for subsequent use, majority voting (MV) is widely leveraged to aggregate crowdsourced labels due to its simplicity and scalability. However, when crowdsourced labels are “heavily” noisy (e.g., 40% of noisy labels), MV may not work well because of the fact “garbage (heavily noisy labels) in, garbage ( full aggregated labels ) out.” This issue inspires us to think: if the ultimate target is to learn a robust model using noisy labels, why not provide partial aggregated labels and ensure that these labels are reliable enough for learning models? To solve this challenge by improving MV, we propose a coarse-to-fine label filtration model called double filter machine (DFM), which consists of a (majority) voting filter and a sparse filter serially. Specifically, the DFM refines crowdsourced labels from coarse filtering to fine filtering. In the stage of coarse filtering, the DFM aggregates crowdsourced labels by voting filter, which yields (quality-acceptable) full aggregated labels. In the stage of fine filtering, DFM further digs out a set of high-quality labels from full aggregated labels by sparse filter, since this filter can identify high-quality labels by the methodology of support selection. Based on the insight of compressed sensing, DFM recovers a ground-truth signal from heavily noisy data under a restricted isometry property. To sum up, the primary benefits of DFM are to keep the scalability by voting filter, while improve the robustness by sparse filter. We also derive theoretical guarantees for the convergence and recovery of DFM and reveal its complexity. We conduct comprehensive experiments on both the UCI simulated and the AMT crowdsourced datasets. Empirical results show that partial aggregated labels provided by DFM effectively improve the robustness of learning models.
更多
查看译文
关键词
Noise measurement,Compressed sensing,Crowdsourcing,Robustness,Aggregates,Mathematical model
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要