SeCSeq: Semantic Coding for Sequence-to-Sequence based Extreme Multi-label Classification

user-5ebe28934c775eda72abcddd(2018)

Abstract
Extreme multi-label classification (XMC) aims to assign to each instance the most relevant subset of labels from an extremely large label set. There has been some success in formulating the multi-label problem as sequence-to-sequence (Seq2Seq) learning, where the positive class labels of each input instance are used as the corresponding output sequence. Seq2Seq methods, nonetheless, have not yet scaled to the XMC setting because of the softmax bottleneck. In this paper, we propose a semantic coding framework, namely SeCSeq, for a Seq2Seq approach to the XMC problem. To circumvent the softmax bottleneck, SeCSeq compresses labels into sequences of semantic-aware compact codes, on which Seq2Seq models are trained. At inference time, the generated semantic codes are decompressed into sequences of positive labels using ensemble techniques. Preliminary experiments on XMC benchmark datasets show that SeCSeq is competitive with the state of the art while requiring significantly fewer model parameters.
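The abstract does not say how labels are compressed into codes, so the following is only a toy sketch of the general idea, not the SeCSeq method itself. It assumes each label has a made-up 2-D embedding and builds codes by recursively splitting the label set into `k` balanced groups, so that nearby labels share code prefixes and the decoder would only need a softmax over `k` symbols rather than over every label. All names here (`build_codes`, `encode_labels`, `decode_sequence`, the example embeddings) are hypothetical, and the ensemble step mentioned for decompression is not modelled.

```python
from typing import Dict, List, Sequence, Tuple

Code = Tuple[int, ...]


def build_codes(embeddings: Dict[str, Sequence[float]], k: int) -> Dict[str, Code]:
    """Give each label a prefix-free code over the alphabet {0, ..., k-1}.

    Assumption: labels are split recursively into k balanced groups by sorting
    on one embedding coordinate per level. This is an illustrative stand-in
    for a semantic clustering step, not the paper's procedure.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    codes: Dict[str, Code] = {label: () for label in embeddings}
    n_dims = len(next(iter(embeddings.values())))

    def split(labels: List[str], depth: int) -> None:
        if len(labels) <= 1:
            return  # a singleton group already has a unique code
        dim = depth % n_dims
        ordered = sorted(labels, key=lambda lab: (embeddings[lab][dim], lab))
        size = -(-len(ordered) // k)  # ceiling division
        for digit in range(k):
            group = ordered[digit * size:(digit + 1) * size]
            for lab in group:
                codes[lab] = codes[lab] + (digit,)
            split(group, depth + 1)

    split(list(embeddings), 0)
    return codes


def encode_labels(labels: Sequence[str], codes: Dict[str, Code]) -> List[int]:
    """Concatenate the codes of the positive labels into one target sequence."""
    sequence: List[int] = []
    for label in labels:
        sequence.extend(codes[label])
    return sequence


def decode_sequence(sequence: Sequence[int], codes: Dict[str, Code]) -> List[str]:
    """Map a sequence of code symbols back to labels, dropping invalid codes."""
    inverse = {code: label for label, code in codes.items()}
    max_len = max(len(code) for code in codes.values())
    labels: List[str] = []
    buffer: Code = ()
    for symbol in sequence:
        buffer = buffer + (symbol,)
        if buffer in inverse:
            labels.append(inverse[buffer])
            buffer = ()
        elif len(buffer) >= max_len:
            buffer = ()  # no label has this code; discard it
    return labels


# Hypothetical 2-D label embeddings, invented for this illustration only.
embeddings = {
    "cat": (0.1, 0.9),
    "dog": (0.2, 0.8),
    "lion": (0.3, 0.2),
    "car": (0.8, 0.1),
    "truck": (0.9, 0.3),
    "bus": (0.7, 0.6),
}
codes = build_codes(embeddings, k=2)
target = encode_labels(["cat", "truck"], codes)
recovered = decode_sequence(target, codes)
print(codes)
print(target, recovered)
```

In this toy setup the output vocabulary has 2 symbols instead of 6 labels; with a real label set the gap between `k` and the number of labels would be far larger, which is the point of coding the labels before training the Seq2Seq model.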