A Novel BoVW Mimicking End-to-End Trainable CNN Classification Framework Using Optimal Transport Theory

2019 IEEE International Conference on Image Processing (ICIP), 2019

Abstract
An end-to-end trainable convolutional neural network (CNN) framework that mimics the bag of visual words (BoVW) model is proposed for image classification. To this end, a new paradigm for histogram-like image representation is introduced, and the optimal transport (OT) distance is used for similarity assessment. Every patch of an image is treated as a unique visual word, and the image is represented as a uniform histogram over these visual words, with each histogram bin associated with an embedding vector that reflects the semantic meaning of the corresponding visual word. Thus, in the CNN framework, the output of the last convolutional block is taken as the global representation of the image, and the embeddings are learned inherently within the classification framework. With the proposed formulation, the undesired quantization step of the BoVW representation is no longer required; moreover, the learned CNN features are naturally interpretable. Experiments on the CIFAR-10, CIFAR-100 and SVHN datasets show that replacing the global pooling and fully connected layers with the proposed representation, together with the OT distance, improves on the baseline CNN framework.
Keywords
Optimal transport, classification, image representation, convolutional network, bag of visual words
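
The representation described in the abstract can be illustrated with a small sketch: each image is a set of patch embeddings carrying uniform weights, and two such sets are compared with an OT distance. The code below is not the paper's implementation. It assumes a squared Euclidean ground cost and approximates the OT cost with entropic-regularized Sinkhorn iterations, a standard technique swapped in here for illustration. The names `sinkhorn_ot`, `reg` and `n_iters` are hypothetical, and the abstract does not specify how the distance feeds the classifier.

```python
import math


def sinkhorn_ot(X, Y, reg=0.1, n_iters=200):
    """Approximate OT cost between two sets of embedding vectors.

    X, Y: lists of equal-dimension vectors (one vector per image patch).
    Each set is treated as a uniform histogram over its vectors.
    Assumptions (not from the paper): squared Euclidean ground cost,
    entropic regularization `reg`, fixed number of Sinkhorn iterations.
    A very small `reg` relative to the costs can underflow exp().
    """
    n, m = len(X), len(Y)
    a = [1.0 / n] * n  # uniform weights over the patches of image 1
    b = [1.0 / m] * m  # uniform weights over the patches of image 2

    # Ground cost between every pair of embeddings.
    C = [[sum((p - q) ** 2 for p, q in zip(x, y)) for y in Y] for x in X]
    # Gibbs kernel used by the Sinkhorn scaling updates.
    K = [[math.exp(-c / reg) for c in row] for row in C]

    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]

    # Transport plan and the transport cost it induces.
    P = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    cost = sum(P[i][j] * C[i][j] for i in range(n) for j in range(m))
    return cost, P


# Two toy "images", each with two 2-D patch embeddings.
image_a = [[0.0, 0.0], [1.0, 0.0]]
image_b = [[0.0, 3.0], [1.0, 3.0]]
cost_same, _ = sinkhorn_ot(image_a, image_a, reg=0.05)
cost_diff, plan = sinkhorn_ot(image_a, image_b, reg=0.05)
```

In this toy case the second set is the first shifted by 3 along one axis, so the cheapest plan moves each embedding straight across and the cost between the two sets is much larger than the cost of a set to itself. In a trainable framework the same computation would be written with differentiable tensor operations so that gradients reach the convolutional embeddings.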