Multimodal E-Commerce Product Classification Using Hierarchical Fusion

2022 IEEE 2nd Conference on Information Technology and Data Science (CITDS) (2022)

Abstract
In this work, we present a multimodal model for commercial product classification that combines features extracted by multiple neural network models from textual data (CamemBERT and FlauBERT) and visual data (SE-ResNeXt-50) using simple fusion techniques. The proposed method significantly outperformed the unimodal models, as well as the reported performance of similar models on our specific task. We experimented with multiple fusion techniques and found that the best-performing technique for combining the individual embeddings of the unimodal networks is based on a combination of concatenation and averaging of the feature vectors. Each modality compensated for the shortcomings of the other modalities, demonstrating that increasing the number of modalities can be an effective method for improving performance on multi-label and multimodal classification problems.
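The abstract does not spell out how concatenation and averaging are combined, so the following is only a minimal sketch of one plausible reading: the two text embeddings are averaged element-wise, and the result is concatenated with the image embedding. The function names (`average`, `concatenate`, `fuse`), the toy vectors, and the assumption that both text embeddings share one dimension are illustrative and not taken from the paper.

```python
from typing import List, Sequence

Vector = List[float]


def average(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must share one dimension")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def concatenate(vectors: Sequence[Sequence[float]]) -> Vector:
    """Join vectors end to end into one longer vector."""
    return [value for vector in vectors for value in vector]


def fuse(text_embeddings: Sequence[Sequence[float]],
         image_embedding: Sequence[float]) -> Vector:
    """Hypothetical fusion: average the text embeddings, then
    concatenate the result with the image embedding.
    This ordering is an assumption, not the paper's stated method."""
    text_mean = average(text_embeddings)
    return concatenate([text_mean, image_embedding])


# Toy stand-ins for the unimodal embeddings (real ones would have
# hundreds or thousands of dimensions).
camembert_vec = [1.0, 2.0, 3.0]   # placeholder for a CamemBERT embedding
flaubert_vec = [3.0, 4.0, 5.0]    # placeholder for a FlauBERT embedding
image_vec = [0.5, 0.25]           # placeholder for an SE-ResNeXt-50 embedding

fused = fuse([camembert_vec, flaubert_vec], image_vec)
print(fused)  # [2.0, 3.0, 4.0, 0.5, 0.25]
```

In this sketch the fused vector would then be passed to a classification head. Averaging keeps the text representation at its original size, while concatenation preserves the image features as separate dimensions.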
Keywords
Transformers, pretrained models, Ensemble, Ecommerce, Multi-modal, Fusion