Vision Outlooker-Based Hierarchical Food Classification.

Pranav Kathar, Rajshree Khandare, Manisha Das, Deep Gupta, Sneha Singh

TENCON 2023 - 2023 IEEE Region 10 Conference (TENCON) (2023)

In the modern world, where health concerns call for continual diet monitoring, food image identification is an important problem. Many machine learning models are available to automate identification, predominantly Convolutional Neural Networks (CNNs), which extract features from food images with varied textures. However, this approach faces several challenges and limitations: the diversity of food items, variation in image appearance, overfitting, and an inability to capture long-range dependencies, all of which can result in inadequate feature representations. This paper explores Vision Transformers (ViTs) in an effort to overcome these limitations. ViTs are known for their attention mechanism, increased interpretability, better generalization, and robustness to adversarial examples. In this study, VOLO (Vision Outlooker for Visual Recognition), a contemporary vision transformer, is used to improve learning by encoding fine-level information into the token representations. In addition, a traditional flat classifier performs poorly because there are so many different cuisines and unique food items; prediction systems with hierarchical classifiers have been developed to address this. The proposed method therefore uses VOLO to perform hierarchical food classification. The experimental results support the proposed method's performance and show an overall improvement in prediction accuracy.
Vision outlooker, Food images, Convolutional Neural Network, Classification
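
The abstract contrasts a flat classifier with a hierarchical one but does not describe the hierarchy itself. As a rough illustration only, the sketch below assumes a two-level hierarchy (cuisine, then dish) and shows how a coarse decision can restrict the fine-grained choice. The label names, the `HIERARCHY` mapping, and the `predict_hierarchical` function are hypothetical and are not taken from the paper; in practice the scores would come from a backbone such as VOLO rather than from hand-written dictionaries.

```python
import math

# Hypothetical two-level label hierarchy (illustrative, not from the paper).
HIERARCHY = {
    "indian": ["biryani", "dosa"],
    "italian": ["pizza", "lasagna"],
}


def softmax(scores):
    """Convert a list of raw scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_hierarchical(coarse_scores, fine_scores):
    """Pick a cuisine first, then the best dish within that cuisine.

    coarse_scores: dict mapping cuisine name -> raw score
    fine_scores:   dict mapping dish name -> raw score
    Returns (cuisine, dish, joint_probability).
    """
    cuisines = list(HIERARCHY)
    coarse_probs = softmax([coarse_scores[c] for c in cuisines])
    c_idx = max(range(len(cuisines)), key=lambda i: coarse_probs[i])
    cuisine = cuisines[c_idx]

    # Only dishes under the chosen cuisine compete at the fine level.
    dishes = HIERARCHY[cuisine]
    fine_probs = softmax([fine_scores[d] for d in dishes])
    d_idx = max(range(len(dishes)), key=lambda i: fine_probs[i])

    return cuisine, dishes[d_idx], coarse_probs[c_idx] * fine_probs[d_idx]


# Example scores (made up): a flat argmax over dishes would pick "pizza",
# while the hierarchical rule first commits to "indian".
coarse = {"indian": 2.0, "italian": 0.5}
fine = {"biryani": 0.1, "dosa": 1.5, "pizza": 3.0, "lasagna": 0.0}
cuisine, dish, prob = predict_hierarchical(coarse, fine)
print(cuisine, dish, round(prob, 3))
```

This greedy top-down rule is only one of several ways to combine levels; whether the paper uses it, a joint loss, or separate per-level heads is not stated in the abstract.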