Order-Prompted Tag Sequence Generation for Video Tagging.

Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)(2023)

引用 0|浏览36
暂无评分
摘要
Video Tagging intends to infer multiple tags spanning relevant content for a given video. Typically, video tags are freely defined and uploaded by a variety of users, so they have two characteristics: abundant in quantity and disordered intra-video. It is difficult for the existing multilabel classification and generation methods to adapt directly to this task. This paper proposes a novel generative model, Order-Prompted Tag Sequence Generation (OP-TSG), according to the above characteristics. It regards video tagging as a tag sequence generation problem guided by sample-dependent order prompts. These prompts are semantically aligned with tags and enable to decouple tag generation order, making the model focus on modeling the tag dependencies. Moreover, the word-based generation strategy enables the model to generate novel tags. To verify the effectiveness and generalization of the proposed method, a Chinese video tagging benchmark CREATE-tagging, and an English image tagging benchmark Pexel-tagging are established. Extensive results show that OP-TSG is significantly superior to other methods, especially the results on rare tags improve by 3.3% and 3% over SOTA methods on CREATE-tagging and Pexel-tagging, and novel tags generated on CREATE-tagging exhibit a tag gain of 7.04%.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要