Fashion Parsing with Video Context

MM '14: 2014 ACM Multimedia Conference, Orlando, Florida, USA, November 2014 (2014)

Cited: 82 | Views: 45
Abstract
In this paper, we explore how to utilize video context to facilitate fashion parsing. Instead of annotating a large number of fashion images, we present a general, affordable and scalable solution that harnesses the rich context in easily available fashion videos to boost any existing fashion parser. First, we crawl a large unlabelled corpus of fashion videos containing fashion frames. Then, for each fashion video, cross-frame contexts are exploited for human pose co-estimation, followed by video co-parsing, to obtain satisfactory fashion parsing results for all frames. More specifically, SIFT Flow and superpixel matching are used to build correspondences across frames, and these correspondences then contextualize the pose estimation and fashion parsing in individual frames. Finally, the parsed video frames serve as the reference corpus for the non-parametric fashion parsing component of the overall solution. Extensive experiments on two benchmark fashion datasets, as well as a newly collected and challenging Fashion Icon (FI) dataset, demonstrate the encouraging performance gain from our general pipeline for fashion parsing.
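The cross-frame label propagation idea described above can be sketched in miniature: given superpixel descriptors for a parsed frame and an unparsed neighbouring frame, copy each target superpixel's label from its nearest source superpixel. This is a simplified stand-in for the paper's SIFT Flow plus superpixel matching; the mean-colour descriptors and the `propagate_labels` helper are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def propagate_labels(feats_src, labels_src, feats_dst):
    """Assign each target superpixel the label of its nearest source superpixel.

    feats_src:  (N, D) superpixel descriptors (e.g. mean colour) of a parsed frame
    labels_src: (N,)   garment labels for those superpixels
    feats_dst:  (M, D) descriptors for an unparsed neighbouring frame
    """
    # Pairwise Euclidean distances between every target and source descriptor.
    dists = np.linalg.norm(feats_dst[:, None, :] - feats_src[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)  # index of the closest source superpixel
    return labels_src[nearest]

# Toy example: two source superpixels (0 = background, 1 = dress).
feats_src = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
labels_src = np.array([0, 1])
feats_dst = np.array([[0.1, 0.0, 0.0], [0.9, 1.0, 1.0], [0.0, 0.05, 0.0]])

print(propagate_labels(feats_src, labels_src, feats_dst).tolist())  # [0, 1, 0]
```

In the full pipeline, such per-frame matches would additionally be regularized across the whole video (co-parsing), rather than taken frame-pair by frame-pair.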