Towards open-universe image parsing with broad coverage

MVA (2013)

Abstract
One of the main goals of computer vision is to develop algorithms that allow the computer to interpret an image not as a pattern of colors but as the semantic relationships that make up a real-world three-dimensional scene. In this dissertation, I present a system for image parsing, or labeling the regions of an image with their semantic categories, as a means of scene understanding. Most existing image parsing systems use a fixed set of a few hundred hand-labeled images as examples from which they learn how to label image regions, but our world cannot be adequately described with only a few hundred images. A new breed of "open universe" datasets has recently started to emerge. These datasets not only have more images but are constantly expanding, with new images and labels assigned by users on the web. Here I present a system that is able both to learn from these larger datasets of labeled images and to scale as the dataset expands, thus greatly broadening the number of class labels that can correctly be identified in an image. Throughout this work I employ a retrieval-based methodology: I first retrieve images similar to the query and then match image regions from this set of retrieved images. My system can assign to each image region multiple forms of meaning: for example, it can simultaneously label the wing of a crow as an animal, crow, wing, and feather. I also broaden the label coverage by using both region- and detector-based similarity measures to effectively match a broad range of label types. This work shows the power of retrieval-based systems and the importance of having a diverse set of image cues and interpretations.
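To make the retrieve-then-match methodology concrete, the sketch below illustrates the two-stage idea in Python: retrieve the database images most similar to the query in a global feature space, then label each query region by matching its features against regions from the retrieved set. This is a minimal, hypothetical illustration; the function names, feature dimensions, and 1-nearest-neighbor class scoring are my own simplifications and do not reproduce the dissertation's actual features or scoring.

```python
# Hypothetical sketch of a retrieval-based image parser.
# Assumes precomputed global image features and per-region features;
# this is not the dissertation's pipeline, only the retrieve-then-match idea.
import numpy as np

def retrieve_similar_images(query_global, dataset_globals, k=20):
    """Return indices of the k database images closest to the query
    in global-feature space (Euclidean distance)."""
    dists = np.linalg.norm(dataset_globals - query_global, axis=1)
    return np.argsort(dists)[:k]

def label_regions(query_regions, retrieved_regions, retrieved_labels, num_classes):
    """Assign each query region the class whose nearest retrieved region
    is closest in region-feature space (1-nearest-neighbor score per class)."""
    labels = []
    for feat in query_regions:
        dists = np.linalg.norm(retrieved_regions - feat, axis=1)
        # Score each class by its best (smallest) distance to the query region.
        scores = np.full(num_classes, np.inf)
        for d, c in zip(dists, retrieved_labels):
            scores[c] = min(scores[c], d)
        labels.append(int(np.argmin(scores)))
    return labels

# Toy usage with random vectors standing in for real image descriptors.
rng = np.random.default_rng(0)
dataset_globals = rng.normal(size=(100, 64))    # 100 database images
query_global = rng.normal(size=64)
idx = retrieve_similar_images(query_global, dataset_globals, k=10)

retrieved_regions = rng.normal(size=(50, 32))   # regions pooled from retrieved images
retrieved_labels = rng.integers(0, 5, size=50)  # semantic class per retrieved region
query_regions = rng.normal(size=(8, 32))        # regions of the query image
print(label_regions(query_regions, retrieved_regions, retrieved_labels, num_classes=5))
```

Because the labeled set is only consulted at query time, new images and labels added to an open-universe dataset become usable without retraining, which is the scalability property the abstract emphasizes.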
Keywords
image parsing,image cue,hundred image,image region,image region multiple form,hundred hand-labeled image,existing image,class label,new image,label coverage,broad coverage,Towards open-universe image