Microsoft COCO: Common Objects in Context
ECCV (5), pp. 740-755, 2014.
Keywords: Amazon’s Mechanical Turk, object recognition, new dataset, Microsoft Common Objects in COntext, instance segmentation
We introduce a new dataset for detecting and segmenting objects found in everyday life in their natural environments.
- One of the primary goals of computer vision is the understanding of visual scenes. Scene understanding involves numerous tasks including recognizing what objects are present, localizing the objects in 2D and 3D, determining the objects’ and scene’s attributes, characterizing relationships between objects and providing a semantic description of the scene.
- For instance, the ImageNet dataset, which contains an unprecedented number of images, has recently enabled breakthroughs in both object classification and detection research.
- The Microsoft Common Objects in COntext (MS COCO) dataset contains 91 common object categories with 82 of them having more than 5,000 labeled instances, Fig. 6.
- A critical distinction between our dataset and others is the number of labeled instances per image, which may aid in learning contextual information, Fig. 5.
- The PASCAL VOC datasets contained 20 object categories spread over 11,000 images.
- Since the SUN dataset was collected by finding images depicting various scene types, the number of instances per object category exhibits the long-tail phenomenon: a few categories have many instances while most have relatively few.
- We ensure that each object category has a significant number of instances, Fig. 5.
- The first task in annotating our dataset is determining which object categories are present in each image, Fig. 3(a).
- In the next stage, all instances of the object categories in an image were labeled, Fig. 3(b).
- Each worker was asked to label at most 10 instances of a given category per image.
- The training task required workers to segment an object instance.
- For images containing 10 object instances or fewer of a given category, every instance was individually segmented.
- Fig. 4(b) re-examines the precision and recall of AMT workers for category labeling, using a much larger set of images.
- The amount of contextual information present in an image can be estimated by examining the average number of object categories and instances per image, Fig. 5(b, c).
- Fig. 9 shows results of the DPM-based segmentation baseline, with the DPM (deformable part model) learned on the 20 PASCAL categories and tested on our dataset.
- Utilizing over 70,000 worker hours, a vast collection of object instances was gathered, annotated and organized to drive the advancement of object detection and segmentation algorithms.
- We describe and visualize our user interfaces for collecting non-iconic images, category labeling, instance spotting, instance segmentation, segmentation verification and crowd labeling.
- Each segmentation task covers a single object instance labeled in the previous annotation stage.
- In the previous annotation stage, to ensure high coverage of all object instances, we used multiple workers to label all instances per image.
- We emphasize that crowd labeling is only necessary for images containing more than ten object instances of a given category.
- Presents a new dataset with the goal of advancing the state of the art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding.
- Presents a detailed statistical analysis of the dataset in comparison to PASCAL, ImageNet, and SUN.
- Introduces a new large-scale dataset that addresses three core research problems in scene understanding: detecting non-iconic views of objects, contextual reasoning between objects, and the precise 2D localization of objects.
- Richly annotated dataset comprising images that depict complex everyday scenes of common objects in their natural context.
- To measure either kind of localization performance, it is essential for the dataset to have every instance of every object category labeled and fully segmented.
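The statistics mentioned above (labeled instances per category, the number of categories above an instance threshold such as 5,000, and the average number of categories and instances per image used to gauge context) can be computed with a few counters. This is a minimal sketch, assuming a hypothetical flat list of annotation records with one dict per labeled instance and the keys `image_id` and `category_id`. It is not the authors' tooling.

```python
from collections import Counter, defaultdict


def dataset_stats(annotations, threshold=5000):
    """Summarize instance counts per category and per image.

    `annotations` is a hypothetical flat list with one dict per labeled
    instance, each holding an "image_id" and a "category_id".
    """
    # Instances per category: a heavily skewed distribution is the long tail.
    per_category = Counter(a["category_id"] for a in annotations)
    # Categories with more than `threshold` labeled instances.
    well_covered = sorted(c for c, n in per_category.items() if n > threshold)

    # Per-image counts; images with no annotations never appear here.
    instances_per_image = Counter(a["image_id"] for a in annotations)
    categories_per_image = defaultdict(set)
    for a in annotations:
        categories_per_image[a["image_id"]].add(a["category_id"])

    num_images = len(instances_per_image)
    if num_images == 0:
        return per_category, well_covered, 0.0, 0.0
    avg_instances = sum(instances_per_image.values()) / num_images
    avg_categories = sum(len(s) for s in categories_per_image.values()) / num_images
    return per_category, well_covered, avg_instances, avg_categories


toy = [
    {"image_id": 1, "category_id": "person"},
    {"image_id": 1, "category_id": "person"},
    {"image_id": 1, "category_id": "dog"},
    {"image_id": 2, "category_id": "car"},
]
per_cat, covered, avg_inst, avg_cat = dataset_stats(toy, threshold=1)
print(per_cat["person"], covered, avg_inst, avg_cat)  # 2 ['person'] 2.0 1.5
```

Because averages are taken only over images that carry at least one annotation, unlabeled images would need to be counted separately if they should lower the per-image averages.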
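The annotation pipeline described above caps individual labeling at 10 instances of a category per image and reserves crowd labeling for images with more than ten instances of that category. The routing decision can be sketched as follows. The function name is hypothetical, and the choice to segment the first ten instances individually and pool the remainder into one crowd region is an assumption made for illustration.

```python
MAX_INDIVIDUAL = 10  # per-category, per-image cap quoted in the text


def plan_segmentation(instance_ids, max_individual=MAX_INDIVIDUAL):
    """Split one category's instances in one image into individual and crowd work.

    Hypothetical routing rule: up to `max_individual` instances are segmented
    one by one; any instances beyond that are assigned to a single crowd
    region. Returns (individual, crowd), where crowd is None when no crowd
    labeling is needed.
    """
    individual = list(instance_ids[:max_individual])
    remainder = list(instance_ids[max_individual:])
    return individual, (remainder or None)


print(plan_segmentation(["a", "b", "c"]))  # (['a', 'b', 'c'], None)
individual, crowd = plan_segmentation(list(range(13)))
print(len(individual), crowd)  # 10 [10, 11, 12]
```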
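The precision and recall of AMT workers on category labeling, and the use of multiple workers per image to raise coverage, can be illustrated on a single image. This sketch assumes a hypothetical expert ground-truth label set and combines workers by taking the union of their labels; the union rule is an assumption, chosen because it shows how adding workers can trade precision for recall.

```python
def precision_recall(predicted, truth):
    """Precision and recall of a set of category labels against ground truth."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    # By convention here, an empty set scores 1.0 to avoid dividing by zero.
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(truth) if truth else 1.0
    return precision, recall


def union_of_workers(worker_labels):
    """Hypothetical aggregation: a category is present if any worker marked it."""
    merged = set()
    for labels in worker_labels:
        merged |= set(labels)
    return merged


truth = {"person", "dog", "frisbee"}
workers = [{"person", "dog"}, {"person", "frisbee"}, {"person", "cat"}]
print(precision_recall(workers[0], truth))  # (1.0, 0.6666666666666666)
print(precision_recall(union_of_workers(workers), truth))  # (0.75, 1.0)
```

In this toy case a single worker misses a category, while the union of three workers recovers every true category at the cost of one false positive ("cat").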