Computer Vision
Computer vision is an interdisciplinary field that deals with how computers can be made to gain high-level understanding from digital images or videos. From an engineering perspective, it seeks to automate tasks that the human visual system can do. Computer vision tasks include methods for acquiring, processing, analyzing, and understanding digital images, and for extracting high-dimensional data from the real world in order to produce numerical or symbolic information, e.g., in the form of decisions. Understanding in this context means the transformation of visual images (the input of the retina) into descriptions of the world that can interface with other thought processes and elicit appropriate action. This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory.

As a scientific discipline, computer vision is concerned with the theory behind artificial systems that extract information from images. The image data can take many forms, such as video sequences, views from multiple cameras, or multi-dimensional data from a medical scanner. As a technological discipline, computer vision seeks to apply its theories and models to the construction of computer vision systems. Sub-domains of computer vision include scene reconstruction, event detection, video tracking, object recognition, learning, indexing, motion estimation, and image restoration.
International Journal of Computer Vision, no. 2 (2020): 336-359
We propose a technique for producing ‘visual explanations’ for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent and explainable. Our approach, Gradient-weighted Class Activation Mapping (Grad-CAM), uses the gradients of ...
Cited by 3074 | Views 486
Schmid Jan Fabian, Simon Stephan F., Mester Rudolf
BMVC, pp.134, (2020)
Ground texture based vehicle localization using feature-based methods is a promising approach to achieve infrastructure-free high-accuracy localization. In this paper, we provide the first extensive evaluation of available feature extraction methods for this task, using separat...
Cited by 0 | Views 66
Goel Abhinav, Tung Caleb, Lu Yung-Hsiang, Thiruvathukal George K.
Deep neural networks (DNNs) are successful in many computer vision tasks. However, the most accurate DNNs require millions of parameters and operations, making them energy, computation and memory intensive. This impedes the deployment of large DNNs in low-power devices with lim...
Cited by 0 | Views 84
Geiger Bernhard C.
We review the current literature concerned with information plane analyses of neural network classifiers. While the underlying information bottleneck theory and the claim that information-theoretic compression is causally linked to generalization are plausible, empirical eviden...
Cited by 0 | Views 37
Registration is the process that computes the transformation that aligns sets of data. Commonly, a registration process can be divided into four main steps: target selection, feature extraction, feature matching, and transform computation for the alignment. The accuracy of the ...
Cited by 0 | Views 47
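The last of the four registration steps named in the abstract above, transform computation, can be illustrated with a minimal sketch. It assumes the earlier steps have already produced matched 2-D point pairs and that a rigid transform (rotation plus translation) is the right model; both are assumptions made here for illustration, not details from the paper. The function name `estimate_rigid_2d` is hypothetical.

```python
import math


def estimate_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src points onto dst.

    src and dst are equal-length lists of (x, y) pairs that are assumed to be
    already matched, i.e. src[i] corresponds to dst[i].
    """
    n = len(src)
    if n == 0 or n != len(dst):
        raise ValueError("src and dst must be non-empty and of equal length")

    # Centroids of both point sets.
    cx_s = sum(x for x, _ in src) / n
    cy_s = sum(y for _, y in src) / n
    cx_d = sum(x for x, _ in dst) / n
    cy_d = sum(y for _, y in dst) / n

    # Accumulate dot and cross products of the centered correspondences.
    s_cos = 0.0
    s_sin = 0.0
    for (xs, ys), (xd, yd) in zip(src, dst):
        ax, ay = xs - cx_s, ys - cy_s
        bx, by = xd - cx_d, yd - cy_d
        s_cos += ax * bx + ay * by
        s_sin += ax * by - ay * bx

    # Closed-form optimal rotation for the 2-D case.
    theta = math.atan2(s_sin, s_cos)
    c, s = math.cos(theta), math.sin(theta)

    # The translation aligns the rotated source centroid with the target centroid.
    tx = cx_d - (c * cx_s - s * cy_s)
    ty = cy_d - (s * cx_s + c * cy_s)
    return theta, (tx, ty)


# Example: dst is src rotated by 90 degrees and shifted by (2, 3).
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
dst = [(2.0, 3.0), (2.0, 4.0), (1.0, 3.0)]
theta, (tx, ty) = estimate_rigid_2d(src, dst)
```

With noisy or partly wrong matches, a robust wrapper around an estimator like this would typically be needed; the sketch covers only the clean least-squares case.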
Ye Xin, Yang Yezhou
The Visual Indoor Navigation (VIN) task has drawn increasing attention from the data-driven machine learning communities, especially with the recently reported success of learning-based methods. Due to the innate complexity of this task, researchers have tried approaching the probl...
Cited by 0 | Views 42
Chapel Marie-Neige, Bouwmans Thierry
For about 30 years, many research teams have worked on the challenge of detecting moving objects in various difficult environments. The first applications concerned static cameras, but with the rise of mobile sensors, studies on moving cameras have emerged over time...
Cited by 0 | Views 50
arXiv: Computer Vision and Pattern Recognition, (2019)
Object detection, as one of the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development in the past two decades can be regarded as an epitome of computer vision history. If we think of today's object detection ...
Cited by 45 | Views 31
IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 4 (2018): 834-848
In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or 'atrous convolution', as a powerf...
Cited by 5079 | Views 220
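The phrase "convolution with upsampled filters" in the abstract above can be made concrete with a small 1-D sketch. It is only an illustration of the general idea, not the paper's implementation: the function names `atrous_conv1d` and `upsample_kernel` are hypothetical, and the operation is written as cross-correlation without padding, as is common in deep learning libraries.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """Valid-mode 1-D cross-correlation with kernel taps spaced `rate` samples apart."""
    if rate < 1:
        raise ValueError("rate must be >= 1")
    # Number of input samples covered by the dilated kernel.
    span = (len(kernel) - 1) * rate + 1
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + k * rate] for k, w in enumerate(kernel)))
    return out


def upsample_kernel(kernel, rate):
    """Insert rate - 1 zeros between consecutive kernel taps."""
    upsampled = []
    for i, w in enumerate(kernel):
        upsampled.append(w)
        if i < len(kernel) - 1:
            upsampled.extend([0] * (rate - 1))
    return upsampled


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]

# Dilating the kernel by 2 widens its field of view from 3 to 5 samples
# without adding any weights.
dilated = atrous_conv1d(signal, kernel, rate=2)

# The same result comes from an ordinary convolution with the zero-upsampled kernel.
equivalent = atrous_conv1d(signal, upsample_kernel(kernel, 2), rate=1)
```

Here `dilated` and `equivalent` hold the same values, which is the sense in which atrous convolution is convolution with an upsampled filter; the dilated form simply skips the multiplications by zero.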
CVPR, (2018)
The central building block of convolutional neural networks (CNNs) is the convolution operator, which enables networks to construct informative features by fusing both spatial and channel-wise information within local receptive fields at each layer. A broad range of prior researc...
Cited by 3270 | Views 178
IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 3 (2018): 611-625
Direct Sparse Odometry (DSO) is a visual odometry method based on a novel, highly accurate sparse and direct structure and motion formulation. It combines a fully direct probabilistic model (minimizing a photometric error) with consistent, joint optimization of all model paramete...
Cited by 966 | Views 75
IEEE Access, (2018)
The recent breakthrough in artificial intelligence in the form of tabula-rasa learning of AlphaGo Zero owes a fair share to deep Residual Networks that were originally proposed for the task of image recognition
Cited by 473 | Views 62
computer vision and pattern recognition, pp.2261-2269, (2017)
Cited by 9628 | Views 96
ICCV, no. 2 (2017): 386-397
We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Fast...
Cited by 7773 | Views 317
ICCV, (2017): 2242-2251
Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. However, for many tasks, paired training data will not be available. We present an ...
Cited by 6352 | Views 222
IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 12 (2017): 2481-2495
We present a novel and practical deep fully convolutional neural network architecture for semantic pixel-wise segmentation termed SegNet. This core trainable segmentation engine consists of an encoder network, a corresponding decoder network followed by a pixel-wise classificatio...
Cited by 5896 | Views 308
computer vision and pattern recognition, (2017)
We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply ...
Cited by 5412 | Views 293
CVPR, (2017)
We introduce YOLO9000, a state-of-the-art, real-time object detection system that can detect over 9000 object categories. First we propose various improvements to the YOLO detection method, both novel and drawn from prior work. The improved model, YOLOv2, is state-of-the-art on s...
Cited by 4803 | Views 185
CVPR, (2017)
Feature pyramids are a basic component in recognition systems for detecting objects at different scales. But pyramid representations have been avoided in recent object detectors that are based on deep convolutional networks, partially because they are slow to compute and memory i...
Cited by 4484 | Views 373
ICCV, no. 2 (2017): 318-327
The highest accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object loc...
Cited by 4245 | Views 334
Keywords
Feature Extraction, Image Segmentation, Neural Networks, Computer Vision, Image Classification, Computational Modeling, Neural Nets, Detectors, Learning (Artificial Intelligence), Object Detection
Authors
Ross B. Girshick (11 papers)
Kaiming He (9 papers)
Piotr Dollár (8 papers)
Xiaoou Tang (5 papers)
Shaoqing Ren (4 papers)
Jeff Donahue (4 papers)
Jian Sun (4 papers)
Christian Szegedy (4 papers)
Dumitru Erhan (3 papers)