Saliency on a chip

MSRA (2007)

Abstract
Selective-visual-attention algorithms have been successfully implemented in analog VLSI circuits [1]. However, in addition to the usual issues of analog VLSI, such as the need to fine-tune a large number of biases, these implementations lack the spatial resolution and pre-processing capabilities to be truly useful for image-processing applications. Here we take an alternative approach and implement a neuro-mimetic algorithm for selective visual attention in digital hardware. The overarching aim of this project is to demonstrate the feasibility of using programmable logic to aid the development and acceleration of image- and video-processing applications in vision. Saliency was chosen as a design driver so that the design flow could be understood and added to the neuromorphic engineer's bag of tricks; the data-intensive and computationally challenging nature of the human visual attention system makes it an interesting algorithm for this study.

Itti, Koch, and Niebur [2] proposed an influential model for saliency-based bottom-up visual attention and applied it successfully to a variety of visual tasks. The model attempts to represent visual attention in a computationally efficient manner. The existing software implementation of this model, on a personal computer, runs at 30 frames per second (fps) at quarter-VGA (320×240) resolution [3]. Field-programmable gate arrays (FPGAs), on the other hand, offer an elegant way to implement the saliency computation in hardware, taking full advantage of the data parallelism available in the image-processing operations [4]. The reprogrammable nature of FPGAs provides a quick, cheap platform for prototyping and debugging, and greatly simplifies development.

We wanted to process a video stream at 30 fps and VGA resolution of 640×480 pixels (R, G, and B colors at 8 bits/color/pixel), outputting the coordinates of the most salient pixel.
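The core of the Itti-Koch-Niebur computation can be sketched in a few lines: build a Gaussian pyramid, take center-surround differences across scales, sum them into a saliency map, and report the most salient pixel. The sketch below is illustrative only and covers the intensity channel alone (the full model also uses color and orientation channels); all function names and the choice of scale pairs are assumptions, not the paper's actual pipeline.

```python
import numpy as np

def gaussian_blur(img):
    """Separable blur with the 5-tap binomial kernel [1,4,6,4,1]/16."""
    kernel = np.array([1, 4, 6, 4, 1], dtype=float)
    kernel /= kernel.sum()
    img = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, img)
    img = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, img)
    return img

def decimate(img):
    """One pyramid step: blur, then drop every other row and column."""
    return gaussian_blur(img)[::2, ::2]

def saliency_map(intensity, levels=4):
    """Sum of |center - surround| differences across pyramid scales."""
    pyramid = [intensity.astype(float)]
    for _ in range(levels):
        pyramid.append(decimate(pyramid[-1]))
    h, w = intensity.shape
    smap = np.zeros((h, w))
    # Upsample coarse levels back to full size (nearest-neighbor, for
    # brevity) and accumulate center-surround contrasts.
    for c, s in [(1, 3), (1, 4), (2, 4)]:
        center = np.kron(pyramid[c], np.ones((2**c, 2**c)))[:h, :w]
        surround = np.kron(pyramid[s], np.ones((2**s, 2**s)))[:h, :w]
        smap += np.abs(center - surround)
    return smap

def most_salient_pixel(intensity):
    """Return (x, y) of the maximum of the saliency map."""
    smap = saliency_map(intensity)
    y, x = np.unravel_index(np.argmax(smap), smap.shape)
    return x, y
```

Each stage here (blur, decimate, difference, accumulate) is a local, streaming operation, which is exactly the kind of data parallelism an FPGA pipeline exploits.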
Our current design exceeds that specification, processing composite video at 720×525 pixels and 30 fps. The hardware was composed from modular elements implementing colorspace conversion, Gaussian filtering, interpolation, and decimation.

Figure 1. Block diagram of the FPGA saliency implementation.
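Gaussian filter modules in FPGA pipelines are typically implemented in fixed point, with binomial coefficients [1, 4, 6, 4, 1]/16 that reduce to shifts and adds. The bit-exact software reference model below illustrates that technique; it is a hypothetical sketch, not the paper's actual filter module.

```python
def binomial5_fixed_point(samples):
    """Filter an 8-bit sample stream with [1,4,6,4,1]/16 using only
    shifts and adds; borders are clamped, as a line buffer would."""
    n = len(samples)
    out = []
    for i in range(n):
        # Replicate edge samples when the window runs off the stream.
        s = [samples[min(max(i + d, 0), n - 1)] for d in (-2, -1, 0, 1, 2)]
        # (s0 + s4) + 4*(s1 + s3) + 6*s2, then divide by 16 with rounding.
        acc = (s[0] + s[4]) + ((s[1] + s[3]) << 2) + (s[2] << 2) + (s[2] << 1)
        out.append((acc + 8) >> 4)
    return out
```

Because every multiply is a shift, the whole tap fits in a handful of adders per pixel, which is why this kernel is a common choice for the pyramid stages of a streaming hardware design.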