Temporal Integration for Word-Wise Caption and Scene Text Identification

2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), 2017

Abstract
Video generally contains both edited text (caption text) and natural text (scene text), and the two differ in nature as well as in characteristics. These differences lead to poor text recognition accuracy in video. In this paper, we explore wavelet decomposition and temporal coherency for classifying caption and scene text. We use the high-frequency wavelet sub-bands to separate text candidates, which are represented by high-frequency coefficients, in an input word image. The proposed method studies the distribution of text candidates over word images, based on the observation that the standard deviation of text candidates is high in the first zone, low in the middle zone, and high in the third zone. This distribution is extracted by mapping the standard deviation values to 8 equal-sized bins formed over the range of those values. The correlation among bins at the first and second wavelet levels is then explored, both to differentiate caption text from scene text and to determine the number of temporal frames to analyze. The properties of caption and scene text are validated over the chosen temporal frames to find a stable property for classification. Experimental results on three standard datasets (ICDAR 2015, YVT, and License Plate Video) show that the proposed method outperforms existing methods in classification rate, and that recognition rate improves significantly when recognition is based on the classification results.
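As a rough illustration of the binning and cross-level correlation steps described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a Haar wavelet (with averaging normalization), per-column standard deviations of the summed high-frequency magnitudes, equal-width bins, and Pearson correlation between bin histograms; the abstract specifies none of these choices, and all function names are hypothetical.

```python
import math
from statistics import pstdev


def haar_level(img):
    """One level of a 2D Haar transform on a list-of-lists grayscale image.

    Returns (LL, HF): the approximation sub-band and the summed magnitude of
    the three detail sub-bands. Averaging normalization (/4) is an assumption.
    A trailing odd row or column is dropped.
    """
    rows, cols = len(img), len(img[0])
    ll, hf = [], []
    for r in range(0, rows - 1, 2):
        ll_row, hf_row = [], []
        for c in range(0, cols - 1, 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            ll_row.append((a + b + d + e) / 4.0)
            lh = (a + b - d - e) / 4.0  # difference between the two rows
            hl = (a - b + d - e) / 4.0  # difference between the two columns
            hh = (a - b - d + e) / 4.0  # diagonal difference
            hf_row.append(abs(lh) + abs(hl) + abs(hh))
        ll.append(ll_row)
        hf.append(hf_row)
    return ll, hf


def column_stds(hf):
    """Population standard deviation of each column of the HF map (assumed unit)."""
    return [pstdev([row[c] for row in hf]) for c in range(len(hf[0]))]


def bin_histogram(values, n_bins=8):
    """Count values in n_bins equal-width bins spanning [min(values), max(values)]."""
    lo, hi = min(values), max(values)
    counts = [0] * n_bins
    if hi == lo:  # degenerate range: everything falls in the first bin
        counts[0] = len(values)
        return counts
    width = (hi - lo) / n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def bin_correlation(img):
    """Histograms of std-dev bins at wavelet levels 1 and 2, and their correlation."""
    ll1, hf1 = haar_level(img)
    _, hf2 = haar_level(ll1)
    h1 = bin_histogram(column_stds(hf1))
    h2 = bin_histogram(column_stds(hf2))
    return h1, h2, pearson(h1, h2)
```

Under these assumptions, calling `bin_correlation(word_image)` on each temporal frame of a word would yield one correlation value per frame, which could then be checked for stability across frames. How the paper turns such values into a caption-versus-scene decision is not stated in the abstract.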
Keywords
Caption text, Scene text, Wavelet coefficients, Temporal frames, Caption text classification, Scene text classification