Synthetic Data Generation for Semantic Segmentation of Lecture Videos

ICFHR 2022

Abstract
Lecture videos have become a great resource for students and teachers. These videos are a vast source of information, but most search engines index them only by their audio. To make these videos searchable by handwritten content, it is important to develop accurate methods for analyzing such content at scale. However, training deep neural networks to their full potential requires large-scale lecture video datasets. In this paper, we use synthetic data generation to improve binarization of lecture videos. We also use it to semantically segment pixels into background, speaker, text, mathematical expressions, and graphics. Our method for synthetic data generation renders content from multiple handwritten and typeset datasets and blends it into real images using random tight layouts and the locations of people. In addition, we propose a mixed-data approach that trains networks on two detection tasks at once: person detection and text detection. Both binarization and semantic segmentation are carried out using fully convolutional neural networks with a typical encoder-decoder architecture and residual connections. Our experiments show that pretraining on both synthetic and mixed data leads to better performance than training with real data alone. While the final results are promising, more work is needed to reduce the domain shift between synthetic and real data. Our code and data are publicly available.
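The abstract names two concrete technical pieces that a short sketch can make more tangible. First, the data generator blends rendered handwritten or typeset crops into real frames using random tight layouts and the locations of people; one plausible reading is that layouts are sampled so that pasted content does not overlap the speaker. The following is a minimal sketch of such a compositing step, assuming crops smaller than the frame; all names here (composite, the class ids, the crop/alpha format) are hypothetical and not taken from the paper's released code.

```python
# Hypothetical sketch: paste (crop, alpha, class_id) triples onto a real
# lecture frame at random non-overlapping positions, skipping the speaker's
# bounding box, and emit the blended image plus a per-pixel label mask.
import random
import numpy as np

BACKGROUND, SPEAKER, TEXT, MATH, GRAPHIC = range(5)  # assumed class ids

def _overlaps(a, b):
    """Axis-aligned box intersection test for (x0, y0, x1, y1) boxes."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def composite(frame, speaker_box, crops, max_tries=50):
    h, w = frame.shape[:2]
    mask = np.full((h, w), BACKGROUND, dtype=np.uint8)
    sx0, sy0, sx1, sy1 = speaker_box
    mask[sy0:sy1, sx0:sx1] = SPEAKER
    occupied = [speaker_box]                 # never paste over the speaker
    out = frame.astype(np.float32)
    for crop, alpha, cls in crops:           # crop: HxWx3, alpha: HxW in [0,1]
        ch, cw = crop.shape[:2]
        for _ in range(max_tries):           # rejection-sample a free spot
            x, y = random.randrange(w - cw), random.randrange(h - ch)
            box = (x, y, x + cw, y + ch)
            if not any(_overlaps(box, b) for b in occupied):
                a = alpha[..., None]         # alpha-blend crop into frame
                out[y:y+ch, x:x+cw] = a * crop + (1 - a) * out[y:y+ch, x:x+cw]
                mask[y:y+ch, x:x+cw][alpha > 0.5] = cls
                occupied.append(box)
                break
    return out.astype(np.uint8), mask
```

Second, the networks are described as fully convolutional encoder-decoders with residual connections. Below is a minimal PyTorch sketch of that family of architectures, as an illustration only, not the authors' exact network.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Two 3x3 convolutions with an identity residual connection."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(x + self.body(x))

class SegNetSketch(nn.Module):
    """Fully convolutional encoder-decoder for per-pixel classification."""
    def __init__(self, n_classes=5, ch=32):   # 5 classes; use 2 for binarization
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), ResBlock(ch))
        self.enc2 = nn.Sequential(nn.MaxPool2d(2),
                                  nn.Conv2d(ch, 2 * ch, 3, padding=1),
                                  ResBlock(2 * ch))
        self.dec = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(2 * ch, ch, 3, padding=1), ResBlock(ch))
        self.head = nn.Conv2d(ch, n_classes, 1)  # 1x1 conv emits class logits

    def forward(self, x):
        return self.head(self.dec(self.enc2(self.enc1(x))))
```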
Keywords
Semantic segmentation, Lecture videos, Synthetic data