Method and system for enhancing speech signal of human speaker in video using visual information

user-5e9d449e4c775e765d44d7c9(2020)

Abstract
A method and system for enhancing a speech signal are provided herein. The method may include the following steps: obtaining an original video, wherein the original video includes a sequence of original input images showing a face of at least one human speaker, and an original soundtrack synchronized with said sequence of images; and processing, using a computer processor, the original video to yield an enhanced speech signal of said at least one human speaker, by detecting sounds that are acoustically unrelated to the speech of the at least one human speaker, based on visual data derived from the sequence of original input images.
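The abstract does not say how the unrelated sounds are detected or suppressed. As a rough illustration only, the sketch below assumes one common approach, time-frequency masking: a hypothetical visual model (not shown, and not described in the source) has already produced `visual_speech_mag`, an estimate of the speaker's speech magnitude spectrogram derived from the face images. Bins of the soundtrack spectrogram `noisy_mag` that this visual estimate does not support are treated as acoustically unrelated and attenuated. The function name, parameter names, and the ratio-mask formula are all assumptions made for this example.

```python
import numpy as np


def enhance_with_visual_mask(noisy_mag, visual_speech_mag, eps=1e-8):
    """Attenuate time-frequency bins not supported by a visual speech estimate.

    noisy_mag:         magnitude spectrogram of the original soundtrack
    visual_speech_mag: hypothetical speech magnitude estimate predicted from
                       the face images (assumed to be given, same shape)
    Returns the masked spectrogram and the mask itself.
    """
    noisy_mag = np.asarray(noisy_mag, dtype=float)
    visual_speech_mag = np.asarray(visual_speech_mag, dtype=float)
    if noisy_mag.shape != visual_speech_mag.shape:
        raise ValueError("spectrograms must have the same shape")

    # Ratio mask clipped to [0, 1]: a value near 0 marks a bin whose energy
    # the visual estimate does not explain (assumed unrelated to the speech).
    mask = np.clip(visual_speech_mag / (noisy_mag + eps), 0.0, 1.0)
    return mask * noisy_mag, mask


# Toy 2x2 "spectrograms" (rows: frequency bins, columns: video-aligned frames).
noisy = np.array([[1.0, 2.0], [4.0, 0.5]])
visual = np.array([[1.0, 0.5], [0.0, 1.0]])
enhanced, mask = enhance_with_visual_mask(noisy, visual)
```

In this toy input, the bin with observed magnitude 4.0 and a visual estimate of 0.0 is fully suppressed, while bins where the visual estimate meets or exceeds the observed magnitude pass through essentially unchanged. A real system would also need to reconstruct a waveform from the masked spectrogram, which is outside the scope of this sketch.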
Keywords
Signal, Sequence (medicine), Face (geometry), Speech recognition, Central processing unit, Computer science