Accurate and Practical Query-by-Example Using Multiple Deep Learning Models and Frame Compression Methods

2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC (2023)

Abstract
Recently, spoken term detection (STD) and spoken query STD (SQ-STD), also known as query-by-example (QbE), have been actively studied. A representative QbE method is posteriorgram matching using the outputs of deep neural networks. However, this method requires long retrieval times and a large amount of memory. To address this difficulty, we previously proposed a maximum likelihood state sequence (MLSS) method to reduce retrieval time. This paper proposes two methods, "blank-cut (b-cut)" and "frame de-duplication (FDD)", which compress posteriorgram frames and thereby reduce both retrieval time and memory size. In the proposed methods, multiple matching scores are obtained using multiple deep learning models and architectures and are then integrated. Evaluation experiments using two open test sets of about 30 hr of speech data showed that the proposed method achieves state-of-the-art retrieval accuracy. Furthermore, the proposed method achieved a retrieval time of less than 1 s with a memory requirement of about 1 GB. These results demonstrate the effectiveness of the proposed method.
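The abstract does not state exactly how b-cut and FDD decide which posteriorgram frames to drop or merge, so the following Python sketch shows only one plausible reading under explicit assumptions: label index 0 is a CTC blank, b-cut drops frames whose most probable label is the blank, and FDD averages each run of consecutive frames that share the same most probable label. Matching uses a textbook subsequence DTW with a negative-log inner-product distance; this is a swapped-in standard technique, not the authors' MLSS method. Score integration is shown as a hypothetical weighted average. All function names are hypothetical.

```python
import math
from typing import List

Frame = List[float]  # posterior probabilities over labels for one frame

BLANK = 0  # assumption: label index 0 is the CTC blank symbol


def argmax(frame: Frame) -> int:
    """Index of the most probable label in a frame."""
    return max(range(len(frame)), key=frame.__getitem__)


def blank_cut(frames: List[Frame], blank: int = BLANK) -> List[Frame]:
    """Hypothetical b-cut rule: drop frames whose most probable label is the blank."""
    return [f for f in frames if argmax(f) != blank]


def frame_dedup(frames: List[Frame]) -> List[Frame]:
    """Hypothetical FDD rule: average each run of consecutive frames with the same argmax label."""
    out: List[Frame] = []
    run: List[Frame] = []
    for f in frames:
        if run and argmax(f) != argmax(run[-1]):
            out.append([sum(col) / len(run) for col in zip(*run)])
            run = []
        run.append(f)
    if run:
        out.append([sum(col) / len(run) for col in zip(*run)])
    return out


def local_dist(a: Frame, b: Frame) -> float:
    """Negative log inner product, a common posteriorgram frame distance."""
    return -math.log(max(sum(x * y for x, y in zip(a, b)), 1e-10))


def subseq_dtw(query: List[Frame], doc: List[Frame]) -> float:
    """Subsequence DTW cost of the query anywhere in the document (lower is better)."""
    prev = [local_dist(query[0], d) for d in doc]  # free start in the document
    for q in query[1:]:
        cur: List[float] = []
        for j, d in enumerate(doc):
            best = prev[j]  # vertical step
            if j > 0:
                best = min(best, prev[j - 1], cur[j - 1])  # diagonal / horizontal
            cur.append(local_dist(q, d) + best)
        prev = cur
    return min(prev)  # free end in the document


def fuse_scores(scores: List[float], weights: List[float]) -> float:
    """Hypothetical integration of per-model matching scores by weighted average."""
    return sum(w * s for s, w in zip(scores, weights)) / sum(weights)


doc = [
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.8, 0.1],    # label 1
    [0.2, 0.7, 0.1],    # label 1 (duplicate)
    [0.8, 0.1, 0.1],    # blank
    [0.1, 0.1, 0.8],    # label 2
]
compressed = frame_dedup(blank_cut(doc))
print(len(doc), "->", len(compressed))  # prints "5 -> 2"

query = [[0.1, 0.9, 0.0], [0.1, 0.0, 0.9]]  # label 1 then label 2
print(subseq_dtw(query, compressed))
```

Because DTW cost grows with the product of query and document lengths, shrinking the toy document from 5 frames to 2 shrinks the cost matrix proportionally. This is the intuition behind the retrieval-time and memory savings attributed to frame compression, although the actual b-cut and FDD criteria in the paper may differ from this sketch.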
Keywords
blank-cut, frame de-duplication, query-by-example, maximum likelihood state sequence, spoken term detection