Graph Regularization for Multi-lingual Topic Models

SIGIR '20: The 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, July 2020

Abstract
Unsupervised multi-lingual language modeling has gained traction in recent years, and poly-lingual topic models provide a mechanism for learning aligned document representations. However, training such models requires translation-aligned data across languages, which is not always available. Moreover, for short texts such as tweets and search queries, training topic models remains a challenge. In this work, we present a novel strategy of creating a pseudo-parallel dataset and then training topic models for sponsored search retrieval, which also mitigates the short-text challenge. Our data augmentation strategy leverages a readily available bipartite click-through graph that allows us to draw similar documents in different languages. The proposed methodology is evaluated on a sponsored search system whose performance is measured by how well the user intent, expressed via the query, is matched with ads provided by the advertiser. Our experiments demonstrate the effectiveness of the method on the EuroParl dataset and on live search-engine traffic.
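To make the data-augmentation idea concrete, below is a minimal sketch of how pseudo-parallel document pairs could be drawn from a bipartite click-through graph: queries in different languages that click on the same ad are treated as pseudo-translations of each other. The click-log format, function name, and thresholds here are illustrative assumptions, not the paper's actual pipeline.

```python
from collections import defaultdict
from itertools import combinations

def build_pseudo_parallel_pairs(click_log, min_clicks=1):
    """Sketch: pair queries in different languages that share a clicked ad.

    click_log: iterable of (query_text, query_lang, ad_id, clicks) tuples
    (an assumed format for illustration only).
    """
    ad_to_queries = defaultdict(list)
    for query, lang, ad_id, clicks in click_log:
        if clicks >= min_clicks:
            ad_to_queries[ad_id].append((query, lang))

    pairs = []
    for ad_id, queries in ad_to_queries.items():
        # Any two co-clicking queries in different languages form a pseudo-parallel pair.
        for (q1, l1), (q2, l2) in combinations(queries, 2):
            if l1 != l2:
                pairs.append((q1, l1, q2, l2))
    return pairs

# Example: an English and a German query sharing a clicked ad become a pair.
log = [
    ("cheap flights paris", "en", "ad42", 3),
    ("günstige flüge paris", "de", "ad42", 5),
    ("hotel rome", "en", "ad7", 2),
]
print(build_pseudo_parallel_pairs(log))
# [('cheap flights paris', 'en', 'günstige flüge paris', 'de')]
```

Such pairs could then serve as the translation-aligned training data that poly-lingual topic models normally require.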
Keywords
Cross-lingual Information Retrieval, Topic Models, Graph Regularization