C-BiLDA extracting cross-lingual topics from non-parallel texts by distinguishing shared from unshared content
Data Min. Knowl. Discov., Volume 30, Issue 5, 2015.
Cross-lingual text mining; Multilingual topic modeling; Multilinguality; Comparable data; Cross-lingual knowledge transfer
We study the problem of extracting cross-lingual topics from non-parallel multilingual text datasets with partially overlapping thematic content (e.g., aligned Wikipedia articles in two different languages). To this end, we develop a new bilingual probabilistic topic model called comparable bilingual latent Dirichlet allocation (C-BiLDA),...