C-BiLDA: extracting cross-lingual topics from non-parallel texts by distinguishing shared from unshared content

    Data Min. Knowl. Discov., Volume 30, Issue 5, 2015.

    Keywords:
    Cross-lingual text mining; Multilingual topic modeling; Multilinguality; Comparable data; Cross-lingual knowledge transfer

    Abstract:

    We study the problem of extracting cross-lingual topics from non-parallel multilingual text datasets with partially overlapping thematic content (e.g., aligned Wikipedia articles in two different languages). To this end, we develop a new bilingual probabilistic topic model called comparable bilingual latent Dirichlet allocation (C-BiLDA), ...
