Classification of negative publications in mass media using topic modeling

Journal of Physics: Conference Series (2021)

Abstract
The paper proposes a method for evaluating text documents against arbitrary criteria by combining topic modeling of a text corpus with multiple-criteria decision making. The evaluation analyzes the corpus as follows: once the topic model of the corpus has been built, the conditional probability distribution of media by topics, properties and classes is calculated. Weights assigned by experts to each topic, together with the topic model, can then be used to evaluate each document in the corpus according to each of the considered criteria and classes. The proposed method was applied to a corpus of 804,829 news publications from 40 Kazakhstani sources, published from 1 January 2018 to 31 December 2019, in order to classify negative information on socially significant topics. A BigARTM topic model with 200 topics was built and the proposed method was applied to it. The experiments confirm that the sentiment of publications can, in general, be evaluated using the topic model of a text corpus: a ROC AUC score of 0.93 was achieved on the classification task.
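The scoring step described in the abstract can be illustrated with a minimal sketch. It assumes, as one plausible reading and not as the authors' confirmed formula, that a document's score for a criterion is the expert-weighted sum of its topic probabilities. The names `score_documents` and `roc_auc`, the toy document-topic matrix, the expert weights and the labels are all hypothetical and made up for illustration; no BigARTM calls are used, and in practice the matrix would come from the fitted topic model.

```python
import numpy as np


def score_documents(theta, topic_weights):
    """Score each document as the weighted sum of its topic probabilities.

    theta: (n_docs, n_topics) array, row d holds p(topic | document d).
    topic_weights: (n_topics,) expert weights for one criterion (assumed form).
    """
    theta = np.asarray(theta, dtype=float)
    topic_weights = np.asarray(topic_weights, dtype=float)
    return theta @ topic_weights


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive document
    outscores a random negative one; ties count as one half."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    diff = pos[:, None] - neg[None, :]  # all positive/negative score pairs
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))


# Toy data (hypothetical): 4 documents, 3 topics instead of 200.
theta = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.1, 0.4],
])
# Hypothetical expert weights: how strongly each topic signals negativity.
topic_weights = np.array([1.0, 0.0, 0.5])

scores = score_documents(theta, topic_weights)
labels = np.array([1, 0, 0, 1])  # hypothetical ground truth, 1 = negative publication
auc = roc_auc(labels, scores)
```

Thresholding `scores` would turn the ranking into a binary classifier. ROC AUC is computed on the raw scores, so it measures ranking quality independently of any particular threshold.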