Exploring Embedding Spaces for more Coherent Topic Modeling in Electronic Health Records

SMC(2022)

引用 1|浏览4
暂无评分
摘要
The written notes in the Electronic Health Records contain a vast amount of information about patients. Implementing automated approaches for text classification tasks requires the automated methods to be well-interpretable, and topic models can be used for this goal as they can indicate what topics in a text are relevant to making a decision. We propose a new topic modeling algorithm, FLSA-E, and compare it with another state-of-the-art algorithm FLSA-W. In FLSA-E, topics are found by fuzzy clustering in a word embedding space. Since we use word embeddings as the basis for our clustering, we extend our evaluation with word-embeddings-based evaluation metrics. We find that different evaluation metrics favour different algorithms. Based on the results, there is evidence that FLSA-E has fewer outliers in its topics, a desirable property, given that within-topic words need to be semantically related.
更多
查看译文
关键词
more coherent topic modeling,electronic health records,spaces
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要