Topic extraction by clustering word embeddings on short online texts

ELEKTROTEHNISKI VESTNIK(2022)

Abstract
We demonstrate a topic extraction method in which topics are treated as clusters of word embeddings. The OPTICS algorithm is used to find small, arbitrarily shaped clusters of embeddings produced by a fastText model. The result is a set of dominant and non-dominant domain-specific topics. The method focuses on short online posts, which are difficult to analyze with traditional topic extraction approaches because of the scarcity of word collocations. The method is tested on a dataset of posts from Twitter, LinkedIn and company blogs related to industrial automation. It significantly outperforms traditional topic extraction approaches by finding relevant and understandable topics with related tokens.
Keywords
topic extraction, industrial automation, text mining
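
The core idea of the abstract, treating topics as density-based clusters of word vectors, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy vectors stand in for fastText embeddings, the token names are hypothetical, and the `min_samples`, `eps` and DBSCAN-style cluster extraction settings are assumptions chosen only to make the toy example easy to follow. It uses scikit-learn's `OPTICS` class.

```python
import numpy as np
from sklearn.cluster import OPTICS


def make_toy_embeddings(seed=0):
    """Build stand-in word vectors (assumption: a real pipeline would
    look these up in a trained fastText model instead)."""
    rng = np.random.default_rng(seed)
    # Hypothetical tokens, grouped around three well-separated centers.
    groups = {
        0: ["plc", "scada", "hmi", "controller", "fieldbus"],
        1: ["robot", "gripper", "cobot", "actuator", "servo"],
        2: ["sensor", "telemetry", "gateway", "iot", "edge"],
    }
    tokens, vectors = [], []
    for axis, words in groups.items():
        center = np.zeros(8)
        center[axis] = 1.0
        for word in words:
            tokens.append(word)
            vectors.append(center + rng.normal(scale=0.01, size=8))
    # Two isolated tokens placed far away from every group.
    for axis, word in [(3, "webinar"), (4, "hiring")]:
        outlier = np.zeros(8)
        outlier[axis] = 3.0
        tokens.append(word)
        vectors.append(outlier)
    return tokens, np.array(vectors)


def extract_topics(tokens, vectors, min_samples=3, eps=0.5):
    """Cluster word vectors with OPTICS and return {cluster_label: [tokens]}.

    Tokens labelled -1 (noise) are left out of every topic.
    """
    model = OPTICS(min_samples=min_samples, cluster_method="dbscan", eps=eps)
    labels = model.fit_predict(vectors)
    topics = {}
    for token, label in zip(tokens, labels):
        if label == -1:
            continue
        topics.setdefault(int(label), []).append(token)
    return topics


tokens, vectors = make_toy_embeddings()
topics = extract_topics(tokens, vectors)
for label, words in sorted(topics.items()):
    print(label, words)
```

On real embeddings, the neighbourhood settings and the distance metric would need tuning for the corpus; the values above suit only this synthetic data.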