A Review of Two Text-Mining Packages

The American Statistician (2012)

Abstract
The purpose of this article is to review two text mining packages, namely, WordStat and SAS TextMiner. WordStat is developed by Provalis Research. SAS TextMiner is a product of SAS. We review the features offered by each package on each of the following key steps in analyzing unstructured data: (1) data preparation, including importing and cleaning; (2) performing association analysis; and (3) presenting the findings, including illustrative quotes and graphs. We also evaluate each package on its ability to help researchers extract major themes from a dataset. Both packages offer a variety of features that effectively help researchers run associations and present results. However, in extracting themes from unstructured data, both packages were only marginally helpful. The researcher still needs to read the data and make all the difficult decisions. This finding stems from the fact that the software can search only for specific terms in documents or categorize documents based on common terms. Respondents, however, may use the same term or combination of terms to mean different things. This implies that a text mining approach based on analysis units other than terms may be more powerful in extracting themes, an idea we touch upon in the conclusion section.
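The limitation described above, that term-based categorization cannot tell apart different uses of the same term, can be illustrated with a minimal Python sketch. It is not taken from the article and does not reproduce how WordStat or SAS TextMiner work internally; the responses, the search term, and the helper `categorize_by_term` are all hypothetical and invented for illustration.

```python
from collections import defaultdict

# Hypothetical survey responses (invented for illustration, not from the article).
responses = [
    "The support team was cold and unhelpful.",
    "The office was cold all winter.",
    "I caught a cold after the site visit.",
]

def categorize_by_term(docs, terms):
    """Group documents under each search term they contain (case-insensitive)."""
    groups = defaultdict(list)
    for doc in docs:
        # Crude tokenization: split on whitespace and strip common punctuation.
        tokens = {w.strip(".,;:!?").lower() for w in doc.split()}
        for term in terms:
            if term in tokens:
                groups[term].append(doc)
    return dict(groups)

groups = categorize_by_term(responses, ["cold"])
print(len(groups["cold"]))  # all three responses land in the same category
```

All three responses fall into one category even though "cold" refers to an attitude, a temperature, and an illness. Under this simple assumption of exact term matching, a researcher would still have to read each response to separate the themes.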
Keywords
clustering, correspondence analysis, theme extraction, unstructured data