Comparison of preprocessing approaches for text data in digital shop floor management systems

Procedia CIRP (2022)

Abstract
In a growing number of production companies, shop floor management (SFM) is supported by digital systems. The data generated while working with these systems can feed assistance systems that further enhance the value of digital SFM. Several assistance systems using text data from problem-solving processes have been suggested, but their quality has been limited by the characteristics of the domain-specific language: short texts with spelling errors and the use of synonyms. This research aims to quantify how much different preprocessing approaches can improve the quality of such assistance systems. For this purpose, and to enable comparison within the research community, a public, labeled data set is needed. This paper introduces such a data set, based on the characteristics identified in three real industry data sets. To overcome the problems in processing shop floor text data (e.g. domain-specific synonyms), several approaches are suggested, tested, and compared to a generic approach for text clustering. The study identifies best practices for handling shop floor text data and provides a data set with the goal of simplifying and stimulating research on this topic.
Keywords: text mining, natural language processing, data quality improvement