Labelling of Annotated Condition Monitoring Data Through Technical Language Processing

Proceedings of the Annual Conference of the Prognostics and Health Management Society (2023)

Abstract
We propose a novel approach to facilitate supervised fault diagnosis on unlabelled but annotated industry datasets using human-centric technical language processing and weak supervision. Fault diagnosis through Condition Monitoring (CM) is vital for high safety and resource efficiency in the green transition and digital transformation of the process industry. Learning-based Intelligent Fault Diagnosis (IFD) methods are required to automate maintenance decisions and improve decision support for analysts. A major challenge is the lack of labelled industry datasets, which limits supervised IFD research to lab datasets. However, features learned in lab environments generalise poorly to field environments because signal distributions differ, lab faults are artificially induced or accelerated, and lab set-up properties such as average frequency profiles affect the learned features. In this study, we investigate how the unstructured free-text fault annotations and maintenance work orders that are present in many industrial CM systems can be used for IFD through technical language processing, based on recent advances in natural language supervision. We introduce two distinct pipelines, one based on contrastive pre-training on large datasets, and one based on a small-data human-centric approach with unsupervised clustering methods. Finally, we showcase one example of the small-data fault classification implementation on a CM industry dataset with a SentenceBERT language model, kMeans clustering, and conventional signal processing methods. Fault class imbalance and time-shift uncertainty are overcome with weak supervision through aggregates of features, and human-centric clustering is used to integrate technical knowledge with the annotation-based fault classes. We show that our model can separate cable and sensor fault recordings from bearing-related fault recordings with an F1-score of 93. To our knowledge, this is the first system to classify faults in field industry CM data based only on associated unstructured fault annotations.
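
As a rough illustration of the small-data idea summarised above (embed free-text annotations, cluster them, and let an analyst name the clusters), the following minimal sketch may help. It is not the authors' implementation: a normalised bag-of-words vector stands in for SentenceBERT embeddings, a hand-written k-means with deterministic initialisation stands in for a library kMeans, and the annotation strings and cluster names are invented for the example.

```python
import math
from collections import Counter


def embed(texts):
    """Normalised bag-of-words vectors (assumed stand-in for a sentence embedding model)."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    vectors = []
    for t in texts:
        counts = Counter(t.lower().split())
        v = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain k-means with farthest-point initialisation, so the result is deterministic."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centre.
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        # Move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical annotations, invented for this sketch (not taken from the dataset).
annotations = [
    "sensor cable fault replaced cable",
    "cable fault on sensor",
    "sensor cable loose",
    "bearing outer race damage",
    "bearing damage replaced bearing",
    "bearing inner race damage",
]

labels = kmeans(embed(annotations), k=2)

# Human-centric step: an analyst inspects each cluster and gives it a fault class name.
cluster_names = {labels[0]: "cable/sensor fault", labels[3]: "bearing fault"}
fault_classes = [cluster_names[lab] for lab in labels]
```

In a setting like the one the abstract describes, the resulting per-annotation classes would then serve as weak labels for the signal recordings associated with each annotation.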
Keywords
annotated condition monitoring data, condition monitoring, labelling, language