Application of Big Data Analytics and Machine Learning to Large-Scale Synchrophasor Datasets: Evaluation of Dataset ‘Machine Learning-Readiness’

IEEE Open Access Journal of Power and Energy(2022)

引用 2|浏览2
暂无评分
摘要
This manuscript presents a data quality analysis and holistic ‘machine learning-readiness’ evaluation of a representative set of large-scale, real-world phasor measurement unit (PMU) datasets provided under the United States Department of Energy-funded FOA 1861 research program. A major focus of this study is to understand the present-day suitability of large-scale, real-world synchrophasor datasets for application of commercially-available, off-the-shelf big data and supervised or semi-supervised machine learning (ML) tools and catalogue any major obstacles to their application. To this end, dataset quality is methodically examined through an interconnect-wide quantifications of basic bad data occurrences, a summary of several harder-to-detect data quality issues that can jeopardize successful application of machine learning, and an evaluation of the adequacy of event log labeling for supervised training of models used for online event classification. A global ‘six-point’ statistical analyses of several key dataset variables is demonstrated as a means by which to identify additional hard-to-detect data quality issues, also providing an example successful application of big data technology to extract insights regarding reasonable operational bounds of the US power system. Obstacles for application of commercial ML technologies are summarized, with a particular focus on supervised and semi-supervised ML. Lessons-learned are provided regarding challenges associated with present-day event labeling practices, large spatial scope of the dataset, and dataset anonymization. Finally, insight into efficacy of employed mitigation strategies are discussed, and recommendations for future work are made.
更多
查看译文
关键词
Phasor measurement units,synchrophasor datasets,statistical analysis,data quality analysis,label quality,nonymized datasets,supervised machine learning,big data,wide area monitoring system
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要