Semi-Random Forest Based on Representative Patterns for Noisy and Non-Stationary Data Stream

2019 IEEE Fourth International Conference on Data Science in Cyberspace (DSC), 2019

Data streams often contain noise, and their underlying distribution may change over time, a phenomenon known as concept drift. As a result, a classifier's previously learned decision boundary no longer suits new data, and performance degrades. To address these issues, this paper proposes a pattern-based classifier named Closed Frequent Pattern based Semi-Random Forest (CFPSRF), which represents the raw data with closed frequent patterns to remove redundant information and noise. A change measure for pattern sets is also proposed; it uses the mined patterns to quantify the magnitude of distribution change and thereby determine whether the classifier needs to be updated. To evaluate CFPSRF, we run experiments on both real-world and synthetic datasets under MOA. The experimental results show that our method outperforms the compared algorithms in average classification accuracy and handles concept drift and noise effectively.
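The two ideas named in the abstract, closed frequent patterns and a change measure over pattern sets, can be illustrated with a small sketch. This is not the CFPSRF algorithm: the brute-force miner below is only a stand-in for a real closed-pattern miner, and the Jaccard distance used as the change measure is an assumption chosen for illustration, since the abstract does not define the paper's measure. All function names are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force miner: map each frequent itemset to its support count.

    Illustrative only; exponential in the number of distinct items.
    """
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_support:
                result[candidate] = count
                found_any = True
        if not found_any:
            # No frequent itemset of this size, so none of a larger size.
            break
    return result


def closed_patterns(transactions, min_support):
    """Keep frequent itemsets with no proper superset of equal support."""
    freq = frequent_itemsets(transactions, min_support)
    return {
        pattern: count
        for pattern, count in freq.items()
        if not any(pattern < other and count == other_count
                   for other, other_count in freq.items())
    }


def pattern_set_change(old_patterns, new_patterns):
    """ASSUMED change measure: Jaccard distance between two pattern sets.

    Returns 0.0 for identical sets and 1.0 for disjoint sets.
    """
    old_set, new_set = set(old_patterns), set(new_patterns)
    union = old_set | new_set
    if not union:
        return 0.0
    return 1.0 - len(old_set & new_set) / len(union)


# Two consecutive windows of a toy stream; each transaction is a set of items.
window_1 = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}]
window_2 = [{"a", "b"}, {"b", "d"}, {"b", "d"}]

closed_1 = closed_patterns(window_1, min_support=2)
closed_2 = closed_patterns(window_2, min_support=2)
change = pattern_set_change(closed_1, closed_2)
```

In a stream setting, a classifier update would be triggered when `change` exceeds a chosen threshold; the threshold value and the update policy are likewise left open here because the abstract does not state them.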
Key words
data stream, closed frequent pattern, semi-random forest, concept drift, noise