Unsupervised Log Anomaly Detection Based on Pre-training.

International Conference on Systems and Informatics (2023)

Abstract
As software technology develops, software and hardware systems are growing rapidly in size and complexity. Any failure can cause huge losses, so ensuring system stability and reliability is very important. Logs are a rich and valuable source of information that can help debug the system and analyze root causes. Transfer learning has emerged as a possible solution to the problem of insufficient training data: it learns knowledge or patterns from a source domain with abundant data and applies them to a different but related target domain, reducing the demand for training data. However, existing transfer learning-based log anomaly detection methods require labeled datasets for fine-tuning. To address this, Unsupervised Log Anomaly Detection Based on Pre-training (LogUPT) is proposed for fast, low-cost log anomaly detection across different software systems that share the same architecture. The source system logs are labeled, and the target system logs are not. The method has three steps: the source system logs are used to pre-train the model; the pre-trained model, together with an unsupervised clustering method, filters the target logs with higher confidence to construct pseudo-labels; and the pseudo-labeled target logs are used to fine-tune the pre-trained model so that it adapts better to the target system. Experiments on the real datasets PageRank and WordCount show that, with only 8194 log messages, the method still reaches an F1 score of 85.5%.
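
The pre-train, pseudo-label, fine-tune flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the LogUPT implementation. It assumes a toy logistic-regression model in place of the paper's network, synthetic 2-D features in place of log embeddings, and a plain confidence threshold in place of the unsupervised clustering filter. All function names and the 0.9 threshold are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, w=None, lr=0.5, epochs=200):
    """Gradient descent on the logistic loss.
    Pass `w` to continue training from earlier weights (fine-tuning)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    if w is None:
        w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = sigmoid(Xb @ w)
        w = w - lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(X, w):
    """Probability that each row is an anomaly (label 1)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return sigmoid(Xb @ w)

def select_pseudo_labels(probs, threshold=0.9):
    """Keep only samples whose predicted class has confidence >= threshold.
    Assumption: this simple filter stands in for the clustering-based one."""
    confidence = np.maximum(probs, 1.0 - probs)
    keep = np.flatnonzero(confidence >= threshold)
    labels = (probs[keep] >= 0.5).astype(int)
    return keep, labels

rng = np.random.default_rng(0)
# Source system: labeled data (0 = normal, 1 = anomaly), synthetic stand-in.
X_src = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y_src = np.array([0] * 100 + [1] * 100)
# Target system: unlabeled data with a slightly shifted distribution.
X_tgt = np.vstack([rng.normal(-1.5, 1, (100, 2)), rng.normal(2.5, 1, (100, 2))])

w_pre = train_logreg(X_src, y_src)                  # 1. pre-train on source
probs = predict_proba(X_tgt, w_pre)
keep, pseudo = select_pseudo_labels(probs, 0.9)     # 2. build pseudo-labels
w_ft = train_logreg(X_tgt[keep], pseudo,            # 3. fine-tune on target
                    w=w_pre.copy(), epochs=50)
```

Raising the threshold keeps fewer but cleaner pseudo-labels, while lowering it keeps more target samples at the risk of fine-tuning on wrong labels.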
Keywords
Anomaly Detection,Training Data,F1 Score,System Reliability,Transfer Learning,Unsupervised Clustering,Unsupervised Methods,Target Domain,PageRank,Target System,Source Domain,Huge Losses,Valuable Source Of Information,Complex Software,Unsupervised Clustering Method,System Logs,Anomaly Detection Methods,Model Parameters,Model Performance,Large Amount Of Data,Pseudo Labels,Word Embedding,Fine-tuned Model,Operation And Maintenance,LSTM Model,Recall Rate,Higher Threshold,Detection Performance,System Anomalies,Logarithm Of The Number