MAYOR: Machine Learning and Analytics for Automated Operations and Recovery

Rashid Mijumbi, Abhaya Asthana, Carlos Bernal, Manuel Castejón

2019 28th International Conference on Computer Communication and Networks (ICCCN) (2019)

Abstract
Communications systems continuously generate a large number of alarms. These alarms are usually monitored by network operations centers (NOCs), from which steps to resolve their causes are launched either automatically or through a ticketing system. To respond to a practical number of alarms in real time, automation is a must. The problem is more acute in virtualized infrastructure, where the number of alarm-generating entities is significantly higher because both physical and virtual functions must be monitored. In this paper, we propose MAYOR: a suite of machine learning and analytics algorithms for automated operations and recovery. MAYOR is made up of a model generation entity, which uses long-term historical data to determine alarm persistence times, clusters, and patterns. To this end, we model alarm persistence time as a normal distribution and use the resulting cumulative distribution function to determine the time with an appropriate confidence. Moreover, we use sequential pattern mining and linear correlation to create alarm clusters. Finally, decision trees are used to create patterns between alarms as association rules. In addition, the system has an adaptation entity that uses real-time alarms to perform short-term adaptations. MAYOR has been implemented and evaluated using real telecommunications network alarm data as well as NOC settings. Evaluations show that the proposed persistence times can reduce 20% of the static ones by at least 80%, and that at least 23% of alarms can be predicted 1 hour before they appear with an accuracy of at least 80%.
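As an illustration of the persistence-time idea described in the abstract, the following is a minimal sketch, assuming that historical clear times for one alarm type are available as a list of durations in seconds. The function name `persistence_threshold`, the sample data, and the 95% confidence level are hypothetical choices made for this example; the abstract does not specify how MAYOR fits the distribution or which confidence level it uses.

```python
from statistics import NormalDist


def persistence_threshold(durations_s, confidence=0.95):
    """Estimate a persistence time from historical alarm durations.

    Fits a normal distribution to the observed durations and returns the
    duration at which the fitted CDF reaches `confidence`. Under the normal
    assumption, that fraction of alarms is expected to have cleared by then.
    This is an illustrative sketch, not the paper's implementation.
    """
    if len(durations_s) < 2:
        raise ValueError("need at least two duration samples to fit a distribution")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")

    fit = NormalDist.from_samples(durations_s)
    if fit.stdev == 0:
        # All samples identical: the inverse CDF is undefined, so fall back
        # to the common value.
        return fit.mean

    # inv_cdf is the quantile function of the fitted normal distribution.
    # A normal fit can yield negative values for low quantiles; clamp at zero
    # because a persistence time cannot be negative.
    return max(0.0, fit.inv_cdf(confidence))


# Hypothetical history: seconds until each past occurrence of an alarm cleared.
history = [42.0, 55.0, 38.0, 61.0, 47.0, 52.0, 44.0, 58.0]
threshold = persistence_threshold(history, confidence=0.95)
print(f"wait about {threshold:.1f} s before escalating this alarm type")
```

A higher confidence value gives a longer wait before an alarm is escalated, which trades slower reaction for fewer tickets raised on alarms that would have cleared on their own.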
Keywords
MAYOR, machine learning, automated operations, model generation entity, alarm persistence times, model alarm persistence time, resulting cumulative distribution function, alarm clusters, realtime alarms, telecommunications network alarm data, communications systems, big number, network operations centers, ticketing system, NOC