Big data analysis of cloud storage logs using Spark

SYSTOR 2017

Abstract
We use Apache Spark analytics to investigate the logs of an operational cloud object store service to understand how it is being used. This investigation involves retroactively going over very large amounts of historical data (PBs of records in some cases) collected over long periods of time. Existing tools, such as Elasticsearch-Logstash-Kibana (ELK), are mainly used for presenting short-term metrics and cannot perform advanced analytics such as machine learning. A possible solution is to retain for long periods only certain aggregations or calculations produced from the raw log data, such as averages or histograms; however, these must be decided in advance and cannot be changed retroactively, since the raw data has already been discarded. Spark allows us to gain insights by going over historical data collected over long periods of time and to apply the historical models to online data in a simple and efficient way.
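To illustrate the kind of analysis the abstract describes, the following is a minimal PySpark sketch, not the paper's actual pipeline: it assumes a hypothetical log schema with fields such as "op", "bytes", and "timestamp" and hypothetical storage paths, and shows how aggregations (counts, averages, percentiles) can be computed retroactively from raw records rather than being fixed in advance.

```python
# Minimal PySpark sketch: retroactive aggregation over raw object-store logs.
# The log schema (op, bytes, timestamp) and the s3a:// paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("object-store-log-analysis").getOrCreate()

# Load raw historical log records (e.g., JSON lines) from long-term storage.
logs = spark.read.json("s3a://logs-bucket/objectstore/*.json")

# Because the raw records are retained, the aggregation can be chosen after the
# fact, e.g., hourly request counts and byte-size statistics per operation type.
hourly = (
    logs
    .withColumn("hour", F.date_trunc("hour", F.col("timestamp")))
    .groupBy("hour", "op")
    .agg(
        F.count("*").alias("requests"),
        F.avg("bytes").alias("avg_bytes"),
        F.expr("percentile_approx(bytes, array(0.5, 0.95, 0.99))").alias("bytes_pcts"),
    )
)

# Persist the derived summary; the raw logs remain available for new questions.
hourly.write.mode("overwrite").parquet("s3a://analytics-bucket/hourly_summary/")
```

A summary table like this can also serve as input for the kind of historical model the abstract mentions, which is then applied to incoming online data.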