Proactive experiment-driven learning for system management (2007)

Abstract
The overall behavior of a system depends on a large number of factors related to the underlying hardware, system software, and running applications. In addition, system behavior may be influenced by interactions among these factors, where the impact of an individual factor on a system depends on the settings of other factors. A 'system knowledge base' that captures how different factors and multifactor interactions affect the end-to-end behavior of a system is a prerequisite for managing systems effectively. This dissertation addresses the hypothesis that we can learn such a knowledge base in an automatic, proactive, and timely manner by planning and conducting experiments. An experiment is a run of the system for a specific setting of the system's workload, resource allocation, and configuration.
In this dissertation, we develop a general experiment-driven framework that incorporates: (a) policies for automatic planning of experiments to explore a large space of factors and interactions efficiently; and (b) mechanisms to conduct experiments for three important system domains: Web services, batch computing, and storage servers. The policies and mechanisms leverage techniques from design of experiments, active machine learning, and system virtualization to build a sufficiently accurate system knowledge base quickly.
The dissertation makes the following contributions:
(1) Quantifies the linear and nonlinear impact of a factor or an interaction on system behavior, and develops experiment-planning algorithms to estimate the impact of important factors and interactions in a system. We use this work to rank the factors and interactions that can affect the performance (e.g., throughput) of multitier Web services.
(2) Develops experiment-planning algorithms to build models that predict the system behavior as a function of factors and interactions that affect this behavior. We explore a continuum of modeling alternatives ranging from a priori models to black-box models. We learn models to enable task and data placement of batch computing applications, and to predict performance measures of Web services like response time and throughput.
(3) Develops policies to determine how long to run an experiment and how many times to repeat an experiment to attain target levels of confidence and accuracy in experimental results at low cost. We use the policies to benchmark storage servers by systematically mapping a storage server's saturation throughput across a range of server workloads and configurations.
Our empirical evaluation with real and synthetic applications on physical as well as virtual hardware resources shows that our experiment-driven framework can learn an effective knowledge base by conducting only 1-5% of the total number of possible experiments.
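To make the notion of a factor interaction concrete, the following is a minimal sketch of how main effects and an interaction effect can be estimated from a two-level full factorial design, a standard technique from design of experiments. It is not the dissertation's experiment-planning algorithm. The two factors (called A and B here), the function name `factorial_effects`, and the throughput numbers are all hypothetical and chosen only for illustration.

```python
from itertools import product


def factorial_effects(response):
    """Estimate effects from a 2^2 full factorial design.

    `response` maps a setting (a, b), with a and b each coded as -1 (low)
    or +1 (high), to the measured system behavior for that experiment.
    """
    runs = list(product((-1, 1), repeat=2))  # all four factor settings
    half = len(runs) / 2
    # Each effect is the mean response at the +1 level minus the mean at -1.
    effect_a = sum(a * response[(a, b)] for a, b in runs) / half
    effect_b = sum(b * response[(a, b)] for a, b in runs) / half
    # The interaction contrast uses the product of the two coded levels.
    effect_ab = sum(a * b * response[(a, b)] for a, b in runs) / half
    return {"A": effect_a, "B": effect_b, "AB": effect_ab}


# Hypothetical throughput (requests/s) for each setting of factors A and B.
throughput = {
    (-1, -1): 100.0,
    (1, -1): 120.0,
    (-1, 1): 110.0,
    (1, 1): 170.0,
}

effects = factorial_effects(throughput)
print(effects)  # {'A': 40.0, 'B': 30.0, 'AB': 20.0}
```

In this made-up data, raising A adds 20 requests/s when B is low but 60 when B is high, so the impact of A depends on the setting of B. The nonzero AB term captures exactly that dependence. A full factorial design needs every combination of settings, which grows exponentially with the number of factors; this is the cost that motivates running only a small fraction of the possible experiments.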
Keywords
system virtualization, storage server, system software, system knowledge base, accurate system knowledge base, overall behavior, Proactive experiment-driven, web service, important system domain, system behavior, system management, end-to-end behavior