Fault-tolerant Scheduler for Shareable Virtualized GPU Resource
Poster at the International Conference for High Performance Computing, Networking, Storage and Analysis (SC16), 2016
Abstract
Container-based virtualization has recently come into wide use, alongside traditional virtual machines, to maximize the utilization of computing resources. Unlike traditional resources, however, a GPU has been hard to share among multiple containers. More recently, a GPU can be shared by multiple containers using the volume-sharing feature. In addition, high-end GPUs such as the NVIDIA K20 support Hyper-Q, which allows multiple CPU processes to access a single GPU. Problems nevertheless remain because of the GPU's distinctive characteristics. Unlike system memory, GPU memory cannot be swapped. Also, a GPU kernel running on a single Streaming Multiprocessor cannot be switched out while it is executing. These restrictions make it hard for multiple containers to share a GPU and may result in deadlock. In this paper, we propose an interface for a new fault-tolerant scheduler that takes GPU memory usage into account. We have implemented this interface to restrict each container's GPU memory usage and thereby prevent deadlock.
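The abstract does not describe the interface itself, so the following is only a minimal sketch of the general idea of capping GPU memory per container. Every name here (`GpuMemoryLimiter`, `register`, `allocate`, `free`) is hypothetical and is not the authors' API, and the 5 GiB capacity is an arbitrary example value. The sketch assumes two rules: the per-container limits may never add up to more than the device capacity, since GPU memory cannot be swapped, and a request over the limit is rejected immediately rather than queued, so no container waits while holding memory.

```python
class GpuMemoryLimiter:
    """Hypothetical per-container GPU memory accounting (illustrative only)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.limits = {}  # container id -> maximum bytes it may hold
        self.used = {}    # container id -> bytes currently held

    def register(self, container, limit_bytes):
        # Sum the limits of all other containers, so re-registering
        # a container replaces its old limit instead of adding to it.
        others = sum(v for k, v in self.limits.items() if k != container)
        # Refuse limits that would oversubscribe the device.
        if others + limit_bytes > self.capacity:
            raise ValueError("limits would exceed GPU memory capacity")
        self.limits[container] = limit_bytes
        self.used.setdefault(container, 0)

    def allocate(self, container, nbytes):
        # Fail fast instead of blocking when the container's limit is reached.
        if self.used[container] + nbytes > self.limits[container]:
            return False
        self.used[container] += nbytes
        return True

    def free(self, container, nbytes):
        # Never let the accounted usage drop below zero.
        self.used[container] = max(0, self.used[container] - nbytes)


GIB = 1024 ** 3
limiter = GpuMemoryLimiter(capacity_bytes=5 * GIB)
limiter.register("container-a", 3 * GIB)
limiter.register("container-b", 2 * GIB)
print(limiter.allocate("container-a", 2 * GIB))  # fits within the 3 GiB limit
print(limiter.allocate("container-a", 2 * GIB))  # would exceed the limit
```

In a real system the accounting would have to sit between the container and the GPU driver's allocation calls; this sketch only shows the bookkeeping.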