Tailors: Accelerating Sparse Tensor Algebra by Overbooking Buffer Capacity

56th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023

Abstract
Sparse tensor algebra is a challenging class of workloads to accelerate due to low arithmetic intensity and varying sparsity patterns. Prior sparse tensor algebra accelerators have explored tiling sparse data to increase exploitable data reuse and improve throughput, but they typically size tiles for a given buffer based on the worst-case data occupancy. This severely limits the utilization of available memory resources and reduces data reuse. Other accelerators employ complex tiling during preprocessing or at runtime to determine the exact tile size based on its occupancy. This paper proposes a speculative tensor tiling approach, called overbooking, that improves buffer utilization by exploiting the distribution of nonzero elements in sparse tensors to construct larger tiles with greater data reuse. To ensure correctness, we propose a low-overhead hardware mechanism, Tailors, that tolerates data overflow by design while preserving reasonable data reuse. We demonstrate that Tailors can be easily integrated into the memory hierarchy of an existing sparse tensor algebra accelerator. To ensure high buffer utilization with minimal tiling overhead, we introduce a statistical approach, Swiftiles, that picks a tile size so that tiles usually fit within the buffer's capacity but can occasionally overflow, i.e., it overbooks the buffer. Across a suite of 22 sparse tensor algebra workloads, we show that our proposed overbooking strategy achieves an average speedup of 52.7x and 2.3x, and an average energy reduction of 22.5x and 2.5x, over ExTensor without and with optimized tiling, respectively.
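To make the Swiftiles idea described above concrete, the following is a minimal Python sketch of percentile-based tile sizing over a sampled occupancy distribution. The function name, candidate tile sizes, and 90% fit target are illustrative assumptions, not the paper's actual algorithm, which operates on compressed tensor formats in hardware.

import numpy as np
import scipy.sparse as sp

def pick_tile_size(matrix, buffer_capacity, target_fit_fraction=0.9,
                   candidate_sizes=(64, 128, 256, 512), num_samples=100, seed=0):
    # Hypothetical sketch: sample square tiles of a CSR matrix at random
    # positions and record how many nonzeros each holds. The largest
    # candidate whose sampled occupancy at the target percentile still
    # fits the buffer is chosen, so most tiles fit and the occasional
    # overflow is left to the overflow-tolerant hardware to absorb.
    rng = np.random.default_rng(seed)
    rows, cols = matrix.shape
    best = None
    for t in candidate_sizes:
        occupancies = []
        for _ in range(num_samples):
            r = int(rng.integers(0, max(rows - t, 1)))
            c = int(rng.integers(0, max(cols - t, 1)))
            occupancies.append(matrix[r:r + t, c:c + t].nnz)
        if np.quantile(occupancies, target_fit_fraction) <= buffer_capacity:
            best = t
    return best

For example, pick_tile_size(sp.random(10000, 10000, density=1e-3, format="csr"), buffer_capacity=4096) returns the largest sampled tile size whose 90th-percentile occupancy fits a 4096-element buffer, mirroring the abstract's notion of overbooking rather than sizing for the worst case.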
Keywords
Buffering Capacity, Sparse Tensor, Tensor Algebra, High Use, Energy Reduction, Data Reuse, Sparsity Pattern, Tile Size, Memory Hierarchy, Larger Size, Sample Distribution, Data Streams, Percentage Of Data, Position In Space, System Of Linear Equations, Low Occupancy, Uniform Shape, Low Overhead, Sparse Feature, Sparse Distribution, Maximum Occupancy, Tensor Of Size, Data Buffer, Compressed Format, Replacement Policy, High Sparsity, Einstein Summation, Runtime Cost, Race Conditions, Uncompressed