Concise loads and stores: The case for an asymmetric compute-memory architecture for approximation.

MICRO-49: The 49th Annual IEEE/ACM International Symposium on Microarchitecture Taipei Taiwan October, 2016(2016)

引用 60|浏览173
暂无评分
摘要
Cache capacity and memory bandwidth play critical roles in application performance, particularly for data-intensive applications from domains that include machine learning, numerical analysis, and data mining. Many of these applications are also tolerant to imprecise inputs and have loose constraints on the quality of output, making them ideal candidates for approximate computing. This paper introduces a novel approximate computing technique that decouples the format of data in the memory hierarchy from the format of data in the compute subsystem to significantly reduce the cost of storing and moving bits throughout the memory hierarchy and improve application performance. This asymmetric compute-memory extension to conventional architectures, ACME, adds two new instruction classes to the ISA - load-concise and store-concise - along with three small functional units to the micro-architecture to support these instructions. ACME does not affect exact execution of applications and comes into play only when concise memory operations are used. Through detailed experimentation we find that ACME is very effective at trading result accuracy for improved application performance. Our results show that ACME achieves a 1.3X speedup (up to 1.8X) while maintaining 99% accuracy, or a 1.1X speedup while maintaining 99.999% accuracy. Moreover, our approach incurs negligible area and power overheads, adding just 0.005% area and 0.1% power to a conventional modern architecture.
更多
查看译文
关键词
asymmetric compute-memory architecture,cache capacity,memory bandwidth,machine learning,numerical analysis,data mining,approximate computing technique,ACME
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要