Boosting Domain-Specific Debug Through Inter-frame Compression

2022 International Conference on Field-Programmable Technology (ICFPT)

Abstract
Acceleration of machine learning models is proving to be an important application for FPGAs. Unfortunately, debugging such models during training or inference is difficult. Software simulations of a machine learning system may lack the detail needed to provide meaningful debug insight, or may require infeasibly long run-times. Thus, it is often desirable to debug the accelerated model while it is running on real hardware. Effective on-chip debug often requires instrumenting a design with additional circuitry to store run-time data, consuming valuable chip resources. Previous work has developed methods to perform lossy compression of signals by exploiting machine-learning-specific knowledge, thereby increasing the amount of debug context that can be stored in an on-chip trace buffer. However, all prior work compresses each successive element in a signal of interest independently. Since debug signals in many machine learning applications may have temporal similarity, there is an opportunity to further increase trace buffer utilization. In this paper, we present an architecture that performs lossless temporal compression in addition to the existing lossy elementwise compression. We show that, when applied to a typical machine learning algorithm in realistic debug scenarios, we are able to store twice as much information in an on-chip buffer while increasing the total area of the debug instrument by approximately 25%. The impact is that, for a given instrumentation budget, a significantly larger trace window is available during debug, possibly allowing a designer to narrow down the root cause of a bug faster.
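The abstract describes the idea of lossless temporal (inter-frame) compression but not the concrete encoding used in the hardware. The following is a minimal Python sketch of one plausible inter-frame scheme, assuming trace frames are fixed-length vectors of signal values and that a simple changed-element delta stands in for whatever lossless temporal encoding the debug instrument actually implements; the function names and frame representation are illustrative, not from the paper.

```python
# Illustrative sketch only: the frame layout, delta scheme, and function
# names below are assumptions, not the architecture from the paper.

from typing import List, Tuple, Union

Frame = List[int]
FullRecord = Tuple[str, Frame]                    # ("full", frame values)
DeltaRecord = Tuple[str, List[Tuple[int, int]]]   # ("delta", (index, value) pairs)
Record = Union[FullRecord, DeltaRecord]

def compress_trace(frames: List[Frame]) -> List[Record]:
    """Store the first frame verbatim, then record only the elements that
    changed relative to the previous frame. Temporally similar frames
    produce short delta records, so more history fits in a fixed buffer."""
    if not frames:
        return []
    records: List[Record] = [("full", list(frames[0]))]
    for prev, curr in zip(frames, frames[1:]):
        diffs = [(i, v) for i, (p, v) in enumerate(zip(prev, curr)) if p != v]
        records.append(("delta", diffs))
    return records

def decompress_trace(records: List[Record]) -> List[Frame]:
    """Invert compress_trace exactly: the temporal stage is lossless."""
    frames: List[Frame] = []
    for kind, payload in records:
        if kind == "full":
            frames.append(list(payload))
        else:
            frame = list(frames[-1])
            for i, v in payload:
                frame[i] = v
            frames.append(frame)
    return frames

if __name__ == "__main__":
    # Successive frames differ in only one element, mimicking the temporal
    # similarity of a debug signal; the delta records stay small.
    trace = [[1, 2, 3, 4], [1, 2, 3, 5], [1, 9, 3, 5]]
    packed = compress_trace(trace)
    assert decompress_trace(packed) == trace
    print(packed)
```

In this toy version, the buffer savings come directly from temporal similarity: the fewer elements that change between frames, the shorter each delta record, which mirrors the paper's claim that exploiting inter-frame redundancy lets a fixed-size trace buffer hold a longer trace window.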
Keywords
Field-Programmable Gate Arrays, Debug, Instrumentation