XFC: Enabling automatic and fast operator synthesis for mobile deep learning compilation

Journal of Systems Architecture (2023)

Abstract
Deploying a deep learning model relies on highly optimized implementations of all tensor operators in the model, especially on mobile devices with limited hardware resources. To relieve the burden of manual optimization, researchers have proposed deep learning compilers that automatically optimize operators with auto-tuning and code-generation techniques. An auto-tuning system typically constructs a large tuning space, from which various tensor programs are sampled and evaluated to find the best implementation. Unfortunately, this process is quite time-consuming, often requiring hours for a single operator. To address this issue, this paper presents XFC, a framework that enables fast performance tuning through operator synthesis. The key idea of XFC is to abstract the hand-tuning process and generate tensor programs by bottom-up, hierarchical construction. We implemented XFC on top of TVM and conducted extensive experiments to verify its effectiveness. The experiments show that, across various operators and mobile devices, XFC reduces tuning time from hours to seconds (over 700× speedup) while maintaining comparable execution performance for the compiled operators (11.5% performance loss on average).
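The abstract contrasts XFC with the conventional auto-tuning loop: sample candidate tensor programs from a large tuning space, measure each one on the target device, and keep the fastest. The following is a minimal illustrative sketch of that baseline loop; the function names and the toy cost model are assumptions for illustration, not XFC's or TVM's actual API.

```python
import random

def auto_tune(tuning_space, measure, num_trials=200, seed=0):
    """Randomly sample candidates from the tuning space, evaluate each
    with `measure` (lower is better), and return the best one found.
    This mimics the sample-and-evaluate loop described in the abstract."""
    rng = random.Random(seed)
    best, best_time = None, float("inf")
    for _ in range(num_trials):
        candidate = rng.choice(tuning_space)  # sample a tensor program variant
        t = measure(candidate)                # evaluate on the device (simulated)
        if t < best_time:
            best, best_time = candidate, t
    return best, best_time

# Toy example: the "tuning space" is a set of tile sizes, and the
# simulated cost model favors a tile size of 32.
space = [4, 8, 16, 32, 64, 128]
cost = lambda tile: abs(tile - 32) + 1.0
best_tile, best_cost = auto_tune(space, cost)
```

Because each trial requires compiling and running a real tensor program on the device, thousands of such measurements are what drive the hours-long tuning times the paper targets; XFC's synthesis approach avoids this search entirely.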
Keywords
Code generation and synthesis, Compiler techniques and optimizations, Deep learning systems