BitNet: Scaling 1-bit Transformers for Large Language Models

Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Huaijie Wang, Lingxiao Ma, Fan Yang, Ruiping Wang, Yi Wu, Furu Wei

arXiv (Cornell University), 2023

Abstract
The increasing size of large language models has posed challenges for deployment and raised concerns about environmental impact due to high energy consumption. In this work, we introduce BitNet, a scalable and stable 1-bit Transformer architecture designed for large language models. Specifically, we introduce BitLinear as a drop-in replacement of the nn.Linear layer in order to train 1-bit weights from scratch. Experimental results on language modeling show that BitNet achieves competitive performance while substantially reducing memory footprint and energy consumption, compared to state-of-the-art 8-bit quantization methods and FP16 Transformer baselines. Furthermore, BitNet exhibits a scaling law akin to full-precision Transformers, suggesting its potential for effective scaling to even larger language models while maintaining efficiency and performance benefits.
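The abstract describes BitLinear as a drop-in replacement for nn.Linear that trains 1-bit weights from scratch. Below is a minimal PyTorch sketch of what such a layer could look like, assuming zero-mean sign binarization of the weights (rescaled by their mean absolute value), absmax 8-bit activation quantization after a LayerNorm, and a straight-through estimator for gradients; the class and the exact quantization details are illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def _ste(quantized, raw):
    # Straight-through estimator: forward pass uses the quantized value,
    # backward pass routes gradients through the raw full-precision value.
    return raw + (quantized - raw).detach()


class BitLinear(nn.Linear):
    """Illustrative BitLinear-style layer (a sketch, not the official code).

    Weights are binarized to {-1, +1} around their mean and rescaled by the
    mean absolute weight (beta); activations are absmax-quantized to 8 bits
    after a LayerNorm. Both quantizers use the straight-through estimator,
    so the 1-bit weights can be trained from scratch.
    """

    def __init__(self, in_features, out_features, bias=False, eps=1e-5):
        super().__init__(in_features, out_features, bias=bias)
        self.eps = eps
        self.norm = nn.LayerNorm(in_features)

    def forward(self, x):
        w = self.weight

        # 1-bit weight quantization: sign of the zero-centered weights.
        beta = w.abs().mean()
        w_bin = torch.where(w >= w.mean(), torch.ones_like(w), -torch.ones_like(w))
        w_q = _ste(w_bin, w)

        # 8-bit absmax activation quantization after normalization.
        x = self.norm(x)
        q_b = 127.0
        gamma = x.abs().max().clamp(min=self.eps)
        x_scaled = x * q_b / gamma
        x_q = _ste(torch.clamp(torch.round(x_scaled), -q_b, q_b), x_scaled)

        # Matmul with binarized weights, then undo both scaling factors.
        out = F.linear(x_q, w_q, self.bias)
        return out * (beta * gamma / q_b)


# Usage: swap an nn.Linear(512, 512) inside a Transformer block for BitLinear(512, 512).
layer = BitLinear(512, 512)
y = layer(torch.randn(2, 16, 512))  # output shape: (2, 16, 512)
```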
Keywords
large language models, language models, transformers