Continual Pre-Training of Large Language Models: How to (re)warm Your Model?
Kshitij Gupta, Benjamin Thérien, Adam Ibrahim, Mats L. Richter, Quentin Anthony, Eugene Belilovsky, Irina Rish, Timothée Lesort
CoRR (2023)
Cited by 15 | Views 97
Keywords: Topic Modeling, Language Modeling, Sequence-to-Sequence Learning, Statistical Language Modeling