Towards Best Practices for Open Datasets for LLM Training

Stefan Baack, Stella Biderman, Kasia Odrozek, Aviya Skowron, Ayah Bdeir, Jillian Bommarito, Jennifer Ding, Maximilian Gahntz, Paul Keller, Pierre-Carl Langlais, Greg Lindahl, Sebastian Majstorovic, Nik Marda, Guilherme Penedo, Maarten Van Segbroeck, Jennifer Wang, Leandro von Werra, Mitchell Baker, Julie Belião, Kasia Chmielinski, Marzieh Fadaee, Lisa Gutermuth, Hynek Kydlíček, Greg Leppert, EM Lewis-Jong, Solana Larsen, Shayne Longpre, Angela Oduor Lungati, Cullen Miller, Victor Miller, Max Ryabinin, Kathleen Siminyu, Andrew Strait, Mark Surman, Anna Tumadóttir, Maurice Weber, Rebecca Weiss, Lee White, Thomas Wolf

CoRR (2025)
