Towards Best Practices for Open Datasets for LLM Training
Stefan Baack, Stella Biderman, Kasia Odrozek, Aviya Skowron, Ayah Bdeir, Jillian Bommarito, Jennifer Ding, Maximilian Gahntz, Paul Keller, Pierre-Carl Langlais, Greg Lindahl, Sebastian Majstorovic, Nik Marda, Guilherme Penedo, Maarten Van Segbroeck, Jennifer Wang, Leandro von Werra, Mitchell Baker, Julie Belião, Kasia Chmielinski, Marzieh Fadaee, Lisa Gutermuth, Hynek Kydlíček, Greg Leppert, EM Lewis-Jong, Solana Larsen, Shayne Longpre, Angela Oduor Lungati, Cullen Miller, Victor Miller, Max Ryabinin, Kathleen Siminyu, Andrew Strait, Mark Surman, Anna Tumadóttir, Maurice Weber, Rebecca Weiss, Lee White, Thomas Wolf
CoRR (2025)