Reflections on 30 Years of Language Resource Development and Sharing.

International Conference on Language Resources and Evaluation (LREC) (2022)

Abstract
The Linguistic Data Consortium (LDC) was founded in 1992 to solve the problem that limitations in access to shareable data were impeding progress in Human Language Technology research and development. At the time, the US Defense Advanced Research Projects Agency had adopted the common task research management paradigm to impose additional rigor on its programs by providing shared objectives, data and evaluation methods. Early successes underscored the promise of this paradigm but also the need for a standing infrastructure to host and distribute the shared data. During LDC's initial five-year grant, it became clear that the demand for linguistic data could not easily be met by the existing providers and that a dedicated data center could add capacity first for data collection and shortly thereafter for annotation. The expanding purview required expansions of LDC's technical infrastructure including systems support and software development. An open question for the center would be its role in research beyond data development, a question that has since been addressed. Over its 30-year history, LDC has performed multiple roles ranging from neutral, independent data provider to multisite programs, to creator of exploratory data in tight collaboration with system developers, to research group focused on data-intensive investigations.
Keywords
language resources, linguistic data, annotation, data centers, data intensive research