OCR Post Correction for Endangered Language Texts
Empirical Methods in Natural Language Processing (EMNLP), pp. 5931-5942, 2020.
Abstract:
There is little to no data available to build natural language processing models for most endangered languages. However, textual data in these languages often exists in formats that are not machine-readable, such as paper books and scanned images. In this work, we address the task of extracting text from these resources. We create a bench...