OCR Post Correction for Endangered Language Texts

Empirical Methods in Natural Language Processing, pp. 5931-5942, 2020.

Abstract:

There is little to no data available to build natural language processing models for most endangered languages. However, textual data in these languages often exists in formats that are not machine-readable, such as paper books and scanned images. In this work, we address the task of extracting text from these resources. We create a bench...
