Training Schemes for the Transliteration of the Balinese Script Into the Latin Script on Palm Leaf Manuscript Images

2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), 2018

Abstract
Given the importance of the contents of Balinese palm leaf manuscripts, a transliteration system has to be developed so that these manuscripts can be read easily. The challenge comes from the fact that Balinese script is a syllabic script and the mapping between linguistic symbols and images of symbols is not straightforward. In addition, because very little training data is available, some adaptations of LSTM in the transliteration training scheme need to be designed, analyzed, and evaluated. This paper proposes and evaluates several adapted segmentation-free training schemes for the transliteration of the Balinese script into the Latin script from palm leaf manuscript images. We describe the generated synthetic dataset and the proposed training schemes at two levels (word level and text line level) for transliterating real words and text lines from palm leaf manuscript images. For word transliteration, training schemes at word level generally perform better than training schemes at text line level. By comparison, the segmentation-based transliteration method gives a very promising result. For text line transliteration, the segmentation-based method outperforms all segmentation-free training schemes on the less degraded collections, while the segmentation-free training schemes contribute to transliterating the text lines of more degraded manuscripts. Training at text line level with a model pre-trained at word level could give a better result in word transliteration while still keeping the optimal performance for text line transliteration.
Keywords
transliteration, LSTM, synthetic data, Balinese script, palm leaf manuscript images