Punctuation Prediction in Bangla Text

ACM Transactions on Asian and Low-Resource Language Information Processing (2023)

Punctuation prediction is critical because adding appropriate punctuation can significantly enhance the readability of machine-transcribed speech and text. Systems such as automatic speech recognizers (ASR) produce unpunctuated text, which is difficult for humans to read and also hampers the performance of various natural language processing (NLP) tasks. Such NLP tasks have been investigated thoroughly for English; however, very limited work has been done on punctuation prediction for the Bangla language. In this study, we train a bidirectional recurrent neural network (BRNN) with an attention model on a reasonably large Bangla dataset. Afterwards, we apply extensive postprocessing techniques so that the employed model predicts punctuation more accurately. Initially, we experiment with a relatively imbalanced dataset, and our model shows promising results (F1=56.9 for Period) in punctuation prediction. Later, we also investigate the model’s performance using a balanced Bangla dataset to achieve higher performance scores (F1=62.2 for Question). Thus, the goal of this study is to propose an efficient approach that can predict punctuation in Bangla texts effectively. Our study also investigates how our postprocessing techniques affect prediction performance. As an early attempt at punctuation prediction in Bangla text, our work is expected to contribute significantly to the NLP field for the Bangla language and to pave the way for future work in this direction.
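The F1 scores in the abstract are reported per punctuation class (Period, Question). As a minimal sketch of how such a per-class F1 can be computed from token-level labels, the Python snippet below may help; the label names (`O`, `PERIOD`, `QUESTION`), the toy sequences, and the function `per_class_f1` are illustrative assumptions and are not taken from the paper, whose exact evaluation protocol is not described here.

```python
from collections import Counter

def per_class_f1(gold, pred, labels):
    """Return {label: F1} from two parallel lists of per-token labels.

    Hypothetical helper for illustration only; scores are fractions in [0, 1].
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1      # correct prediction for class g
        else:
            fp[p] += 1      # p was predicted but is wrong
            fn[g] += 1      # g was missed
    scores = {}
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the class never occurs
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores

# Toy data (assumed): "O" marks a token followed by no punctuation.
gold = ["O", "O", "PERIOD", "O", "QUESTION", "O", "PERIOD"]
pred = ["O", "PERIOD", "PERIOD", "O", "O", "O", "PERIOD"]
scores = per_class_f1(gold, pred, ["PERIOD", "QUESTION"])
print(scores)
```

Scoring each class separately matters on imbalanced data: a rare class such as Question can score poorly even when overall token accuracy looks high.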
Key words
Neural networks, punctuation prediction, natural language processing, BRNN