Improving the Efficiency of Grammatical Error Correction with Erroneous Span Detection and Correction

In Empirical Methods in Natural Language Processing (EMNLP), pp. 7162–7169, 2020.

Keywords:
novel language-independent approach, span correction, neural machine translation, time cost, text span

Abstract:

We propose a novel language-independent approach to improve the efficiency of Grammatical Error Correction (GEC) by dividing the task into two subtasks: Erroneous Span Detection (ESD) and Erroneous Span Correction (ESC). ESD identifies grammatically incorrect text spans with an efficient sequence tagging model. Then, ESC leverages a seq2seq model that takes the sentence with annotated erroneous spans as input and only outputs the corrected text for those spans. Experiments show the approach performs comparably to conventional seq2seq approaches in both English and Chinese GEC benchmarks with less than 50% time cost for inference.

Introduction
  • Due to the growing number of error-corrected parallel sentences available in recent years, sequence-to-sequence (seq2seq) models with the encoder-decoder architecture (Bahdanau et al., 2014; Sutskever et al., 2014; Luong et al., 2015) have become a popular solution to GEC; these models take the source sentence as input and output the corrected target sentence.
  • In contrast to conventional seq2seq approaches that correct the complete sentence, ESC only corrects the erroneous spans (see Figure 1(b)), which greatly reduces the number of decoding steps (a minimal sketch of this two-stage pipeline follows this list).
  • Experiments on both English and Chinese GEC benchmarks demonstrate that the approach performs comparably to a state-of-the-art Transformer-based seq2seq model at less than 50% of its inference time cost.
  • The authors' approach also offers more flexibility to control correction behavior, allowing the precision-recall trade-off to be adapted to various application scenarios.
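To make the two-stage pipeline concrete, below is a minimal, self-contained Python sketch of the glue logic it implies: ESD's per-token 0/1 tags are grouped into spans, the spans are wrapped with markers to form the ESC model's input, and the span-level corrections are spliced back into the source. The marker format (<e1>…</e1>) and the hard-coded tagger/decoder outputs are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the ESD + ESC pipeline glue logic (illustrative only).
# The span-marker format and the hard-coded model outputs are assumptions.

from typing import List, Tuple


def extract_spans(tags: List[int]) -> List[Tuple[int, int]]:
    """Group per-token 0/1 error tags into (start, end) spans, end exclusive."""
    spans, start = [], None
    for i, t in enumerate(tags + [0]):          # sentinel 0 closes a trailing span
        if t == 1 and start is None:
            start = i
        elif t == 0 and start is not None:
            spans.append((start, i))
            start = None
    return spans


def annotate(tokens: List[str], spans: List[Tuple[int, int]]) -> str:
    """Wrap each erroneous span with markers; this string is the ESC model's input."""
    out, prev = [], 0
    for k, (s, e) in enumerate(spans, 1):
        out += tokens[prev:s] + [f"<e{k}>"] + tokens[s:e] + [f"</e{k}>"]
        prev = e
    return " ".join(out + tokens[prev:])


def apply_corrections(tokens: List[str], spans: List[Tuple[int, int]],
                      corrections: List[str]) -> str:
    """Splice ESC's span-level corrections back into the source sentence."""
    out, prev = [], 0
    for (s, e), fix in zip(spans, corrections):
        out += tokens[prev:s] + fix.split()      # an empty correction means deletion
        prev = e
    return " ".join(out + tokens[prev:])


if __name__ == "__main__":
    # Example from Figure 1 of the paper.
    tokens = "As I'm new to here, I'm lost and don't know where is to my hotel.".split()
    tags = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]    # would come from the ESD tagger
    spans = extract_spans(tags)                              # [(3, 4), (11, 15)]
    print(annotate(tokens, spans))                           # input to the ESC seq2seq model
    corrections = ["", "my hotel is."]                       # would come from the ESC model
    print(apply_corrections(tokens, spans, corrections))
    # -> As I'm new here, I'm lost and don't know where my hotel is.
```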
Highlights
  • Due to the growing number of error-corrected parallel sentences available in recent years, sequence-to-sequence models with the encoder-decoder architecture (Bahdanau et al., 2014; Sutskever et al., 2014; Luong et al., 2015) have become a popular solution to Grammatical Error Correction (GEC); these models take the source sentence as input and output the corrected target sentence
  • We propose a simple yet novel language-independent approach to improve the efficiency of GEC by dividing the task into two subtasks: Erroneous Span Detection (ESD) and Erroneous Span Correction (ESC)
  • In Erroneous Span Detection (ESD), we use an efficient sequence tagging model to identify the text spans that are grammatically incorrect in the source sentence, as Figure 1(a) shows
  • In contrast to conventional seq2seq approaches that correct the complete sentence, Erroneous Span Correction (ESC) only corrects the erroneous spans (see Figure 1(b)), which greatly reduces the number of decoding steps
  • Compared to the seq2seq implementation in PyTorch fairseq, our approach saves over 50% of the time cost
  • Experiments on both English and Chinese GEC benchmarks demonstrate that our approach performs comparably to a state-of-the-art Transformer-based seq2seq model at less than 50% of its inference time cost
  • We propose a novel language-independent approach to improve the efficiency of GEC
Methods
  • Experimental setting: Following recent work in English GEC, the authors conduct experiments in the same setting as the restricted track of the BEA-2019 GEC shared task (Bryant et al., 2019), using FCE (Yannakoudakis et al., 2011), the Lang-8 Corpus of Learner English (Mizumoto et al., 2011), NUCLE (Dahlmeier et al., 2013) and W&I+LOCNESS (Granger, 1998; Bryant et al., 2019) as training data.
  • For Chinese GEC, the authors follow the setting of the NLPCC-2018 Chinese GEC shared task (Zhao et al., 2018), using its official training and evaluation datasets.
  • For ESC, the authors train a Transformer model (Vaswani et al., 2017) with an encoder-decoder shared vocabulary of 32K Byte Pair Encoding (Sennrich et al., 2015) tokens for English and 8.4K Chinese characters for Chinese; the ESD models are fine-tuned from RoBERTa (see Table 3). A hedged sketch of such a tagger follows this list.
  • The authors include more details of the models, training and inference in the Appendix.
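As a rough illustration of the ESD side, the sketch below builds a binary token tagger on top of RoBERTa (which, per Table 3, the ESD models are fine-tuned from) using the Hugging Face transformers library. The library choice, the sub-word-to-word aggregation and the 0.5 threshold are assumptions, not the authors' code, and the model would still need fine-tuning on error-detection data before its tags are meaningful.

```python
# Hedged sketch of an ESD-style binary tagger built on RoBERTa token classification.
# Library choice, sub-word aggregation and threshold are illustrative assumptions;
# the model must be fine-tuned on error-detection data before use.

import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base", add_prefix_space=True)
model = AutoModelForTokenClassification.from_pretrained("roberta-base", num_labels=2)
model.eval()


def tag_words(words, threshold=0.5):
    """Return one 0/1 tag per input word (1 = inside an erroneous span)."""
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = model(**enc).logits[0].softmax(-1)[:, 1]    # P(erroneous) per sub-word
    tags, seen = [], set()
    for idx, wid in enumerate(enc.word_ids(0)):             # map sub-words back to words
        if wid is not None and wid not in seen:
            seen.add(wid)
            tags.append(int(probs[idx] > threshold))        # threshold tunes precision/recall
    return tags


print(tag_words("As I'm new to here , I'm lost .".split()))
```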
Results
  • Although non-autoregressive models such as the Levenshtein Transformer and LaserTagger are faster than the seq2seq baseline, their performance is not satisfactory.
  • Among the models without pretraining, the approach is the only one that performs comparably to the Seq2seq baseline while offering faster inference.
  • Table 2 compares the inference time of the various approaches (a rough timing-harness sketch follows this list).
  • Compared to the Seq2seq implementation in PyTorch fairseq, the approach saves over 50% of the time cost.
  • Notably, among the implementations in Table 2, LaserTagger is the most efficient, although its results are not good enough.
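Since the comparison rests on total wall-clock inference time over the test set at different batch sizes (Table 2), here is a rough timing harness under stated assumptions (PyTorch, a user-supplied generate_fn). It is not the paper's measurement script, only a sketch of how such numbers can be collected.

```python
# Rough timing harness for comparing total inference time at different batch sizes,
# in the spirit of Table 2. The generate_fn callable is a placeholder assumption.

import time
import torch


def time_inference(generate_fn, sentences, batch_size):
    """Run generate_fn over the whole test set and return total wall-clock seconds."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()                 # make sure pending GPU work is flushed
    start = time.perf_counter()
    for i in range(0, len(sentences), batch_size):
        generate_fn(sentences[i:i + batch_size]) # e.g. ESD tagging followed by ESC decoding
    if torch.cuda.is_available():
        torch.cuda.synchronize()                 # wait for the last batch to finish
    return time.perf_counter() - start


# Usage sketch: compare batch sizes 1/8/16/32 for the same model, as in Table 2.
# for bs in (1, 8, 16, 32):
#     print(bs, time_inference(my_generate, conll14_sentences, bs))
```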
Conclusion
  • The authors propose a novel language-independent approach to improve the efficiency of GEC.
  • The authors' approach performs comparably to the state-of-the-art seq2seq model with a considerable reduction in inference time; it can be adapted to other languages and offers more flexibility to control correction behavior.
  • Through the experiments on GEC, the authors verify the feasibility of span-specific decoding, which has also been explored for text infilling (Raffel et al., 2019) and text rewriting.
  • It is promising to generalize this idea to more rewriting tasks, which will be studied in future work.
Tables
  • Table1: Performance on English GEC benchmarks (i.e., CoNLL-14 and the BEA-19 test set). Seq2seq is our implemented seq2seq model based on the Transformer (big) architecture, which is also the baseline for the speed comparison (i.e., Faster/Slower in the table). The column Pretrained indicates whether the model is pretrained with synthetic or additional (e.g., Wikipedia revision logs) error-corrected data. Marked models are implemented by us with the released code of the original papers, and trained and evaluated in the BEA-19 setting. Underlined scores are evaluated by us for the released models on the BEA-19 test data
  • Table2: Performance and total inference time of models without pretraining under various batch sizes (1/8/16/32) using one Nvidia V100 GPU with CUDA 10.2 on the English (CoNLL-14: 1,312 sentences) and Chinese (NLPCC-18: 2,000 sentences) GEC test sets. The top group of models is implemented in PyTorch, while the bottom group is implemented in TensorFlow, so their inference times are not directly comparable. The performance of PIE on CoNLL-14 is not reported because it is pretrained with synthetic data and thus unfair to compare here. PIE also has no result on NLPCC-18 because it is specific to English and difficult to generalize to other languages
  • Table3: In-depth time cost analysis (in seconds) on CoNLL-14, which contains 1,312 test sentences. (base) and (large) indicate that the ESD models are fine-tuned from the RoBERTa base and large models, respectively. ESD + seq2seq is implemented as follows: ESD first identifies the sentences that have grammatical errors, and then the seq2seq baseline model corrects only these sentences. The column #Sent for ESC/seq2seq shows the actual number of sentences ESC/seq2seq processed. For time costs given in brackets such as (23+114), the first term (e.g., 23) is the time taken by the ESD model and the second term (e.g., 114) is the time taken by the other parts
  • Table4: As the probability threshold of ESD increases, precision increases while recall drops in CoNLL-14 (a small thresholding sketch follows this table list)
  • Table5: Statistics of the datasets used for pretraining, fine-tuning and evaluation
  • Table6: Hyper-parameter values of ESD during pretraining and fine-tuning
  • Table7: Hyper-parameter values of ESC during pretraining and fine-tuning
  • Table8: The performance of ESD on the two official annotations for the CoNLL-14 shared task test dataset
  • Table9: Examples of our ESD & ESC approach in English GEC. ESD first detects the grammatically incorrect text spans in the source sentence. Then the sentence with the erroneous span annotations (the Annotation row) is fed into the ESC model to generate the corresponding corrections (the Correction row) for the annotated spans. Finally, we replace the erroneous spans with the corresponding corrected text from ESC's outputs (the Final Output row)
  • Table10: Examples of our ESD & ESC approach in Chinese for GEC
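The precision-recall trade-off reported in Table 4 can be pictured with a small hypothetical sketch: ESD span predictions are kept only if their error probability exceeds a threshold, so raising the threshold keeps fewer, more confident spans (higher precision, lower recall). The span scoring format and the gold-span comparison below are assumptions for illustration, not the paper's evaluation code.

```python
# Hypothetical sketch of the threshold / precision-recall trade-off shown in Table 4.
# Span scoring and the gold-span format are illustrative assumptions.

def precision_recall(pred_spans, gold_spans):
    """Span-level precision and recall; spans are (sentence_id, start, end) triples."""
    pred, gold = set(pred_spans), set(gold_spans)
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


def sweep_thresholds(scored_spans, gold_spans, thresholds=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """scored_spans: iterable of ((sentence_id, start, end), error_probability) pairs."""
    for t in thresholds:
        kept = [span for span, p in scored_spans if p >= t]   # keep confident spans only
        p, r = precision_recall(kept, gold_spans)
        print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")


if __name__ == "__main__":
    # Toy example: two gold spans, three predicted spans with different confidences.
    gold = [(0, 3, 4), (0, 11, 15)]
    scored = [((0, 3, 4), 0.95), ((0, 11, 15), 0.65), ((0, 7, 8), 0.55)]
    sweep_thresholds(scored, gold)
```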
Related work
  • Recently, many approaches have been proposed to improve GEC performance. However, except for those adding synthetic erroneous data (Xie et al., 2018; Ge et al., 2018a; Grundkiewicz et al., 2019; Kiyono et al., 2019; Zhou et al., 2019) or Wikipedia revision logs (Lichtarge et al., 2019) for training, most methods increase latency. For example, language model and right-to-left (R2L) rescoring (Grundkiewicz et al., 2019; Kiyono et al., 2019) not only takes time to rescore but also slows down the correction model by requiring a larger beam size during inference; multi-round (iterative) decoding (Ge et al., 2018a,b; Lichtarge et al., 2019) needs to run the model repeatedly; and BERT-fuse (Kaneko et al., 2020) adds extra computation for model fusion.

    In contrast to the extensive studies on GEC performance, little work had focused on improving the efficiency of GEC models until recent years.

    Figure 1: An illustration of the approach on the source sentence "As I'm new to here, I'm lost and don't know where is to my hotel." and the target "As I'm new here, I'm lost and don't know where my hotel is." (a) Detection: a sequence tagging model tags each source token as correct (0) or erroneous (1), e.g., 000100000001111. (b) Correction: an encoder-decoder model takes the sentence with the annotated erroneous spans and generates only the corrections for those spans (e.g., "my hotel is.").
Reference
  • Abhijeet Awasthi, Sunita Sarawagi, Rasna Goyal, Sabyasachi Ghosh, and Vihari Piratla. 2019. Parallel iterative edit models for local sequence transduction. arXiv preprint arXiv:1910.02893.
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
  • Christopher Bryant, Mariano Felice, Øistein E Andersen, and Ted Briscoe. 2019. The bea-2019 shared task on grammatical error correction. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 52–75.
  • Daniel Dahlmeier, Hwee Tou Ng, and Siew Mei Wu. 2013. Building a large annotated corpus of learner english: The nus corpus of learner english. In Proceedings of the eighth workshop on innovative use of NLP for building educational applications, pages 22–31.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • Sergey Edunov, Myle Ott, Michael Auli, and David Grangier. 2018. Understanding back-translation at scale. arXiv preprint arXiv:1808.09381.
  • Tao Ge, Furu Wei, and Ming Zhou. 2018a. Fluency boost learning and inference for neural grammatical error correction. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
  • Tao Ge, Furu Wei, and Ming Zhou. 2018b. Reaching human-level performance in automatic grammatical error correction: An empirical study. arXiv preprint arXiv:1807.01270.
  • Sylviane Granger. 1998. The computer learner corpus: a versatile new source of data for SLA research. na.
  • Roman Grundkiewicz, Marcin Junczys-Dowmunt, and Kenneth Heafield. 2019. Neural grammatical error correction systems with unsupervised pre-training on synthetic data. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 252–263.
  • Jiatao Gu, Changhan Wang, and Junbo Zhao. 2019. Levenshtein transformer. In Advances in Neural Information Processing Systems, pages 11179–11189.
  • Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, and Omer Levy. 2019. SpanBERT: Improving pre-training by representing and predicting spans. arXiv preprint arXiv:1907.10529.
  • Masahiro Kaneko, Masato Mita, Shun Kiyono, Jun Suzuki, and Kentaro Inui. 2020. Encoder-decoder models can benefit from pre-trained masked language models in grammatical error correction. arXiv preprint arXiv:2005.00987.
  • Masahiro Kaneko, Yuya Sakaizawa, and Mamoru Komachi. 2017. Grammatical error detection using error- and grammaticality-specific word embeddings. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers).
  • Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • Shun Kiyono, Jun Suzuki, Masato Mita, Tomoya Mizumoto, and Kentaro Inui. 2019. An empirical study of incorporating pseudo data into grammatical error correction. arXiv preprint arXiv:1909.00502.
  • Jared Lichtarge, Chris Alberti, Shankar Kumar, Noam Shazeer, Niki Parmar, and Simon Tong. 2019. Corpora generation for grammatical error correction. arXiv preprint arXiv:1904.05780.
  • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized BERT pretraining approach. CoRR, abs/1907.11692.
  • Minh-Thang Luong, Hieu Pham, and Christopher D Manning. 2015. Effective approaches to attentionbased neural machine translation. arXiv preprint arXiv:1508.04025.
  • Eric Malmi, Sebastian Krause, Sascha Rothe, Daniil Mirylenka, and Aliaksei Severyn. 2019. Encode, tag, realize: High-precision text editing. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP).
  • Tomoya Mizumoto, Mamoru Komachi, Masaaki Nagata, and Yuji Matsumoto. 2011. Mining revision log of language learning sns for automated japanese error correction of second language learners. In Proceedings of 5th International Joint Conference on Natural Language Processing, pages 147–155.
  • Courtney Napoles, Keisuke Sakaguchi, Matt Post, and Joel Tetreault. 2015. Ground truth for grammatical error correction metrics. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 588–593.
  • Courtney Napoles, Keisuke Sakaguchi, and Joel Tetreault. 2017. Jfleg: A fluency corpus and benchmark for grammatical error correction. arXiv preprint arXiv:1702.04066.
  • Hwee Tou Ng, Siew Mei Wu, Ted Briscoe, Christian Hadiwinoto, Raymond Hendy Susanto, and Christopher Bryant. 2014. The conll-2014 shared task on grammatical error correction. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task, pages 1–14.
  • Kostiantyn Omelianchuk, Vitaliy Atrasevych, Artem Chernodub, and Oleksandr Skurzhanskyi. 2020. Gector–grammatical error correction: Tag, not rewrite. arXiv preprint arXiv:2005.12592.
  • Robert Parker, David Graff, Junbo Kong, Ke Chen, and Kazuaki Maeda. 2011. English Gigaword Fifth Edition. Linguistic Data Consortium.
  • Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683.
  • Marek Rei. 2017. Semi-supervised multitask learning for sequence labeling. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
  • Marek Rei and Helen Yannakoudakis. 2017. Auxiliary objectives for neural error detection models. In Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications.
  • Rico Sennrich, Barry Haddow, and Alexandra Birch. 2015. Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909.
  • Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. In Advances in neural information processing systems, pages 3104–3112.
  • Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. 2016. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2818–2826.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems, pages 5998–6008.
  • Ziang Xie, Guillaume Genthial, Stanley Xie, Andrew Y Ng, and Dan Jurafsky. 2018. Noising and denoising natural language: Diverse backtranslation for grammar correction. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 619–628.
  • Helen Yannakoudakis, Ted Briscoe, and Ben Medlock. 2011. A new dataset and method for automatically grading esol texts. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, pages 180–189. Association for Computational Linguistics.
  • Yi Zhang, Tao Ge, Furu Wei, Ming Zhou, and Xu Sun. 2019. Sequence-to-sequence pre-training with data augmentation for sentence rewriting. arXiv preprint arXiv:1909.06002.
  • Wei Zhao, Liang Wang, Kewei Shen, Ruoyu Jia, and Jingming Liu. 2019. Improving grammatical error correction via pre-training a copy-augmented architecture with unlabeled data. arXiv preprint arXiv:1903.00138.
  • Yuanyuan Zhao, Nan Jiang, Weiwei Sun, and Xiaojun Wan. 2018. Overview of the nlpcc 2018 shared task: grammatical error correction. In CCF International Conference on Natural Language Processing and Chinese Computing, pages 439–445. Springer.
  • Wangchunshu Zhou, Tao Ge, Chang Mu, Ke Xu, Furu Wei, and Ming Zhou. 2019. Improving grammatical error correction with machine translation pairs. arXiv preprint arXiv:1911.02825.
  • Table 5 describes the details of the datasets used for English GEC. Except for the synthetic data, all the data can be found at the website of the BEA-19 shared task (https://www.cl.cam.ac.uk/research/nl/bea2019st/). The synthetic data is generated from English Wikipedia, English Gigaword (Parker et al., 2011) and News Crawl, as in the previous work of Ge et al.