Text Polishing with Chinese Idiom: Task, Datasets and Pre-trained Baselines.

ACM Trans. Asian Low Resour. Lang. Inf. Process. (2023)

This work presents the task of text polishing, which generates a sentence that is more graceful than the input sentence while retaining its semantic meaning. Text polishing has great practical value and is an important component of modern writing assistance systems. However, the task is still not well studied in the literature. Further research in this important direction requires more formal task definitions, benchmark datasets, and powerful baseline models. In this work, we formulate the task as a context-dependent text generation problem and conduct a case study on text polishing with Chinese idioms. To circumvent the difficulty of annotating task data, we propose a semi-automatic data construction pipeline based on human-machine collaboration and use it to build a large-scale text polishing dataset of 1.5 million instances. We propose two types of task-specific pre-training objectives for text polishing and implement a series of Transformer-based models pre-trained on a massive Chinese corpus as baselines. We conduct extensive experiments with these baseline models on the constructed text polishing datasets and report several major findings. Human evaluation further reveals the polishing ability of the final system.
Key words
Chinese idiom, text polishing, task, baselines, pre-trained