TNT: Text Normalization based Pre-training of Transformers for Content Moderation

Fei Tan, Yifan Hu, Changwei Hu, Keqian Li, Kevin Yen

Conference on Empirical Methods in Natural Language Processing (2020)

Abstract

In this work, we present TNT (Text Normalization based pre-training of Transformers), a new language pre-training model for content moderation. Inspired by the masking strategy and text normalization, TNT learns language representations by training transformers to reconstruct text that has been altered by four operation types commonly seen in text manipulation: substitution, transposition, deletion, and insertion. Furthermore, the normalization involves predicting both operation types and token labels, enabling TNT to learn from more challenging tasks than the standard task of masked word recovery. Experiments demonstrate that TNT outperforms strong baselines on the hate speech classification task. Additional text normalization experiments and case studies show that TNT is a potential new approach to misspelling correction.
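
The four operation types named in the abstract can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not the authors' implementation: the token-level granularity, the `[RAND]` filler token, the single edit per example, and the names `corrupt` and `make_example` are all hypothetical choices made here. The abstract does not specify how positions or replacement tokens are sampled.

```python
import random

# The four operation types listed in the abstract.
OPS = ("substitution", "transposition", "deletion", "insertion")


def corrupt(tokens, op, pos, filler="[RAND]"):
    """Return a copy of `tokens` with one noise operation applied at `pos`.

    `filler` is a hypothetical stand-in for whatever replacement or
    inserted token a real pipeline would sample.
    """
    out = list(tokens)
    if op == "substitution":
        out[pos] = filler                      # replace one token
    elif op == "transposition":
        out[pos], out[pos + 1] = out[pos + 1], out[pos]  # swap neighbours
    elif op == "deletion":
        del out[pos]                           # drop one token
    elif op == "insertion":
        out.insert(pos, filler)                # add a spurious token
    else:
        raise ValueError(f"unknown operation: {op}")
    return out


def make_example(tokens, rng, filler="[RAND]"):
    """Build one training example: corrupted input plus both targets
    (the operation type and the original tokens to recover).

    Assumes len(tokens) >= 2 so that a transposition is always possible.
    """
    op = rng.choice(OPS)
    # A transposition needs a right-hand neighbour, so stop one index early.
    last = len(tokens) - 2 if op == "transposition" else len(tokens) - 1
    pos = rng.randint(0, last)
    return {
        "input": corrupt(tokens, op, pos, filler),
        "operation": op,
        "position": pos,
        "target": list(tokens),
    }


if __name__ == "__main__":
    sentence = ["this", "post", "is", "fine"]
    print(make_example(sentence, random.Random(0)))
```

Each example pairs the corrupted input with two supervision signals, the operation type and the original tokens, which mirrors the abstract's point that the model must predict both rather than only recover a masked word.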

Keywords

text normalization, content, TNT, pre-training
