TOCAB: A Dataset for Chinese Abusive Language Processing

I Chung, Chuan-Jie Lin

2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science (IRI) (2021)

Abstract
This paper introduces TOCAB, a larger dataset for Chinese abusive language detection and classification. The dataset contains 121,344 real sentences collected from a social media site. Several baseline systems built with machine learning or deep learning are proposed to test this benchmark. BERT is the best baseline system, achieving F1-scores of 0.886 in detection and 0.781 in classification....
Keywords
Deep learning, Social networking (online), Conferences, Bit error rate, Data science, Benchmark testing, Data models