Automatic Identification of Chinese Paired Discourse Connectives.

ICSC (2023)

Abstract
This paper describes our approach to automatically identifying paired Discourse Connectives (DCs) in Chinese texts. DCs are terms that connect two text spans and signal the discourse relation between them. Most DCs consist of consecutive words (e.g., as a result); however, paired DCs are composed of non-consecutive words that together signal the discourse relation (e.g., on one hand … on the other hand). Although paired DCs are not common in English, they are very frequent in Chinese. The contribution of this paper is two-fold. First, we propose a methodology for the automatic identification of Chinese paired DCs. Second, we present a new corpus based on the Chinese Discourse Treebank (CDTB) [1] annotated with paired DCs. To identify paired DCs, we experimented with two main approaches: hypothesis testing and supervised machine learning. Although the hypothesis testing approaches led to lower than expected results, the simple machine learning models achieved F-scores between 72.5% and 75.6% with no fine-tuning.
Keywords
Discourse Connectives, Corpus Creation, Chinese Discourse Treebank, Machine Learning, Hypothesis Testing