CMDAG: A Chinese Metaphor Dataset with Annotated Grounds as CoT for Boosting Metaphor Generation
CoRR(2024)
摘要
Metaphor is a prominent linguistic device in human language and literature,
as they add color, imagery, and emphasis to enhance effective communication.
This paper introduces a large-scale high quality annotated Chinese Metaphor
Corpus, which comprises around 28K sentences drawn from a diverse range of
Chinese literary sources, such as poems, prose, song lyrics, etc. To ensure the
accuracy and consistency of our annotations, we introduce a comprehensive set
of guidelines. These guidelines address the facets of metaphor annotation,
including identifying tenors, vehicles, and grounds to handling the
complexities of similes, personifications, juxtapositions, and hyperboles.
Breaking tradition, our approach to metaphor generation emphasizes grounds and
their distinct features rather than the conventional combination of tenors and
vehicles. By integrating "ground" as a CoT (Chain of Thoughts) input, we are
able to generate metaphors that resonate more with real-world intuition. We
test generative models such as Belle, Baichuan, and Chinese-alpaca-33B using
our annotated corpus. These models are able to generate creative and fluent
metaphor sentences more frequently induced by selected samples from our
dataset, demonstrating the value of our corpus for Chinese metaphor research.
The code is available in the
https://anonymous.4open.science/r/Chinese_Metaphor_Explanation-63F2.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要