Identifying Dictionary-Relevant Formulaic Sequences In Written And Spoken Corpora

INTERNATIONAL JOURNAL OF LEXICOGRAPHY(2020)

引用 2|浏览0
暂无评分
摘要
In view of the pervasiveness of formulaic language in human communication and the growing awareness of its relevance to modern lexicography, this study presents a corpus-driven identification, analysis and comparison of dictionary-relevant formulaic sequences in reference corpora of written and spoken Slovenian. The sequences were identified using a semi-automatic approach, whereby the most frequently recurring word combinations in each corpus were ranked according to their statistical salience and manually inspected for formulaic expressions with lexicographic relevance. Despite its semantic heterogeneity, the resulting list illustrates the distinct characteristics of formulaic multi-word expressions, such as high frequency of usage, prevalent inclusion of grammatical words and common non-propositional meaning, especially in speech, where research revealed numerous understudied formulaic expressions related to interaction management and mitigation. The final evaluation of measures used in the identification process demonstrates their relative suitability for corpus-driven identification of dictionary-relevant formulaic expressions, with their precision varying in relation to corpus size and length of sequences under investigation.
更多
查看译文
关键词
phraseology, lexical bundles, corpus analysis, lexical frequency
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要