Exploiting Lexicalized Statistical Patterns In Chinese Linguistic Analysis
CHINESE COMPUTATIONAL LINGUISTICS AND NATURAL LANGUAGE PROCESSING BASED ON NATURALLY ANNOTATED BIG DATA (2013)
Abstract
Web corpora have been used for linguistic analysis with the help of search engines. In this paper, we describe the concept of lexicalized patterns, which we exploit to obtain statistical information through simple string matching via search engines. We discuss the use of lexicalized statistical patterns at three levels of Chinese linguistic analysis: lexical, syntactic, and semantic. We develop a specialized search engine to obtain frequency counts for these patterns on the SogouT corpus. Experimental results show that lexicalized statistical patterns are effective for analyzing the cohesion of phrases, determining phrasal categories, and discovering patient objects.
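The frequency-count idea in the abstract can be illustrated with a minimal sketch. It assumes a tiny in-memory list of sentences in place of the SogouT corpus and the specialized search engine, and the template "把{noun}{verb}了" is a hypothetical pattern chosen only for illustration, not one taken from the paper. The function name `count_pattern` is likewise an assumption.

```python
def count_pattern(corpus, template, **slots):
    """Count exact string matches of a lexicalized pattern.

    The template is filled with concrete words (the "lexicalized" part),
    then matched as a plain substring against every sentence.
    """
    query = template.format(**slots)
    # str.count returns the number of non-overlapping occurrences.
    return sum(sentence.count(query) for sentence in corpus)


# Toy stand-in for a web corpus (hypothetical data).
corpus = [
    "他把苹果吃了",
    "我把苹果吃了",
    "苹果很好吃",
]

# Hypothetical pattern: a noun appearing as the object of a 把-construction.
freq = count_pattern(corpus, "把{noun}{verb}了", noun="苹果", verb="吃")
print(freq)
```

In a setting like the one described, the in-memory list would be replaced by queries to a search engine over the full corpus, and counts for competing patterns would be compared rather than read in isolation.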
Keywords
Lexicalized statistical pattern, Chinese linguistic analysis, Web corpus, Natural language processing