Analysis of Native and Non-native Speakers' English Compositions based on Word-frequency Distribution and Text Statistics

Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval (2019)

Abstract
In this paper, the word-frequency distribution of JACET 8000 basic words and a set of text statistics were examined in order to compare and analyze differences between English compositions (essays) written by native speakers and those written by non-native speakers. For the native speakers' essays, the Guiraud Index at each of Levels 2-8 had higher correlation coefficients with average sentence length and with the Automated Readability Index. For the non-native speakers' essays, by contrast, the index values showed moderate correlation coefficients with sentence count. It was observed that the productivity and readability of the compositions appear to depend on the range of basic content words that native or non-native writers have acquired and can use in English. To verify whether word-frequency distribution can serve as a proficiency rating measure for non-native speakers, an estimation experiment was carried out with a multiple-regression model based on the word-frequency distributions of 68 English compositions written by non-native writers. The learners' estimated scores showed a correlation of 0.475 with their actual TOEIC scores. These results confirm the possibility of using word-usage statistics for the objective evaluation of L2 (second language) learners' language proficiency.
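The text statistics named above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It uses the standard Guiraud index (distinct word types divided by the square root of the token count) and the standard Automated Readability Index formula, 4.71 × (characters/words) + 0.5 × (words/sentences) − 21.43. Several pieces are assumptions: the per-level Guiraud variant (types from one level divided by the square root of all tokens), the regex tokenizer, the naive sentence splitter, and the tiny `HYPOTHETICAL_LEVELS` dictionary standing in for the JACET 8000 list. The abstract does not say which correlation coefficient was used, so Pearson's r is assumed here. The sample essays are invented toy sentences, not data from the study.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL stand-in for the JACET 8000 list: lowercase word -> level (1-8).
# The real word list is not reproduced here; these entries are illustrative only.
HYPOTHETICAL_LEVELS = {
    "the": 1, "a": 1, "is": 1, "it": 1, "and": 1, "to": 1, "an": 1,
    "study": 2, "language": 2, "write": 2,
    "improve": 3, "essay": 3,
    "vocabulary": 4, "fluent": 5, "proficiency": 6,
}

WORD_RE = re.compile(r"[A-Za-z']+")  # assumed tokenizer: letters and apostrophes


def tokenize(text):
    """Return lowercased word tokens."""
    return [w.lower() for w in WORD_RE.findall(text)]


def split_sentences(text):
    """Naive sentence split on runs of '.', '!' or '?'; empty fragments dropped."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def guiraud(tokens):
    """Guiraud index: number of distinct types / sqrt(number of tokens)."""
    return len(set(tokens)) / math.sqrt(len(tokens)) if tokens else 0.0


def guiraud_by_level(tokens, levels):
    """Assumed per-level variant: types belonging to each level / sqrt(all tokens)."""
    n = len(tokens)
    types_per_level = Counter(levels[t] for t in set(tokens) if t in levels)
    return {lv: types_per_level.get(lv, 0) / math.sqrt(n) for lv in range(1, 9)} if n else {}


def ari(text):
    """Automated Readability Index; assumes at least one word and one sentence."""
    words = WORD_RE.findall(text)
    chars = sum(ch.isalnum() for w in words for ch in w)  # letters/digits in words
    sentences = split_sentences(text)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Invented toy "essays"; real use would load the essay corpus instead.
    essays = [
        "It is a study. Language is the key.",
        "I write an essay to improve vocabulary and proficiency.",
        "The fluent writer can study language and write it well. It is an essay.",
    ]
    asl = [len(tokenize(e)) / len(split_sentences(e)) for e in essays]  # avg sentence length
    g2 = [guiraud_by_level(tokenize(e), HYPOTHETICAL_LEVELS)[2] for e in essays]
    aris = [ari(e) for e in essays]
    print("r(Guiraud L2, ASL) =", round(pearson(g2, asl), 3))
    print("r(Guiraud L2, ARI) =", round(pearson(g2, aris), 3))
```

With a real corpus, the per-level Guiraud values for each essay could also serve as predictor columns for a multiple-regression model of the kind the abstract describes; that step is not sketched here.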
Keywords
Corpus-based language analysis, L2 writing ability evaluation, Text mining