Compute the Term Contributed Frequency

Intelligent Systems Design and Applications, 2008. ISDA '08. Eighth International Conference (2008)

Abstract
In this paper, we propose an algorithm and data structure for computing the term contributed frequency (tcf) of all N-grams in a text corpus. Although term frequency is one of the standard notions of frequency in corpus-based natural language processing (NLP), applying it to N-gram approaches raises problems, such as the distortion of phrase frequencies. We attempt to overcome this drawback by building a directed acyclic graph (DAG) that contains the proposed data structure and using it to retrieve more reliable term frequencies. The proposed algorithm and data structure are more efficient than traditional term frequency extraction approaches and are portable across languages.
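The distortion mentioned in the abstract can be illustrated with a small sketch. The abstract does not define tcf or describe the DAG, so the code below is not the paper's method. It assumes one simple, hypothetical discounting rule: an occurrence of an N-gram is ignored when it lies inside an occurrence of a longer N-gram that itself repeats at least `min_count` times. The function names, the `min_count` parameter, and the toy corpus are all illustrative assumptions.

```python
from collections import Counter


def raw_frequencies(tokens, max_n):
    """Plain term frequency for every n-gram with 1 <= n <= max_n."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def discounted_frequencies(tokens, max_n, min_count=2):
    """Hypothetical discounted frequency (NOT the paper's tcf definition).

    An occurrence is skipped when a longer n-gram (up to max_n tokens)
    that repeats at least min_count times fully contains it.
    """
    raw = raw_frequencies(tokens, max_n)
    result = Counter()
    total = len(tokens)
    for n in range(1, max_n + 1):
        for i in range(total - n + 1):
            # Look for a repeated longer n-gram starting at j that spans [i, i + n).
            covered = any(
                raw[tuple(tokens[j:j + m])] >= min_count
                for m in range(n + 1, max_n + 1)
                for j in range(max(0, i + n - m), min(i, total - m) + 1)
            )
            if not covered:
                result[tuple(tokens[i:i + n])] += 1
    return result


# Toy corpus (an assumption for illustration only).
tokens = "new york city is big . new york city is old . york is small".split()
raw = raw_frequencies(tokens, max_n=3)
disc = discounted_frequencies(tokens, max_n=3)

print(raw[("new", "york")], disc[("new", "york")])  # 2 0
print(raw[("york",)], disc[("york",)])              # 3 1
```

In this toy corpus, "new york" has a raw frequency of 2, but both occurrences come from the repeated phrase "new york city", so the discounted count is 0. This brute-force version rescans the surrounding windows for every occurrence; the paper's contribution is a DAG-based data structure intended to compute such frequencies more efficiently.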
Keywords
directed acyclic graph, algorithm design and analysis, data structures, natural language processing, data structure, time frequency analysis, text analysis, data mining, information retrieval, term frequency, directed graphs, computational modeling