Benchmarking top-k keyword and top-k document processing with T2K2 and T2K2D2.

Future Generation Comp. Syst. (2018)

Abstract
Top-k keyword and top-k document extraction are very popular text analysis techniques. Top-k keywords and documents are often computed on the fly, but they exploit weighted vocabularies that are costly to build. To compare competing weighting schemes and database implementations, benchmarking is customary. To the best of our knowledge, no benchmark currently addresses these problems. Hence, in this paper, we present T2K2, a top-k keywords and documents benchmark, and its decision-support-oriented evolution T2K2D2. Both benchmarks feature a real tweet dataset and queries with various complexities and selectivities. They help evaluate weighting schemes and database implementations in terms of computing performance. To illustrate our benchmarks' relevance and genericity, we successfully ran performance tests on the TF-IDF and Okapi BM25 weighting schemes, on the one hand, and on different relational (Oracle, PostgreSQL) and document-oriented (MongoDB) database implementations, on the other hand.
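The abstract names two weighting schemes, TF-IDF and Okapi BM25, that T2K2 and T2K2D2 use to rank keywords and documents. The following is a minimal in-memory Python sketch of what "top-k keywords" and "top-k documents" mean under those schemes. It relies on several assumptions that are not taken from the paper: a naive lowercase whitespace tokenizer, TF normalized by document length with IDF = log(N / df), and the common BM25 variant with k1 = 1.2, b = 0.75 and IDF = log((N - df + 0.5) / (df + 0.5) + 1). These are textbook forms, not necessarily the exact formulas or the SQL/MongoDB queries the benchmarks run. The function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase + whitespace split; no stemming or stop-word removal.
    return text.lower().split()


def tf_idf_weights(docs):
    """Return one {term: weight} dict per document, weight = (tf / |d|) * log(N / df)."""
    n = len(docs)
    counts = [Counter(tokenize(d)) for d in docs]
    df = Counter(term for c in counts for term in c)  # document frequency per term
    weights = []
    for c in counts:
        length = sum(c.values())
        weights.append({t: (f / length) * math.log(n / df[t]) for t, f in c.items()})
    return weights


def top_k_keywords(docs, k):
    """Sum each term's TF-IDF weight over the given documents and keep the k best terms."""
    scores = defaultdict(float)
    for w in tf_idf_weights(docs):
        for term, weight in w.items():
            scores[term] += weight
    # Sort by descending score, then alphabetically so that ties are deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def bm25_top_k_documents(docs, query, k, k1=1.2, b=0.75):
    """Rank documents against the query terms with Okapi BM25; return (index, score) pairs."""
    if not docs:
        return []
    n = len(docs)
    counts = [Counter(tokenize(d)) for d in docs]
    lengths = [sum(c.values()) for c in counts]
    avgdl = sum(lengths) / n  # average document length
    df = Counter(term for c in counts for term in c)
    scored = []
    for i, c in enumerate(counts):
        score = 0.0
        for t in dict.fromkeys(tokenize(query)):  # distinct query terms, in query order
            f = c[t]  # Counter returns 0 for absent terms
            if f == 0:
                continue
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1)
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * lengths[i] / avgdl))
        if score > 0:  # skip documents that match no query term
            scored.append((i, score))
    # Highest score first; lower document index wins ties.
    return sorted(scored, key=lambda p: (-p[1], p[0]))[:k]


tweets = ["apple banana", "apple cherry", "banana banana"]
print(top_k_keywords(tweets, 2))
print(bm25_top_k_documents(tweets, "banana", 2))
```

A term that occurs in every document gets an IDF of zero under this TF-IDF variant, so it can never rise to the top of the keyword ranking. The benchmarks vary query complexity and selectivity. In this sketch, selectivity could only be mimicked by changing which documents are passed in, whereas the benchmarks evaluate real database implementations rather than Python lists.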
Keywords
Top-k, Top-k documents, Text analytics, Benchmarking, Weighting schemes, Database systems