Effective string processing and matching for author disambiguation

KDD '13: The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, Illinois, August 2013

Abstract
Track 2 of KDD Cup 2013 aims to identify duplicated authors in a data set from Microsoft Academic Search. This type of problem appears in many large-scale applications that compile information from different sources. This paper describes the solution we developed at National Taiwan University, which won first prize in the competition. We propose an effective name matching framework and realize two implementations of it. An important strategy in our approach is to handle Chinese and non-Chinese names separately because they follow different naming conventions. Post-processing, including merging the results of two predictions, further boosts performance. Our approach achieves an F1-score of 0.99202 on the private leaderboard and 0.99195 on the public leaderboard.
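As a rough illustration of the ideas named in the abstract (string normalization, separate handling of Chinese and non-Chinese names, and merging two sets of predictions), the following minimal Python sketch may help. It is not the authors' implementation. The surname list, the matching keys, and all function names are assumptions made for illustration only, and a real system would need far richer rules and features.

```python
import unicodedata
from collections import defaultdict

# Hypothetical, tiny sample of romanized Chinese surnames (illustration only).
CHINESE_SURNAMES = {"chen", "lin", "wang", "li", "zhang", "liu", "huang"}


def normalize(name):
    """Lowercase a name, strip accents, and split it on non-alphanumerics."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in stripped.lower())
    return cleaned.split()


def match_key(name):
    """Build a matching key; the two branches are assumed rules, not the paper's."""
    tokens = normalize(name)
    if any(t in CHINESE_SURNAMES for t in tokens):
        # Assumed rule: ignore token order, since surname position varies.
        return ("chinese", "".join(sorted(tokens)))
    # Assumed rule: keep token order for all other names.
    return ("other", " ".join(tokens))


def group_duplicates(authors):
    """Group author ids whose names share a matching key."""
    buckets = defaultdict(set)
    for author_id, name in authors.items():
        buckets[match_key(name)].add(author_id)
    return [ids for ids in buckets.values() if len(ids) > 1]


def merge_predictions(*predictions):
    """Union duplicate groups from several predictions with union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for groups in predictions:
        for ids in groups:
            ordered = sorted(ids)
            for other in ordered[1:]:
                parent[find(other)] = find(ordered[0])

    merged = defaultdict(set)
    for x in list(parent):
        merged[find(x)].add(x)
    return sorted((sorted(g) for g in merged.values()), key=lambda g: g[0])


if __name__ == "__main__":
    authors = {
        1: "Chih-Jen Lin",
        2: "Lin, Chih Jen",
        3: "José García",
        4: "Jose Garcia",
        5: "Ann Lee",
    }
    print(group_duplicates(authors))
    print(merge_predictions([{1, 2}], [{2, 7}, {3, 4}]))
```

Merging by union means that if either prediction links two author ids, they end up in the same group. This favors recall and is only one of several plausible ways to combine predictions.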
Keywords
author disambiguation, different naming convention, important strategy, different source, effective string processing, KDD Cup, public leaderboard, large-scale application, effective name, name matching, Microsoft Academic Search, deduplication, National Taiwan University, private leaderboard, feature engineering