Efficiently Identifying Duplicated Chinese Company Names in Large-Scale Registration Database

Shaowu Liu, Jiyong Wei, Shouwei Wang

ADMA (2012)

Abstract
It is always a challenge for large E-commerce platforms to audit massive amounts of information in real time, and especially to identify multiple registrations efficiently. In this paper, we design a novel method for detecting multi-registrations on Chinese E-commerce platforms. In the proposed method, Chinese company names are first divided, using Chinese word segmentation, into a regional attribute, a template attribute, and a key attribute, following the naming conventions most companies use. Searching on the extracted key attribute greatly narrows the search range. Then, the similarity between company names is computed by a string-matching algorithm with a dynamic threshold. Finally, company names with high similarity are flagged as duplicates. The method is evaluated on a dataset from a real E-commerce company, and the results show that it achieves better accuracy, efficiency, and scalability than the compared methods. It is also more precise and less time-consuming than manual review, so it can save considerable labor cost in the B2B industry. © Springer-Verlag 2012.
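The pipeline described in the abstract (split each name into region, template, and key attributes; restrict comparisons to names with related keys; then apply a threshold that varies with key length) can be illustrated with a small sketch. The abstract does not give the segmenter, the lexicons, or the threshold function, so everything below is an assumption: dictionary prefix/suffix stripping stands in for real Chinese word segmentation, `REGIONS` and `TEMPLATES` are hypothetical toy lexicons, candidate pairs are blocked with an inverted index on key-attribute characters, and `dynamic_threshold` is a hypothetical length-based rule. None of this is the authors' implementation.

```python
from collections import defaultdict

# Hypothetical toy lexicons; a real system would use a Chinese word
# segmenter plus curated region and template dictionaries.
REGIONS = ["北京市", "上海市", "深圳市", "杭州市", "北京", "上海", "深圳", "杭州"]
TEMPLATES = ["股份有限公司", "有限责任公司", "有限公司", "科技", "贸易", "网络"]


def split_name(name):
    """Split a name into (region, template, key) by stripping a region
    prefix and repeatedly stripping template suffixes (longest first)."""
    region = ""
    for r in sorted(REGIONS, key=len, reverse=True):
        if name.startswith(r):
            region, name = r, name[len(r):]
            break
    template = []
    stripped = True
    while stripped:
        stripped = False
        for t in sorted(TEMPLATES, key=len, reverse=True):
            # Keep at least one character for the key attribute.
            if name.endswith(t) and len(name) > len(t):
                template.insert(0, t)
                name = name[: -len(t)]
                stripped = True
                break
    return region, "".join(template), name


def edit_distance(a, b):
    """Classic Levenshtein distance with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def dynamic_threshold(key_len):
    """Hypothetical rule: shorter keys must match more strictly."""
    if key_len <= 2:
        return 1.0
    if key_len <= 4:
        return 0.75
    return 0.6


def find_duplicates(names):
    """Return index pairs (i, j), i < j, whose key attributes are similar."""
    keys = [split_name(n)[2] for n in names]
    # Blocking: only compare names whose keys share at least one character.
    index = defaultdict(set)
    for i, k in enumerate(keys):
        for ch in set(k):
            index[ch].add(i)
    pairs = set()
    for i, k in enumerate(keys):
        candidates = set().union(*(index[ch] for ch in set(k)))
        for j in candidates:
            if j <= i:
                continue
            longest = max(len(k), len(keys[j]))
            sim = 1 - edit_distance(k, keys[j]) / longest
            if sim >= dynamic_threshold(longest):
                pairs.add((i, j))
    return sorted(pairs)


names = ["北京华信达科技有限公司", "上海华信达贸易有限公司",
         "深圳华信通科技有限公司", "杭州绿源网络有限公司"]
print(find_duplicates(names))  # → [(0, 1)]
```

Here the first two names differ only in region and template, so their keys (`华信达`) match exactly and they are flagged; `华信通` differs by one character out of three (similarity about 0.67), which falls below the hypothetical 0.75 threshold for short keys. Blocking on key characters is what keeps the comparison count small: `杭州绿源网络有限公司` is never compared against the others.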
Keywords
data/text mining, information audition, multi-registration detection