Topic information collection based on Hidden Markov Model

Journal of Networks (2013)

Abstract
Subject-specific information collection is one of the key technologies of vertical search engines and directly affects the speed and relevance of search results. Topic information collection algorithms are widely used for their accuracy. In this work, a Hidden Markov Model (HMM) is used to learn and judge the relevance between a Uniform Resource Locator (URL) and the topic information. The Rocchio method is used to construct prototype vectors relevant to the topic information, and the HMM is used to learn the preferred browsing paths. Concept maps that capture the semantics of the webpages are constructed so that the web's link structures can be determined. Experiments confirm the validity of the algorithm: compared with the Best-First algorithm, it collects more information pages and achieves a higher precision ratio. © 2013 ACADEMY PUBLISHER.
Keywords
Crawler, Hidden Markov Model, Precision ratio, Prototype vector, Recall ratio, Topic information collection, URL (Uniform Resource Locator)