Traffic information extraction from a blogging platform using knowledge-based approaches and bootstrapping.

CIVTS(2014)

Abstract
In this paper we propose a strategy for using messages posted on a blogging platform for real-time sensing of traffic-related information. Specifically, we use data from a Portuguese-language blog managed by a Brazilian daily newspaper in its online edition. We propose a framework based on two modules to infer the location and traffic condition from unstructured, non-georeferenced short posts in Portuguese. The first module performs named-entity recognition (NER). It automatically recognizes three classes of named entities (NEs) in the input post: LOCATION, STATUS and DATE. Here, a bootstrapping approach is used to expand the initially given list of locations, identifying new locations as well as spelling variants and typographical errors of known locations. The second module performs relation extraction (RE). It extracts binary and ternary relations between these entities to obtain relevant traffic information. In our experiments, the NER module yielded an F-measure of 96%, while the RE module reached 87%. The results also show that our bootstrapping approach identifies 1,058 new locations when 10,000 short posts are analyzed.
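The abstract does not spell out how the bootstrapping step works, so the following is only a minimal sketch of one common way to expand a seed list of locations, not the authors' method. It assumes that a word frequently preceding known locations (a "trigger") also precedes unknown ones, that location names are runs of capitalized tokens, and that spelling variants can be matched to known names by string similarity. The function names, thresholds, and sample posts are all hypothetical.

```python
import re
from difflib import SequenceMatcher


def tokenize(post):
    # \w matches accented letters in Python 3, which matters for Portuguese.
    return re.findall(r"\w+", post)


def bootstrap_locations(posts, seeds, rounds=2, min_count=2):
    """Return new location candidates (lowercased) found via context triggers."""
    known = {tuple(s.lower().split()) for s in seeds}
    discovered = set()
    tokenized = [tokenize(p) for p in posts]
    for _ in range(rounds):
        # Step 1: count the word that immediately precedes each known location.
        counts = {}
        for tokens in tokenized:
            lower = [t.lower() for t in tokens]
            for i in range(1, len(lower)):
                for loc in known:
                    if tuple(lower[i:i + len(loc)]) == loc:
                        counts[lower[i - 1]] = counts.get(lower[i - 1], 0) + 1
        triggers = {w for w, c in counts.items() if c >= min_count}
        # Step 2: after each trigger, take the run of capitalized tokens
        # as a candidate location and keep it if it is not already known.
        for tokens in tokenized:
            for i, tok in enumerate(tokens):
                if tok.lower() not in triggers:
                    continue
                j = i + 1
                while j < len(tokens) and tokens[j][0].isupper():
                    j += 1
                cand = tuple(t.lower() for t in tokens[i + 1:j])
                if cand and cand not in known:
                    discovered.add(cand)
        known |= discovered
    return {" ".join(c) for c in discovered}


def closest_known(candidate, names, threshold=0.85):
    """Map a candidate to the most similar known name, or None (assumed threshold)."""
    best, best_score = None, threshold
    for name in names:
        score = SequenceMatcher(None, candidate.lower(), name.lower()).ratio()
        if score >= best_score:
            best, best_score = name, score
    return best


# Made-up sample posts, for illustration only.
posts = [
    "Trânsito lento na Avenida Paulista",
    "Acidente na Marginal Tietê sentido centro",
    "Congestionamento na Radial Leste",
    "Lentidão na Avenida Paulsta agora",
]
seeds = ["Avenida Paulista", "Marginal Tietê"]

new_locations = bootstrap_locations(posts, seeds)
print(sorted(new_locations))
for loc in sorted(new_locations):
    print(loc, "->", closest_known(loc, seeds))
```

In this toy run the trigger learned from the two seeds is the preposition "na", which then surfaces both a genuinely new location and a misspelled variant of a known one; the similarity check separates the two cases. A real system would need richer context patterns and filtering to keep precision high.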
Keywords
vectors, knowledge-based systems, data mining