POIReviewQA: A Semantically Enriched POI Retrieval and Question Answering Dataset

PROCEEDINGS OF THE 12TH WORKSHOP ON GEOGRAPHIC INFORMATION RETRIEVAL (GIR'18)(2018)

引用 11|浏览73
暂无评分
摘要
Many services that perform information retrieval for Points of Interest (POI) utilize a Lucene-based setup with spatial filtering. While this type of system is easy to implement it does not make use of semantics but relies on direct word matches between a query and reviews leading to a loss in both precision and recall. To study the challenging task of semantically enriching POIs from unstructured data in order to support open-domain search and question answering (QA), we introduce a new dataset POIReviewQA. It consists of 20k questions (e.g."is this restaurant dog friendly?") for 1022 Yelp business types. For each question we sampled 10 reviews, and annotated each sentence in the reviews whether it answers the question and what the corresponding answer is. To test a system's ability to understand the text we adopt an information retrieval evaluation by ranking all the review sentences for a question based on the likelihood that they answer this question. We build a Lucene-based baseline model, which achieves 77.0% AUC and 48.8% MAP. A sentence embedding-based model achieves 79.2% AUC and 41.8% MAP, indicating that the dataset presents a challenging problem for future research by the GIR community. The result technology can help exploit the thematic content of web documents and social media for characterisation of locations.
更多
查看译文
关键词
POI, Search, Question Answering, Semantic Enrichment
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要