One Blade for One Purpose: Advancing Math Information Retrieval using Hybrid Search

PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023(2023)

引用 5|浏览29
暂无评分
摘要
Neural retrievers have been shown to be effective for math-aware search. Their ability to cope with math symbol mismatches, to represent highly contextualized semantics, and to learn effective representations are critical to improving math information retrieval. However, the most effective retriever for math remains impractical as it depends on token-level dense representations for each math token, which leads to prohibitive storage demands, especially considering that math content generally consumes more tokens. In this work, we try to alleviate this efficiency bottleneck while boosting math information retrieval effectiveness via hybrid search. To this end, we propose MABOWDOR, a Math-Aware Best-of-Worlds Domain Optimized Retriever, which has an unsupervised structure search component, a dense retriever, and optionally a sparse retriever on top of a domain-adapted backbone learned by context-enhanced pretraining, each addressing a different need in retrieving heterogeneous data from math documents. Our hybrid search outperforms the previous state-of-the-art math IR system while eliminating efficiency bottlenecks. Our system is available at https://github.com/approach0/pya0.
更多
查看译文
关键词
math information retrieval,neural retriever,hybrid search
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络