MathUSE: Mathematical information retrieval system using universal sentence encoder model

JOURNAL OF INFORMATION SCIENCE(2024)

Abstract
In science, mathematical formulae play a significant role in communicating ideas and fundamental principles. The scientific research community now generates a huge number of documents that contain both text and mathematical formulae. For textual information, numerous retrieval systems exist that produce excellent results. However, these text retrieval systems cannot adequately handle the structure and notation styles of mathematical formulae. Recent research has attempted to retrieve both text and mathematical formulae, but its weak results indicate that there is room for improvement. In this article, we implement a formula-embedding approach that encodes formulae into fixed-dimensional embedding vectors. To encode a formula, we use a sentence-embedding model based on the universal sentence encoder, which relies on the transformer architecture and a deep averaging network. The proposed models take a LaTeX formula as input and produce a fixed-dimensional embedding representation as output. To achieve more promising results, the transformer model uses stacked self-attention, point-wise fully connected layers and positional encoding in both the encoder and decoder. The results have been compared with existing state-of-the-art approaches, and the comparison shows that the proposed approach achieves better retrieval accuracy, with nDCG' = 0.217, MAP' = 0.178 and P@10 = 0.378.
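The pipeline the abstract describes (encode each LaTeX formula as a fixed-dimensional vector, rank candidates by similarity to the query, score the ranking with a measure such as P@10) can be illustrated with a small sketch. This is not the paper's model: the hash-based token vectors below are a toy stand-in for the learned universal sentence encoder embeddings, and plain averaging only mimics the first step of a deep averaging network. The names `DIM`, `tokenize`, `token_vector`, `embed`, `rank` and `precision_at_k` are hypothetical and chosen for illustration, and cosine similarity is an assumed choice of ranking function.

```python
import hashlib
import math
import re

DIM = 32  # assumed toy embedding size, not the dimension used in the paper


def tokenize(latex: str) -> list[str]:
    """Split a LaTeX string into commands (e.g. \\frac) and single symbols."""
    return re.findall(r"\\[A-Za-z]+|\S", latex)


def token_vector(token: str) -> list[float]:
    """Deterministic pseudo-random vector per token.

    Stand-in for learned token embeddings: each of the 32 SHA-256 digest
    bytes is rescaled to the range [-1, 1].
    """
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [(byte - 127.5) / 127.5 for byte in digest[:DIM]]


def embed(latex: str) -> list[float]:
    """Average the token vectors into one fixed-dimensional embedding."""
    tokens = tokenize(latex)
    if not tokens:
        return [0.0] * DIM
    vectors = [token_vector(t) for t in tokens]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query: str, corpus: list[str]) -> list[str]:
    """Return corpus formulae sorted by decreasing similarity to the query."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(formula)), formula) for formula in corpus]
    return [formula for _, formula in sorted(scored, key=lambda p: p[0], reverse=True)]


def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k ranked items that are judged relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


if __name__ == "__main__":
    corpus = [r"\frac{a}{b}", r"x^2 + y^2 = z^2", r"\sum_{i=1}^{n} i"]
    ranking = rank(r"x^2 + y^2 = z^2", corpus)
    print(ranking)
    print(precision_at_k(ranking, {r"x^2 + y^2 = z^2"}, k=1))
```

Because the toy encoder averages token vectors, it ignores token order and formula structure; a trained encoder of the kind the abstract describes is meant to capture more than this bag-of-tokens view.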
Keywords
Deep averaging network,embedding,information retrieval,transformer architecture