An Overview of Microsoft Academic Service (MAS) and Applications

WWW (Companion Volume), pp. 243-246, 2015.

Keywords:
Bing Dialog; recommender systems; web search; academic search; knowledge base

Abstract:

In this paper we describe a new release of a Web scale entity graph that serves as the backbone of Microsoft Academic Service (MAS), a major production effort with a broadened scope to the namesake vertical search engine that has been publicly available since 2008 as a research prototype. At the core of MAS is a heterogeneous entity graph...

Introduction
  • Recent years have witnessed a paradigm shift in how the knowledge on the Web is made available to the users.
  • Academic search; Recommender systems; Entity conflation
  • The second application, described in Section 3.2, demonstrates how a recommendation system can take advantage of the relationships across different types of entities to offer heterogeneous suggestions.
Highlights
  • Recent years have witnessed a paradigm shift in how the knowledge on the Web is made available to the users
  • The trend is highly visible in the evolution of the Web search engine
  • Our work aims at leveraging this model to address information needs in areas where the sheer amount of information available through a multitude of channels has exceeded the human capacity to process it
  • For field of study (FOS) entities, the data are already present in the in-house knowledge base, but the majority are not marked with the “field of study” entity type
  • Two sources are considered for seeding the discovery process: (1) the entities which are currently labelled as field of study type in the knowledge base; (2) the entities that are identified by name-matching the keyword attributes in paper entities
  • In Section 3.1, we describe the academic search engine based on the Bing Dialog model that can (1) serve constrained academic queries, and (2) suggest other queries with the same prefix
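The prefix-based suggestion behaviour mentioned above can be illustrated with a minimal sketch. This is not the Bing Dialog implementation, which the source does not describe in detail; the function name `suggest_by_prefix`, the `limit` parameter and the sample field names are all assumptions made for illustration.

```python
from bisect import bisect_left


def suggest_by_prefix(prefix, names, limit=5):
    """Return up to `limit` lower-cased names that start with `prefix`."""
    key = prefix.lower()
    # Sorting places all names sharing a prefix next to each other.
    ordered = sorted(name.lower() for name in names)
    start = bisect_left(ordered, key)
    suggestions = []
    for name in ordered[start:]:
        if not name.startswith(key) or len(suggestions) == limit:
            break
        suggestions.append(name)
    return suggestions


# Hypothetical entity names, not taken from the MAS corpus.
FIELDS = [
    "Artificial intelligence",
    "Artificial neural network",
    "Art history",
    "Machine learning",
]

print(suggest_by_prefix("artificial in", FIELDS))
```

A production system would presumably rank completions by popularity or relevance rather than alphabetically; the sorted-list lookup here only shows how an incomplete query can still be matched to entity names.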
Results
  • The authors model the real-life academic communication activities as a heterogeneous graph consisting of six types of entities: field of study, author, institution, paper, venue and event.
  • For paper and author entities, the authors collect data primarily from two types of sources: (1) feeds from publishers (e.g. ACM and IEEE), and (2) web pages indexed by Bing.
  • For field of study (FOS) entities, the data are already present in the in-house knowledge base, but the majority are not marked with the “field of study” entity type.
  • The authors' goal is to label the FOS entities in the in-house knowledge base when their type is missing.
  • Two sources are considered for seeding the discovery process: (1) the entities which are currently labelled as FOS type in the knowledge base; (2) the entities that are identified by name-matching the keyword attributes in paper entities.
  • To identify new FOS candidates, the authors leverage the related-entity relationship in the in-house knowledge base, which is calculated from entity contents, hyperlinks, and web-click signals.
  • Table 1 reports counts for the following entity types: papers, authors, institutions, journals, conference series, conference instances and fields of study.
  • In Section 3.1, the authors describe the academic search engine based on the Bing Dialog model that can (1) serve constrained academic queries, and (2) suggest other queries with the same prefix.
  • The authors have leveraged Bing Dialog for serving academic search queries.
  • The data was modeled to showcase the Academic Paper Entity structure (e.g. Paper entity containing Title, Authors, Fields of Study, etc.), with views constructed to give a clean, easy-to-read format.
  • In Fig. 2(a) the portal is suggesting several fields of study that are related to ‘artificial intelligence’ even when the actual query is incomplete.
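The field-of-study discovery steps described in this section (two seed sources, then expansion through related entities) can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' pipeline: the record layout, the relatedness scores, the `min_score` threshold and both function names are hypothetical.

```python
def seed_fos_candidates(entities, paper_keywords):
    """Collect seeds from the two sources named in the text."""
    # Source 1: entities already labelled with the field-of-study type.
    typed = {e["name"].lower() for e in entities if e.get("type") == "field of study"}
    # Source 2: entities whose name matches a keyword attribute of a paper.
    names = {e["name"].lower() for e in entities}
    keywords = {kw.lower() for kws in paper_keywords for kw in kws}
    return typed | (names & keywords)


def expand_candidates(seeds, related, min_score=0.5):
    """Return non-seed entities related to any seed with score >= min_score."""
    found = set()
    for seed in seeds:
        for neighbor, score in related.get(seed, {}).items():
            if score >= min_score and neighbor not in seeds:
                found.add(neighbor)
    return found


# Hypothetical knowledge-base records and paper keywords.
entities = [
    {"name": "Machine learning", "type": "field of study"},
    {"name": "Data mining", "type": None},
    {"name": "Deep learning", "type": None},
    {"name": "Seattle", "type": "city"},
]
paper_keywords = [["data mining", "clustering"], ["machine learning"]]
# Hypothetical relatedness scores standing in for the content, hyperlink
# and web-click signals mentioned in the text.
related = {
    "machine learning": {"deep learning": 0.9, "seattle": 0.1},
    "data mining": {"machine learning": 0.8},
}

seeds = seed_fos_candidates(entities, paper_keywords)
new_candidates = expand_candidates(seeds, related)
print(sorted(seeds), sorted(new_candidates))
```

The fixed threshold is only a placeholder; the source does not say how candidates are scored or filtered.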
Conclusion
  • Given a field of study, find out the most prominent authors, the most influential papers, the potential publishing venues and the upcoming events.
  • For other less well-known entities with much lower query frequency, e.g. scholars who pioneered a research domain, it is challenging to capture the relationship through sparse web-click signals.
  • In order to discover such connections, the authors utilize other types of “co-occurrence” in the academic content, e.g. co-authorship (authors who collaborated on the same paper) and co-venue (people who published in the same sets of conferences or journals).
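The co-authorship and co-venue signals described above can be sketched by counting pairs of authors over a small set of paper records. This is a minimal illustration, assuming a hypothetical record layout with `authors` and `venue` fields; the source does not specify how these counts are weighted or combined.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical paper records, for illustration only.
PAPERS = [
    {"id": "p1", "authors": ["A", "B"], "venue": "WWW"},
    {"id": "p2", "authors": ["A", "C"], "venue": "KDD"},
    {"id": "p3", "authors": ["D"], "venue": "WWW"},
]


def coauthor_counts(papers):
    """Count how many papers each pair of authors wrote together."""
    counts = Counter()
    for paper in papers:
        for a, b in combinations(sorted(set(paper["authors"])), 2):
            counts[(a, b)] += 1
    return counts


def covenue_counts(papers):
    """Count how many venues each pair of authors has in common."""
    venues_by_author = defaultdict(set)
    for paper in papers:
        for author in paper["authors"]:
            venues_by_author[author].add(paper["venue"])
    counts = Counter()
    for a, b in combinations(sorted(venues_by_author), 2):
        shared = len(venues_by_author[a] & venues_by_author[b])
        if shared:
            counts[(a, b)] = shared
    return counts


coauthors = coauthor_counts(PAPERS)
covenues = covenue_counts(PAPERS)
print(coauthors[("A", "B")], covenues[("B", "D")])
```

In this toy data, authors B and D never wrote a paper together, yet the co-venue count still links them because both published at the same venue. That is the kind of connection sparse web-click signals would likely miss.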
Tables
  • Table 1: Counts of various entities in MAS corpus