An Overview of Microsoft Academic Service (MAS) and Applications

    WWW (Companion Volume), pp. 243-246, 2015.

    Cited by: 277
    EI
    Keywords:
    Bing Dialog; recommender systems; web search; academic search; knowledge base

    Abstract:

    In this paper, we describe a new release of a Web-scale entity graph that serves as the backbone of Microsoft Academic Service (MAS), a major production effort that broadens the scope of the namesake vertical search engine, which has been publicly available since 2008 as a research prototype. At the core of MAS is a heterogeneous entity graph...

    Introduction
    • Recent years have witnessed a paradigm shift in how knowledge on the Web is made available to users.
    • Keywords: Academic search; Recommender systems; Entity conflation
    • The second application, described in Section 3.2, demonstrates how a recommendation system can take advantage of the relationships across different types of entities to offer heterogeneous suggestions.
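    The heterogeneous suggestions described above depend on typed relationships between entities. The following is a minimal sketch, assuming a small in-memory graph over the six entity types the paper names (field of study, author, institution, paper, venue, event); the class `EntityGraph` and its methods are hypothetical and do not reflect how MAS actually stores its graph. Grouping an entity's direct neighbors by type is one simple way to produce mixed-type suggestions.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# The six entity types named in the paper.
ENTITY_TYPES = {"field_of_study", "author", "institution", "paper", "venue", "event"}


@dataclass
class EntityGraph:
    """Hypothetical in-memory heterogeneous graph: typed nodes, undirected edges."""
    types: dict = field(default_factory=dict)                      # entity id -> entity type
    edges: dict = field(default_factory=lambda: defaultdict(set))  # entity id -> neighbor ids

    def add_entity(self, entity_id, entity_type):
        # Reject types outside the six-type model.
        if entity_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {entity_type}")
        self.types[entity_id] = entity_type

    def link(self, a, b):
        # Both endpoints must already be registered with a type.
        if a not in self.types or b not in self.types:
            raise KeyError("both entities must be added before linking")
        self.edges[a].add(b)
        self.edges[b].add(a)

    def suggest(self, entity_id):
        """Group direct neighbors by entity type (one-hop, heterogeneous suggestions)."""
        grouped = defaultdict(list)
        for neighbor in sorted(self.edges.get(entity_id, ())):
            grouped[self.types[neighbor]].append(neighbor)
        return dict(grouped)
```

    For example, linking a field of study to a paper, an author, and a venue and then calling `suggest` on the field of study returns one list per neighbor type.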
    Highlights
    • Recent years have witnessed a paradigm shift in how knowledge on the Web is made available to users
    • The trend is highly visible in the evolution of Web search engines
    • Our work aims to leverage this model to address information needs in areas where the sheer amount of information available through a multitude of channels has exceeded the human capacity to process it
    • For field of study (FOS) entities, although the data are already present in the in-house knowledge base, the majority are not marked with the “field of study” entity type
    • Two sources are considered for seeding the discovery process: (1) the entities which are currently labelled as field of study type in the knowledge base; (2) the entities that are identified by name-matching the keyword attributes in paper entities
    • In Section 3.1, we describe the academic search engine based on the Bing Dialog model that can (1) serve constrained academic queries and (2) suggest other queries with the same prefix
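    The two seeding sources for field-of-study discovery can be illustrated with a short sketch. It assumes a toy knowledge base given as plain dictionaries, case-insensitive exact name matching against paper keywords, and a precomputed related-entity score per entity pair; the function names, the `min_score` threshold of 0.5, and the one-hop expansion are illustrative assumptions, not the method reported for MAS.

```python
def seed_fos(kb_types, kb_names, paper_keywords):
    """Collect seed FOS entity ids from the two sources.

    kb_types: entity id -> type label, or None when the type is missing
    kb_names: entity id -> entity name
    paper_keywords: iterable of keyword strings taken from paper entities
    """
    # Source 1: entities already labelled as field of study.
    seeds = {e for e, t in kb_types.items() if t == "field_of_study"}
    # Source 2: entities whose name matches a paper keyword
    # (case-insensitive exact match is an assumption of this sketch).
    keywords = {k.strip().lower() for k in paper_keywords}
    seeds |= {e for e, name in kb_names.items() if name.lower() in keywords}
    return seeds


def expand_candidates(seeds, related, min_score=0.5):
    """One-hop expansion over hypothetical related-entity scores.

    related: entity id -> list of (related entity id, relatedness score)
    Returns {candidate id: best score} for non-seed entities reaching min_score.
    """
    candidates = {}
    for s in seeds:
        for other, score in related.get(s, []):
            if other not in seeds and score >= min_score:
                # Keep the strongest link from any seed.
                candidates[other] = max(score, candidates.get(other, 0.0))
    return candidates
```

    A production pipeline would likely restrict candidates further (for example, to entities whose type is missing), which this sketch leaves out.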
    Results
    • The authors model the real-life academic communication activities as a heterogeneous graph consisting of six types of entities: field of study, author, institution, paper, venue and event.
    • For paper and author entities, the authors collect data primarily from two types of sources: (1) feeds from publishers (e.g., ACM and IEEE), and (2) web pages indexed by Bing.
    • For field of study (FOS) entities, although the data are already present in the in-house knowledge base, the majority are not marked with the “field of study” entity type.
    • The authors' goal is to label the FOS entities in the in-house knowledge base when their type is missing.
    • Two sources are considered for seeding the discovery process: (1) the entities which are currently labelled as FOS type in the knowledge base; (2) the entities that are identified by name-matching the keyword attributes in paper entities.
    • The authors leverage the related-entity relationships in the in-house knowledge base, which are computed from entity contents, hyperlinks, and web-click signals, to identify new FOS candidates.
    • Table 1 reports counts for the following entity categories: papers, authors, institutions, journals, conference series, conference instances, and fields of study.
    • In Section 3.1, the authors describe the academic search engine based on the Bing Dialog model that can (1) serve constrained academic queries and (2) suggest other queries with the same prefix.
    • The authors have leveraged the Bing Dialog model for serving academic search queries.
    • The data were modeled to showcase the academic paper entity structure (e.g., a paper entity containing title, authors, fields of study, etc.), with views constructed to present it in a clean, easy-to-read format.
    • In Fig. 2(a), the portal suggests several fields of study related to ‘artificial intelligence’ even when the query is still incomplete.
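    The paper does not detail how Bing Dialog produces same-prefix suggestions. As a hedged illustration only, a simple prefix lookup over a sorted list of entity names, ranked by an assumed popularity count, behaves like the incomplete-query suggestions described for Fig. 2(a); `PrefixSuggester` and its inputs are hypothetical.

```python
import bisect


class PrefixSuggester:
    """Hypothetical prefix completer over a fixed set of known entity names."""

    def __init__(self, names_with_counts):
        # names_with_counts: name -> popularity (e.g., an assumed query count)
        self.counts = {n.lower(): c for n, c in names_with_counts.items()}
        self.sorted_names = sorted(self.counts)

    def suggest(self, prefix, k=5):
        p = prefix.lower()
        # Names sharing a prefix form a contiguous run in sorted order,
        # so binary search finds the start of the run.
        i = bisect.bisect_left(self.sorted_names, p)
        matches = []
        while i < len(self.sorted_names) and self.sorted_names[i].startswith(p):
            matches.append(self.sorted_names[i])
            i += 1
        # Rank by popularity (descending), breaking ties alphabetically.
        matches.sort(key=lambda n: (-self.counts[n], n))
        return matches[:k]
```

    A trie would serve the same purpose; the sorted list keeps the sketch short because every completion of a prefix is a contiguous slice.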
    Conclusion
    • Example scenario: given a field of study, find the most prominent authors, the most influential papers, potential publishing venues, and upcoming events.
    • For less well-known entities with much lower query frequency (e.g., scholars who pioneered a research domain), it is difficult to capture relationships from sparse web-click signals.
    • To discover such connections, the authors use other types of “co-occurrence” in academic content, e.g., co-authorship (authors who collaborated on the same paper) and co-venue (people who published in the same sets of conferences or journals).
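    The two co-occurrence signals mentioned above can be counted directly from paper and publication records. This sketch assumes papers are given as author-id lists and publications as (author, venue) pairs; the function names are hypothetical, and how MAS weights or combines these counts is not specified here.

```python
from collections import Counter
from itertools import combinations


def co_authorship(papers):
    """Count, for each unordered author pair, the papers they co-authored.

    papers: iterable of author-id lists, one list per paper.
    """
    pairs = Counter()
    for authors in papers:
        # sorted(set(...)) removes duplicates and fixes a canonical pair order.
        for a, b in combinations(sorted(set(authors)), 2):
            pairs[(a, b)] += 1
    return pairs


def co_venue(publications):
    """Count, for each unordered author pair, the distinct venues both used.

    publications: iterable of (author id, venue id) records.
    """
    venues_by_author = {}
    for author, venue in publications:
        venues_by_author.setdefault(author, set()).add(venue)
    pairs = Counter()
    for a, b in combinations(sorted(venues_by_author), 2):
        shared = len(venues_by_author[a] & venues_by_author[b])
        if shared:
            pairs[(a, b)] = shared
    return pairs
```

    Because both signals come from the academic records themselves rather than from clicks, they remain available for rarely queried scholars.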
    Tables
    • Table 1: Counts of various entities in the MAS corpus