ArnetMiner: extraction and mining of academic social networks
KDD, pp.990-998, (2008)
- Extraction and mining of academic social networks aims at providing comprehensive services in the scientific research field.
- In an academic social network, people are not only interested in searching for different types of information, but are also interested in finding semantics-based information.
- Two problems remain: 1) Lack of semantics-based information: the social information obtained from user-entered profiles or by extraction using heuristics is sometimes incomplete or inconsistent; 2) Lack of a unified approach to efficiently model the academic network.
- When different types of information in the academic network are modeled individually, the dependencies between them cannot be captured accurately.
- Compared with the previous topic modeling work, in this paper, we propose a unified topic model to simultaneously model the topical aspects of different types of information in the academic network
- The modeling results have been applied to expertise search and association search
- The authors describe the architecture and the main features of the ArnetMiner system.
- The authors propose a unified tagging approach to researcher profiling.
- About half a million researcher profiles have been extracted into the system.
- The system has integrated more than one million papers.
- The authors propose a probabilistic framework to deal with the name ambiguity problem in the integration.
- The authors further propose a unified topic model to simultaneously model the different types of information in the academic network.
- The authors conduct experiments for evaluating each of the proposed approaches.
- Experimental results indicate that the proposed methods achieve high performance.
- Table 1: Content features, pattern features, and term features
- Table 2: Relationships between papers
- Table 3: Data set for name disambiguation
- Table 4: Results on name disambiguation (%)
- Table 5: Five topics discovered by ACT1 on the ArnetMiner data. Each topic is shown with the top 8 words and their corresponding probabilities. Top 6 authors and top 6 conferences are shown with each topic. The titles are our interpretation of the topics.
- Table 6: Performance of six expertise search approaches (%)
- Table 7: Top 5 representative words and top 5 authors associated with two conferences found by ACT1
- Table 8: Top 5 representative words and top 5 conferences associated with two researchers found by ACT1
2.1 Person Profile Extraction
Several research efforts have addressed person profile extraction. For example, Yu et al. propose a two-stage extraction method for identifying personal information from resumes. The first stage segments a resume into different types of blocks, and the second stage extracts detailed information such as Address and Email from the identified blocks. However, the method formalizes profile extraction as several separate steps and conducts extraction in a more or less ad-hoc manner.
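The two-stage idea described above (segment first, then extract fields from the identified blocks) can be illustrated with a minimal sketch. This is not the cascaded hybrid model of Yu et al.; it swaps in simple hand-written heuristics purely for illustration. The heading list, the regular expression, and the function names `segment_blocks`, `extract_contact`, and `extract_profile` are all assumptions introduced here.

```python
import re

# Hypothetical block headings; the cited method learns segmentation from data,
# whereas this sketch only matches heading lines literally.
HEADINGS = {"contact": "contact", "education": "education", "experience": "experience"}

# Simplified email pattern, assumed good enough for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def segment_blocks(resume_text):
    """Stage 1: split a resume into labeled blocks using heading lines."""
    blocks = {}
    current = "header"  # lines before the first recognized heading
    for line in resume_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        key = stripped.rstrip(":").lower()
        if key in HEADINGS:
            current = HEADINGS[key]
            continue
        blocks.setdefault(current, []).append(stripped)
    return blocks


def extract_contact(block_lines):
    """Stage 2: pull detailed fields (Email, Address) out of one block."""
    info = {}
    for line in block_lines:
        match = EMAIL_RE.search(line)
        if match:
            info["email"] = match.group(0)
        elif line.lower().startswith("address:"):
            info["address"] = line.split(":", 1)[1].strip()
    return info


def extract_profile(resume_text):
    """Run both stages and return the fields found in the contact block."""
    blocks = segment_blocks(resume_text)
    return extract_contact(blocks.get("contact", []))
```

Because stage 2 only sees what stage 1 labeled as the contact block, a segmentation error propagates directly into the extracted fields. That is one way to read the criticism that treating the steps separately is ad hoc.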
A few efforts have also addressed the extraction of contact information from emails or from the Web. For example, Kristjansson et al. have developed an interactive information extraction system to help the user populate a contact database from emails. In comparison, profile extraction consists of contact information extraction as well as several other subtasks.
- The work is supported by the National Natural Science Foundation of China (90604025, 60703059), Chinese National Key Foundation Research and Development Plan (2007CB310803), and Chinese Young Faculty Research Funding (20070003093)
- It is also supported by IBM Innovation funding
- L. A. Adamic and E. Adar. How to search a social network. Social Networks, 27:187–203, 2005.
- C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan. An introduction to MCMC for machine learning. Machine Learning, 50:5–43, 2003.
- R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. ACM Press, 1999.
- K. Balog, L. Azzopardi, and M. de Rijke. Formal models for expert finding in enterprise corpora. In Proc. of SIGIR’06, pages 43–55, 2006.
- S. Basu, M. Bilenko, and R. J. Mooney. A probabilistic framework for semi-supervised clustering. In Proc. of KDD’04, pages 59–68, 2004.
- R. Bekkerman and A. McCallum. Disambiguating web appearances of people in a social network. In Proc. of WWW’05, pages 463–470, 2005.
- D. M. Blei and J. D. McAuliffe. Supervised topic models. In Proc. of NIPS’07, 2007.
- D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
- D. Brickley and L. Miller. FOAF vocabulary specification. In Namespace Document, http://xmlns.com/foaf/0.1/, September 2004.
- C. Buckley and E. M. Voorhees. Retrieval evaluation with incomplete information. In Proc. of SIGIR’04, pages 25–32, 2004.
- F. Ciravegna. An adaptive algorithm for information extraction from web-related texts. In Proc. of IJCAI’01 Workshop, August 2001.
- C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20:273–297, 1995.
- N. Craswell, A. P. de Vries, and I. Soboroff. Overview of the TREC-2005 enterprise track. In TREC’05, pages 199–205, 2005.
- H. Han, L. Giles, H. Zha, C. Li, and K. Tsioutsiouliklis. Two supervised learning approaches for name disambiguation in author citations. In Proc. of JCDL’04, pages 296–305, 2004.
- H. Han, H. Zha, and C. L. Giles. Name disambiguation in author citations using a k-way spectral clustering method. In Proc. of JCDL’05, pages 334–343, 2005.
- T. Hofmann. Collaborative filtering via Gaussian probabilistic latent semantic analysis. In Proc. of SIGIR’03, pages 259–266, 2003.
- T. Hofmann. Probabilistic latent semantic indexing. In Proc. of SIGIR’99, pages 50–57, 1999.
- H. Kautz, B. Selman, and M. Shah. Referral web: Combining social networks and collaborative filtering. Communications of the ACM, 40(3):63–65, 1997.
- T. Kristjansson, A. Culotta, P. Viola, and A. McCallum. Interactive information extraction with constrained conditional random fields. In Proc. of AAAI’04, 2004.
- J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proc. of ICML’01, 2001.
- A. McCallum. Multi-label text classification with a mixture model trained by EM. In Proc. of AAAI’99 Workshop, 1999.
- D. Mimno and A. McCallum. Expertise modeling for matching papers with reviewers. In Proc. of KDD’07, pages 500–509, 2007.
- T. Minka. Estimating a Dirichlet distribution. In Technical Report, http://research.microsoft.com/minka/papers/dirichlet/, 2003.
- Z. Nie, Y. Ma, S. Shi, J.-R. Wen, and W.-Y. Ma. Web object retrieval. In Proc. of WWW’07, pages 81–90, 2007.
- M. Rosen-Zvi, T. Griffiths, M. Steyvers, and P. Smyth. The author-topic model for authors and documents. In Proc. of UAI’04, 2004.
- M. Steyvers, P. Smyth, and T. Griffiths. Probabilistic author-topic models for information discovery. In Proc. of SIGKDD’04, 2004.
- Y. F. Tan, M.-Y. Kan, and D. Lee. Search engine driven author disambiguation. In Proc. of JCDL’06, pages 314–315, 2006.
- J. Tang, D. Zhang, and L. Yao. Social network extraction of academic researchers. In Proc. of ICDM’07, pages 292–301, 2007.
- X. Wei and W. B. Croft. LDA-based document models for ad-hoc retrieval. In Proc. of SIGIR’06, pages 178–185, 2006.
- E. Xun, C. Huang, and M. Zhou. A unified statistical model for the identification of English baseNP. In Proc. of ACL’00, 2000.
- X. Yin, J. Han, and P. Yu. Object distinction: Distinguishing objects with identical names. In Proc. of ICDE’07, pages 1242–1246, 2007.
- K. Yu, G. Guan, and M. Zhou. Resume information extraction with cascaded hybrid model. In Proc. of ACL’05, pages 499–506, 2005.