Measurement and analysis of online social networks

    Internet Measurement Conference, pp. 29-42, 2007.

    Keywords:
    social network, densely connected core, online social network graph, popular online social network, accessible user link

    Abstract:

    Online social networking sites like Orkut, YouTube, and Flickr are among the most popular sites on the Internet. Users of these sites form a social network, which provides a powerful means of sharing, organizing, and finding content and contacts. The popularity of these sites provides an opportunity to study the characteristics of online ...

    Introduction
    • The Internet has spawned different types of information sharing systems, including the Web.
    • Unlike the Web, which is largely organized around content, online social networks are organized around users.
    • The authors begin with a brief overview of online social networks.
    • The authors describe a simple experiment they conducted to estimate how often the links between users are used to locate content on a social networking site like Flickr.
    • Online social networks have existed since the beginning of the Internet.
    • For example, the graph of email users who exchange messages with each other forms an online social network.
    • It has been difficult to study this network at large scale due to its distributed nature.
    Highlights
    • The Internet has spawned different types of information sharing systems, including the Web
    • The social networks we study in this paper exist in the databases of online social networking sites
    • Since the focus of this paper is to investigate the structure of online social networks, we focus on the large weakly connected component (WCC) of the corresponding graphs in the rest of this paper
    • Our measurements indicate that online social networks have a high degree of reciprocity, a tight core that consists of high-degree nodes, and a strong positive correlation in link degrees for connected users
    • We have presented an analysis of the structural properties of online social networks using data sets collected from four popular sites
    • For each network, the top 1% of nodes ranked by indegree has a more than 65% overlap with the top 1% of nodes ranked by outdegree
    • We have focused exclusively on the user graph of social networking sites; many of these sites allow users to host content, which in turn can be linked to other users and content
    Methods
    • The authors describe the data presented in this paper and the methodology used to collect it.
    • The authors chose to crawl the user graphs by accessing the public web interface provided by the sites.
    • This methodology gives them access to large data sets from multiple sites.
    • Since the focus of this paper is to investigate the structure of online social networks, the authors focus on the large weakly connected component (WCC) of the corresponding graphs in the rest of this paper.
    • The nodes not included in the WCC tend either to belong to very small, isolated clusters or to have no connections to other users at all.
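To make the WCC restriction above concrete, here is a minimal sketch of extracting the largest weakly connected component from a list of directed links. The function name `largest_wcc` and the toy edge list are illustrative assumptions, not taken from the paper, and a real crawl would need a more memory-efficient representation.

```python
from collections import defaultdict, deque


def largest_wcc(edges):
    """Return the node set of the largest weakly connected component.

    `edges` is an iterable of directed (source, target) pairs.
    Nodes without any link never appear in `edges` and are ignored.
    """
    # Weak connectivity ignores link direction, so build an undirected view.
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    seen = set()
    best = set()
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search collects one component.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Hypothetical toy graph: one four-node component and one isolated pair.
toy_edges = [("a", "b"), ("c", "b"), ("c", "d"), ("x", "y")]
wcc = largest_wcc(toy_edges)
```

In the toy graph, the pair `x`, `y` plays the role of the small, isolated clusters that fall outside the WCC.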
    Results
    • Earlier studies relied on small samples, for example 0.3% of Orkut users [4], less than 1% of LiveJournal communities [8], and 0.08% of MySpace users [4].
    • For each network, the top 1% of nodes ranked by indegree has a more than 65% overlap with the top 1% of nodes ranked by outdegree.
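The indegree/outdegree overlap reported above can be computed along the following lines. This is a sketch under stated assumptions: the helper names `top_fraction` and `degree_overlap`, the tie-breaking rule, and the toy graph are all illustrative and do not come from the paper.

```python
import math
from collections import Counter


def top_fraction(degree, fraction):
    """Return the set of nodes in the top `fraction` by degree (at least one node)."""
    k = max(1, math.ceil(len(degree) * fraction))
    # Break ties by node name so the result is deterministic (an assumption).
    ranked = sorted(degree, key=lambda n: (-degree[n], str(n)))
    return set(ranked[:k])


def degree_overlap(edges, fraction=0.01):
    """Share of the top-indegree nodes that are also top-outdegree nodes."""
    indeg, outdeg, nodes = Counter(), Counter(), set()
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
        nodes.update((src, dst))
    # Include zero-degree entries so both rankings cover every node.
    top_in = top_fraction({n: indeg[n] for n in nodes}, fraction)
    top_out = top_fraction({n: outdeg[n] for n in nodes}, fraction)
    return len(top_in & top_out) / len(top_in)


# Hypothetical toy graph: hub "h" both sends and receives the most links.
toy_edges = [("h", "a"), ("h", "b"), ("h", "c"),
             ("a", "h"), ("b", "h"), ("c", "h"), ("d", "a")]
overlap = degree_overlap(toy_edges, fraction=0.2)
```

With five nodes and `fraction=0.2`, each ranking keeps a single node, and the hub tops both.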
    Conclusion
    • The authors' measurements indicate that online social networks have a high degree of reciprocity, a tight core that consists of high-degree nodes, and a strong positive correlation in link degrees for connected users.
    • What do these findings mean for developers?
    • Establishing the structure and dynamics of the content graph is an open problem; solving it would make it possible to understand how content is introduced in these systems, how data gains popularity, how users interact with popular versus personal data, and so on.
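The reciprocity mentioned in the conclusion can be illustrated with a short sketch. It assumes reciprocity means the fraction of directed links whose reverse link also exists. The function name `reciprocity`, the choice to ignore self-links, and the toy data are assumptions for illustration only.

```python
def reciprocity(edges):
    """Fraction of directed links (u, v) for which (v, u) is also present."""
    # Deduplicate links and drop self-links, which are trivially "mutual".
    edge_set = {(src, dst) for src, dst in edges if src != dst}
    if not edge_set:
        return 0.0
    mutual = sum(1 for src, dst in edge_set if (dst, src) in edge_set)
    return mutual / len(edge_set)


# Hypothetical toy graph: a <-> b is mutual, a -> c and c -> d are one-way.
toy_edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
toy_reciprocity = reciprocity(toy_edges)
```

Two of the four toy links have a reverse link, so the measure is one half here.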
    Tables
    • Table 1: High-level statistics of our social networking site crawls.
    • Table 2: Power-law coefficient estimates (α) and corresponding Kolmogorov-Smirnov goodness-of-fit metrics (D). The Flickr, LiveJournal, and YouTube networks are well approximated by a power-law.
    • Table 3: Average path length, radius, and diameter of the studied networks. The path length between random nodes is very short in social networks.
    • Table 4: The observed clustering coefficient, and its ratio to that of random Erdos-Renyi graphs as well as random power-law graphs.
    • Table 5: High-level properties of network groups, including the fraction of users who use group features, average group size, and average group clustering coefficient.
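As a rough illustration of the two quantities named in the Table 2 caption, the sketch below estimates a power-law exponent α by maximum likelihood and computes the Kolmogorov-Smirnov distance D between the data and the fitted model. It uses the continuous power-law approximation with a fixed, assumed `xmin`. This is not necessarily the estimator behind the paper's numbers, and the function name and synthetic sample are hypothetical.

```python
import math


def fit_power_law(values, xmin=1.0):
    """Return (alpha, D) for the tail of `values` at or above `xmin`.

    alpha: continuous maximum-likelihood estimate, 1 + n / sum(ln(x / xmin)).
    D: largest gap between the empirical CDF and the fitted model CDF.
    """
    tail = sorted(v for v in values if v >= xmin)
    n = len(tail)
    if n == 0:
        raise ValueError("no values at or above xmin")
    log_sum = sum(math.log(v / xmin) for v in tail)
    if log_sum == 0:
        raise ValueError("all values equal xmin; exponent is undefined")
    alpha = 1.0 + n / log_sum

    d = 0.0
    for i, v in enumerate(tail):
        # CDF of the continuous power law fitted above.
        model_cdf = 1.0 - (v / xmin) ** (1.0 - alpha)
        # Compare against the empirical CDF just before and at this point.
        d = max(d, abs((i + 1) / n - model_cdf), abs(i / n - model_cdf))
    return alpha, d


# Synthetic sample: evenly spaced quantiles of a power law with alpha = 2.5.
sample = [(1.0 - (i + 0.5) / 1000) ** (-1.0 / 1.5) for i in range(1000)]
alpha_hat, ks_d = fit_power_law(sample, xmin=1.0)
```

A small D means the fitted curve tracks the empirical distribution closely. Degree data are discrete, so a discrete estimator and a data-driven choice of `xmin` would be more appropriate in practice.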
    Related work
    • In this section we describe studies of social networks, information networks, as well as work on complex network theory.

      3.1 Social networks

      Sociologists have studied many of the properties of social networks. Milgram [34] shows that the average path length between two Americans is 6 hops, and Pool and Kochen [46] provide an analysis of the small-world effect. The influential paper by Granovetter [20] argues that a social network can be partitioned into ‘strong’ and ‘weak’ ties, and that the strong ties are tightly clustered. For an overview of social network analysis techniques, we refer the reader to the book by Wasserman and Faust [51].

      As online social networks gain popularity, sociologists and computer scientists are beginning to investigate their properties. Adamic et al. [3] study an early online social network at Stanford University, and find that the network exhibits small-world behavior, as well as significant local clustering. Liben-Nowell et al. [32] find a strong correlation between friendship and geographic location in social networks by using data from LiveJournal. Kumar et al. [26] examine two online social networks and find that both possess a large strongly connected component. Girvan and Newman [18] observe that users in online social networks tend to form tightly knit groups. Backstrom et al. [8] examine snapshots of group membership in LiveJournal, and present models for the growth of user groups over time. We were able to verify these properties on a much larger scale.
    Funding
    • This research was supported in part by US National Science Foundation grant ANI-0225660
    References
    • [1] Stanford WebBase Project. http://www-diglib.stanford.edu/~testbed/doc2/WebBase.
    • [2] L. A. Adamic. The Small World Web. In Proceedings of the Third European Conference on Research and Advanced Technology for Digital Libraries (ECDL’99), Paris, France, Sep 1999.
    • [3] L. A. Adamic, O. Buyukkokten, and E. Adar. A social network caught in the Web. First Monday, 8(6), 2003.
    • [4] Y.-Y. Ahn, S. Han, H. Kwak, S. Moon, and H. Jeong. Analysis of Topological Characteristics of Huge Online Social Networking Services. In Proceedings of the 16th international conference on World Wide Web (WWW’07), Banff, Canada, May 2007.
    • [5] R. Albert, H. Jeong, and A.-L. Barabasi. The Diameter of the World Wide Web. Nature, 401:130, 1999.
    • [6] L. A. N. Amaral, A. Scala, M. Barthelemy, and H. E. Stanley. Classes of small-world networks. Proceedings of the National Academy of Sciences (PNAS), 97:11149–11152, 2000.
    • [7] A. Awan, R. A. Ferreira, S. Jagannathan, and A. Grama. Distributed uniform sampling in real-world networks. Technical Report CSD-TR-04-029, Purdue University, 2004.
    • [8] L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan. Group Formation in Large Social Networks: Membership, Growth, and Evolution. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’06), Philadelphia, PA, Aug 2006.
    • [9] A.-L. Barabasi and R. Albert. Emergence of Scaling in Random Networks. Science, 286:509–512, 1999.
    • [10] L. Becchetti, C. Castillo, D. Donato, and A. Fazzone. A Comparison of Sampling Techniques for Web Graph Characterization. In Proceedings of the Workshop on Link Analysis (LinkKDD’06), Philadelphia, PA, Aug 2006.
    • [11] V. Braitenberg and A. Schuz. Anatomy of a Cortex: Statistics and Geometry. Springer-Verlag, Berlin, 1991.
    • [12] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph Structure in the Web: Experiments and Models. In Proceedings of the 9th International World Wide Web Conference (WWW’00), Amsterdam, May 2000.
    • [13] A. Clauset, C. R. Shalizi, and M. E. J. Newman. Power-law distributions in empirical data, Jun 2007. http://arxiv.org/abs/0706.1062v1.
    • [14] d. boyd. Friends, Friendsters, and Top 8: Writing community into being on social network sites. First Monday, 11(12), 2006.
    • [15] P. Erdos and A. Renyi. On Random Graphs I. Publicationes Mathematicae Debrecen, 5:290–297, 1959.
    • [16] M. Faloutsos, P. Faloutsos, and C. Faloutsos. On Power-Law Relationships of the Internet Topology. In Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication (SIGCOMM’99), Cambridge, MA, Aug 1999.
    • [17] S. Garriss, M. Kaminsky, M. J. Freedman, B. Karp, D. Mazieres, and H. Yu. Re: Reliable Email. In Proceedings of the 3rd Symposium on Networked Systems Design and Implementation (NSDI’06), San Jose, CA, May 2006.
    • [18] M. Girvan and M. E. J. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences (PNAS), 99:7821–7826, 2002.
    • [19] Google Co-op. http://www.google.com/coop/.
    • [20] M. Granovetter. The Strength of Weak Ties. American Journal of Sociology, 78(6), 1973.
    • [21] J. Kleinberg. Authoritative Sources in a Hyperlinked Environment. Journal of the ACM, 46:604–632, 1999.
    • [22] J. Kleinberg. Navigation in a Small World. Nature, 406:845–845, 2000.
    • [23] J. Kleinberg. The Small-World Phenomenon: An Algorithmic Perspective. In Proceedings of the 32nd ACM Symposium on Theory of Computing (STOC’00), Portland, OR, May 2000.
    • [24] J. Kleinberg and S. Lawrence. The Structure of the Web. Science, 294:1849–1850, 2001.
    • [25] J. M. Kleinberg and R. Rubinfeld. Short paths in expander graphs. In IEEE Symposium on Foundations of Computer Science (FOCS’96), Burlington, VT, Oct 1996.
    • [26] R. Kumar, J. Novak, and A. Tomkins. Structure and Evolution of Online Social Networks. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’06), Philadelphia, PA, Aug 2006.
    • [27] R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Trawling the Web for Emerging Cyber-Communities. Computer Networks, 31:1481–1493, 1999.
    • [28] S. Lee, R. Sherwood, and B. Bhattacharjee. Cooperative peer groups in NICE. In Proceedings of the Conference on Computer Communications (INFOCOM’03), San Francisco, CA, Mar 2003.
    • [29] S. H. Lee, P.-J. Kim, and H. Jeong. Statistical properties of sampled networks. Physical Review E, 73, 2006.
    • [30] L. Li and D. Alderson. Diversity of graphs with highly variable connectivity. Physical Review E, 75, 2007.
    • [31] L. Li, D. Alderson, J. C. Doyle, and W. Willinger. Towards a Theory of Scale-Free Graphs: Definitions, Properties, and Implications. Internet Mathematics, 2(4):431–523, 2006.
    • [32] D. Liben-Nowell, J. Novak, R. Kumar, P. Raghavan, and A. Tomkins. Geographic Routing in Social Networks. Proceedings of the National Academy of Sciences (PNAS), 102(33):11623–11628, 2005.
    • [33] P. Mahadevan, D. Krioukov, K. Fall, and A. Vahdat. Systematic Topology Analysis and Generation Using Degree Correlations. In Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication (SIGCOMM’06), Pisa, Italy, August 2006.
    • [34] S. Milgram. The small world problem. Psychology Today, 2(60), 1967.
    • [35] A. Mislove, K. P. Gummadi, and P. Druschel. Exploiting social networks for Internet search. In Proceedings of the 5th Workshop on Hot Topics in Networks (HotNets-V), Irvine, CA, Nov 2006.
    • [36] M. Molloy and B. Reed. A critical point for random graphs with a given degree distribution. Random Structures and Algorithms, 6, 1995.
    • [37] M. Molloy and B. Reed. The size of the giant component of a random graph with a given degree sequence. Combinatorics, Probability and Computing, 7, 1998.
    • [38] R. Morselli, B. Bhattacharjee, J. Katz, and M. A. Marsh. Keychains: A Decentralized Public-Key Infrastructure. Technical Report CS-TR-4788, University of Maryland, 2006.
    • [39] MozillaCoop. http://www.mozilla.com.
    • [40] MySpace is the number one website in the U.S. according to Hitwise. HitWise Press Release, July 11, 2006. http://www.hitwise.com/press-center/hitwiseHS2004/social-networking-june-2006.php.
    • [41] M. E. J. Newman. The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences (PNAS), 98:409–415, 2001.
    • [42] M. E. J. Newman. Mixing patterns in networks. Physical Review E, 67, 2003.
    • [43] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Technical report, Stanford University, 1998.
    • [44] PayPerPost. http://www.payperpost.com.
    • [45] A. G. Phadke and J. S. Thorp. Computer relaying for power systems. John Wiley & Sons, Inc., New York, NY, USA, 1988.
    • [46] I. Pool and M. Kochen. Contacts and influence. Social Networks, 1:1–48, 1978.
    • [47] D. Rezner. The Power and Politics of Weblogs. In Proceedings of the ACM Conference on Computer Supported Cooperative Work (CSCW’04), Chicago, IL, Nov 2004.
    • [48] G. Siganos, S. L. Tauro, and M. Faloutsos. Jellyfish: A Conceptual Model for the AS Internet Topology. Journal of Communications and Networks, 8(3):339–350, 2006.
    • [49] Skype. http://www.skype.com.
    • [50] StumbleUpon. http://www.stumbleupon.com.
    • [51] S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, Cambridge, UK, 1994.
    • [52] D. Watts and S. Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393:440–442, 1998.
    • [53] W. Willinger, D. Alderson, and L. Li. A pragmatic approach to dealing with high-variability in network measurements. In Proceedings of the 2nd ACM/Usenix Internet Measurement Conference (IMC’04), Taormina, Italy, Oct 2004.
    • [54] Yahoo! MyWeb. http://myweb2.search.yahoo.com.
    • [55] H. Yu, M. Kaminsky, P. B. Gibbons, and A. Flaxman. SybilGuard: Defending against Sybil attacks via social networks. In Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication (SIGCOMM’06), Pisa, Italy, August 2006.