WebScalding: A Framework for Big Data Web Services

BIGDATASERVICE '15 Proceedings of the 2015 IEEE First International Conference on Big Data Computing Service and Applications(2015)

Abstract
CareerBuilder (CB) currently has 50 million active resumes and 2 million active job postings. Our team has been working to provide the most relevant jobs for job seekers and the most relevant resumes for employers and recruiters. These goals often lead to Big Data problems. In this paper, we introduce WebScalding, a Big Data framework designed and developed to solve some of the common large-scale data challenges at CB. The WebScalding framework raises the level of abstraction of Twitter's Scalding framework to adapt it to CB's unique challenges. The WebScalding framework helps users by ensuring that: 1) all internal web services are available as Cascading pipe operations, 2) these pipe operations can read from our common data sources and create a pipe assembly, and 3) the pipe assembly so created can be executed on the CB Hadoop cluster as well as on local machines without any changes. We describe WebScalding using three case studies taken from actual internal projects; they explain how data scientists at CB who are not well versed in Big Data tools and methodologies leverage WebScalding to design, implement, and test Big Data applications. We also compare the execution time of a WebScalding program with that of its sequential Python counterpart to illustrate the superlinear speedup of WebScalding programs.
Keywords
Big Data,Internet,Web services,data handling,parallel processing,social networking (online),Big Data Web services,CB Hadoop cluster,CareerBuilder,Twitter scalding framework,WebScalding framework,cascading pipe operations,pipe assembly,sequential Python