Zooming in on NYC taxi data with Portal.

BiDu-Posters@VLDB(2018)

Abstract
In this paper, we develop a methodology for analyzing transportation data at different levels of temporal and geographic granularity, and apply it to the TLC Trip Record Dataset, made publicly available by the NYC Taxi & Limousine Commission. This data is naturally represented as a set of trajectories, annotated with time and with additional information such as passenger count and cost. We analyze TLC data to identify hotspots, which point to a lack of convenient public transportation options, and popular routes, which motivate ride-sharing solutions or the addition of a bus route. Our methodology is based on Portal, a system that implements efficient representations and principled analysis methods for evolving graphs. Portal is implemented on top of Apache Spark, a popular distributed data processing system; it is interoperable with other Spark libraries such as SparkSQL and efficiently supports sophisticated analysis of evolving graphs. Portal is currently under development in the Data, Responsibly Lab at Drexel. We plan to release Portal as open source in Fall 2017.
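
To make the idea of zooming across temporal and geographic granularity concrete, the following is a minimal sketch in plain Python. It does not use Portal or Spark and is not the authors' implementation. The trip records, the zone names, the zone-to-borough mapping, and the function names `window_start`, `snapshots`, `coarsen`, and `hotspots` are all hypothetical, introduced only for illustration. The sketch treats each time window as one snapshot of a weighted graph whose nodes are zones and whose edge weights count trips, then ranks zones by total trips as a simple stand-in for hotspot detection.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Hypothetical trip records: (pickup_zone, dropoff_zone, pickup_time, passengers).
TRIPS = [
    ("Midtown", "JFK", datetime(2016, 1, 1, 8, 5), 1),
    ("Midtown", "JFK", datetime(2016, 1, 1, 8, 40), 2),
    ("Harlem", "Midtown", datetime(2016, 1, 1, 9, 10), 1),
    ("Midtown", "JFK", datetime(2016, 1, 1, 9, 30), 1),
    ("JFK", "Midtown", datetime(2016, 1, 1, 10, 15), 3),
]

# Hypothetical mapping from fine-grained zones to coarser regions.
BOROUGH = {"Midtown": "Manhattan", "Harlem": "Manhattan", "JFK": "Queens"}

EPOCH = datetime(1970, 1, 1)


def window_start(ts, width):
    """Floor a timestamp to the start of its fixed-width window."""
    return EPOCH + ((ts - EPOCH) // width) * width


def snapshots(trips, width):
    """Build one weighted edge set per window: (src, dst) -> trip count."""
    graphs = defaultdict(Counter)
    for src, dst, ts, _passengers in trips:
        graphs[window_start(ts, width)][(src, dst)] += 1
    return dict(graphs)


def coarsen(edges, mapping):
    """Zoom out geographically by merging zones into their regions."""
    merged = Counter()
    for (src, dst), n in edges.items():
        merged[(mapping[src], mapping[dst])] += n
    return merged


def hotspots(edges, k=1):
    """Rank nodes by the number of trips starting or ending there."""
    degree = Counter()
    for (src, dst), n in edges.items():
        degree[src] += n
        degree[dst] += n
    return degree.most_common(k)


hourly = snapshots(TRIPS, timedelta(hours=1))  # fine temporal granularity
daily = snapshots(TRIPS, timedelta(days=1))    # coarse temporal granularity
day = daily[datetime(2016, 1, 1)]
by_borough = coarsen(day, BOROUGH)             # coarse geographic granularity
```

Changing the `width` argument zooms in or out in time, and `coarsen` zooms out in space. A system for evolving graphs such as Portal is described as supporting this kind of analysis efficiently and at scale; the sketch only illustrates the shape of the computation on toy data.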
Keywords
nyc taxi data