AggFirstJoin: Optimizing Geo-Distributed Joins using Aggregation-Based Transformations.


Today, data is generated in a geographically distributed manner across a wide variety of domains, such as social networks, e-commerce, search engines, online advertisements, audio and video streaming, energy, smart cities, and IoT sensors. Consequently, this data is stored across geographically distributed edges and data centers (DCs) close to the end users and end devices that produce it. Analyzing this geographically distributed data is challenging for two main reasons: 1) the WAN links that connect the geo-distributed edges and DCs (henceforth collectively called sites) are bandwidth-constrained and costly [1], and 2) compute availability at each site, especially at the edges, is limited [2].
