GeaFlow: A Graph Extended and Accelerated Dataflow System

Zhenxuan Pan, Tao Wu, Qingwen Zhao, Qiang Zhou, Zhiwei Peng, Jiefeng Li, Qi Zhang, Guanyu Feng, Xiaowei Zhu

Proc. ACM Manag. Data (2023)

Abstract
GeaFlow is a distributed dataflow system optimized for streaming graph processing. It has been widely adopted at Ant Group, serving scenarios ranging from risk control of financial activities to analytics on social networks and knowledge graphs. GeaFlow is built on top of a base system with full-fledged stateful stream processing capabilities, extended with a series of graph-aware optimizations that address the space explosion and programming complexity of conventional join-based approaches. We propose new state backends and streaming operators that facilitate processing on dynamic graph-structured datasets and reduce the space consumed by state. We also develop a hybrid domain-specific language that embeds Gremlin into SQL, supporting both table and graph abstractions over streaming data. Beyond streaming workloads, GeaFlow is used extensively for certain batch processing jobs. In the largest deployments to date, GeaFlow can process tens of millions of events per second and manage hundreds of terabytes of state.
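The abstract attributes space explosion to conventional join-based approaches. A minimal toy model can make that intuition concrete, assuming a 2-hop path query (a)->(b)->(c) over a stream of edges. A join-based pipeline keeps a copy of each edge on both join inputs and, when joins are cascaded for longer paths, also keeps the joined paths as state. A graph-aware state, by contrast, stores each edge once as adjacency and traverses paths on demand. The classes `JoinBasedTwoHop` and `AdjacencyGraphState` are hypothetical illustrations, not GeaFlow's state backends, operators, or API.

```python
from collections import defaultdict


class JoinBasedTwoHop:
    """Hypothetical toy symmetric hash join for the 2-hop pattern (a)->(b)->(c).

    Each join input keeps its own copy of every edge, and joined paths are
    kept as state (as they would be if fed to a further join for longer paths).
    """

    def __init__(self):
        self.left = defaultdict(list)   # b -> edges (a, b): first hops ending at b
        self.right = defaultdict(list)  # b -> edges (b, c): second hops starting at b
        self.paths = []                 # materialized 2-hop paths (a, b, c)

    def on_edge(self, src, dst):
        # Role 1: the new edge is a first hop; join with stored second hops out of dst.
        self.left[dst].append((src, dst))
        for _, c in self.right[dst]:
            self.paths.append((src, dst, c))
        # Role 2: the new edge is a second hop; join with stored first hops into src.
        self.right[src].append((src, dst))
        for a, _ in self.left[src]:
            self.paths.append((a, src, dst))

    def state_size(self):
        # Edge copies held by both join inputs plus the materialized join output.
        return (sum(len(v) for v in self.left.values())
                + sum(len(v) for v in self.right.values())
                + len(self.paths))


class AdjacencyGraphState:
    """Hypothetical toy graph state: each edge is stored once as adjacency."""

    def __init__(self):
        self.out = defaultdict(list)  # vertex -> out-neighbors
        self.num_edges = 0

    def on_edge(self, src, dst):
        self.out[src].append(dst)
        self.num_edges += 1

    def two_hop_from(self, a):
        # Traverse a -> b -> c at query time instead of storing join results.
        return [(a, b, c) for b in self.out[a] for c in self.out[b]]

    def state_size(self):
        # Only the edges themselves are kept as state.
        return self.num_edges


if __name__ == "__main__":
    n = 100
    # A hub with n incoming and n outgoing edges yields n * n 2-hop paths.
    edges = [(f"u{i}", "hub") for i in range(n)] + [("hub", f"v{j}") for j in range(n)]
    join_op, graph_op = JoinBasedTwoHop(), AdjacencyGraphState()
    for s, d in edges:
        join_op.on_edge(s, d)
        graph_op.on_edge(s, d)
    print(join_op.state_size())   # 10400: 2 * 200 edge copies + 10000 paths
    print(graph_op.state_size())  # 200
```

In this toy model the adjacency state trades smaller state for traversal work at query time; it only illustrates the general join-versus-graph-state trade-off and makes no claim about how GeaFlow's operators are implemented.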
Keywords
graph extended