Graph Feature Management: Impact, Challenges and Opportunities

GRADES-NDA@SIGMOD (2023)

Abstract
Graph features are crucial to many applications, such as recommender systems and risk management systems. Obtaining useful graph features involves ingesting data from various upstream data sources, defining the graph features that the target applications require, constructing a feature engineering workflow to compute them, and storing and managing the resulting features for downstream tasks (e.g., graph AI and graph BI) and for future reuse. For most users, especially small and medium-sized enterprises (SMEs) and non-tech companies, this process poses daunting challenges: users must not only learn various methods (e.g., graph analytical algorithms, non-GNN graph embeddings, GNNs) to define graph features and program their computation, but also learn many infrastructures (e.g., upstream databases, downstream ML systems, graph analytics systems) to compute, manage and use the graph features in production. These challenges currently restrict the wider application of graph technologies such as graph AI and graph BI in industry. The current solution offered by major graph database vendors (e.g., Amazon Neptune, Neo4j, TigerGraph) is to connect various upstream and downstream systems to the vendor's own graph database, which is then used to compute and manage graph features. However, such a solution ties users to a specific graph infrastructure that may not be their preferred one, and may even require them to redevelop their applications on a new infrastructure. In addition, a given graph database or infrastructure rarely performs best on all workloads, and it certainly does not support the computation of every type of graph feature. As a result, the existing solution limits both users' flexibility in choosing their own infrastructure and their productivity in developing applications. In Part 1 of this talk, I will introduce various types of graph features and their applications.
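The feature engineering process outlined above (ingest, define, compute, store) can be made concrete with a toy pipeline. This is a minimal sketch under stated assumptions, not the speaker's system: in-memory edge lists stand in for upstream data sources, a plain dictionary stands in for a feature store, and all names (`ingest`, `compute_features`, `store`, `UPSTREAM_SOURCES`) are hypothetical. It uses two simple graph-analytical features, out-degree and PageRank, as examples of the non-GNN features mentioned above.

```python
from collections import defaultdict

# Hypothetical upstream sources: each contributes directed (src, dst) edges.
UPSTREAM_SOURCES = {
    "transactions": [("a", "b"), ("b", "c")],
    "social": [("c", "a"), ("a", "c")],
}

def ingest(sources):
    """Step 1: merge edges from all upstream sources into one deduplicated, sorted edge list."""
    edges = set()
    for rows in sources.values():
        edges.update(rows)
    return sorted(edges)

def out_degree(edges):
    """Feature definition: number of outgoing edges per node (sink nodes get 0)."""
    deg = defaultdict(int)
    for src, dst in edges:
        deg[src] += 1
        deg.setdefault(dst, 0)
    return dict(deg)

def pagerank(edges, damping=0.85, iters=50):
    """Feature definition: power-iteration PageRank; mass of dangling nodes is spread uniformly."""
    nodes = sorted({n for e in edges for n in e})
    out = defaultdict(list)
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out[v]:
                new[w] += damping * rank[v] / len(out[v])
        rank = new
    return rank

# Step 2: the desired features, declared by name.
FEATURE_DEFINITIONS = {"out_degree": out_degree, "pagerank": pagerank}

def compute_features(edges, definitions):
    """Step 3: run every feature definition over the ingested graph."""
    return {name: fn(edges) for name, fn in definitions.items()}

def store(features, feature_store):
    """Step 4: persist results keyed by (feature name, node) for downstream reuse."""
    for name, values in features.items():
        for node, value in values.items():
            feature_store[(name, node)] = value
    return feature_store

feature_store = store(compute_features(ingest(UPSTREAM_SOURCES), FEATURE_DEFINITIONS), {})
```

Even this toy version touches every stage; in production each stage typically maps to a different system (source databases, a compute engine, a feature store), which is where the infrastructure burden described above comes from.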
Also in Part 1, I will present some trends in using graph databases for graph feature computation and management, analyze the limitations of existing methods, and identify the requirements for a graph feature management solution that is practical and highly usable by average users. In Part 2 of this talk, I will introduce our ongoing project, which aims to provide a highly usable graph feature platform. Our solution decouples graph feature logic specification and management (i.e., how features are defined, coded and managed) from the generation and execution of the feature computation workflow (i.e., execution plan generation and the actual execution), so that users can flexibly select the infrastructures best suited to computing specific types of graph features. It also manages the upstream, downstream, feature engineering and feature serving infrastructures, freeing users from the tedious tasks of deploying these infrastructures and connecting them into a feature engineering dataflow. Users can thus focus on creating and delivering innovative feature workflow logic. Finally, I will highlight some possible future directions for graph feature management.
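The decoupling described above, where feature logic is specified once and an execution backend is chosen separately, can be illustrated with a small sketch. Everything here is an assumption for illustration: `FeatureSpec`, `Executor`, the two engine classes and `plan_and_run` are hypothetical names, not the interface of the platform presented in the talk. A spec records only *what* to compute; a planner then binds each spec to the first registered executor that declares support for its feature kind.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol

@dataclass
class FeatureSpec:
    """Logic specification: what to compute, with no reference to any infrastructure."""
    name: str
    kind: str  # hypothetical feature kinds, e.g. "degree" or "triangle_count"

class Executor(Protocol):
    """Execution side: any backend that can run some kinds of feature specs."""
    name: str
    def supports(self, kind: str) -> bool: ...
    def run(self, spec: FeatureSpec, edges: list[tuple[str, str]]) -> dict[str, int]: ...

class DegreeEngine:
    """Hypothetical backend suited to simple aggregations (undirected degree)."""
    name = "degree-engine"
    def supports(self, kind):
        return kind == "degree"
    def run(self, spec, edges):
        deg = {}
        for src, dst in edges:
            deg[src] = deg.get(src, 0) + 1
            deg[dst] = deg.get(dst, 0) + 1
        return deg

class TriangleEngine:
    """Hypothetical backend suited to subgraph-pattern features (triangles per node)."""
    name = "triangle-engine"
    def supports(self, kind):
        return kind == "triangle_count"
    def run(self, spec, edges):
        adj = {}
        for src, dst in edges:
            adj.setdefault(src, set()).add(dst)
            adj.setdefault(dst, set()).add(src)
        counts = {}
        for v, nbrs in adj.items():
            ns = sorted(nbrs)
            # A triangle through v is a pair of neighbors that are themselves adjacent.
            counts[v] = sum(1 for i, u in enumerate(ns) for w in ns[i + 1:] if w in adj[u])
        return counts

def plan_and_run(specs: list[FeatureSpec], executors: list[Executor], edges):
    """Execution plan generation: bind each spec to the first executor that supports it, then run."""
    results = {}
    for spec in specs:
        executor = next((e for e in executors if e.supports(spec.kind)), None)
        if executor is None:
            raise ValueError(f"no executor supports feature kind {spec.kind!r}")
        results[spec.name] = (executor.name, executor.run(spec, edges))
    return results

edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
specs = [FeatureSpec("deg", "degree"), FeatureSpec("tri", "triangle_count")]
out = plan_and_run(specs, [DegreeEngine(), TriangleEngine()], edges)
```

Because the planner consults only `supports`, registering a different backend for one feature kind changes which engine runs it without touching the specs, which is the kind of infrastructure flexibility described above. A real planner would also weigh cost and data location; first-match selection is a simplification.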
Keywords
graph feature management,graph machine learning,graph databases,graph analytics