Jump Filter: A Dynamic Sketch for Big Data Governance.

International Journal of Software and Informatics(2023)

Abstract
DOI: 10.21655/ijsi.1673-7288.00296

With the rapid development of information technology, data volume keeps growing exponentially, while the value of the data remains hard to mine. This poses significant challenges to the efficient management and control of every stage of the data life cycle, such as data collection, cleaning, storage, and sharing. A sketch uses a hash table, matrix, or bit vector to track core characteristics of data, such as frequency, cardinality, and membership. This mechanism makes the sketch itself a form of metadata, and sketches have been widely used in sharing, transmission, update, and other scenarios. The fast-flowing nature of big data has given rise to dynamic sketches. Existing dynamic sketches can expand or shrink their capacity with the size of the data stream by dynamically maintaining a list of probabilistic data structures organized as a chain or a tree. However, they suffer from excessive space overhead, and their time overhead grows as the dataset cardinality increases. This paper designs a dynamic sketch for big data governance based on the advanced jump consistent hash. The method simultaneously achieves space overhead that grows linearly with the dataset cardinality and constant time overhead for data processing and analysis, effectively supporting the demanding processing and analysis tasks of big data governance. The validity and efficiency of the proposed method are verified by comparison with traditional methods on various synthetic and natural datasets.
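The abstract does not describe the Jump Filter construction itself, so the following is only a minimal sketch of the building block it names: the standard jump consistent hash (Lamping and Veach), which maps a 64-bit key to one of `num_buckets` buckets so that growing the bucket count relocates a key only into the newly added bucket. The function name `jump_consistent_hash` and the choice of Python are assumptions made for illustration; this is not the paper's dynamic sketch.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # emulate unsigned 64-bit arithmetic


def jump_consistent_hash(key: int, num_buckets: int) -> int:
    """Map an integer key to a bucket index in [0, num_buckets).

    Standard jump consistent hash; illustrative only, not the
    Jump Filter described in the abstract.
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    key &= MASK64
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # Advance a 64-bit linear congruential generator seeded by the key.
        key = (key * 2862933555777941757 + 1) & MASK64
        # Jump forward to the next bucket count at which this key would move.
        j = int((b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return b


if __name__ == "__main__":
    # When capacity grows from 8 to 9 buckets, a key either keeps its
    # bucket or moves to the new bucket 8; it never moves between old ones.
    for k in range(5):
        print(k, jump_consistent_hash(k, 8), jump_consistent_hash(k, 9))
```

This stability under resizing is the property that makes a consistent hash a plausible basis for a sketch whose capacity expands or shrinks with the data stream, since existing assignments mostly stay put when capacity changes.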
Keywords
jump, data