Sliding Window Sampling over Data Stream – a Solution Based on Devil’s Staircases

2023 IEEE 10th International Conference on Data Science and Advanced Analytics (DSAA) (2023)

Abstract
The paper concerns sampling from a data stream $\{S_i\}$: at each moment $t$ the sampler should hold a value $S_{t-j}$, where $j \in \{0, \ldots, n-1\}$ is chosen according to an a priori specified probability distribution $D$ on $\{0, \ldots, n-1\}$; both $D$ and the window size $n$ are fixed and do not depend on $t$. We assume that the sampler has constant-size memory while $n$ may be large, so the sampler can remember only a few of the last $n$ values of the stream. The difficulty is that the window of the last $n$ elements changes at each step, and when we have to resample, almost all of the values from which we have to choose have already been forgotten. The case of the uniform distribution $D$ was considered by Braverman, Ostrovsky, and Zaniolo in 2013. We present an alternative generic approach based on specific Markov chains called devil’s staircases. Unlike the previous solution, it is not limited to the uniform distribution: it generates a sample according to any admissible distribution in the window of size $n$ and uses memory of size $\mathrm{O}(1)$. We provide sufficient conditions for the distribution $D$ to be admissible. Although the class of such distributions is quite wide from the point of view of practical applications, we show some natural limitations for this class.
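The constant-memory constraint can be illustrated with a minimal sketch. This is not the paper's construction, only a simple climb-or-reset chain on the age $j$ of the held value, assumed here for illustration: at each step the sampler either keeps its value (the age grows by one) or replaces it with the newest stream element (the age drops to 0). The names `AgeChainSampler`, `keep_prob`, and `stationary_distribution` are hypothetical. If the keep probability at age $j$ is $p_j$, the stationary age distribution satisfies $\pi_{j+1} = \pi_j p_j$, so this particular scheme reaches only non-increasing distributions on $\{0, \ldots, n-1\}$.

```python
import random


class AgeChainSampler:
    """Hold a single stream item whose age follows a climb-or-reset chain.

    Assumption (not from the paper): keep_prob(j) is a hypothetical callable
    returning the probability of keeping the held item when its age is j.
    Only the item, its age, and the RNG state are stored.
    """

    def __init__(self, n, keep_prob, seed=None):
        self.n = n                    # window size
        self.keep_prob = keep_prob    # callable: age j -> probability in [0, 1]
        self.rng = random.Random(seed)
        self.age = None               # age of the held item, None before first feed
        self.value = None

    def feed(self, item):
        """Consume the next stream element and return the currently held value."""
        keep = (
            self.age is not None
            and self.age < self.n - 1          # never let the item leave the window
            and self.rng.random() < self.keep_prob(self.age)
        )
        if keep:
            self.age += 1                      # same item, now one step older
        else:
            self.value, self.age = item, 0     # reset to the newest element
        return self.value


def stationary_distribution(n, keep_prob):
    """Stationary age distribution of the chain: pi[j+1] = pi[j] * keep_prob(j).

    Uses O(n) memory, so it is meant for checking the sampler, not for streaming.
    """
    weights = [1.0]
    for j in range(n - 1):
        weights.append(weights[-1] * keep_prob(j))
    total = sum(weights)
    return [w / total for w in weights]
```

For example, `AgeChainSampler(4, lambda j: 0.5)` holds, in the long run, the element of age $j$ with probability proportional to $2^{-j}$ for $j \in \{0, 1, 2, 3\}$. A constant keep probability of 1 would give uniform stationary weights, but the chain then becomes a deterministic cycle, which hints at why the uniform case needs a more careful treatment.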
Keywords
data stream, sliding window, random sampling, Markov chain, devil’s staircase