Modelling Correlated Bernoulli Data Part I: Theory and Run Lengths

arXiv (2022)

Abstract
Binary data are very common in many applications, and are typically simulated independently via a Bernoulli distribution with a single probability of success. However, this is not always physically realistic, and the probability of a success can depend on the outcomes of past events. Presented here is a novel approach for simulating binary data where, for a chain of events, successes (1) and failures (0) cluster together according to a distance correlation. The structure is derived from de Bruijn graphs: directed graphs where, given a set of symbols, V, and a 'word' length, m, the nodes consist of all possible sequences of length m over V. De Bruijn graphs are a generalisation of Markov chains, where the 'word' length controls the number of states that each individual state is dependent on. A longer word therefore lets the correlation extend over a wider range of the sequence. To quantify how clustered a sequence generated from a de Bruijn process is, the run lengths of letters are observed along with run length properties.
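
As a rough illustration of the idea described in the abstract, the following Python sketch simulates a binary sequence in which the next letter depends on the previous m letters (the current 'word'), and then measures run lengths. The function names (`de_bruijn_nodes`, `simulate`, `run_lengths`) and the "sticky" transition probabilities are assumptions made for this example only; the abstract does not give the paper's actual parametrisation.

```python
import random
from itertools import groupby, product


def de_bruijn_nodes(m, symbols=(0, 1)):
    """All words of length m over the symbol set: the nodes of the graph."""
    return list(product(symbols, repeat=m))


def simulate(n, m, p_one, seed=None):
    """Generate n binary letters where the next letter depends on the last m.

    p_one maps each word (a tuple of length m) to the probability that the
    next letter is 1. This table is a hypothetical parametrisation, not the
    one used in the paper.
    """
    rng = random.Random(seed)
    # Start from a uniformly random word (an assumption of this sketch).
    word = tuple(rng.choice((0, 1)) for _ in range(m))
    seq = list(word)
    while len(seq) < n:
        nxt = 1 if rng.random() < p_one[word] else 0
        seq.append(nxt)
        # Move along a graph edge: drop the oldest letter, append the new one.
        word = word[1:] + (nxt,)
    return seq[:n]


def run_lengths(seq):
    """Return (letter, length) for each maximal run of identical letters."""
    return [(letter, sum(1 for _ in group)) for letter, group in groupby(seq)]


if __name__ == "__main__":
    m = 3
    # Hypothetical "sticky" rule: the more 1s in the current word, the more
    # likely the next letter is 1, which encourages clustering.
    p_one = {w: 0.1 + 0.8 * sum(w) / m for w in de_bruijn_nodes(m)}
    seq = simulate(60, m, p_one, seed=1)
    print("".join(map(str, seq)))
    print(run_lengths(seq))
```

Setting m = 1 in this sketch reduces it to an ordinary two-state Markov chain, and making every entry of `p_one` equal recovers independent Bernoulli trials after the initial word.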
Keywords
correlated,modelling,lengths