ROhAN: Row-order agnostic null models for statistically-sound knowledge discovery

Maryam Abuissa, Alexander Lee, Matteo Riondato

Data Mining and Knowledge Discovery (2023)

Abstract
We introduce a novel class of null models for the statistical validation of results obtained from binary transactional and sequence datasets. Our null models are Row-Order Agnostic (ROA), i.e., they do not consider the order of rows in the observed dataset to be fixed, in stark contrast with previous null models, which are Row-Order Enforcing (ROE). We present ROhAN, an algorithmic framework for efficiently sampling datasets from ROA models according to user-specified distributions, which is a necessary step for the resampling-based statistical hypothesis tests employed to validate the results. ROhAN uses Metropolis-Hastings or rejection sampling to build on top of existing or future ROE sampling procedures. Our experimental evaluation shows that ROA models are very different from ROE ones, impacting the statistical validation, and that ROhAN is efficient, mixes fast, and scales well as the dataset grows.
Keywords
Hypothesis testing, Pattern mining, Sequences, Transactions
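
The abstract notes that sampling datasets from a null model is a necessary step for resampling-based hypothesis tests. As a rough illustration of that general idea only, the sketch below computes an empirical p-value for the support of an itemset in a binary transactional dataset. It is not ROhAN and does not implement a ROA or ROE model: `sample_null` is a hypothetical stand-in sampler that only preserves item frequencies and the number of rows, and all function names here are assumptions made for the example.

```python
import random


def support(dataset, itemset):
    """Count the transactions (rows) that contain every item of `itemset`."""
    return sum(1 for row in dataset if itemset <= row)


def sample_null(dataset, rng):
    """Hypothetical stand-in null sampler (NOT ROhAN, not a ROA/ROE model).

    Keeps the number of rows and each item's frequency, and places every
    item into uniformly random distinct rows.
    """
    n = len(dataset)
    counts = {}
    for row in dataset:
        for item in row:
            counts[item] = counts.get(item, 0) + 1
    rows = [set() for _ in range(n)]
    for item, count in sorted(counts.items()):
        for i in rng.sample(range(n), count):
            rows[i].add(item)
    return [frozenset(r) for r in rows]


def empirical_p_value(dataset, itemset, num_samples=999, seed=0):
    """Standard resampling estimate: (1 + #{null >= observed}) / (1 + m)."""
    rng = random.Random(seed)
    observed = support(dataset, itemset)
    at_least = sum(
        1
        for _ in range(num_samples)
        if support(sample_null(dataset, rng), itemset) >= observed
    )
    return (1 + at_least) / (1 + num_samples)
```

A call such as `empirical_p_value(data, frozenset({"a", "b"}))`, where `data` is a list of `frozenset` transactions, returns a value in (0, 1]. In the setting the abstract describes, the stand-in sampler would be replaced by draws from the chosen null model.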