COUNTATA: dataset labeling using pattern counts

Hosted Content(2020)

Abstract
Information about the counts of attribute-value combinations is central to profiling a data set. Such counts may reveal bias, and they can help determine fitness for use. While counts of individual attribute values may be stored in some data set profiles, there are too many combinations of attributes for it to be practical to store a count for each combination. To this end, we present the notion of storing a "label" of limited size that can be used to obtain good estimates of these counts. A label contains information about the counts of selected patterns (combinations of attribute values) in the data. We define an estimation function that uses this label to estimate the count of every pattern. Intuitively, there is a trade-off between the size of the label and its estimation error. We propose a demonstration of Countata, a system that allows the user to examine this trade-off as well as the label's count information. We will demonstrate the usefulness of Countata using real-life data and illustrate the effectiveness of our estimation paradigm.
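To make the idea of a label and an estimation function concrete, here is a minimal sketch in Python. It is not Countata's actual method: the abstract does not specify the estimation function, so the sketch assumes a simple one that uses exact joint counts over a chosen attribute subset and scales them by single-attribute frequencies, treating the remaining attributes as independent. The names `build_label`, `estimate`, `count_pattern`, and the toy data are all hypothetical.

```python
from collections import Counter

# Toy data set (hypothetical); each row maps attribute -> value.
rows = [
    {"color": "red",  "size": "S", "shape": "round"},
    {"color": "red",  "size": "S", "shape": "square"},
    {"color": "red",  "size": "L", "shape": "round"},
    {"color": "blue", "size": "S", "shape": "round"},
    {"color": "blue", "size": "L", "shape": "square"},
    {"color": "blue", "size": "L", "shape": "square"},
    {"color": "red",  "size": "S", "shape": "round"},
    {"color": "blue", "size": "L", "shape": "round"},
]

def count_pattern(rows, pattern):
    """Exact count of rows that match every attribute=value pair in pattern."""
    return sum(all(row[a] == v for a, v in pattern.items()) for row in rows)

def build_label(rows, label_attrs):
    """Build a small label: per-attribute value counts plus joint counts
    over the attributes in label_attrs only."""
    singles = {a: Counter(row[a] for row in rows) for a in rows[0]}
    joint = Counter(tuple(row[a] for a in label_attrs) for row in rows)
    return {"n": len(rows), "attrs": tuple(label_attrs),
            "singles": singles, "joint": joint}

def estimate(label, pattern):
    """Estimate a pattern's count from the label alone.

    Assumption (not from the paper): attributes outside the label's joint
    counts are treated as independent of everything else."""
    attrs = label["attrs"]
    # Sum the stored joint counts consistent with the pattern on label attributes.
    base = sum(
        c for key, c in label["joint"].items()
        if all(pattern.get(a, v) == v for a, v in zip(attrs, key))
    )
    est = float(base)
    # Scale by the marginal frequency of each remaining attribute value.
    for a, v in pattern.items():
        if a not in attrs:
            est *= label["singles"][a][v] / label["n"]
    return est

label = build_label(rows, ("color", "size"))
pattern = {"color": "red", "size": "S", "shape": "round"}
print(estimate(label, pattern), count_pattern(rows, pattern))
```

In this sketch, patterns covered by the label's joint counts are answered exactly, while patterns that reach outside it are only approximated. Adding more attributes to `label_attrs` enlarges the label and generally reduces the error, which is one simple way to see the size versus error trade-off the abstract describes.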