A Fractal Approach for Selecting an Appropriate Bin Size for Cell-Based Diversity Estimation.

JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES(2002)

引用 20|浏览5
暂无评分
摘要
A novel approach for selecting an appropriate bin size for cell-based diversity assessment is presented. The method measures the sensitivity of the diversity index as a function of grid resolution, using a box-counting algorithm that is reminiscent of those used in fractal analysis. It is shown that the relative variance of the diversity score (sum of squared cell occupancies) of several commonly used molecular descriptor sets exhibits a bell-shaped distribution, whose exact characteristics depend on the distribution of the data set, the number of points considered, and the dimensionality of the feature space. The peak of this distribution represents the optimal bin size for a given data set and sample size. Although box counting can be performed in an algorithmic ally efficient manner, the ability of cell-based methods to distinguish between subsets of different spread falls sharply with dimensionality, and the method becomes useless beyond a few dimensions.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要