A novel distance that reduces information loss in continuous characters with few observations

Gerardo A. Lo Valvo, Oscar E. R. Lehmann, Diego Balseiro

Palaeontologia Electronica (2023)

Abstract
The calculation of pairwise distances is a fundamental step in many statistical analyses in biology and paleontology. The most commonly used distances work with a single observation per object and character, but there are scenarios where multiple observations are available per object. In these situations, the information for the character spans an interval, and pairs of objects can have overlapping intervals, which further complicates the distance calculation. Some coefficients can deal with this wealth of information but are either too coarse to provide detailed results or too computationally demanding for even moderately large data sets. Here, we present the Distance Between Intervals (DBI) as a novel semi-metric distance that can accommodate both singular and multiple observations per object by analyzing them as intervals. The DBI ranges from 0 to 1 when there is an overlap between the objects and from 1 to infinity when there is no overlap between them. It is easy to calculate and can be applied to a wide variety of data types. Both simulated and empirical test cases show that the DBI correctly ranks pairs of objects by their level of overlap and non-overlap, while other distances struggle to do it. Therefore, the DBI can provide a finer level of definition than other available distances for empirical data sets, while generally agreeing with the broad results they provide. An implementation of DBI is provided for the R programming language.
Keywords
novel distance, continuous characters, information loss, few observations