A Bayesian Nonparametric Approach to Species Sampling Problems with Ordering
arXiv (Cornell University) (2022)
Abstract
Species-sampling problems (SSPs) refer to a vast class of statistical
problems calling for the estimation of (discrete) functionals of the unknown
species composition of an unobservable population. A common feature of SSPs is
their invariance with respect to species labeling, which is at the core of the
Bayesian nonparametric (BNP) approach to SSPs under the popular Pitman-Yor
process (PYP) prior. In this paper, we develop a BNP approach to SSPs that are
not "invariant" to species labeling, in the sense that an ordering or ranking
is assigned to species' labels. Inspired by the population genetics literature
on age-ordered alleles' compositions, we study the following SSP with ordering:
given an observable sample from an unknown population of individuals belonging
to species (alleles), with species' labels being ordered according to weights
(ages), estimate the frequencies of the first r order species' labels in an
enlarged sample obtained by including additional unobservable samples. By
relying on an ordered PYP prior, we obtain an explicit posterior distribution
of the first r order frequencies, with estimates that are easy to implement
and computationally efficient. We apply our approach to the analysis of genetic
variation, showing its effectiveness in estimating the frequency of the oldest
allele, and then we discuss other potential applications.