Descriptive Inference Using Large, Unrepresentative Nonprobability Samples: an Introduction for Ecologists
Ecology (2024)
UK Ctr Ecol & Hydrol
Abstract
In the age of big data, it is essential to remember that the size of a dataset is not all that matters. This is particularly true where the goal is to draw inferences about some wider population, in which case it is far more important that the data are representative of that population. It is possible to adjust unrepresentative samples so that they more closely resemble the population in terms of “auxiliary variables”. If the auxiliaries predict sample inclusion and/or the variable of interest well, then the adjusted sample estimates will be closer to the truth. Several survey sampling techniques exist to perform such adjustments, but most are not familiar to ecologists. We applied five types of adjustment (subsampling, quasi-randomisation, poststratification, superpopulation modelling, and multilevel regression and poststratification) to a simple two-part biodiversity monitoring problem. The first part was to estimate mean occupancy of the plant Calluna vulgaris in Great Britain in two time-periods (1987-1999 and 2010-2019); the second was to estimate the difference between the two (i.e. the trend). Calluna vulgaris is an attractive case study because we have good estimates of its true distribution in both time-periods. We estimated the means and trend using large, but (originally) unrepresentative, samples. Compared to the unadjusted estimates, the means and trends estimated using most adjustment methods were more accurate, although their uncertainty intervals generally did not cover the true values. Quasi-randomisation performed especially poorly, and we explain why. Most adjustments were far more successful at bringing the distributions of the auxiliary variables in the samples closer to those in the population than they were at improving the estimates of population means and trends. This implies that the major challenge for adjusting unrepresentative samples in biodiversity monitoring is assembling a suitable set of auxiliary variables (i.e. predictors of sample inclusion and the variable of interest). This challenge will be particularly acute for poorly studied taxa and those whose habitat requirements or sampling biases are not reflected in available data.
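As an illustration of the general idea of adjusting a sample using auxiliary variables, the following is a minimal sketch of poststratification, one of the five adjustment types named above. It is not the authors' implementation. The function name, the stratum labels ("upland", "lowland"), the population shares, and the presence/absence records are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
from collections import defaultdict


def poststratified_mean(sample, population_shares):
    """Estimate a population mean by reweighting stratum means.

    sample: iterable of (stratum, y) pairs from the (unrepresentative) sample.
    population_shares: dict mapping stratum -> known share of the population
                       (shares are assumed to sum to 1).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for stratum, y in sample:
        totals[stratum] += y
        counts[stratum] += 1

    # Every population stratum needs at least one sampled unit.
    missing = set(population_shares) - set(counts)
    if missing:
        raise ValueError(f"no sample data for strata: {sorted(missing)}")

    # Weight each stratum's sample mean by its share of the population.
    return sum(
        share * totals[stratum] / counts[stratum]
        for stratum, share in population_shares.items()
    )


# Hypothetical toy data: 1 = species recorded, 0 = not recorded.
# Upland sites are over-represented in the sample relative to the population.
sample = [
    ("upland", 1), ("upland", 1), ("upland", 0), ("upland", 1),
    ("lowland", 0), ("lowland", 1),
]
population_shares = {"upland": 0.25, "lowland": 0.75}  # assumed known

naive = sum(y for _, y in sample) / len(sample)
adjusted = poststratified_mean(sample, population_shares)
print(f"unadjusted mean occupancy: {naive:.4f}")
print(f"poststratified mean occupancy: {adjusted:.4f}")
```

In this toy case the adjustment pulls the estimate toward the lowland stratum mean because lowland sites make up most of the assumed population. Whether that moves the estimate closer to the truth depends, as the abstract stresses, on how well the auxiliary variable predicts sample inclusion and the variable of interest.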
Key words
bias, biodiversity monitoring, nonprobability samples, weighting