Machine Learning Methods for "Small-n, Large-p" Problems: Understanding the Complex Drivers of Modern-Day Slavery

Research Square (2021)

Abstract
An estimated 40 million people are in some form of modern slavery across the globe. Understanding the factors that make any particular individual or geographical region vulnerable to such abuse is essential for the development of effective interventions and policy. Efforts to isolate and assess the importance of individual drivers statistically are impeded by two key challenges: data scarcity and high dimensionality. The hidden nature of modern slavery restricts the available data points, and the large number of candidate variables that are potentially predictive of slavery inflates the feature space exponentially. The result is a highly problematic "small-n, large-p" setting, where overfitting and multicollinearity can render more traditional statistical approaches inapplicable. Recent advances in non-parametric computational methods, however, offer scope to overcome such challenges. We present an approach that combines non-linear machine learning models and strict cross-validation methods with novel variable importance techniques, emphasising the importance of stability of model explanations via Rashomon-set analysis. This approach is used to model the prevalence of slavery in 48 countries, with results bringing to light the importance of predictive factors such as a country's capacity to protect the physical security of women, which has previously been under-emphasized in the literature. Out-of-sample estimates of slavery prevalence are then made for countries where no survey data currently exist.
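To make the idea of a Rashomon set concrete, the following is a minimal sketch on synthetic data, not the authors' pipeline. It assumes a k-nearest-neighbour regressor as the non-linear model, leave-one-out cross-validation as the strict validation scheme, and a Rashomon set defined as every candidate model whose cross-validated loss is within a factor (1 + epsilon) of the best. The function names, the synthetic data, the value of epsilon, and the use of feature frequency as a crude stand-in for variable importance are all illustrative assumptions.

```python
import random

def knn_predict(train_X, train_y, x, k, features):
    """Average the targets of the k nearest training rows,
    measuring squared distance only on the chosen feature indices."""
    dists = sorted(
        (sum((row[j] - x[j]) ** 2 for j in features), target)
        for row, target in zip(train_X, train_y)
    )
    nearest = [target for _, target in dists[:k]]
    return sum(nearest) / len(nearest)

def loo_mse(X, y, k, features):
    """Leave-one-out cross-validated mean squared error."""
    errors = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        pred = knn_predict(train_X, train_y, X[i], k, features)
        errors.append((pred - y[i]) ** 2)
    return sum(errors) / len(errors)

def rashomon_set(X, y, candidates, epsilon):
    """Return (best_loss, models), where models are the (loss, k, features)
    triples whose loss is at most (1 + epsilon) times the best loss."""
    scored = [(loo_mse(X, y, k, feats), k, feats) for k, feats in candidates]
    best = min(loss for loss, _, _ in scored)
    return best, [m for m in scored if m[0] <= (1 + epsilon) * best]

def feature_frequency(models, p):
    """Fraction of Rashomon-set models that use each feature
    (a crude proxy for how stable a feature's importance is)."""
    return [sum(j in feats for _, _, feats in models) / len(models)
            for j in range(p)]

# Synthetic "small-n, large-p" data (assumption): 20 rows, 25 features,
# with the target depending non-linearly on feature 0 only.
rng = random.Random(0)
n, p = 20, 25
X = [[rng.uniform(-1, 1) for _ in range(p)] for _ in range(n)]
y = [row[0] ** 2 + 0.05 * rng.gauss(0, 1) for row in X]

# Candidate models: every pair of features, for two neighbourhood sizes.
candidates = [(k, (a, b)) for k in (3, 5)
              for a in range(p) for b in range(a + 1, p)]

best, models = rashomon_set(X, y, candidates, epsilon=0.10)
freq = feature_frequency(models, p)
print(f"best LOO MSE: {best:.4f}, Rashomon set size: {len(models)}")
print("most frequent features:",
      sorted(range(p), key=lambda j: -freq[j])[:3])
```

A feature that appears in most near-optimal models is a more trustworthy driver than one that matters only in the single best model, which is the sense in which the abstract stresses stability of explanations.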
Keywords
slavery,machine learning,modern-day