Prediction using hierarchical data: Applications for automated detection of cervical cancer

Statistical Analysis and Data Mining (2015)

Abstract
Although the Papanicolaou smear has been successful in decreasing cervical cancer incidence in the developed world, many challenges remain for its implementation in the developing world. Quantitative cytology, a semi-automated method that quantifies cellular image features, is a promising candidate screening test. The nested structure of its data (measurements of multiple cells within each patient) poses challenges for the usual classification problem. Here we perform a comparative study of three main approaches for problems with this general data structure: (i) extract patient-level features from the cell-level data, (ii) use a statistical model that accounts for the hierarchical data structure, and (iii) classify at the cellular level and use an ad hoc approach to classify at the patient level. We apply these methods to a dataset of 1728 patients, with an average of 2600 cells collected per patient and 133 features measured per cell, predicting whether a patient had a positive biopsy result. The best approach we found was to classify at the cellular level and count the number of cells with a posterior probability greater than a threshold value, with an estimated 61% sensitivity and 89% specificity on independent data. Recent developments in statistical learning allowed us to achieve high accuracy.
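The cell-counting rule described in the abstract can be illustrated with a minimal sketch. It assumes that some cell-level classifier has already produced a posterior probability of abnormality for every cell of a patient. The function names, the probability threshold of 0.9, and the minimum count of 3 cells are hypothetical values chosen for illustration; the abstract does not state the values used in the study.

```python
from typing import Sequence, Tuple


def count_suspicious_cells(cell_probs: Sequence[float], prob_threshold: float) -> int:
    """Count cells whose posterior probability exceeds the threshold."""
    return sum(1 for p in cell_probs if p > prob_threshold)


def classify_patient(
    cell_probs: Sequence[float],
    prob_threshold: float = 0.9,  # hypothetical value, not from the paper
    min_cells: int = 3,           # hypothetical value, not from the paper
) -> bool:
    """Flag a patient as positive when enough cells look abnormal."""
    return count_suspicious_cells(cell_probs, prob_threshold) >= min_cells


def sensitivity_specificity(
    predicted: Sequence[bool], actual: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for patient-level predictions.

    Assumes at least one truly positive and one truly negative patient.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy posterior probabilities per patient (made-up numbers).
    patients = {
        "patient_a": [0.95, 0.92, 0.99, 0.10],
        "patient_b": [0.95, 0.10, 0.20],
    }
    for name, probs in patients.items():
        print(name, classify_patient(probs))
```

In practice both the probability threshold and the minimum cell count would be tuned on held-out patients (for example by cross-validation, which appears among the paper's keywords), since together they trade sensitivity against specificity.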
Keywords
cross-validation, DNA ploidy, L1-regularized logistic regression, multilevel classification, quantitative cytology, variable selection