Boost recall in QSO selection from highly imbalanced photometric datasets
arXiv (2023)
Abstract
Context. The identification of bright QSOs is of great importance to probe
the intergalactic medium and address open questions in cosmology. Several
approaches have been adopted to find such sources in currently available
photometric surveys, including machine learning methods. However, the rarity of
bright QSOs at high redshifts compared to contaminating sources (such as stars
and galaxies) makes the selection of reliable candidates a difficult task,
especially when high completeness is required. Aims. We present a novel
technique to boost recall (i.e., completeness within the considered sample) in
the selection of QSOs from photometric datasets dominated by stars, galaxies,
and low-z QSOs (imbalanced datasets). Methods. Our method operates by
iteratively removing sources whose probability of belonging to a noninteresting
class exceeds a user-defined threshold, until the remaining dataset contains
mainly high-z QSOs. Any existing machine learning method can be used as the
underlying classifier, provided it allows for a classification probability to
be estimated. We applied the method to a dataset obtained by cross-matching
PanSTARRS1, Gaia, and WISE, and identified the high-z QSO candidates using both
our method and its direct multi-label counterpart. Results. We ran several
tests by randomly choosing the training and test datasets, and achieved
significant improvements in recall, which increased from 50% to 85% for QSOs
with z>2.5, and from 70% to 90% for QSOs with z>3. Also, we identified a sample
of 3098 new QSO candidates in a sample of 2.6×10^6 sources with no known
classification. We obtained follow-up spectroscopy for 121 candidates,
confirming 107 new QSOs with z>2.5. Finally, a comparison of our candidates
with those selected by an independent method shows that the two samples overlap
by more than 90% and that both methods are capable of achieving a high level of
completeness.
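
The iterative removal described under Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the toy Gaussian classifier `gaussian_proba`, the function name `iterative_removal`, the threshold of 0.9, the stopping rule, and the synthetic two-feature data are all assumptions made for illustration. The abstract only states that sources whose probability of belonging to a noninteresting class exceeds a user-defined threshold are removed iteratively, and that any classifier providing class probabilities can be used.

```python
import numpy as np

def gaussian_proba(X_train, y_train, X, n_classes):
    """Toy stand-in classifier (assumption): diagonal Gaussian per class."""
    log_p = np.full((len(X), n_classes), -np.inf)
    for c in range(n_classes):
        Xc = X_train[y_train == c]
        if len(Xc) == 0:
            continue  # class no longer present: its probability stays 0
        mu = Xc.mean(axis=0)
        var = Xc.var(axis=0) + 1e-6
        log_lik = -0.5 * ((X - mu) ** 2 / var + np.log(2 * np.pi * var)).sum(axis=1)
        log_p[:, c] = np.log(len(Xc) / len(X_train)) + log_lik
    log_p -= log_p.max(axis=1, keepdims=True)
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def iterative_removal(X_train, y_train, X, target, n_classes,
                      threshold=0.9, max_iter=10):
    """Return a boolean mask of the sources in X that survive all iterations."""
    keep_train = np.ones(len(X_train), dtype=bool)
    keep = np.ones(len(X), dtype=bool)
    others = [c for c in range(n_classes) if c != target]
    for _ in range(max_iter):
        if not keep_train.any():
            break
        Xt, yt = X_train[keep_train], y_train[keep_train]
        # Highest probability of belonging to any noninteresting class.
        p_train = gaussian_proba(Xt, yt, Xt, n_classes)[:, others].max(axis=1)
        p_new = gaussian_proba(Xt, yt, X[keep], n_classes)[:, others].max(axis=1)
        drop_train = p_train > threshold
        drop_new = p_new > threshold
        if not drop_train.any() and not drop_new.any():
            break  # assumed stopping rule: nothing left to remove
        keep_train[np.flatnonzero(keep_train)[drop_train]] = False
        keep[np.flatnonzero(keep)[drop_new]] = False
    return keep

# Synthetic imbalanced data (assumption): 0 = star, 1 = galaxy, 2 = high-z QSO.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
counts = [500, 500, 20]

def make_sample():
    y = np.repeat(np.arange(3), counts)
    X = centers[y] + 0.5 * rng.normal(size=(len(y), 2))
    return X, y

X_train, y_train = make_sample()
X_new, y_new = make_sample()

keep = iterative_removal(X_train, y_train, X_new, target=2, n_classes=3)
is_qso = y_new == 2
recall = (keep & is_qso).sum() / is_qso.sum()  # completeness within this sample
print(f"surviving sources: {keep.sum()} of {len(keep)}, recall: {recall:.2f}")
```

In practice the toy classifier would be replaced by whichever probabilistic classifier is preferred, and the threshold would be tuned to trade recall against the number of surviving contaminants.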