The probabilistic random forest applied to the QUBRICS survey: improving the selection of high-redshift quasars with synthetic data

Monthly Notices of the Royal Astronomical Society (2022)

Abstract
Several recent works have focused on the search for bright, high-z quasars (QSOs) in the South. Among them, the QUasars as BRIght beacons for Cosmology in the Southern hemisphere (QUBRICS) survey has now delivered hundreds of new spectroscopically confirmed QSOs selected by means of machine learning algorithms. Building upon the results obtained by introducing the probabilistic random forest (PRF) for the QUBRICS selection, we explore in this work the feasibility of training the algorithm on synthetic data to improve the completeness in the higher redshift bins. We also compare the performance of the algorithm when colours are used as primary features instead of magnitudes. We generate synthetic data based on a composite QSO spectral energy distribution. We first train the PRF to identify QSOs among stars and galaxies, then separate high-z quasars from low-z contaminants. We apply the algorithm to an updated data set, based on SkyMapper DR3, combined with Gaia eDR3, 2MASS, and WISE magnitudes. We find that employing colours as features slightly improves the results with respect to the algorithm trained on magnitude data. Adding synthetic data to the training set provides significantly better results with respect to the PRF trained only on spectroscopically confirmed QSOs. We estimate, on a testing data set, a completeness of ∼86 per cent and a contamination of ∼36 per cent. Finally, 206 PRF-selected candidates were observed: 149/206 turned out to be genuine QSOs with z > 2.5, 41 with z < 2.5, 3 galaxies and 13 stars. The result confirms the ability of the PRF to select high-z quasars in large data sets.
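The quoted figures can be checked with a little arithmetic. The sketch below is illustrative only: it assumes the usual definitions of completeness (fraction of true high-z QSOs that are recovered) and contamination (fraction of selected candidates that are not high-z QSOs), which the abstract does not spell out, and the function and variable names are hypothetical. The only data used are the follow-up counts stated above.

```python
# Illustrative sketch; definitions below are assumed, not taken from the paper.

def completeness(true_pos: int, false_neg: int) -> float:
    """Fraction of genuine targets that the selection recovers."""
    return true_pos / (true_pos + false_neg)


def contamination(true_pos: int, false_pos: int) -> float:
    """Fraction of selected candidates that are not genuine targets."""
    return false_pos / (true_pos + false_pos)


# Spectroscopic follow-up counts quoted in the abstract.
observed = {
    "qso_z_gt_2.5": 149,
    "qso_z_lt_2.5": 41,
    "galaxy": 3,
    "star": 13,
}

total = sum(observed.values())  # the four classes should add up to 206
fractions = {label: count / total for label, count in observed.items()}

# Share of observed candidates confirmed as z > 2.5 QSOs.
success_rate = fractions["qso_z_gt_2.5"]

for label, frac in fractions.items():
    print(f"{label}: {100 * frac:.1f} per cent")
```

The four classes sum to the 206 observed candidates, and the confirmed z > 2.5 fraction follows directly from 149/206. The testing-set completeness and contamination in the abstract come from a labelled test sample, so they need not match the follow-up fractions exactly.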
Keywords
methods: data analysis, methods: statistical, astronomical data bases: miscellaneous, surveys, quasars: general