An active learning framework and assessment of inter-annotator agreement facilitate automated recogniser development for vocalisations of a rare species, the southern black-throated finch (Poephila cincta cincta)

John M. van Osta, Brad Dreis, Ed Meyer, Laura F. Grogan, J. Guy Castley

Ecological Informatics (2023)

Abstract
The application of machine learning methods has led to major advances in the development of automated recognisers used to analyse bioacoustics data. To further improve the performance of automated call recognisers, we investigated the development of efficient data annotation strategies and how best to address uncertainty around ambiguous vocalisations. These challenges present a particular problem for species whose vocalisations are rare in field recordings, where collecting enough training data can be problematic and a species' vocalisations may be poorly documented. We provide an open access solution to address these challenges using two strategies. First, we applied an active learning framework to iteratively improve a convolutional neural network (CNN) model able to automate call identification for a target rare bird species, the southern black-throated finch (Poephila cincta cincta). We collected 9098 h of unlabelled audio recordings from a field study in the Desert Uplands Bioregion of Queensland, Australia, and used active learning to prioritise human annotation effort towards data that would best improve model fit. Second, we progressed methods for managing ambiguous vocalisations by applying machine learning methods more commonly used in medical image analysis and natural language processing. Specifically, we assessed agreement among human annotators and the CNN model (i.e. inter-annotator agreement) and used this to determine realistic performance outcomes for the CNN model and to identify areas where inter-annotator agreement may be improved. We also applied a classification approach that allowed the CNN model to classify sounds into an 'uncertain' category, which replicated a requirement of human annotation and facilitated the comparison of human and model annotation performance. We found that active learning was an efficient strategy to build a CNN model where there was limited labelled training data available, and target calls were extremely rare in the unlabelled data. As few as five active learning iterations, generating a final labelled dataset of 1073 target calls and 5786 non-target sounds, were required to train a model to identify the target species with comparable performance to experts in the field. Assessment of inter-annotator agreement identified a bias in our model to align predictions most closely with those of the primary annotator and identified significant differences in inter-annotator agreement among subsets of our acoustic data. Our results highlight the use of inter-annotator agreement to understand model performance and identify areas for improvement in data annotation. We also show that excluding ambiguous vocalisations during data annotation results in an overestimation of model performance, an important consideration for datasets with inter-annotator disagreement.
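The abstract outlines three components that a short sketch can make concrete: an active learning loop around a binary CNN call recogniser, a three-way labelling scheme with an explicit 'uncertain' category, and an inter-annotator agreement check. The code below is not the authors' implementation; TensorFlow/Keras and scikit-learn are assumed, and the network architecture, the least-confidence query strategy, the 0.3/0.7 score thresholds, the annotation budget, and the use of Cohen's kappa as the agreement statistic are all illustrative assumptions.

```python
# Minimal sketch, assuming spectrogram clips are already extracted as
# 128 x 128 arrays. None of the settings below come from the paper.
import numpy as np
import tensorflow as tf
from sklearn.metrics import cohen_kappa_score


def build_cnn(input_shape=(128, 128, 1)):
    """Small CNN scoring each clip as target call (1) vs. non-target sound (0)."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])


def active_learning_round(model, x_labelled, y_labelled, x_pool, budget=200):
    """Fit on the current labels, then queue the pool clips whose scores sit
    closest to the decision boundary for human annotation (least-confidence
    sampling; the paper's own selection criterion may differ)."""
    model.compile(optimizer="adam", loss="binary_crossentropy")
    model.fit(x_labelled, y_labelled, epochs=5, batch_size=32, verbose=0)
    scores = model.predict(x_pool, verbose=0).ravel()
    query_idx = np.argsort(np.abs(scores - 0.5))[:budget]
    return query_idx, scores


def label_with_uncertain(scores, low=0.3, high=0.7):
    """Three-way labelling mirroring the human protocol: 'target',
    'non-target', or 'uncertain' when the score is ambiguous."""
    return np.where(scores >= high, "target",
                    np.where(scores <= low, "non-target", "uncertain"))


# Toy inter-annotator agreement check on the same set of clips, using
# Cohen's kappa as one common pairwise agreement statistic.
annotator_a = ["target", "uncertain", "non-target", "target"]
annotator_b = ["target", "non-target", "non-target", "target"]
print(cohen_kappa_score(annotator_a, annotator_b))
```

In use, the queried clips would be labelled by the annotators, appended to the labelled set, and the round repeated; the abstract reports that as few as five such iterations were enough to reach expert-comparable performance on this dataset.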
Keywords
Bioacoustics, Machine learning, Annotator agreement, Call recognition, Active learning