Annotating Documents using Active Learning Methods for a Maintenance Analysis Application.

AIPR(2020)

Abstract
The aircraft cargo industry still keeps vast amounts of the maintenance history of aircraft components as electronic (i.e. scanned) but unsearchable images. For a given supplier, there can be hundreds of thousands of image documents, only some of which contain useful information. Supervised machine learning techniques have been shown to be effective in recognising these documents for further information extraction. A well-known deficiency of supervised learning approaches is that annotating enough documents to create an effective model requires valuable human effort. This paper first shows how to obtain a representative sample from a supplier's corpus. Given this sample of unlabelled documents, an active learning approach is used to select which documents to annotate first, using a normalised certainty measure derived from a soft classifier's prediction distribution. Finally, the accuracy of various selection approaches based on this certainty measure is compared at each iteration of the active learning cycle. The experiments show that a greedy selection method based on this measure can significantly reduce the number of annotations required to reach a given accuracy. The results provide valuable information for users and, more generally, illustrate an effective deployment of a machine learning application.
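The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula for the normalised certainty measure, so the version below assumes one common choice, one minus the entropy of the prediction distribution divided by its maximum possible value; the function names, document ids, and probabilities are hypothetical and are not taken from the paper.

```python
import math


def normalised_certainty(probs):
    """Assumed measure: 1 - H(p) / log(K).

    Gives 0 for a uniform distribution (least certain) and
    1 for a one-hot distribution (most certain).
    """
    k = len(probs)
    if k < 2:
        return 1.0
    # Shannon entropy; zero-probability classes contribute nothing.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return 1.0 - entropy / math.log(k)


def select_least_certain(predictions, batch_size):
    """Greedy selection: return the ids of the batch_size documents
    whose predicted distributions have the lowest certainty."""
    ranked = sorted(
        predictions,
        key=lambda doc_id: normalised_certainty(predictions[doc_id]),
    )
    return ranked[:batch_size]


# Hypothetical soft-classifier outputs for three unlabelled documents
# (probability of "useful" vs. "not useful").
predictions = {
    "doc_a": [0.98, 0.02],
    "doc_b": [0.55, 0.45],
    "doc_c": [0.80, 0.20],
}

to_annotate = select_least_certain(predictions, batch_size=2)
```

In an active learning cycle, the selected documents would be annotated, added to the training set, and the classifier retrained before the next round of selection. Other certainty measures, such as the margin between the two most probable classes, could be substituted in `normalised_certainty` without changing the selection logic.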
Keywords
Active Learning, Document Classification