Adversarial Active Learning with Guided BERT Feature Encoding.
Recent advances in BERT-based models have significantly improved the performance of many applications on text data, such as text classification, question answering, and e-commerce search and recommendation systems. However, labeling text data is often complex and time-consuming. While active learning can interactively query and label the data, the effectiveness of existing active learning methods is mostly limited by static text embedding approaches and by insufficient training data. To address this problem, we propose a BERT-based adversarial semi-supervised active learning (B-ASAL) model. In our approach, we use generative adversarial modeling and semi-supervised learning to guide the fine-tuning of BERT and to optimize its text embeddings and feature encodings. The adversarial generator, paired with a semi-supervised classifier, guides the BERT model to adjust its feature encoding so that it fits not only the distribution of class labels but also the distinction between labeled and unlabeled data. Moreover, our B-ASAL model uses minimax entropy regularization to select data points with high uncertainty and high diversity for labeling. To the best of our knowledge, this is the first work that combines adversarial semi-supervised learning with active learning to guide and optimize feature encoding. We evaluate our method on various real-world text classification datasets and show that our model outperforms state-of-the-art approaches.
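To make the selection criterion concrete, the following is a minimal sketch of choosing unlabeled points that are both uncertain and diverse. It is not the B-ASAL procedure: the abstract does not specify the minimax entropy regularizer, so this sketch swaps in plain predictive entropy for uncertainty and a greedy farthest-point term for diversity. The names `predictive_entropy`, `select_batch`, and `diversity_weight` are hypothetical, and `probs` and `embeddings` are assumed to come from a classifier and an encoder such as a fine-tuned BERT.

```python
import numpy as np


def predictive_entropy(probs):
    """Shannon entropy of each row of class probabilities (higher = more uncertain)."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -(p * np.log(p)).sum(axis=1)


def select_batch(probs, embeddings, k, diversity_weight=1.0):
    """Greedily pick k unlabeled indices that are uncertain and far from earlier picks.

    Assumption: this stands in for the paper's minimax entropy criterion,
    which the abstract names but does not define.
    """
    probs = np.asarray(probs, dtype=float)
    emb = np.asarray(embeddings, dtype=float)
    k = min(k, len(probs))
    entropy = predictive_entropy(probs)
    # Unit-normalize embeddings so distances are comparable across points.
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    # Start from the single most uncertain point.
    selected = [int(np.argmax(entropy))]
    # Distance from every point to its nearest already-selected point.
    min_dist = np.linalg.norm(emb - emb[selected[0]], axis=1)
    while len(selected) < k:
        score = entropy + diversity_weight * min_dist
        score[selected] = -np.inf  # never pick the same point twice
        nxt = int(np.argmax(score))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(emb - emb[nxt], axis=1))
    return selected


# Toy pool: points 0 and 1 are equally uncertain and share an embedding,
# point 3 is slightly less uncertain but lies in a different region.
toy_probs = np.array([[0.5, 0.5], [0.5, 0.5], [0.99, 0.01], [0.6, 0.4]])
toy_emb = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
batch = select_batch(toy_probs, toy_emb, k=2)
```

With the diversity term enabled, the second pick moves away from the duplicate of the first point; setting `diversity_weight=0.0` reduces the sketch to pure uncertainty sampling, which would query the near-duplicate instead.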