Improving Embeddings for High-Accuracy Transformer-Based Address Matching Using a Multiple in-Batch Negatives Loss.

André V. Duarte, Arlindo L. Oliveira

International Conference on Machine Learning and Applications (2023)

Abstract
Address matching is a crucial activity for post offices and companies responsible for parcel processing and delivery. Inaccurate delivery of parcels can significantly damage the reputation of these companies and result in considerable economic and environmental costs. This paper proposes a deep learning model that aims to improve efficiency on the address matching task for Portuguese addresses. The model consists of a bi-encoder, trained to create meaningful embeddings of Portuguese postal addresses, which is then used to retrieve from a normalized database the matches for the target unnormalized addresses. We argue that a good initialization of the bi-encoder weights is a crucial step for achieving optimal performance, and we support this hypothesis by showing that training a transformer from scratch leads to better results than using a pre-trained model. We also compare the bi-encoder's performance under a standard contrastive loss, where we carefully select the negative samples, with its performance under a multiple negatives ranking loss, where we use larger batch sizes with multiple random in-batch negatives. The model, trained from scratch with the multiple negatives ranking loss, was tested on data from a real-life scenario of Portuguese addresses and exhibited very high mapping accuracy, exceeding 99.60% at the door level. Deploying this system in a real parcel-delivery context is expected to yield significant efficiency gains in the distribution process. Such a deployment is currently under evaluation.
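
The multiple negatives ranking loss mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy 2-D vectors standing in for address embeddings, and the similarity scale of 20 are all assumptions made for illustration. For each unnormalized address in a batch, its paired normalized address is treated as the positive and every other normalized address in the batch as a negative.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy over the batch: for anchor i, positives[i] is the
    correct match and every other positives[j] is an in-batch negative.
    The scale value is an assumption, not taken from the paper."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

# Hypothetical toy embeddings: row i of each list forms a matching pair.
unnormalized = [[1.0, 0.1], [0.0, 1.0], [-1.0, 0.2]]
normalized = [[0.9, 0.0], [0.1, 1.0], [-1.0, 0.0]]

aligned = mnr_loss(unnormalized, normalized)
shuffled = mnr_loss(unnormalized, normalized[::-1])  # pairs deliberately mismatched
print(aligned, shuffled)
```

In this sketch, a larger batch simply means more rows in `positives`, so each anchor is contrasted against more random negatives without any explicit negative mining.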
Keywords
Address Matching, Information Retrieval, Siamese Neural Networks