End-to-end approach of multi-grained embedding of categorical features in tabular data

Han Liu, Qianxi Qiu, Qin Zhang

INFORMATION PROCESSING & MANAGEMENT (2024)

Abstract
In recent years, transforming categorical data into numerical data has become a common strategy for making such data suitable for popular learning approaches such as neural networks. This transformation is commonly carried out through feature embedding in the setting of representation learning, which has led to successful applications in natural language processing and knowledge graphs. However, in tabular data processing, categorical features are still typically transformed into numerical ones using handcrafted category encoding methods. In this paper, we propose an end-to-end approach of multi-grained embedding of categorical features in tabular data in the setting of decision-forest-driven representation learning. Specifically, we incorporate an uncertainty-aware optimization strategy into the proposed approach to guide the process of end-to-end learning. The proposed approach has been evaluated experimentally on 12 real-world data sets. The experimental results show that the proposed approach outperforms 10 baselines in terms of feature transformation, leading to an improvement of classification accuracy by at least 3% on most data sets.
Keywords
Feature embedding, Representation learning, Category encoding, Tabular data processing