Representation Learning of Tangled Key-Value Sequence Data for Early Classification
arXiv (2024)
Abstract
Key-value sequence data has become ubiquitous and naturally appears in a
variety of real-world applications, ranging from the user-product purchasing
sequences in e-commerce, to network packet sequences forwarded by routers in
networking. Classifying these key-value sequences is important in many
scenarios such as user profiling and malicious application identification. In
many time-sensitive scenarios, besides the requirement of classifying a
key-value sequence accurately, it is also desired to classify a key-value
sequence early, in order to respond fast. However, these two goals are
conflicting in nature, and it is challenging to achieve them simultaneously. In
this work, we formulate a novel tangled key-value sequence early classification
problem, where a tangled key-value sequence is a mixture of several concurrent
key-value sequences with different keys. The goal is to classify each
individual key-value sequence sharing the same key both accurately and early. To
address this problem, we propose a novel method, i.e., Key-Value sequence Early
Co-classification (KVEC), which leverages both inner- and inter-correlations of
items in a tangled key-value sequence through key correlation and value
correlation to learn a better sequence representation. Meanwhile, a time-aware
halting policy decides when to stop the ongoing key-value sequence and classify
it based on current sequence representation. Experiments on both real-world and
synthetic datasets demonstrate that our method outperforms the state-of-the-art
baselines significantly. KVEC improves the prediction accuracy by up to
4.7–17.5% under the same prediction earliness condition, and improves the
harmonic mean of accuracy and earliness by up to 3.7–14.0%.
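
To make the problem setting and the reported metric concrete, the following is a minimal Python sketch, not the KVEC method itself. It shows how a tangled key-value sequence can be split into per-key sequences, and how a harmonic mean of accuracy and earliness might be computed. The abstract does not define earliness, so the sketch assumes it is the fraction of a sequence observed before halting (lower is earlier) and combines accuracy with 1 - earliness. The function names `untangle` and `harmonic_mean` and the sample data are hypothetical.

```python
from collections import defaultdict


def untangle(tangled):
    """Split a tangled sequence of (key, value) items into per-key value lists.

    Arrival order within each key is preserved.
    """
    per_key = defaultdict(list)
    for key, value in tangled:
        per_key[key].append(value)
    return dict(per_key)


def harmonic_mean(accuracy, earliness):
    """Harmonic mean of accuracy and (1 - earliness).

    Assumption: earliness is the fraction of items observed before the
    classifier halts, so a lower value means an earlier decision.
    """
    timeliness = 1.0 - earliness
    if accuracy + timeliness == 0:
        return 0.0
    return 2 * accuracy * timeliness / (accuracy + timeliness)


# Hypothetical tangled sequence: three concurrent keys interleaved in time.
tangled = [("a", 1), ("b", 7), ("a", 2), ("c", 5), ("b", 8)]
sequences = untangle(tangled)

# Hypothetical result: 90% accuracy after observing 20% of each sequence.
score = harmonic_mean(0.9, 0.2)
```

Under this assumed definition, a classifier scores well only when it is both accurate and early, which reflects the conflict between the two goals described above.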