A generic framework for efficient and effective subsequence retrieval

PVLDB(2012)

Abstract
This paper proposes a general framework for matching similar subsequences in both time series and string databases. The matching results are pairs of query subsequences and database subsequences. The framework finds all pairs of similar subsequences if the distance measure satisfies the "consistency" property, which is introduced in this paper. We show that most popular distance functions, such as the Euclidean distance, DTW, ERP, and the Fréchet distance for time series, and the Hamming distance and Levenshtein distance for strings, are all "consistent". We also propose a generic index structure for metric spaces named "reference net". The reference net occupies O(n) space, where n is the size of the dataset, and is optimized to work well with our framework. The experiments demonstrate the ability of our method to improve retrieval performance when combined with diverse distance measures. The experiments also illustrate that the reference net scales well in terms of space overhead and query time.
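To make the problem statement concrete, the following is a minimal brute-force sketch of what "pairs of query subsequences and database subsequences" means for strings under the Levenshtein distance. It is not the paper's framework and does not use the reference net; it is the naive baseline that such a framework aims to improve on. The function names and the parameters `min_len` and `max_dist` are assumptions introduced only for this illustration.

```python
def levenshtein(a, b):
    """Edit distance between sequences a and b (unit-cost insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def windows(seq, length):
    """Yield (start, contiguous subsequence) for every window of the given length."""
    for start in range(len(seq) - length + 1):
        yield start, seq[start:start + length]


def similar_subsequence_pairs(query, database, min_len, max_dist):
    """Hypothetical naive baseline: enumerate every pair of contiguous
    subsequences of length >= min_len and keep those within max_dist.
    Each result is ((query_start, query_len), (db_start, db_len))."""
    pairs = []
    for q_len in range(min_len, len(query) + 1):
        for q_start, q_sub in windows(query, q_len):
            for d_len in range(min_len, len(database) + 1):
                for d_start, d_sub in windows(database, d_len):
                    if levenshtein(q_sub, d_sub) <= max_dist:
                        pairs.append(((q_start, q_len), (d_start, d_len)))
    return pairs


if __name__ == "__main__":
    print(similar_subsequence_pairs("abc", "xabcx", min_len=3, max_dist=0))
```

The exhaustive enumeration grows very quickly with sequence length, which is the kind of cost that motivates pruning and indexing for subsequence retrieval.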
Keywords
effective subsequence retrieval,levenshtein distance,euclidean distance,hamming distance,distance measure,query time,general framework,similar subsequence,diverse distance measure,generic framework,time series,popular distance function