Learning Euclidean Embeddings for Indexing and Classification
msra(2004)
Abstract
BoostMap is a recently proposed method for efficient approximate nearest neighbor retrieval in arbitrary non-Euclidean spaces with computationally expensive and possibly non-metric distance measures. Database and query objects are embedded into a Euclidean space, in which similarities can be rapidly measured using a weighted Manhattan distance. The key idea is formulating embedding construction as a machine learning task, where AdaBoost is used to combine simple, 1D embeddings into a multidimensional embedding that preserves a large amount of the proximity structure of the original space. This paper demonstrates that, using the machine learning formulation of BoostMap, we can optimize embeddings for indexing and classification, in ways that are not possible with existing alternatives for constructive embeddings, and without additional costs in retrieval time. First, we show how to construct embeddings that are query-sensitive, in the sense that they yield a different distance measure for different queries, so as to improve nearest neighbor retrieval accuracy for each query. Second, we show how to optimize embeddings for nearest neighbor classification tasks, by tuning them to approximate a parameter space distance measure, instead of the original feature-based distance measure.
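As a rough illustration of the retrieval side described in the abstract, the sketch below embeds objects with 1D embeddings and compares them with a weighted Manhattan distance. It is not the authors' implementation, and several pieces are assumptions: each 1D embedding is taken to be the distance to a reference object, the weights are hand-picked placeholders standing in for what AdaBoost would learn, and the filter-and-refine step and all function names (`embed`, `weighted_l1`, `filter_and_refine`) are hypothetical.

```python
from typing import Callable, List, Sequence


def embed(x, references: Sequence, dist: Callable) -> List[float]:
    """Stack 1D embeddings into a vector.

    Assumption: each 1D embedding is the distance from x to one
    reference object, F_r(x) = dist(x, r).
    """
    return [dist(x, r) for r in references]


def weighted_l1(u: Sequence[float], v: Sequence[float],
                weights: Sequence[float]) -> float:
    """Weighted Manhattan distance between two embedded vectors."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, u, v))


def filter_and_refine(query, database: Sequence, db_vectors: Sequence,
                      references: Sequence, weights: Sequence[float],
                      dist: Callable, p: int) -> int:
    """Return the index of the approximate nearest neighbor of query.

    Filter: rank all database objects by the cheap embedded distance.
    Refine: evaluate the expensive distance only on the top p candidates.
    """
    q_vec = embed(query, references, dist)
    ranked = sorted(range(len(database)),
                    key=lambda i: weighted_l1(q_vec, db_vectors[i], weights))
    candidates = ranked[:p]
    return min(candidates, key=lambda i: dist(query, database[i]))


# Toy stand-in for an expensive, possibly non-metric distance measure.
def toy_dist(a: float, b: float) -> float:
    return abs(a - b)


database = [0, 3, 7, 12, 20]
references = [0, 20]      # hypothetical reference objects
weights = [0.5, 0.5]      # placeholders; BoostMap would learn these with AdaBoost

# Database embeddings are computed once, offline.
db_vectors = [embed(x, references, toy_dist) for x in database]

nearest = filter_and_refine(8, database, db_vectors, references,
                            weights, toy_dist, p=2)
```

At query time only `len(references)` exact distances are needed to embed the query, plus `p` more in the refine step, instead of one exact distance per database object. A query-sensitive variant in the sense of the abstract would make `weights` depend on the query rather than being fixed; that is not shown here.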
Keywords
technical report, information retrieval, data bases, machine learning, nearest neighbor, embedding, classification, Euclidean space, ranking