Latent semantic indexing: a probabilistic analysis
Journal of Computer and System Sciences - Special issue on the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems, pp. 217-235, 2000.
Keywords: collaborative filtering, probabilistic analysis, information retrieval, spectral method, latent semantic indexing
Latent semantic indexing (LSI) is an information retrieval technique based on the spectral analysis of the term-document matrix, whose empirical success had heretofore been without rigorous prediction and explanation. We prove that, under certain conditions, LSI does succeed in capturing the underlying semantics of the corpus and achieves i...
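The spectral step the abstract refers to can be illustrated with a truncated singular value decomposition (SVD) of the term-document matrix. The sketch below is only a minimal illustration under stated assumptions, not the paper's method or its probabilistic corpus model: it assumes raw term counts with no term weighting, a hand-picked rank k = 2, and a hypothetical six-term, four-document corpus. The helper names `lsi_fit`, `project` and `cosine` are illustrative. It keeps the top-k left singular vectors U_k and compares a query with each document after projecting both with U_k^T.

```python
import numpy as np


def lsi_fit(A: np.ndarray, k: int) -> np.ndarray:
    """Return U_k, the top-k left singular vectors of the term-document matrix A."""
    # Thin SVD: A = U diag(s) Vt, with singular values sorted in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k]


def project(Uk: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Map term-space vectors (columns of X, or a single vector) into the k-dim latent space."""
    return Uk.T @ X


def cosine(x: np.ndarray, y: np.ndarray) -> float:
    """Cosine similarity of two nonzero vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


# Hypothetical toy corpus: rows are terms, columns are documents (raw term counts).
terms = ["car", "automobile", "engine", "flower", "petal", "garden"]
A = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 0],  # flower
    [0, 0, 1, 1],  # petal
    [0, 0, 0, 1],  # garden
], dtype=float)

# Query consisting of the single term "car".
q = np.zeros(len(terms))
q[terms.index("car")] = 1.0

Uk = lsi_fit(A, k=2)          # assumed rank k = 2, chosen by hand for this toy corpus
docs_latent = project(Uk, A)  # shape (k, n_docs); equals diag(s_k) V_k^T
q_latent = project(Uk, q)     # query "folded in" with the same projection

n_docs = A.shape[1]
raw_scores = [cosine(q, A[:, j]) for j in range(n_docs)]
lsi_scores = [cosine(q_latent, docs_latent[:, j]) for j in range(n_docs)]
print("raw:", np.round(raw_scores, 3))
print("lsi:", np.round(lsi_scores, 3))
```

Document 1 ("automobile engine") shares no term with the query "car", so its raw cosine score is 0. In the rank-2 space, "car" and "automobile" project to the same latent vector because both co-occur with "engine", and the document's score is close to 1. Projecting queries and documents with the same U_k^T keeps the comparison consistent, since U_k^T A = diag(s_k) V_k^T.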