March 31, 2009

Probabilistic Latent Semantic Indexing

“Probabilistic Latent Semantic Indexing”, Thomas Hofmann, SIGIR, 1999

Probabilistic Latent Semantic Analysis (PLSA) is a method for automated document indexing based on the likelihood principle. It defines a statistical latent class model, the aspect model, and fits it with a tempered EM algorithm. In contrast to LSA, which chooses its decomposition to minimize the L2 (Frobenius) norm of the approximation error, PLSA relies on the likelihood function of multinomial sampling and explicitly maximizes the predictive power of the model. The resulting decomposition has a clear probabilistic meaning in terms of mixture component distributions. PLSA copes with imprecise user queries by handling synonyms (different words with the same meaning) and polysemous words (one word with several meanings).
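To make the fitting procedure concrete, here is a minimal sketch of the aspect model, P(d, w) = sum over z of P(z) P(d|z) P(w|z), trained with EM on a small document-term count matrix. It is an illustration, not the paper's implementation: the function names `plsa` and `log_likelihood`, the random initialization, and the toy data are all my own assumptions. In particular, the exponent `beta` is held fixed here, whereas the paper's tempered EM adjusts it using held-out data; with `beta=1.0` this reduces to ordinary EM.

```python
import math
import random


def plsa(counts, n_topics, n_iter=50, beta=1.0, seed=0):
    """Fit the aspect model with EM; counts[d][w] is the term count n(d, w).

    beta is a fixed tempering exponent (an assumption of this sketch;
    beta=1.0 gives plain EM). Returns (p_z, p_d_z, p_w_z), where
    p_d_z[z][d] = P(d|z) and p_w_z[z][w] = P(w|z).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def random_dist(n):
        # Strictly positive random vector, normalized to sum to 1.
        values = [rng.random() + 1e-3 for _ in range(n)]
        total = sum(values)
        return [v / total for v in values]

    p_z = [1.0 / n_topics] * n_topics
    p_d_z = [random_dist(n_docs) for _ in range(n_topics)]
    p_w_z = [random_dist(n_words) for _ in range(n_topics)]

    for _ in range(n_iter):
        acc_z = [0.0] * n_topics
        acc_d = [[0.0] * n_docs for _ in range(n_topics)]
        acc_w = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) proportional to P(z) * (P(d|z) P(w|z))^beta.
                post = [p_z[z] * (p_d_z[z][d] * p_w_z[z][w]) ** beta
                        for z in range(n_topics)]
                total = sum(post)
                if total == 0.0:
                    continue
                # Accumulate expected counts n(d,w) * P(z|d,w).
                for z in range(n_topics):
                    r = n * post[z] / total
                    acc_z[z] += r
                    acc_d[z][d] += r
                    acc_w[z][w] += r
        # M-step: renormalize the expected counts into probabilities.
        grand_total = sum(acc_z)
        for z in range(n_topics):
            if acc_z[z] > 0.0:
                p_d_z[z] = [x / acc_z[z] for x in acc_d[z]]
                p_w_z[z] = [x / acc_z[z] for x in acc_w[z]]
            p_z[z] = acc_z[z] / grand_total
    return p_z, p_d_z, p_w_z


def log_likelihood(counts, p_z, p_d_z, p_w_z):
    """Multinomial log-likelihood: sum over (d, w) of n(d,w) * log P(d,w)."""
    ll = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n == 0:
                continue
            p_dw = sum(p_z[z] * p_d_z[z][d] * p_w_z[z][w]
                       for z in range(len(p_z)))
            ll += n * math.log(p_dw)
    return ll


if __name__ == "__main__":
    # Hypothetical toy corpus: 4 documents over a 4-word vocabulary.
    toy_counts = [[4, 3, 0, 0], [5, 2, 0, 0], [0, 0, 3, 4], [0, 0, 2, 5]]
    p_z, p_d_z, p_w_z = plsa(toy_counts, n_topics=2)
    print("P(z):", p_z)
    print("log-likelihood:", log_likelihood(toy_counts, p_z, p_d_z, p_w_z))
```

The quantity being maximized is the multinomial log-likelihood computed by `log_likelihood`, which is the contrast with LSA's squared-error objective. Each row of `p_w_z` is a word distribution for one latent aspect, so words that co-occur with the same aspect (synonyms) receive mass in the same row, while a polysemous word can receive mass in several rows.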
