[Topic-models] Perplexity of LDA and similar Dir-Mult models
David Blei
blei at CS.Princeton.EDU
Mon Feb 25 10:02:09 EST 2008
dear gregor,
thanks for sending this along. it's very interesting.
i'm a little surprised. often the number of topics that are "found"
by the algorithm has a lot to do with the hyperparameters alpha and
beta. a topic hyperparameter that prefers sparser topics will lead to
a model that requires more of them. (in your notation, that is beta
near 0.) if you were fitting or sampling the hyperparameters then the
perplexity might continue to decrease. but, with fixed
hyperparameters, i'd expect it to level out.
do you see the same behavior when you change the size of the
vocabulary? i've wondered if V affects things in ways that we haven't
explored. my student---jonathan chang---has noticed that very rare
words can induce partitions of the corpus that might get in the way of
finding the co-occurrence patterns that we expect.
all the best,
dave
On Feb 19, 2008, at 11:55 AM, Gregor Heinrich wrote:
> Dear topic-model experimenters,
>
> when applying perplexity to evaluate topics of LDA or similar topic
> models, there seems to be a bias toward large topic counts.
>
> On the well-known NIPS 1-12 dataset, LDA results for the Gibbs sampler
> look like:
>
> K = 10, ppx = 2177
> K = 25, ppx = 1907
> K = 50, ppx = 1733
> K = 75, ppx = 1642
> K = 100, ppx = 1570
> K = 150, ppx = 1497
> K = 200, ppx = 1421
> K = 300, ppx = 1307
> K = 500, ppx = 1181
> K = 1000, ppx = 1050
> K = 2000 running...
>
> Has any of you had a similar experience?
>
> These results are for plain LDA [1], but I get similar results with
> VEM, and also with other models, e.g., topic hierarchies.
>
> I am a bit surprised by this result because in the non-parametric
> Bayes community, DP priors are advocated as a means to find the
> optimal K w.r.t. minimum perplexity (publications often report
> 50 .. 150 topics for medium-sized corpora).
>
> Thanks for any hints.
>
> Best regards from sunny Frankfurt
>
> gregor
>
> Endnote:
>
> [1]
> Definitions for LDA:
> ppx := exp( -(1/W) * sum_m sum_n log( sum_k phi_{k, w_mn} * theta_{m,k} ) )
> phi_{k,t} := E{ p(w = t | z = k) }
> theta_{m,k} := E{ p(z = k | testdoc = m) }
>
> Conditions: NIPS 1-12, alpha = 0.1 .. 0.3, beta = 0.01.
> M = 1740
> M_train = 1566
> M_test = 174
> V = 13649
> W = 2301375
> W_train = 2071716
> W_test = 229659
>
>
> _______________________________________________
> Topic-models mailing list
> Topic-models at lists.cs.princeton.edu
> https://lists.cs.princeton.edu/mailman/listinfo/topic-models
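For concreteness, Gregor's perplexity definition in [1] can be sketched as follows. This is a minimal illustration in Python/NumPy with made-up toy dimensions and random phi/theta; in practice phi and theta would be posterior point estimates from a trained LDA model:

```python
import numpy as np

# Hypothetical toy sizes (K topics, V vocabulary words, M test docs);
# these are NOT the NIPS 1-12 values from the thread.
K, V, M = 3, 5, 2
rng = np.random.default_rng(0)

# phi[k, t]   ~ E{ p(w = t | z = k) }  -- rows sum to 1
# theta[m, k] ~ E{ p(z = k | testdoc = m) } -- rows sum to 1
phi = rng.dirichlet(np.ones(V), size=K)
theta = rng.dirichlet(np.ones(K), size=M)

# Test documents as lists of word ids (stand-ins for w_mn).
docs = [[0, 2, 2, 4], [1, 3]]

log_lik = 0.0
W = 0
for m, doc in enumerate(docs):
    for t in doc:
        # p(w_mn = t) = sum_k phi[k, t] * theta[m, k]
        log_lik += np.log(phi[:, t] @ theta[m])
        W += 1

ppx = np.exp(-log_lik / W)
print(ppx)
```

Lower ppx is better; a uniform model over the vocabulary would give ppx = V, which is the usual baseline for reading numbers like those in the table above.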