[Topic-models] Perplexity of LDA and similar Dir-Mult models

David Blei blei at CS.Princeton.EDU
Mon Feb 25 10:02:09 EST 2008


dear gregor,

thanks for sending this along.  it's very interesting.

i'm a little surprised.  often the number of topics that are "found"  
by the algorithm has a lot to do with the hyperparameters alpha and  
beta.  a topic hyperparameter that prefers sparser topics will lead to  
a model that requires more of them.  (in your notation, that is beta  
near 0.)  if you were fitting or sampling the hyperparameters, then the  
perplexity might keep improving as K grows.  but with fixed  
hyperparameters, i'd expect it to level out.
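
as a rough illustration of that sparsity effect (a sketch only; numpy is
assumed, and V = 13649 is taken from gregor's setup below), a single topic
drawn from a symmetric Dirichlet with small beta concentrates its mass on
few word types:

import numpy as np

V = 13649                      # vocabulary size from the NIPS setup below
rng = np.random.default_rng(0)

for beta in [1.0, 0.1, 0.01]:
    phi_k = rng.dirichlet([beta] * V)      # one topic ~ Dirichlet(beta, ..., beta)
    entropy = -np.sum(phi_k * np.log(phi_k + 1e-300))
    print(f"beta = {beta:5.2f}   effective vocabulary ~ exp(H) = {np.exp(entropy):9.1f}")

smaller beta gives a smaller effective vocabulary per topic, so the model
needs more topics to cover the corpus.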

do you see the same behavior when you change the size of the  
vocabulary?  i've wondered if V affects things in ways that we haven't  
explored.  my student---jonathan chang---has noticed that very rare  
words can induce partitions of the corpus that might get in the way of  
finding the co-occurrence patterns that we expect.

all the best,
dave


On Feb 19, 2008, at 11:55 AM, Gregor Heinrich wrote:

> Dear topic-model experimenters,
>
> when applying perplexity to evaluate topics of LDA or similar topic
> models, there seems to be a bias towards large topic counts.
>
> On the well-known NIPS 1-12 dataset, LDA results for the Gibbs sampler
> look like:
>
> K = 10, ppx = 2177
> K = 25, ppx = 1907
> K = 50, ppx = 1733
> K = 75, ppx = 1642
> K = 100, ppx = 1570
> K = 150, ppx = 1497
> K = 200, ppx = 1421
> K = 300, ppx = 1307
> K = 500, ppx = 1181
> K = 1000, ppx = 1050
> K = 2000 running...
>
> Did any of you have a similar experience?
>
> These results are for plain LDA [1], but I get similar results with  
> VEM,
> and also with other models, e.g., topic hierarchies.
>
> I am a bit surprised about this result because in the non-parametric
> Bayes community, DP priors are advocated as a means to find the
> optimum K w.r.t. minimum ppx (publications often report 50 .. 150  
> topics
> for medium-sized corpora).
>
> Thanks for any hints.
>
> Best regards from sunny Frankfurt
>
> gregor
>
> Endnote:
>
> [1]
> Definitions for LDA:
> ppx := exp ( -1/W * sum_m sum_n log (sum_k phi_k,w_mn * theta_m,k) )
> phi_k,t := E{p(w=t|z=k)}
> theta_m,k := E{p(z=k|testdoc = m)}
>
> Conditions: NIPS 1-12, alpha = 0.1 .. 0.3, beta = 0.01.
> M = 1740
> M_train = 1566
> M_test = 174
> V = 13649
> W = 2301375
> W_train = 2071716
> W_test = 229659
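
A concrete reading of the ppx definition in [1], as a minimal sketch: it
assumes phi is a K x V matrix of E{p(w=t|z=k)}, theta is an M_test x K
matrix of E{p(z=k|testdoc=m)}, and test_docs[m] is the list of word ids of
test document m (these names and layouts are illustrative, not gregor's
code):

import numpy as np

def perplexity(phi, theta, test_docs):
    # ppx = exp( -1/W_test * sum_m sum_n log( sum_k phi[k, w_mn] * theta[m, k] ) )
    log_lik = 0.0
    n_tokens = 0
    for m, doc in enumerate(test_docs):
        p_w = theta[m] @ phi[:, doc]   # per-token mixture probability p(w_mn)
        log_lik += np.log(p_w).sum()
        n_tokens += len(doc)
    return np.exp(-log_lik / n_tokens)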


