[Topic-models] Non-parametric topic models

Thibaut Thonet thibaut.thonet at irit.fr
Mon Feb 20 12:23:24 EST 2017


Hi all,

I've got a question about non-parametric topic models. I'm wondering 
whether the model described by the following generative process makes 
any sense:
* For each topic k = 1, 2, ...
   - Draw phi_k ~ Dirichlet(beta)
* For each document d = 1, ..., D
   - Draw theta_d ~ GEM(alpha)
   - For each n = 1, ..., N_d
     + Draw z_{dn} ~ Discrete(theta_d)
     + Draw w_{dn} ~ Discrete(phi_{z_{dn}})
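
A minimal Python sketch of this generative process follows. It assumes
numpy and a finite truncation level `truncation`, which the process above
does not have. The last stick absorbs the remaining mass so that each
theta_d sums to 1. The function names are hypothetical. Note that phi is
drawn once and shared, while theta_d is drawn independently for each
document with no shared base measure.

```python
import numpy as np


def sample_gem(alpha, truncation, rng):
    """Truncated stick-breaking draw from GEM(alpha) (hypothetical helper)."""
    v = rng.beta(1.0, alpha, size=truncation)
    v[-1] = 1.0  # close the stick at the truncation level so weights sum to 1
    # remaining[k] = prod_{j<k} (1 - v_j): the stick length left before break k
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def generate_corpus(doc_lengths, vocab_size, alpha, beta, truncation=50, seed=0):
    """Sample a corpus from the process above, truncated at `truncation` topics."""
    rng = np.random.default_rng(seed)
    # phi_k ~ Dirichlet(beta): one draw per topic, shared by every document
    phi = rng.dirichlet(np.full(vocab_size, beta), size=truncation)
    docs, assignments = [], []
    for n_d in doc_lengths:
        # theta_d ~ GEM(alpha), drawn independently for each document
        theta_d = sample_gem(alpha, truncation, rng)
        # z_dn ~ Discrete(theta_d), then w_dn ~ Discrete(phi_{z_dn})
        z_d = rng.choice(truncation, size=n_d, p=theta_d)
        w_d = np.array([rng.choice(vocab_size, p=phi[k]) for k in z_d])
        docs.append(w_d)
        assignments.append(z_d)
    return phi, docs, assignments
```

For example, `generate_corpus([30, 40, 25], vocab_size=20, alpha=1.0,
beta=0.5, truncation=30)` returns a 30 x 20 topic-word matrix and three
documents with their topic indices.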

This resembles the stick-breaking construction of the Hierarchical 
Dirichlet Process (described in Yee Whye Teh's 2006 paper), but the 
difference is that theta_d is drawn directly from GEM(alpha) instead of 
being drawn from DP(alpha, theta_0), where theta_0 is a GEM-distributed 
base measure shared across all documents. Under the CRP interpretation, this
is a sort of hybrid between the Chinese restaurant process and the 
Chinese restaurant franchise: in this model, p(z_{dn} = k | z_{-dn}) is 
proportional to n_{dk}^{-dn} if k is an existing topic and proportional 
to alpha if k is a new topic.
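
The sketch below contrasts the prior part of that conditional, taken
as stated in this post, with the HDP direct-assignment prior from
Teh et al. (2006). It is written in Python with numpy. The function
names, the dict-based count representation, and the label "new" are
hypothetical. A full Gibbs step would also multiply each entry by the
likelihood of w_{dn} under the corresponding topic. In the HDP, a topic
that is instantiated globally but unused in document d keeps prior mass
alpha * theta_0k. This is what ties topic indices across documents. The
hybrid rule has only per-document counts and a single alpha bucket.

```python
import numpy as np


def hybrid_conditional(counts_minus_dn, alpha):
    """Prior p(z_dn = k | z_-dn) following the rule stated in the post.

    counts_minus_dn: hypothetical dict {k: n_dk^{-dn}} for document d, token n removed.
    Existing topics get weight n_dk^{-dn}; a single "new" bucket gets alpha.
    """
    topics = [k for k, c in counts_minus_dn.items() if c > 0]
    weights = np.array([counts_minus_dn[k] for k in topics] + [alpha], dtype=float)
    return topics + ["new"], weights / weights.sum()


def hdp_direct_assignment_conditional(counts_minus_dn, alpha, theta0, theta0_new):
    """HDP prior (direct assignment scheme) for contrast.

    theta0: hypothetical dict {k: theta_0k} of global weights for instantiated topics;
    theta0_new: leftover global mass for not-yet-instantiated topics.
    """
    topics = list(theta0)
    weights = [counts_minus_dn.get(k, 0) + alpha * theta0[k] for k in topics]
    weights.append(alpha * theta0_new)  # brand-new topic
    weights = np.array(weights, dtype=float)
    return topics + ["new"], weights / weights.sum()
```

For instance, with counts {0: 3, 2: 1} and alpha = 1, the hybrid rule
gives probabilities [0.6, 0.2, 0.2] over [0, 2, "new"]. The HDP rule
with theta_0 = {0: 0.5, 1: 0.3, 2: 0.1} and leftover mass 0.1 gives
[0.7, 0.06, 0.22, 0.02] over [0, 1, 2, "new"]. Topic 1 is unused in
document d, yet it keeps prior mass in the HDP.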

Although I feel that there is something conceptually wrong with this 
model, I can't put my finger on the exact argument to prove it. My 
intuition is that since each theta_d is drawn independently from a GEM, 
topic indices should not be shareable across documents (i.e., topic k 
in document j need not be coherent with topic k in document j'). But 
since all documents use the same {phi_k}_k, which are generated 
independently of the documents, it seems that this model's Gibbs 
sampler should nonetheless 'work' in practice and produce coherent 
topics.

What also puzzles me is that this 'easy' non-parametric extension of 
parametric models (the example above applies it to LDA) is used in a 
few papers from top text mining conferences (e.g., SIGIR, CIKM, WWW), 
which relate it to the CRP or the HDP (whereas it is in fact exactly 
neither of them)...

Thanks in advance for any insight on what's theoretically wrong (or not) 
with this model.

Best,

Thibaut
