[Topic-models] Non-parametric topic models

Thibaut Thonet thibaut.thonet at irit.fr
Tue Feb 21 10:57:11 EST 2017

Hi Wray,

Thanks a lot for your detailed and thorough answer. So I conclude from
what you said that the model I described isn't 'wrong'; it would
just (most likely) perform worse than, e.g., HDP-LDA or the NP-LDA from
your 2014 KDD paper. I'm nonetheless surprised that no work in the
literature has evaluated this model and compared it against hierarchical
non-parametric models and against vanilla LDA (symmetric-symmetric).

Although it would most likely yield a higher perplexity than the
hierarchical non-parametric models, posterior inference for that model
(e.g., using direct assignment sampling) should be about as fast as
that of vanilla LDA -- since table counts need not be sampled in that
version, given its non-hierarchical nature. So I'm curious whether its
effectiveness (perplexity, topic coherence) is better than that of
vanilla LDA, or whether flat priors are more penalizing in a
non-parametric setting.

Best,

Thibaut

Le 20/02/2017 à 23:08, Wray Buntine a écrit :
> Hi Thibaut
>
> What's wrong with this is that it's not hierarchical.
> You allow the theta_d to be infinite, but you don't give them all a
> common parent.
> The main advantage of the HDP-LDA method is that it allows topics to
> have different proportions.  You're doing that in a very controlled
> way with stick breaking, but with the HDP you get to better fit the
> overall topic proportions.
>
> The HDP-LDA is more or less equivalent to yours but with
>   alpha ~ GEM(psi_0)
>   * For each document d = 1, ..., D
>     - Draw theta_d ~ Dirichlet(alpha*psi_1)
>
> NB.  taking a bit of liberty here with the Dirichlet, as alpha is an
> infinite vector -- just truncate it.
>
> This extra level means alpha is estimated giving topic proportions.
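To make the truncation concrete, here is a minimal NumPy sketch (my own, not from the paper) of that extra level: a truncated GEM(psi_0) stick-breaking draw for the shared alpha, followed by per-document theta_d ~ Dirichlet(alpha*psi_1). The truncation level K and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def gem_truncated(psi0, K, rng):
    """Truncated GEM(psi0) stick-breaking: v_k ~ Beta(1, psi0),
    weight_k = v_k * prod_{j<k} (1 - v_j).  The last stick is set
    to 1 so the K weights sum to 1 at the truncation level."""
    v = rng.beta(1.0, psi0, size=K)
    v[-1] = 1.0  # close off the stick at the truncation level
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

K, psi0, psi1, D = 20, 1.0, 10.0, 5          # illustrative values
alpha = gem_truncated(psi0, K, rng)           # shared topic proportions
theta = rng.dirichlet(alpha * psi1, size=D)   # one row per document
```

Because alpha is drawn once and shared, documents can weight topics differently while still agreeing on which topics are globally frequent.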
>
> This is rather similar to the Asymmetric-Symmetric LDA model in
> Mallet, which as it happens is *almost* truncated HDP-LDA, beats
> the pants off most HDP-LDA implementations in perplexity, and is
> 10-100 times faster than most.
> Experiments are reported in my KDD 2014 paper.
>
> So your model would be OK, and it would "fit" the number of topics,
> but a good
> implementation of the above *should* beat it.  Implementations vary so
> much that YMMV.
>
> As an implementation note, I know of few contexts where Chinese restaurant
> processes, hierarchical or franchise, give competitive sampling
> algorithms.
>
> Finally, the more interesting model is this one:
>
> beta ~ GEM(mu_0, nu_0)
> * For each topic k = 1, 2, ...
>   - Draw phi_k ~ PYP(beta, mu_1)
> alpha ~ GEM(psi_0, nu_0)
> * For each document d = 1, ..., D
>   - Draw theta_d ~ Dirichlet(alpha*psi_1)
>   - For each n = 1, ..., N_d
>     + Draw z_{dn} ~ Discrete(theta_d)
>     + Draw w_{dn} ~ Discrete(phi_{z_dn})
>
> NB.  the two-parameter GEM is the vector version of the Pitman-Yor
> process, and the PYP is used on the word side to take advantage of
> the Zipfian behaviour of words.
>
> In this case alpha is the topic proportions, a latent vector that is
> estimated, and beta is the *background* word proportions, which again
> is latent and estimated.
> Algorithms based on Chinese restaurants simply give up with the size
> of the word vectors, but more modern algorithms work and give lovely
> estimates of the "background", i.e., non-topical words, and make your
> topics in phi more interpretable as well as improving perplexity.
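For concreteness, the two-parameter GEM can be truncated the same way as the one-parameter version. The sketch below assumes a GEM(discount, concentration) reading of (mu_0, nu_0), which is my guess at the intended parameterisation, and uses the standard Pitman-Yor sticks v_k ~ Beta(1 - a, b + k*a):

```python
import numpy as np

rng = np.random.default_rng(1)

def gem2_truncated(a, b, K, rng):
    """Truncated two-parameter GEM(a, b), i.e. Pitman-Yor
    stick-breaking: v_k ~ Beta(1 - a, b + k * a) for k = 1..K,
    with the last stick set to 1 to close off the truncation.
    Discount a in [0, 1) gives the power-law (Zipfian) tail;
    b > -a is the concentration."""
    k = np.arange(1, K + 1)
    v = rng.beta(1.0 - a, b + k * a)
    v[-1] = 1.0  # absorb the leftover stick mass
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

# background word proportions over a truncated set of sticks
beta = gem2_truncated(a=0.5, b=10.0, K=1000, rng=rng)
```

With a > 0 the weights decay polynomially rather than geometrically, which is the Zipfian behaviour mentioned above.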
>
> Prof. Wray Buntine
> Course Director for Master of Data Science
> Monash University
> http://topicmodels.org
>
> On 21 February 2017 at 04:23, Thibaut Thonet <thibaut.thonet at irit.fr> wrote:
>
>     Hi all,
>
>     I've got a question about non-parametric topic models. I'm
>     wondering whether the model described by the following generative
>     process makes any sense:
>     * For each topic k = 1, 2, ...
>       - Draw phi_k ~ Dirichlet(beta)
>     * For each document d = 1, ..., D
>       - Draw theta_d ~ GEM(alpha)
>       - For each n = 1, ..., N_d
>         + Draw z_{dn} ~ Discrete(theta_d)
>         + Draw w_{dn} ~ Discrete(phi_{z_dn})
>
>     This resembles the stick-breaking version of the Hierarchical
>     Dirichlet Process (described in Yee Whye Teh's 2006 paper), but
>     the difference is that theta_d is directly drawn from GEM(alpha)
>     instead of being drawn from a DP(alpha, theta_0) where theta_0 is
>     a GEM-distributed base measure shared across all documents. Under
>     the CRP interpretation, this is a sort of hybrid between the
>     Chinese restaurant process and the Chinese restaurant franchise:
>     in this model, p(z_{dn} = k | z_{-dn}) is proportional to
>     n_{dk}^{-dn} if k is an existing topic and proportional to alpha
>     if k is a new topic.
>
>     Although I feel that there is something conceptually wrong with
>     this model, I can't quite put my finger on the exact arguments to
>     prove it. My intuition is that since each theta_d is independently
>     drawn from a GEM, the topic indices should not be shareable
>     across documents (i.e., topic k in document j need not be
>     coherent with topic k in document j'). But since all documents
>     use the same {phi_k}_k -- which are generated independently
>     of the documents -- it seems that this model's Gibbs sampler
>     should nonetheless 'work' in practice and produce coherent topics.
>
>     What also puzzles me is that this 'easy' non-parametric extension
>     of parametric models (I described the 'easy' non-parametric
>     extension of LDA in this example) is used in a few papers from
>     top text mining conferences (e.g., SIGIR, CIKM, WWW), which
>     relate it to the CRP or HDP (whereas it is in fact not exactly
>     either of them)...
>
>     Thanks in advance for any insight on what's theoretically wrong
>     (or not) with this model.
>
>     Best,
>
>     Thibaut
>
>     _______________________________________________
>     Topic-models mailing list
>     Topic-models at lists.cs.princeton.edu
>     https://lists.cs.princeton.edu/mailman/listinfo/topic-models
>
>
