[Topic-models] Explanation about Polya urn model and LDA

Gabriele Pergola gabriele.pergola at gmail.com
Thu Jul 6 11:01:20 EDT 2017


I came across the paper "Optimizing semantic coherence in topic models" by
Mimno et al. (2011), in which they present a modified version of Gibbs
sampling based on the generalized Polya urn model.

I couldn't find any code (it seems none was released), so I decided to
implement it myself.

However, I have run into a problem. In the pseudocode provided in the paper
("Algorithm 2"), the counter N_(z,d), which records how many words in
document d are assigned to topic z, is only ever decremented and incremented
by 1. Yet, because of the Polya urn scheme, more than one word in a document
can be assigned to a topic at once (line 10).
I wonder whether this counter should also be updated to account for all the
words that are assigned to the new topic in a single iteration (line 10);
otherwise, the counter would give a misleading measure of how prominent a
topic is in a document.
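To make the question concrete, here is a minimal Python sketch of one resampling step as I read the update: the document-topic counter N_(z,d) moves by exactly 1, while the topic-word count of every related word v moves by a weight A[v][w]. The dense V x V matrix A, the function names, and the LDA-style sampling weights are my own assumptions for illustration, not code from the paper.

```python
import random

# Assumption: a dense V x V schema matrix A, where A[v][w] is how much
# count for word v is added to a topic when word w is assigned to it.
# All names here are hypothetical, not taken from the paper.

def init_counts(docs, z, K, A):
    """Build N_(z,d), the urn-augmented topic-word counts, and their totals."""
    V, D = len(A), len(docs)
    n_zd = [[0] * D for _ in range(K)]       # N_(z,d): tokens of d in topic z
    n_zw = [[0.0] * V for _ in range(K)]     # topic-word counts (fractional)
    n_z = [0.0] * K                          # sum over v of n_zw[z][v]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            n_zd[k][d] += 1
            for v in range(V):
                n_zw[k][v] += A[v][w]
                n_z[k] += A[v][w]
    return n_zd, n_zw, n_z

def resample_token(d, i, docs, z, n_zd, n_zw, n_z, A, alpha, beta, rng):
    """One Gibbs step for token i of document d under the generalized urn."""
    V, K = len(A), len(n_zd)
    w, old = docs[d][i], z[d][i]
    n_zd[old][d] -= 1                  # document counter: exactly one token
    for v in range(V):                 # topic-word counts: A[v][w] for every v
        n_zw[old][v] -= A[v][w]
        n_z[old] -= A[v][w]
    # Assumed LDA-style conditional, evaluated on the urn-augmented counts.
    weights = [(n_zd[k][d] + alpha) * (n_zw[k][w] + beta) / (n_z[k] + V * beta)
               for k in range(K)]
    new = rng.choices(range(K), weights=weights)[0]
    z[d][i] = new
    n_zd[new][d] += 1
    for v in range(V):
        n_zw[new][v] += A[v][w]
        n_z[new] += A[v][w]
    return new

# Toy example: words 0 and 1 are related (A[0][1] = A[1][0] = 0.5).
A = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
docs = [[0, 1, 2]]
z = [[0, 0, 1]]
n_zd, n_zw, n_z = init_counts(docs, z, K=2, A=A)
rng = random.Random(0)
for i in range(len(docs[0])):
    resample_token(0, i, docs, z, n_zd, n_zw, n_z, A, alpha=0.1, beta=0.01, rng=rng)

print(sum(n_zd[k][0] for k in range(2)))  # 3: one per token in the document
print(sum(n_z))                           # 4.0: total urn mass, not 3
```

Under these assumptions, N_(z,d) summed over topics always equals the document length, while the topic-word mass grows by the column sum of A for each token (4.0 rather than 3 in the toy example). That mismatch is exactly what my question is about.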

I look forward to an explanation.
