[Topic-models] Explanation about Polya urn model and LDA
gabriele.pergola at gmail.com
Thu Jul 6 11:01:20 EDT 2017
I came across the paper "Optimizing semantic coherence in topic models" by
Mimno et al. (2011), where they present a modified Gibbs sampler based on
the generalized Polya urn model.
I could not find any code for it, and it seems none was provided, so I
decided to implement it myself.
However, I have run into a problem. If you look at the pseudocode provided
in the paper ("Algorithm 2"), the counter N_(z,d), which tracks how many
words in a document are assigned to a topic, is decremented and incremented
only by 1; but because of the Polya urn approach, more than one word in the
document can be assigned to a topic at once (line 10).
I wonder whether this counter should also be updated to reflect all the
words that have been assigned to the new topic during one iteration (line
10); otherwise, it will give a misleading value for how prominent a topic
is in a document.
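To make the question concrete, here is a minimal sketch of one sampling step as I currently read Algorithm 2. It is only my interpretation, not the authors' code. All names (gpu_gibbs_step, n_dz, n_zw, n_z, A) are hypothetical, and I am assuming A[w][v] is the schema weight added to word type v when a token of word type w is assigned to a topic.

```python
import random


def gpu_gibbs_step(w, d, z_old, n_dz, n_zw, n_z, A, alpha, beta, rng):
    """Resample the topic of one token of word type w in document d.

    n_dz[d][k]: document-topic counts (integers)
    n_zw[k][v]: topic-word pseudo-counts (may be fractional)
    n_z[k]:     row sums of n_zw
    A[w][v]:    assumed schema weight for word v given a draw of word w
    """
    V = len(A)
    K = len(n_z)

    # Remove the token: the document-topic counter drops by exactly 1 ...
    n_dz[d][z_old] -= 1
    # ... while the topic-word counters drop by the whole schema row A[w].
    for v in range(V):
        n_zw[z_old][v] -= A[w][v]
        n_z[z_old] -= A[w][v]

    # Collapsed conditional, evaluated on the current counts.
    weights = [
        (n_dz[d][k] + alpha) * (n_zw[k][w] + beta) / (n_z[k] + V * beta)
        for k in range(K)
    ]
    z_new = rng.choices(range(K), weights=weights)[0]

    # Add the token back: again +1 for the document, A[w] for the topic.
    n_dz[d][z_new] += 1
    for v in range(V):
        n_zw[z_new][v] += A[w][v]
        n_z[z_new] += A[w][v]
    return z_new
```

Under this reading, the entries of n_dz[d] always sum to the number of tokens in document d, whereas each row of n_zw accumulates fractional weight for several word types per token. That asymmetry is exactly what I am unsure about.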
I look forward to any explanation.