[Topic-models] Explanation about Polya urn model and LDA

Thibaut Thonet thibaut.thonet at irit.fr
Wed Jul 12 08:31:30 EDT 2017

Hi Gabriele,

Glad that my explanation could help!

I'm not aware of such work. But I'm also not sure one would want to 
systematically assign the new topic to related words (in addition to 
the current word), as it might be too constraining. It seems more 
natural to only nudge words towards a topic without compelling their 
assignment. And this is actually what GPU-LDA does: when topic z is 
sampled for a word w, the count N_zv of each related word v is also 
increased (by A_vw), so topic z becomes slightly favored (i.e., more 
likely to be assigned) for all tokens with word type v.
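To make this concrete, here is a minimal sketch of that generalized Pólya urn count update in Python. All names (A, N_zv, the toy vocabulary) are illustrative, not from any particular implementation:

```python
import numpy as np

# Sketch of the generalized Pólya urn (GPU) count update in GPU-LDA.
# A[v, w] is the promotion weight that word w contributes to the count
# of related word v; A = identity recovers standard LDA counts.

V, K = 5, 2  # toy vocabulary size and number of topics

A = np.eye(V)
A[1, 0] = 0.3  # assigning a topic to word 0 also promotes word 1

N_zv = np.zeros((K, V))  # topic-word counts


def assign(z, w):
    """Assign topic z to a token of word w, updating GPU counts."""
    # Standard LDA would do N_zv[z, w] += 1; the GPU scheme instead
    # adds A[v, w] to the count of every related word v.
    N_zv[z, :] += A[:, w]


assign(z=0, w=0)  # one token of word 0 assigned to topic 0
# N_zv[0, 0] is now 1.0 (the word itself) and N_zv[0, 1] is 0.3:
# topic 0 is slightly favored for tokens of word 1, but their
# assignment is not forced.
```

Note that only the counts under the sampled topic z change; related words keep their own topic assignments, which is the "soft" influence discussed above.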

Also, keep in mind that a word can have several meanings (e.g., 'bank' 
as the financial institution or as the land bordering a river). So the 
'hard' constraint you want to enforce could, for example, lead to 
linking all occurrences of the word 'river' (which is somewhat related 
to 'bank') to the topic of finance. While I'm not saying this 
phenomenon won't occur at all for GPU-LDA, my intuition is that it 
will be less prominent.



Le 12/07/2017 à 01:13, Gabriele Pergola a écrit :
> Hi Thibaut,
> Clear enough?! You have been great!
> One of the clearest explanations I've read so far.
> Actually, before your answer, I missed one point: the words whose 
> counts are increased by A_vw are already "under topic z". Instead, I 
> wrongly thought that words under different topics might also 
> experience a frequency increment; this would have entailed that those 
> words change their topic assignments, which in turn would change the 
> proportion of words assigned to a topic in a document (i.e., N_dz).
> Of course, this does not occur if the words whose frequency is 
> increased were already under the same topic.
> Speaking of which, could you suggest any works (if any exist) that 
> have explored the idea of assigning the newly sampled topic not only 
> to the current word but also to its related words?
> (Supposing this idea makes sense...)
> Thank you so much for your help!
> Best,
> Gabriele
