[Topic-models] Explanation about Polya urn model and LDA
thibaut.thonet at irit.fr
Wed Jul 12 08:31:30 EDT 2017
Glad that my explanation could help!
I'm not aware of any such work. I'm also not sure one would want to
systematically assign the new topic to related words (in addition to
the current word), as that might be too constraining. It seems more
natural to nudge related words towards a topic without forcing their
assignment. This is in fact what GPU-LDA does: when the count N_zv of
a related word v is increased, topic z becomes slightly favored (i.e.,
more likely to be assigned) for all tokens with word type v.
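To make the "soft" influence concrete, here is a minimal sketch in
Python. It is not taken from any GPU-LDA implementation: the function
names, the dictionary layout of the schema (schema[w][v] standing for
A_vw), the toy counts and the value beta = 0.01 are all assumptions made
for illustration, and the document-level factor (N_dz + alpha) of the
sampler is left out. It compares the topic-word weight of 'money' under
topic 0 after a plain LDA count update and after a GPU-style update for
one token of 'bank'.

```python
import copy

def add_token(n_zv, z, w, schema=None):
    """Record that one token of word w was assigned to topic z.

    schema[w] maps each related word v to a weight A_vw (hypothetical layout).
    With schema=None this is the plain LDA count update.
    """
    n_zv[z][w] = n_zv[z].get(w, 0.0) + 1.0
    if schema is not None:
        # GPU-style step: related words also gain (fractional) counts in topic z.
        # No token of those related words changes its own topic assignment.
        for v, a_vw in schema.get(w, {}).items():
            n_zv[z][v] = n_zv[z].get(v, 0.0) + a_vw

def word_weight(n_zv, z, v, vocab_size, beta=0.01):
    """Smoothed weight of word v under topic z: (N_zv + beta) / (N_z + V * beta)."""
    n_z = sum(n_zv[z].values())
    return (n_zv[z].get(v, 0.0) + beta) / (n_z + vocab_size * beta)

# Toy counts N_zv for two topics over the vocabulary {bank, money, river}.
counts = {0: {"bank": 5.0, "money": 5.0}, 1: {"bank": 2.0, "river": 5.0}}
schema = {"bank": {"money": 0.3}}  # assumed relatedness weight A_vw

plain = copy.deepcopy(counts)
gpu = copy.deepcopy(counts)
add_token(plain, 0, "bank")        # plain LDA update
add_token(gpu, 0, "bank", schema)  # GPU-style update

w_plain = word_weight(plain, 0, "money", vocab_size=3)
w_gpu = word_weight(gpu, 0, "money", vocab_size=3)
print(w_plain, w_gpu)
```

In this toy setting the weight of 'money' under topic 0 is larger after
the GPU-style update than after the plain one, while the counts of
topic 1 are untouched: future tokens of 'money' are only made somewhat
more likely to pick topic 0, nothing is reassigned.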
Also, keep in mind that a word can have several meanings (e.g., 'bank'
as a financial institution or as the land bordering a river). The
'hard' constraint you want to enforce could therefore, for example,
link all occurrences of the word 'river' (which is related to one
sense of 'bank') to the topic of finance. I'm not saying this
phenomenon won't occur at all with GPU-LDA, but my intuition is that
it will be less prominent.
On 12/07/2017 at 01:13, Gabriele Pergola wrote:
> Hi Thibaut,
> Clear enough?! You have been great!
> One of the clearest explanations I've read so far.
> Actually, before your answer, I had missed one point: the words whose
> counts are increased by A_vw are already "under topic z". Instead, I
> wrongly thought that words under different topics might also receive a
> frequency increment; this would have entailed that those words change
> their topic assignments, which in turn would change the proportion of
> words assigned to a topic in a document (i.e., N_dz).
> Of course, this does not occur if the words whose frequency is
> increased were already under the same topic.
> Speaking of which, could you suggest any work (if any exists) that
> has explored the idea of assigning the newly sampled topic not only to
> the current word but also to its related words?
> (Assuming this idea makes sense.)
> Thank you so much for your help!