[Topic-models] Does word-ordering matter in Gibbs sampling?

Eric Kang erickangnz at gmail.com
Thu Jul 27 08:32:29 EDT 2017

Hi Dan,

Thank you. This is exactly what I was wondering. I think it's interesting because LDA is based on a bag-of-words representation, with it's theoretical motivation partly arising from de Finetti's theorem for exchangeable variables. But in practice, for Gibbs sampling methods, they're probably implemented in a order-dependent manner. Of course, given enough iterations, it'd converge, but in practice I'm not sure training is run till convergence. Based on one test, I think I was seeing the "quickly enter a local maximum that would be difficult to exit from" behavior you're mentioning. 


> On Jul 26, 2017, at 1:23 PM, dan <danwalkeriv at gmail.com> wrote:
> In theory it shouldn't matter, a Gibbs sampler with infinite time and machine precision would eventually mix well converge in distribution and you would sample from every region of the support in proportion to it's probability mass. In practice, I think you are right that it would be possible for the data ordering to cause you to quickly enter a local maximum that would be difficult (or impossible, given finite time and machine precision) to ever exit from. One approach to mitigating this problem would be to do a random sweep over the variables that you are sampling. Another might be to use deterministic annealing. Charles Elkan has some great descriptions about how deterministic annealing works in the context of EM for mixture models (http://cseweb.ucsd.edu/~elkan/250Bwinter2011/mixturemodels.pdf). I tried applying the same concepts to a Gibbs sampler in my dissertation work and achieved some really promising results (http://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=4529&context=etd). The advantage of DA would be that it helps avoid all kinds of maxima, not just those caused by scan order.
> I also did a quick search and came across these relevant publications:
> Scan Order in Gibbs Sampling: Models in Which it Matters and Bounds on How Much (https://arxiv.org/pdf/1606.03432.pdf)
> Implementing Random Scan Gibbs Samplers (https://link.springer.com/article/10.1007/BF02736129)
> --dan
>> On Tue, Jul 25, 2017 at 9:49 PM, Eric Kang <erickangnz at gmail.com> wrote:
>> Hi everyone,
>> My apologies if this is an uninformed question, but in Gibbs sampling for LDA inference, aren’t the various counts of word-topic assignments updated word-by-word? Doesn’t this make it somewhat dependent on word ordering? For example, if word_1 is strongly associated with topic_1 and word_2 is strongly associated with topic_2, if I see a document {word_1, word_1, … (100 times), word_2, word_2, … (100 times), word_2}, then by the time I start seeing word_2, wouldn’t the algorithm be more inclined to think that it should be assigned to topic_1, compared to a scenario where I see the document {word_1, word_2, word_1, word_2, …}?
>> Thank you,
>> Eric
>> _______________________________________________
>> Topic-models mailing list
>> Topic-models at lists.cs.princeton.edu
>> https://lists.cs.princeton.edu/mailman/listinfo/topic-models
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.cs.princeton.edu/pipermail/topic-models/attachments/20170727/d0c7396f/attachment.html>

More information about the Topic-models mailing list