[Topic-models] Does word-ordering matter in Gibbs sampling?

dan danwalkeriv at gmail.com
Thu Jul 27 18:56:10 EDT 2017


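You can sample the topic assignments in any order you want. Each assignment
is still drawn from its full conditional given the current counts; only the
order in which you visit the tokens changes. It makes the code slightly
harder to write, but any order, including a random one, will work.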
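
To make that concrete, here is a minimal sketch of one sweep in random order,
assuming a collapsed Gibbs sampler for LDA with symmetric priors alpha and
beta. The names (init_state, gibbs_sweep, n_dk, n_kw, n_k) are made up for
illustration and do not come from any particular library. Strictly speaking,
shuffling the tokens once per sweep is a random-permutation scan; a pure
random scan would instead pick one token uniformly at random at every step.

```python
import random


def init_state(docs, num_topics, vocab_size, rng):
    """Assign a random topic to every token and build the count tables."""
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    n_dk = [[0] * num_topics for _ in docs]                # document-topic counts
    n_kw = [[0] * vocab_size for _ in range(num_topics)]   # topic-word counts
    n_k = [0] * num_topics                                 # tokens per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
    return z, n_dk, n_kw, n_k


def gibbs_sweep(docs, z, n_dk, n_kw, n_k, alpha, beta, rng):
    """Resample every topic assignment once, visiting tokens in random order."""
    num_topics = len(n_k)
    vocab_size = len(n_kw[0])
    # Each (document, position) pair is one assignment to resample.
    order = [(d, i) for d, doc in enumerate(docs) for i in range(len(doc))]
    rng.shuffle(order)  # the only difference from a sequential sweep
    for d, i in order:
        w, k_old = docs[d][i], z[d][i]
        # Take the token's current assignment out of the counts.
        n_dk[d][k_old] -= 1
        n_kw[k_old][w] -= 1
        n_k[k_old] -= 1
        # Unnormalized full conditional p(z = k | all other assignments).
        weights = [
            (n_dk[d][k] + alpha) * (n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta)
            for k in range(num_topics)
        ]
        k_new = rng.choices(range(num_topics), weights=weights)[0]
        # Put the token back under its new assignment.
        z[d][i] = k_new
        n_dk[d][k_new] += 1
        n_kw[k_new][w] += 1
        n_k[k_new] += 1


# Toy corpus: words are integer ids in a vocabulary of size 2.
rng = random.Random(0)
docs = [[0] * 5 + [1] * 5, [0, 1, 0, 1]]
z, n_dk, n_kw, n_k = init_state(docs, num_topics=2, vocab_size=2, rng=rng)
for _ in range(50):
    gibbs_sweep(docs, z, n_dk, n_kw, n_k, alpha=0.1, beta=0.01, rng=rng)
```

The extra work compared with a sequential sweep is just building and
shuffling the list of positions; the count updates are unchanged.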


On Wed, Jul 26, 2017 at 11:19 PM, Swapnil Hingmire <
swapnilhingmire at gmail.com> wrote:

> Hi Dan,
> I would like to know how a random scan Gibbs sampler can be used in LDA
> inference.
> On Wed, Jul 26, 2017 at 10:53 PM, dan <danwalkeriv at gmail.com> wrote:
>> In theory it shouldn't matter: a Gibbs sampler with infinite time and
>> machine precision would eventually mix well and converge in distribution,
>> and you would sample from every region of the support in proportion to its
>> probability mass. In practice, I think you are right that it would be
>> possible for the data ordering to cause you to quickly enter a local
>> maximum that would be difficult (or impossible, given finite time and
>> machine precision) to ever exit from. One approach to mitigating this
>> problem would be to do a random sweep over the variables that you are
>> sampling. Another might be to use deterministic annealing (DA). Charles
>> Elkan has some great descriptions of how deterministic annealing works in
>> the context of EM for mixture models
>> (http://cseweb.ucsd.edu/~elkan/250Bwinter2011/mixturemodels.pdf). I tried
>> applying the same concepts to a Gibbs sampler in my dissertation work and
>> achieved some really promising results
>> (http://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=4529&context=etd).
>> The advantage of DA would be that it helps avoid all kinds of local
>> maxima, not just those caused by scan order.
>> I also did a quick search and came across these relevant publications:
>> Scan Order in Gibbs Sampling: Models in Which it Matters and Bounds on
>> How Much (https://arxiv.org/pdf/1606.03432.pdf)
>> Implementing Random Scan Gibbs Samplers
>> (https://link.springer.com/article/10.1007/BF02736129)
>> --dan
>> On Tue, Jul 25, 2017 at 9:49 PM, Eric Kang <erickangnz at gmail.com> wrote:
>>> Hi everyone,
>>> My apologies if this is an uninformed question, but in Gibbs sampling
>>> for LDA inference, aren’t the various counts of word-topic assignments
>>> updated word-by-word? Doesn’t this make it somewhat dependent on word
>>> ordering? For example, if word_1 is strongly associated with topic_1 and
>>> word_2 is strongly associated with topic_2, if I see a document {word_1,
>>> word_1, … (100 times), word_2, word_2, … (100 times), word_2}, then by the
>>> time I start seeing word_2, wouldn’t the algorithm be more inclined to
>>> think that it should be assigned to topic_1, compared to a scenario where I
>>> see the document {word_1, word_2, word_1, word_2, …}?
>>> Thank you,
>>> Eric
>>> _______________________________________________
>>> Topic-models mailing list
>>> Topic-models at lists.cs.princeton.edu
>>> https://lists.cs.princeton.edu/mailman/listinfo/topic-models
> --
> Thanks and Regards,
> Swapnil Hingmire