[Topic-models] Does word-ordering matter in Gibbs sampling?

Swapnil Hingmire swapnilhingmire at gmail.com
Thu Jul 27 02:19:10 EDT 2017


Hi Dan,

I would like to know how a random-scan Gibbs sampler can be used in LDA inference.
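
For concreteness, here is a rough sketch of the kind of per-iteration sweep I
have in mind: the usual collapsed Gibbs update for LDA, but visiting the
tokens in a random order rather than sweeping each document left to right.
The array names, hyperparameters, and helper below are my own placeholders
for the standard collapsed-sampler bookkeeping, and the code is an untested
sketch:

import numpy as np

def random_scan_sweep(z, docs, n_dk, n_kw, n_k, alpha, beta, rng):
    """One random-scan sweep over every token in the corpus."""
    K, V = n_kw.shape
    # Enumerate all (document, position) pairs and visit them in a
    # random order instead of document by document, left to right.
    positions = [(d, i) for d in range(len(docs)) for i in range(len(docs[d]))]
    for idx in rng.permutation(len(positions)):
        d, i = positions[idx]
        w, k_old = docs[d][i], z[d][i]
        # Remove the token's current assignment from the counts.
        n_dk[d, k_old] -= 1
        n_kw[k_old, w] -= 1
        n_k[k_old] -= 1
        # Collapsed full conditional for this token's topic.
        p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
        p /= p.sum()
        k_new = rng.choice(K, p=p)
        # Record the new assignment and restore the counts.
        z[d][i] = k_new
        n_dk[d, k_new] += 1
        n_kw[k_new, w] += 1
        n_k[k_new] += 1

Here rng would be something like numpy.random.default_rng(0), docs[d] a list
of word ids, and z[d] the current topic assignments for document d.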

On Wed, Jul 26, 2017 at 10:53 PM, dan <danwalkeriv at gmail.com> wrote:

> In theory it shouldn't matter: a Gibbs sampler with infinite time and
> machine precision would eventually mix well (converge in distribution), and
> you would sample from every region of the support in proportion to its
> probability mass. In practice, I think you are right that it would be
> possible for the data ordering to cause you to quickly enter a local
> maximum that would be difficult (or impossible, given finite time and
> machine precision) to ever escape. One approach to mitigating this problem
> would be to do a random sweep over the variables that you are sampling.
> Another might be to use deterministic annealing. Charles Elkan has some
> great descriptions of how deterministic annealing works in the context of
> EM for mixture models
> (http://cseweb.ucsd.edu/~elkan/250Bwinter2011/mixturemodels.pdf). I tried
> applying the same concepts to a Gibbs sampler in my dissertation work and
> achieved some really promising results
> (http://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=4529&context=etd).
> The advantage of DA would be that it helps avoid all kinds of local maxima,
> not just those caused by scan order.
>
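> To give the flavor of the tempered version (a generic sketch of the idea,
> not the exact scheme from the dissertation): each full conditional is
> raised to the power 1/T, so a high temperature flattens the distribution
> early on, and T is annealed down toward 1, where the ordinary Gibbs step
> is recovered. Something like:
>
> import numpy as np
>
> def tempered_step(p_unnorm, T, rng):
>     """Sample from a full conditional raised to the power 1/T.
>     T > 1 flattens the conditional (easier to move between modes);
>     T = 1 is the ordinary Gibbs update."""
>     p = np.asarray(p_unnorm, dtype=float) ** (1.0 / T)
>     p /= p.sum()
>     return rng.choice(len(p), p=p)
>
> # Example schedule: start at some T0 > 1 and decay toward 1 during
> # burn-in, e.g. T_t = 1 + (T0 - 1) * 0.9 ** t.
>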
> I also did a quick search and came across these relevant publications:
> Scan Order in Gibbs Sampling: Models in Which it Matters and Bounds on How
> Much (https://arxiv.org/pdf/1606.03432.pdf)
> Implementing Random Scan Gibbs Samplers
> (https://link.springer.com/article/10.1007/BF02736129)
>
> --dan
>
> On Tue, Jul 25, 2017 at 9:49 PM, Eric Kang <erickangnz at gmail.com> wrote:
>
>> Hi everyone,
>>
>> My apologies if this is an uninformed question, but in Gibbs sampling for
>> LDA inference, aren’t the various counts of word-topic assignments updated
>> word-by-word? Doesn’t this make it somewhat dependent on word ordering? For
>> example, suppose word_1 is strongly associated with topic_1 and word_2 is
>> strongly associated with topic_2. If I see the document {word_1, word_1, …
>> (100 times), word_2, word_2, … (100 times)}, then by the time I start
>> seeing word_2, wouldn’t the algorithm be more inclined to assign it to
>> topic_1 than it would be for the document {word_1, word_2, word_1,
>> word_2, …}?
>>
>> Thank you,
>> Eric
>> _______________________________________________
>> Topic-models mailing list
>> Topic-models at lists.cs.princeton.edu
>> https://lists.cs.princeton.edu/mailman/listinfo/topic-models
>>
>


-- 


Thanks and Regards,
Swapnil Hingmire