[Topic-models] On author topic model

Wray Buntine wray.buntine at monash.edu
Sun Jul 23 21:05:49 EDT 2017


On 23 July 2017 at 02:37, Eric Kang <erickangnz at gmail.com> wrote:

> Thanks, Prof. Buntine. I'll read your paper and the literature it
> references.
>
> A quick question though: when you say avoid HCRP, why is that? Is that
> because of slow convergence, or something else?
>

Hi, yes, I should have expanded more.  The CRP model is very important: it
is how you do inference at the top level for a DP, a Pitman-Yor process and
a gamma process.  All related processes have a CRP-like formulation.

The HCRP, as far as I can tell, is very useful for understanding, etc., but
in implementation you should always collapse it.

But at the lower levels of a hierarchy a DP is more like a Dirichlet, and
you are better off collapsing the HCRP: it is much faster, uses less memory
(no dynamic memory), and converges better.  Collapsing means you have to
compute the Stirling numbers, which is easily done, and accurate
approximations exist for large N.
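
For concreteness, here is a rough sketch in Python of tabulating the (log)
generalized Stirling numbers with the standard recursion
S^a_{N,M} = S^a_{N-1,M-1} + (N-1 - M*a) * S^a_{N-1,M}.  This is only
illustrative (the function name and table layout are mine, not from any
particular package); setting the discount a = 0 gives the DP case.

import numpy as np

def log_stirling_table(n_max, a=0.0):
    """Return T with T[N, M] = log S^a_{N,M} for 0 <= M <= N <= n_max."""
    T = np.full((n_max + 1, n_max + 1), -np.inf)   # log 0
    T[0, 0] = 0.0                                  # S^a_{0,0} = 1
    for n in range(1, n_max + 1):
        for m in range(1, n + 1):
            coef = (n - 1) - m * a                 # weight on S^a_{N-1,M}
            horiz = np.log(coef) + T[n - 1, m] if coef > 0 else -np.inf
            T[n, m] = np.logaddexp(T[n - 1, m - 1], horiz)
    return T

# e.g. the log ratio S^a_{N,M+1} / S^a_{N,M}, which is the sort of quantity
# a collapsed Gibbs sampler needs when resampling table counts
tbl = log_stirling_table(200, a=0.5)
log_ratio = tbl[50, 11] - tbl[50, 10]

In practice you cache a table (or just the ratios) like this once, and switch
to an asymptotic approximation when N gets large.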

All this is well documented.

Wray


> Regards,
> Eric
>
>
> On Jul 20, 2017, at 8:41 PM, Wray Buntine <wray.buntine at monash.edu> wrote:
>
> Hi
>
> We did something like this in
>       Kar Wai Lim and Wray Buntine. Bibliographic analysis on research
> publications using authors, categorical labels and the citation network.
> Machine Learning, 103:185–213, 2016.
>
> We thought the ATM was a bit primitive, so we added the extra bit as
> "non-parametric ATM" in our experiments.  It's pretty simple to implement.
> Our own model does a lot more.  It is implemented using HDPs/HPYPs, which
> are pretty efficient when done right ... avoid HCRPs like the plague.
>
> Prof. Wray Buntine
> Course Director for Master of Data Science
> Monash University
> http://topicmodels.org
>
> On 21 July 2017 at 10:20, Eric Kang <erickangnz at gmail.com> wrote:
>
>> Hi everyone,
>>
>> I have a question about the author-topic model. Is my understanding
>> correct that the author-topic probabilities are "constant" across different
>> documents? So if the same author writes multiple documents, the implied
>> document-topic proportions would be the same between those documents?
>>
>> I thought perhaps another model might be to suppose that author-topic
>> probabilities are a multinomial random variable (with a Dirichlet prior)
>> that is sampled per document. In other words, each author is associated
>> with author-specific Dirichlet distribution over topics, and for a
>> particular document, a topic mixture is sampled from that Dirichlet
>> distribution. And the inference problem would be to determine the
>> topic-word probabilities, and the Dirichlet parameters for each author.
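>>
>> In pseudo-code, I'm imagining a generative story roughly like the
>> following (just a sketch to make the idea concrete; the names are
>> placeholders, not from any existing implementation, and I've assumed one
>> author per document for simplicity):
>>
>> import numpy as np
>>
>> rng = np.random.default_rng(0)
>>
>> def generate_corpus(author_alphas, phi, doc_authors, doc_lengths):
>>     """author_alphas: (A, K) per-author Dirichlet parameters over topics.
>>     phi: (K, V) topic-word probabilities.
>>     doc_authors: author index for each document.
>>     doc_lengths: number of tokens in each document."""
>>     docs = []
>>     for a, n_words in zip(doc_authors, doc_lengths):
>>         # per-document topic mixture drawn from the author's Dirichlet
>>         theta = rng.dirichlet(author_alphas[a])
>>         # a topic for each token, then a word from that topic
>>         z = rng.choice(len(theta), size=n_words, p=theta)
>>         w = [rng.choice(phi.shape[1], p=phi[k]) for k in z]
>>         docs.append(w)
>>     return docs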
>>
>> Does this make sense? Is there existing work of this kind in the
>> literature? Would this be interesting? Useful? Tractable?
>>
>> Any suggestions or guidance would be really appreciated.
>>
>> Thank you,
>> Eric
>> _______________________________________________
>> Topic-models mailing list
>> Topic-models at lists.cs.princeton.edu
>> https://lists.cs.princeton.edu/mailman/listinfo/topic-models
>>
>
>