# [Topic-models] Topic-models Digest, Vol 28, Issue 3, message 3

Veena T veenat2005 at gmail.com
Sun Nov 23 00:45:36 EST 2008

>Hi,

On Tue, Nov 4, 2008 at 10:28 AM,  <1980er at web.de> wrote:
>>3. Whenever we choose \theta from the Dirichlet(\alpha) distribution
>>we effectively get all but one (as N is assumed to be fixed)
>>parameters for the Multinomial distribution used later on ?

> Yes. And since they have to sum up to one, you have all parameters in
> fact.
> Not sure what you think it has to do with N (the number of documents). The
> dimensionality of \theta is k. Look at the picture at
> http://en.wikipedia.org/wiki/Image:Dirichlet_distributions.png

>Right, my bad. I meant K rather than N.
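
To make the "all but one parameter" point concrete, here is a minimal sketch in Python (standard library only). It draws \theta from a Dirichlet by normalizing independent Gamma draws and checks that the first K-1 entries determine the last one. The helper name `sample_dirichlet`, the value K = 3 and the alpha values are assumptions chosen for illustration; they come from neither the paper nor LDA-C.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw theta ~ Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(0)
alpha = [0.5, 0.5, 0.5]          # K = 3 topics; illustrative values only
theta = sample_dirichlet(alpha, rng)

# theta lies on the simplex, so K-1 entries fix the remaining one.
last_from_rest = 1.0 - sum(theta[:-1])
print("theta:", theta)
print("last entry recovered from the others:", last_from_rest)
```

The resulting \theta is exactly the parameter vector of the Multinomial from which each z_n is drawn later on.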

>>4. Why is p(z_n | \theta) given by \theta_i for the unique i such that
>>z_n^i = 1 ?
> Because \theta = { \theta_1, \theta_2, ..., \theta_k }, where each entry
> gives the probability for one topic. And for the topic i with z_n^i = 1 it
> is given by \theta_i.

>So that means there's no possibility of a word belonging to one topic
>more than the others?
>If a word belongs to a topic, that word belongs to only this topic?
>Can this assignment change the next time we notice the same word later
>in the document?
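I am not sure what you mean. Anyway, a word can belong to different
topics; nothing ties a word to a single topic. Each occurrence of a word in
a document has its own z_n drawn from Multinomial(\theta), so the same word
can be assigned to a different topic the next time it appears.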
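
A small Python sketch of this point, under assumed values: each token gets its own one-hot z_n drawn from \theta, and p(z_n | \theta) = prod_i \theta_i^{z_n^i} collapses to the single \theta_i whose z_n^i is 1. The token list, the \theta values and the helper names are hypothetical.

```python
import random

def sample_topic(theta, rng):
    """Draw z_n ~ Multinomial(theta) and return it as a one-hot list of length K."""
    i = rng.choices(range(len(theta)), weights=theta, k=1)[0]
    return [1 if j == i else 0 for j in range(len(theta))]

def prob_of_assignment(z, theta):
    """p(z_n | theta) = prod_i theta_i ** z_n^i, i.e. theta_i for the i with z_n^i = 1."""
    p = 1.0
    for z_i, theta_i in zip(z, theta):
        p *= theta_i ** z_i
    return p

rng = random.Random(1)
theta = [0.7, 0.2, 0.1]                               # illustrative topic proportions
tokens = ["bank", "river", "bank", "money", "bank"]   # hypothetical document

# Every occurrence is assigned independently, so repeated words may differ in topic.
assignments = [sample_topic(theta, rng) for _ in tokens]
for word, z in zip(tokens, assignments):
    print(word, z, prob_of_assignment(z, theta))
```

Since every occurrence is drawn separately, nothing forces the three occurrences of "bank" to share a topic.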

>>What steps do we make in order to make the LDA work correctly?
> I do not understand the question.

>The part you wrote below answered my question :)

>>Estimate parameters and then do inference, or the other way around? I
>>think this is missing in the paper.

> First parameter estimation, then inference. You need the parameters for
> inference.

>Please correct me if I am wrong:
>the estimation works by estimating \alpha and \beta,
>while inference gives me the values of z ?
Actually, during the expectation step the variational parameters gamma and
phi are estimated for each document. During the maximization step the model
parameters alpha and beta are estimated. In both steps, the criterion
used for estimation is to maximise the lower bound on the log likelihood.
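
As a rough illustration of the expectation step for one document, the Python sketch below alternates the two coordinate updates from the paper: phi_{ni} proportional to beta_{i,w_n} * exp(digamma(gamma_i)), and gamma_i = alpha_i + sum_n phi_{ni}. It is a sketch under assumptions, not LDA-C's code: it runs a fixed number of iterations instead of testing convergence of the bound, uses a hand-rolled `digamma` to stay within the standard library, and the toy alpha, beta and document are made up.

```python
import math

def digamma(x):
    """Digamma for x > 0: shift x upward by recurrence, then use an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # psi(x) ~ ln x - 1/(2x) - 1/(12 x^2) + 1/(120 x^4) - 1/(252 x^6)
    result += math.log(x) - 0.5 * inv - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result

def e_step(doc, alpha, beta, iters=50):
    """doc: list of word ids; beta[i][w] = p(word w | topic i). Returns (gamma, phi)."""
    k = len(alpha)
    n_words = len(doc)
    phi = [[1.0 / k] * k for _ in doc]            # phi_{ni} initialized to 1/k
    gamma = [a + n_words / k for a in alpha]      # gamma_i initialized to alpha_i + N/k
    for _ in range(iters):
        for n, w in enumerate(doc):
            weights = [beta[i][w] * math.exp(digamma(gamma[i])) for i in range(k)]
            norm = sum(weights)
            phi[n] = [x / norm for x in weights]  # normalize over topics
        gamma = [alpha[i] + sum(phi[n][i] for n in range(n_words)) for i in range(k)]
    return gamma, phi

alpha = [0.1, 0.1]                    # toy model: 2 topics, 2 vocabulary words
beta = [[0.9, 0.1], [0.1, 0.9]]
doc = [0, 0, 0, 1]
gamma, phi = e_step(doc, alpha, beta)
print("gamma:", gamma)
```

The returned phi values are what the maximization step accumulates over documents when it re-estimates beta.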

>I'm also curious why the original paper describes inference first,
>then estimation...
>Any hints?

>>7. Is the LDA-C a 1-1 implementation of what is published in the
>>paper? I was trying to read the code but for the first few passes over
>>the code I don't see any direct mapping to most of the equations
>>published in the paper.

> I do not know. But it had comparable results in a short experiment.

>Ok. I'll rephrase a bit more to get more details.

>To what part of the paper does lda_mle() function refer to?
The maximization step of the EM algorithm.
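
For orientation, the beta part of the maximization step in the paper is beta_{ij} proportional to sum_d sum_n phi_{dni} * [w_{dn} = j]. The Python sketch below is a hedged illustration of that formula only; it is not a transcription of lda_mle(), and the function name `m_step_beta` and the toy inputs are assumptions. It leaves out the alpha update, applies no smoothing, and assumes every topic receives some mass.

```python
def m_step_beta(docs, phis, k, vocab_size):
    """Re-estimate beta from per-document phi values (expected topic-word counts)."""
    counts = [[0.0] * vocab_size for _ in range(k)]
    for doc, phi in zip(docs, phis):
        for n, w in enumerate(doc):
            for i in range(k):
                counts[i][w] += phi[n][i]    # accumulate sufficient statistics
    beta = []
    for row in counts:
        total = sum(row)                     # assumed non-zero for every topic
        beta.append([c / total for c in row])
    return beta

# Toy input: two documents over a 2-word vocabulary, 2 topics.
docs = [[0, 1], [1]]
phis = [[[1.0, 0.0], [0.5, 0.5]], [[0.0, 1.0]]]
beta = m_step_beta(docs, phis, k=2, vocab_size=2)
print(beta)
```

Each row of the result is a distribution over the vocabulary for one topic.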

>best,
>Mateusz Berezecki

regards
veena

--
Veena Srinivas,
PhD scholar,
Speech and Vision Lab