# [Topic-models] Topic-models Digest, Vol 28, Issue 2

Mateusz Berezecki mateuszb at gmail.com
Tue Nov 4 04:54:03 EST 2008

Hi,

On Tue, Nov 4, 2008 at 10:28 AM,  <1980er at web.de> wrote:
>>3. Whenever we choose \theta from the Dirichlet(\alpha) distribution
>>we effectively get all but one (as N is assumed to be fixed)
>>parameters for the Multinomial distribution used later on ?

> Yes. And since they have to sum up to one, you have all parameters in fact.
> Not sure what you think it has to do with N (the number of documents). The
> dimensionality of \theta is k. Look at the picture at
> http://en.wikipedia.org/wiki/Image:Dirichlet_distributions.png

Right, my bad. I meant K rather than N.
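
To make the "all but one" point concrete for myself, here is a minimal
sketch, assuming a hypothetical K = 3 and a made-up \alpha (neither is
from the paper or from LDA-C). It draws \theta from Dirichlet(\alpha) by
normalizing independent Gamma draws, then checks that the last component
is fixed by the other K - 1.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw theta ~ Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


rng = random.Random(0)          # fixed seed so the run is repeatable
alpha = [2.0, 2.0, 2.0]         # hypothetical Dirichlet parameter, K = 3 topics
theta = sample_dirichlet(alpha, rng)

# theta lies on the simplex, so K - 1 entries determine the remaining one.
last_from_rest = 1.0 - sum(theta[:-1])

print("theta          =", theta)
print("sum(theta)     =", sum(theta))
print("last from rest =", last_from_rest)
```

So \theta has K entries but only K - 1 degrees of freedom.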

>>4. Why is p(z_n | \theta) given by \theta_i for the unique i such that
>>z_n^i = 1 ?
> Because \theta = { \theta_1, \theta_2, ..., \theta_k }, where each entry gives
> the probability for one topic. And for topic z_n^i = 1 it is given by
> \theta_i.
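
As I read the quoted answer, with z_n written as a one-hot vector the
probability is p(z_n | \theta) = \prod_i \theta_i^{z_n^i}, which collapses
to the single \theta_i whose z_n^i = 1. A small sketch of that, with a
made-up \theta for K = 3 (the function name and the numbers are mine, not
from the paper or LDA-C):

```python
def p_z_given_theta(z_onehot, theta):
    """Return p(z_n | theta) = prod_i theta_i ** z_n^i for a one-hot z_n."""
    # z_n must select exactly one topic.
    assert all(v in (0, 1) for v in z_onehot) and sum(z_onehot) == 1
    p = 1.0
    for z_i, theta_i in zip(z_onehot, theta):
        p *= theta_i ** z_i     # factors with z_i == 0 contribute 1
    return p


theta = [0.2, 0.5, 0.3]         # hypothetical topic proportions for one document
z_n = [0, 1, 0]                 # word n is assigned to the second topic

print(p_z_given_theta(z_n, theta))
```

Every factor with z_n^i = 0 equals 1, so only the selected \theta_i is left.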

So that means there's no possibility of a word belonging to one topic
more than the others?
If a word belongs to a topic, does that word belong only to this topic?
Can this assignment change the next time we see the same word later
in the document?

>>What steps do we make in order to make the LDA work correctly?
> I do not understand the question.

The part you wrote below answered my question :)

>>Estimate parameters and then do inference, or the other way around? I
>>think this is missing in the paper.

> First parameter estimation, then inference. You need the parameters for
> inference.

Please correct me if I am wrong:
estimation means estimating \alpha and \beta,
while inference gives me the values of z?

I'm also curious why the original paper describes inference first,
then estimation...
Any hints?

>>7. Is the LDA-C a 1-1 implementation of what is published in the
>>paper? I was trying to read the code but for the first few passes over
>>the code I don't see any direct mapping to most of the equations
>>published in the paper.

> I do not know. But it had comparable results in a short experiment.

OK, let me rephrase to ask for more detail.

To what part of the paper does the lda_mle() function refer?

best,
Mateusz Berezecki