# [Topic-models] LDA beginner's questions

Wed Nov 5 13:09:57 EST 2008

>> Hi,
>>
>> Thanks a lot for replying.
>> Some comments and questions inline...
>
>>>>4. Why is p(z_n | \theta) given by \theta_i for the unique i such that
>>>>z_n^i = 1 ?
>>> Because \theta = { \theta_1, \theta_2, ..., \theta_k }, where each entry
>>> gives
>>> the probability of one topic. For the topic i with z_n^i = 1 it is given by
>>> \theta_i.
>>
>> So that means there's no possibility of a word belonging to one topic
>> more than the others?
>> If a word belongs to a topic, does that word belong to only this topic?
>> Can this assignment change the next time we notice the same word later
>> in the document?
>>
>Yes there is. P(w_i|z_p = 1) is in general different from P(w_i|z_q = 1).
>In English: the word "belongs the most" to the topic that gives it the
>highest probability of occurring. This probability differs across topics
>p, q, ...

See my mail for the question.
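
To make the two points above concrete, here is a minimal sketch with made-up numbers (the 3-topic, 4-word model and the function names are assumptions for illustration only). It shows that p(z_n | \theta) picks out \theta_i for the one-hot z_n, and that the topic a given word most likely came from depends on both \theta and p(w | z), so each occurrence of a word gets its own z_n.

```python
import numpy as np

# Hypothetical toy model: 3 topics, vocabulary of 4 words.
theta = np.array([0.7, 0.2, 0.1])             # per-document topic proportions
beta = np.array([[0.50, 0.30, 0.10, 0.10],    # beta[k, w] = p(w | z^k = 1)
                 [0.10, 0.10, 0.40, 0.40],
                 [0.25, 0.25, 0.25, 0.25]])

def p_z_given_theta(z_onehot, theta):
    """p(z_n | theta) = prod_i theta_i ** z_n^i.

    Because z_n is one-hot, the product reduces to theta_i for the
    single i with z_n^i = 1.
    """
    return float(np.prod(theta ** z_onehot))

def topic_posterior(word, theta, beta):
    """p(z_n | w_n, theta, beta), proportional to theta_k * beta[k, w_n]."""
    unnorm = theta * beta[:, word]
    return unnorm / unnorm.sum()

z = np.array([0, 1, 0])                       # word n assigned to topic 2
print(p_z_given_theta(z, theta))              # equals theta[1]
print(topic_posterior(2, theta, beta))        # soft "membership" of word 2
```

Every word token has exactly one z_n, but the distribution over that z_n is soft, and two occurrences of the same word in a document can be assigned to different topics.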

>
>
>>>>What steps do we make in order to make the LDA work correctly?
>>> I do not understand the question.
>>
>> The part you wrote below answered my question :)
>>
>>>>Estimate parameters and then do inference, or the other way around? I
>>>>think this is missing in the paper.
>>
>>> First parameter estimation, then inference. You need the parameters for
>>> inference.
>>
>> Please correct me if I am wrong:
>> the estimation works by estimating \alpha and \beta,
>> while inference gives me the values of z ?
>>
>Yes. And estimation with GibbsLDA (http://gibbslda.sourceforge.net/) gives
>me all the other variables too. I think inference computes \theta too.
>BTW you cannot infer without estimated parameters.
>
>> I'm also curious why the original paper describes inference first,
>> then estimation...
>> Any hints?
>>
>No. I found it confusing too. Does anyone know why?

Graphical models pose three basic problems: inference, parameter learning,
and structure learning, in order of increasing difficulty. Structure
learning is the hardest, so most work on this topic simply specifies the
graphical model a priori and leaves the other two problems to be solved.

>
>>>>7. Is the LDA-C a 1-1 implementation of what is published in the
>>>>paper? I was trying to read the code but for the first few passes over
>>>>the code I don't see any direct mapping to most of the equations
>>>>published in the paper.
>>
>>> I do not know. But it had comparable results in a short experiment.
>>
>> Ok. I'll rephrase a bit more to get more details.
>>
>> To what part of the paper does the lda_mle() function refer?
>>
>I do not know the source. I didn't do much with lda-c.

lda_mle() corresponds to the M-step for \alpha and \beta. Because it uses
a single constant \alpha for all k topics, the code works from sufficient
statistics.

LDA-C is exactly the same as in the LDA paper, except that it uses a
constant \alpha and does not use the smoothing scheme for \beta.
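
For the \beta part of the M-step without smoothing, a rough sketch of a sufficient-statistics update could look like the following. The names `class_word`, `class_total` and the `floor` value are assumptions for illustration; check the LDA-C source for what lda_mle() actually does. The \alpha update is left out here.

```python
import numpy as np

def m_step_log_beta(class_word, floor=-100.0):
    """Unsmoothed M-step for beta from expected counts.

    class_word[k, w] is the expected number of times word w was assigned
    to topic k, i.e. the sum of phi over all documents and positions.
    Returns log beta[k, w] = log class_word[k, w] - log class_total[k].
    Zero counts have no finite log, so they get `floor` (an assumed
    stand-in value) because no smoothing is applied.
    """
    class_word = np.asarray(class_word, dtype=float)
    class_total = class_word.sum(axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        log_beta = np.log(class_word) - np.log(class_total)
    log_beta[~np.isfinite(log_beta)] = floor
    return log_beta

# Hypothetical sufficient statistics: 2 topics, 3 vocabulary words.
suffstats = [[3.0, 1.0, 0.0],
             [0.0, 2.0, 2.0]]
log_beta = m_step_log_beta(suffstats)
print(np.exp(log_beta))
```

The E-step only has to accumulate the per-topic word counts. The M-step is then just a row-wise normalization.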