# [Topic-models] Topic-models Digest, Vol 28, Issue 2

1980er at web.de 1980er at web.de
Tue Nov 4 04:28:49 EST 2008

Hi Mateusz,
>I am an undergrad student and a newcomer to the LDA and this list as
>well. I'd like to ask a lot of questions which I guess are of a very
>basic nature to you, guys, but to me most of this stuff is really
>unknown so I am tackling multiple new things/techniques/ideas all at
>once.
Same here.

>After reading LDA paper (Blei et al 2003) a couple of times my grasp
>of the concepts is somewhat better but I still need some
>clarifications.
I read it a couple of times too; some of it I only figured out since then.

>1. \alpha parameters are not known and are used only conceptually in
>the generative process. They are to be estimated ?
Depends on the implementation. You could estimate \alpha or fix it in advance
(\theta, in contrast, is sampled anew from Dirichlet(\alpha), but there is only
one per document).

>2. \beta parameters (word probabilities, i.e. p(w^j = 1 | z^i = 1) are
>not known in advance and are to be estimated ? \beta is in fact a
>matrix of conditional probabilities?
Yes, yes. The matrix entry \beta_{i,j} is defined as P(w^j = 1 | z^i = 1).
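As a toy illustration (the vocabulary size and the numbers are made up), \beta can be stored as a k x V matrix whose rows are the per-topic word distributions:

```python
# Toy beta matrix: k = 2 topics over a V = 3 word vocabulary.
# Row i is the distribution P(w | z^i = 1); the values are invented.
beta = [
    [0.7, 0.2, 0.1],  # topic 0
    [0.1, 0.3, 0.6],  # topic 1
]

# Each row must be a valid probability distribution (sums to one).
for row in beta:
    assert abs(sum(row) - 1.0) < 1e-9

# beta[i][j] is P(word j | topic i), e.g. P(w^1 = 1 | z^0 = 1):
print(beta[0][1])  # → 0.2
```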

>3. Whenever we choose \theta from the Dirichlet(\alpha) distribution
>we effectively get all but one (as N is assumed to be fixed)
>parameters for the Multinomial distribution used later on ?
Yes. And since they have to sum up to one, you have all parameters in fact.
Not sure what it has to do with N (the number of words in a document, in the
paper's notation); the dimensionality of \theta is k, the number of topics.
Look at the picture at
http://en.wikipedia.org/wiki/Image:Dirichlet_distributions.png
The height of the surface is proportional to the probability of sampling that
point when you draw \theta, and the position within the triangle encodes the
probability of each of the (here three) topics, represented by the corners.
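A minimal sketch of drawing \theta, using the standard trick that normalized Gamma draws give a Dirichlet sample (the value of \alpha and k = 3 are made-up for illustration):

```python
import random

def sample_dirichlet(alpha):
    """Draw theta ~ Dirichlet(alpha) by normalizing Gamma samples."""
    gammas = [random.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]

# k = 3 topics, symmetric alpha = 0.5 (illustrative values).
theta = sample_dirichlet([0.5, 0.5, 0.5])
print(theta)       # a point on the 2-simplex: three topic probabilities
print(sum(theta))  # ~1.0: the components always sum to one
```

Each draw lands somewhere on the simplex from the Wikipedia picture above; small \alpha concentrates the mass near the corners (documents dominated by one topic).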

>4. Why is p(z_n | \theta) given by \theta_i for the unique i such that
>z_n^i = 1 ?
Because \theta = (\theta_1, \theta_2, ..., \theta_k), where each entry gives
the probability of one topic. If the n-th word was generated by topic i (i.e.
z_n^i = 1), then p(z_n | \theta) = \theta_i.
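With z_n written as a one-hot indicator vector, this is just selecting one component of \theta (the numbers below are made up):

```python
theta = [0.2, 0.5, 0.3]  # topic proportions for one document (k = 3)
z_n = [0, 1, 0]          # one-hot: z_n^2 = 1, i.e. word n came from topic 2

# p(z_n | theta) picks out theta_i for the unique i with z_n^i = 1,
# which is exactly the dot product of the indicator with theta.
p = sum(zi * ti for zi, ti in zip(z_n, theta))
print(p)  # → 0.5
```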

>5. What is the definition of a variational parameter? What are they?
>(I'm completely new to variational methods)
I don't know.

>6. What is the high level overview of the LDA algorithm?
It is given in the paper by Blei in the beginning of chapter three and in the
figure at http://en.wikipedia.org/wiki/Plate_notation
I am currently writing an overview in my thesis. Maybe this helps you:

Being a generative probabilistic model, the basic assumption made in LDA is
that documents are generated by random processes, each of which represents a
different topic z. A random process generates the words in a document by
sampling them from its own topic-specific discrete probability distribution
over the words, P(w | z). A document can thus be created by one or more
topics, each with its own distinct distribution over words.
To represent the mixture of topics in a document, a multinomial distribution
\theta is used. For each word in the document, the generating topic is
selected by sampling from \theta.
The topic mixture \theta itself is drawn from a Dirichlet distribution once
for every document in the corpus. The Dirichlet represents our prior belief
about the topic mixtures that occur in the corpus, i.e. whether the documents
are generated by single topics or by mixtures of several topics, and which
topics prevail.
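The generative story above can be sketched end to end; all numbers here (k, the vocabulary, \alpha, \beta) are invented purely for illustration:

```python
import random

def sample_dirichlet(alpha):
    # Normalized Gamma draws are a Dirichlet sample.
    g = [random.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

def sample_categorical(probs):
    # Draw an index according to the given probabilities.
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

def generate_document(alpha, beta, vocab, n_words):
    # 1. Draw the document's topic mixture theta once.
    theta = sample_dirichlet(alpha)
    words = []
    for _ in range(n_words):
        # 2. Pick a topic for this word by sampling from theta ...
        z = sample_categorical(theta)
        # 3. ... then pick the word from that topic's distribution.
        words.append(vocab[sample_categorical(beta[z])])
    return words

# Invented toy model: k = 2 topics, V = 4 words.
alpha = [0.5, 0.5]
beta = [[0.6, 0.3, 0.05, 0.05],   # topic 0 favours "gene", "dna"
        [0.05, 0.05, 0.3, 0.6]]   # topic 1 favours "market", "price"
vocab = ["gene", "dna", "market", "price"]
print(generate_document(alpha, beta, vocab, 8))
```

Note that \theta is drawn once per document (step 1), while a topic z is drawn once per word (step 2), which matches the plate diagram in the paper.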

>What steps do we make in order to make the LDA work correctly?
I do not understand the question.

>Estimate parameters and then do inference, or the other way around? I
>think this is missing in the paper.
First parameter estimation, then inference. You need the parameters for
inference.

>7. Is the LDA-C a 1-1 implementation of what is published in the
>paper? I was trying to read the code but for the first few passes over
>the code I don't see any direct mapping to most of the equations
>published in the paper.
I do not know, but it gave comparable results in a short experiment.

>I'd appreciate even the most concise answers (but not too much as this
>can result in another question to the answer ;) ).

best regards,
Mateusz Berezecki