[Topic-models] sparse word vectors and LDA
michaelklachko at gmail.com
Fri May 27 19:48:44 EDT 2016
I'm new to topic modeling, and I'm currently exploring different ways to
construct word vectors.
One way is to use a topic modeling algorithm: run LDA on a large corpus of
text, and identify k topics. Then, build k-dimensional vectors for every
word, so that every position in a vector corresponds to a topic. If word X
belongs to topic Z, then the vector for X will have a "1" at position Z. At
the end, we will have sparse vectors of length k.
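As a minimal sketch of that construction (the word-to-topic mapping below is a made-up toy example; in practice it would come from an LDA run, e.g. by taking each word's most probable topic):

```python
# Hypothetical hard word -> topic assignment (k = 3 topics here);
# a real mapping would come from a trained LDA model.
word_topic = {"cat": 1, "dog": 1, "stock": 2, "bond": 2, "goal": 0}
k = 3

def one_hot_topic_vector(word, word_topic, k):
    """Length-k vector with a 1 at the word's topic position."""
    vec = [0] * k
    vec[word_topic[word]] = 1
    return vec

print(one_hot_topic_vector("cat", word_topic, k))  # [0, 1, 0]
```

Note that "cat" and "dog" come out with identical vectors here, which is exactly the limitation raised in question 4 below.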
I have a few questions:
1. Does this make sense?
2. Has it been tried?
3. Is LDA the best algorithm for this?
4. How could LDA be modified so that, instead of "1"s, the vector holds
real numbers representing the probabilities of the word belonging to each
topic in this document? (Again, I'm not sure whether this makes sense in
the context of LDA...) One motivation is to avoid identical vectors for
similar words, such as "cat" and "dog".
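One common way to get real-valued entries is Bayes' rule on LDA's topic-word distributions: a trained model gives p(word | topic), and with a topic prior (uniform here, purely for simplicity) p(topic | word) is proportional to p(word | topic). A toy sketch with made-up numbers (a real phi matrix would come from a trained model):

```python
# Hypothetical topic-word probabilities p(word | topic) for k = 2 topics
# over a 3-word vocabulary; each row sums to 1. A real matrix would come
# from a trained LDA model.
phi = [
    [0.7, 0.2, 0.1],   # topic 0
    [0.1, 0.3, 0.6],   # topic 1
]
vocab = {"cat": 0, "dog": 1, "stock": 2}

def topic_vector(word, phi, vocab):
    """p(topic | word), assuming a uniform prior over topics."""
    col = [row[vocab[word]] for row in phi]
    total = sum(col)
    return [p / total for p in col]

print(topic_vector("dog", phi, vocab))  # p(topic | "dog") = [0.4, 0.6]
```

With soft vectors like these, "cat" and "dog" can be similar without being identical, since each word spreads its mass over topics differently.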
5. How would such sparse vectors compare to vectors generated with ...
6. Is it possible to somehow ensure that related topics correspond to
nearby positions in the vector?
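On question 6: LDA's topic indices are arbitrary, so nothing stops you from permuting them after training. One simple post-hoc approach (a sketch, not a standard LDA feature) is to order topics greedily so that each position's neighbor is its most similar remaining topic, e.g. by cosine similarity of the topic-word distributions:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def greedy_topic_order(phi):
    """Greedy nearest-neighbor ordering of topics (rows of phi),
    so that adjacent vector positions hold similar topics."""
    order = [0]                          # start from topic 0 (arbitrary)
    remaining = set(range(1, len(phi)))
    while remaining:
        last = order[-1]
        nxt = max(remaining, key=lambda t: cosine(phi[last], phi[t]))
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Toy p(word | topic) matrix: topics 0 and 2 are similar, topic 1 is not.
phi = [
    [0.8, 0.1, 0.1],
    [0.1, 0.1, 0.8],
    [0.7, 0.2, 0.1],
]
print(greedy_topic_order(phi))  # [0, 2, 1]
```

Greedy ordering is a heuristic; for many topics, hierarchical clustering of the topic-word rows would likely give a more principled layout.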