[Topic-models] sparse word vectors and LDA

Kowalski, Radoslaw radoslaw.kowalski.14 at ucl.ac.uk
Fri May 27 21:54:41 EDT 2016


Hi Michael,


Use the lda2vec library for the Python programming language. It does what you want. My personal recommendation is to run lda2vec on a Linux system.


All the best,

Radoslaw



Radoslaw Kowalski

PhD Student

______________________________

Consumer Data Research Centre

UCL Department of Political Science

______________________________

T:  020 3108 1098 x51098

E:  radoslaw.kowalski.14 at ucl.ac.uk

W: www.cdrc.ac.uk
Twitter: @CDRC_UK
________________________________


From: topic-models-bounces at lists.cs.princeton.edu <topic-models-bounces at lists.cs.princeton.edu> on behalf of Michael Klachko <michaelklachko at gmail.com>
Sent: 28 May 2016 00:48:44
To: topic-models at lists.cs.princeton.edu
Subject: [Topic-models] sparse word vectors and LDA

Hello,

I'm new to topic modeling, and I'm currently exploring different ways to construct word vectors.

One way is to use a topic modeling algorithm: run LDA on a large corpus of text and identify k topics. Then build a k-dimensional vector for every word, so that every position in the vector corresponds to a topic. If word X belongs to topic Z, then the vector for X will have a "1" at position Z. In the end, we will have sparse vectors of length k.
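A minimal sketch of the construction described above, in plain Python. It assumes you already have a fitted LDA model's topic-word matrix as nested lists, where `topic_word[z][w]` is p(w | z); the function name `one_hot_topic_vectors` and the toy numbers are hypothetical. "Belongs to" is interpreted here as "the topic under which the word is most probable", which matches argmax p(z | w) only if all topics are equally likely a priori.

```python
def one_hot_topic_vectors(topic_word, vocab):
    """Return {word: k-dim 0/1 vector} with a single 1 at the word's topic.

    topic_word[z][w] is p(w | z) from a fitted LDA model (k rows, V columns);
    vocab[w] is the word for column w. A word "belongs to" the topic under
    which it is most probable (equal to argmax p(z | w) under a uniform prior).
    """
    k = len(topic_word)
    vectors = {}
    for w, word in enumerate(vocab):
        best = max(range(k), key=lambda z: topic_word[z][w])
        vec = [0] * k
        vec[best] = 1
        vectors[word] = vec
    return vectors

# Hypothetical toy model: 2 topics over a 4-word vocabulary (rows sum to 1).
vocab = ["cat", "dog", "stock", "market"]
topic_word = [
    [0.50, 0.40, 0.05, 0.05],  # an "animals" topic
    [0.05, 0.05, 0.60, 0.30],  # a "finance" topic
]

vectors = one_hot_topic_vectors(topic_word, vocab)
print(vectors["cat"], vectors["dog"])  # [1, 0] [1, 0]
```

With this toy matrix, "cat" and "dog" both map to [1, 0], which is exactly the collision raised in question 4.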

I have a few questions:

1. Does this make sense?
2. Has it been tried?
3. Is LDA the best algorithm for this?
4. How could I modify LDA so that, instead of "1"s, the vector would hold real numbers representing the probabilities of the word belonging to each topic in this document? (Again, I'm not sure if this makes sense in the context of LDA.) One reason for this is to avoid having identical vectors for similar words, such as "cat" and "dog".
5. How would such sparse vectors compare to vectors generated with word2vec?
6. Is it possible to make sure that related topics correspond to nearby positions in the vector?
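For questions 4 and 6, one hedged option (a sketch, not a recipe from any particular library): turn the same topic-word matrix into real-valued vectors with Bayes' rule, p(z | w) ∝ p(w | z) p(z), so each word gets a probability distribution over topics instead of a single 1. The topic prior p(z) defaults to uniform here, which is an assumption; passing a single document's topic proportions θ_d as `topic_prior` instead gives p(z | w, d) ∝ θ_{d,z} p(w | z) for that document. `greedy_topic_order` is a hypothetical heuristic for question 6: it chains topics by cosine similarity of their word distributions so similar topics land at adjacent positions, but it is not guaranteed to find the best ordering. All function names and toy numbers are illustrative.

```python
import math

def topic_posteriors(topic_word, vocab, topic_prior=None):
    """Return {word: [p(z | w) for each topic z]} via Bayes' rule.

    topic_word[z][w] is p(w | z); topic_prior[z] is p(z). A uniform prior is
    assumed when topic_prior is None. Passing one document's topic proportions
    instead gives p(z | w, d) for that document.
    """
    k = len(topic_word)
    prior = topic_prior if topic_prior is not None else [1.0 / k] * k
    vectors = {}
    for w, word in enumerate(vocab):
        joint = [topic_word[z][w] * prior[z] for z in range(k)]
        total = sum(joint)  # > 0 for a smoothed LDA model
        vectors[word] = [j / total for j in joint]
    return vectors

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def greedy_topic_order(topic_word):
    """Heuristic ordering: start at topic 0, then repeatedly append the
    unvisited topic whose word distribution is most similar to the last one."""
    order = [0]
    remaining = set(range(1, len(topic_word)))
    while remaining:
        last = topic_word[order[-1]]
        nxt = max(remaining, key=lambda z: cosine(last, topic_word[z]))
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Hypothetical toy model: 3 topics over a 4-word vocabulary (rows sum to 1).
vocab = ["cat", "dog", "leash", "stock"]
topic_word = [
    [0.60, 0.20, 0.10, 0.10],  # a "cats" topic
    [0.20, 0.50, 0.20, 0.10],  # a "dogs" topic
    [0.05, 0.05, 0.10, 0.80],  # a "finance" topic
]

vectors = topic_posteriors(topic_word, vocab)
order = greedy_topic_order(topic_word)
# Permute every word vector so that positions follow the topic ordering.
reordered = {word: [vec[z] for z in order] for word, vec in vectors.items()}
```

Here "cat" and "dog" get different vectors, and cosine(cat, dog) comes out larger than cosine(cat, stock), so related words stay close without becoming identical. Note that these vectors are dense (every entry is nonzero for a smoothed model); you could zero out small entries if you need sparsity.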

Thanks!


