[Topic-models] A new paper that combines word embedding and LDA

Shaohua Li shaohua at gmail.com
Sun Jun 12 07:02:41 EDT 2016


Our ACL 2016 paper "Generative Topic Embedding: a Continuous Representation
of Documents" has just been posted on arXiv. I thought this might be of
some interest to this community :)
https://arxiv.org/abs/1606.02979

Our work combines word embedding with LDA. In the resulting model, TopicVec,
the topic-word distribution is a softmax function rather than a multinomial
as in LDA. TopicVec can derive coherent topics from a small set of documents,
even from a single document. Together, the topic proportions and topic
embeddings represent a document.
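
To make the softmax idea concrete, here is a minimal sketch in plain Python.
It is not taken from the TopicVec code: the function names, the 2-d toy
embeddings and the "sports" topic vector are all made up for illustration.
It only shows how a topic embedding can induce a distribution over words
through dot products with word embeddings, where LDA would store a separate
multinomial parameter for every (topic, word) pair.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def topic_word_distribution(topic_vec, word_vecs):
    """P(word | topic) as a softmax over dot products (illustrative only)."""
    words = list(word_vecs)
    scores = [sum(t * v for t, v in zip(topic_vec, word_vecs[w])) for w in words]
    return dict(zip(words, softmax(scores)))

# Hypothetical 2-d embeddings; real embeddings are learned and much larger.
word_vecs = {
    "goal":   [1.0, 0.1],
    "match":  [0.9, 0.2],
    "stock":  [0.1, 1.0],
    "market": [0.2, 0.9],
}
sports_topic = [2.0, 0.0]  # hypothetical topic embedding

dist = topic_word_distribution(sports_topic, word_vecs)
for word, p in sorted(dist.items(), key=lambda kv: -kv[1]):
    print(f"{word:8s} {p:.3f}")
```

Words whose embeddings point in the same direction as the topic embedding
receive most of the probability mass, so a topic can be estimated from very
little text as long as the word embeddings are already trained.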

The Python implementation of TopicVec is available at:
https://github.com/askerlee/topicvec
(A user manual will be added soon.)

Abstract:

Word embedding maps words into a low-dimensional continuous embedding space
by exploiting the local word collocation patterns in a small context
window. On the other hand, topic modeling maps documents onto a
low-dimensional topic space, by utilizing the global word collocation
patterns in the same document. These two types of patterns are
complementary. In this paper, we propose a generative topic embedding model
to combine the two types of patterns. In our model, topics are represented
by embedding vectors, and are shared across documents. The probability of
each word is influenced by both its local context and its topic. A
variational inference method yields the topic embeddings as well as the
topic mixing proportions for each document. Jointly they represent the
document in a low-dimensional continuous space. In two document
classification tasks, our method performs better than eight existing
methods, with fewer features. In addition, we illustrate with an example
that our method can generate coherent topics even based on only one
document.
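
As a rough illustration of the sentence "the probability of each word is
influenced by both its local context and its topic", the sketch below adds a
topic embedding to the sum of the context word embeddings and applies a
softmax over the vocabulary. This is a simplified assumption and not the
exact link function of the paper. The helper names and the toy vectors are
hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def word_probs(context_words, topic_vec, word_vecs):
    """P(word | context, topic) under a simplified additive assumption."""
    # Combine the topic embedding with the summed context embeddings.
    combined = list(topic_vec)
    for w in context_words:
        for i, x in enumerate(word_vecs[w]):
            combined[i] += x
    # Softmax over the dot products with every word in the vocabulary.
    scores = {w: dot(v, combined) for w, v in word_vecs.items()}
    m = max(scores.values())
    exps = {w: math.exp(s - m) for w, s in scores.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}

# Hypothetical toy embeddings and topics.
word_vecs = {
    "bank":  [0.6, 0.6],
    "river": [1.0, 0.0],
    "loan":  [0.0, 1.0],
}
finance_topic = [0.0, 1.5]
nature_topic = [1.5, 0.0]

p_fin = word_probs(["bank"], finance_topic, word_vecs)
p_nat = word_probs(["bank"], nature_topic, word_vecs)
print("finance topic:", p_fin)
print("nature topic: ", p_nat)
```

With the same context word "bank", the finance topic favours "loan" and the
nature topic favours "river", which is the kind of interaction between local
context and topic that the abstract describes.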