[Topic-models] Training Classifier with multi-labeled data (Daniel Ramage)
pewster at gmail.com
Fri Mar 26 14:22:38 EDT 2010
We did something similar to Shuang Hong's paper, but with an emphasis on the
application to image annotation. The idea is that you treat an image as a
document and treat caption words as class labels. Since each image is
annotated with multiple caption words (each document has multiple class
labels), we extend the original sLDA model to account for a multivariate
Bernoulli response variable over caption words. Instead of a 1-dimensional
response variable, we have a T-dimensional multivariate Bernoulli response
variable, where T is the number of terms in the caption vocabulary. Our
variational inference algorithm seems quite similar to Shuang Hong's
paper, since we also use the convex dual representation of the logistic sigmoid
function to make the computation tractable. Please check out our ICASSP 2010
paper attached to this email.
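As a rough illustration of the response model described above, here is a minimal NumPy sketch. It assumes (our assumption, not taken from the attached paper) that each caption term t has its own weight vector eta[t] acting on the document's mean topic assignment zbar, and that the "convex dual representation" refers to the Jaakkola and Jordan quadratic lower bound on log sigmoid(x). All function and variable names are hypothetical.

```python
import numpy as np

def log_sigmoid(x):
    # Numerically stable log(1 / (1 + exp(-x))).
    return -np.logaddexp(0.0, -x)

def caption_log_lik(y, zbar, eta):
    """Log-likelihood of a T-dim binary caption vector y.

    Hypothetical response model: p(y_t = 1 | zbar) = sigmoid(eta[t] @ zbar),
    independently for each caption term t.
    y: (T,) in {0, 1}; zbar: (K,) mean topic assignment; eta: (T, K) weights.
    """
    x = eta @ zbar
    # sum_t [ y_t log s(x_t) + (1 - y_t) log s(-x_t) ]
    return float(np.sum(y * log_sigmoid(x) + (1 - y) * log_sigmoid(-x)))

def jj_lambda(xi):
    # lambda(xi) = tanh(xi / 2) / (4 xi), with limit 1/8 as xi -> 0.
    xi = np.asarray(xi, dtype=float)
    small = np.abs(xi) < 1e-8
    safe = np.where(small, 1.0, xi)  # avoid division by zero in the unused branch
    return np.where(small, 0.125, np.tanh(safe / 2.0) / (4.0 * safe))

def jj_lower_bound(x, xi):
    """Jaakkola-Jordan bound:
    log s(x) >= log s(xi) + (x - xi) / 2 - lambda(xi) * (x^2 - xi^2).
    The right-hand side is quadratic in x and is tight at x = +xi and x = -xi.
    """
    return log_sigmoid(xi) + (x - xi) / 2.0 - jj_lambda(xi) * (x ** 2 - xi ** 2)

rng = np.random.default_rng(0)
T, K = 6, 4
eta = rng.normal(size=(T, K))
zbar = rng.dirichlet(np.ones(K))
y = (rng.random(T) < 0.5).astype(float)
print(caption_log_lik(y, zbar, eta))
xs = np.linspace(-6.0, 6.0, 13)
print(np.all(jj_lower_bound(xs, 1.5) <= log_sigmoid(xs) + 1e-12))
```

Because the bound is quadratic in x = eta[t] @ zbar, its expectation under a variational distribution needs only the first and second moments of zbar, which is what keeps the updates in closed form.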
On Fri, Mar 26, 2010 at 7:48 AM, David Blei <david.blei at gmail.com> wrote:
> hi hong,
> thanks for pointing us to your paper. it was very interesting.
> there is something that confused me. i understood the perspective of
> your model as a sophisticated slda with a binary vector response.
> however, it wasn't clear to me how the components of the z variable
> are interpretable as the "class" for each word or paragraph. while
> you set the dimension of z to be the same as the number of classes,
> your predictive model (equation 1) is a multivariate logistic
> regression from z-bar to y. i don't see what ties the c-th component
> of z to the c-th class in y. is z observed in your data? (from the
> graphical model, it doesn't seem to be.)
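The identifiability question above can be checked numerically. This is a minimal sketch, assuming Equation 1 is an independent logistic regression per class on zbar, p(y_c = 1) = sigmoid(eta[c] @ zbar), with an unconstrained C x K weight matrix; `predict_labels` and `eta` are hypothetical names. Relabeling the topics (permuting the entries of zbar together with the columns of eta) leaves every prediction unchanged, so this likelihood alone does not force topic c to correspond to class c.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_labels(zbar, eta):
    """p(y_c = 1 | zbar) for each class c; eta has shape (C, K) (hypothetical)."""
    return sigmoid(eta @ zbar)

rng = np.random.default_rng(0)
C = K = 5                      # number of topics set equal to number of classes
eta = rng.normal(size=(C, K))
zbar = rng.dirichlet(np.ones(K))

perm = rng.permutation(K)      # an arbitrary relabeling of the topics
p_orig = predict_labels(zbar, eta)
p_perm = predict_labels(zbar[perm], eta[:, perm])

# Identical predictive distribution under the relabeling.
print(np.allclose(p_orig, p_perm))  # prints True
```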
> if there's no conceptual block to it, i think it would be interesting
> to explore the effect of the number of topics on your predictive performance.
> i'd also be interested to hear about the future work that you mention,
> where you model label sparsity with lasso-style regularization.
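The thread does not say what form the lasso-style regularization would take. Purely as an assumption, one reading is an L1 penalty on the logistic-regression weights that map zbar to the label vector, fit by proximal gradient descent (ISTA), where the L1 term is handled by soft-thresholding. The function names, step size and penalty weight `lam` are hypothetical, and this is not the authors' method.

```python
import numpy as np

def soft_threshold(w, t):
    # Proximal operator of t * ||w||_1: shrink each weight toward zero by t.
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def fit_l1_logistic(Zbar, Y, lam=0.05, step=0.5, iters=500):
    """ISTA for  min_eta  mean log-loss(Y, sigmoid(Zbar @ eta.T)) + lam * ||eta||_1.

    Zbar: (D, K) per-document mean topic assignments (rows on the simplex).
    Y:    (D, C) binary label matrix.
    """
    D, K = Zbar.shape
    C = Y.shape[1]
    eta = np.zeros((C, K))
    for _ in range(iters):
        P = 1.0 / (1.0 + np.exp(-(Zbar @ eta.T)))  # (D, C) predicted probabilities
        grad = (P - Y).T @ Zbar / D                # gradient of the mean log-loss
        eta = soft_threshold(eta - step * grad, step * lam)
    return eta

rng = np.random.default_rng(1)
Zbar = rng.dirichlet(np.ones(4), size=200)      # toy topic proportions
Y = (rng.random((200, 3)) < 0.5).astype(float)  # toy binary labels
eta_sparse = fit_l1_logistic(Zbar, Y, lam=0.05)
print((eta_sparse == 0).mean())                 # fraction of weights set exactly to zero
```

Larger values of `lam` zero out more weights, so each label ends up depending on fewer topics.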
> thanks again for sending the paper.
> On Thu, Mar 4, 2010 at 3:39 PM, YANG,Shuang Hong <eeshyang at gmail.com> wrote:
> > Hi All:
> > A probably naive idea for tailoring LDA to ambiguous data analysis, such as
> > multi-label classification, is to use topics directly as class labels,
> > say, k = #classes, theta = the class mixture, z = the per-word class
> > assignment. We explored this idea
> > in http://www.cc.gatech.edu/~syang46/papers/NIPS09.pdf
> > where, similar to sLDA, the model was augmented with a side variable
> > Y = the label observation.
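The generative story in this paragraph can be sketched as follows. This is a minimal NumPy version under our own assumptions: topic k is class k, z holds per-word class assignments, and, as in sLDA, a logistic link maps the empirical class proportions zbar to the label vector Y. The exact link in the NIPS09 paper may differ, and all names are hypothetical.

```python
import numpy as np

def generate_document(alpha, beta, eta, n_words, rng):
    """Sample one document under a hypothetical sLDA-style model with topics == classes.

    alpha: (K,) Dirichlet prior; beta: (K, V) class-word distributions;
    eta: (C, K) label-response weights, here with C == K.
    """
    K, V = beta.shape
    theta = rng.dirichlet(alpha)                          # class mixture for this document
    z = rng.choice(K, size=n_words, p=theta)              # per-word class assignments
    w = np.array([rng.choice(V, p=beta[k]) for k in z])   # observed words
    zbar = np.bincount(z, minlength=K) / n_words          # empirical class proportions
    p_y = 1.0 / (1.0 + np.exp(-(eta @ zbar)))             # assumed logistic link to labels
    y = (rng.random(eta.shape[0]) < p_y).astype(int)      # observed label vector Y
    return theta, z, w, y

rng = np.random.default_rng(0)
K, V, n_words = 3, 10, 50
alpha = np.full(K, 0.5)
beta = rng.dirichlet(np.ones(V), size=K)
eta = 4.0 * np.eye(K) - 2.0  # hypothetical: label c driven mainly by topic c
theta, z, w, y = generate_document(alpha, beta, eta, n_words, rng)
print(theta.round(2), y)
```

For the per-paragraph variant mentioned in the message, one z would be drawn per paragraph and shared by all of that paragraph's words.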
> > This model is merely a different interpretation of sLDA, or alternatively
> > could be viewed as a Bayesian treatment of the
> > naive Bayes classifier, yet it beats SVM on text classification (both normal
> > text and short text such as web search queries) in our experiments. The
> > reported results use z as a per-paragraph class assignment for normal text,
> > but we found that using z as a per-word class assignment gives similar results.
> > Any comments on this are greatly appreciated.
> > Shang
> > On Thu, Mar 4, 2010 at 12:00 PM,
> > <topic-models-request at lists.cs.princeton.edu> wrote:
> >> From: Daniel Ramage <dramage at cs.stanford.edu>
> >> To: Liu Bin <korolevbin at gmail.com>
> >> Date: Wed, 03 Mar 2010 09:46:56 -0800
> >> Subject: Re: [Topic-models] Training Classifier with multi-labeled data
> >> Hi Bin,
> >> One option is to use Labeled LDA,
> >> http://www.aclweb.org/anthology/D/D09/D09-1026.pdf, which constrains each
> >> document's topic distribution to align with the document's label space.
> >> Because the per-document topics in this model are actually observed, it's
> >> less of a latent and more of a blatant Dirichlet allocation. It's
> >> competitive with an SVM baseline in our experiments, but state-of-the-art
> >> discriminative models still beat it.
> >> dan
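The label constraint described in this message can be sketched with a collapsed Gibbs sampler in which each word's topic is drawn only from its own document's label set. This is a minimal sketch assuming symmetric hyperparameters alpha and beta and topic k identified with label k; the function name and defaults are hypothetical, and the paper's own inference procedure may differ in its details.

```python
import numpy as np

def labeled_lda_gibbs(docs, labels, K, V, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling with topics restricted to each document's labels.

    docs: list of word-id lists; labels: list of label-id lists (topic k == label k).
    Returns topic-word counts (K, V) and document-topic counts (D, K).
    """
    rng = np.random.default_rng(seed)
    nkw = np.zeros((K, V))          # topic-word counts
    nk = np.zeros(K)                # per-topic totals
    ndk = np.zeros((len(docs), K))  # document-topic counts
    assign = []
    for d, (doc, lab) in enumerate(zip(docs, labels)):
        z = rng.choice(lab, size=len(doc))  # initialise only within the document's labels
        for w, k in zip(doc, z):
            nkw[k, w] += 1
            nk[k] += 1
            ndk[d, k] += 1
        assign.append(z)
    for _ in range(iters):
        for d, (doc, lab) in enumerate(zip(docs, labels)):
            lab = np.asarray(lab)
            for i, w in enumerate(doc):
                k = assign[d][i]
                nkw[k, w] -= 1
                nk[k] -= 1
                ndk[d, k] -= 1
                # Standard LDA full conditional, evaluated only over the label set.
                p = (ndk[d, lab] + alpha) * (nkw[lab, w] + beta) / (nk[lab] + V * beta)
                k = lab[rng.choice(len(lab), p=p / p.sum())]
                assign[d][i] = k
                nkw[k, w] += 1
                nk[k] += 1
                ndk[d, k] += 1
    return nkw, ndk

docs = [[0, 1, 0, 2], [3, 4, 3], [0, 1, 3, 4]]
labels = [[0], [1], [0, 1]]
nkw, ndk = labeled_lda_gibbs(docs, labels, K=2, V=5, iters=50)
print(ndk)
```

Documents with a single label have their topic assignments fully determined, which is the sense in which the per-document topics are "observed" rather than latent.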
> > _______________________________________________
> > Topic-models mailing list
> > Topic-models at lists.cs.princeton.edu
> > https://lists.cs.princeton.edu/mailman/listinfo/topic-models
putthi at ucsd.edu