[Topic-models] Finding annotated datasets

Devashish Deshpande ashu.9412 at gmail.com
Wed Jun 8 14:35:57 EDT 2016


Hey everyone,

My name is Devashish Deshpande. I am a contributor to Gensim, the open
source topic modelling library in Python, and am currently working on a
project to add to Gensim the topic coherence pipeline described in this paper
<http://svn.aksw.org/papers/2015/WSDM_Topic_Evaluation/public.pdf> and
demonstrated in this code
<https://github.com/AKSW/Palmetto/tree/master/src/main/java/org/aksw/palmetto>.
You can find my open PR here
<https://github.com/piskvorky/gensim/pull/710>.

To write a blog post on this project and run some benchmark tests, I would
like to reproduce Table 2 from the above paper. However, I am having trouble
finding the annotated datasets that were used for it. I did manage to find
some links (e.g. the annotated movies dataset
<http://topics.labs.bluekiwi.de/data/nips2013>, RTL NYT
<https://catalog.ldc.upenn.edu/LDC2008T19>, genomics
<http://ir.ohsu.edu/genomics>), but none of them seems to be working. Is
there any other place where I can download any of these datasets?

Any help will be greatly appreciated!

Thanks!
Devashish

