[Topic-models] Finding annotated datasets

Devashish Deshpande ashu.9412 at gmail.com
Wed Jun 8 14:35:57 EDT 2016

Hey everyone,

My name is Devashish Deshpande. I am a contributor to Gensim, the open
source topic modelling library in Python, and am currently working on a
project to add to Gensim the topic coherence pipeline described in this paper
<http://svn.aksw.org/papers/2015/WSDM_Topic_Evaluation/public.pdf> and
demonstrated in this code. You can find my open PR here.

To write a blog post on this project and run some benchmark tests,
I wanted to reproduce Table 2 from the above paper. However, I have had
trouble finding the annotated datasets that were used for it. I did manage
to find some links (e.g. the annotated movies dataset
<http://topics.labs.bluekiwi.de/data/nips2013>, RTL NYT
<https://catalog.ldc.upenn.edu/LDC2008T19>, genomics
<http://ir.ohsu.edu/genomics>), but none of them seem to work. Is
there anywhere else I can download any of these datasets from?

Any help will be greatly appreciated!

