[Topic-models] Finding annotated datasets
roeder at informatik.uni-leipzig.de
Thu Jun 9 08:57:25 EDT 2016
Unfortunately, the blog "http://topics.labs.bluekiwi.de/" no longer exists. I am sorry for any inconvenience this might have caused.
In the paper, a dataset is defined by three parts:
1. a corpus
2. topics that have been calculated using the corpus
3. human ratings for the topics
You can find the topics (topics* files) and the human ratings (gold* files) used for our paper at: http://18.104.22.168/mroeder/palmetto/datasets/ (I will add the link to the Palmetto web page). However, because of their licenses I am not allowed to upload the corpora. You would need them to recreate the upper part of the table. If you are interested in that part, please send me an email and I can describe how you could get them.
Since we did not create all of the datasets ourselves, I would like to remind you to cite the creators/providers of each dataset where appropriate. You can find the references to their publications in our paper, in the section that describes the datasets.
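As a rough illustration of how the topics* and gold* files could be used together, the sketch below correlates one coherence score per topic with the corresponding human rating. It rests on assumptions that are not stated in this thread: that a gold* file holds one numeric rating per line in the same order as the topics, and that a Pearson correlation is the comparison wanted. The helper names parse_ratings and pearson are hypothetical, and the coherence scores would have to come from whichever coherence implementation is being benchmarked.

```python
import math


def parse_ratings(lines):
    """Parse one numeric rating per non-empty line.

    Assumption: this is the layout of a gold* file; adjust if it differs.
    """
    return [float(line.strip()) for line in lines if line.strip()]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

With a list of coherence scores (one per topic, in file order) and the ratings returned by parse_ratings, calling pearson(scores, ratings) would give a single number per dataset and coherence measure.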
From: Devashish Deshpande <ashu.9412 at gmail.com>
Date: Wed, Jun 8, 2016 at 8:35 PM
Subject: [Topic-models] Finding annotated datasets
To: topic-models at lists.cs.princeton.edu
My name is Devashish Deshpande. I am a contributor to the
Gensim open source topic modelling library in Python and am currently
working on a project to add the topic coherence pipeline as mentioned in this paper and demonstrated in this code to Gensim. You can find my open PR here.
For the purpose of writing a blog post on this project and performing some
benchmark testing, I wanted to reproduce table 2 from the above paper. However, I am finding it hard to locate the annotated datasets that were used
for this. I did manage to find some links (e.g. the annotated movies dataset, RTL NYT, genomics) but none of them seem to be
working. Is there any other place where I can download any of these datasets?
Any help will be greatly appreciated!