[Topic-models] Jstor Data, as used in CTM, or other LDA benchmark sets.
blei at CS.Princeton.EDU
Thu Mar 24 09:46:11 EDT 2011
At the time of publication, we weren't legally allowed to distribute
the data. However, JSTOR has since recognized its value to the academic
community and created the "data for research" site for distributing
word counts. In the top left of that site is a menu called "data requests."
On Thu, Mar 24, 2011 at 7:05 AM, Anja Pilz <Anja.Pilz at iais.fraunhofer.de> wrote:
> Dear all,
> I am interested in the data used in Blei & Lafferty's Correlated Topic Model
> (2007), which is part of the JSTOR dataset.
> After browsing the relevant publications and JSTOR, I fear that it is not
> publicly available.
> Is that true or does somebody have access to it?
> Apart from that corpus, are there any "benchmark" datasets for evaluating
> different LDA variants?
> Kind regards,
> Anja Pilz
> Anja Pilz
> Knowledge Discovery Department
> Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme
> Schloss Birlinghoven, D-53754 St. Augustin
> Phone: +49 (0) 22 41 14-22 48
> Fax: +49 (0) 22 41 14-23 24
> Email: anja.pilz at iais.fraunhofer.de