[Topic-models] Jstor Data, as used in CTM, or other LDA benchmark sets.

David Blei blei at CS.Princeton.EDU
Thu Mar 24 09:46:11 EDT 2011


dear anja

at the time of publication, we weren't legally allowed to distribute
the data.  however, JSTOR has recognized its value to the academic
community.  they have created the "data for research" site for
distributing word counts.  see

http://dfr.jstor.org/

in the top left is a menu called "data requests."

best
dave


On Thu, Mar 24, 2011 at 7:05 AM, Anja Pilz <Anja.Pilz at iais.fraunhofer.de> wrote:
> Dear all,
>
> I am interested in the data used in Blei & Lafferty's Correlated Topic Model
> (2007), which is part of the JSTOR dataset.
> After browsing the relevant publications and JSTOR, I fear that it is not
> publicly available.
> Is that true or does somebody have access to it?
>
> Apart from that corpus, are there any "benchmark" datasets for evaluating
> different LDA variants?
>
> Kind regards,
>
> Anja Pilz
>
> --
> Anja Pilz
> Abteilung Knowledge Discovery
> Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme
> Schloss Birlinghoven, D-53754 St. Augustin
> Telefon:    +49 (0) 22 41 14-22 48
> Fax:        +49 (0) 22 41 14-23 24
> E-Mail:    anja.pilz at iais.fraunhofer.de
> www.iais.fraunhofer.de

