[Topic-models] Associated Press corpus

Normand Peladeau peladeau at provalisresearch.com
Mon Dec 5 09:28:09 EST 2016


The lda-c program by Blei comes with a corpus of 2246 documents from the
Associated Press, as well as a list of words associated with 100 topics
obtained from it:

                https://www.cs.princeton.edu/~blei/lda-c/ap-topics.pdf
There are, however, two topics that are duplicates of existing ones (the
last one, starting with "housing", occurs twice, and the one just above it,
starting with "farmers", also occurs twice).  Is this something that is
possible with LDA, or is it simply a human mistake in the PDF report of the
topics?
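
One way to check this against the model output rather than the PDF would be
to compare the word lists topic by topic. A minimal sketch in Python,
assuming the topics are already available as lists of words; the function
name `find_duplicate_topics` and the sample `topics` data are made up for
illustration and are not taken from ap-topics.pdf or from lda-c itself:

```python
from collections import defaultdict


def find_duplicate_topics(topics):
    """Return groups of topic indices whose word lists are identical."""
    seen = defaultdict(list)
    for idx, words in enumerate(topics):
        # A tuple is hashable, so identical word lists share one key.
        seen[tuple(words)].append(idx)
    return [idxs for idxs in seen.values() if len(idxs) > 1]


# Hypothetical sample data, for illustration only.
topics = [
    ["housing", "homes", "mortgage"],
    ["farmers", "farm", "crop"],
    ["court", "judge", "trial"],
    ["housing", "homes", "mortgage"],
]

print(find_duplicate_topics(topics))  # [[0, 3]]
```

Running something like this on the word lists produced directly from the
fitted model would show whether the duplication is present in the model
output or only in the PDF.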

 

 

 
