[Topic-models] Help build an evaluated topic dataset

David Mimno mimno at cornell.edu
Thu Jun 9 11:27:29 EDT 2016

We need more examples of human-evaluated topic models. I trained a 50-topic
model on questions and answers from the CrossValidated site,
http://stats.stackexchange.com/. The posts are freely available from
archive.org. Evaluate the topics here:


(Can you find the topic modeling topic?)

If I get enough non-troll responses, I'll post the documents, the Mallet
state file, and the response spreadsheet in a GitHub repo.

To create this form I went to http://scripts.google.com and used this code:

function createForm() {

  // Create the form and explain the 1-5 coherence scale to respondents.
  var form = FormApp.create('Topic Coherence')
    .setDescription("Each list of terms represents a topic. Evaluate each " +
      "topic's coherence on a scale from 1 to 5. Does a topic contain terms that " +
      "you would expect to see together on a page? Does it contain terms that " +
      "would work together as search queries? Could you easily think of a short " +
      "descriptive label? A Coherent topic (5) should be clear, consistent, and " +
      "readily interpretable. A Problematic topic (3) should have some related " +
      "words but might merge two unrelated concepts or contain several off-topic " +
      "words. A Useless topic (1) should have no obvious connection between more " +
      "than two or three words.");

  // Each string holds the top terms of one topic.
  var topics = [
    "time series data model trend noise signal period change " +
    "seasonal autocorrelation level arima structure analysis process spatial " +
    "trends frequency lag",
    "distribution random normal distributions variables variance independent " +
    "variable distributed sigma probability gaussian poisson case uniform " +
    "process theorem function mixture sample"
  ];

  // Add one 1-5 scale question per topic. The addScaleItem().setTitle(topic)
  // lines are missing from the archived message and are reconstructed here
  // from the setBounds/setLabels calls that follow them.
  topics.forEach(function (topic) {
    form.addScaleItem()
      .setTitle(topic)
      .setBounds(1, 5)
      .setLabels("Useless", "Coherent");
  });

  Logger.log('Published URL: ' + form.getPublishedUrl());
  Logger.log('Editor URL: ' + form.getEditUrl());
}
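
Once responses come in, the ratings could be summarized per topic. The
following is only a sketch, not part of the script above: summarizeRatings
is a hypothetical helper, and it assumes the responses have already been
read into an array with one row per respondent and one column per topic.
The ratings in the example call are made up for illustration.

```javascript
// Hypothetical helper: summarize 1-5 coherence ratings per topic.
// `responses` is an array of rows, one per respondent; row[t] is that
// respondent's rating for topic t, or empty if the topic was skipped.
function summarizeRatings(responses, numTopics) {
  var summary = [];
  for (var t = 0; t < numTopics; t++) {
    var sum = 0;
    var count = 0;
    responses.forEach(function (row) {
      var rating = Number(row[t]);
      // Keep only values on the form's 1-5 scale; blanks and
      // non-numeric cells fail this check and are skipped.
      if (rating >= 1 && rating <= 5) {
        sum += rating;
        count += 1;
      }
    });
    // mean is null when nobody rated the topic.
    summary.push({
      topic: t,
      count: count,
      mean: count > 0 ? sum / count : null
    });
  }
  return summary;
}

// Made-up ratings: three respondents, two topics, one skipped answer.
var example = summarizeRatings([[5, 2], [4, 1], [3, ""]], 2);
```

Reporting the count next to the mean makes it easy to see which topics
received too few ratings to say much about.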

