[Topic-models] Question regarding automatic evaluation of topic models using PMI/NPMI

Jocelyn Mazarura jocelynmazarura at yahoo.com
Fri Apr 7 09:03:20 EDT 2017

My question is inspired by the article "Machine Reading Tea Leaves: Automatically Evaluating Topic Coherence and Topic Model Quality" by Lau et al. (2014). (See full reference below.)

When estimating the joint and marginal probabilities for PMI/NPMI, is it OK to use the same corpus from which I extracted the topics, instead of a large external corpus such as English Wikipedia, as is done in the original article?
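
To make the question concrete, here is a minimal sketch of what estimating these probabilities from the original data could look like. It assumes document-level co-occurrence (a word pair co-occurs if both words appear in the same document), which may differ from the counting scheme used in the article. The names `npmi_coherence`, `topic_words` and `documents` are hypothetical, and the handling of the two edge cases is a convention chosen only for illustration.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, documents):
    """Mean NPMI over all word pairs in a topic.

    Probabilities are estimated from document-level co-occurrence in
    `documents` (assumption: the same tokenized corpus used to fit the
    topic model, not an external reference corpus).
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain all of the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0.0:
            # Convention (assumption): a pair that never co-occurs gets -1.
            scores.append(-1.0)
        elif p12 == 1.0:
            # Both words are in every document, so PMI is 0 and the
            # normaliser -log(p12) is 0; score 0 by convention (assumption).
            scores.append(0.0)
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Toy corpus of four tokenized documents.
docs = [["a", "b"], ["a", "b"], ["c"], ["c", "a"]]
print(npmi_coherence(["a", "b"], docs))
```

Swapping `documents` for a tokenized external corpus such as English Wikipedia would give the reference-corpus variant, so the question amounts to which of the two should be passed in.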

Reference: Lau, J.H., Newman, D. and Baldwin, T., 2014, April. Machine Reading Tea Leaves: Automatically Evaluating Topic Coherence and Topic Model Quality. In EACL (pp. 530-539).

Kind regards

Jocelyn Mazarura

