[Topic-models] Issue when running hdp on huge dataset.

Satwik B satwik_bh at hotmail.com
Mon Apr 10 16:18:00 EDT 2017


Hi,


I'm performing topic modeling on a dataset of 132,757 documents. To infer the number of topics present in the dataset, I am using the hdp code (https://github.com/blei-lab/hdp).

However, the program aborts while computing the number of topics. Any suggestions on where exactly I'm going wrong, or on what could be improved?

The program starts with the following parameters:
algorithm:          = train
data_path:          = /mnt/Topic_Modelling/input/corpus.lda-c
directory:          = /mnt/Topic_Modelling/output/
max_iter            = 1000
save_lag            = 100
init_topics         = 0
random_seed         = 1491808814
gamma_a             = 1.00
gamma_b             = 1.00
alpha_a             = 1.00
alpha_b             = 1.00
eta                 = 0.50
#restricted_scans   = 5
saved model_path    = /mnt/Topic_Modelling/
split-merge         = no
sampling hyperparam = no

reading data from /mnt/Topic_Modelling/input/corpus.lda-c
number of docs  : 132757
number of terms : 2048948
number of total words : 94196755
gsl: ../gsl/gsl_rng.h:200: ERROR: invalid n, either 0 or exceeds maximum value of generator
Default GSL error handler invoked.
Aborted (core dumped)
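
For context, this GSL message is emitted by gsl_rng_uniform_int, which rejects an argument n that is 0 or larger than the range of the generator. One way to narrow this down is to sanity-check the corpus file itself. Below is a minimal sketch in Python (not part of hdp), assuming the usual lda-c layout of one document per line, "N id:count id:count ...", where N is the number of unique terms in that document. The helper name summarize_ldac is hypothetical, and whether an empty document or a malformed line is what actually triggers the abort here is an assumption, not something confirmed from the hdp source.

```python
def summarize_ldac(lines):
    """Summarize an lda-c style corpus given as an iterable of text lines.

    Assumed layout per line: "N id:count id:count ..." with N unique terms.
    Line numbers in the result are 1-based.
    """
    docs = 0
    total_words = 0
    max_term_id = -1
    empty_docs = []   # well-formed lines whose word count is 0
    bad_lines = []    # blank lines, non-integer tokens, or N != number of pairs

    for line_no, line in enumerate(lines, start=1):
        docs += 1
        fields = line.split()
        try:
            declared = int(fields[0])
            pairs = [tuple(int(x) for x in f.split(":")) for f in fields[1:]]
            if declared != len(pairs) or any(len(p) != 2 for p in pairs):
                raise ValueError("declared term count does not match the pairs")
        except (IndexError, ValueError):
            bad_lines.append(line_no)
            continue

        doc_words = sum(count for _, count in pairs)
        if doc_words == 0:
            empty_docs.append(line_no)
        total_words += doc_words
        if pairs:
            max_term_id = max(max_term_id, max(term for term, _ in pairs))

    return {
        "docs": docs,
        "terms": max_term_id + 1,   # assumes term ids start at 0
        "total_words": total_words,
        "empty_docs": empty_docs,
        "bad_lines": bad_lines,
    }

# Usage on a real file (path taken from the log above):
# with open("/mnt/Topic_Modelling/input/corpus.lda-c") as f:
#     print(summarize_ldac(f))
```

If the reported docs, terms and total_words differ from what hdp prints, or if empty_docs or bad_lines is non-empty, those lines would be the first place to look.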


Kind regards

Satwik