[Topic-models] Comparison of different tools for LDA

Ian ian.wood at anu.edu.au
Thu Jun 9 11:28:42 EDT 2016


Hi Shayan,

As Radoslaw suggests, I’d certainly have a look at recent work on topic models combined with word embeddings/neural networks. One state-of-the-art (non-NN) model that I’m aware of, and that may be worth looking at, is Wray Buntine’s hca <https://github.com/wbuntine/topic-models>.

As a first step in applied topic modelling, I’d suggest Mallet with hyperparameter optimisation turned on (and a preference for a larger number of topics). In many cases this produces better-quality topics and has a similar effect to non-parametric models (which choose the number of topics for you). I know that Gensim can run the Mallet models, and I’m pretty sure it’s doable from R as well.
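
For illustration, here is a rough sketch of how a Mallet run with hyperparameter optimisation might be driven from Python. It only builds the command line for Mallet's train-topics tool. The helper name mallet_train_command, the file names, and the default values (200 topics, optimising every 10 iterations) are assumptions made up for this example rather than recommendations, and it assumes a "mallet" launcher on your PATH and a corpus already imported with --keep-sequence.

```python
def mallet_train_command(input_file, num_topics=200, optimize_interval=10,
                         mallet_bin="mallet", keys_file="topic_keys.txt"):
    """Build the argument list for `mallet train-topics`.

    Hypothetical helper: the defaults are illustrative only.
    A non-zero --optimize-interval turns on hyperparameter optimisation.
    """
    return [
        mallet_bin, "train-topics",
        "--input", input_file,                         # file made by `mallet import-dir` / `import-file`
        "--num-topics", str(num_topics),               # err on the large side
        "--optimize-interval", str(optimize_interval), # re-estimate hyperparameters every N iterations
        "--output-topic-keys", keys_file,              # top words per topic
    ]


cmd = mallet_train_command("corpus.mallet")
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) once Mallet is installed and the input file exists.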

Hope that helps
Best
Ian

> On 9 Jun 2016, at 3:47 pm, <topic-models-request at lists.cs.princeton.edu> wrote:
> 
> From: "Kowalski, Radoslaw" <radoslaw.kowalski.14 at ucl.ac.uk>
> Subject: Re: [Topic-models] Comparison of different tools for LDA
> Date: 9 June 2016 10:08:21 am GMT+1
> To: Shayan A Tabrizi <shayantabrizi at gmail.com>, topic-models <Topic-models at lists.cs.princeton.edu>
> 
> 
> Hi Shayan,
> 
> I would discourage you from using R because it has few robust packages for deep learning. My opinion is that deep learning is likely to be used a lot in topic modelling in the future. Where I am at UCL we often use gensim, but the list of relevant Python packages is much longer. Gensim is not a golden solution to every topic modelling problem. You may find Python packages that are easier to use for specific problems.
> 
> All the best,
> Radoslaw
> 
> Radoslaw Kowalski
> PhD Student
> ______________________________
> Consumer Data Research Centre
> UCL Department of Political Science
> ______________________________
> T:  020 3108 1098 x51098
> E:  radoslaw.kowalski.14 at ucl.ac.uk
> W:  www.cdrc.ac.uk <http://www.cdrc.ac.uk/>
> Twitter: @CDRC_UK
> From: topic-models-bounces at lists.cs.princeton.edu on behalf of Shayan A Tabrizi <shayantabrizi at gmail.com>
> Sent: 08 June 2016 21:57:25
> To: topic-models
> Subject: [Topic-models] Comparison of different tools for LDA
>  
> Dear Topic-Modelers,
> 
> There are several tools for LDA, but I don't know which one is better, or when. I wonder if anyone could guide me in choosing a toolbox. My priorities are ease of use and support for variations and extensions of LDA.
> Some, but not all, of the candidates are:
> 1- MALLET (Java)
> 2- gensim (Python)
> 3- topicmodels (R)
> 4- Stanford Topic Modeling Toolbox
> 
> Thanks in advance,
> Shayan
