[Topic-models] Topic-models Digest, Vol 111, Issue 4

Sameendra Samarawickrama smsamrc at gmail.com
Tue Oct 20 21:28:08 EDT 2015


Hi Sophie, thanks a lot for the reply!

> Only words that have a count >0 are displayed. If you get fewer, it's because
> there are no more words that occur with that topic. Especially with a large
> number of topics, most topics will have few words.


So if I want to get 50 topic words for each topic, do I have to process
the topic weights file (obtained with --topic-word-weights-file
topic_word_weights.txt), rank the words for each topic by the weight
given there, and take the top 50?
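
For concreteness, here is a minimal Python sketch of the ranking step I have in mind. It assumes the weights file holds one tab-separated "topic, word, weight" triple per line, which is how I understand MALLET writes it, but I have not verified this against every version. The function name top_words is my own and not part of MALLET.

```python
import heapq
from collections import defaultdict


def top_words(weights_path, n=50):
    """Return {topic_id: [(word, weight), ...]} holding the n heaviest words per topic.

    Assumes each line looks like: <topic>\t<word>\t<weight>
    """
    by_topic = defaultdict(list)
    with open(weights_path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 3:
                continue  # skip blank or malformed lines
            topic, word, weight = fields
            by_topic[int(topic)].append((word, float(weight)))
    # nlargest returns the pairs sorted by weight, highest first
    return {
        topic: heapq.nlargest(n, pairs, key=lambda pair: pair[1])
        for topic, pairs in sorted(by_topic.items())
    }


if __name__ == "__main__":
    # The file name is the one from my command line above.
    for topic, pairs in top_words("topic_word_weights.txt", n=50).items():
        print(topic, " ".join(word for word, _ in pairs))
```

One caveat, if I understand the file correctly: words that never occur with a topic are still listed there with only the smoothing weight, so beyond the words with a count >0 the remaining ranks would all be ties and their order would be arbitrary.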

On Wed, Oct 21, 2015 at 3:00 AM, <
topic-models-request at lists.cs.princeton.edu> wrote:

> ---------- Forwarded message ----------
> From: Sameendra Samarawickrama <smsamrc at gmail.com>
> To: "topic-models at lists.cs.princeton.edu" <
> topic-models at lists.cs.princeton.edu>
> Cc:
> Date: Tue, 20 Oct 2015 10:43:46 +1100
> Subject: [Topic-models] Mallet topic modeling: topics of variable length
> disregard the value set using --num-top-words parameter
> Sorry for the repost, but I'm really stuck here without any success. I tried
> this on a different dataset, but the same thing happens.
>
> I found that when applying LDA to my dataset, the number of topic words
> doesn't match the number specified with the "--num-top-words" parameter.
>
> So, for example, if I run LDA with the following configuration,
>
> mallet train-topics --input $posts.mallet --num-topics 1000 --num-top-words 50 --output-topic-keys topics.txt
>
> the topics in my "topics.txt" are of variable length; most of them don't
> even have 10 topic words when I'm supposed to have 50.
>
> This happens only when I'm trying to fit a large number of topics
> (e.g., t > 500).
>
> Does anybody know why this is happening?
>
>
> ---------- Forwarded message ----------
> From: sophie burkhardt <soburkha at uni-mainz.de>
> To: "topic-models at lists.cs.princeton.edu" <
> topic-models at lists.cs.princeton.edu>
> Cc:
> Date: Tue, 20 Oct 2015 11:55:06 +0200
> Subject: Re: [Topic-models] Mallet topic modeling: topics of variable
> length disregard the value set using --num-top-words parameter
> Only words that have a count >0 are displayed. If you get fewer, it's because
> there are no more words that occur with that topic. Especially with a large
> number of topics, most topics will have few words.
>
> _______________________________________________
> Topic-models mailing list
> Topic-models at lists.cs.princeton.edu
> https://lists.cs.princeton.edu/mailman/listinfo/topic-models
>
>