[Topic-models] using CTM to evaluate word relatedness

邵元龙 shaoyuanlong at cad.zju.edu.cn
Sun Nov 9 13:34:54 EST 2008


I think the inner product (or cosine) of the two vectors may be reasonable,
since, weighted by p(z), it can be interpreted as the joint probability of
the word pair:
p(w_i, w_j) = sum_z [p(w_i|z) p(w_j|z) p(z)]
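
A minimal sketch of that sum, assuming beta is available as a K x V table of
probabilities p(w|z) (exponentiate first if your CTM output stores logs).
The names beta, topic_prior and joint_word_prob are hypothetical, and the toy
numbers are made up for illustration. CTM does not hand you p(z) directly, so
topic_prior here is an assumed estimate, e.g. averaged topic proportions over
the corpus.

```python
def joint_word_prob(beta, topic_prior, i, j):
    """Return sum_z p(w_i|z) * p(w_j|z) * p(z) for term indices i and j.

    beta        -- K rows, each a list of V probabilities p(w|z) (assumed layout)
    topic_prior -- K probabilities p(z) (assumed estimate, not a CTM output)
    """
    return sum(row[i] * row[j] * p_z for row, p_z in zip(beta, topic_prior))


# Toy example: K = 2 topics, V = 3 terms; each row sums to 1.
beta = [
    [0.5, 0.5, 0.0],  # topic 0 favours terms 0 and 1
    [0.0, 0.5, 0.5],  # topic 1 favours terms 1 and 2
]
topic_prior = [0.5, 0.5]

# Terms 0 and 1 share topic 0, so they get a non-zero joint probability;
# terms 0 and 2 never share a topic, so their joint probability is zero.
print(joint_word_prob(beta, topic_prior, 0, 1))
print(joint_word_prob(beta, topic_prior, 0, 2))
```

If the rows of beta and topic_prior each sum to 1, the values over all
(i, j) pairs also sum to 1, which is a quick sanity check on the inputs.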

-----Original Message-----
From: Mike Stipicevic [mailto:stipim at rpi.edu] 
Sent: November 9, 2008 23:53
To: topic-models at lists.cs.princeton.edu list
Subject: [Topic-models] using CTM to evaluate word relatedness

Hi all,

I would like to use CTM to evaluate the relatedness between two terms. My first instinct is to take the KxV beta matrix and select two columns -- one for each term. Thus, each term has a K-dimensional vector of topic weights, and these vectors can be compared by Euclidean distance (or perhaps cosine similarity). However, I am not sure if this approach is valid mathematically. The column vectors are certainly not probability distributions over anything, and I worry that a more rigorous approach would provide different answers.
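
A rough sketch of the column comparison described above, assuming beta is a
K x V table of probabilities p(w|z) (exponentiate first if it is stored as
logs). The helper names and the toy numbers are hypothetical, chosen only to
illustrate the idea.

```python
import math


def term_vector(beta, term):
    """Pull out one column of beta: the K topic weights for a single term."""
    return [row[term] for row in beta]


def euclidean_distance(u, v):
    """Straight-line distance between two K-dimensional vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


# Toy example: K = 2 topics, V = 3 terms; each row sums to 1.
beta = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
]

u = term_vector(beta, 0)
v = term_vector(beta, 1)
print(euclidean_distance(u, v), cosine_similarity(u, v))
```

Note that a column of beta does not sum to 1, so Euclidean distance is
sensitive to how frequent each term is overall, while cosine similarity only
looks at the direction of the two vectors.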

My other thought was to treat each word as a 'document' and run an inference on this document. I can then compare the lambdas of the two one-word documents. However, I believe this method will be drastically slower than the one mentioned above.

I appreciate any thoughts on this. Apologies if this was already asked on the list; I couldn't find a way to search the archives.

Thank you for your time,
- Mike
-- 
Mike Stipicevic
Chairman, RPI Student Branch of the IEEE

stipim at rpi.edu
mstipicevic at ieee.org



