[Topic-models] using CTM to evaluate word relatedness
邵元龙
shaoyuanlong at cad.zju.edu.cn
Sun Nov 9 13:34:54 EST 2008
I think the inner product (or cosine) of the two vectors may be reasonable,
since it can be interpreted as the joint probability of the word pair:
p(w_i, w_j) = sum_z [p(w_i|z) p(w_j|z) p(z)]
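A minimal sketch of this computation, assuming a K x V beta matrix of topic-word probabilities and a topic prior p(z) (the toy values below are hypothetical, not from any fitted CTM):

```python
import numpy as np

# Hypothetical toy model: K = 3 topics, V = 5 vocabulary words.
# beta[k, v] = p(word v | topic k); each row sums to 1.
rng = np.random.default_rng(0)
beta = rng.dirichlet(np.ones(5), size=3)   # shape (K, V)
p_z = np.array([0.5, 0.3, 0.2])            # topic prior p(z), sums to 1

def joint_prob(beta, p_z, i, j):
    """p(w_i, w_j) = sum_z p(w_i|z) * p(w_j|z) * p(z)."""
    return float(np.sum(beta[:, i] * beta[:, j] * p_z))

print(joint_prob(beta, p_z, 0, 1))
```

Note the quantity is symmetric in i and j, as a relatedness score should be; with a uniform p(z) it reduces to a scaled inner product of the two beta columns.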
-----Original Message-----
From: Mike Stipicevic [mailto:stipim at rpi.edu]
Sent: November 9, 2008 23:53
To: topic-models at lists.cs.princeton.edu list
Subject: [Topic-models] using CTM to evaluate word relatedness
Hi all,
I would like to use CTM to evaluate the relatedness between two terms. My first instinct is to take the KxV beta matrix and select two columns -- one for each term. Each term then has a K-vector over topics, and these vectors can be compared by Euclidean distance (or perhaps cosine similarity). However, I am not sure whether this approach is mathematically valid. The vectors certainly are not a distribution over anything, and I worry that a more rigorous approach would give different answers.
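The column-comparison idea can be sketched as follows, assuming a K x V beta matrix; the values below are a hypothetical stand-in for a fitted model, and `column_cosine` is an illustrative helper, not part of any CTM package:

```python
import numpy as np

# Hypothetical K x V beta: rows are topics, columns are vocabulary words.
rng = np.random.default_rng(1)
beta = rng.dirichlet(np.ones(6), size=4)   # shape (K, V)

def column_cosine(beta, i, j):
    """Cosine similarity between the K-dim topic vectors of words i and j."""
    u, v = beta[:, i], beta[:, j]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(column_cosine(beta, 0, 1))
```

Since beta entries are non-negative probabilities, the cosine here always lies in [0, 1], and a word compared with itself scores exactly 1.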
My other thought was to treat each word as a one-word 'document' and run inference on it, then compare the lambdas of the two resulting documents. However, I believe this method would be drastically slower than the one mentioned above.
I appreciate any thoughts on this. Apologies if this was already asked on the list; I couldn't find a way to search the archives.
Thank you for your time,
- Mike
--
Mike Stipicevic
Chairman, RPI Student Branch of the IEEE
stipim at rpi.edu
mstipicevic at ieee.org