Chong Wang will present his research seminar/general exam on Wednesday, Jan 21 at 2PM in Room 402. The members of his committee are David Blei (advisor), Fei-Fei Li, and Rob Schapire. Everyone is invited to attend his talk, and those faculty wishing to remain for the oral exam following are welcome to do so. His abstract and reading list follow below.

--------------------------------------------

Learning topic models from multiple corpora

Abstract

In this talk, we consider the problem of learning topic models from multiple corpora. Examples of multi-corpora data include news articles from different times or different locations, and scientific papers from different conferences or different years. Most topic models, however, ignore this additional information. Simply combining multiple corpora into one large corpus, or treating each corpus individually, does not let us analyze the high-level relations among the corpora. For example, are topics similar or different from time to time or from location to location? How are they related to each other?

I will describe two new approaches for learning topic models from multiple corpora: the continuous time dynamic topic model (cDTM) and the Markov topic model (MTM).

In cDTMs, documents from the same time point are considered as a corpus. The cDTM extends dynamic topic models (DTMs) by using Brownian motion to model the evolution of the latent topics over time. We derive an efficient variational approximate inference algorithm that takes advantage of the sparsity of observations in text, a property that lets us easily handle many time points. Thus, the cDTM is able to discover topic evolution at a much finer time resolution.

In MTMs, papers from the same conference are treated as a corpus. We then apply Gaussian (Markov) random fields to model the correlations between different corpora. MTMs capture both the internal topic structure within each corpus and the relationships between topics across the corpora.
In addition, we will show that cDTMs and DTMs can be formulated as special cases of MTMs. Quantitative results and qualitative discoveries (interesting topic patterns) will also be presented.

Books:

1) Pattern Recognition and Machine Learning, by Christopher M. Bishop, Springer, 2006
   chapters: 1, 2, 3.1-3.5, 4, 5, 8, 9, 10, 11.1-11.3, 12.1-12.2, 13, 14
2) An Introduction to Probabilistic Graphical Models (unpublished manuscript), by Michael I. Jordan, 2002
   chapters: 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 14, 15
4) (Optional) Artificial Intelligence: A Modern Approach, by Stuart Russell and Peter Norvig, Prentice Hall Series in Artificial Intelligence, 2003
   chapters: 3.1-3.5, 4.1-4.3, 13, 14, 15.1-15.5, 18.1-18.3, 20.1-20.5, 23.2-23.3

Papers:

1) D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. JMLR, 3:993-1022, 2003.
2) D. Blei and J. Lafferty. Dynamic topic models. In ICML, 2006.
3) T. Griffiths and M. Steyvers. Finding scientific topics. Proc Natl Acad Sci USA, 101 Suppl 1:5228-5235, April 2004.
4) C. Andrieu, N. de Freitas, A. Doucet, and M. Jordan. An introduction to MCMC for machine learning. Machine Learning, 50(1):5-43, January 2003.
5) M. Jordan, Z. Ghahramani, T. Jaakkola, and L. Saul. An introduction to variational methods for graphical models. Machine Learning, 37(2):183-233, 1999.
6) Y. Teh, M. Jordan, M. Beal, and D. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 2006.
7) L. Fei-Fei and P. Perona. A Bayesian hierarchical model for learning natural scene categories. In CVPR, 2005.
8) H. Rue and T. Follestad. Gaussian Markov random field models with applications in spatial statistics. Preprint, 2003.
9) D. Blei and J. Lafferty. Correlated topic models. In NIPS, 2005.
10) R. Kalman. A new approach to linear filtering and prediction problems. Transactions of the ASME: Journal of Basic Engineering, 82:35-45, 1960.
11) L. Ruschendorf. Convergence of the iterative proportional fitting procedure. The Annals of Statistics, 23(4):1160-1174, 1995.
12) X. Wang and A. McCallum. Topics over time: a non-Markov continuous-time model of topical trends. In KDD, 2006.
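
As a rough illustration of the Brownian-motion idea mentioned in the abstract, the Python sketch below samples how one topic's word distribution might drift across irregularly spaced time stamps. It is not the model or the variational inference algorithm from the talk. The function name `simulate_topic_path`, the `variance` parameter, the standard-normal starting point, and the softmax mapping from parameters to word probabilities are all assumptions made for this example.

```python
import numpy as np

def simulate_topic_path(timestamps, vocab_size, variance=0.1, seed=0):
    """Sample one topic's word distributions at the given time stamps.

    Assumption: the topic's natural parameters follow Brownian motion, so the
    increment between times s < t is Gaussian with variance variance * (t - s).
    """
    rng = np.random.default_rng(seed)
    timestamps = np.asarray(timestamps, dtype=float)
    if np.any(np.diff(timestamps) <= 0):
        raise ValueError("timestamps must be strictly increasing")

    # Hypothetical starting point: standard-normal natural parameters.
    beta = rng.normal(0.0, 1.0, size=vocab_size)
    path = [beta]
    for dt in np.diff(timestamps):
        # Brownian increment: variance grows linearly with elapsed time,
        # so unevenly spaced time stamps need no discretization.
        beta = beta + rng.normal(0.0, np.sqrt(variance * dt), size=vocab_size)
        path.append(beta)
    path = np.stack(path)  # shape: (len(timestamps), vocab_size)

    # Softmax (assumed here) maps each parameter vector to word probabilities.
    shifted = path - path.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    return probs / probs.sum(axis=1, keepdims=True)

# Four irregularly spaced time stamps, five-word toy vocabulary.
topics = simulate_topic_path([0.0, 0.5, 0.6, 3.0], vocab_size=5)
print(topics.shape)  # (4, 5)
```

Each row of `topics` is a distribution over the toy vocabulary at one time stamp. Rows that are close in time tend to differ less than rows separated by a long gap, which is the property that allows a continuous-time model to work with arbitrary time resolution.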