Speaker: Narayana P. Santhanam, UC Berkeley
Date: Tuesday, April 17, 2007
Time: 11:00am
Room: B205 ~ EQuad
Title: New solution for old problems: Large alphabet probability estimation

Abstract:
Modern advancements in communication, computation, and storage have made possible complex systems like the Internet and have enabled scientific advances like the Genome project, none of which would have been conceivable when I was born. New advancements bring about new problems, and two aspects of some of these problems have captured my interest.

First, a fair number of these problems require solutions for very large alphabets. For instance, language models for speech recognition estimate distributions over English words, and thousands of genes are clustered by their expression levels for diagnosis and drug response prediction, using the limited number of samples that can be obtained from test subjects. However, many results in both statistics and information theory assume that we operate in a regime where the data size is much larger than the alphabet size. We are therefore forced to rework problems where conventional approaches no longer apply.

Second, problems posed by different systems are interconnected. For example, consider text compression on the one hand and language modeling for speech recognition on the other. The former tries to compress nearly as well as a scheme that knows the underlying distribution, even though that distribution is unknown; the latter estimates word probabilities under the same unknown distribution.

The talk will examine some recent results in the related areas of large alphabet probability estimation and data compression. These results should be seen as a first step towards new solutions for classification, entropy estimation, and inference problems arising in modern finance, biology, and data mining.

Bio:
Narayana Santhanam obtained his MS and PhD from the University of California, San Diego in 2003 and 2006, respectively. He currently holds a postdoctoral position at the University of California, Berkeley. He is the recipient of the 2003 Capocelli Prize for student-authored papers at the Data Compression Conference and of the 2006 IEEE Best Paper Award along with Prof. Alon Orlitsky and Dr. Junan Zhang. His research interests include large alphabet problems, the intersection of information theory and machine learning, and their applications.