[Ml-stat-talks] Gungor Polatkan, Monday 1PM

David Blei blei at CS.Princeton.EDU
Wed Aug 15 17:06:24 EDT 2012


hi ml-stat-talks,

gungor polatkan defends his dissertation on monday august 20th.
please join us if you are interested in probability models, bayesian
nonparametrics, efficient posterior inference, image analysis, or how
to write a great phd in machine learning.

best
dave


Extracting Information from High-Dimensional Data: Probabilistic
Modeling, Inference and Evaluation

Gungor Polatkan
Monday, August 20th, 2012, 1:00pm
Equad, B327

Abstract:

Data science is an emerging field at the interface of computer
science, statistics, mathematics and signal processing. This field is
undergoing explosive growth, driven largely by the widespread use of
tools, such as the internet and mobile devices, that lead to the
massive accumulation of data from many sources. The sheer size of
these data sets requires large-scale computational (rather than
human-powered) data analysis and decision making, and advances in
computing resources are a driving force in this growth. However, the
scale and high dimensionality of the data are such that powerful
present-day computing resources can only partially address the
complexity of the problems -- they need to be paired with advanced
techniques and algorithms.

A typical data analysis project consists of several stages: initial
exploratory analysis, model building, inference, visualization, and
evaluation. In modern data science, one important problem of the
model-building phase is how to incorporate data-specific properties
into the models. Early machine learning techniques were designed to
work on generic data sets, using parameters specified a priori.
However, as the diversity and complexity of data sets grew, more
advanced approaches were needed, tailored to the particular properties
of the application under study. Such tailoring can take many forms.
For instance, it may be necessary to learn the model parameters from
the data (instead of specifying them from the start); one can
incorporate prior information (such as sparsity with respect to
special representations, which themselves have to be learned); or it
may be beneficial to exploit relational structure within the data,
which can come in many guises: segmented image patches, citation
networks of documents, social networks of friends.
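To make the sparsity idea above concrete, here is a minimal sketch (not from the talk itself) of incorporating a sparsity-inducing Laplace prior on representation coefficients: the MAP estimate under this prior reduces to an L1-penalized least-squares problem, solved here with plain iterative soft-thresholding (ISTA). The dictionary `D`, penalty `lam`, and iteration count are all illustrative assumptions.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the L1 penalty (MAP under a Laplace prior)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_map(D, y, lam=0.5, n_iter=200):
    """MAP estimate of w in y ~ N(D @ w, I) with a Laplace prior on w,
    computed by iterative soft-thresholding (ISTA)."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the smooth part
    w = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ w - y)           # gradient of the squared-error term
        w = soft_threshold(w - grad / L, lam / L)
    return w

# Toy demo: recover a sparse coefficient vector from noisy measurements.
rng = np.random.default_rng(0)
D = rng.normal(size=(50, 20))              # hypothetical learned dictionary
w_true = np.zeros(20)
w_true[[2, 7]] = [1.5, -2.0]
y = D @ w_true + 0.01 * rng.normal(size=50)
w_hat = sparse_map(D, y)
```

The Laplace prior drives most estimated coefficients exactly to zero, which is the sense in which prior information about the data (here, sparsity) is built into the model rather than imposed after the fact.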

In this talk, we shall visit all these approaches, each time within a
probabilistic model built to incorporate prior information. More
precisely, we shall derive, in a variety of settings and for different
applications, efficient posterior inference algorithms that handle
large data sets, and we shall use side information to improve
inference. We demonstrate the efficiency and accuracy of these models
and algorithms in several applications (e.g., image super-resolution,
recommendation systems, time series analysis), on both real and
synthetic data sets. We evaluate the quality of the results with both
quantitative and human evaluation experiments.
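As one example of the kind of quantitative evaluation mentioned above, image super-resolution results are commonly scored with peak signal-to-noise ratio (PSNR) against a ground-truth image. The abstract does not specify the metric used, so this is only an illustrative sketch.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between two images with values in [0, peak]."""
    mse = np.mean((reference - estimate) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# Sanity check on synthetic data: a less-corrupted estimate scores higher.
rng = np.random.default_rng(0)
img = rng.random((32, 32))
good = psnr(img, np.clip(img + 0.01, 0, 1))
bad = psnr(img, np.clip(img + 0.1, 0, 1))
```

Higher PSNR means smaller mean squared error relative to the signal's dynamic range; human evaluation complements it because PSNR does not always track perceived visual quality.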
