Miro Dudik will present his preFPO on Friday, February 9 at 10 AM in Room 402. The members of his committee are Rob Schapire, advisor; David Blei and Stephen Phillips (AT&T), readers; Moses Charikar and Olga Troyanskaya, nonreaders. Everyone is invited to attend his talk. His abstract follows below.

-----------------------------------------

Maximum entropy, generalized regularization, and modeling species habitats

The maximum entropy (maxent) approach, equivalent to maximum likelihood, is a widely used method for estimating probability distributions. However, when trained on small datasets, maxent is likely to overfit. Therefore, many smoothing techniques have been proposed to mitigate overfitting. In my dissertation, I propose a unified treatment for a large and general class of smoothing techniques, including L1 and L2 regularization. As a result, it is possible to prove non-asymptotic performance guarantees and derive novel regularizations based on the structure of the sample space. To obtain solutions for a large class of maxent problems, I propose new algorithms derived from boosting and iterative scaling. Convergence of these algorithms is proved using a novel method, which unifies previous approaches based on information geometry and compactness.

As an application of maxent, I discuss an important problem in ecology: modeling distributions of biological species. Regularized maxent fits this problem well and offers several advantages over previous techniques. In particular, it addresses the problem in a statistically sound manner and allows principled extensions to situations when data is collected in a biased manner or when we have access to data on many related species. The utility of maxent is demonstrated on large real-world datasets.
Melissa M Lawson