[Ml-stat-talks] Fwd: [Theory-Group] [pvsnp-all] CCI Monthly Meeting - Friday December 2

David Blei blei at CS.Princeton.EDU
Wed Nov 30 15:33:34 EST 2011

hi all

this week's CCI meeting is devoted to a theoretical perspective on learning.


---------- Forwarded message ----------
From: Zeev Dvir <zeev.dvir at gmail.com>
Date: Wed, Nov 30, 2011 at 8:53 AM
Subject: Fwd: [Theory-Group] [pvsnp-all] CCI Monthly Meeting - Friday December 2
To: David Blei <blei at cs.princeton.edu>

---------- Forwarded message ----------
From: "Grant Schoenebeck" <schoenebeck at gmail.com>
Date: Nov 18, 2011 7:37 AM
Subject: [Theory-Group] [pvsnp-all] CCI Monthly Meeting - Friday December 2
To: <theory-group at lists.cs.princeton.edu>
Cc: "expeditions project list" <pvsnp-all at lists.cs.princeton.edu>

Dear all,

The December CCI monthly meeting will be December 2nd in room 402 at
the CS dept. The day will be devoted to discussing and hearing about
new alternative approaches to “learning”.

Schedule and abstracts below. I hope to see you for there.





10am-10:30am PC meeting

10:30-10:50am Sanjeev Arora: Is Machine Learning Easy?

10:50am-11:30 am Aravindan Vijayaraghavan: Approximation Algorithms
for Semi-random Partitioning problems

11:30am-12:10 pm Rong Ge: Computing a Nonnegative Matrix Factorization
— Provably

12:10pm-1:25pm Lunch

1:25pm-1:45pm Grant Schoenebeck: Finding Overlapping Communities in
Social Networks: Toward a Rigorous Approach

1:45pm - Avi Wigderson: Restriction Access



Sanjeev Arora: Is Machine Learning Easy?

I’ll briefly talk about the roadblock in computational learning theory
–namely, that most interesting learning problems are hard—and some
vague thoughts about how one may circumvent it, using some of the
day’s talks as examples.

Aravindan Vijayaraghavan: Approximation Algorithms for Semi-random
Partitioning problems.

We propose and study a new semi-random model for graph partitioning
problems.We believe that it captures many properties of real–world
instances. The model is more flexible than the semi-random model of
Feige and Kilian and planted random model ofBui, Chaudhuri, Leighton
and Sipser.

We develop a general framework for solving semi-random instances and
apply it to several problems of interest.We present constant factor
bi-criteria approximation algorithms for semi-random instances of the
Balanced Cut, Multicut, Min Uncut and Small-Set Expansion problems. We
also show how toalmost recover the optimal solution if the instance
satisfies an additional expanding condition. Our algorithmswork in a
wider range of parameters than algorithms for previously studied
random and semi-random models.

Joint work with Konstantin Makarychev and Yury Makarychev

Rong Ge: Computing a Nonnegative Matrix Factorization — Provably

In the Nonnegative Matrix Factorization (NMF) problem we are given an
$n \times m$ nonnegative matrix $M$ and an integer $r > 0$. Our goal
is to express $M$ as $A W$ where $A$ and $W$ are nonnegative matrices
of size $n \times r$ and $r \times m$ respectively. In some
applications, it makes sense to ask instead for the product $AW$ to
approximate $M$ — i.e. (approximately) minimize $\norm{M – AW}_F$
where $\norm{}_F$ denotes the Frobenius norm; we refer to this as
Approximate NMF. This problem has a rich history spanning quantum
mechanics, probability theory, data analysis, polyhedral
combinatorics, communication complexity, demography, chemometrics,
etc. In the past decade NMF has become enormously popular in machine
learning, where $A$ and $W$ are computed using a variety of local
search heuristics. Vavasis proved that this problem is NP-complete. We
initiate a study of when this problem is solvable in polynomial time:

1. We give a polynomial-time algorithm for exact and approximate NMF
for every constant $r$. Indeed NMF is most interesting in applications
precisely when $r$ is small.

2. We complement this with a hardness result, that if exact NMF can be
solved in time $(nm)^{o(r)}$, 3-SAT has a sub-exponential time
algorithm. This rules out substantial improvements to the above

3. We give an algorithm that runs in time polynomial in $n$, $m$ and
$r$ under the separablity condition identified by Donoho and Stodden
in 2003. The algorithm may be practical since it is simple and noise
tolerant (under benign assumptions). Separability is believed to hold
in many practical settings.
To the best of our knowledge, this last result is the first example of
a polynomial-time algorithm that provably works under a non-trivial
condition on the input and we believe that this will be an interesting
and important direction for future work

Joint work with Sanjeev Arora, Ravi Kannan, and Ankur Moitra.

Grant Schoenebeck Finding Overlapping Communities in Social Networks:
Toward a Rigorous Approach

A community in a social network is usually understood to be a group of
nodes more densely connected with each other than with the rest of the
network. This is an important concept in most domains where networks
arise: social, technological, biological, etc. For many years
algorithms for finding communites implicitly assumed communities are
nonoverlapping (leading to use of clustering-based approaches) but
there is increasing interest in finding overlapping communities. A
barrier to finding communities is that the solution concept is often
defined in terms of an NP-complete problem such as Clique or
Hierarchical Clustering.

This paper seeks to initiate a rigorous approach to the problem of
finding overlapping communities, where “rigorous” means that we
clearly state the following: (a) the object sought by our algorithm
(b) the assumptions about the underlying network (c) the (worst-case)
running time.

Our assumptions about the network lie between worst-case and
average-case. An average-case analysis would require a precise
probabilistic model of the network, on which there is currently no
consensus. However, some plausible assumptions about network
parameters can be gleaned from a long body of work in the sociology
community spanning five decades focusing on the study of individual
communities and {\em ego-centric networks} (in graph theoretic terms,
this is the subgraph induced on a node’s neighborhood). Thus our
assumptions are somewhat “local” in nature. Nevertheless they suffice
to permit a rigorous analysis of running time of algorithms that
recover global structure.
Our algorithms use random sampling similar to that in property testing
and algorithms for dense graphs. We note however that our networks are
not necessarily dense graphs, not even in local neighborhoods.

Our algorithms explore a local-global relationship between ego-centric
and socio-centric networks that we hope will provide a fruitful
framework for future work both in computer science and sociology.

Joint work with Sanjeev Arora, Rong Ge, and Sushant Sachdeva.

Avi Wigderson: Restriction Access

We introduce a notion of non-black-box access to computational devices
(such as circuits, formulas, decision trees, and so forth) that we
call restriction access. Restrictions are partial assignments to input
variables. Each restriction simpli fies the device, and yields a new
device for the restricted function on the unassigned variables. On one
extreme, full restrictions (assigning all variables) correspond to
evaluating the device on a complete input, yielding the result of the
computation on that input, which is the same as standard black-box
access. On the other extreme, empty restrictions (assigning no
variables) yield a full description of the original device. We explore
the grey-scale of possibilities in the middle. Different variants of
restriction access, in which the values to restricted variables, as
well as the subset of free (unassigned) variables, are generated
deterministically or randomly, in friendly or adversarial fashions may
naturally suit a variety of situations in computational learning,
computational biology, automated proofs, cryptography and complexity

Focusing on learning theory, we show that restriction access provides
a setting in which one can obtain positive results for problems that
have resisted attack in the black-box access model.
We introduce a PAC-learning version of restriction access, and show
that one can efficiently learn both decision trees and DNF formulas in
this model. These two classes are not known to be learnable in the PAC
model with black-box access.

Our DNF learning algorithm is obtained by a reduction to a general
learning problem we call population recovery, in which random samples
from an unknown distribution become available only after a random part
of each was erased. More speci fically, assume that every member of an
unknown population is described by a vector of values (to some xed set
of attributes). The algorithm has access to random samples, each of
which is a random member of the population, whose values are given
only for a random subset of the attributes!
Analyzing our efficient algorithm to fully recover the unknown
population calls for understanding another basic problem of
independent interest: “robust local inversion” of matrices. The
population recovery algorithm and construction of robust local
inverses for some families of matrices are the main technical
contributions of the paper.

Joint work with Zeev Dvir, Anup Rao, and Amir Yehudayoff

pvsnp-all mailing list
pvsnp-all at lists.cs.princeton.edu

Theory-Group mailing list
Theory-Group at lists.cs.princeton.edu

More information about the Ml-stat-talks mailing list