[Ml-stat-talks] Jonathan Chang, 11AM, Tuesday 9/6: Uncovering, Understanding, and Predicting Links

David Blei blei at CS.Princeton.EDU
Mon Sep 5 20:09:25 EDT 2011

hi all

jonathan chang's FPO (our version of a thesis defense) will take place
tomorrow 9/6 at 11AM in B327 of the equad.  jonathan is a data
scientist at facebook, where he does wonderful work correlating FB's
massive data stream to a variety of social variables.

his talks are lucid and surprising.  do not miss this.

the abstract is below my signature.



Uncovering, Understanding, and Predicting Links

Jonathan Chang
Equad B327
September 6, 2011 at 11:00AM

Network data, such as citation networks of documents, hyperlinked
networks of web pages, and social networks of friends, are pervasive
in applied statistics and machine learning. The statistical analysis
of network data can provide both useful predictive models and
descriptive statistics. Predictive models can point social network
members towards new friends, scientific papers towards relevant
citations, and web pages towards other related pages. Descriptive
statistics can uncover the hidden community structure underlying a
network data set.

In this work we develop new models of network data that account for
both links and attributes. We also develop the inferential and
predictive tools around these models to make them widely applicable to
large, real-world data sets. One such model, the Relational Topic
Model, can predict links using only a new node’s attributes. Thus, we
can suggest citations of newly written papers, predict the likely
hyperlinks of a web page in development, or suggest friendships in a
social network based only on a new user’s profile of interests.
Moreover, given a new node and its links, the model provides a
predictive distribution of node attributes. This mechanism can be used
to predict keywords from citations or a user’s interests from his or
her social connections.
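The abstract does not say how the Relational Topic Model scores a candidate link. One link function used in the published RTM is logistic: the probability of a link between two documents is sigma(eta . (zbar_a * zbar_b) + nu), where zbar is a document's topic proportions and * is elementwise. The sketch below assumes that form, and assumes the new node's topic proportions have already been inferred from its words alone. The helper names, the weights `eta`, the intercept `nu`, and the toy documents are hypothetical, chosen only for illustration:

```python
import math


def link_probability(zbar_a, zbar_b, eta, nu):
    """Logistic link: sigma(eta . (zbar_a * zbar_b) + nu), elementwise product."""
    score = sum(e * a * b for e, a, b in zip(eta, zbar_a, zbar_b)) + nu
    return 1.0 / (1.0 + math.exp(-score))


def rank_links(new_zbar, existing, eta, nu):
    """Score a new node against every existing node; highest probability first."""
    scored = [
        (node_id, link_probability(new_zbar, zbar, eta, nu))
        for node_id, zbar in existing.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    eta = [4.0, 4.0, 4.0]  # hypothetical per-topic weights
    nu = -2.0              # hypothetical intercept
    new_paper = [0.7, 0.2, 0.1]  # assumed already inferred from the paper's text
    corpus = {
        "paper_a": [0.8, 0.1, 0.1],
        "paper_b": [0.1, 0.1, 0.8],
        "paper_c": [0.3, 0.6, 0.1],
    }
    for node_id, p in rank_links(new_paper, corpus, eta, nu):
        print(f"{node_id}: {p:.3f}")
```

Because the score depends only on the product of topic proportions, a paper with no citations yet can still be matched to existing papers that share its dominant topics.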

While explicit network data — network data in which the connections
between people, places, genes, corporations, etc. are explicitly
encoded — are already ubiquitous, most of these can only annotate
connections in a limited fashion. Although relationships between
entities are rich, it is impractical to manually devise complete
characterizations of these relationships for every pair of entities on
large, real-world corpora. To resolve this, we present a probabilistic
topic model to analyze text corpora and infer descriptions of their
entities and of the relationships between those entities. We show
qualitatively and quantitatively that our model can construct and
annotate graphs of relationships and make useful predictions.
