Jia Deng will present his research seminar/general exam on Thursday, May 8,
at 1 PM in Room 402.  The members of his committee are:  Kai Li (advisor),
Fei-Fei Li, and David Blei.  Everyone is invited to attend his talk, and those 
faculty wishing to remain for the oral exam following are welcome to do so.
His abstract and reading list follow below.

Constructing ImageNet

Data sets are essential in computer vision and content-based image retrieval research. We
present work in progress on constructing ImageNet, a large-scale image data set based
on the Princeton WordNet. The goal is to associate more than 1000 clean images with each
node of WordNet, which contains an estimated 30,000 imageable nodes. We have built a
prototype system for constructing ImageNet as a first step toward large-scale deployment.
For each node of WordNet, which is a synonym set (synset) for a single concept, we collect
candidate images from the Internet and clean them up with semi-automatic labeling. We train
boosting classifiers on human-labeled data and use active learning to substantially speed up
the labeling process. We have also developed a web interface for massive online human
labeling. We demonstrate the effectiveness of our system with results from a subset of
synsets.
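
As a rough illustration of how active learning can cut down human labeling, here is a
minimal sketch of one labeling round. The abstract does not say how images are chosen for
human review; this sketch assumes uncertainty sampling, and assumes each candidate image
already has a classifier score in [0, 1]. The function names, the 0.9/0.1 thresholds, and
the ask_human callback are all hypothetical and are not taken from the actual system.

```python
def select_uncertain(scores, budget):
    """Return indices of the `budget` images whose scores are closest to 0.5.

    Assumption: `scores` are classifier confidences in [0, 1] that an image
    belongs to the synset; 0.5 means the classifier is least sure.
    """
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i] - 0.5))
    return ranked[:budget]


def label_round(scores, budget, ask_human, accept=0.9, reject=0.1):
    """Run one semi-automatic labeling round (hypothetical sketch).

    The most uncertain images go to a human via `ask_human(index) -> bool`.
    Remaining images are labeled automatically only when the score is
    confidently high or low; anything else is left for a later round.
    """
    to_ask = set(select_uncertain(scores, budget))
    labels = {}
    for i, score in enumerate(scores):
        if i in to_ask:
            labels[i] = ask_human(i)      # human decides the hard cases
        elif score >= accept:
            labels[i] = True              # confidently in the synset
        elif score <= reject:
            labels[i] = False             # confidently not in the synset
    return labels
```

In a full loop, the newly collected human labels would be used to retrain the classifier
before the next round, so that fewer images need human review over time.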

Reading list:

Textbooks:

Pattern Recognition and Machine Learning, Christopher M. Bishop, 2006.
Chapters 1, 2, 8, 14.
Modern Operating Systems, Tanenbaum.

Animals on the Web, Berg, Forsyth, CVPR06
OPTIMOL: automatic Online Picture collecTion via Incremental MOdel Learning, Li, Wang,
Fei-Fei, CVPR07
Learning Object Categories from Google's Image Search, Fergus, Fei-Fei, Perona,
Zisserman, ICCV05
Harvesting Image Databases from the Web, Schroff, Zisserman
From Aardvark to Zorro: A Benchmark of Mammal Images, Fink, Ullman
Tiny Images, Torralba, Fergus, Freeman, TechReport MIT, 2007
Labeling Images with a Computer Game, Luis von Ahn and Laura Dabbish, CHI04
LabelMe: a database and web-based tool for image annotation, Russell, Torralba, IJCV07
Introduction to a large scale general purpose groundtruth dataset:
methodology, annotation tool, and benchmarks, Z.Y. Yao, X. Yang, and S.C. Zhu, EMMCVPR07
Combining active and semi-supervised learning for spoken language understanding, Tur,
Hakkani-Tur, Schapire, Speech Communication, 05
Online boosting and vision, CVPR06

