[Ml-stat-talks] Wed: xin luna dong on truth finding on the deep web

Robert Schapire schapire at CS.Princeton.EDU
Mon Nov 12 20:10:15 EST 2012

Our lunchtime speaker for this Wednesday will be Xin Luna Dong from 
AT&T Labs. Please let me know if you would like to meet with her while she 
is here.


*Xin Luna Dong*, AT&T Labs
CS402, Wednesday, Nov. 14, 12:30

Title: *Truth Finding on the Deep Web*


The Web has been changing our lives enormously and people rely more and 
more on the Web to fulfill their information needs. Compared with 
traditional media, information on the Web can be published fast, but 
with fewer guarantees on quality and credibility. Indeed, Web sources 
are of different qualities, sometimes providing conflicting, out-of-date 
and incomplete data. The sources can also easily copy, reformat and 
modify data from other sources, propagating erroneous data.

In this talk we present a recent study of the truthfulness of Deep Web data 
in two domains where we believe data quality is important to people's 
lives: Stock and Flight. We then describe how we can resolve conflicts 
from different sources by leveraging accuracy of the sources and the 
copying relationships between the sources using statistical models. We 
demo our SOLOMON system, which can effectively detect copying between 
data sources, leverage the results in truth discovery, and provide a 
user-friendly interface that helps users understand the results.
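
For readers who want a feel for what "leveraging accuracy of the sources" 
can mean, here is a minimal Python sketch of accuracy-weighted voting. It 
is an illustration only, not the SOLOMON implementation: the function name 
weighted_vote, the log-odds weight, the n_false parameter, and the sample 
sources and accuracies are all assumptions made for this example, and it 
ignores copying between sources entirely.

```python
import math
from collections import defaultdict


def weighted_vote(claims, accuracy, n_false=10):
    """Pick the value with the highest accuracy-weighted vote.

    claims:   dict mapping source name -> value that source reports
    accuracy: dict mapping source name -> assumed accuracy in (0, 1)
    n_false:  assumed number of distinct wrong values (hypothetical knob)
    """
    scores = defaultdict(float)
    for source, value in claims.items():
        a = accuracy[source]
        # Log-odds style weight: more accurate sources count for more.
        scores[value] += math.log(n_false * a / (1.0 - a))
    best = max(scores, key=scores.get)
    return best, dict(scores)


# Hypothetical flight-departure claims from three made-up sources.
claims = {"site_a": "10:05", "site_b": "10:15", "site_c": "10:15"}
accuracy = {"site_a": 0.95, "site_b": 0.5, "site_c": 0.5}

best, scores = weighted_vote(claims, accuracy)
print(best, scores)
```

With these made-up numbers, the single high-accuracy source outweighs the 
two low-accuracy sources that agree with each other, so a plain majority 
vote and the weighted vote disagree. Accounting for copying would further 
discount votes from sources that merely repeat another source.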

About the Speaker:

Dr. Xin Luna Dong is a researcher at AT&T Labs-Research. She received a 
Ph.D. in Computer Science and Engineering from the University of Washington 
in 2007, received a Master's Degree in Computer Science from Peking 
University in China in 2001, and received a Bachelor's Degree in 
Computer Science from Nankai University in China in 1998. Her research 
interests include databases, information retrieval and machine learning, 
with an emphasis on data integration, data cleaning, personal 
information management, and web search. She has led the Solomon project, 
whose goal is to detect copying between structured sources and to 
leverage the results in various aspects of data integration, and the 
Semex personal information management system, which received a Best Demo 
award (one of the top 3) at Sigmod'05. She has co-chaired the Sigmod/PODS PhD 
Symposium'12, Sigmod New Researcher Symposium'12, QDB'12, and WebDB'10, has 
served as a track chair for the program committees of ICDE'13 and CIKM'11, 
and has served on the program committees of VLDB'13, Sigmod'12, VLDB'12, 
Sigmod'11, VLDB'11, PVLDB'10, WWW'10, ICDE'10, VLDB'09, etc.

