TITLE: Automatic Generation of Data-Processing Tools
SPEAKERS: Yitzhak Mandelbaum and David Walker
Department of Computer Science, Princeton University
TIME: Monday, November 14, 2005
Seminar begins at 12:30 p.m. (lunch provided ~12:20)
LOCATION: Room 302
ABSTRACT:
An ad hoc data format is any non-standard data format for which parsing,
querying, analysis, or transformation tools are not readily available.
Despite the increasing use of standard data formats such as XML, ad hoc data
sources continue to arise in numerous industries such as finance, health
care, transportation, and telecommunications as well as in scientific
domains, such as computational biology and chemistry. The absence of tools
for processing ad hoc data formats complicates the daily data-management
tasks of data analysts, who may have to cope with numerous ad hoc formats
even within a single application. Common characteristics of ad hoc data
complicate the building of tools to perform even basic data processing
tasks. For example, documentation of ad hoc formats, is often incomplete or
inaccurate, making it difficult to define a database schema or to build a
reliable parser. In addition, the data itself often contains numerous kinds
of errors, which can thwart standard database loaders.
In this talk we will describe PADS, a system for automatic generation of
data processing tools. PADS allows programmers to write simple, high-level
descriptions of their data format. Descriptions include information on both
the physical layout of the data within a file as well as semantic
constraints such as the range of allowed values and correlations between
different parts of the data. Once the data has been properly described, the
PADS compiler can generate a suite of programming libraries and stand-alone
tools. In particular, the PADS compiler generates a parser library capable
of detecting and recovering from data errors and a printing library for the
format. On top of these basic libraries PADS provides generic tools that
can translate ad hoc data into XML, format the data in a canonical form,
query the data using the semi-structured query language XQuery as if it were
XML (but not actually incur the overhead of translation to XML), and
generate a statistical summary of data characteristics such as the range of
values in different data fields and the number of errors in each field.
** PICASso:
** Program in Integrative Information, Computer and Application Sciences
** www.cs.princeton.edu/picasso
** Monday, November 14, 2005
SIGN UP FOR THE PICASSO MAILING LIST:
If you would like to be kept informed of computationally-oriented events
in (and around) Princeton, please SUBSCRIBE to the PICASso mailing list
by visiting https://lists.cs.princeton.edu/mailman/listinfo/picasso.
This page also contains information on how to UNSUBSCRIBE.
PLEASE FORWARD THIS MESSAGE TO OTHER COMPUTATIONALLY-ORIENTED RESEARCHERS
WHO MAY BE INTERESTED IN THESE EVENTS, OR FUTURE PROGRAMS. THANKS!
** PICASso:
** Program in Integrative Information, Computer and Application Sciences
** www.cs.princeton.edu/picasso
** Monday, November 7, 2005
Interdisciplinary Computational Lunchtime Seminars
www.cs.princeton.edu/picasso?computational_lunch.html
------------------
TITLE: Computing the Future of Biomedicine
SPEAKER: Chris Johnson, Ph.D.,Director, Scientific Computing and
Imaging Institute, Distinguished Professor, School of
Computing,
University of Utah
TIME: Seminar begins at 12:30 p.m. (lunch provided at 12:20)
LOCATION: Room 105, CS Small Auditorium
ABSTRACT:
Computers have changed the way we live, work, and even recreate.
Now, they are transforming how we think about and treat human disease.
Advanced techniques in biomedical computing, imaging, and visualization are
already changing the face of biology and medicine in both research and
clinical practice. These techniques have the potential to provide
comprehensive models and views of the human body in unprecedented depth and
detail. As a result, biomedical computing and visualization will help
produce exciting new biomedical scientific discoveries and clinical
treatments. In this talk, I will discuss the state of the art in biomedical
computing, medical imaging, and visualization research and present examples
of their vital roles in cardiology, neuroscience, neurosurgery, and
radiology.
SIGN UP FOR THE PICASSO MAILING LIST:
=====================================
If you would like to be kept informed of computationally-oriented events in
(and around) Princeton, please SUBSCRIBE to the PICASso mailing list by
visiting https://lists.cs.princeton.edu/mailman/listinfo/picasso.
This page also contains information on how to UNSUBSCRIBE.
PLEASE FORWARD THIS MESSAGE TO OTHER COMPUTATIONALLY-ORIENTED RESEARCHERS
WHO MAY BE INTERESTED IN THESE EVENTS, OR FUTURE PROGRAMS.
THANKS!
PICASso mailing list
PICASso(a)lists.cs.princeton.edu
https://lists.cs.princeton.edu/mailman/listinfo/picasso